arahrooh committed
Commit c9adae0 · 1 Parent(s): 2fed471

Initial deployment: CGT-LLM-Beta RAG Chatbot

.gitignore ADDED
@@ -0,0 +1,7 @@
+ __pycache__/
+ *.py[cod]
+ *.log
+ results/
+ *.csv
+ .DS_Store
+ *.pyc
README.md CHANGED
@@ -1,12 +1,59 @@
  ---
- title: Cgt Llm Chatbot V2
- emoji: 📉
- colorFrom: green
- colorTo: red
+ title: CGT-LLM-Beta RAG Chatbot
+ emoji: 🧬
+ colorFrom: blue
+ colorTo: purple
  sdk: gradio
- sdk_version: 6.0.1
+ sdk_version: 4.44.0
  app_file: app.py
  pinned: false
+ license: mit
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # CGT-LLM-Beta: Genetic Counseling RAG Chatbot
+
+ A Retrieval-Augmented Generation (RAG) chatbot for genetic counseling and cascade genetic testing questions.
+
+ ## Features
+
+ - **Evidence-based answers** from medical literature
+ - **Multiple education levels**: Middle School, High School, College, and Doctoral
+ - **Source document citations** with full chunk text
+ - **Similarity scoring** for transparency
+ - **Flesch-Kincaid readability scores** for all answers
+ - **Multiple LLM models** to choose from
+ - **100+ example questions** for testing
+
+ ## How to Use
+
+ 1. **Select a model** from the dropdown (default: Llama-3.2-3B-Instruct)
+ 2. **Choose your education level** for personalized answers
+ 3. **Enter your question** or select one of the example questions
+ 4. **View the answer** along with its readability score, sources, and similarity scores
+
+ ## Education Levels
+
+ - **Middle School**: Simplified answers for ages 12-14
+ - **High School**: Simplified answers for ages 15-18
+ - **College**: Professional answers at undergraduate level
+ - **Doctoral**: Advanced answers for medical professionals
+
+ ## Models Available
+
+ - Llama-3.2-3B-Instruct
+ - Mistral-7B-Instruct-v0.2
+ - Llama-4-Scout-17B-16E-Instruct
+ - MediPhi-Instruct
+ - MediPhi
+ - Phi-4-reasoning
+
+ ## Important Notes
+
+ ⚠️ **This chatbot provides informational answers based on medical literature. It is not a substitute for professional medical advice, diagnosis, or treatment. Always consult qualified healthcare providers for medical decisions.**
+
+ ## Technical Details
+
+ - **Vector Database**: ChromaDB with sentence-transformers embeddings
+ - **RAG System**: Retrieval-Augmented Generation with semantic search
+ - **Source Attribution**: Full document tracking with chunk-level citations
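
The Technical Details above describe the retrieval layer added in this commit: a persistent ChromaDB collection searched by semantic similarity, with chunk-level metadata used for source citations. The sketch below mirrors the calls used in bot.py; the database path, collection name, and toy documents are illustrative assumptions, not the repository's real Data Resources corpus.

```python
# Minimal sketch of the index-then-retrieve flow, assuming a local ./chroma_db
# directory and a "cgt_documents" collection as in bot.py. Documents and IDs
# below are placeholders rather than real corpus chunks.
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="cgt_documents")

# Index a couple of toy chunks with the same metadata fields the app stores.
collection.add(
    documents=[
        "Lynch syndrome is a hereditary condition that raises colorectal cancer risk.",
        "BRCA1 carriers are offered earlier and more frequent breast cancer screening.",
    ],
    metadatas=[
        {"filename": "lynch_overview.txt", "chunk_id": 0, "total_chunks": 1},
        {"filename": "brca1_screening.txt", "chunk_id": 0, "total_chunks": 1},
    ],
    ids=["lynch_overview.txt_0", "brca1_screening.txt_0"],
)

# Semantic search: the app converts each returned distance to a similarity
# score (1 - distance) and shows it next to the cited chunk.
results = collection.query(query_texts=["What is Lynch Syndrome?"], n_results=2)
for doc, meta, dist in zip(
    results["documents"][0], results["metadatas"][0], results["distances"][0]
):
    print(f"{meta['filename']} (similarity {1 - dist:.3f}): {doc[:60]}...")
```

In the Space itself, indexing is done once via bot.py, and the Gradio app only runs the query side at request time.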
+
app.py ADDED
@@ -0,0 +1,943 @@
1
+ """
2
+ Gradio Chatbot Interface for CGT-LLM-Beta RAG System
3
+
4
+ This application provides a web interface for the RAG chatbot, allowing users to:
5
+ - Select different LLM models from a dropdown
6
+ - Choose education level for personalized answers (Middle School, High School, College, Doctoral)
7
+ - View answers with Flesch-Kincaid grade level scores
8
+ - See source documents and similarity scores for every answer
9
+
10
+ Usage:
11
+ python app.py
12
+
13
+ IMPORTANT: Before using, update the MODEL_MAP dictionary with correct HuggingFace paths
14
+ for models that currently have placeholder paths (Llama-4-Scout, MediPhi, Phi-4-reasoning).
15
+
16
+ For Hugging Face Spaces:
17
+ - Ensure vector database is built (run bot.py with indexing first)
18
+ - Model will be loaded on startup
19
+ - Access via the Gradio interface
20
+ """
21
+
22
+ import gradio as gr
23
+ import argparse
24
+ import sys
25
+ import os
26
+ from typing import Tuple, Optional, List
27
+ import logging
28
+ import textstat
29
+ import torch
30
+
31
+ # Import from bot.py
32
+ from bot import RAGBot, parse_args, Chunk
33
+
34
+ # Set up logging first (before any logger usage)
35
+ logging.basicConfig(level=logging.INFO)
36
+ logger = logging.getLogger(__name__)
37
+
38
+ # For Hugging Face Inference API
39
+ try:
40
+ from huggingface_hub import InferenceClient
41
+ HF_INFERENCE_AVAILABLE = True
42
+ except ImportError:
43
+ HF_INFERENCE_AVAILABLE = False
44
+ logger.warning("huggingface_hub not available, InferenceClient will not work")
45
+
46
+ # Model mapping: short name -> full HuggingFace path
47
+ MODEL_MAP = {
48
+ "Llama-3.2-3B-Instruct": "meta-llama/Llama-3.2-3B-Instruct",
49
+ "Mistral-7B-Instruct-v0.2": "mistralai/Mistral-7B-Instruct-v0.2",
50
+ "Llama-4-Scout-17B-16E-Instruct": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
51
+ "MediPhi-Instruct": "microsoft/MediPhi-Instruct",
52
+ "MediPhi": "microsoft/MediPhi",
53
+ "Phi-4-reasoning": "microsoft/Phi-4-reasoning",
54
+ }
55
+
56
+ # Education level mapping
57
+ EDUCATION_LEVELS = {
58
+ "Middle School": "middle_school",
59
+ "High School": "high_school",
60
+ "College": "college",
61
+ "Doctoral": "doctoral"
62
+ }
63
+
64
+ # Example questions from the results CSV (hardcoded for easy access)
65
+ EXAMPLE_QUESTIONS = [
66
+ "Can a BRCA2 variant skip a generation?",
67
+ "Can a PMS2 variant skip a generation?",
68
+ "Can an EPCAM/MSH2 variant skip a generation?",
69
+ "Can an MLH1 variant skip a generation?",
70
+ "Can an MSH2 variant skip a generation?",
71
+ "Can an MSH6 variant skip a generation?",
72
+ "Can I pass this MSH2 variant to my kids?",
73
+ "Can only women carry a BRCA inherited mutation?",
74
+ "Does GINA cover life or disability insurance?",
75
+ "Does having a BRCA1 mutation mean I will definitely have cancer?",
76
+ "Does having a BRCA2 mutation mean I will definitely have cancer?",
77
+ "Does having a PMS2 mutation mean I will definitely have cancer?",
78
+ "Does having an EPCAM/MSH2 mutation mean I will definitely have cancer?",
79
+ "Does having an MLH1 mutation mean I will definitely have cancer?",
80
+ "Does having an MSH2 mutation mean I will definitely have cancer?",
81
+ "Does having an MSH6 mutation mean I will definitely have cancer?",
82
+ "Does this BRCA1 genetic variant affect my cancer treatment?",
83
+ "Does this BRCA2 genetic variant affect my cancer treatment?",
84
+ "Does this EPCAM/MSH2 genetic variant affect my cancer treatment?",
85
+ "Does this MLH1 genetic variant affect my cancer treatment?",
86
+ "Does this MSH2 genetic variant affect my cancer treatment?",
87
+ "Does this MSH6 genetic variant affect my cancer treatment?",
88
+ "Does this PMS2 genetic variant affect my cancer treatment?",
89
+ "How can I cope with this diagnosis?",
90
+ "How can I get my kids tested?",
91
+ "How can I help others with my condition?",
92
+ "How might my genetic test results change over time?",
93
+ "I don't talk to my family/parents/sister/brother. How can I share this with them?",
94
+ "I have a BRCA pathogenic variant and I want to have children, what are my options?",
95
+ "Is genetic testing for my family members covered by insurance?",
96
+ "Is new research being done on my condition?",
97
+ "Is this BRCA1 variant something I inherited?",
98
+ "Is this BRCA2 variant something I inherited?",
99
+ "Is this EPCAM/MSH2 variant something I inherited?",
100
+ "Is this MLH1 variant something I inherited?",
101
+ "Is this MSH2 variant something I inherited?",
102
+ "Is this MSH6 variant something I inherited?",
103
+ "Is this PMS2 variant something I inherited?",
104
+ "My relative doesn't have insurance. What should they do?",
105
+ "People who test positive for a genetic mutation are they at risk of losing their health insurance?",
106
+ "Should I contact my male and female relatives?",
107
+ "Should my family members get tested?",
108
+ "What are the Risks and Benefits of Risk-Reducing Surgeries for Lynch Syndrome?",
109
+ "What are the recommendations for my family members if I have a BRCA1 mutation?",
110
+ "What are the recommendations for my family members if I have a BRCA2 mutation?",
111
+ "What are the recommendations for my family members if I have a PMS2 mutation?",
112
+ "What are the recommendations for my family members if I have an EPCAM/MSH2 mutation?",
113
+ "What are the recommendations for my family members if I have an MLH1 mutation?",
114
+ "What are the recommendations for my family members if I have an MSH2 mutation?",
115
+ "What are the recommendations for my family members if I have an MSH6 mutation?",
116
+ "What are the surveillance and preventions I can take to reduce my risk of cancer or detecting cancer early if I have a BRCA mutation?",
117
+ "What are the surveillance and preventions I can take to reduce my risk of cancer or detecting cancer early if I have an EPCAM/MSH2 mutation?",
118
+ "What are the surveillance and preventions I can take to reduce my risk of cancer or detecting cancer early if I have an MSH2 mutation?",
119
+ "What does a BRCA1 genetic variant mean for me?",
120
+ "What does a BRCA2 genetic variant mean for me?",
121
+ "What does a PMS2 genetic variant mean for me?",
122
+ "What does an EPCAM/MSH2 genetic variant mean for me?",
123
+ "What does an MLH1 genetic variant mean for me?",
124
+ "What does an MSH2 genetic variant mean for me?",
125
+ "What does an MSH6 genetic variant mean for me?",
126
+ "What if I feel overwhelmed?",
127
+ "What if I want to have children and have a hereditary cancer gene? What are my reproductive options?",
128
+ "What if a family member doesn't want to get tested?",
129
+ "What is Lynch Syndrome?",
130
+ "What is my cancer risk if I have BRCA1 Hereditary Breast and Ovarian Cancer syndrome?",
131
+ "What is my cancer risk if I have BRCA2 Hereditary Breast and Ovarian Cancer syndrome?",
132
+ "What is my cancer risk if I have MLH1 Lynch syndrome?",
133
+ "What is my cancer risk if I have MSH2 or EPCAM-associated Lynch syndrome?",
134
+ "What is my cancer risk if I have MSH6 Lynch syndrome?",
135
+ "What is my cancer risk if I have PMS2 Lynch syndrome?",
136
+ "What other resources are available to help me?",
137
+ "What screening tests do you recommend for BRCA1 carriers?",
138
+ "What screening tests do you recommend for BRCA2 carriers?",
139
+ "What screening tests do you recommend for EPCAM/MSH2 carriers?",
140
+ "What screening tests do you recommend for MLH1 carriers?",
141
+ "What screening tests do you recommend for MSH2 carriers?",
142
+ "What screening tests do you recommend for MSH6 carriers?",
143
+ "What screening tests do you recommend for PMS2 carriers?",
144
+ "What steps can I take to manage my cancer risk if I have Lynch syndrome?",
145
+ "What types of cancers am I at risk for with a BRCA1 mutation?",
146
+ "What types of cancers am I at risk for with a BRCA2 mutation?",
147
+ "What types of cancers am I at risk for with a PMS2 mutation?",
148
+ "What types of cancers am I at risk for with an EPCAM/MSH2 mutation?",
149
+ "What types of cancers am I at risk for with an MLH1 mutation?",
150
+ "What types of cancers am I at risk for with an MSH2 mutation?",
151
+ "What types of cancers am I at risk for with an MSH6 mutation?",
152
+ "Where can I find a genetic counselor?",
153
+ "Which of my relatives are at risk?",
154
+ "Who are my first-degree relatives?",
155
+ "Who do my family members call to have genetic testing?",
156
+ "Why do some families with Lynch syndrome have more cases of cancer than others?",
157
+ "Why should I share my BRCA1 genetic results with family?",
158
+ "Why should I share my BRCA2 genetic results with family?",
159
+ "Why should I share my EPCAM/MSH2 genetic results with family?",
160
+ "Why should I share my MLH1 genetic results with family?",
161
+ "Why should I share my MSH2 genetic results with family?",
162
+ "Why should I share my MSH6 genetic results with family?",
163
+ "Why should I share my PMS2 genetic results with family?",
164
+ "Why would my relatives want to know if they have this? What can they do about it?",
165
+ "Will my insurance cover testing for my parents/brother/sister?",
166
+ "Will this affect my health insurance?",
167
+ ]
168
+
169
+
170
+ class InferenceAPIBot:
171
+ """Wrapper that uses Hugging Face Inference API instead of loading models locally"""
172
+
173
+ def __init__(self, bot: RAGBot, hf_token: str):
174
+ """Initialize with a RAGBot (for vector DB) and HF token for Inference API"""
175
+ self.bot = bot # Use bot for vector DB and formatting
176
+ self.client = InferenceClient(api_key=hf_token)
177
+ self.current_model = bot.args.model
178
+ # Don't set args as attribute - access via bot.args instead
179
+ logger.info(f"InferenceAPIBot initialized with model: {self.current_model}")
180
+
181
+ @property
182
+ def args(self):
183
+ """Access args from the wrapped bot"""
184
+ return self.bot.args
185
+
186
+ def generate_answer(self, prompt: str, **kwargs) -> str:
187
+ """Generate answer using Inference API"""
188
+ try:
189
+ # Convert prompt to chat format
190
+ messages = [{"role": "user", "content": prompt}]
191
+
192
+ # Call Inference API
193
+ completion = self.client.chat.completions.create(
194
+ model=self.current_model,
195
+ messages=messages,
196
+ max_tokens=kwargs.get('max_new_tokens', 512),
197
+ temperature=kwargs.get('temperature', 0.2),
198
+ top_p=kwargs.get('top_p', 0.9),
199
+ )
200
+
201
+ answer = completion.choices[0].message.content
202
+ return answer
203
+ except Exception as e:
204
+ logger.error(f"Error calling Inference API: {e}", exc_info=True)
205
+ return f"Error generating answer: {str(e)}"
206
+
207
+ def enhance_readability(self, answer: str, target_level: str = "middle_school") -> Tuple[str, float]:
208
+ """Enhance readability using Inference API"""
209
+ try:
210
+ # Define prompts for different reading levels (same as bot.py)
211
+ if target_level == "middle_school":
212
+ level_description = "middle school reading level (ages 12-14, 6th-8th grade)"
213
+ instructions = """
214
+ - Use simpler medical terms or explain them
215
+ - Medium-length sentences
216
+ - Clear, structured explanations
217
+ - Keep important medical information accessible"""
218
+ elif target_level == "high_school":
219
+ level_description = "high school reading level (ages 15-18, 9th-12th grade)"
220
+ instructions = """
221
+ - Use appropriate medical terminology with context
222
+ - Varied sentence length
223
+ - Comprehensive yet accessible explanations
224
+ - Maintain technical accuracy while ensuring clarity"""
225
+ elif target_level == "college":
226
+ level_description = "college reading level (undergraduate level, ages 18-22)"
227
+ instructions = """
228
+ - Use standard medical terminology with brief explanations
229
+ - Professional and clear writing style
230
+ - Include relevant clinical context
231
+ - Maintain scientific accuracy and precision
232
+ - Appropriate for undergraduate students in health sciences"""
233
+ elif target_level == "doctoral":
234
+ level_description = "doctoral/professional reading level (graduate level, medical professionals)"
235
+ instructions = """
236
+ - Use advanced medical and scientific terminology
237
+ - Include detailed clinical and research context
238
+ - Reference specific mechanisms, pathways, and evidence
239
+ - Provide comprehensive technical explanations
240
+ - Appropriate for medical professionals, researchers, and graduate students
241
+ - Include nuanced discussions of clinical implications and research findings"""
242
+ else:
243
+ raise ValueError(f"Unknown target_level: {target_level}")
244
+
245
+ # Create messages for chat API
246
+ system_message = f"""You are a helpful medical assistant who specializes in explaining complex medical information at appropriate reading levels. Rewrite the following medical answer for {level_description}:
247
+ {instructions}
248
+ - Keep the same important information but adapt the complexity
249
+ - Provide context for technical terms
250
+ - Ensure the answer is informative yet understandable"""
251
+
252
+ user_message = f"Please rewrite this medical answer for {level_description}:\n\n{answer}"
253
+
254
+ messages = [
255
+ {"role": "system", "content": system_message},
256
+ {"role": "user", "content": user_message}
257
+ ]
258
+
259
+ # Call Inference API
260
+ completion = self.client.chat.completions.create(
261
+ model=self.current_model,
262
+ messages=messages,
263
+ max_tokens=512 if target_level in ["college", "doctoral"] else 384,
264
+ temperature=0.4 if target_level in ["college", "doctoral"] else 0.3,
265
+ )
266
+
267
+ enhanced_answer = completion.choices[0].message.content
268
+ # Clean the answer (same as bot.py)
269
+ cleaned = self.bot._clean_readability_answer(enhanced_answer, target_level)
270
+
271
+ # Calculate Flesch score
272
+ try:
273
+ flesch_score = textstat.flesch_kincaid_grade(cleaned)
274
+ except:
275
+ flesch_score = 0.0
276
+
277
+ return cleaned, flesch_score
278
+ except Exception as e:
279
+ logger.error(f"Error enhancing readability: {e}", exc_info=True)
280
+ return answer, 0.0
281
+
282
+ # Delegate other methods to bot
283
+ def format_prompt(self, context_chunks: List[Chunk], question: str) -> str:
284
+ return self.bot.format_prompt(context_chunks, question)
285
+
286
+ def retrieve_with_scores(self, query: str, k: int) -> Tuple[List[Chunk], List[float]]:
287
+ return self.bot.retrieve_with_scores(query, k)
288
+
289
+ def _categorize_question(self, question: str) -> str:
290
+ return self.bot._categorize_question(question)
291
+
292
+ @property
293
+ def args(self):
294
+ return self.bot.args
295
+
296
+ @property
297
+ def vector_retriever(self):
298
+ return self.bot.vector_retriever
299
+
300
+
301
+ class GradioRAGInterface:
302
+ """Wrapper class to integrate RAGBot with Gradio"""
303
+
304
+ def __init__(self, initial_bot: RAGBot, use_inference_api: bool = False):
305
+ # Check if we should use Inference API (on Spaces)
306
+ if use_inference_api and HF_INFERENCE_AVAILABLE:
307
+ hf_token = os.getenv("HF_TOKEN") or os.getenv("HUGGING_FACE_HUB_TOKEN")
308
+ if hf_token:
309
+ self.bot = InferenceAPIBot(initial_bot, hf_token)
310
+ self.use_inference_api = True
311
+ logger.info("Using Hugging Face Inference API")
312
+ else:
313
+ logger.warning("HF_TOKEN not found, falling back to local model")
314
+ self.bot = initial_bot
315
+ self.use_inference_api = False
316
+ else:
317
+ self.bot = initial_bot
318
+ self.use_inference_api = False
319
+
320
+ # Get current model from bot args (not a direct attribute)
321
+ self.current_model = self.bot.args.model if hasattr(self.bot, 'args') else getattr(self.bot, 'current_model', None)
322
+ if self.current_model is None and hasattr(self.bot, 'bot'):
323
+ # If using InferenceAPIBot, get from the wrapped bot
324
+ self.current_model = self.bot.bot.args.model
325
+ self.data_dir = initial_bot.args.data_dir
326
+ logger.info("GradioRAGInterface initialized")
327
+
328
+ def _find_file_path(self, filename: str) -> str:
329
+ """Find the full file path for a given filename"""
330
+ from pathlib import Path
331
+ data_path = Path(self.data_dir)
332
+
333
+ if not data_path.exists():
334
+ return ""
335
+
336
+ # Search for the file recursively
337
+ for file_path in data_path.rglob(filename):
338
+ return str(file_path)
339
+
340
+ return ""
341
+
342
+ def reload_model(self, model_short_name: str) -> str:
343
+ """Reload the model when user selects a different one"""
344
+ if model_short_name not in MODEL_MAP:
345
+ return f"Error: Unknown model '{model_short_name}'"
346
+
347
+ new_model_path = MODEL_MAP[model_short_name]
348
+
349
+ # If same model, no need to reload
350
+ if new_model_path == self.current_model:
351
+ return f"Model already loaded: {model_short_name}"
352
+
353
+ try:
354
+ logger.info(f"Switching model from {self.current_model} to {new_model_path}")
355
+
356
+ if self.use_inference_api:
357
+ # For Inference API, just update the model name
358
+ self.bot.current_model = new_model_path
359
+ self.current_model = new_model_path
360
+ return f"✓ Model switched to: {model_short_name} (using Inference API)"
361
+ else:
362
+ # For local model, reload it
363
+ # Update args
364
+ self.bot.args.model = new_model_path
365
+
366
+ # Clear old model from memory
367
+ if hasattr(self.bot, 'model') and self.bot.model is not None:
368
+ del self.bot.model
369
+ del self.bot.tokenizer
370
+ torch.cuda.empty_cache() if torch.cuda.is_available() else None
371
+
372
+ # Load new model
373
+ self.bot._load_model()
374
+ self.current_model = new_model_path
375
+
376
+ return f"✓ Model loaded: {model_short_name}"
377
+ except Exception as e:
378
+ logger.error(f"Error reloading model: {e}", exc_info=True)
379
+ return f"✗ Error loading model: {str(e)}"
380
+
381
+ def process_question(
382
+ self,
383
+ question: str,
384
+ model_name: str,
385
+ education_level: str,
386
+ k: int,
387
+ temperature: float,
388
+ max_tokens: int
389
+ ) -> Tuple[str, str, str, str, str]:
390
+ """
391
+ Process a single question and return formatted results
392
+
393
+ Returns:
394
+ Tuple of (answer, flesch_score, sources, similarity_scores, question_category)
395
+ """
396
+ import time
397
+
398
+ if not question or not question.strip():
399
+ return "Please enter a question.", "N/A", "", "", ""
400
+
401
+ try:
402
+ start_time = time.time()
403
+ logger.info(f"Processing question: {question[:50]}...")
404
+
405
+ # Reload model if changed (this can take 1-3 minutes)
406
+ if model_name in MODEL_MAP:
407
+ model_path = MODEL_MAP[model_name]
408
+ if model_path != self.current_model:
409
+ logger.info(f"Model changed, reloading from {self.current_model} to {model_path}")
410
+ reload_status = self.reload_model(model_name)
411
+ if reload_status.startswith("✗"):
412
+ return f"Error: {reload_status}", "N/A", "", "", ""
413
+ logger.info(f"Model reloaded in {time.time() - start_time:.1f}s")
414
+
415
+ # Update bot args for this query
416
+ self.bot.args.k = k
417
+ self.bot.args.temperature = temperature
418
+ # Limit max_tokens for faster generation in Gradio
419
+ self.bot.args.max_new_tokens = min(max_tokens, 512) # Cap at 512 for faster responses
420
+
421
+ # Categorize question
422
+ logger.info("Categorizing question...")
423
+ question_group = self.bot._categorize_question(question)
424
+
425
+ # Retrieve relevant chunks with similarity scores
426
+ logger.info("Retrieving relevant documents...")
427
+ retrieve_start = time.time()
428
+ context_chunks, similarity_scores = self.bot.retrieve_with_scores(question, k)
429
+ logger.info(f"Retrieved {len(context_chunks)} chunks in {time.time() - retrieve_start:.2f}s")
430
+
431
+ if not context_chunks:
432
+ return (
433
+ "I don't have enough information to answer this question. Please try rephrasing or asking about a different topic.",
434
+ "N/A",
435
+ "No sources found",
436
+ "No matches found",
437
+ question_group
438
+ )
439
+
440
+ # Format similarity scores
441
+ similarity_scores_str = ", ".join([f"{score:.3f}" for score in similarity_scores])
442
+
443
+ # Format sources with chunk text and file paths
444
+ sources_list = []
445
+ for i, (chunk, score) in enumerate(zip(context_chunks, similarity_scores)):
446
+ # Try to find the file path
447
+ file_path = self._find_file_path(chunk.filename)
448
+
449
+ source_info = f"""
450
+ {'='*80}
451
+ SOURCE {i+1} | Similarity: {score:.3f}
452
+ {'='*80}
453
+ 📄 File: {chunk.filename}
454
+ 📍 Path: {file_path if file_path else 'File path not found (search in Data Resources directory)'}
455
+ 📊 Chunk: {chunk.chunk_id + 1}/{chunk.total_chunks} (Position: {chunk.start_pos}-{chunk.end_pos})
456
+
457
+ 📝 Full Chunk Text:
458
+ {chunk.text}
459
+
460
+ """
461
+ sources_list.append(source_info)
462
+
463
+ sources = "\n".join(sources_list)
464
+
465
+ # Generation kwargs
466
+ gen_kwargs = {
467
+ 'max_new_tokens': min(max_tokens, 512), # Cap for faster responses
468
+ 'temperature': temperature,
469
+ 'top_p': self.bot.args.top_p,
470
+ 'repetition_penalty': self.bot.args.repetition_penalty
471
+ }
472
+
473
+ # Generate answer based on education level
474
+ answer = ""
475
+ flesch_score = 0.0
476
+
477
+ # Generate original answer first (needed for all enhancement levels)
478
+ logger.info("Generating original answer...")
479
+ gen_start = time.time()
480
+ prompt = self.bot.format_prompt(context_chunks, question)
481
+ original_answer = self.bot.generate_answer(prompt, **gen_kwargs)
482
+ logger.info(f"Original answer generated in {time.time() - gen_start:.1f}s")
483
+
484
+ # Enhance based on education level
485
+ logger.info(f"Enhancing answer for {education_level} level...")
486
+ enhance_start = time.time()
487
+ if education_level == "middle_school":
488
+ # Simplify to middle school level
489
+ answer, flesch_score = self.bot.enhance_readability(original_answer, target_level="middle_school")
490
+
491
+ elif education_level == "high_school":
492
+ # Simplify to high school level
493
+ answer, flesch_score = self.bot.enhance_readability(original_answer, target_level="high_school")
494
+
495
+ elif education_level == "college":
496
+ # Enhance to college level
497
+ answer, flesch_score = self.bot.enhance_readability(original_answer, target_level="college")
498
+
499
+ elif education_level == "doctoral":
500
+ # Enhance to doctoral/professional level
501
+ answer, flesch_score = self.bot.enhance_readability(original_answer, target_level="doctoral")
502
+ else:
503
+ answer = "Invalid education level selected."
504
+ flesch_score = 0.0
505
+
506
+ logger.info(f"Answer enhanced in {time.time() - enhance_start:.1f}s")
507
+ total_time = time.time() - start_time
508
+ logger.info(f"Total processing time: {total_time:.1f}s")
509
+
510
+ # Clean the answer - remove special tokens and formatting
511
+ import re
512
+ cleaned_answer = answer
513
+
514
+ # Remove special tokens (case-insensitive)
515
+ special_tokens = [
516
+ "<|end|>",
517
+ "<|endoftext|>",
518
+ "<|end_of_text|>",
519
+ "<|eot_id|>",
520
+ "<|start_header_id|>",
521
+ "<|end_header_id|>",
522
+ "<|assistant|>",
523
+ "<|endoftext|>",
524
+ "<|end_of_text|>",
525
+ ]
526
+ for token in special_tokens:
527
+ # Remove case-insensitive
528
+ cleaned_answer = re.sub(re.escape(token), '', cleaned_answer, flags=re.IGNORECASE)
529
+
530
+ # Remove any remaining special token patterns like <|...|>
531
+ cleaned_answer = re.sub(r'<\|[^|]+\|>', '', cleaned_answer)
532
+
533
+ # Remove any markdown-style headers that might have been added
534
+ cleaned_answer = re.sub(r'^\*\*.*?\*\*.*?\n', '', cleaned_answer, flags=re.MULTILINE)
535
+
536
+ # Clean up extra whitespace and newlines
537
+ cleaned_answer = re.sub(r'\n\s*\n\s*\n+', '\n\n', cleaned_answer) # Multiple newlines to double
538
+ cleaned_answer = re.sub(r'^\s+|\s+$', '', cleaned_answer, flags=re.MULTILINE) # Trim lines
539
+ cleaned_answer = cleaned_answer.strip()
540
+
541
+ # Return just the clean answer (no headers or metadata)
542
+ return (
543
+ cleaned_answer,
544
+ f"{flesch_score:.1f}",
545
+ sources,
546
+ similarity_scores_str,
547
+ question_group # Add question category as 5th return value
548
+ )
549
+
550
+ except Exception as e:
551
+ logger.error(f"Error processing question: {e}", exc_info=True)
552
+ return (
553
+ f"An error occurred while processing your question: {str(e)}",
554
+ "N/A",
555
+ "",
556
+ "",
557
+ "Error"
558
+ )
559
+
560
+
561
+ def create_interface(initial_bot: RAGBot, use_inference_api: Optional[bool] = None) -> gr.Blocks:
562
+ """Create and configure the Gradio interface"""
563
+
564
+ # Use Inference API on Spaces, local model otherwise
565
+ if use_inference_api is None:
566
+ use_inference_api = os.getenv("SPACE_ID") is not None or os.getenv("SYSTEM") == "spaces"
567
+
568
+ interface = GradioRAGInterface(initial_bot, use_inference_api=use_inference_api)
569
+
570
+ # Get initial model name from bot
571
+ initial_model_short = None
572
+ for short_name, full_path in MODEL_MAP.items():
573
+ if full_path == initial_bot.args.model:
574
+ initial_model_short = short_name
575
+ break
576
+ if initial_model_short is None:
577
+ initial_model_short = list(MODEL_MAP.keys())[0]
578
+
579
+ with gr.Blocks(title="CGT-LLM-Beta RAG Chatbot") as demo:
580
+ gr.Markdown("""
581
+ # 🧬 CGT-LLM-Beta: Genetic Counseling RAG Chatbot
582
+
583
+ Ask questions about genetic counseling, cascade genetic testing, hereditary cancer syndromes, and related topics.
584
+
585
+ The chatbot uses a Retrieval-Augmented Generation (RAG) system to provide evidence-based answers from medical literature.
586
+ """)
587
+
588
+ with gr.Row():
589
+ with gr.Column(scale=2):
590
+ question_input = gr.Textbox(
591
+ label="Your Question",
592
+ placeholder="e.g., What is Lynch Syndrome? What screening is recommended for BRCA1 carriers?",
593
+ lines=3
594
+ )
595
+
596
+ with gr.Row():
597
+ model_dropdown = gr.Dropdown(
598
+ choices=list(MODEL_MAP.keys()),
599
+ value=initial_model_short,
600
+ label="Select Model",
601
+ info="Choose which LLM model to use for generating answers"
602
+ )
603
+
604
+ education_dropdown = gr.Dropdown(
605
+ choices=list(EDUCATION_LEVELS.keys()),
606
+ value=list(EDUCATION_LEVELS.keys())[0],
607
+ label="Education Level",
608
+ info="Select your education level for personalized answers"
609
+ )
610
+
611
+ with gr.Accordion("Advanced Settings", open=False):
612
+ k_slider = gr.Slider(
613
+ minimum=1,
614
+ maximum=10,
615
+ value=5,
616
+ step=1,
617
+ label="Number of document chunks to retrieve (k)"
618
+ )
619
+ temperature_slider = gr.Slider(
620
+ minimum=0.1,
621
+ maximum=1.0,
622
+ value=0.2,
623
+ step=0.1,
624
+ label="Temperature (lower = more focused)"
625
+ )
626
+ max_tokens_slider = gr.Slider(
627
+ minimum=128,
628
+ maximum=1024,
629
+ value=512,
630
+ step=128,
631
+ label="Max Tokens (lower = faster responses)"
632
+ )
633
+
634
+ submit_btn = gr.Button("Ask Question", variant="primary", size="lg")
635
+
636
+ with gr.Column(scale=3):
637
+ answer_output = gr.Textbox(
638
+ label="Answer",
639
+ lines=20,
640
+ interactive=False,
641
+ elem_classes=["answer-box"]
642
+ )
643
+
644
+ with gr.Row():
645
+ flesch_output = gr.Textbox(
646
+ label="Flesch-Kincaid Grade Level",
647
+ value="N/A",
648
+ interactive=False,
649
+ scale=1
650
+ )
651
+
652
+ similarity_output = gr.Textbox(
653
+ label="Similarity Scores",
654
+ value="",
655
+ interactive=False,
656
+ scale=1
657
+ )
658
+
659
+ category_output = gr.Textbox(
660
+ label="Question Category",
661
+ value="",
662
+ interactive=False,
663
+ scale=1
664
+ )
665
+
666
+ sources_output = gr.Textbox(
667
+ label="Source Documents (with Chunk Text)",
668
+ lines=15,
669
+ interactive=False,
670
+ info="Shows the retrieved document chunks with full text. File paths are shown for easy access."
671
+ )
672
+
673
+ # Example questions - all questions from the results CSV (scrollable)
674
+ gr.Markdown("### 💡 Example Questions")
675
+ gr.Markdown(f"Select a question below to use it in the chatbot ({len(EXAMPLE_QUESTIONS)} questions - scrollable dropdown):")
676
+
677
+ # Use Dropdown which is naturally scrollable with many options
678
+ example_questions_dropdown = gr.Dropdown(
679
+ choices=EXAMPLE_QUESTIONS,
680
+ label="Example Questions",
681
+ value=None,
682
+ info="Open the dropdown and scroll through all questions. Select one to use it.",
683
+ interactive=True,
684
+ container=True,
685
+ scale=1
686
+ )
687
+
688
+ # Update question input when dropdown selection changes
689
+ def update_question_from_dropdown(selected_question):
690
+ return selected_question if selected_question else ""
691
+
692
+ example_questions_dropdown.change(
693
+ fn=update_question_from_dropdown,
694
+ inputs=example_questions_dropdown,
695
+ outputs=question_input
696
+ )
697
+
698
+ # Footer
699
+ gr.Markdown("""
700
+ ---
701
+ **Note:** This chatbot provides informational answers based on medical literature.
702
+ It is not a substitute for professional medical advice, diagnosis, or treatment.
703
+ Always consult with qualified healthcare providers for medical decisions.
704
+ """)
705
+
706
+ # Connect the submit button
707
+ def process_with_education_level(question, model, education, k, temp, max_tok):
708
+ education_key = EDUCATION_LEVELS[education]
709
+ return interface.process_question(question, model, education_key, k, temp, max_tok)
710
+
711
+ submit_btn.click(
712
+ fn=process_with_education_level,
713
+ inputs=[
714
+ question_input,
715
+ model_dropdown,
716
+ education_dropdown,
717
+ k_slider,
718
+ temperature_slider,
719
+ max_tokens_slider
720
+ ],
721
+ outputs=[
722
+ answer_output,
723
+ flesch_output,
724
+ sources_output,
725
+ similarity_output,
726
+ category_output
727
+ ]
728
+ )
729
+
730
+ # Also allow Enter key to submit
731
+ question_input.submit(
732
+ fn=process_with_education_level,
733
+ inputs=[
734
+ question_input,
735
+ model_dropdown,
736
+ education_dropdown,
737
+ k_slider,
738
+ temperature_slider,
739
+ max_tokens_slider
740
+ ],
741
+ outputs=[
742
+ answer_output,
743
+ flesch_output,
744
+ sources_output,
745
+ similarity_output,
746
+ category_output
747
+ ]
748
+ )
749
+
750
+ return demo
751
+
752
+
753
+ def main():
754
+ """Main function to launch the Gradio app"""
755
+ # Parse arguments with defaults suitable for Gradio
756
+ parser = argparse.ArgumentParser(description="Gradio Interface for CGT-LLM-Beta RAG Chatbot")
757
+
758
+ # Model and database settings
759
+ parser.add_argument('--model', type=str, default='meta-llama/Llama-3.2-3B-Instruct',
760
+ help='HuggingFace model name')
761
+ parser.add_argument('--vector-db-dir', default='./chroma_db',
762
+ help='Directory for ChromaDB persistence')
763
+ parser.add_argument('--data-dir', default='./Data Resources',
764
+ help='Directory containing documents (for indexing if needed)')
765
+
766
+ # Generation parameters
767
+ parser.add_argument('--max-new-tokens', type=int, default=1024,
768
+ help='Maximum new tokens to generate')
769
+ parser.add_argument('--temperature', type=float, default=0.2,
770
+ help='Generation temperature')
771
+ parser.add_argument('--top-p', type=float, default=0.9,
772
+ help='Top-p sampling parameter')
773
+ parser.add_argument('--repetition-penalty', type=float, default=1.1,
774
+ help='Repetition penalty')
775
+
776
+ # Retrieval parameters
777
+ parser.add_argument('--k', type=int, default=5,
778
+ help='Number of chunks to retrieve per question')
779
+
780
+ # Other settings
781
+ parser.add_argument('--skip-indexing', action='store_true',
782
+ help='Skip document indexing (use existing vector DB)')
783
+ parser.add_argument('--verbose', action='store_true',
784
+ help='Enable verbose logging')
785
+ parser.add_argument('--share', action='store_true',
786
+ help='Create a public Gradio share link')
787
+ parser.add_argument('--server-name', type=str, default='127.0.0.1',
788
+ help='Server name (0.0.0.0 for public access)')
789
+ parser.add_argument('--server-port', type=int, default=7860,
790
+ help='Server port')
791
+
792
+ args = parser.parse_args()
793
+
794
+ # Set logging level
795
+ if args.verbose:
796
+ logging.getLogger().setLevel(logging.DEBUG)
797
+
798
+ logger.info("Initializing RAGBot for Gradio interface...")
799
+ logger.info(f"Model: {args.model}")
800
+ logger.info(f"Vector DB: {args.vector_db_dir}")
801
+
802
+ try:
803
+ # Initialize bot
804
+ bot = RAGBot(args)
805
+
806
+ # Check if vector database exists and has documents
807
+ collection_stats = bot.vector_retriever.get_collection_stats()
808
+ if collection_stats.get('total_chunks', 0) == 0:
809
+ logger.warning("Vector database is empty. You may need to run indexing first:")
810
+ logger.warning(" python bot.py --data-dir './Data Resources' --vector-db-dir './chroma_db'")
811
+ logger.warning("Continuing anyway - the chatbot will work but may not find relevant documents.")
812
+
813
+ # Create and launch Gradio interface
814
+ demo = create_interface(bot)
815
+
816
+ # For local use, launch it
817
+ # (On Spaces, the demo is already created at module level)
818
+ logger.info(f"Launching Gradio interface on http://{args.server_name}:{args.server_port}")
819
+ demo.launch(
820
+ server_name=args.server_name,
821
+ server_port=args.server_port,
822
+ share=args.share
823
+ )
824
+
825
+ except KeyboardInterrupt:
826
+ logger.info("Interrupted by user")
827
+ sys.exit(0)
828
+ except Exception as e:
829
+ logger.error(f"Error launching Gradio app: {e}", exc_info=True)
830
+ sys.exit(1)
831
+
832
+
833
+ # For Hugging Face Spaces: create demo at module level
834
+ # Following the HF Spaces pattern: create the Gradio app directly at module level
835
+ # Spaces will import this module and look for a Gradio Blocks/Interface object
836
+ # Pattern: demo = gr.Interface(...) or demo = gr.Blocks(...)
837
+ # DO NOT call demo.launch() - Spaces handles that automatically
838
+
839
+ # Check if we're on Spaces (be more permissive - check multiple env vars)
840
+ IS_SPACES = (
841
+ os.getenv("SPACE_ID") is not None or
842
+ os.getenv("SYSTEM") == "spaces" or
843
+ os.getenv("HF_SPACE_ID") is not None
844
+ )
845
+
846
+ # Create demo at module level (like HF docs example)
847
+ # Initialize demo variable to None first (safety measure)
848
+ demo = None
849
+
850
+ # Create demo at module level (like HF docs example)
851
+ # This ensures Spaces can always find it when importing the module
852
+ try:
853
+ if IS_SPACES:
854
+ logger.info("Initializing for Hugging Face Spaces...")
855
+ else:
856
+ logger.info("Initializing for local execution...")
857
+
858
+ # Initialize with default args
859
+ parser = argparse.ArgumentParser()
860
+ parser.add_argument('--model', type=str, default='meta-llama/Llama-3.2-3B-Instruct')
861
+ parser.add_argument('--vector-db-dir', default='./chroma_db')
862
+ parser.add_argument('--data-dir', default='./Data Resources')
863
+ parser.add_argument('--max-new-tokens', type=int, default=1024)
864
+ parser.add_argument('--temperature', type=float, default=0.2)
865
+ parser.add_argument('--top-p', type=float, default=0.9)
866
+ parser.add_argument('--repetition-penalty', type=float, default=1.1)
867
+ parser.add_argument('--k', type=int, default=5)
868
+ parser.add_argument('--skip-indexing', action='store_true', default=True)
869
+ parser.add_argument('--verbose', action='store_true', default=False)
870
+ parser.add_argument('--share', action='store_true', default=False)
871
+ parser.add_argument('--server-name', type=str, default='0.0.0.0')
872
+ parser.add_argument('--server-port', type=int, default=7860)
873
+ parser.add_argument('--seed', type=int, default=42)
874
+
875
+ args = parser.parse_args([]) # Empty args
876
+ args.skip_model_loading = IS_SPACES # Skip model loading on Spaces, use Inference API
877
+
878
+ # Create bot - handle initialization errors gracefully
879
+ try:
880
+ bot = RAGBot(args)
881
+
882
+ if bot.vector_retriever is None:
883
+ raise Exception("Vector database not available")
884
+
885
+ # Check if vector database has documents
886
+ collection_stats = bot.vector_retriever.get_collection_stats()
887
+ if collection_stats.get('total_chunks', 0) == 0:
888
+ logger.warning("Vector database is empty. The chatbot may not find relevant documents.")
889
+ logger.warning("This is OK for initial deployment - documents can be indexed later.")
890
+
891
+ # Create the demo interface directly at module level (like HF docs example)
892
+ demo = create_interface(bot, use_inference_api=IS_SPACES)
893
+ except Exception as bot_error:
894
+ logger.error(f"Error initializing RAGBot: {bot_error}", exc_info=True)
895
+ # Create a demo that shows the error but still allows the interface to load
896
+ with gr.Blocks() as demo:
897
+ gr.Markdown(f"""
898
+ # ⚠️ Initialization Error
899
+
900
+ The chatbot encountered an error during initialization:
901
+
902
+ **Error:** {str(bot_error)}
903
+
904
+ This might be due to:
905
+ - Missing vector database (chroma_db directory)
906
+ - Missing dependencies
907
+ - Configuration issues
908
+
909
+ Please check the logs for more details.
910
+ """)
911
+ raise # Re-raise to be caught by outer try/except
912
+ logger.info(f"Demo created successfully: {type(demo)}")
913
+ # Explicitly verify it's a valid Gradio object
914
+ if not isinstance(demo, (gr.Blocks, gr.Interface)):
915
+ raise TypeError(f"Demo is not a valid Gradio object: {type(demo)}")
916
+ logger.info("Demo validation passed - ready for Spaces")
917
+ except Exception as e:
918
+ logger.error(f"Error creating demo: {e}", exc_info=True)
919
+ import traceback
920
+ logger.error(f"Traceback: {traceback.format_exc()}")
921
+ # Create a fallback error demo so Spaces doesn't show blank
922
+ with gr.Blocks() as demo:
923
+ gr.Markdown(f"# Error Initializing Chatbot\n\nAn error occurred while initializing the chatbot.\n\nError: {str(e)}\n\nPlease check the logs for details.")
924
+ logger.info(f"Error demo created: {type(demo)}")
925
+
926
+ # Final verification - ensure demo exists and is valid
927
+ if demo is None:
928
+ logger.error("CRITICAL: Demo variable is None!")
929
+ with gr.Blocks() as demo:
930
+ gr.Markdown("# Error: Demo was not created properly\n\nPlease check the logs for details.")
931
+ elif not isinstance(demo, (gr.Blocks, gr.Interface)):
932
+ logger.error(f"CRITICAL: Demo is not a valid Gradio object: {type(demo)}")
933
+ with gr.Blocks() as demo:
934
+ gr.Markdown(f"# Error: Invalid demo type\n\nDemo type: {type(demo)}\n\nPlease check the logs for details.")
935
+ else:
936
+ logger.info(f"✅ Final demo check passed: demo type={type(demo)}")
937
+ # Explicit print to ensure demo is accessible (Spaces might check this)
938
+ print(f"DEMO_VARIABLE_SET: {type(demo)}")
939
+
940
+ # For local execution only (not on Spaces)
941
+ if __name__ == "__main__":
942
+ if not IS_SPACES:
943
+ main()
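
On Spaces, the module-level path above skips local model loading and routes generation through InferenceAPIBot, which wraps the huggingface_hub client. The fragment below is a hedged sketch of that call pattern outside the Gradio app; the model name mirrors the default entry in MODEL_MAP, and the prompt string is a stand-in for the retrieval-augmented prompt that format_prompt() actually assembles.

```python
# Sketch of the Inference API path used on Spaces (requires HF_TOKEN to be set
# as a Space secret). The prompt here is a placeholder, not the real RAG prompt
# built from retrieved chunks.
import os

import textstat
from huggingface_hub import InferenceClient

client = InferenceClient(api_key=os.environ["HF_TOKEN"])
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Context: ...\n\nQuestion: What is Lynch Syndrome?"}],
    max_tokens=512,
    temperature=0.2,
    top_p=0.9,
)
answer = completion.choices[0].message.content

# The interface reports a Flesch-Kincaid grade level next to every answer.
print(answer)
print("Flesch-Kincaid grade:", textstat.flesch_kincaid_grade(answer))
```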
bot.py ADDED
@@ -0,0 +1,1777 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ RAG Chatbot Implementation for CGT-LLM-Beta with Vector Database
4
+ Production-ready local RAG system with ChromaDB and MPS acceleration for Apple Silicon
5
+ """
6
+
7
+ import argparse
8
+ import csv
9
+ import json
10
+ import logging
11
+ import os
12
+ import re
13
+ import sys
14
+ import time
15
+ import hashlib
16
+ from pathlib import Path
17
+ from typing import List, Tuple, Dict, Any, Optional, Union
18
+ from dataclasses import dataclass
19
+ from collections import defaultdict
20
+
21
+ import textstat
22
+
23
+ import torch
24
+ import numpy as np
25
+ import pandas as pd
26
+ from tqdm import tqdm
27
+
28
+ # Optional imports with graceful fallbacks
29
+ try:
30
+ import chromadb
31
+ from chromadb.config import Settings
32
+ CHROMADB_AVAILABLE = True
33
+ except ImportError:
34
+ CHROMADB_AVAILABLE = False
35
+ print("Warning: chromadb not available. Install with: pip install chromadb")
36
+
37
+ try:
38
+ from sentence_transformers import SentenceTransformer
39
+ SENTENCE_TRANSFORMERS_AVAILABLE = True
40
+ except ImportError:
41
+ SENTENCE_TRANSFORMERS_AVAILABLE = False
42
+ print("Warning: sentence-transformers not available. Install with: pip install sentence-transformers")
43
+
44
+ try:
45
+ import pypdf
46
+ PDF_AVAILABLE = True
47
+ except ImportError:
48
+ PDF_AVAILABLE = False
49
+ print("Warning: pypdf not available. PDF files will be skipped.")
50
+
51
+ try:
52
+ from docx import Document
53
+ DOCX_AVAILABLE = True
54
+ except ImportError:
55
+ DOCX_AVAILABLE = False
56
+ print("Warning: python-docx not available. DOCX files will be skipped.")
57
+
58
+ try:
59
+ from rank_bm25 import BM25Okapi
60
+ BM25_AVAILABLE = True
61
+ except ImportError:
62
+ BM25_AVAILABLE = False
63
+ print("Warning: rank-bm25 not available. BM25 retrieval disabled.")
64
+
65
+ # Configure logging
66
+ logging.basicConfig(
67
+ level=logging.INFO,
68
+ format='%(asctime)s - %(levelname)s - %(message)s',
69
+ handlers=[
70
+ logging.StreamHandler(),
71
+ logging.FileHandler('rag_bot.log')
72
+ ]
73
+ )
74
+ logger = logging.getLogger(__name__)
75
+
76
+
77
+ @dataclass
78
+ class Document:
79
+ """Represents a document with metadata"""
80
+ filename: str
81
+ content: str
82
+ filepath: str
83
+ file_type: str
84
+ chunk_count: int = 0
85
+ file_hash: str = ""
86
+
87
+
88
+ @dataclass
89
+ class Chunk:
90
+ """Represents a text chunk with metadata"""
91
+ text: str
92
+ filename: str
93
+ chunk_id: int
94
+ total_chunks: int
95
+ start_pos: int
96
+ end_pos: int
97
+ metadata: Dict[str, Any]
98
+ chunk_hash: str = ""
99
+
100
+
101
+ class VectorRetriever:
102
+ """ChromaDB-based vector retrieval"""
103
+
104
+ def __init__(self, collection_name: str = "cgt_documents", persist_directory: str = "./chroma_db"):
105
+ if not CHROMADB_AVAILABLE:
106
+ raise ImportError("ChromaDB is required for vector retrieval")
107
+
108
+ self.collection_name = collection_name
109
+ self.persist_directory = persist_directory
110
+
111
+ # Initialize ChromaDB client
112
+ self.client = chromadb.PersistentClient(path=persist_directory)
113
+
114
+ # Get or create collection
115
+ try:
116
+ self.collection = self.client.get_collection(name=collection_name)
117
+ logger.info(f"Loaded existing collection '{collection_name}' with {self.collection.count()} documents")
118
+ except:
119
+ self.collection = self.client.create_collection(
120
+ name=collection_name,
121
+ metadata={"description": "CGT-LLM-Beta document collection"}
122
+ )
123
+ logger.info(f"Created new collection '{collection_name}'")
124
+
125
+ # Initialize embedding model
126
+ if SENTENCE_TRANSFORMERS_AVAILABLE:
127
+ self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
128
+ logger.info("Loaded sentence-transformers embedding model")
129
+ else:
130
+ self.embedding_model = None
131
+ logger.warning("Sentence-transformers not available, using ChromaDB default embeddings")
132
+
133
+ def add_documents(self, chunks: List[Chunk]) -> None:
134
+ """Add document chunks to the vector database"""
135
+ if not chunks:
136
+ return
137
+
138
+ logger.info(f"Adding {len(chunks)} chunks to vector database...")
139
+
140
+ # Prepare data for ChromaDB
141
+ documents = []
142
+ metadatas = []
143
+ ids = []
144
+
145
+ for chunk in chunks:
146
+ chunk_id = f"{chunk.filename}_{chunk.chunk_id}"
147
+ documents.append(chunk.text)
148
+
149
+ metadata = {
150
+ "filename": chunk.filename,
151
+ "chunk_id": chunk.chunk_id,
152
+ "total_chunks": chunk.total_chunks,
153
+ "start_pos": chunk.start_pos,
154
+ "end_pos": chunk.end_pos,
155
+ "chunk_hash": chunk.chunk_hash,
156
+ **chunk.metadata
157
+ }
158
+ metadatas.append(metadata)
159
+ ids.append(chunk_id)
160
+
161
+ # Add to collection
162
+ try:
163
+ self.collection.add(
164
+ documents=documents,
165
+ metadatas=metadatas,
166
+ ids=ids
167
+ )
168
+ logger.info(f"Successfully added {len(chunks)} chunks to vector database")
169
+ except Exception as e:
170
+ logger.error(f"Error adding documents to vector database: {e}")
171
+
172
+ def search(self, query: str, k: int = 5) -> List[Tuple[Chunk, float]]:
173
+ """Search for similar chunks using vector similarity"""
174
+ try:
175
+ # Perform vector search
176
+ results = self.collection.query(
177
+ query_texts=[query],
178
+ n_results=k
179
+ )
180
+
181
+ chunks_with_scores = []
182
+ if results['documents'] and results['documents'][0]:
183
+ for i, (doc, metadata, distance) in enumerate(zip(
184
+ results['documents'][0],
185
+ results['metadatas'][0],
186
+ results['distances'][0]
187
+ )):
188
+ # Convert distance to a similarity-style score (higher is better)
189
+ similarity_score = 1 - distance
190
+
191
+ chunk = Chunk(
192
+ text=doc,
193
+ filename=metadata['filename'],
194
+ chunk_id=metadata['chunk_id'],
195
+ total_chunks=metadata['total_chunks'],
196
+ start_pos=metadata['start_pos'],
197
+ end_pos=metadata['end_pos'],
198
+ metadata={k: v for k, v in metadata.items()
199
+ if k not in ['filename', 'chunk_id', 'total_chunks', 'start_pos', 'end_pos', 'chunk_hash']},
200
+ chunk_hash=metadata.get('chunk_hash', '')
201
+ )
202
+ chunks_with_scores.append((chunk, similarity_score))
203
+
204
+ return chunks_with_scores
205
+
206
+ except Exception as e:
207
+ logger.error(f"Error searching vector database: {e}")
208
+ return []
209
+
210
+ def get_collection_stats(self) -> Dict[str, Any]:
211
+ """Get statistics about the collection"""
212
+ try:
213
+ count = self.collection.count()
214
+ return {
215
+ "total_chunks": count,
216
+ "collection_name": self.collection_name,
217
+ "persist_directory": self.persist_directory
218
+ }
219
+ except Exception as e:
220
+ logger.error(f"Error getting collection stats: {e}")
221
+ return {}
222
+
223
+
224
+ class RAGBot:
225
+ """Main RAG chatbot class with vector database"""
226
+
227
+ def __init__(self, args):
228
+ self.args = args
229
+ self.device = self._setup_device()
230
+ self.model = None
231
+ self.tokenizer = None
232
+ self.vector_retriever = None
233
+
234
+ # Load model (unless skipping for Inference API)
235
+ if not hasattr(args, 'skip_model_loading') or not args.skip_model_loading:
236
+ self._load_model()
237
+
238
+ # Initialize vector retriever
239
+ self._setup_vector_retriever()
240
+
241
+ def _setup_device(self) -> str:
242
+ """Setup device with MPS support for Apple Silicon"""
243
+ if torch.backends.mps.is_available():
244
+ device = "mps"
245
+ logger.info("Using device: mps (Apple Silicon)")
246
+ elif torch.cuda.is_available():
247
+ device = "cuda"
248
+ logger.info("Using device: cuda")
249
+ else:
250
+ device = "cpu"
251
+ logger.info("Using device: cpu")
252
+
253
+ return device
254
+
255
+ def _load_model(self):
256
+ """Load the specified LLM model and tokenizer"""
257
+ try:
258
+ model_name = self.args.model
259
+ logger.info(f"Loading model: {model_name}...")
260
+ from transformers import AutoTokenizer, AutoModelForCausalLM
261
+
262
+ # Get Hugging Face token from environment (for gated models)
263
+ hf_token = os.getenv("HF_TOKEN") or os.getenv("HUGGING_FACE_HUB_TOKEN")
264
+
265
+ # Load tokenizer
266
+ tokenizer_kwargs = {
267
+ "trust_remote_code": True
268
+ }
269
+ if hf_token:
270
+ tokenizer_kwargs["token"] = hf_token
271
+ logger.info("Using HF_TOKEN for authentication")
272
+
273
+ self.tokenizer = AutoTokenizer.from_pretrained(
274
+ model_name,
275
+ **tokenizer_kwargs
276
+ )
277
+
278
+ # Determine appropriate torch dtype based on device and model
279
+ # Use float16 for MPS/CUDA, float32 for CPU
280
+ # Some models work better with bfloat16
281
+ if self.device == "mps":
282
+ torch_dtype = torch.float16
283
+ elif self.device == "cuda":
284
+ torch_dtype = torch.float16
285
+ else:
286
+ torch_dtype = torch.float32
287
+
288
+ # Load model with appropriate settings
289
+ model_kwargs = {
290
+ "torch_dtype": torch_dtype,
291
+ "trust_remote_code": True,
292
+ }
293
+
294
+ # Add token if available (for gated models)
295
+ if hf_token:
296
+ model_kwargs["token"] = hf_token
297
+
298
+ # Note: 8-bit quantization is not available on CPU, so fall back to float16
300
+ # weights, which roughly halves memory use compared to float32
301
+ if self.device == "cpu":
302
+ try:
303
+ from transformers import BitsAndBytesConfig
304
+ # load_in_8bit stays disabled; float16 is the memory-saving path on CPU
304
+ model_kwargs["load_in_8bit"] = False # 8-bit not available on CPU
305
+ # Instead, use float16 even on CPU to save memory
306
+ model_kwargs["torch_dtype"] = torch.float16
307
+ logger.info("Using float16 on CPU to reduce memory usage")
308
+ except ImportError:
309
+ # Fallback: use float16 anyway
310
+ model_kwargs["torch_dtype"] = torch.float16
311
+ logger.info("Using float16 on CPU to reduce memory usage (fallback)")
312
+
313
+ # For MPS, use device_map; for CUDA, let it auto-detect
314
+ if self.device == "mps":
315
+ model_kwargs["device_map"] = self.device
316
+ elif self.device == "cuda":
317
+ model_kwargs["device_map"] = "auto"
318
+ # For CPU, don't specify device_map
319
+
320
+ self.model = AutoModelForCausalLM.from_pretrained(
321
+ model_name,
322
+ **model_kwargs
323
+ )
324
+
325
+ # Move to device if not using device_map
326
+ if self.device == "cpu":
327
+ self.model = self.model.to(self.device)
328
+
329
+ # Set pad token if not already set
330
+ if self.tokenizer.pad_token is None:
331
+ if self.tokenizer.eos_token is not None:
332
+ self.tokenizer.pad_token = self.tokenizer.eos_token
333
+ else:
334
+ # Some models might need a different approach
335
+ self.tokenizer.add_special_tokens({'pad_token': '[PAD]'})
336
+
337
+ logger.info(f"Model {model_name} loaded successfully on {self.device}")
338
+
339
+ except Exception as e:
340
+ logger.error(f"Failed to load model {self.args.model}: {e}")
341
+ logger.error("Make sure the model name is correct and you have access to it on HuggingFace")
342
+ logger.error("For gated models (like Llama), you need to:")
343
+ logger.error(" 1. Request access at: https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct")
344
+ logger.error(" 2. Add HF_TOKEN as a secret in your Hugging Face Space settings")
345
+ logger.error(" 3. Get your token from: https://huggingface.co/settings/tokens")
346
+ logger.error("For local use, ensure you're logged in: huggingface-cli login")
347
+ sys.exit(2)
348
+
349
+ def _setup_vector_retriever(self):
350
+ """Setup the vector retriever"""
351
+ try:
352
+ self.vector_retriever = VectorRetriever(
353
+ collection_name="cgt_documents",
354
+ persist_directory=self.args.vector_db_dir
355
+ )
356
+ logger.info("Vector retriever initialized successfully")
357
+ except Exception as e:
358
+ logger.error(f"Failed to setup vector retriever: {e}")
359
+ sys.exit(2)
360
+
361
+ def _calculate_file_hash(self, filepath: str) -> str:
362
+ """Calculate hash of file for change detection"""
363
+ try:
364
+ with open(filepath, 'rb') as f:
365
+ return hashlib.md5(f.read()).hexdigest()
366
+ except Exception:
367
+ return ""
368
+
369
+ def _calculate_chunk_hash(self, text: str) -> str:
370
+ """Calculate hash of chunk text"""
371
+ return hashlib.md5(text.encode('utf-8')).hexdigest()
372
+
373
+ def load_corpus(self, data_dir: str) -> List[Document]:
374
+ """Load all documents from the data directory"""
375
+ logger.info(f"Loading corpus from {data_dir}")
376
+ documents = []
377
+ data_path = Path(data_dir)
378
+
379
+ if not data_path.exists():
380
+ logger.error(f"Data directory {data_dir} does not exist")
381
+ sys.exit(1)
382
+
383
+ # Supported file extensions
384
+ supported_extensions = {'.txt', '.md', '.json', '.csv'}
385
+ if PDF_AVAILABLE:
386
+ supported_extensions.add('.pdf')
387
+ if DOCX_AVAILABLE:
388
+ supported_extensions.add('.docx')
389
+ supported_extensions.add('.doc')
390
+
391
+ # Find all files recursively
392
+ files = []
393
+ for ext in supported_extensions:
394
+ files.extend(data_path.rglob(f"*{ext}"))
395
+
396
+ logger.info(f"Found {len(files)} files to process")
397
+
398
+ # Process files with progress bar
399
+ for file_path in tqdm(files, desc="Loading documents"):
400
+ try:
401
+ content = self._read_file(file_path)
402
+ if content.strip(): # Only add non-empty documents
403
+ file_hash = self._calculate_file_hash(file_path)
404
+ doc = Document(
405
+ filename=file_path.name,
406
+ content=content,
407
+ filepath=str(file_path),
408
+ file_type=file_path.suffix.lower(),
409
+ file_hash=file_hash
410
+ )
411
+ documents.append(doc)
412
+ logger.debug(f"Loaded {file_path.name} ({len(content)} chars)")
413
+ else:
414
+ logger.warning(f"Skipping empty file: {file_path.name}")
415
+
416
+ except Exception as e:
417
+ logger.error(f"Failed to load {file_path.name}: {e}")
418
+ continue
419
+
420
+ logger.info(f"Successfully loaded {len(documents)} documents")
421
+ return documents
422
+
423
+ def _read_file(self, file_path: Path) -> str:
424
+ """Read content from various file types"""
425
+ suffix = file_path.suffix.lower()
426
+
427
+ try:
428
+ if suffix == '.txt':
429
+ return file_path.read_text(encoding='utf-8')
430
+
431
+ elif suffix == '.md':
432
+ return file_path.read_text(encoding='utf-8')
433
+
434
+ elif suffix == '.json':
435
+ with open(file_path, 'r', encoding='utf-8') as f:
436
+ data = json.load(f)
437
+ if isinstance(data, dict):
438
+ return json.dumps(data, indent=2)
439
+ else:
440
+ return str(data)
441
+
442
+ elif suffix == '.csv':
443
+ df = pd.read_csv(file_path)
444
+ return df.to_string()
445
+
446
+ elif suffix == '.pdf' and PDF_AVAILABLE:
447
+ text = ""
448
+ with open(file_path, 'rb') as f:
449
+ pdf_reader = pypdf.PdfReader(f)
450
+ for page in pdf_reader.pages:
451
+ text += page.extract_text() + "\n"
452
+ return text
453
+
454
+ elif suffix in ['.docx', '.doc'] and DOCX_AVAILABLE:
455
+ import docx  # use python-docx explicitly; the name `Document` refers to the local dataclass here
+ doc = docx.Document(str(file_path))
456
+ text = ""
457
+ for paragraph in doc.paragraphs:
458
+ text += paragraph.text + "\n"
459
+ return text
460
+
461
+ else:
462
+ logger.warning(f"Unsupported file type: {suffix}")
463
+ return ""
464
+
465
+ except Exception as e:
466
+ logger.error(f"Error reading {file_path}: {e}")
467
+ return ""
468
+
469
+ def chunk_documents(self, docs: List[Document], chunk_size: int, overlap: int) -> List[Chunk]:
470
+ """Chunk documents into smaller pieces"""
471
+ logger.info(f"Chunking {len(docs)} documents (size={chunk_size}, overlap={overlap})")
472
+ chunks = []
473
+
474
+ for doc in docs:
475
+ doc_chunks = self._chunk_text(
476
+ doc.content,
477
+ doc.filename,
478
+ chunk_size,
479
+ overlap
480
+ )
481
+ chunks.extend(doc_chunks)
482
+
483
+ # Update document metadata
484
+ doc.chunk_count = len(doc_chunks)
485
+
486
+ logger.info(f"Created {len(chunks)} chunks from {len(docs)} documents")
487
+ return chunks
488
+
489
+ def _chunk_text(self, text: str, filename: str, chunk_size: int, overlap: int) -> List[Chunk]:
490
+ """Split text into overlapping chunks"""
491
+ # Clean text
492
+ text = re.sub(r'\s+', ' ', text.strip())
493
+
494
+ # Simple token-based chunking (approximate)
495
+ words = text.split()
496
+ chunks = []
497
+
498
+ for i in range(0, len(words), chunk_size - overlap):
499
+ chunk_words = words[i:i + chunk_size]
500
+ chunk_text = ' '.join(chunk_words)
501
+
502
+ if chunk_text.strip():
503
+ chunk_hash = self._calculate_chunk_hash(chunk_text)
504
+ chunk = Chunk(
505
+ text=chunk_text,
506
+ filename=filename,
507
+ chunk_id=len(chunks),
508
+ total_chunks=0, # Will be updated later
509
+ start_pos=i,
510
+ end_pos=i + len(chunk_words),
511
+ metadata={
512
+ 'word_count': len(chunk_words),
513
+ 'char_count': len(chunk_text)
514
+ },
515
+ chunk_hash=chunk_hash
516
+ )
517
+ chunks.append(chunk)
518
+
519
+ # Update total_chunks for each chunk
520
+ for chunk in chunks:
521
+ chunk.total_chunks = len(chunks)
522
+
523
+ return chunks
524
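The loop above advances by `chunk_size - overlap` words, so consecutive chunks share `overlap` words. A small worked example with toy numbers (rather than the 500/200 defaults):

```python
# With chunk_size=10 and overlap=4 the stride is 6 words, so chunks cover
# word indices [0:10], [6:16], [12:22], ... and neighbouring chunks share 4 words.
words = [f"w{i}" for i in range(20)]
chunk_size, overlap = 10, 4
for start in range(0, len(words), chunk_size - overlap):
    piece = words[start:start + chunk_size]
    print(start, start + len(piece), " ".join(piece))
```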
+
525
+ def build_or_update_index(self, chunks: List[Chunk], force_rebuild: bool = False) -> None:
526
+ """Build or update the vector index"""
527
+ if not chunks:
528
+ logger.warning("No chunks provided for indexing")
529
+ return
530
+
531
+ # Check if we need to rebuild
532
+ collection_stats = self.vector_retriever.get_collection_stats()
533
+ existing_count = collection_stats.get('total_chunks', 0)
534
+
535
+ if existing_count > 0 and not force_rebuild:
536
+ logger.info(f"Vector database already contains {existing_count} chunks. Use --force-rebuild to rebuild.")
537
+ return
538
+
539
+ if force_rebuild and existing_count > 0:
540
+ logger.info("Force rebuild requested. Clearing existing collection...")
541
+ try:
542
+ # Note: RAGBot holds no ChromaDB client of its own; this assumes VectorRetriever exposes its client as self.client
+ self.vector_retriever.client.delete_collection(self.vector_retriever.collection_name)
543
+ self.vector_retriever.collection = self.vector_retriever.client.create_collection(
544
+ name=self.vector_retriever.collection_name,
545
+ metadata={"description": "CGT-LLM-Beta document collection"}
546
+ )
547
+ except Exception as e:
548
+ logger.error(f"Error clearing collection: {e}")
549
+
550
+ # Add chunks to vector database
551
+ self.vector_retriever.add_documents(chunks)
552
+
553
+ logger.info("Vector index built successfully")
554
+
555
+ def retrieve(self, query: str, k: int) -> List[Chunk]:
556
+ """Retrieve relevant chunks for a query using vector search"""
557
+ results = self.vector_retriever.search(query, k)
558
+ chunks = [chunk for chunk, score in results]
559
+
560
+ if self.args.verbose:
561
+ logger.info(f"Retrieved {len(chunks)} chunks for query: {query[:50]}...")
562
+ for i, (chunk, score) in enumerate(results):
563
+ logger.info(f" {i+1}. {chunk.filename} (score: {score:.3f})")
564
+
565
+ return chunks
566
+
567
+ def retrieve_with_scores(self, query: str, k: int) -> Tuple[List[Chunk], List[float]]:
568
+ """Retrieve relevant chunks with similarity scores
569
+
570
+ Returns:
571
+ Tuple of (chunks, scores) where scores are similarity scores for each chunk
572
+ """
573
+ results = self.vector_retriever.search(query, k)
574
+ chunks = [chunk for chunk, score in results]
575
+ scores = [score for chunk, score in results]
576
+
577
+ if self.args.verbose:
578
+ logger.info(f"Retrieved {len(chunks)} chunks for query: {query[:50]}...")
579
+ for i, (chunk, score) in enumerate(results):
580
+ logger.info(f" {i+1}. {chunk.filename} (score: {score:.3f})")
581
+
582
+ return chunks, scores
583
+
584
+ def format_prompt(self, context_chunks: List[Chunk], question: str) -> str:
585
+ """Format the prompt with context and question, ensuring it fits within token limits"""
586
+ context_parts = []
587
+ for chunk in context_chunks:
588
+ context_parts.append(f"{chunk.text}")
589
+
590
+ context = "\n".join(context_parts)
591
+
592
+ # Try to use the tokenizer's chat template if available
593
+ if hasattr(self.tokenizer, 'apply_chat_template') and self.tokenizer.chat_template is not None:
594
+ try:
595
+ messages = [
596
+ {"role": "system", "content": "You are a helpful medical assistant. Answer questions based on the provided context. Be specific and informative."},
597
+ {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"}
598
+ ]
599
+ base_prompt = self.tokenizer.apply_chat_template(
600
+ messages,
601
+ tokenize=False,
602
+ add_generation_prompt=True
603
+ )
604
+ except Exception as e:
605
+ logger.warning(f"Failed to use chat template, falling back to manual format: {e}")
606
+ base_prompt = self._format_prompt_manual(context, question)
607
+ else:
608
+ # Fall back to manual formatting (for Llama models)
609
+ base_prompt = self._format_prompt_manual(context, question)
610
+
611
+ # Check if prompt is too long and truncate context if needed
612
+ max_context_tokens = 1200 # Leave room for generation
613
+ try:
614
+ tokenized = self.tokenizer(base_prompt, return_tensors="pt")
615
+ current_tokens = tokenized['input_ids'].shape[1]
616
+ except Exception as e:
617
+ logger.warning(f"Tokenization error, using base prompt as-is: {e}")
618
+ return base_prompt
619
+
620
+ if current_tokens > max_context_tokens:
621
+ # Truncate context to fit within limits
622
+ try:
623
+ context_tokens = self.tokenizer(context, return_tensors="pt")['input_ids'].shape[1]
624
+ available_tokens = max_context_tokens - (current_tokens - context_tokens)
625
+
626
+ if available_tokens > 0:
627
+ # Truncate context to fit
628
+ truncated_context = self.tokenizer.decode(
629
+ self.tokenizer(context, return_tensors="pt", truncation=True, max_length=available_tokens)['input_ids'][0],
630
+ skip_special_tokens=True
631
+ )
632
+
633
+ # Reformat with truncated context
634
+ if hasattr(self.tokenizer, 'apply_chat_template') and self.tokenizer.chat_template is not None:
635
+ try:
636
+ messages = [
637
+ {"role": "system", "content": "You are a helpful medical assistant. Answer questions based on the provided context. Be specific and informative."},
638
+ {"role": "user", "content": f"Context: {truncated_context}\n\nQuestion: {question}"}
639
+ ]
640
+ prompt = self.tokenizer.apply_chat_template(
641
+ messages,
642
+ tokenize=False,
643
+ add_generation_prompt=True
644
+ )
645
+ except Exception:
646
+ prompt = self._format_prompt_manual(truncated_context, question)
647
+ else:
648
+ prompt = self._format_prompt_manual(truncated_context, question)
649
+ else:
650
+ # If even basic prompt is too long, use minimal format
651
+ prompt = self._format_prompt_manual(context[:500] + "...", question)
652
+ except Exception as e:
653
+ logger.warning(f"Error truncating context: {e}, using base prompt")
654
+ prompt = base_prompt
655
+ else:
656
+ prompt = base_prompt
657
+
658
+ return prompt
659
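The truncation logic above boils down to: measure the whole prompt, and if it exceeds the budget, re-encode only the context with `max_length` set to whatever room is left. A stripped-down sketch of that idea (a hypothetical helper, not a function defined in app.py):

```python
def fit_context(tokenizer, context: str, overhead_tokens: int, budget: int = 1200) -> str:
    """Sketch: truncate context so that context + prompt overhead stays within the token budget."""
    available = budget - overhead_tokens
    if available <= 0:
        return context[:500] + "..."  # last-resort fallback, mirroring the code above
    ids = tokenizer(context, return_tensors="pt",
                    truncation=True, max_length=available)["input_ids"][0]
    return tokenizer.decode(ids, skip_special_tokens=True)
```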
+
660
+ def _format_prompt_manual(self, context: str, question: str) -> str:
661
+ """Manual prompt formatting for models without chat templates (e.g., Llama)"""
662
+ return f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
663
+
664
+ You are a helpful medical assistant. Answer questions based on the provided context. Be specific and informative.<|eot_id|><|start_header_id|>user<|end_header_id|>
665
+
666
+ Context: {context}
667
+
668
+ Question: {question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
669
+
670
+ """
671
+
672
+ def format_improved_prompt(self, context_chunks: List[Chunk], question: str) -> Tuple[str, str]:
673
+ """Format an improved prompt with better tone, structure, and medical appropriateness
674
+
675
+ Returns:
676
+ Tuple of (prompt, prompt_text) where prompt_text is the system prompt instructions
677
+ """
678
+ context_parts = []
679
+ for chunk in context_chunks:
680
+ context_parts.append(f"{chunk.text}")
681
+
682
+ context = "\n".join(context_parts)
683
+
684
+ # Improved prompt with all the feedback incorporated
685
+ improved_prompt_text = """Provide a concise, neutral, and informative answer based on the provided medical context.
686
+
687
+ CRITICAL GUIDELINES:
688
+ - Format your response as clear, well-structured sentences and paragraphs
689
+ - Be concise and direct - focus on answering the specific question asked
690
+ - Use neutral, factual language - do NOT tell the questioner how to feel (avoid phrases like 'don't worry', 'the good news is', etc.)
691
+ - Do NOT use leading or coercive language - present information neutrally to preserve patient autonomy
692
+ - Do NOT make specific medical recommendations - instead state that management decisions should be made with a healthcare provider
693
+ - Use third-person voice only - never claim to be a medical professional or assistant
694
+ - Use consistent terminology: use 'children' (not 'offspring') consistently
695
+ - Do NOT include hypothetical examples with specific names (e.g., avoid 'Aunt Jenna' or similar)
696
+ - Include important distinctions when relevant (e.g., somatic vs. germline variants, reproductive risks)
697
+ - When citing sources, be consistent - always specify which guidelines or sources when mentioned
698
+ - Remove any formatting markers like asterisks (*) or bold markers
699
+ - Do NOT include phrases like 'Here's a rewritten version' - just provide the answer directly
700
+
701
+ If the question asks about medical management, screening, or interventions, conclude with: 'Management recommendations are individualized and should be discussed with a healthcare provider or genetic counselor.'"""
702
+
703
+ # Try to use the tokenizer's chat template if available
704
+ if hasattr(self.tokenizer, 'apply_chat_template') and self.tokenizer.chat_template is not None:
705
+ try:
706
+ messages = [
707
+ {"role": "system", "content": improved_prompt_text},
708
+ {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"}
709
+ ]
710
+ base_prompt = self.tokenizer.apply_chat_template(
711
+ messages,
712
+ tokenize=False,
713
+ add_generation_prompt=True
714
+ )
715
+ except Exception as e:
716
+ logger.warning(f"Failed to use chat template for improved prompt, falling back to manual format: {e}")
717
+ base_prompt = self._format_improved_prompt_manual(context, question, improved_prompt_text)
718
+ else:
719
+ # Fall back to manual formatting (for Llama models)
720
+ base_prompt = self._format_improved_prompt_manual(context, question, improved_prompt_text)
721
+
722
+ # Check if prompt is too long and truncate context if needed
723
+ max_context_tokens = 1200 # Leave room for generation
724
+ try:
725
+ tokenized = self.tokenizer(base_prompt, return_tensors="pt")
726
+ current_tokens = tokenized['input_ids'].shape[1]
727
+ except Exception as e:
728
+ logger.warning(f"Tokenization error for improved prompt, using base prompt as-is: {e}")
729
+ return base_prompt, improved_prompt_text
730
+
731
+ if current_tokens > max_context_tokens:
732
+ # Truncate context to fit within limits
733
+ try:
734
+ context_tokens = self.tokenizer(context, return_tensors="pt")['input_ids'].shape[1]
735
+ available_tokens = max_context_tokens - (current_tokens - context_tokens)
736
+
737
+ if available_tokens > 0:
738
+ # Truncate context to fit
739
+ truncated_context = self.tokenizer.decode(
740
+ self.tokenizer(context, return_tensors="pt", truncation=True, max_length=available_tokens)['input_ids'][0],
741
+ skip_special_tokens=True
742
+ )
743
+
744
+ # Reformat with truncated context
745
+ if hasattr(self.tokenizer, 'apply_chat_template') and self.tokenizer.chat_template is not None:
746
+ try:
747
+ messages = [
748
+ {"role": "system", "content": improved_prompt_text},
749
+ {"role": "user", "content": f"Context: {truncated_context}\n\nQuestion: {question}"}
750
+ ]
751
+ prompt = self.tokenizer.apply_chat_template(
752
+ messages,
753
+ tokenize=False,
754
+ add_generation_prompt=True
755
+ )
756
+ except Exception:
757
+ prompt = self._format_improved_prompt_manual(truncated_context, question, improved_prompt_text)
758
+ else:
759
+ prompt = self._format_improved_prompt_manual(truncated_context, question, improved_prompt_text)
760
+ else:
761
+ # If even basic prompt is too long, use minimal format
762
+ prompt = self._format_improved_prompt_manual(context[:500] + "...", question, improved_prompt_text)
763
+ except Exception as e:
764
+ logger.warning(f"Error truncating context for improved prompt: {e}, using base prompt")
765
+ prompt = base_prompt
766
+ else:
767
+ prompt = base_prompt
768
+
769
+ return prompt, improved_prompt_text
770
+
771
+ def _format_improved_prompt_manual(self, context: str, question: str, improved_prompt_text: str) -> str:
772
+ """Manual prompt formatting for improved prompts (for models without chat templates)"""
773
+ return f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
774
+
775
+ {improved_prompt_text}<|eot_id|><|start_header_id|>user<|end_header_id|>
776
+
777
+ Context: {context}
778
+
779
+ Question: {question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
780
+
781
+ """
782
+
783
+ def generate_answer(self, prompt: str, **gen_kwargs) -> str:
784
+ """Generate answer using the language model"""
785
+ try:
786
+ if self.args.verbose:
787
+ logger.info(f"Full prompt (first 500 chars): {prompt[:500]}...")
788
+
789
+ # Tokenize input with more conservative limit to leave room for generation
790
+ inputs = self.tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1500)
791
+ inputs = {k: v.to(self.device) for k, v in inputs.items()}
792
+
793
+ if self.args.verbose:
794
+ logger.info(f"Input tokens: {inputs['input_ids'].shape}")
795
+
796
+ # Generate
797
+ with torch.no_grad():
798
+ outputs = self.model.generate(
799
+ **inputs,
800
+ max_new_tokens=gen_kwargs.get('max_new_tokens', 512),
801
+ temperature=gen_kwargs.get('temperature', 0.7),
802
+ top_p=gen_kwargs.get('top_p', 0.95),
803
+ repetition_penalty=gen_kwargs.get('repetition_penalty', 1.05),
804
+ do_sample=True,
805
+ pad_token_id=self.tokenizer.eos_token_id,
806
+ eos_token_id=self.tokenizer.eos_token_id,
807
+ use_cache=True,
808
+ num_beams=1
809
+ )
810
+
811
+ # Decode response without skipping special tokens to preserve full length
812
+ response = self.tokenizer.decode(outputs[0], skip_special_tokens=False)
813
+
814
+ if self.args.verbose:
815
+ logger.info(f"Full response (first 1000 chars): {response[:1000]}...")
816
+ logger.info(f"Looking for 'Answer:' in response: {'Answer:' in response}")
817
+ if "Answer:" in response:
818
+ answer_part = response.split("Answer:")[-1]
819
+ logger.info(f"Answer part (first 200 chars): {answer_part[:200]}...")
820
+
821
+ # Debug: Show the full response to understand the structure
822
+ logger.info(f"Full response length: {len(response)}")
823
+ logger.info(f"Prompt length: {len(prompt)}")
824
+ logger.info(f"Response after prompt (first 500 chars): {response[len(prompt):][:500]}...")
825
+
826
+ # Extract the answer more robustly by looking for the end of the prompt
827
+ # Find the actual end of the prompt in the response
828
+ prompt_end_marker = "<|start_header_id|>assistant<|end_header_id|>\n\n"
829
+ if prompt_end_marker in response:
830
+ answer = response.split(prompt_end_marker)[-1].strip()
831
+ else:
832
+ # Fallback to character-based extraction
833
+ answer = response[len(prompt):].strip()
834
+
835
+ if self.args.verbose:
836
+ logger.info(f"Full LLM output (first 200 chars): {answer[:200]}...")
837
+ logger.info(f"Full LLM output length: {len(answer)} characters")
838
+ logger.info(f"Full LLM output (last 200 chars): ...{answer[-200:]}")
839
+
840
+ # Only do minimal cleanup to preserve the full response
841
+ # Remove special tokens that might interfere with display, but preserve content
842
+ if "<|start_header_id|>" in answer:
843
+ # Only remove if it's at the very end
844
+ if answer.endswith("<|start_header_id|>"):
845
+ answer = answer[:-len("<|start_header_id|>")].strip()
846
+ if "<|eot_id|>" in answer:
847
+ # Only remove if it's at the very end
848
+ if answer.endswith("<|eot_id|>"):
849
+ answer = answer[:-len("<|eot_id|>")].strip()
850
+ if "<|end_of_text|>" in answer:
851
+ # Only remove if it's at the very end
852
+ if answer.endswith("<|end_of_text|>"):
853
+ answer = answer[:-len("<|end_of_text|>")].strip()
854
+
855
+ # Final validation - only reject if completely empty
856
+ if not answer or len(answer) < 3:
857
+ answer = "I don't know."
858
+
859
+ if self.args.verbose:
860
+ logger.info(f"Final answer: '{answer}'")
861
+
862
+ return answer
863
+
864
+ except Exception as e:
865
+ logger.error(f"Generation error: {e}")
866
+ return "I encountered an error while generating the answer."
867
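End to end, answering one question is retrieve, format, generate. A condensed sketch, assuming `bot` is an initialized `RAGBot`:

```python
question = "What does an MSH2 variant mean for my family?"
chunks, scores = bot.retrieve_with_scores(question, k=5)
if chunks:
    prompt = bot.format_prompt(chunks, question)
    answer = bot.generate_answer(prompt, max_new_tokens=512, temperature=0.2, top_p=0.9)
    print(answer)
    print("Sources:", bot._extract_sources(chunks))
    print("Scores:", ", ".join(f"{s:.3f}" for s in scores))
else:
    print("I don't know.")
```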
+
868
+ def process_questions(self, questions_path: str, **kwargs) -> List[Tuple[str, str, str, str, float, str, float, str, float, str, str]]:
869
+ """Process all questions and generate answers with multiple readability levels
870
+
871
+ Returns:
872
+ List of tuples: (question, answer, sources, question_group, original_flesch,
873
+ middle_school_answer, middle_school_flesch,
874
+ high_school_answer, high_school_flesch, improved_answer, similarity_scores)
875
+ """
876
+ logger.info(f"Processing questions from {questions_path}")
877
+
878
+ # Load questions
879
+ try:
880
+ with open(questions_path, 'r', encoding='utf-8') as f:
881
+ questions = [line.strip() for line in f if line.strip()]
882
+ except Exception as e:
883
+ logger.error(f"Failed to load questions: {e}")
884
+ sys.exit(1)
885
+
886
+ logger.info(f"Found {len(questions)} questions to process")
887
+
888
+ qa_pairs = []
889
+
890
+ # Get the improved prompt text for CSV header by calling format_improved_prompt with empty chunks
891
+ # This will give us the prompt text without actually generating
892
+ _, improved_prompt_text = self.format_improved_prompt([], "")
893
+
894
+ # Initialize CSV file with headers
895
+ self.write_csv([], kwargs.get('output_file', 'results.csv'), append=False, improved_prompt_text=improved_prompt_text)
896
+
897
+ # Process each question
898
+ for i, question in enumerate(tqdm(questions, desc="Processing questions")):
899
+ logger.info(f"Question {i+1}/{len(questions)}: {question[:50]}...")
900
+
901
+ try:
902
+ # Categorize question
903
+ question_group = self._categorize_question(question)
904
+
905
+ # Retrieve relevant chunks with similarity scores
906
+ context_chunks, similarity_scores = self.retrieve_with_scores(question, self.args.k)
907
+
908
+ # Format similarity scores as a string (comma-separated, 3 decimal places)
909
+ similarity_scores_str = ", ".join([f"{score:.3f}" for score in similarity_scores]) if similarity_scores else "0.000"
910
+
911
+ if not context_chunks:
912
+ answer = "I don't know."
913
+ sources = "No sources found"
914
+ middle_school_answer = "I don't know."
915
+ high_school_answer = "I don't know."
916
+ improved_answer = "I don't know."
917
+ original_flesch = 0.0
918
+ middle_school_flesch = 0.0
919
+ high_school_flesch = 0.0
920
+ similarity_scores_str = "0.000"
921
+ else:
922
+ # Format original prompt
923
+ prompt = self.format_prompt(context_chunks, question)
924
+
925
+ # Generate original answer
926
+ start_time = time.time()
927
+ answer = self.generate_answer(prompt, **kwargs)
928
+ gen_time = time.time() - start_time
929
+
930
+ # Generate improved answer
931
+ improved_prompt, _ = self.format_improved_prompt(context_chunks, question)
932
+ improved_start = time.time()
933
+ improved_answer = self.generate_answer(improved_prompt, **kwargs)
934
+ improved_time = time.time() - improved_start
935
+
936
+ # Clean up improved answer - remove unwanted phrases and formatting
937
+ improved_answer = self._clean_improved_answer(improved_answer)
938
+ logger.info(f"Improved answer generated in {improved_time:.2f}s")
939
+
940
+ # Extract source documents
941
+ sources = self._extract_sources(context_chunks)
942
+
943
+ # Calculate original answer Flesch score
944
+ try:
945
+ original_flesch = textstat.flesch_kincaid_grade(answer)
946
+ except Exception:
947
+ original_flesch = 0.0
948
+
949
+ # Generate middle school version
950
+ readability_start = time.time()
951
+ middle_school_answer, middle_school_flesch = self.enhance_readability(answer, "middle_school")
952
+ readability_time = time.time() - readability_start
953
+ logger.info(f"Middle school readability in {readability_time:.2f}s")
954
+
955
+ # Generate high school version
956
+ readability_start = time.time()
957
+ high_school_answer, high_school_flesch = self.enhance_readability(answer, "high_school")
958
+ readability_time = time.time() - readability_start
959
+ logger.info(f"High school readability in {readability_time:.2f}s")
960
+
961
+ logger.info(f"Generated answer in {gen_time:.2f}s")
962
+ logger.info(f"Sources: {sources}")
963
+ logger.info(f"Similarity scores: {similarity_scores_str}")
964
+ logger.info(f"Original Flesch: {original_flesch:.1f}, Middle School: {middle_school_flesch:.1f}, High School: {high_school_flesch:.1f}")
965
+
966
+ qa_pairs.append((question, answer, sources, question_group, original_flesch,
967
+ middle_school_answer, middle_school_flesch,
968
+ high_school_answer, high_school_flesch, improved_answer, similarity_scores_str))
969
+
970
+ # Write incrementally to CSV after each question
971
+ self.write_csv([(question, answer, sources, question_group, original_flesch,
972
+ middle_school_answer, middle_school_flesch,
973
+ high_school_answer, high_school_flesch, improved_answer, similarity_scores_str)],
974
+ kwargs.get('output_file', 'results.csv'), append=True, improved_prompt_text=improved_prompt_text)
975
+ logger.info(f"Progress saved: {i+1}/{len(questions)} questions completed")
976
+
977
+ except Exception as e:
978
+ logger.error(f"Error processing question {i+1}: {e}")
979
+ error_answer = "I encountered an error processing this question."
980
+ sources = "Error retrieving sources"
981
+ question_group = self._categorize_question(question)
982
+ original_flesch = 0.0
983
+ middle_school_answer = "I encountered an error processing this question."
984
+ high_school_answer = "I encountered an error processing this question."
985
+ improved_answer = "I encountered an error processing this question."
986
+ middle_school_flesch = 0.0
987
+ high_school_flesch = 0.0
988
+ similarity_scores_str = "0.000"
989
+ qa_pairs.append((question, error_answer, sources, question_group, original_flesch,
990
+ middle_school_answer, middle_school_flesch,
991
+ high_school_answer, high_school_flesch, improved_answer, similarity_scores_str))
992
+
993
+ # Still write the error to CSV
994
+ self.write_csv([(question, error_answer, sources, question_group, original_flesch,
995
+ middle_school_answer, middle_school_flesch,
996
+ high_school_answer, high_school_flesch, improved_answer, similarity_scores_str)],
997
+ kwargs.get('output_file', 'results.csv'), append=True, improved_prompt_text=improved_prompt_text)
998
+ logger.info(f"Error saved: {i+1}/{len(questions)} questions completed")
999
+
1000
+ return qa_pairs
1001
+
1002
+ def _clean_readability_answer(self, answer: str, target_level: str) -> str:
1003
+ """Clean up readability-enhanced answers to remove unwanted phrases and formatting
1004
+
1005
+ Args:
1006
+ answer: The readability-enhanced answer
1007
+ target_level: Either "middle_school" or "high_school"
1008
+ """
1009
+ cleaned = answer
1010
+
1011
+ # Remove the "Here's a rewritten version" phrases
1012
+ if target_level == "middle_school":
1013
+ unwanted_phrases = [
1014
+ "Here's a rewritten version of the text at a middle school reading level:",
1015
+ "Here's a rewritten version of the text at a middle school reading level",
1016
+ "Here is a rewritten version of the text at a middle school reading level:",
1017
+ "Here is a rewritten version of the text at a middle school reading level",
1018
+ "Here's a rewritten version at a middle school reading level:",
1019
+ "Here's a rewritten version at a middle school reading level",
1020
+ ]
1021
+ elif target_level == "high_school":
1022
+ unwanted_phrases = [
1023
+ "Here's a rewritten version of the text at a high school reading level",
1024
+ "Here's a rewritten version of the text at a high school reading level:",
1025
+ "Here is a rewritten version of the text at a high school reading level",
1026
+ "Here is a rewritten version of the text at a high school reading level:",
1027
+ "Here's a rewritten version at a high school reading level",
1028
+ "Here's a rewritten version at a high school reading level:",
1029
+ ]
1030
+ else:
1031
+ unwanted_phrases = []
1032
+
1033
+ for phrase in unwanted_phrases:
1034
+ if phrase.lower() in cleaned.lower():
1035
+ # Find and remove the phrase (case-insensitive)
1036
+ pattern = re.compile(re.escape(phrase), re.IGNORECASE)
1037
+ cleaned = pattern.sub("", cleaned).strip()
1038
+ # Remove leading colons, semicolons, or dashes
1039
+ cleaned = re.sub(r'^[:;\-]\s*', '', cleaned).strip()
1040
+
1041
+ # Remove asterisks (but preserve bullet points if they use •)
1042
+ cleaned = re.sub(r'\*\*', '', cleaned) # Remove bold markers
1043
+ cleaned = re.sub(r'\(\*\)', '', cleaned) # Remove (*)
1044
+ cleaned = re.sub(r'\*', '', cleaned) # Remove remaining asterisks
1045
+
1046
+ # Clean up extra whitespace
1047
+ cleaned = ' '.join(cleaned.split())
1048
+
1049
+ return cleaned
1050
+
1051
+ def _clean_improved_answer(self, answer: str) -> str:
1052
+ """Clean up improved answer to remove unwanted phrases and formatting"""
1053
+ # Remove phrases like "Here's a rewritten version" or similar
1054
+ unwanted_phrases = [
1055
+ "Here's a rewritten version",
1056
+ "Here's a version",
1057
+ "Here is a rewritten version",
1058
+ "Here is a version",
1059
+ "Here's the answer",
1060
+ "Here is the answer"
1061
+ ]
1062
+
1063
+ cleaned = answer
1064
+ for phrase in unwanted_phrases:
1065
+ if phrase.lower() in cleaned.lower():
1066
+ # Find and remove the phrase and any following colon/semicolon
1067
+ pattern = re.compile(re.escape(phrase), re.IGNORECASE)
1068
+ cleaned = pattern.sub("", cleaned).strip()
1069
+ # Remove leading colons, semicolons, or dashes
1070
+ cleaned = re.sub(r'^[:;\-]\s*', '', cleaned).strip()
1071
+
1072
+ # Remove formatting markers like (*) or ** but preserve bullet points
1073
+ cleaned = re.sub(r'\*\*', '', cleaned) # Remove bold markers
1074
+ cleaned = re.sub(r'\(\*\)', '', cleaned) # Remove (*)
1075
+ # Note: Single asterisks are left alone as they might be used for formatting
1076
+ # (the improved prompt already tells the model to avoid asterisk formatting, so stray single asterisks should be rare)
1077
+
1078
+ # Remove "Don't worry" and similar emotional management phrases
1079
+ emotional_phrases = [
1080
+ r"don't worry[^.]*\.\s*",
1081
+ r"Don't worry[^.]*\.\s*",
1082
+ r"the good news is[^.]*\.\s*",
1083
+ r"The good news is[^.]*\.\s*",
1084
+ ]
1085
+ for pattern in emotional_phrases:
1086
+ cleaned = re.sub(pattern, '', cleaned, flags=re.IGNORECASE)
1087
+
1088
+ # Clean up extra whitespace
1089
+ cleaned = ' '.join(cleaned.split())
1090
+
1091
+ return cleaned
1092
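A quick before/after sketch of the cleanup above, assuming `bot` is an initialized `RAGBot`:

```python
raw = "Here's a rewritten version: **Cascade testing** is offered to first-degree relatives."
print(bot._clean_improved_answer(raw))
# expected: "Cascade testing is offered to first-degree relatives."
```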
+
1093
+ def diagnose_system(self, sample_questions: List[str] = None) -> Dict[str, Any]:
1094
+ """Diagnose the document loading, chunking, and retrieval system
1095
+
1096
+ Args:
1097
+ sample_questions: Optional list of questions to test retrieval
1098
+
1099
+ Returns:
1100
+ Dictionary with diagnostic information
1101
+ """
1102
+ diagnostics = {
1103
+ 'vector_db_stats': {},
1104
+ 'document_stats': {},
1105
+ 'chunk_stats': {},
1106
+ 'retrieval_tests': []
1107
+ }
1108
+
1109
+ # Check vector database
1110
+ try:
1111
+ stats = self.vector_retriever.get_collection_stats()
1112
+ diagnostics['vector_db_stats'] = {
1113
+ 'total_chunks': stats.get('total_chunks', 0),
1114
+ 'collection_name': stats.get('collection_name', 'unknown'),
1115
+ 'status': 'OK' if stats.get('total_chunks', 0) > 0 else 'EMPTY'
1116
+ }
1117
+ except Exception as e:
1118
+ diagnostics['vector_db_stats'] = {
1119
+ 'status': 'ERROR',
1120
+ 'error': str(e)
1121
+ }
1122
+
1123
+ # Test document loading (without actually loading)
1124
+ try:
1125
+ data_path = Path(self.args.data_dir)
1126
+ if data_path.exists():
1127
+ supported_extensions = {'.txt', '.md', '.json', '.csv'}
1128
+ if PDF_AVAILABLE:
1129
+ supported_extensions.add('.pdf')
1130
+ if DOCX_AVAILABLE:
1131
+ supported_extensions.add('.docx')
1132
+ supported_extensions.add('.doc')
1133
+
1134
+ files = []
1135
+ for ext in supported_extensions:
1136
+ files.extend(data_path.rglob(f"*{ext}"))
1137
+
1138
+ # Sample a few files to check content
1139
+ sample_files = files[:5] if len(files) > 5 else files
1140
+ file_samples = []
1141
+ for file_path in sample_files:
1142
+ try:
1143
+ content = self._read_file(file_path)
1144
+ file_samples.append({
1145
+ 'filename': file_path.name,
1146
+ 'size_chars': len(content),
1147
+ 'size_words': len(content.split()),
1148
+ 'readable': True
1149
+ })
1150
+ except Exception as e:
1151
+ file_samples.append({
1152
+ 'filename': file_path.name,
1153
+ 'readable': False,
1154
+ 'error': str(e)
1155
+ })
1156
+
1157
+ diagnostics['document_stats'] = {
1158
+ 'total_files_found': len(files),
1159
+ 'sample_files': file_samples,
1160
+ 'status': 'OK'
1161
+ }
1162
+ else:
1163
+ diagnostics['document_stats'] = {
1164
+ 'status': 'ERROR',
1165
+ 'error': f'Data directory {self.args.data_dir} does not exist'
1166
+ }
1167
+ except Exception as e:
1168
+ diagnostics['document_stats'] = {
1169
+ 'status': 'ERROR',
1170
+ 'error': str(e)
1171
+ }
1172
+
1173
+ # Test chunking on a sample document
1174
+ try:
1175
+ if diagnostics['document_stats'].get('status') == 'OK':
1176
+ sample_file = None
1177
+ for file_info in diagnostics['document_stats'].get('sample_files', []):
1178
+ if file_info.get('readable', False):
1179
+ # Find the actual file
1180
+ data_path = Path(self.args.data_dir)
1181
+ for ext in ['.txt', '.md', '.pdf', '.docx']:
1182
+ files = list(data_path.rglob(f"*{file_info['filename']}"))
1183
+ if files:
1184
+ sample_file = files[0]
1185
+ break
1186
+ if sample_file:
1187
+ break
1188
+
1189
+ if sample_file:
1190
+ content = self._read_file(sample_file)
1191
+ # Create a dummy document (Document is already imported at top)
1192
+ sample_doc = Document(
1193
+ filename=sample_file.name,
1194
+ content=content,
1195
+ filepath=str(sample_file),
1196
+ file_type=sample_file.suffix.lower(),
1197
+ file_hash=""
1198
+ )
1199
+
1200
+ # Test chunking
1201
+ sample_chunks = self._chunk_text(
1202
+ content,
1203
+ sample_file.name,
1204
+ self.args.chunk_size,
1205
+ self.args.chunk_overlap
1206
+ )
1207
+
1208
+ chunk_lengths = [len(chunk.text.split()) for chunk in sample_chunks]
1209
+
1210
+ diagnostics['chunk_stats'] = {
1211
+ 'sample_document': sample_file.name,
1212
+ 'total_chunks': len(sample_chunks),
1213
+ 'avg_chunk_size_words': sum(chunk_lengths) / len(chunk_lengths) if chunk_lengths else 0,
1214
+ 'min_chunk_size_words': min(chunk_lengths) if chunk_lengths else 0,
1215
+ 'max_chunk_size_words': max(chunk_lengths) if chunk_lengths else 0,
1216
+ 'chunk_size_setting': self.args.chunk_size,
1217
+ 'chunk_overlap_setting': self.args.chunk_overlap,
1218
+ 'status': 'OK'
1219
+ }
1220
+ except Exception as e:
1221
+ diagnostics['chunk_stats'] = {
1222
+ 'status': 'ERROR',
1223
+ 'error': str(e)
1224
+ }
1225
+
1226
+ # Test retrieval with sample questions
1227
+ if sample_questions and diagnostics['vector_db_stats'].get('status') == 'OK':
1228
+ for question in sample_questions:
1229
+ try:
1230
+ context_chunks = self.retrieve(question, self.args.k)
1231
+ sources = self._extract_sources(context_chunks)
1232
+
1233
+ # Get similarity scores
1234
+ results = self.vector_retriever.search(question, self.args.k)
1235
+
1236
+ # Get sample chunk text (first 200 chars of first chunk)
1237
+ sample_chunk_text = context_chunks[0].text[:200] + "..." if context_chunks else "N/A"
1238
+
1239
+ diagnostics['retrieval_tests'].append({
1240
+ 'question': question,
1241
+ 'chunks_retrieved': len(context_chunks),
1242
+ 'sources': sources,
1243
+ 'similarity_scores': [f"{score:.3f}" for _, score in results],
1244
+ 'sample_chunk_preview': sample_chunk_text,
1245
+ 'status': 'OK' if context_chunks else 'NO_RESULTS'
1246
+ })
1247
+ except Exception as e:
1248
+ diagnostics['retrieval_tests'].append({
1249
+ 'question': question,
1250
+ 'status': 'ERROR',
1251
+ 'error': str(e)
1252
+ })
1253
+
1254
+ return diagnostics
1255
+
1256
+ def print_diagnostics(self, diagnostics: Dict[str, Any]) -> None:
1257
+ """Print diagnostic information in a readable format"""
1258
+ print("\n" + "="*80)
1259
+ print("SYSTEM DIAGNOSTICS")
1260
+ print("="*80)
1261
+
1262
+ # Vector DB Stats
1263
+ print("\n📊 VECTOR DATABASE:")
1264
+ vdb = diagnostics.get('vector_db_stats', {})
1265
+ print(f" Status: {vdb.get('status', 'UNKNOWN')}")
1266
+ print(f" Total chunks: {vdb.get('total_chunks', 0)}")
1267
+ print(f" Collection: {vdb.get('collection_name', 'unknown')}")
1268
+ if 'error' in vdb:
1269
+ print(f" Error: {vdb['error']}")
1270
+
1271
+ # Document Stats
1272
+ print("\n📄 DOCUMENT LOADING:")
1273
+ doc_stats = diagnostics.get('document_stats', {})
1274
+ print(f" Status: {doc_stats.get('status', 'UNKNOWN')}")
1275
+ print(f" Total files found: {doc_stats.get('total_files_found', 0)}")
1276
+ if 'sample_files' in doc_stats:
1277
+ print(f" Sample files:")
1278
+ for file_info in doc_stats['sample_files']:
1279
+ if file_info.get('readable', False):
1280
+ print(f" ✓ {file_info['filename']}: {file_info.get('size_chars', 0):,} chars, {file_info.get('size_words', 0):,} words")
1281
+ else:
1282
+ print(f" ✗ {file_info['filename']}: {file_info.get('error', 'unreadable')}")
1283
+ if 'error' in doc_stats:
1284
+ print(f" Error: {doc_stats['error']}")
1285
+
1286
+ # Chunk Stats
1287
+ print("\n✂️ CHUNKING:")
1288
+ chunk_stats = diagnostics.get('chunk_stats', {})
1289
+ print(f" Status: {chunk_stats.get('status', 'UNKNOWN')}")
1290
+ if chunk_stats.get('status') == 'OK':
1291
+ print(f" Sample document: {chunk_stats.get('sample_document', 'N/A')}")
1292
+ print(f" Total chunks from sample: {chunk_stats.get('total_chunks', 0)}")
1293
+ print(f" Average chunk size: {chunk_stats.get('avg_chunk_size_words', 0):.1f} words")
1294
+ print(f" Chunk size range: {chunk_stats.get('min_chunk_size_words', 0)} - {chunk_stats.get('max_chunk_size_words', 0)} words")
1295
+ print(f" Settings: size={chunk_stats.get('chunk_size_setting', 0)}, overlap={chunk_stats.get('chunk_overlap_setting', 0)}")
1296
+ if 'error' in chunk_stats:
1297
+ print(f" Error: {chunk_stats['error']}")
1298
+
1299
+ # Retrieval Tests
1300
+ if diagnostics.get('retrieval_tests'):
1301
+ print("\n🔍 RETRIEVAL TESTS:")
1302
+ for test in diagnostics['retrieval_tests']:
1303
+ print(f"\n Question: {test.get('question', 'N/A')}")
1304
+ print(f" Status: {test.get('status', 'UNKNOWN')}")
1305
+ if test.get('status') == 'OK':
1306
+ print(f" Chunks retrieved: {test.get('chunks_retrieved', 0)}")
1307
+ print(f" Sources: {test.get('sources', 'N/A')}")
1308
+ scores = test.get('similarity_scores', [])
1309
+ if scores:
1310
+ print(f" Similarity scores: {', '.join(scores)}")
1311
+ # Warn if scores are low
1312
+ try:
1313
+ score_values = [float(s) for s in scores]
1314
+ if max(score_values) < 0.3:
1315
+ print(f" ⚠️ WARNING: Low similarity scores - retrieved chunks may not be very relevant")
1316
+ elif max(score_values) < 0.5:
1317
+ print(f" ⚠️ NOTE: Moderate similarity - consider increasing --k or checking chunk quality")
1318
+ except Exception:
1319
+ pass
1320
+ if 'sample_chunk_preview' in test:
1321
+ print(f" Sample chunk preview: {test['sample_chunk_preview']}")
1322
+ elif 'error' in test:
1323
+ print(f" Error: {test['error']}")
1324
+
1325
+ print("\n" + "="*80 + "\n")
1326
+
1327
+ def _extract_sources(self, context_chunks: List[Chunk]) -> str:
1328
+ """Extract source document names from context chunks"""
1329
+ sources = []
1330
+ for chunk in context_chunks:
1331
+ # Debug: Print chunk filename if verbose
1332
+ if self.args.verbose:
1333
+ logger.info(f"Chunk filename: {chunk.filename}")
1334
+
1335
+ # Extract filename from chunk attribute (not metadata)
1336
+ source = chunk.filename if hasattr(chunk, 'filename') and chunk.filename else 'Unknown source'
1337
+ # Clean up the source name
1338
+ if source.endswith('.pdf'):
1339
+ source = source[:-4] # Remove .pdf extension
1340
+ elif source.endswith('.txt'):
1341
+ source = source[:-4] # Remove .txt extension
1342
+ elif source.endswith('.md'):
1343
+ source = source[:-3] # Remove .md extension
1344
+
1345
+ sources.append(source)
1346
+
1347
+ # Remove duplicates while preserving order
1348
+ unique_sources = []
1349
+ for source in sources:
1350
+ if source not in unique_sources:
1351
+ unique_sources.append(source)
1352
+
1353
+ return "; ".join(unique_sources)
1354
+
1355
+ def _categorize_question(self, question: str) -> str:
1356
+ """Categorize a question into one of 5 categories"""
1357
+ question_lower = question.lower()
1358
+
1359
+ # Gene-Specific Recommendations
1360
+ if any(gene in question_lower for gene in ['msh2', 'mlh1', 'msh6', 'pms2', 'epcam', 'brca1', 'brca2']):
1361
+ if any(kw in question_lower for kw in ['screening', 'surveillance', 'prevention', 'recommendation', 'risk', 'cancer risk', 'steps', 'management']):
1362
+ return "Gene-Specific Recommendations"
1363
+
1364
+ # Inheritance Patterns
1365
+ if any(kw in question_lower for kw in ['inherit', 'inherited', 'pass', 'skip a generation', 'generation', 'can i pass']):
1366
+ return "Inheritance Patterns"
1367
+
1368
+ # Family Risk Assessment
1369
+ if any(kw in question_lower for kw in ['family member', 'relative', 'first-degree', 'family risk', 'which relative', 'should my family']):
1370
+ return "Family Risk Assessment"
1371
+
1372
+ # Genetic Variant Interpretation
1373
+ if any(kw in question_lower for kw in ['what does', 'genetic variant mean', 'variant mean', 'mutation mean', 'genetic result']):
1374
+ return "Genetic Variant Interpretation"
1375
+
1376
+ # Support and Resources
1377
+ if any(kw in question_lower for kw in ['cope', 'overwhelmed', 'resource', 'genetic counselor', 'support', 'research', 'help', 'insurance', 'gina']):
1378
+ return "Support and Resources"
1379
+
1380
+ # Default to Genetic Variant Interpretation if unclear
1381
+ return "Genetic Variant Interpretation"
1382
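A quick sanity-check sketch of the keyword routing above, assuming `bot` is an initialized `RAGBot`:

```python
for q in [
    "What screening is recommended for MSH2 carriers?",  # Gene-Specific Recommendations
    "Can Lynch syndrome skip a generation?",              # Inheritance Patterns
    "Which relatives should be tested first?",            # Family Risk Assessment
    "What does my genetic variant mean?",                 # Genetic Variant Interpretation
    "Where can I find a genetic counselor?",              # Support and Resources
]:
    print(bot._categorize_question(q), "|", q)
```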
+
1383
+ def enhance_readability(self, answer: str, target_level: str = "middle_school") -> Tuple[str, float]:
1384
+ """Enhance answer readability to different levels and calculate Flesch-Kincaid Grade Level
1385
+
1386
+ Args:
1387
+ answer: The original answer to simplify or enhance
1388
+ target_level: One of "middle_school", "high_school", "college", or "doctoral"
1389
+
1390
+ Returns:
1391
+ Tuple of (enhanced_answer, grade_level)
1392
+ """
1393
+ try:
1394
+ # Define prompts for different reading levels
1395
+ if target_level == "middle_school":
1396
+ level_description = "middle school reading level (ages 12-14, 6th-8th grade)"
1397
+ instructions = """
1398
+ - Use simpler medical terms or explain them
1399
+ - Medium-length sentences
1400
+ - Clear, structured explanations
1401
+ - Keep important medical information accessible"""
1402
+ elif target_level == "high_school":
1403
+ level_description = "high school reading level (ages 15-18, 9th-12th grade)"
1404
+ instructions = """
1405
+ - Use appropriate medical terminology with context
1406
+ - Varied sentence length
1407
+ - Comprehensive yet accessible explanations
1408
+ - Maintain technical accuracy while ensuring clarity"""
1409
+ elif target_level == "college":
1410
+ level_description = "college reading level (undergraduate level, ages 18-22)"
1411
+ instructions = """
1412
+ - Use standard medical terminology with brief explanations
1413
+ - Professional and clear writing style
1414
+ - Include relevant clinical context
1415
+ - Maintain scientific accuracy and precision
1416
+ - Appropriate for undergraduate students in health sciences"""
1417
+ elif target_level == "doctoral":
1418
+ level_description = "doctoral/professional reading level (graduate level, medical professionals)"
1419
+ instructions = """
1420
+ - Use advanced medical and scientific terminology
1421
+ - Include detailed clinical and research context
1422
+ - Reference specific mechanisms, pathways, and evidence
1423
+ - Provide comprehensive technical explanations
1424
+ - Appropriate for medical professionals, researchers, and graduate students
1425
+ - Include nuanced discussions of clinical implications and research findings"""
1426
+ else:
1427
+ raise ValueError(f"Unknown target_level: {target_level}. Must be one of: middle_school, high_school, college, doctoral")
1428
+
1429
+ # Create a prompt to enhance the medical answer for the target level
1430
+ # Try to use chat template if available, otherwise use manual format
1431
+ system_message = f"""You are a helpful medical assistant who specializes in explaining complex medical information at appropriate reading levels. Rewrite the following medical answer for {level_description}:
1432
+ {instructions}
1433
+ - Keep the same important information but adapt the complexity
1434
+ - Provide context for technical terms
1435
+ - Ensure the answer is informative yet understandable"""
1436
+
1437
+ user_message = f"Please rewrite this medical answer for {level_description}:\n\n{answer}"
1438
+
1439
+ # Try to use chat template if available
1440
+ if hasattr(self.tokenizer, 'apply_chat_template') and self.tokenizer.chat_template is not None:
1441
+ try:
1442
+ messages = [
1443
+ {"role": "system", "content": system_message},
1444
+ {"role": "user", "content": user_message}
1445
+ ]
1446
+ readability_prompt = self.tokenizer.apply_chat_template(
1447
+ messages,
1448
+ tokenize=False,
1449
+ add_generation_prompt=True
1450
+ )
1451
+ except Exception as e:
1452
+ logger.warning(f"Failed to use chat template for readability, falling back to manual format: {e}")
1453
+ readability_prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
1454
+
1455
+ {system_message}
1456
+
1457
+ <|eot_id|><|start_header_id|>user<|end_header_id|>
1458
+
1459
+ {user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
1460
+
1461
+ """
1462
+ else:
1463
+ # Fall back to manual formatting (for Llama models)
1464
+ readability_prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
1465
+
1466
+ {system_message}
1467
+
1468
+ <|eot_id|><|start_header_id|>user<|end_header_id|>
1469
+
1470
+ {user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
1471
+
1472
+ """
1473
+
1474
+ # Generate simplified answer
1475
+ inputs = self.tokenizer(readability_prompt, return_tensors="pt", truncation=True, max_length=2048)
1476
+ if self.device == "mps":
1477
+ inputs = {k: v.to(self.device) for k, v in inputs.items()}
1478
+
1479
+ # Adjust generation parameters based on target level
1480
+ if target_level in ["college", "doctoral"]:
1481
+ max_tokens = 512 # Reduced from 1024 for faster responses
1482
+ temp = 0.4 # Slightly higher temperature for more natural flow
1483
+ else:
1484
+ max_tokens = 384 # Reduced from 512 for faster responses
1485
+ temp = 0.3 # Lower temperature for more consistent simplification
1486
+
1487
+ with torch.no_grad():
1488
+ outputs = self.model.generate(
1489
+ **inputs,
1490
+ max_new_tokens=max_tokens,
1491
+ temperature=temp,
1492
+ top_p=0.9,
1493
+ repetition_penalty=1.05,
1494
+ do_sample=True,
1495
+ pad_token_id=self.tokenizer.eos_token_id,
1496
+ eos_token_id=self.tokenizer.eos_token_id,
1497
+ use_cache=True,
1498
+ num_beams=1
1499
+ )
1500
+
1501
+ # Decode response
1502
+ response = self.tokenizer.decode(outputs[0], skip_special_tokens=False)
1503
+
1504
+ # Extract enhanced answer
1505
+ # Try to find the assistant response marker
1506
+ prompt_end_marker = "<|start_header_id|>assistant<|end_header_id|>\n\n"
1507
+ if prompt_end_marker in response:
1508
+ simplified_answer = response.split(prompt_end_marker)[-1].strip()
1509
+ elif "<|assistant|>" in response:
1510
+ # Some chat templates use <|assistant|>
1511
+ simplified_answer = response.split("<|assistant|>")[-1].strip()
1512
+ else:
1513
+ # Fallback: extract everything after the prompt
1514
+ simplified_answer = response[len(readability_prompt):].strip()
1515
+
1516
+ # Clean up special tokens
1517
+ if "<|eot_id|>" in simplified_answer:
1518
+ if simplified_answer.endswith("<|eot_id|>"):
1519
+ simplified_answer = simplified_answer[:-len("<|eot_id|>")].strip()
1520
+ if "<|end_of_text|>" in simplified_answer:
1521
+ if simplified_answer.endswith("<|end_of_text|>"):
1522
+ simplified_answer = simplified_answer[:-len("<|end_of_text|>")].strip()
1523
+
1524
+ # Clean up unwanted phrases and formatting
1525
+ simplified_answer = self._clean_readability_answer(simplified_answer, target_level)
1526
+
1527
+ # Calculate Flesch-Kincaid Grade Level
1528
+ try:
1529
+ grade_level = textstat.flesch_kincaid_grade(simplified_answer)
1530
+ except Exception:
1531
+ grade_level = 0.0
1532
+
1533
+ if self.args.verbose:
1534
+ logger.info(f"Simplified answer length: {len(simplified_answer)} characters")
1535
+ logger.info(f"Flesch-Kincaid Grade Level: {grade_level:.1f}")
1536
+
1537
+ return simplified_answer, grade_level
1538
+
1539
+ except Exception as e:
1540
+ logger.error(f"Error enhancing readability: {e}")
1541
+ # Fallback: return original answer with estimated grade level
1542
+ try:
1543
+ grade_level = textstat.flesch_kincaid_grade(answer)
1544
+ except Exception:
1545
+ grade_level = 12.0 # Default to high school level
1546
+ return answer, grade_level
1547
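The grade-level figures reported throughout come straight from `textstat`; for reference, a small sketch:

```python
import textstat

text = ("A pathogenic MSH2 variant increases the lifetime risk of colorectal and "
        "endometrial cancer, so earlier and more frequent screening is typically discussed.")
print(textstat.flesch_kincaid_grade(text))  # U.S. grade level (higher = harder to read)
print(textstat.flesch_reading_ease(text))   # reading-ease score (higher = easier to read)
```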
+
1548
+ def write_csv(self, qa_pairs: List[Tuple[str, str, str, str, float, str, float, str, float, str, str]], output_path: str, append: bool = False, improved_prompt_text: str = "") -> None:
1549
+ """Write Q&A pairs to CSV file in results folder
1550
+
1551
+ Expected tuple format: (question, answer, sources, question_group, original_flesch,
1552
+ middle_school_answer, middle_school_flesch,
1553
+ high_school_answer, high_school_flesch, improved_answer, similarity_scores)
1554
+ """
1555
+ # Ensure results directory exists
1556
+ os.makedirs('results', exist_ok=True)
1557
+
1558
+ # If output_path doesn't already have results/ prefix, add it
1559
+ if not output_path.startswith('results/'):
1560
+ output_path = f'results/{output_path}'
1561
+
1562
+ if append:
1563
+ logger.info(f"Appending results to {output_path}")
1564
+ else:
1565
+ logger.info(f"Writing results to {output_path}")
1566
+
1567
+ # Create output directory if needed
1568
+ output_path = Path(output_path)
1569
+ output_path.parent.mkdir(parents=True, exist_ok=True)
1570
+
1571
+ try:
1572
+ # Check if file exists and if we're appending
1573
+ file_exists = output_path.exists()
1574
+ write_mode = 'a' if append and file_exists else 'w'
1575
+
1576
+ with open(output_path, write_mode, newline='', encoding='utf-8') as f:
1577
+ writer = csv.writer(f)
1578
+
1579
+ # Write header only if creating new file or first append
1580
+ if not append or not file_exists:
1581
+ # Create improved answer header with prompt text
1582
+ improved_header = f'improved_answer (PROMPT: {improved_prompt_text})'
1583
+ writer.writerow(['question', 'question_group', 'answer', 'original_flesch', 'sources',
1584
+ 'similarity_scores', 'middle_school_answer', 'middle_school_flesch',
1585
+ 'high_school_answer', 'high_school_flesch', improved_header])
1586
+
1587
+ for data in qa_pairs:
1588
+ # Unpack the data tuple
1589
+ (question, answer, sources, question_group, original_flesch,
1590
+ middle_school_answer, middle_school_flesch,
1591
+ high_school_answer, high_school_flesch, improved_answer, similarity_scores) = data
1592
+
1593
+ # Clean and escape the answers for CSV
1594
+ def clean_text(text):
1595
+ # Replace newlines with spaces and clean up formatting
1596
+ cleaned = text.replace('\n', ' ').replace('\r', ' ')
1597
+ # Remove extra whitespace but preserve the full content
1598
+ cleaned = ' '.join(cleaned.split())
1599
+ # Quote escaping is left to csv.writer below; doubling quotes here would double-escape them in the output
1600
1601
+ return cleaned
1602
+
1603
+ clean_question = clean_text(question)
1604
+ clean_answer = clean_text(answer)
1605
+ clean_sources = clean_text(sources)
1606
+ clean_middle_school = clean_text(middle_school_answer)
1607
+ clean_high_school = clean_text(high_school_answer)
1608
+ clean_improved = clean_text(improved_answer)
1609
+
1610
+ # Log the full answer length for debugging
1611
+ if self.args.verbose:
1612
+ logger.info(f"Writing answer length: {len(clean_answer)} characters")
1613
+ logger.info(f"Middle school answer length: {len(clean_middle_school)} characters")
1614
+ logger.info(f"High school answer length: {len(clean_high_school)} characters")
1615
+ logger.info(f"Improved answer length: {len(clean_improved)} characters")
1616
+ logger.info(f"Question group: {question_group}")
1617
+
1618
+ # Use proper CSV quoting - let csv.writer handle the quoting
1619
+ writer.writerow([
1620
+ clean_question,
1621
+ question_group,
1622
+ clean_answer,
1623
+ f"{original_flesch:.1f}",
1624
+ clean_sources,
1625
+ similarity_scores, # Similarity scores (comma-separated)
1626
+ clean_middle_school,
1627
+ f"{middle_school_flesch:.1f}",
1628
+ clean_high_school,
1629
+ f"{high_school_flesch:.1f}",
1630
+ clean_improved
1631
+ ])
1632
+
1633
+ if append:
1634
+ logger.info(f"Appended {len(qa_pairs)} Q&A pairs to {output_path}")
1635
+ else:
1636
+ logger.info(f"Successfully wrote {len(qa_pairs)} Q&A pairs to {output_path}")
1637
+
1638
+ except Exception as e:
1639
+ logger.error(f"Failed to write CSV: {e}")
1640
+ sys.exit(4)
1641
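Because results are written incrementally, partial runs can be inspected at any point; a small sketch, assuming the default output path under `results/`:

```python
import pandas as pd

df = pd.read_csv("results/answers.csv")
print(df[["question", "question_group", "original_flesch", "sources"]].head())
```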
+
1642
+
1643
+ def parse_args():
1644
+ """Parse command line arguments"""
1645
+ parser = argparse.ArgumentParser(description="RAG Chatbot for CGT-LLM-Beta with Vector Database")
1646
+
1647
+ # File paths
1648
+ parser.add_argument('--data-dir', default='./Data Resources',
1649
+ help='Directory containing documents to index')
1650
+ parser.add_argument('--questions', default='./questions.txt',
1651
+ help='File containing questions (one per line)')
1652
+ parser.add_argument('--out', default='./answers.csv',
1653
+ help='Output CSV file for answers')
1654
+ parser.add_argument('--vector-db-dir', default='./chroma_db',
1655
+ help='Directory for ChromaDB persistence')
1656
+
1657
+ # Retrieval parameters
1658
+ parser.add_argument('--k', type=int, default=5,
1659
+ help='Number of chunks to retrieve per question')
1660
+
1661
+ # Chunking parameters
1662
+ parser.add_argument('--chunk-size', type=int, default=500,
1663
+ help='Size of text chunks in tokens')
1664
+ parser.add_argument('--chunk-overlap', type=int, default=200,
1665
+ help='Overlap between chunks in tokens')
1666
+
1667
+ # Model selection
1668
+ parser.add_argument('--model', type=str, default='meta-llama/Llama-3.2-3B-Instruct',
1669
+ help='HuggingFace model name to use (e.g., meta-llama/Llama-3.2-3B-Instruct, mistralai/Mistral-7B-Instruct-v0.2)')
1670
+
1671
+ # Generation parameters
1672
+ parser.add_argument('--max-new-tokens', type=int, default=1024,
1673
+ help='Maximum new tokens to generate')
1674
+ parser.add_argument('--temperature', type=float, default=0.2,
1675
+ help='Generation temperature')
1676
+ parser.add_argument('--top-p', type=float, default=0.9,
1677
+ help='Top-p sampling parameter')
1678
+ parser.add_argument('--repetition-penalty', type=float, default=1.1,
1679
+ help='Repetition penalty')
1680
+
1681
+ # Database options
1682
+ parser.add_argument('--force-rebuild', action='store_true',
1683
+ help='Force rebuild of vector database')
1684
+ parser.add_argument('--skip-indexing', action='store_true',
1685
+ help='Skip document indexing, use existing database')
1686
+
1687
+ # Other options
1688
+ parser.add_argument('--seed', type=int, default=42,
1689
+ help='Random seed for reproducibility')
1690
+ parser.add_argument('--verbose', action='store_true',
1691
+ help='Enable verbose logging')
1692
+ parser.add_argument('--dry-run', action='store_true',
1693
+ help='Build index and test retrieval without generation')
1694
+ parser.add_argument('--diagnose', action='store_true',
1695
+ help='Run system diagnostics and exit')
1696
+
1697
+ return parser.parse_args()
1698
+
1699
+
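A hypothetical invocation using the flags defined above (paths and values are placeholders that mirror the parser defaults, including the default model name):

```python
# Illustrative only: drive app.py with a few of the CLI flags parsed above.
import subprocess

subprocess.run([
    "python", "app.py",
    "--data-dir", "./Data Resources",
    "--questions", "./questions.txt",
    "--out", "./answers.csv",
    "--k", "5",
    "--chunk-size", "500",
    "--chunk-overlap", "200",
    "--model", "meta-llama/Llama-3.2-3B-Instruct",
    "--verbose",
], check=True)
```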
1700
+ def main():
1701
+ """Main function"""
1702
+ args = parse_args()
1703
+
1704
+ # Set random seed
1705
+ torch.manual_seed(args.seed)
1706
+ np.random.seed(args.seed)
1707
+
1708
+ # Set logging level
1709
+ if args.verbose:
1710
+ logging.getLogger().setLevel(logging.DEBUG)
1711
+
1712
+ logger.info("Starting RAG Chatbot with Vector Database")
1713
+ logger.info(f"Arguments: {vars(args)}")
1714
+
1715
+ try:
1716
+ # Initialize bot
1717
+ bot = RAGBot(args)
1718
+
1719
+ # Check if we should skip indexing
1720
+ if not args.skip_indexing:
1721
+ # Load and process documents
1722
+ documents = bot.load_corpus(args.data_dir)
1723
+ if not documents:
1724
+ logger.error("No documents found to process")
1725
+ sys.exit(3)
1726
+
1727
+ # Chunk documents
1728
+ chunks = bot.chunk_documents(documents, args.chunk_size, args.chunk_overlap)
1729
+ if not chunks:
1730
+ logger.error("No chunks created from documents")
1731
+ sys.exit(3)
1732
+
1733
+ # Build or update index
1734
+ bot.build_or_update_index(chunks, args.force_rebuild)
1735
+ else:
1736
+ logger.info("Skipping document indexing, using existing vector database")
1737
+
1738
+ # Run diagnostics if requested
1739
+ if args.diagnose:
1740
+ sample_questions = [
1741
+ "What is Lynch Syndrome?",
1742
+ "What does a BRCA1 genetic variant mean?",
1743
+ "What screening tests are recommended for MSH2 carriers?"
1744
+ ]
1745
+ diagnostics = bot.diagnose_system(sample_questions=sample_questions)
1746
+ bot.print_diagnostics(diagnostics)
1747
+ return
1748
+
1749
+ if args.dry_run:
1750
+ logger.info("Dry run completed successfully")
1751
+ return
1752
+
1753
+ # Process questions
1754
+ generation_kwargs = {
1755
+ 'max_new_tokens': args.max_new_tokens,
1756
+ 'temperature': args.temperature,
1757
+ 'top_p': args.top_p,
1758
+ 'repetition_penalty': args.repetition_penalty
1759
+ }
1760
+
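As an aside, these generation parameters map directly onto a standard Hugging Face text-generation call; a sketch under that assumption (the RAGBot internals are not shown in this diff, and the model is gated and large, so this is illustrative only):

```python
# Illustrative only: how generation_kwargs above would be passed to a
# transformers text-generation pipeline (requires HF access to the model).
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Llama-3.2-3B-Instruct")
out = generator(
    "What is Lynch Syndrome?",
    max_new_tokens=1024,
    temperature=0.2,
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True,
)
print(out[0]["generated_text"])
```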
1761
+ qa_pairs = bot.process_questions(args.questions, output_file=args.out, **generation_kwargs)
1762
+
1763
+ logger.info("RAG Chatbot completed successfully")
1764
+
1765
+ except KeyboardInterrupt:
1766
+ logger.info("Interrupted by user")
1767
+ sys.exit(0)
1768
+ except Exception as e:
1769
+ logger.error(f"Unexpected error: {e}")
1770
+ if args.verbose:
1771
+ import traceback
1772
+ traceback.print_exc()
1773
+ sys.exit(1)
1774
+
1775
+
1776
+ if __name__ == "__main__":
1777
+ main()
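After a run, the CSV written by the code above can be sanity-checked; a small sketch, assuming the default --out path and the 11 values written per row:

```python
# Illustrative only: load and inspect the answers CSV produced by the script.
import pandas as pd

df = pd.read_csv("answers.csv")
print(df.shape)            # expect 11 columns, one row per question
print(list(df.columns))    # header row written before the Q&A rows
print(df.isna().sum())     # quick check for empty fields
```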
chroma_db/7eddb202-b9b0-46c1-ae4b-37838cdc5aac/data_level0.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:80fe29380be0f587de8c3d0df3bbd891219ebe35d3ab4e007721d322ca704b9f
3
+ size 18888520
chroma_db/7eddb202-b9b0-46c1-ae4b-37838cdc5aac/header.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:56091853c1c20a1ec97ba4a7935cb7ab95f58b91d1ca56b990bf768f7bd2df88
3
+ size 100
chroma_db/7eddb202-b9b0-46c1-ae4b-37838cdc5aac/index_metadata.pickle ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:754f12ddf66368443039e44c7d3625dbfa54c42604f231054e5c8ab8df162ebb
3
+ size 548379
chroma_db/7eddb202-b9b0-46c1-ae4b-37838cdc5aac/length.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e72c9f5fb80c8fa3f488f68172cf32cdaf226d94cb6cff09ff68990b34fbb04c
3
+ size 45080
chroma_db/7eddb202-b9b0-46c1-ae4b-37838cdc5aac/link_lists.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a0046b8333ff42649a27896a5da1f0fd89ee54954221fde9172dfe284d94262b
3
+ size 99820
chroma_db/chroma.sqlite3 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:70340ab0d0dddb6b5bcf29c0e09f316b0f695f6645be0231302346d5af463700
3
+ size 294584320
requirements.txt ADDED
@@ -0,0 +1,56 @@
1
+ # =============================================================================
2
+ # RAG Chatbot with Vector Database - Requirements
3
+ # =============================================================================
4
+ # Production-ready dependencies for medical document analysis and Q&A
5
+
6
+ # Core ML/AI Framework
7
+ torch>=2.0.0 # PyTorch for model inference
8
+ transformers>=4.30.0 # Hugging Face transformers
9
+ huggingface_hub>=0.20.0 # Hugging Face Hub API (for Inference API)
10
+ accelerate>=0.20.0 # Model loading optimization
11
+ safetensors>=0.3.0 # Safe model loading
12
+
13
+ # Vector Database & Embeddings
14
+ chromadb>=0.4.0 # Vector database for fast retrieval
15
+ sentence-transformers>=2.2.0 # Semantic embeddings (all-MiniLM-L6-v2)
16
+
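A minimal sketch of the retrieval stack these two packages provide (collection name, sample document, and query are placeholders; ./chroma_db matches the script's default --vector-db-dir, and all-MiniLM-L6-v2 is the embedding model named above):

```python
# Illustrative only: persistent ChromaDB collection with sentence-transformers embeddings.
import chromadb
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

client = chromadb.PersistentClient(path="./chroma_db")
embed_fn = SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
collection = client.get_or_create_collection("docs", embedding_function=embed_fn)

collection.add(
    documents=["Lynch syndrome is an inherited cancer-predisposition condition."],
    ids=["chunk-0"],
)
results = collection.query(query_texts=["What is Lynch Syndrome?"], n_results=1)
print(results["documents"])
```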
17
+ # Data Processing
18
+ pandas>=1.3.0 # Data manipulation and CSV handling
19
+ numpy>=1.21.0 # Numerical computing
20
+ scikit-learn>=1.0.0 # ML utilities and TF-IDF
21
+
22
+ # Text Analysis & Readability
23
+ textstat>=0.7.0 # Flesch-Kincaid Grade Level calculation
24
+ nltk>=3.8.0 # Natural language processing utilities
25
+
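To illustrate the readability scoring these packages support (sketch only; the sample sentence is a placeholder):

```python
# Illustrative only: Flesch metrics as computed by textstat.
import textstat

sample = "Lynch syndrome is an inherited condition that raises the risk of some cancers."
print(textstat.flesch_reading_ease(sample))   # higher score = easier to read
print(textstat.flesch_kincaid_grade(sample))  # approximate U.S. grade level
```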
26
+ # Document Processing (Core)
27
+ pypdf>=3.0.0 # PDF document parsing
28
+ python-docx>=0.8.11 # DOCX document parsing
29
+
30
+ # Optional Document Processing
31
+ rank-bm25>=0.2.2 # BM25 retrieval algorithm (alternative to TF-IDF)
32
+
33
+ # Utilities & Progress
34
+ tqdm>=4.65.0 # Progress bars
35
+ pathlib2>=2.3.0 # Backport of pathlib for Python 2; not needed on Python 3
36
+
37
+ # Web Interface
38
+ gradio==4.44.1 # Gradio web interface for chatbot (updated for Spaces compatibility)
39
+
40
+ # Development & Testing (Optional)
41
+ pytest>=7.0.0 # Testing framework
42
+ black>=22.0.0 # Code formatting
43
+ flake8>=4.0.0 # Code linting
44
+
45
+ # Performance Monitoring (Optional)
46
+ psutil>=5.8.0 # System resource monitoring
47
+ memory-profiler>=0.60.0 # Memory usage profiling
48
+
49
+ # =============================================================================
50
+ # Installation Notes:
51
+ # =============================================================================
52
+ # 1. Install with: pip install -r requirements.txt
53
+ # 2. Apple Silicon: recent PyTorch builds support MPS acceleration when available
54
+ # 3. Optional packages can be installed separately if needed
55
+ # 4. Model files (~6GB) will be downloaded on first run
56
+ # =============================================================================
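As a quick follow-up to the Apple Silicon note above, a one-line sanity check (illustrative):

```python
# Illustrative only: confirm PyTorch can see the Apple Silicon MPS backend.
import torch

print(torch.backends.mps.is_available())  # True on Apple Silicon with a recent PyTorch build
```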