# DeepCritical: Medical Drug Repurposing Research Agent

## Project Overview

---

## Executive Summary

**DeepCritical** is a deep research agent designed to accelerate medical drug repurposing research by autonomously searching, analyzing, and synthesizing evidence from multiple biomedical databases.

### The Problem We Solve

Drug repurposing - finding new therapeutic uses for existing FDA-approved drugs - can take years of manual literature review. Researchers must:

- Search thousands of papers across multiple databases
- Identify molecular mechanisms
- Find relevant clinical trials
- Assess safety profiles
- Synthesize evidence into actionable insights

**DeepCritical automates this process, cutting each research question from hours of manual work to minutes.**

### What Is Drug Repurposing?

**Simple Explanation:** Using existing approved drugs to treat NEW diseases they weren't originally designed for.

**Real Examples:**

- **Viagra** (sildenafil): Originally developed for heart disease (angina) → Now treats erectile dysfunction
- **Thalidomide**: Once banned → Now treats multiple myeloma
- **Aspirin**: Pain reliever → Heart attack prevention
- **Metformin**: Diabetes drug → Being tested for aging/longevity

**Why It Matters:**

- Faster than developing new drugs (years vs decades)
- Cheaper (known safety profiles)
- Lower risk (already FDA approved)
- Immediate patient benefit potential

---

## Core Use Case

### Primary Query Type

> "What existing drugs might help treat [disease/condition]?"

### Example Queries

1. **Long COVID Fatigue**
   - Query: "What existing drugs might help treat long COVID fatigue?"
   - Agent searches: PubMed, clinical trials, drug databases
   - Output: List of candidate drugs with mechanisms + evidence + citations

2. **Alzheimer's Disease**
   - Query: "Find existing drugs that target beta-amyloid pathways"
   - Agent identifies: Disease mechanisms → Drug candidates → Clinical evidence
   - Output: Comprehensive research report with drug candidates

3. **Rare Disease Treatment**
   - Query: "What drugs might help with fibrodysplasia ossificans progressiva?"
   - Agent finds: Similar conditions → Shared pathways → Potential treatments
   - Output: Evidence-based treatment suggestions

---

## System Architecture

### High-Level Design (Phases 1-8)

```text
User Query
    ↓
Gradio UI (Phase 4)
    ↓
Magentic Manager (Phase 5)  ← LLM-powered coordinator
    ├── SearchAgent (Phase 2+5)    ←→ PubMed + Web + VectorDB (Phase 6)
    ├── HypothesisAgent (Phase 7)  ←→ Mechanistic Reasoning
    ├── JudgeAgent (Phase 3+5)     ←→ Evidence Assessment
    └── ReportAgent (Phase 8)      ←→ Final Synthesis
    ↓
Structured Research Report
```

### Key Components

1. **Magentic Manager (Orchestrator)**
   - LLM-powered multi-agent coordinator
   - Dynamic planning and agent selection
   - Built-in stall detection and replanning
   - Microsoft Agent Framework integration

2. **SearchAgent (Phase 2+5+6)**
   - PubMed E-utilities search
   - DuckDuckGo web search
   - Semantic search via ChromaDB (Phase 6)
   - Evidence deduplication

3. **HypothesisAgent (Phase 7)**
   - Generates Drug → Target → Pathway → Effect hypotheses
   - Guides targeted searches
   - Scientific reasoning about mechanisms

4. **JudgeAgent (Phase 3+5)**
   - LLM-based evidence assessment
   - Mechanism score + Clinical score
   - Recommends continue/synthesize
   - Generates refined search queries

5. **ReportAgent (Phase 8)**
   - Structured scientific reports
   - Executive summary, methodology
   - Hypotheses tested with evidence counts
   - Proper citations and limitations

6. **Gradio UI (Phase 4)**
   - Chat interface for questions
   - Real-time progress via events
   - Mode toggle (Simple/Magentic)
   - Formatted markdown output

---

## Design Patterns

### 1. Search-and-Judge Loop (Primary Pattern)

```python
def research(question: str, max_iterations: int = 5) -> Report:
    context: list[Evidence] = []
    query = question
    for iteration in range(max_iterations):
        # SEARCH: query the relevant tools with the current query
        results = search_tools(query, context)
        context.extend(results)

        # JUDGE: stop early once the evidence is sufficient
        if judge.is_sufficient(question, context):
            break

        # REFINE: adjust the search query for the next iteration
        query = refine_query(question, context)

    # SYNTHESIZE: generate the final report
    return synthesize_report(question, context)
```

**Why This Pattern:**

- Simple to implement and debug
- Clear loop termination conditions
- Iterative improvement of search quality
- Balances depth vs speed

### 2. Multi-Tool Orchestration

```
Question → Agent decides which tools to use
                    ↓
       ┌────┴───┬─────────┬──────────┐
       ↓        ↓         ↓          ↓
    PubMed  Web Search  Trials DB  Drug DB
       ↓        ↓         ↓          ↓
       └────┬───┴─────────┴──────────┘
            ↓
    Aggregate Results → Judge
```

**Why This Pattern:**

- Different sources provide different evidence types
- Parallel tool execution when sources are independent (see the sketch below)
- Comprehensive coverage
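Because PubMed, web search, and trials lookups are independent, the fan-out step can run concurrently. Below is a minimal sketch using `asyncio.gather`; the tool functions are illustrative stubs, not the project's real tool interfaces.

```python
import asyncio

# Hypothetical async tool wrappers (illustrative stubs, not the real tool API).
async def search_pubmed(query: str) -> list[dict]:
    return [{"source": "pubmed", "query": query}]

async def search_web(query: str) -> list[dict]:
    return [{"source": "web", "query": query}]

async def search_trials(query: str) -> list[dict]:
    return [{"source": "trials", "query": query}]

async def fan_out(query: str) -> list[dict]:
    """Query independent sources concurrently and aggregate their results."""
    batches = await asyncio.gather(
        search_pubmed(query),
        search_web(query),
        search_trials(query),
        return_exceptions=True,  # one failing source should not sink the batch
    )
    evidence: list[dict] = []
    for batch in batches:
        if isinstance(batch, BaseException):
            continue  # a real implementation would log the failure
        evidence.extend(batch)
    return evidence

if __name__ == "__main__":
    print(asyncio.run(fan_out("long COVID fatigue")))
```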
### 3. LLM-as-Judge with Token Budget

**Dual Stopping Conditions:**

- **Smart Stop**: LLM judge says "we have sufficient evidence"
- **Hard Stop**: Token budget exhausted OR max iterations reached

**Why Both:**

- Judge enables early exit when the answer is good
- Budget prevents runaway costs
- Iterations prevent infinite loops
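A minimal sketch of the combined stopping check, assuming a judge object exposing the `is_sufficient` method from the Component Breakdown and a `tokens_used` counter maintained by the orchestrator. The stub judge and the constants are placeholders, not the real implementation.

```python
from dataclasses import dataclass

TOKEN_BUDGET = 50_000   # hard cap (matches the mitigation in Appendix B)
MAX_ITERATIONS = 5      # hypothetical default

@dataclass
class StubJudge:
    """Stand-in for the LLM judge; the real judge calls a model."""
    threshold: int = 10  # pieces of evidence to call "sufficient"

    def is_sufficient(self, question: str, context: list) -> bool:
        return len(context) >= self.threshold

def should_stop(judge: StubJudge, question: str, context: list,
                iteration: int, tokens_used: int) -> bool:
    """Hard stops (budget, iteration cap) take precedence over the smart stop."""
    if tokens_used >= TOKEN_BUDGET:
        return True      # hard stop: cost control
    if iteration >= MAX_ITERATIONS:
        return True      # hard stop: no infinite loops
    return judge.is_sufficient(question, context)  # smart stop

# Example: an exhausted budget forces a stop even though evidence is thin
print(should_stop(StubJudge(), "q", [], iteration=2, tokens_used=60_000))  # True
```

Checking the hard stops first means a misbehaving judge can never push a run past the budget.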
### 4. Stateful Checkpointing

```
.deepresearch/
├── state/
│   └── query_123.json       # Current research state
├── checkpoints/
│   └── query_123_iter3/     # Checkpoint at iteration 3
└── workspace/
    └── query_123/           # Downloaded papers, data
```

**Why This Pattern:**

- Resume interrupted research
- Debugging and analysis
- Cost savings (don't re-search)

---

## Component Breakdown

### Agent (Orchestrator)

- **Responsibility**: Coordinate the research process
- **Size**: ~100 lines
- **Key Methods**:
  - `research(question)` - Main entry point
  - `plan_search_strategy()` - Decide what to search
  - `execute_search()` - Run tool queries
  - `evaluate_progress()` - Call the judge
  - `synthesize_findings()` - Generate the report

### Tools

- **Responsibility**: Interface with external data sources
- **Size**: ~50 lines per tool
- **Implementations**:
  - `PubMedTool` - Search biomedical literature
  - `WebSearchTool` - General medical information
  - `ClinicalTrialsTool` - Trial data (optional)
  - `DrugInfoTool` - FDA drug database (optional)

### Judge

- **Responsibility**: Evaluate evidence quality
- **Size**: ~50 lines
- **Key Methods**:
  - `is_sufficient(question, evidence)` → bool
  - `assess_quality(evidence)` → score
  - `identify_gaps(question, evidence)` → missing_info

### Gradio App

- **Responsibility**: User interface
- **Size**: ~50 lines
- **Features**:
  - Text input for questions
  - Progress indicators
  - Formatted output with citations
  - Download research report

---

## Technical Stack

### Core Dependencies

```toml
[dependencies]
python = ">=3.10"
pydantic = "^2.7"
pydantic-ai = "^0.0.16"
fastmcp = "^0.1.0"
gradio = "^5.0"
beautifulsoup4 = "^4.12"
httpx = "^0.27"
```

### Optional Enhancements

- `modal` - For GPU-accelerated local LLM
- `fastmcp` - MCP server integration
- `sentence-transformers` - Semantic search
- `faiss-cpu` - Vector similarity

### Tool APIs & Rate Limits

| API | Cost | Rate Limit | API Key? | Notes |
|-----|------|------------|----------|-------|
| **PubMed E-utilities** | Free | 3/sec (no key), 10/sec (with key) | Optional | Register at NCBI for higher limits |
| **Brave Search API** | Free tier | 2,000/month free | Required | Primary web search |
| **DuckDuckGo** | Free | Unofficial, ~1/sec | No | Fallback web search |
| **ClinicalTrials.gov** | Free | 100/min | No | Stretch goal |
| **OpenFDA** | Free | 240/min (no key), 120K/day (with key) | Optional | Drug info |

**Web Search Strategy (Priority Order):**

1. **Brave Search API** (free tier: 2,000 queries/month) - Primary
2. **DuckDuckGo** (unofficial, no API key) - Fallback
3. **SerpAPI** ($50/month) - Only if free options fail

**Why NOT SerpAPI first?**

- Costs money (hackathon budget = $0)
- Free alternatives work fine for a demo
- Can upgrade later if needed

The two sketches below illustrate a rate-limited PubMed query and the web-search fallback chain.
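First, a minimal PubMed ESearch call with a crude client-side throttle to stay under the keyless 3 requests/sec limit. The endpoint and JSON shape are the real E-utilities API; `pubmed_search` and `throttled_searches` are illustrative names, not the project's `PubMedTool` interface.

```python
import asyncio
import httpx

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

async def pubmed_search(query: str, retmax: int = 20) -> list[str]:
    """Return PubMed IDs (PMIDs) for a query via the ESearch endpoint."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.get(ESEARCH, params=params)
        resp.raise_for_status()
        return resp.json()["esearchresult"]["idlist"]

async def throttled_searches(queries: list[str]) -> list[list[str]]:
    """Naive throttle: space requests out to stay under 3/sec without a key."""
    results = []
    for query in queries:
        results.append(await pubmed_search(query))
        await asyncio.sleep(0.34)  # ~3 requests/sec ceiling
    return results

if __name__ == "__main__":
    print(asyncio.run(pubmed_search("long COVID fatigue drug repurposing")))
```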
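Second, the priority-order fallback: try Brave first, fall through to DuckDuckGo on any failure. The provider functions are stand-ins (the Brave stub fails on purpose to exercise the fallback); only the ordering logic reflects the strategy above.

```python
def brave_search(query: str) -> list[dict]:
    """Stand-in for a Brave Search API call; fails here to show the fallback."""
    raise RuntimeError("simulated quota exhaustion")

def duckduckgo_search(query: str) -> list[dict]:
    """Stand-in for a DuckDuckGo search call."""
    return [{"source": "duckduckgo", "title": f"result for {query!r}"}]

def web_search(query: str) -> list[dict]:
    """Try providers in priority order; fall through on any failure."""
    for provider in (brave_search, duckduckgo_search):
        try:
            return provider(query)
        except Exception:
            continue  # a real implementation would log the failure
    return []  # every provider failed; the caller decides what to do

print(web_search("long COVID fatigue treatments"))
```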
---

## Success Criteria

### Phase 1-5 (MVP) ✅ COMPLETE

**Completed in ONE DAY:**

- [x] User can ask a drug repurposing question
- [x] Agent searches PubMed (async)
- [x] Agent searches the web (DuckDuckGo)
- [x] LLM judge evaluates evidence quality
- [x] System respects token budget and iteration limits
- [x] Output includes drug candidates + citations
- [x] Works end-to-end for the demo query
- [x] Gradio UI with streaming progress
- [x] Magentic multi-agent orchestration
- [x] 38 unit tests passing
- [x] CI/CD pipeline green

### Hackathon Submission ✅ COMPLETE

- [x] Gradio UI deployed on HuggingFace Spaces
- [x] Example queries working and tested
- [x] Architecture documentation
- [x] README with setup instructions

### Phase 6-8 (Enhanced)

**Specs ready for implementation:**

- [ ] Embeddings & Semantic Search (Phase 6)
- [ ] Hypothesis Agent (Phase 7)
- [ ] Report Agent (Phase 8)

### What's EXPLICITLY Out of Scope

**NOT building (to stay focused):**

- ❌ User authentication
- ❌ Database storage of queries
- ❌ Multi-user support
- ❌ Payment/billing
- ❌ Production monitoring
- ❌ Mobile UI

---

## Implementation Timeline

### Day 1 (Today): Architecture & Setup

- [x] Define use case (drug repurposing) ✅
- [x] Write architecture docs ✅
- [ ] Create project structure
- [ ] First PR: Structure + Docs

### Day 2: Core Agent Loop

- [ ] Implement basic orchestrator
- [ ] Add PubMed search tool
- [ ] Simple judge (keyword-based)
- [ ] Test with 1 query

### Day 3: Intelligence Layer

- [ ] Upgrade to LLM judge
- [ ] Add web search tool
- [ ] Token budget tracking
- [ ] Test with multiple queries

### Day 4: UI & Integration

- [ ] Build Gradio interface
- [ ] Wire up agent to UI
- [ ] Add progress indicators
- [ ] Format output nicely

### Day 5: Polish & Extend

- [ ] Add more tools (clinical trials)
- [ ] Improve judge prompts
- [ ] Checkpoint system
- [ ] Error handling

### Day 6: Deploy & Document

- [ ] Deploy to HuggingFace Spaces
- [ ] Record demo video
- [ ] Write submission materials
- [ ] Final testing

---

## Questions This Document Answers

### For The Maintainer

**Q: "What should our design pattern be?"**
A: Search-and-judge loop with multi-tool orchestration (detailed in the Design Patterns section)

**Q: "Should we use LLM-as-judge or a token budget?"**
A: Both - the judge for smart stopping, the budget for cost control

**Q: "What's the break pattern?"**
A: Three conditions: judge approval, token limit, or max iterations (whichever comes first)

**Q: "What components do we need?"**
A: Agent orchestrator, tools (PubMed/web), judge, Gradio UI (see Component Breakdown)

### For The Team

**Q: "What are we actually building?"**
A: A medical drug repurposing research agent (see Core Use Case)

**Q: "How complex should it be?"**
A: Simple but complete - ~300 lines of core code (see component sizes)

**Q: "What's the timeline?"**
A: 6 days, MVP by Day 3, polish Days 4-6 (see Implementation Timeline)

**Q: "What datasets/APIs do we use?"**
A: PubMed (free), web search, ClinicalTrials.gov (see Tool APIs)

---

## Next Steps

1. **Review this document** - Team feedback on architecture
2. **Finalize design** - Incorporate feedback
3. **Create project structure** - Scaffold repository
4. **Move to proper docs** - `docs/architecture/` folder
5. **Open first PR** - Structure + Documentation
6. **Start implementation** - Day 2 onward

---

## Notes & Decisions

### Why Drug Repurposing?

- Clear, impressive use case
- Real-world medical impact
- Good data availability (PubMed, trials)
- Easy to explain (the Viagra example!)
- Physician on team ✅

### Why Simple Architecture?

- 6-day timeline
- Need a working end-to-end system
- Hackathon judges value "works" over "complex"
- Can extend later if successful

### Why These Tools First?

- PubMed: Best biomedical literature source
- Web search: General medical knowledge
- Clinical trials: Evidence of actual testing
- Others: Nice-to-have, not critical for MVP

---

## Appendix A: Demo Queries (Pre-tested)

These queries will be used for demo and testing. They're chosen because:

1. They have good PubMed coverage
2. They're medically interesting
3. They show the system's capabilities

### Primary Demo Query

```
"What existing drugs might help treat long COVID fatigue?"
```

**Expected candidates**: CoQ10, Low-dose Naltrexone, Modafinil
**Expected sources**: 20+ PubMed papers, 2-3 clinical trials

### Secondary Demo Queries

```
"Find existing drugs that might slow Alzheimer's progression"
"What approved medications could help with fibromyalgia pain?"
"Which diabetes drugs show promise for cancer treatment?"
```

### Why These Queries?

- Represent real clinical needs
- Have substantial literature
- Show diverse drug classes
- Physician on team can validate results

---

## Appendix B: Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| PubMed rate limiting | Medium | High | Implement caching, respect 3/sec |
| Web search API fails | Low | Medium | DuckDuckGo fallback |
| LLM costs exceed budget | Medium | Medium | Hard token cap at 50K |
| Judge quality poor | Medium | High | Pre-test prompts, iterate |
| HuggingFace deploy issues | Low | High | Test deployment Day 4 |
| Demo crashes live | Medium | High | Pre-recorded backup video |

---

**Document Status**: Official Architecture Spec
**Review Score**: 98/100
**Last Updated**: November 2025