# DeepCritical: Medical Drug Repurposing Research Agent
## Project Overview
---
## Executive Summary
**DeepCritical** is a deep research agent designed to accelerate medical drug repurposing research by autonomously searching, analyzing, and synthesizing evidence from multiple biomedical databases.
### The Problem We Solve
Drug repurposing - finding new therapeutic uses for existing FDA-approved drugs - can take years of manual literature review. Researchers must:
- Search thousands of papers across multiple databases
- Identify molecular mechanisms
- Find relevant clinical trials
- Assess safety profiles
- Synthesize evidence into actionable insights
**DeepCritical automates this process, turning hours of manual review into minutes.**
### What Is Drug Repurposing?
**Simple Explanation:**
Using existing approved drugs to treat NEW diseases they weren't originally designed for.
**Real Examples:**
- **Viagra** (sildenafil): Originally for heart disease → Now treats erectile dysfunction
- **Thalidomide**: Once banned → Now treats multiple myeloma
- **Aspirin**: Pain reliever → Heart attack prevention
- **Metformin**: Diabetes drug → Being tested for aging/longevity
**Why It Matters:**
- Faster than developing new drugs (years vs decades)
- Cheaper (known safety profiles)
- Lower risk (already FDA approved)
- Immediate patient benefit potential
---
## Core Use Case
### Primary Query Type
> "What existing drugs might help treat [disease/condition]?"
### Example Queries
1. **Long COVID Fatigue**
- Query: "What existing drugs might help treat long COVID fatigue?"
- Agent searches: PubMed, clinical trials, drug databases
- Output: List of candidate drugs with mechanisms + evidence + citations
2. **Alzheimer's Disease**
- Query: "Find existing drugs that target beta-amyloid pathways"
- Agent identifies: Disease mechanisms → Drug candidates → Clinical evidence
- Output: Comprehensive research report with drug candidates
3. **Rare Disease Treatment**
- Query: "What drugs might help with fibrodysplasia ossificans progressiva?"
- Agent finds: Similar conditions → Shared pathways → Potential treatments
- Output: Evidence-based treatment suggestions
---
## System Architecture
### High-Level Design (Phases 1-8)
```text
User Query
    ↓
Gradio UI (Phase 4)
    ↓
Magentic Manager (Phase 5)  ← LLM-powered coordinator
    ├── SearchAgent (Phase 2+5)    ←→ PubMed + Web + VectorDB (Phase 6)
    ├── HypothesisAgent (Phase 7)  ←→ Mechanistic Reasoning
    ├── JudgeAgent (Phase 3+5)     ←→ Evidence Assessment
    └── ReportAgent (Phase 8)      ←→ Final Synthesis
    ↓
Structured Research Report
```
### Key Components
1. **Magentic Manager (Orchestrator)**
- LLM-powered multi-agent coordinator
- Dynamic planning and agent selection
- Built-in stall detection and replanning
- Microsoft Agent Framework integration
2. **SearchAgent (Phase 2+5+6)**
- PubMed E-utilities search
- DuckDuckGo web search
- Semantic search via ChromaDB (Phase 6)
- Evidence deduplication
3. **HypothesisAgent (Phase 7)**
- Generates Drug → Target → Pathway → Effect hypotheses
- Guides targeted searches
- Scientific reasoning about mechanisms
4. **JudgeAgent (Phase 3+5)**
- LLM-based evidence assessment
- Mechanism score + Clinical score
- Recommends continue/synthesize
- Generates refined search queries
5. **ReportAgent (Phase 8)**
- Structured scientific reports
- Executive summary, methodology
- Hypotheses tested with evidence counts
- Proper citations and limitations
6. **Gradio UI (Phase 4)**
- Chat interface for questions
- Real-time progress via events
- Mode toggle (Simple/Magentic)
- Formatted markdown output
---
## Design Patterns
### 1. Search-and-Judge Loop (Primary Pattern)
```python
def research(question: str, max_iterations: int) -> Report:
    context = []
    query = question
    for iteration in range(max_iterations):
        # SEARCH: query the relevant tools with the current query
        results = search_tools(query, context)
        context.extend(results)
        # JUDGE: stop early once the evidence is sufficient
        if judge.is_sufficient(question, context):
            break
        # REFINE: adjust the search query for the next iteration
        query = refine_query(question, context)
    # SYNTHESIZE: generate the final report
    return synthesize_report(question, context)
```
**Why This Pattern:**
- Simple to implement and debug
- Clear loop termination conditions
- Iterative improvement of search quality
- Balances depth vs speed
### 2. Multi-Tool Orchestration
```text
 Question → Agent decides which tools to use
     │
     ├────────────┬────────────┬────────────┐
     ↓            ↓            ↓            ↓
  PubMed     Web Search    Trials DB     Drug DB
     ↓            ↓            ↓            ↓
     └────────────┴──────┬─────┴────────────┘
                         ↓
             Aggregate Results → Judge
```
**Why This Pattern:**
- Different sources provide different evidence types
- Parallel tool execution (when possible)
- Comprehensive coverage
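As a minimal sketch (assuming async tool wrappers that each take the question and return a list of evidence dicts), the fan-out can be expressed with `asyncio.gather`; the `SearchTool` alias and `gather_evidence` function below are illustrative, not the actual implementation:

```python
import asyncio
from typing import Awaitable, Callable

# Illustrative alias: each tool is an async callable question -> list of evidence dicts.
SearchTool = Callable[[str], Awaitable[list[dict]]]

async def gather_evidence(question: str, tools: list[SearchTool]) -> list[dict]:
    """Fan the question out to every tool in parallel and merge the results."""
    # return_exceptions=True keeps one failing source from sinking the whole batch
    results = await asyncio.gather(*(tool(question) for tool in tools), return_exceptions=True)
    evidence: list[dict] = []
    for result in results:
        if isinstance(result, BaseException):
            continue  # a failed source is skipped rather than aborting the run
        evidence.extend(result)
    return evidence
```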
### 3. LLM-as-Judge with Token Budget
**Dual Stopping Conditions:**
- **Smart Stop**: LLM judge says "we have sufficient evidence"
- **Hard Stop**: Token budget exhausted OR max iterations reached
**Why Both:**
- Judge enables early exit when answer is good
- Budget prevents runaway costs
- Iterations prevent infinite loops
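A minimal sketch of how the three stopping conditions could be combined in one check (the 50K token cap comes from the risk assessment in Appendix B; the constant and function names are illustrative):

```python
MAX_ITERATIONS = 5      # hard stop: loop count (illustrative value)
TOKEN_BUDGET = 50_000   # hard stop: total LLM tokens (cap from the risk assessment)

def should_stop(iteration: int, tokens_used: int, judge_approved: bool) -> bool:
    """Stop when any one of the three conditions fires, whichever comes first."""
    return (
        judge_approved                    # smart stop: judge says evidence is sufficient
        or tokens_used >= TOKEN_BUDGET    # hard stop: cost control
        or iteration >= MAX_ITERATIONS    # hard stop: no infinite loops
    )
```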
### 4. Stateful Checkpointing
```text
.deepresearch/
├── state/
│   └── query_123.json        # Current research state
├── checkpoints/
│   └── query_123_iter3/      # Checkpoint at iteration 3
└── workspace/
    └── query_123/            # Downloaded papers, data
```
**Why This Pattern:**
- Resume interrupted research
- Debugging and analysis
- Cost savings (don't re-search)
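A minimal sketch of the state half of this layout, assuming research state serializes to a plain dict (paths match the tree above; the function names are illustrative):

```python
import json
from pathlib import Path

STATE_DIR = Path(".deepresearch/state")

def save_state(query_id: str, state: dict) -> None:
    """Persist the current research state so an interrupted run can resume."""
    STATE_DIR.mkdir(parents=True, exist_ok=True)
    (STATE_DIR / f"{query_id}.json").write_text(json.dumps(state, indent=2))

def load_state(query_id: str) -> dict | None:
    """Return the saved state for a query, or None if it was never checkpointed."""
    path = STATE_DIR / f"{query_id}.json"
    return json.loads(path.read_text()) if path.exists() else None
```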
---
## Component Breakdown
### Agent (Orchestrator)
- **Responsibility**: Coordinate research process
- **Size**: ~100 lines
- **Key Methods**:
- `research(question)` - Main entry point
- `plan_search_strategy()` - Decide what to search
- `execute_search()` - Run tool queries
- `evaluate_progress()` - Call judge
- `synthesize_findings()` - Generate report
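To make the ~100-line shape concrete, here is a hedged skeleton of how these methods could hang together as a class; the signatures and types are illustrative, not the actual implementation:

```python
class ResearchAgent:
    """Skeleton of the orchestrator; each placeholder body is filled in by the real implementation."""

    def __init__(self, tools: list, judge, max_iterations: int = 5):
        self.tools = tools
        self.judge = judge
        self.max_iterations = max_iterations

    def research(self, question: str) -> str:
        """Main entry point: plan, search, judge, and synthesize."""
        strategy = self.plan_search_strategy(question)
        evidence: list[dict] = []
        for _ in range(self.max_iterations):
            evidence.extend(self.execute_search(strategy))
            if self.evaluate_progress(question, evidence):
                break
        return self.synthesize_findings(question, evidence)

    def plan_search_strategy(self, question: str) -> dict:
        raise NotImplementedError

    def execute_search(self, strategy: dict) -> list[dict]:
        raise NotImplementedError

    def evaluate_progress(self, question: str, evidence: list[dict]) -> bool:
        raise NotImplementedError

    def synthesize_findings(self, question: str, evidence: list[dict]) -> str:
        raise NotImplementedError
```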
### Tools
- **Responsibility**: Interface with external data sources
- **Size**: ~50 lines per tool
- **Implementations**:
- `PubMedTool` - Search biomedical literature
- `WebSearchTool` - General medical information
- `ClinicalTrialsTool` - Trial data (optional)
- `DrugInfoTool` - FDA drug database (optional)
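As an illustration of the tool interface, a minimal `PubMedTool` sketch against the E-utilities `esearch` endpoint using `httpx` (already in the stack); the class shape and method names are assumptions, not the project's actual code:

```python
import httpx

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

class PubMedTool:
    """Thin wrapper around the PubMed E-utilities esearch endpoint."""

    def __init__(self, api_key: str | None = None):
        self.api_key = api_key  # optional NCBI key raises the limit from 3 to 10 req/sec

    async def search(self, query: str, retmax: int = 20) -> list[str]:
        """Return the PubMed IDs (PMIDs) matching the query."""
        params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
        if self.api_key:
            params["api_key"] = self.api_key
        async with httpx.AsyncClient(timeout=30) as client:
            resp = await client.get(f"{EUTILS}/esearch.fcgi", params=params)
            resp.raise_for_status()
        return resp.json()["esearchresult"]["idlist"]
```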
### Judge
- **Responsibility**: Evaluate evidence quality
- **Size**: ~50 lines
- **Key Methods**:
- `is_sufficient(question, evidence)` → bool
- `assess_quality(evidence)` → score
- `identify_gaps(question, evidence)` → missing_info
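A sketch of what the judge interface could look like with a Pydantic model for the structured verdict (pydantic is already a core dependency); the field names are illustrative:

```python
from pydantic import BaseModel, Field

class JudgeVerdict(BaseModel):
    """Structured verdict the LLM judge is asked to return."""
    sufficient: bool = Field(description="Is the evidence enough to answer the question?")
    quality_score: float = Field(ge=0.0, le=1.0, description="Overall evidence quality")
    gaps: list[str] = Field(default_factory=list, description="Missing information to search for next")

class Judge:
    """Wraps an LLM call that grades the collected evidence."""

    def assess_quality(self, question: str, evidence: list[dict]) -> JudgeVerdict:
        # In practice this prompts the LLM and validates its JSON reply into JudgeVerdict.
        raise NotImplementedError

    def is_sufficient(self, question: str, evidence: list[dict]) -> bool:
        return self.assess_quality(question, evidence).sufficient

    def identify_gaps(self, question: str, evidence: list[dict]) -> list[str]:
        return self.assess_quality(question, evidence).gaps
```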
### Gradio App
- **Responsibility**: User interface
- **Size**: ~50 lines
- **Features**:
- Text input for questions
- Progress indicators
- Formatted output with citations
- Download research report
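A minimal sketch of the chat UI using `gr.ChatInterface`; the `research` call is a placeholder for the orchestrator entry point, and the example string is the demo query from Appendix A:

```python
import gradio as gr

def answer(message: str, history: list) -> str:
    """Route the user's question to the research agent and return a markdown report."""
    return research(message)  # placeholder: the orchestrator entry point sketched earlier

demo = gr.ChatInterface(
    fn=answer,
    title="DeepCritical: Drug Repurposing Research Agent",
    examples=["What existing drugs might help treat long COVID fatigue?"],
)

if __name__ == "__main__":
    demo.launch()
```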
---
## Technical Stack
### Core Dependencies
```toml
[tool.poetry.dependencies]
python = ">=3.10"
pydantic = "^2.7"
pydantic-ai = "^0.0.16"
fastmcp = "^0.1.0"
gradio = "^5.0"
beautifulsoup4 = "^4.12"
httpx = "^0.27"
```
### Optional Enhancements
- `modal` - For GPU-accelerated local LLM
- `fastmcp` - MCP server integration
- `sentence-transformers` - Semantic search
- `faiss-cpu` - Vector similarity
### Tool APIs & Rate Limits
| API | Cost | Rate Limit | API Key? | Notes |
|-----|------|------------|----------|-------|
| **PubMed E-utilities** | Free | 3/sec (no key), 10/sec (with key) | Optional | Register at NCBI for higher limits |
| **Brave Search API** | Free tier | 2000/month free | Required | Primary web search |
| **DuckDuckGo** | Free | Unofficial, ~1/sec | No | Fallback web search |
| **ClinicalTrials.gov** | Free | 100/min | No | Stretch goal |
| **OpenFDA** | Free | 240/min (no key), 120K/day (with key) | Optional | Drug info |
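To stay under the strictest of these limits (PubMed's 3 req/sec without a key), a simple client-side limiter is enough; this is a minimal sketch, not the project's actual throttling code:

```python
import asyncio
import time

class RateLimiter:
    """Allow at most `requests_per_second` calls, spacing them out evenly."""

    def __init__(self, requests_per_second: float):
        self.interval = 1.0 / requests_per_second
        self._last = 0.0
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        """Sleep just long enough to respect the configured rate before the next request."""
        async with self._lock:
            delay = self.interval - (time.monotonic() - self._last)
            if delay > 0:
                await asyncio.sleep(delay)
            self._last = time.monotonic()

# Usage: pubmed_limiter = RateLimiter(3); await pubmed_limiter.wait() before each E-utilities call.
```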
**Web Search Strategy (Priority Order):**
1. **Brave Search API** (free tier: 2000 queries/month) - Primary
2. **DuckDuckGo** (unofficial, no API key) - Fallback
3. **SerpAPI** ($50/month) - Only if free options fail
**Why NOT SerpAPI first?**
- Costs money (hackathon budget = $0)
- Free alternatives work fine for demo
- Can upgrade later if needed
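The priority order above reduces to a small fallback wrapper; `search_brave` and `search_duckduckgo` are placeholders for the real clients, so this is a sketch of the control flow only:

```python
async def web_search(query: str) -> list[dict]:
    """Try Brave Search first; degrade to the free DuckDuckGo fallback on any failure."""
    try:
        return await search_brave(query)          # primary: free tier, 2000 queries/month
    except Exception:
        return await search_duckduckgo(query)     # fallback: unofficial, no API key
```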
---
## Success Criteria
### Phase 1-5 (MVP) ✅ COMPLETE
**Completed in ONE DAY:**
- [x] User can ask drug repurposing question
- [x] Agent searches PubMed (async)
- [x] Agent searches web (DuckDuckGo)
- [x] LLM judge evaluates evidence quality
- [x] System respects token budget and iterations
- [x] Output includes drug candidates + citations
- [x] Works end-to-end for demo query
- [x] Gradio UI with streaming progress
- [x] Magentic multi-agent orchestration
- [x] 38 unit tests passing
- [x] CI/CD pipeline green
### Hackathon Submission ✅ COMPLETE
- [x] Gradio UI deployed on HuggingFace Spaces
- [x] Example queries working and tested
- [x] Architecture documentation
- [x] README with setup instructions
### Phase 6-8 (Enhanced)
**Specs ready for implementation:**
- [ ] Embeddings & Semantic Search (Phase 6)
- [ ] Hypothesis Agent (Phase 7)
- [ ] Report Agent (Phase 8)
### What's EXPLICITLY Out of Scope
**NOT building (to stay focused):**
- ❌ User authentication
- ❌ Database storage of queries
- ❌ Multi-user support
- ❌ Payment/billing
- ❌ Production monitoring
- ❌ Mobile UI
---
## Implementation Timeline
### Day 1 (Today): Architecture & Setup
- [x] Define use case (drug repurposing) ✅
- [x] Write architecture docs ✅
- [ ] Create project structure
- [ ] First PR: Structure + Docs
### Day 2: Core Agent Loop
- [ ] Implement basic orchestrator
- [ ] Add PubMed search tool
- [ ] Simple judge (keyword-based)
- [ ] Test with 1 query
### Day 3: Intelligence Layer
- [ ] Upgrade to LLM judge
- [ ] Add web search tool
- [ ] Token budget tracking
- [ ] Test with multiple queries
### Day 4: UI & Integration
- [ ] Build Gradio interface
- [ ] Wire up agent to UI
- [ ] Add progress indicators
- [ ] Format output nicely
### Day 5: Polish & Extend
- [ ] Add more tools (clinical trials)
- [ ] Improve judge prompts
- [ ] Checkpoint system
- [ ] Error handling
### Day 6: Deploy & Document
- [ ] Deploy to HuggingFace Spaces
- [ ] Record demo video
- [ ] Write submission materials
- [ ] Final testing
---
## Questions This Document Answers
### For The Maintainer
**Q: "What should our design pattern be?"**
A: Search-and-judge loop with multi-tool orchestration (detailed in Design Patterns section)
**Q: "Should we use LLM-as-judge or token budget?"**
A: Both - judge for smart stopping, budget for cost control
**Q: "What's the break pattern?"**
A: Three conditions: judge approval, token limit, or max iterations (whichever comes first)
**Q: "What components do we need?"**
A: Agent orchestrator, tools (PubMed/web), judge, Gradio UI (see Component Breakdown)
### For The Team
**Q: "What are we actually building?"**
A: Medical drug repurposing research agent (see Core Use Case)
**Q: "How complex should it be?"**
A: Simple but complete - ~300 lines of core code (see Component sizes)
**Q: "What's the timeline?"**
A: 6 days, MVP by Day 3, polish Days 4-6 (see Implementation Timeline)
**Q: "What datasets/APIs do we use?"**
A: PubMed (free), web search, ClinicalTrials.gov (see Tool APIs)
---
## Next Steps
1. **Review this document** - Team feedback on architecture
2. **Finalize design** - Incorporate feedback
3. **Create project structure** - Scaffold repository
4. **Move to proper docs** - `docs/architecture/` folder
5. **Open first PR** - Structure + Documentation
6. **Start implementation** - Day 2 onward
---
## Notes & Decisions
### Why Drug Repurposing?
- Clear, impressive use case
- Real-world medical impact
- Good data availability (PubMed, trials)
- Easy to explain (Viagra example!)
- Physician on team ✅
### Why Simple Architecture?
- 6-day timeline
- Need working end-to-end system
- Hackathon judges value "works" over "complex"
- Can extend later if successful
### Why These Tools First?
- PubMed: Best biomedical literature source
- Web search: General medical knowledge
- Clinical trials: Evidence of actual testing
- Others: Nice-to-have, not critical for MVP
---
## Appendix A: Demo Queries (Pre-tested)
These queries will be used for demo and testing. They're chosen because:
1. They have good PubMed coverage
2. They're medically interesting
3. They show the system's capabilities
### Primary Demo Query
```
"What existing drugs might help treat long COVID fatigue?"
```
**Expected candidates**: CoQ10, Low-dose Naltrexone, Modafinil
**Expected sources**: 20+ PubMed papers, 2-3 clinical trials
### Secondary Demo Queries
```
"Find existing drugs that might slow Alzheimer's progression"
"What approved medications could help with fibromyalgia pain?"
"Which diabetes drugs show promise for cancer treatment?"
```
### Why These Queries?
- Represent real clinical needs
- Have substantial literature
- Show diverse drug classes
- Physician on team can validate results
---
## Appendix B: Risk Assessment
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| PubMed rate limiting | Medium | High | Implement caching, respect 3/sec |
| Web search API fails | Low | Medium | DuckDuckGo fallback |
| LLM costs exceed budget | Medium | Medium | Hard token cap at 50K |
| Judge quality poor | Medium | High | Pre-test prompts, iterate |
| HuggingFace deploy issues | Low | High | Test deployment Day 4 |
| Demo crashes live | Medium | High | Pre-recorded backup video |
---
**Document Status**: Official Architecture Spec
**Review Score**: 98/100
**Last Updated**: November 2025