# DeepCritical: Medical Drug Repurposing Research Agent
## Project Overview
---
## Executive Summary
**DeepCritical** is a deep research agent designed to accelerate medical drug repurposing research by autonomously searching, analyzing, and synthesizing evidence from multiple biomedical databases.
### The Problem We Solve
Drug repurposing - finding new therapeutic uses for existing FDA-approved drugs - can take years of manual literature review. Researchers must:
- Search thousands of papers across multiple databases
- Identify molecular mechanisms
- Find relevant clinical trials
- Assess safety profiles
- Synthesize evidence into actionable insights
**DeepCritical automates this process, compressing hours of manual work into minutes.**
### What Is Drug Repurposing?
**Simple Explanation:**
Using existing approved drugs to treat NEW diseases they weren't originally designed for.
**Real Examples:**
- **Viagra** (sildenafil): Originally for heart disease → Now treats erectile dysfunction
- **Thalidomide**: Once banned → Now treats multiple myeloma
- **Aspirin**: Pain reliever → Heart attack prevention
- **Metformin**: Diabetes drug → Being tested for aging/longevity
**Why It Matters:**
- Faster than developing new drugs (years vs. decades)
- Cheaper (known safety profiles)
- Lower risk (already FDA approved)
- Immediate patient benefit potential
---
## Core Use Case
### Primary Query Type
> "What existing drugs might help treat [disease/condition]?"
### Example Queries
1. **Long COVID Fatigue**
   - Query: "What existing drugs might help treat long COVID fatigue?"
   - Agent searches: PubMed, clinical trials, drug databases
   - Output: List of candidate drugs with mechanisms + evidence + citations
2. **Alzheimer's Disease**
   - Query: "Find existing drugs that target beta-amyloid pathways"
   - Agent identifies: Disease mechanisms → Drug candidates → Clinical evidence
   - Output: Comprehensive research report with drug candidates
3. **Rare Disease Treatment**
   - Query: "What drugs might help with fibrodysplasia ossificans progressiva?"
   - Agent finds: Similar conditions → Shared pathways → Potential treatments
   - Output: Evidence-based treatment suggestions
---
## System Architecture
### High-Level Design (Phases 1-8)
```text
User Query
    ↓
Gradio UI (Phase 4)
    ↓
Magentic Manager (Phase 5)  ← LLM-powered coordinator
    ├── SearchAgent (Phase 2+5)    ←→ PubMed + Web + VectorDB (Phase 6)
    ├── HypothesisAgent (Phase 7)  ←→ Mechanistic Reasoning
    ├── JudgeAgent (Phase 3+5)     ←→ Evidence Assessment
    └── ReportAgent (Phase 8)      ←→ Final Synthesis
    ↓
Structured Research Report
```
### Key Components
1. **Magentic Manager (Orchestrator)**
   - LLM-powered multi-agent coordinator
   - Dynamic planning and agent selection
   - Built-in stall detection and replanning
   - Microsoft Agent Framework integration
2. **SearchAgent (Phase 2+5+6)**
   - PubMed E-utilities search
   - DuckDuckGo web search
   - Semantic search via ChromaDB (Phase 6)
   - Evidence deduplication
3. **HypothesisAgent (Phase 7)**
   - Generates Drug → Target → Pathway → Effect hypotheses (see the sketch after this list)
   - Guides targeted searches
   - Scientific reasoning about mechanisms
4. **JudgeAgent (Phase 3+5)**
   - LLM-based evidence assessment
   - Mechanism score + Clinical score
   - Recommends continue/synthesize
   - Generates refined search queries
5. **ReportAgent (Phase 8)**
   - Structured scientific reports
   - Executive summary, methodology
   - Hypotheses tested with evidence counts
   - Proper citations and limitations
6. **Gradio UI (Phase 4)**
   - Chat interface for questions
   - Real-time progress via events
   - Mode toggle (Simple/Magentic)
   - Formatted markdown output
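As referenced in item 3, a minimal sketch of how a Drug → Target → Pathway → Effect hypothesis could be modeled with Pydantic. The class and field names here are illustrative assumptions, not the project's actual schema:
```python
from pydantic import BaseModel, Field

class MechanisticHypothesis(BaseModel):
    """Hypothetical schema for a Drug → Target → Pathway → Effect chain."""
    drug: str = Field(description="Candidate drug, e.g. 'metformin'")
    target: str = Field(description="Molecular target, e.g. 'AMPK'")
    pathway: str = Field(description="Biological pathway affected")
    effect: str = Field(description="Predicted therapeutic effect")
    citations: list[str] = Field(default_factory=list, description="Supporting PMIDs")
```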
---
## Design Patterns
### 1. Search-and-Judge Loop (Primary Pattern)
```python
def research(question: str) -> Report:
    context = []
    query = question
    for iteration in range(max_iterations):
        # SEARCH: query the relevant tools with the current query
        results = search_tools(query, context)
        context.extend(results)
        # JUDGE: stop early once the evidence is sufficient
        if judge.is_sufficient(question, context):
            break
        # REFINE: adjust the search query for the next iteration
        query = refine_query(question, context)
    # SYNTHESIZE: generate the final report
    return synthesize_report(question, context)
```
**Why This Pattern:**
- Simple to implement and debug
- Clear loop termination conditions
- Iterative improvement of search quality
- Balances depth vs. speed
### 2. Multi-Tool Orchestration
```text
Question → Agent decides which tools to use
             ↓
     ┌───────┴──────┬─────────────┬─────────────┐
     ↓              ↓             ↓             ↓
  PubMed       Web Search     Trials DB      Drug DB
     ↓              ↓             ↓             ↓
     └───────┬──────┴─────────────┴─────────────┘
             ↓
    Aggregate Results → Judge
```
**Why This Pattern:**
- Different sources provide different evidence types
- Parallel tool execution (when possible)
- Comprehensive coverage
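A minimal sketch of the parallel fan-out, assuming each tool is an async callable that returns evidence records (the `SearchTool` alias and tool interface are illustrative, not the project's real API):
```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical tool signature: query in, list of evidence records out.
SearchTool = Callable[[str], Awaitable[list[dict]]]

async def fan_out(query: str, tools: list[SearchTool]) -> list[dict]:
    """Run all tools concurrently and aggregate their results for the judge."""
    results = await asyncio.gather(
        *(tool(query) for tool in tools),
        return_exceptions=True,  # one failing source shouldn't sink the search
    )
    evidence: list[dict] = []
    for result in results:
        if not isinstance(result, Exception):
            evidence.extend(result)
    return evidence
```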
### 3. LLM-as-Judge with Token Budget
**Dual Stopping Conditions:**
- **Smart Stop**: LLM judge says "we have sufficient evidence"
- **Hard Stop**: Token budget exhausted OR max iterations reached
**Why Both:**
- Judge enables early exit when the answer is good
- Budget prevents runaway costs
- Iteration cap prevents infinite loops
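A minimal sketch of the combined stop check. The 50K default matches the hard token cap in Appendix B; the function name, parameters, and iteration default are illustrative:
```python
def should_stop(judge_approved: bool, tokens_used: int, iteration: int,
                token_budget: int = 50_000, max_iterations: int = 5) -> bool:
    """True when any stopping condition fires, whichever comes first."""
    return (judge_approved                   # smart stop: evidence is sufficient
            or tokens_used >= token_budget   # hard stop: cost control
            or iteration >= max_iterations)  # hard stop: no infinite loops
```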
### 4. Stateful Checkpointing
```text
.deepresearch/
├── state/
│   └── query_123.json      # Current research state
├── checkpoints/
│   └── query_123_iter3/    # Checkpoint at iteration 3
└── workspace/
    └── query_123/          # Downloaded papers, data
```
**Why This Pattern:**
- Resume interrupted research
- Debugging and analysis
- Cost savings (don't re-search)
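A minimal sketch of saving and resuming state against the layout above, using plain JSON files (the helper names are illustrative):
```python
import json
from pathlib import Path

STATE_DIR = Path(".deepresearch/state")

def save_state(query_id: str, state: dict) -> None:
    """Persist research state so an interrupted run can resume later."""
    STATE_DIR.mkdir(parents=True, exist_ok=True)
    (STATE_DIR / f"{query_id}.json").write_text(json.dumps(state, indent=2))

def load_state(query_id: str) -> dict | None:
    """Return prior state if a checkpoint exists, else None (fresh start)."""
    path = STATE_DIR / f"{query_id}.json"
    return json.loads(path.read_text()) if path.exists() else None
```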
---
## Component Breakdown
### Agent (Orchestrator)
- **Responsibility**: Coordinate the research process
- **Size**: ~100 lines
- **Key Methods**:
  - `research(question)` - Main entry point
  - `plan_search_strategy()` - Decide what to search
  - `execute_search()` - Run tool queries
  - `evaluate_progress()` - Call the judge
  - `synthesize_findings()` - Generate the report
### Tools
- **Responsibility**: Interface with external data sources
- **Size**: ~50 lines per tool
- **Implementations**:
  - `PubMedTool` - Search biomedical literature
  - `WebSearchTool` - General medical information
  - `ClinicalTrialsTool` - Trial data (optional)
  - `DrugInfoTool` - FDA drug database (optional)
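A minimal sketch of the `PubMedTool` search step against the public E-utilities endpoint, using `httpx` from the core dependencies; fetching abstracts via `efetch` is omitted, and error handling is pared down:
```python
import httpx

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

async def pubmed_search(query: str, retmax: int = 20) -> list[str]:
    """Return PubMed IDs (PMIDs) matching a query via the free esearch API."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
    async with httpx.AsyncClient(timeout=30) as client:
        resp = await client.get(f"{EUTILS}/esearch.fcgi", params=params)
        resp.raise_for_status()
    return resp.json()["esearchresult"]["idlist"]
```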
### Judge
- **Responsibility**: Evaluate evidence quality
- **Size**: ~50 lines
- **Key Methods**:
  - `is_sufficient(question, evidence)` → bool
  - `assess_quality(evidence)` → score
  - `identify_gaps(question, evidence)` → missing_info
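A minimal sketch of `is_sufficient`, assuming an injected `llm` callable that takes a prompt and returns completion text; the prompt wording and JSON contract are illustrative, not the project's actual prompts:
```python
import json
from typing import Callable

LLMCall = Callable[[str], str]  # hypothetical: prompt in, completion text out

def is_sufficient(question: str, evidence: list[str], llm: LLMCall) -> bool:
    """Ask the LLM to grade the evidence and parse a strict JSON verdict."""
    prompt = (
        "Judge whether this evidence suffices to answer the question.\n"
        f"Question: {question}\n"
        "Evidence:\n" + "\n".join(evidence[:20]) + "\n"
        'Reply with JSON only: {"sufficient": true_or_false, "gaps": "..."}'
    )
    verdict = json.loads(llm(prompt))
    return bool(verdict.get("sufficient", False))
```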
### Gradio App
- **Responsibility**: User interface
- **Size**: ~50 lines
- **Features**:
  - Text input for questions
  - Progress indicators
  - Formatted output with citations
  - Download research report
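A minimal sketch of the UI wiring with Gradio; the `answer` stub stands in for the real agent call:
```python
import gradio as gr

def answer(question: str) -> str:
    # Placeholder: wire this to the research agent's entry point.
    return f"**Researching:** {question}\n\n_(report appears here)_"

demo = gr.Interface(
    fn=answer,
    inputs=gr.Textbox(label="Drug repurposing question"),
    outputs=gr.Markdown(label="Research report"),
    title="DeepCritical",
)

if __name__ == "__main__":
    demo.launch()
```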
---
## Technical Stack
### Core Dependencies
```toml
[tool.poetry.dependencies]
python = ">=3.10"
pydantic = "^2.7"
pydantic-ai = "^0.0.16"
fastmcp = "^0.1.0"
gradio = "^5.0"
beautifulsoup4 = "^4.12"
httpx = "^0.27"
```
### Optional Enhancements
- `modal` - For GPU-accelerated local LLM
- `fastmcp` - MCP server integration
- `sentence-transformers` - Semantic search
- `faiss-cpu` - Vector similarity
### Tool APIs & Rate Limits
| API | Cost | Rate Limit | API Key? | Notes |
|-----|------|------------|----------|-------|
| **PubMed E-utilities** | Free | 3/sec (no key), 10/sec (with key) | Optional | Register at NCBI for higher limits |
| **Brave Search API** | Free tier | 2,000/month free | Required | Primary web search |
| **DuckDuckGo** | Free | Unofficial, ~1/sec | No | Fallback web search |
| **ClinicalTrials.gov** | Free | 100/min | No | Stretch goal |
| **OpenFDA** | Free | 240/min (no key), 120K/day (with key) | Optional | Drug info |
**Web Search Strategy (Priority Order):**
1. **Brave Search API** (free tier: 2,000 queries/month) - Primary
2. **DuckDuckGo** (unofficial, no API key) - Fallback
3. **SerpAPI** ($50/month) - Only if free options fail
**Why NOT SerpAPI first?**
- Costs money (hackathon budget = $0)
- Free alternatives work fine for the demo
- Can upgrade later if needed
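A minimal sketch of the fallback chain, assuming the Brave Search REST endpoint and the unofficial `duckduckgo_search` package; the endpoint path and response shape should be verified against current docs:
```python
import httpx
from duckduckgo_search import DDGS  # unofficial, no API key needed

async def web_search(query: str, brave_key: str | None = None) -> list[dict]:
    """Try Brave Search first (free tier); fall back to DuckDuckGo on failure."""
    if brave_key:
        try:
            async with httpx.AsyncClient(timeout=15) as client:
                resp = await client.get(
                    "https://api.search.brave.com/res/v1/web/search",
                    params={"q": query, "count": 10},
                    headers={"X-Subscription-Token": brave_key},
                )
                resp.raise_for_status()
                return resp.json()["web"]["results"]
        except (httpx.HTTPError, KeyError):
            pass  # fall through to the free fallback
    return DDGS().text(query, max_results=10)
```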
---
## Success Criteria
### Phase 1-5 (MVP) ✅ COMPLETE
**Completed in ONE DAY:**
- [x] User can ask a drug repurposing question
- [x] Agent searches PubMed (async)
- [x] Agent searches the web (DuckDuckGo)
- [x] LLM judge evaluates evidence quality
- [x] System respects the token budget and iteration cap
- [x] Output includes drug candidates + citations
- [x] Works end-to-end for the demo query
- [x] Gradio UI with streaming progress
- [x] Magentic multi-agent orchestration
- [x] 38 unit tests passing
- [x] CI/CD pipeline green
### Hackathon Submission ✅ COMPLETE
- [x] Gradio UI deployed on HuggingFace Spaces
- [x] Example queries working and tested
- [x] Architecture documentation
- [x] README with setup instructions
### Phase 6-8 (Enhanced)
**Specs ready for implementation:**
- [ ] Embeddings & Semantic Search (Phase 6)
- [ ] Hypothesis Agent (Phase 7)
- [ ] Report Agent (Phase 8)
### What's EXPLICITLY Out of Scope
**NOT building (to stay focused):**
- ❌ User authentication
- ❌ Database storage of queries
- ❌ Multi-user support
- ❌ Payment/billing
- ❌ Production monitoring
- ❌ Mobile UI
---
## Implementation Timeline
### Day 1 (Today): Architecture & Setup
- [x] Define use case (drug repurposing) ✅
- [x] Write architecture docs ✅
- [ ] Create project structure
- [ ] First PR: Structure + Docs
### Day 2: Core Agent Loop
- [ ] Implement basic orchestrator
- [ ] Add PubMed search tool
- [ ] Simple judge (keyword-based)
- [ ] Test with 1 query
### Day 3: Intelligence Layer
- [ ] Upgrade to LLM judge
- [ ] Add web search tool
- [ ] Token budget tracking
- [ ] Test with multiple queries
### Day 4: UI & Integration
- [ ] Build Gradio interface
- [ ] Wire up agent to UI
- [ ] Add progress indicators
- [ ] Format output nicely
### Day 5: Polish & Extend
- [ ] Add more tools (clinical trials)
- [ ] Improve judge prompts
- [ ] Checkpoint system
- [ ] Error handling
### Day 6: Deploy & Document
- [ ] Deploy to HuggingFace Spaces
- [ ] Record demo video
- [ ] Write submission materials
- [ ] Final testing
---
## Questions This Document Answers
### For The Maintainer
**Q: "What should our design pattern be?"**
A: Search-and-judge loop with multi-tool orchestration (detailed in the Design Patterns section)
**Q: "Should we use LLM-as-judge or token budget?"**
A: Both - judge for smart stopping, budget for cost control
**Q: "What's the break pattern?"**
A: Three conditions: judge approval, token limit, or max iterations (whichever comes first)
**Q: "What components do we need?"**
A: Agent orchestrator, tools (PubMed/web), judge, Gradio UI (see Component Breakdown)
### For The Team
**Q: "What are we actually building?"**
A: A medical drug repurposing research agent (see Core Use Case)
**Q: "How complex should it be?"**
A: Simple but complete - ~300 lines of core code (see the sizes in Component Breakdown)
**Q: "What's the timeline?"**
A: 6 days, MVP by Day 3, polish Days 4-6 (see Implementation Timeline)
| **Q: "What datasets/APIs do we use?"** | |
| A: PubMed (free), web search, clinical trials.gov (see Tool APIs) | |
---
## Next Steps
1. **Review this document** - Team feedback on architecture
2. **Finalize design** - Incorporate feedback
3. **Create project structure** - Scaffold repository
4. **Move to proper docs** - `docs/architecture/` folder
5. **Open first PR** - Structure + Documentation
6. **Start implementation** - Day 2 onward
---
## Notes & Decisions
### Why Drug Repurposing?
- Clear, impressive use case
- Real-world medical impact
- Good data availability (PubMed, trials)
- Easy to explain (the Viagra example!)
- Physician on team ✅
### Why Simple Architecture?
- 6-day timeline
- Need a working end-to-end system
- Hackathon judges value "works" over "complex"
- Can extend later if successful
### Why These Tools First?
- PubMed: Best biomedical literature source
- Web search: General medical knowledge
- Clinical trials: Evidence of actual testing
- Others: Nice-to-have, not critical for MVP
---
## Appendix A: Demo Queries (Pre-tested)
These queries will be used for demo and testing. They were chosen because:
1. They have good PubMed coverage
2. They're medically interesting
3. They show the system's capabilities
### Primary Demo Query
```text
"What existing drugs might help treat long COVID fatigue?"
```
**Expected candidates**: CoQ10, low-dose naltrexone, modafinil
**Expected sources**: 20+ PubMed papers, 2-3 clinical trials
### Secondary Demo Queries
```text
"Find existing drugs that might slow Alzheimer's progression"
"What approved medications could help with fibromyalgia pain?"
"Which diabetes drugs show promise for cancer treatment?"
```
### Why These Queries?
- Represent real clinical needs
- Have substantial literature
- Show diverse drug classes
- Physician on team can validate results
---
## Appendix B: Risk Assessment
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| PubMed rate limiting | Medium | High | Implement caching; respect 3/sec (see throttle sketch below) |
| Web search API fails | Low | Medium | DuckDuckGo fallback |
| LLM costs exceed budget | Medium | Medium | Hard token cap at 50K |
| Judge quality poor | Medium | High | Pre-test prompts, iterate |
| HuggingFace deploy issues | Low | High | Test deployment on Day 4 |
| Demo crashes live | Medium | High | Pre-recorded backup video |
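For the PubMed rate-limiting row above, a minimal async throttle sketch (the class name and structure are illustrative):
```python
import asyncio
import time

class RateLimiter:
    """Allow at most `rate` calls per second, e.g. RateLimiter(3) for PubMed."""

    def __init__(self, rate: float):
        self.min_interval = 1.0 / rate
        self._last = 0.0
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        """Sleep just long enough to honor the minimum interval between calls."""
        async with self._lock:
            delay = self.min_interval - (time.monotonic() - self._last)
            if delay > 0:
                await asyncio.sleep(delay)
            self._last = time.monotonic()
```
Usage: call `await limiter.wait()` immediately before each E-utilities request.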
---
**Document Status**: Official Architecture Spec
**Review Score**: 98/100
**Last Updated**: November 2025