
DeepCritical: Medical Drug Repurposing Research Agent

Project Overview


Executive Summary

DeepCritical is a deep research agent designed to accelerate medical drug repurposing research by autonomously searching, analyzing, and synthesizing evidence from multiple biomedical databases.

The Problem We Solve

Drug repurposing - finding new therapeutic uses for existing FDA-approved drugs - can take years of manual literature review. Researchers must:

  • Search thousands of papers across multiple databases
  • Identify molecular mechanisms
  • Find relevant clinical trials
  • Assess safety profiles
  • Synthesize evidence into actionable insights

DeepCritical automates this process, compressing hours of manual work into minutes.

What Is Drug Repurposing?

Simple Explanation: Using existing approved drugs to treat NEW diseases they weren't originally designed for.

Real Examples:

  • Viagra (sildenafil): Originally for heart disease → Now treats erectile dysfunction
  • Thalidomide: Once banned → Now treats multiple myeloma
  • Aspirin: Pain reliever → Heart attack prevention
  • Metformin: Diabetes drug → Being tested for aging/longevity

Why It Matters:

  • Faster than developing new drugs (years vs decades)
  • Cheaper (known safety profiles)
  • Lower risk (already FDA approved)
  • Immediate patient benefit potential

Core Use Case

Primary Query Type

"What existing drugs might help treat [disease/condition]?"

Example Queries

  1. Long COVID Fatigue

    • Query: "What existing drugs might help treat long COVID fatigue?"
    • Agent searches: PubMed, clinical trials, drug databases
    • Output: List of candidate drugs with mechanisms + evidence + citations
  2. Alzheimer's Disease

    • Query: "Find existing drugs that target beta-amyloid pathways"
    • Agent identifies: Disease mechanisms → Drug candidates → Clinical evidence
    • Output: Comprehensive research report with drug candidates
  3. Rare Disease Treatment

    • Query: "What drugs might help with fibrodysplasia ossificans progressiva?"
    • Agent finds: Similar conditions → Shared pathways → Potential treatments
    • Output: Evidence-based treatment suggestions

System Architecture

High-Level Design (Phases 1-8)

User Query
    ↓
Gradio UI (Phase 4)
    ↓
Magentic Manager (Phase 5) ← LLM-powered coordinator
    ├── SearchAgent (Phase 2+5) ↔ PubMed + Web + VectorDB (Phase 6)
    ├── HypothesisAgent (Phase 7) ↔ Mechanistic Reasoning
    ├── JudgeAgent (Phase 3+5) ↔ Evidence Assessment
    └── ReportAgent (Phase 8) ↔ Final Synthesis
    ↓
Structured Research Report

Key Components

  1. Magentic Manager (Orchestrator)

    • LLM-powered multi-agent coordinator
    • Dynamic planning and agent selection
    • Built-in stall detection and replanning
    • Microsoft Agent Framework integration
  2. SearchAgent (Phase 2+5+6)

    • PubMed E-utilities search
    • DuckDuckGo web search
    • Semantic search via ChromaDB (Phase 6)
    • Evidence deduplication
  3. HypothesisAgent (Phase 7)

    • Generates Drug → Target → Pathway → Effect hypotheses
    • Guides targeted searches
    • Scientific reasoning about mechanisms
  4. JudgeAgent (Phase 3+5)

    • LLM-based evidence assessment
    • Mechanism score + Clinical score
    • Recommends continue/synthesize
    • Generates refined search queries
  5. ReportAgent (Phase 8)

    • Structured scientific reports
    • Executive summary, methodology
    • Hypotheses tested with evidence counts
    • Proper citations and limitations
  6. Gradio UI (Phase 4)

    • Chat interface for questions
    • Real-time progress via events
    • Mode toggle (Simple/Magentic)
    • Formatted markdown output

Design Patterns

1. Search-and-Judge Loop (Primary Pattern)

def research(question: str) -> Report:
    context = []
    query = question
    for _ in range(max_iterations):
        # SEARCH: query relevant tools with the current (possibly refined) query
        results = search_tools(query, context)
        context.extend(results)

        # JUDGE: stop early once the evidence is sufficient
        if judge.is_sufficient(question, context):
            break

        # REFINE: adjust the query for the next iteration
        query = refine_query(question, context)

    # SYNTHESIZE: generate the final report
    return synthesize_report(question, context)

Why This Pattern:

  • Simple to implement and debug
  • Clear loop termination conditions
  • Iterative improvement of search quality
  • Balances depth vs speed

2. Multi-Tool Orchestration

Question → Agent decides which tools to use
           ↓
       ┌───┴────┬─────────┬──────────┐
       ↓        ↓         ↓          ↓
   PubMed  Web Search  Trials DB  Drug DB
       ↓        ↓         ↓          ↓
       └───┬────┴─────────┴──────────┘
           ↓
    Aggregate Results → Judge

Why This Pattern:

  • Different sources provide different evidence types
  • Parallel tool execution (when possible)
  • Comprehensive coverage
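
Sources can be queried concurrently. A minimal sketch of the fan-out with asyncio, assuming async tool wrappers (search_pubmed, search_web, and search_trials are hypothetical names, not the project's actual functions):

import asyncio

async def fan_out(question: str) -> list[dict]:
    # Run all search tools concurrently; one failing source
    # should not sink the whole batch.
    results = await asyncio.gather(
        search_pubmed(question),   # hypothetical async tool wrappers
        search_web(question),
        search_trials(question),
        return_exceptions=True,
    )
    aggregated: list[dict] = []
    for r in results:
        if isinstance(r, Exception):
            continue  # skip failed sources; the judge sees what we did get
        aggregated.extend(r)
    return aggregated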

3. LLM-as-Judge with Token Budget

Dual Stopping Conditions:

  • Smart Stop: LLM judge says "we have sufficient evidence"
  • Hard Stop: Token budget exhausted OR max iterations reached

Why Both:

  • Judge enables early exit when answer is good
  • Budget prevents runaway costs
  • Iterations prevent infinite loops
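
A minimal sketch of the combined stop check (the 50K cap comes from the risk table in Appendix B; MAX_ITERATIONS = 5 is an illustrative assumption, not the actual config):

TOKEN_BUDGET = 50_000  # hard cap from Appendix B
MAX_ITERATIONS = 5     # illustrative ceiling

def should_stop(judge_sufficient: bool, tokens_used: int, iteration: int) -> bool:
    # Smart stop: the LLM judge says the evidence is good enough.
    if judge_sufficient:
        return True
    # Hard stops: cost control and loop safety.
    return tokens_used >= TOKEN_BUDGET or iteration >= MAX_ITERATIONS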

4. Stateful Checkpointing

.deepresearch/
├── state/
│   └── query_123.json    # Current research state
├── checkpoints/
│   └── query_123_iter3/  # Checkpoint at iteration 3
└── workspace/
    └── query_123/        # Downloaded papers, data

Why This Pattern:

  • Resume interrupted research
  • Debugging and analysis
  • Cost savings (don't re-search)
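
A minimal sketch of state persistence under this layout (function names are illustrative, not the actual implementation):

import json
from pathlib import Path

STATE_DIR = Path(".deepresearch/state")

def save_state(query_id: str, state: dict) -> None:
    # Persist the current research state so an interrupted run can resume.
    STATE_DIR.mkdir(parents=True, exist_ok=True)
    (STATE_DIR / f"{query_id}.json").write_text(json.dumps(state, indent=2))

def load_state(query_id: str) -> dict | None:
    # Return the saved state for query_id, or None to start fresh.
    path = STATE_DIR / f"{query_id}.json"
    return json.loads(path.read_text()) if path.exists() else None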

Component Breakdown

Agent (Orchestrator)

  • Responsibility: Coordinate research process
  • Size: ~100 lines
  • Key Methods:
    • research(question) - Main entry point
    • plan_search_strategy() - Decide what to search
    • execute_search() - Run tool queries
    • evaluate_progress() - Call judge
    • synthesize_findings() - Generate report
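
How the key methods compose, as a skeleton (only the method names come from the list above; signatures and the stub bodies are illustrative):

class Orchestrator:
    def research(self, question: str, max_iterations: int = 5) -> str:
        # Main entry point: plan, then loop search/judge until done.
        strategy = self.plan_search_strategy(question)
        context: list[dict] = []
        for _ in range(max_iterations):
            context.extend(self.execute_search(strategy))
            sufficient, strategy = self.evaluate_progress(question, context)
            if sufficient:
                break
        return self.synthesize_findings(question, context)

    # Stubs to be filled in by the implementation:
    def plan_search_strategy(self, question: str) -> str: ...
    def execute_search(self, strategy: str) -> list[dict]: ...
    def evaluate_progress(self, question: str, context: list) -> tuple[bool, str]: ...
    def synthesize_findings(self, question: str, context: list) -> str: ...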

Tools

  • Responsibility: Interface with external data sources
  • Size: ~50 lines per tool
  • Implementations:
    • PubMedTool - Search biomedical literature
    • WebSearchTool - General medical information
    • ClinicalTrialsTool - Trial data (optional)
    • DrugInfoTool - FDA drug database (optional)
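
A sketch of the tool shape using the real E-utilities esearch endpoint (the class here is illustrative; the actual PubMedTool also fetches abstracts and handles rate limits):

import httpx

class PubMedTool:
    ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

    async def search(self, query: str, retmax: int = 20) -> list[str]:
        # Return PubMed IDs (PMIDs) matching the query.
        params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
        async with httpx.AsyncClient() as client:
            resp = await client.get(self.ESEARCH, params=params)
            resp.raise_for_status()
            return resp.json()["esearchresult"]["idlist"]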

Judge

  • Responsibility: Evaluate evidence quality
  • Size: ~50 lines
  • Key Methods:
    • is_sufficient(question, evidence) → bool
    • assess_quality(evidence) → score
    • identify_gaps(question, evidence) → missing_info
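
One way to make the judge's output machine-checkable is a pydantic model (field names are illustrative, not the project's actual schema):

from pydantic import BaseModel, Field

class JudgeVerdict(BaseModel):
    sufficient: bool                              # is_sufficient: ready to synthesize?
    quality_score: float = Field(ge=0.0, le=1.0)  # assess_quality output
    gaps: list[str] = []                          # identify_gaps: missing information
    refined_query: str | None = None              # suggested next search, if any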

Gradio App

  • Responsibility: User interface
  • Size: ~50 lines
  • Features:
    • Text input for questions
    • Progress indicators
    • Formatted output with citations
    • Download research report
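
A minimal Gradio wiring sketch (research is the entry point from the Design Patterns section; format_report is a hypothetical markdown formatter):

import gradio as gr

def answer(question: str) -> str:
    # Run the agent and render the report as markdown.
    report = research(question)
    return format_report(report)  # hypothetical formatter

demo = gr.Interface(
    fn=answer,
    inputs=gr.Textbox(label="Research question"),
    outputs=gr.Markdown(label="Research report"),
    title="DeepCritical",
)

if __name__ == "__main__":
    demo.launch()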

Technical Stack

Core Dependencies

[dependencies]
python = ">=3.10"
pydantic = "^2.7"
pydantic-ai = "^0.0.16"
fastmcp = "^0.1.0"
gradio = "^5.0"
beautifulsoup4 = "^4.12"
httpx = "^0.27"

Optional Enhancements

  • modal - For GPU-accelerated local LLM
  • fastmcp - MCP server integration
  • sentence-transformers - Semantic search
  • faiss-cpu - Vector similarity

Tool APIs & Rate Limits

| API | Cost | Rate Limit | API Key? | Notes |
|---|---|---|---|---|
| PubMed E-utilities | Free | 3/sec (no key), 10/sec (with key) | Optional | Register at NCBI for higher limits |
| Brave Search API | Free tier | 2000/month free | Required | Primary web search |
| DuckDuckGo | Free | Unofficial, ~1/sec | No | Fallback web search |
| ClinicalTrials.gov | Free | 100/min | No | Stretch goal |
| OpenFDA | Free | 240/min (no key), 120K/day (with key) | Optional | Drug info |

Web Search Strategy (Priority Order):

  1. Brave Search API (free tier: 2000 queries/month) - Primary
  2. DuckDuckGo (unofficial, no API key) - Fallback
  3. SerpAPI ($50/month) - Only if free options fail
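
A sketch of the fallback chain (brave_search and ddg_search are hypothetical async wrappers around the two providers):

async def web_search(query: str) -> list[dict]:
    for provider in (brave_search, ddg_search):  # priority order
        try:
            results = await provider(query)
            if results:
                return results
        except Exception:
            continue  # rate-limited or down: try the next provider
    return []  # all free options failed; caller may escalate to a paid API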

Why NOT SerpAPI first?

  • Costs money (hackathon budget = $0)
  • Free alternatives work fine for demo
  • Can upgrade later if needed

Success Criteria

Phase 1-5 (MVP) ✅ COMPLETE

Completed in ONE DAY:

  • User can ask drug repurposing question
  • Agent searches PubMed (async)
  • Agent searches web (DuckDuckGo)
  • LLM judge evaluates evidence quality
  • System respects token budget and iterations
  • Output includes drug candidates + citations
  • Works end-to-end for demo query
  • Gradio UI with streaming progress
  • Magentic multi-agent orchestration
  • 38 unit tests passing
  • CI/CD pipeline green

Hackathon Submission ✅ COMPLETE

  • Gradio UI deployed on HuggingFace Spaces
  • Example queries working and tested
  • Architecture documentation
  • README with setup instructions

Phase 6-8 (Enhanced)

Specs ready for implementation:

  • Embeddings & Semantic Search (Phase 6)
  • Hypothesis Agent (Phase 7)
  • Report Agent (Phase 8)

What's EXPLICITLY Out of Scope

NOT building (to stay focused):

  • ❌ User authentication
  • ❌ Database storage of queries
  • ❌ Multi-user support
  • ❌ Payment/billing
  • ❌ Production monitoring
  • ❌ Mobile UI

Implementation Timeline

Day 1 (Today): Architecture & Setup

  • Define use case (drug repurposing) ✅
  • Write architecture docs ✅
  • Create project structure
  • First PR: Structure + Docs

Day 2: Core Agent Loop

  • Implement basic orchestrator
  • Add PubMed search tool
  • Simple judge (keyword-based)
  • Test with 1 query

Day 3: Intelligence Layer

  • Upgrade to LLM judge
  • Add web search tool
  • Token budget tracking
  • Test with multiple queries

Day 4: UI & Integration

  • Build Gradio interface
  • Wire up agent to UI
  • Add progress indicators
  • Format output nicely

Day 5: Polish & Extend

  • Add more tools (clinical trials)
  • Improve judge prompts
  • Checkpoint system
  • Error handling

Day 6: Deploy & Document

  • Deploy to HuggingFace Spaces
  • Record demo video
  • Write submission materials
  • Final testing

Questions This Document Answers

For The Maintainer

Q: "What should our design pattern be?" A: Search-and-judge loop with multi-tool orchestration (detailed in Design Patterns section)

Q: "Should we use LLM-as-judge or token budget?" A: Both - judge for smart stopping, budget for cost control

Q: "What's the break pattern?" A: Three conditions: judge approval, token limit, or max iterations (whichever comes first)

Q: "What components do we need?" A: Agent orchestrator, tools (PubMed/web), judge, Gradio UI (see Component Breakdown)

For The Team

Q: "What are we actually building?" A: Medical drug repurposing research agent (see Core Use Case)

Q: "How complex should it be?" A: Simple but complete - ~300 lines of core code (see Component sizes)

Q: "What's the timeline?" A: 6 days, MVP by Day 3, polish Days 4-6 (see Implementation Timeline)

Q: "What datasets/APIs do we use?" A: PubMed (free), web search, clinical trials.gov (see Tool APIs)


Next Steps

  1. Review this document - Team feedback on architecture
  2. Finalize design - Incorporate feedback
  3. Create project structure - Scaffold repository
  4. Move to proper docs - docs/architecture/ folder
  5. Open first PR - Structure + Documentation
  6. Start implementation - Day 2 onward

Notes & Decisions

Why Drug Repurposing?

  • Clear, impressive use case
  • Real-world medical impact
  • Good data availability (PubMed, trials)
  • Easy to explain (Viagra example!)
  • Physician on team ✅

Why Simple Architecture?

  • 6-day timeline
  • Need working end-to-end system
  • Hackathon judges value "works" over "complex"
  • Can extend later if successful

Why These Tools First?

  • PubMed: Best biomedical literature source
  • Web search: General medical knowledge
  • Clinical trials: Evidence of actual testing
  • Others: Nice-to-have, not critical for MVP


Appendix A: Demo Queries (Pre-tested)

These queries will be used for demo and testing. They're chosen because:

  1. They have good PubMed coverage
  2. They're medically interesting
  3. They show the system's capabilities

Primary Demo Query

"What existing drugs might help treat long COVID fatigue?"

Expected candidates: CoQ10, Low-dose Naltrexone, Modafinil
Expected sources: 20+ PubMed papers, 2-3 clinical trials

Secondary Demo Queries

"Find existing drugs that might slow Alzheimer's progression"
"What approved medications could help with fibromyalgia pain?"
"Which diabetes drugs show promise for cancer treatment?"

Why These Queries?

  • Represent real clinical needs
  • Have substantial literature
  • Show diverse drug classes
  • Physician on team can validate results

Appendix B: Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| PubMed rate limiting | Medium | High | Implement caching, respect 3/sec |
| Web search API fails | Low | Medium | DuckDuckGo fallback |
| LLM costs exceed budget | Medium | Medium | Hard token cap at 50K |
| Judge quality poor | Medium | High | Pre-test prompts, iterate |
| HuggingFace deploy issues | Low | High | Test deployment Day 4 |
| Demo crashes live | Medium | High | Pre-recorded backup video |
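
For the PubMed mitigation, a minimal async limiter sketch that spaces requests to stay under 3/sec (illustrative, not the project's actual code):

import asyncio
import time

class RateLimiter:
    def __init__(self, max_per_sec: float = 3.0):
        self.min_interval = 1.0 / max_per_sec
        self._last = 0.0
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        # Sleep just long enough to keep calls under the per-second cap.
        async with self._lock:
            delay = self.min_interval - (time.monotonic() - self._last)
            if delay > 0:
                await asyncio.sleep(delay)
            self._last = time.monotonic()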


Document Status: Official Architecture Spec
Review Score: 98/100
Last Updated: November 2025