
DeepCritical: Medical Drug Repurposing Research Agent

Project Overview


Executive Summary

DeepCritical is a deep research agent designed to accelerate medical drug repurposing research by autonomously searching, analyzing, and synthesizing evidence from multiple biomedical databases.

The Problem We Solve

Drug repurposing - finding new therapeutic uses for existing FDA-approved drugs - can take years of manual literature review. Researchers must:

  • Search thousands of papers across multiple databases
  • Identify molecular mechanisms
  • Find relevant clinical trials
  • Assess safety profiles
  • Synthesize evidence into actionable insights

DeepCritical automates this process, compressing hours of manual work into minutes.

What Is Drug Repurposing?

Simple Explanation: Using existing approved drugs to treat NEW diseases they weren't originally designed for.

Real Examples:

  • Viagra (sildenafil): Originally for heart disease → Now treats erectile dysfunction
  • Thalidomide: Once banned → Now treats multiple myeloma
  • Aspirin: Pain reliever → Heart attack prevention
  • Metformin: Diabetes drug → Being tested for aging/longevity

Why It Matters:

  • Faster than developing new drugs (years vs decades)
  • Cheaper (known safety profiles)
  • Lower risk (already FDA approved)
  • Immediate patient benefit potential

Core Use Case

Primary Query Type

"What existing drugs might help treat [disease/condition]?"

Example Queries

  1. Long COVID Fatigue

    • Query: "What existing drugs might help treat long COVID fatigue?"
    • Agent searches: PubMed, clinical trials, drug databases
    • Output: List of candidate drugs with mechanisms + evidence + citations
  2. Alzheimer's Disease

    • Query: "Find existing drugs that target beta-amyloid pathways"
    • Agent identifies: Disease mechanisms → Drug candidates → Clinical evidence
    • Output: Comprehensive research report with drug candidates
  3. Rare Disease Treatment

    • Query: "What drugs might help with fibrodysplasia ossificans progressiva?"
    • Agent finds: Similar conditions → Shared pathways → Potential treatments
    • Output: Evidence-based treatment suggestions

System Architecture

High-Level Design (Phases 1-8)

User Query
    ↓
Gradio UI (Phase 4)
    ↓
Magentic Manager (Phase 5) ← LLM-powered coordinator
    ├── SearchAgent (Phase 2+5) ↔ PubMed + Web + VectorDB (Phase 6)
    ├── HypothesisAgent (Phase 7) ↔ Mechanistic Reasoning
    ├── JudgeAgent (Phase 3+5) ↔ Evidence Assessment
    └── ReportAgent (Phase 8) ↔ Final Synthesis
    ↓
Structured Research Report

Key Components

  1. Magentic Manager (Orchestrator)

    • LLM-powered multi-agent coordinator
    • Dynamic planning and agent selection
    • Built-in stall detection and replanning
    • Microsoft Agent Framework integration
  2. SearchAgent (Phase 2+5+6)

    • PubMed E-utilities search
    • DuckDuckGo web search
    • Semantic search via ChromaDB (Phase 6)
    • Evidence deduplication
  3. HypothesisAgent (Phase 7)

    • Generates Drug → Target → Pathway → Effect hypotheses
    • Guides targeted searches
    • Scientific reasoning about mechanisms
  4. JudgeAgent (Phase 3+5)

    • LLM-based evidence assessment
    • Mechanism score + Clinical score
    • Recommends continue/synthesize
    • Generates refined search queries
  5. ReportAgent (Phase 8)

    • Structured scientific reports
    • Executive summary, methodology
    • Hypotheses tested with evidence counts
    • Proper citations and limitations
  6. Gradio UI (Phase 4)

    • Chat interface for questions
    • Real-time progress via events
    • Mode toggle (Simple/Magentic)
    • Formatted markdown output

Design Patterns

1. Search-and-Judge Loop (Primary Pattern)

def research(question: str) -> Report:
    context = []
    query = question
    for _ in range(max_iterations):
        # SEARCH: query relevant tools with the current (possibly refined) query
        results = search_tools(query, context)
        context.extend(results)

        # JUDGE: stop early once the evidence is sufficient
        if judge.is_sufficient(question, context):
            break

        # REFINE: adjust the query for the next iteration
        query = refine_query(question, context)

    # SYNTHESIZE: generate the final report
    return synthesize_report(question, context)

Why This Pattern:

  • Simple to implement and debug
  • Clear loop termination conditions
  • Iterative improvement of search quality
  • Balances depth vs speed

2. Multi-Tool Orchestration

Question → Agent decides which tools to use
           ↓
       ┌───┴────┬─────────┬──────────┐
       ↓        ↓         ↓          ↓
   PubMed  Web Search  Trials DB  Drug DB
       ↓        ↓         ↓          ↓
       └───┬────┴─────────┴──────────┘
           ↓
    Aggregate Results → Judge

Why This Pattern:

  • Different sources provide different evidence types
  • Parallel tool execution (when possible)
  • Comprehensive coverage
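
Sources can be queried concurrently. A minimal sketch of the fan-out with asyncio, assuming async tool wrappers (search_pubmed, search_web, and search_trials are hypothetical names, not the project's actual functions):

import asyncio

async def fan_out(question: str) -> list[dict]:
    # Run all search tools concurrently; one failing source
    # should not sink the whole batch.
    results = await asyncio.gather(
        search_pubmed(question),   # hypothetical async tool wrappers
        search_web(question),
        search_trials(question),
        return_exceptions=True,
    )
    aggregated: list[dict] = []
    for r in results:
        if isinstance(r, Exception):
            continue  # skip failed sources; the judge sees what we did get
        aggregated.extend(r)
    return aggregated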

3. LLM-as-Judge with Token Budget

Dual Stopping Conditions:

  • Smart Stop: LLM judge says "we have sufficient evidence"
  • Hard Stop: Token budget exhausted OR max iterations reached

Why Both:

  • Judge enables early exit when answer is good
  • Budget prevents runaway costs
  • Iterations prevent infinite loops
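
A minimal sketch of the combined stop check (the 50K cap comes from the risk table in Appendix B; MAX_ITERATIONS = 5 is an illustrative assumption, not the actual config):

TOKEN_BUDGET = 50_000  # hard cap from Appendix B
MAX_ITERATIONS = 5     # illustrative ceiling

def should_stop(judge_sufficient: bool, tokens_used: int, iteration: int) -> bool:
    # Smart stop: the LLM judge says the evidence is good enough.
    if judge_sufficient:
        return True
    # Hard stops: cost control and loop safety.
    return tokens_used >= TOKEN_BUDGET or iteration >= MAX_ITERATIONS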

4. Stateful Checkpointing

.deepresearch/
├── state/
│   └── query_123.json    # Current research state
├── checkpoints/
│   └── query_123_iter3/  # Checkpoint at iteration 3
└── workspace/
    └── query_123/        # Downloaded papers, data

Why This Pattern:

  • Resume interrupted research
  • Debugging and analysis
  • Cost savings (don't re-search)
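
A minimal sketch of state persistence under this layout (function names are illustrative, not the actual implementation):

import json
from pathlib import Path

STATE_DIR = Path(".deepresearch/state")

def save_state(query_id: str, state: dict) -> None:
    # Persist the current research state so an interrupted run can resume.
    STATE_DIR.mkdir(parents=True, exist_ok=True)
    (STATE_DIR / f"{query_id}.json").write_text(json.dumps(state, indent=2))

def load_state(query_id: str) -> dict | None:
    # Return the saved state for query_id, or None to start fresh.
    path = STATE_DIR / f"{query_id}.json"
    return json.loads(path.read_text()) if path.exists() else None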

Component Breakdown

Agent (Orchestrator)

  • Responsibility: Coordinate research process
  • Size: ~100 lines
  • Key Methods:
    • research(question) - Main entry point
    • plan_search_strategy() - Decide what to search
    • execute_search() - Run tool queries
    • evaluate_progress() - Call judge
    • synthesize_findings() - Generate report
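
How the key methods compose, as a skeleton (only the method names come from the list above; signatures and the stub bodies are illustrative):

class Orchestrator:
    def research(self, question: str, max_iterations: int = 5) -> str:
        # Main entry point: plan, then loop search/judge until done.
        strategy = self.plan_search_strategy(question)
        context: list[dict] = []
        for _ in range(max_iterations):
            context.extend(self.execute_search(strategy))
            sufficient, strategy = self.evaluate_progress(question, context)
            if sufficient:
                break
        return self.synthesize_findings(question, context)

    # Stubs to be filled in by the implementation:
    def plan_search_strategy(self, question: str) -> str: ...
    def execute_search(self, strategy: str) -> list[dict]: ...
    def evaluate_progress(self, question: str, context: list) -> tuple[bool, str]: ...
    def synthesize_findings(self, question: str, context: list) -> str: ...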

Tools

  • Responsibility: Interface with external data sources
  • Size: ~50 lines per tool
  • Implementations:
    • PubMedTool - Search biomedical literature
    • WebSearchTool - General medical information
    • ClinicalTrialsTool - Trial data (optional)
    • DrugInfoTool - FDA drug database (optional)
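
A sketch of the tool shape using the real E-utilities esearch endpoint (the class here is illustrative; the actual PubMedTool also fetches abstracts and handles rate limits):

import httpx

class PubMedTool:
    ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

    async def search(self, query: str, retmax: int = 20) -> list[str]:
        # Return PubMed IDs (PMIDs) matching the query.
        params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
        async with httpx.AsyncClient() as client:
            resp = await client.get(self.ESEARCH, params=params)
            resp.raise_for_status()
            return resp.json()["esearchresult"]["idlist"]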

Judge

  • Responsibility: Evaluate evidence quality
  • Size: ~50 lines
  • Key Methods:
    • is_sufficient(question, evidence) → bool
    • assess_quality(evidence) → score
    • identify_gaps(question, evidence) → missing_info
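
One way to make the judge's output machine-checkable is a pydantic model (field names are illustrative, not the project's actual schema):

from pydantic import BaseModel, Field

class JudgeVerdict(BaseModel):
    sufficient: bool                              # is_sufficient: ready to synthesize?
    quality_score: float = Field(ge=0.0, le=1.0)  # assess_quality output
    gaps: list[str] = []                          # identify_gaps: missing information
    refined_query: str | None = None              # suggested next search, if any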

Gradio App

  • Responsibility: User interface
  • Size: ~50 lines
  • Features:
    • Text input for questions
    • Progress indicators
    • Formatted output with citations
    • Download research report
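
A minimal Gradio wiring sketch (research is the entry point from the Design Patterns section; format_report is a hypothetical markdown formatter):

import gradio as gr

def answer(question: str) -> str:
    # Run the agent and render the report as markdown.
    report = research(question)
    return format_report(report)  # hypothetical formatter

demo = gr.Interface(
    fn=answer,
    inputs=gr.Textbox(label="Research question"),
    outputs=gr.Markdown(label="Research report"),
    title="DeepCritical",
)

if __name__ == "__main__":
    demo.launch()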

Technical Stack

Core Dependencies

[dependencies]
python = ">=3.10"
pydantic = "^2.7"
pydantic-ai = "^0.0.16"
fastmcp = "^0.1.0"
gradio = "^5.0"
beautifulsoup4 = "^4.12"
httpx = "^0.27"

Optional Enhancements

  • modal - For GPU-accelerated local LLM
  • fastmcp - MCP server integration
  • sentence-transformers - Semantic search
  • faiss-cpu - Vector similarity

Tool APIs & Rate Limits

| API | Cost | Rate Limit | API Key? | Notes |
|---|---|---|---|---|
| PubMed E-utilities | Free | 3/sec (no key), 10/sec (with key) | Optional | Register at NCBI for higher limits |
| Brave Search API | Free tier | 2000/month free | Required | Primary web search |
| DuckDuckGo | Free | Unofficial, ~1/sec | No | Fallback web search |
| ClinicalTrials.gov | Free | 100/min | No | Stretch goal |
| OpenFDA | Free | 240/min (no key), 120K/day (with key) | Optional | Drug info |

Web Search Strategy (Priority Order):

  1. Brave Search API (free tier: 2000 queries/month) - Primary
  2. DuckDuckGo (unofficial, no API key) - Fallback
  3. SerpAPI ($50/month) - Only if free options fail
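
A sketch of the fallback chain (brave_search and ddg_search are hypothetical async wrappers around the two providers):

async def web_search(query: str) -> list[dict]:
    for provider in (brave_search, ddg_search):  # priority order
        try:
            results = await provider(query)
            if results:
                return results
        except Exception:
            continue  # rate-limited or down: try the next provider
    return []  # all free options failed; caller may escalate to a paid API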

Why NOT SerpAPI first?

  • Costs money (hackathon budget = $0)
  • Free alternatives work fine for demo
  • Can upgrade later if needed

Success Criteria

Phase 1-5 (MVP) ✅ COMPLETE

Completed in ONE DAY:

  • User can ask drug repurposing question
  • Agent searches PubMed (async)
  • Agent searches web (DuckDuckGo)
  • LLM judge evaluates evidence quality
  • System respects token budget and iterations
  • Output includes drug candidates + citations
  • Works end-to-end for demo query
  • Gradio UI with streaming progress
  • Magentic multi-agent orchestration
  • 38 unit tests passing
  • CI/CD pipeline green

Hackathon Submission ✅ COMPLETE

  • Gradio UI deployed on HuggingFace Spaces
  • Example queries working and tested
  • Architecture documentation
  • README with setup instructions

Phase 6-8 (Enhanced)

Specs ready for implementation:

  • Embeddings & Semantic Search (Phase 6)
  • Hypothesis Agent (Phase 7)
  • Report Agent (Phase 8)

What's EXPLICITLY Out of Scope

NOT building (to stay focused):

  • ❌ User authentication
  • ❌ Database storage of queries
  • ❌ Multi-user support
  • ❌ Payment/billing
  • ❌ Production monitoring
  • ❌ Mobile UI

Implementation Timeline

Day 1 (Today): Architecture & Setup

  • Define use case (drug repurposing) ✅
  • Write architecture docs ✅
  • Create project structure
  • First PR: Structure + Docs

Day 2: Core Agent Loop

  • Implement basic orchestrator
  • Add PubMed search tool
  • Simple judge (keyword-based)
  • Test with 1 query

Day 3: Intelligence Layer

  • Upgrade to LLM judge
  • Add web search tool
  • Token budget tracking
  • Test with multiple queries

Day 4: UI & Integration

  • Build Gradio interface
  • Wire up agent to UI
  • Add progress indicators
  • Format output nicely

Day 5: Polish & Extend

  • Add more tools (clinical trials)
  • Improve judge prompts
  • Checkpoint system
  • Error handling

Day 6: Deploy & Document

  • Deploy to HuggingFace Spaces
  • Record demo video
  • Write submission materials
  • Final testing

Questions This Document Answers

For The Maintainer

Q: "What should our design pattern be?" A: Search-and-judge loop with multi-tool orchestration (detailed in Design Patterns section)

Q: "Should we use LLM-as-judge or token budget?" A: Both - judge for smart stopping, budget for cost control

Q: "What's the break pattern?" A: Three conditions: judge approval, token limit, or max iterations (whichever comes first)

Q: "What components do we need?" A: Agent orchestrator, tools (PubMed/web), judge, Gradio UI (see Component Breakdown)

For The Team

Q: "What are we actually building?" A: Medical drug repurposing research agent (see Core Use Case)

Q: "How complex should it be?" A: Simple but complete - ~300 lines of core code (see Component sizes)

Q: "What's the timeline?" A: 6 days, MVP by Day 3, polish Days 4-6 (see Implementation Timeline)

Q: "What datasets/APIs do we use?" A: PubMed (free), web search, clinical trials.gov (see Tool APIs)


Next Steps

  1. Review this document - Team feedback on architecture
  2. Finalize design - Incorporate feedback
  3. Create project structure - Scaffold repository
  4. Move to proper docs - docs/architecture/ folder
  5. Open first PR - Structure + Documentation
  6. Start implementation - Day 2 onward

Notes & Decisions

Why Drug Repurposing?

  • Clear, impressive use case
  • Real-world medical impact
  • Good data availability (PubMed, trials)
  • Easy to explain (Viagra example!)
  • Physician on team ✅

Why Simple Architecture?

  • 6-day timeline
  • Need working end-to-end system
  • Hackathon judges value "works" over "complex"
  • Can extend later if successful

Why These Tools First?

  • PubMed: Best biomedical literature source
  • Web search: General medical knowledge
  • Clinical trials: Evidence of actual testing
  • Others: Nice-to-have, not critical for MVP


Appendix A: Demo Queries (Pre-tested)

These queries will be used for demo and testing. They're chosen because:

  1. They have good PubMed coverage
  2. They're medically interesting
  3. They show the system's capabilities

Primary Demo Query

"What existing drugs might help treat long COVID fatigue?"

Expected candidates: CoQ10, Low-dose Naltrexone, Modafinil
Expected sources: 20+ PubMed papers, 2-3 clinical trials

Secondary Demo Queries

"Find existing drugs that might slow Alzheimer's progression"
"What approved medications could help with fibromyalgia pain?"
"Which diabetes drugs show promise for cancer treatment?"

Why These Queries?

  • Represent real clinical needs
  • Have substantial literature
  • Show diverse drug classes
  • Physician on team can validate results

Appendix B: Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| PubMed rate limiting | Medium | High | Implement caching, respect 3/sec |
| Web search API fails | Low | Medium | DuckDuckGo fallback |
| LLM costs exceed budget | Medium | Medium | Hard token cap at 50K |
| Judge quality poor | Medium | High | Pre-test prompts, iterate |
| HuggingFace deploy issues | Low | High | Test deployment Day 4 |
| Demo crashes live | Medium | High | Pre-recorded backup video |
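
For the PubMed mitigation, a minimal async limiter sketch that spaces requests to stay under 3/sec (illustrative, not the project's actual code):

import asyncio
import time

class RateLimiter:
    def __init__(self, max_per_sec: float = 3.0):
        self.min_interval = 1.0 / max_per_sec
        self._last = 0.0
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        # Sleep just long enough to keep calls under the per-second cap.
        async with self._lock:
            delay = self.min_interval - (time.monotonic() - self._last)
            if delay > 0:
                await asyncio.sleep(delay)
            self._last = time.monotonic()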


Document Status: Official Architecture Spec
Review Score: 98/100
Last Updated: November 2025