DataQuests/DeepCritical

VibecoderMcSwaggins committed 11 days ago

Commit

388cd05

2 Parent(s): f1e4e5b d01e70d

Merge pull request #13 from The-Obstacle-Is-The-Way/dev


feat(examples): Phase 6-8 demos for full stack demonstration

Files changed (6)

examples/README.md +168 -13
examples/embeddings_demo/run_embeddings.py +210 -0
examples/full_stack_demo/run_full.py +235 -0
examples/hypothesis_demo/run_hypothesis.py +139 -0
examples/orchestrator_demo/run_agent.py +63 -33
src/prompts/report.py +26 -0

examples/README.md CHANGED

@@ -1,28 +1,183 @@
-# Examples
-Demo scripts for DeepCritical functionality.
-## 1. Search Demo (Phase 2)
-Demonstrates parallel search across PubMed and Web. **No API keys required.**
 ```bash
 uv run python examples/search_demo/run_search.py "metformin cancer"
 ```
-## 2. Agent Demo (Phase 4)
-Demonstrates the full search-judge-synthesize loop.
-**Option A: Mock Mode (No Keys)**
-Test the logic/mechanics without an LLM.
 ```bash
-uv run python examples/orchestrator_demo/run_agent.py "metformin cancer" --mock
 ```
-**Option B: Real Mode (Requires Keys)**
-Uses the real LLM Judge to evaluate evidence.
-Requires `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` in `.env`.
 ```bash
 uv run python examples/orchestrator_demo/run_agent.py "metformin cancer"
-```

+# DeepCritical Examples
+**NO MOCKS. NO FAKE DATA. REAL SCIENCE.**
+These demos run the REAL drug repurposing research pipeline with actual API calls.
+---
+## Prerequisites
+You MUST have API keys configured:
+```bash
+# Copy the example and add your keys
+cp .env.example .env
+# Required (pick one):
+OPENAI_API_KEY=sk-...
+ANTHROPIC_API_KEY=sk-ant-...
+# Optional (higher PubMed rate limits):
+NCBI_API_KEY=your-key
+```
+---
+## Examples
+### 1. Search Demo (No LLM Required)
+Demonstrates REAL parallel search across PubMed and Web.
 ```bash
 uv run python examples/search_demo/run_search.py "metformin cancer"
 ```
+**What's REAL:**
+- Actual NCBI E-utilities API calls
+- Actual DuckDuckGo web searches
+- Real papers, real URLs, real content
+---
+### 2. Embeddings Demo (No LLM Required)
+Demonstrates REAL semantic search and deduplication.
 ```bash
+uv run python examples/embeddings_demo/run_embeddings.py
 ```
+**What's REAL:**
+- Actual sentence-transformers model (all-MiniLM-L6-v2)
+- Actual ChromaDB vector storage
+- Real cosine similarity computations
+- Real semantic deduplication
+---
+### 3. Orchestrator Demo (LLM Required)
+Demonstrates the REAL search-judge-synthesize loop.
 ```bash
 uv run python examples/orchestrator_demo/run_agent.py "metformin cancer"
+uv run python examples/orchestrator_demo/run_agent.py "aspirin alzheimer" --iterations 5
+```
+**What's REAL:**
+- Real PubMed + Web searches
+- Real LLM judge evaluating evidence quality
+- Real iterative refinement based on LLM decisions
+- Real research synthesis
+---
+### 4. Magentic Demo (OpenAI Required)
+Demonstrates REAL multi-agent coordination using Microsoft Agent Framework.
+```bash
+# Requires OPENAI_API_KEY specifically
+uv run python examples/orchestrator_demo/run_magentic.py "metformin cancer"
+```
+**What's REAL:**
+- Real MagenticBuilder orchestration
+- Real SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent
+- Real manager-based coordination
+---
+### 5. Hypothesis Demo (LLM Required)
+Demonstrates REAL mechanistic hypothesis generation.
+```bash
+uv run python examples/hypothesis_demo/run_hypothesis.py "metformin Alzheimer's"
+uv run python examples/hypothesis_demo/run_hypothesis.py "sildenafil heart failure"
+```
+**What's REAL:**
+- Real PubMed + Web search first
+- Real embedding-based deduplication
+- Real LLM generating Drug -> Target -> Pathway -> Effect chains
+- Real knowledge gap identification
+---
+### 6. Full-Stack Demo (LLM Required)
+**THE COMPLETE PIPELINE** - All phases working together.
+```bash
+uv run python examples/full_stack_demo/run_full.py "metformin Alzheimer's"
+uv run python examples/full_stack_demo/run_full.py "sildenafil heart failure" -i 3
+```
+**What's REAL:**
+1. Real PubMed + Web evidence collection
+2. Real embedding-based semantic deduplication
+3. Real LLM mechanistic hypothesis generation
+4. Real LLM evidence quality assessment
+5. Real LLM structured scientific report generation
+Output: Publication-quality research report with validated citations.
+---
+## API Key Requirements
+| Example | LLM Required | Keys |
+|---------|--------------|------|
+| search_demo | No | Optional: `NCBI_API_KEY` |
+| embeddings_demo | No | None |
+| orchestrator_demo | Yes | `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` |
+| run_magentic | Yes | `OPENAI_API_KEY` (Magentic requires OpenAI) |
+| hypothesis_demo | Yes | `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` |
+| full_stack_demo | Yes | `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` |
+---
+## Architecture
+```text
+User Query
+    |
+    v
+[REAL Search] --> Actual PubMed + Web API calls
+    |
+    v
+[REAL Embeddings] --> Actual sentence-transformers
+    |
+    v
+[REAL Hypothesis] --> Actual LLM reasoning
+    |
+    v
+[REAL Judge] --> Actual LLM assessment
+    |
+    +---> Need more? --> Loop back to Search
+    |
+    +---> Sufficient --> Continue
+    |
+    v
+[REAL Report] --> Actual LLM synthesis
+    |
+    v
+Publication-Quality Research Report
+```
+---
+## Why No Mocks?
+> "Authenticity is the feature."
+Mocks belong in `tests/unit/`, not in demos. When you run these examples, you see:
+- Real papers from real databases
+- Real AI reasoning about real evidence
+- Real scientific hypotheses
+- Real research reports
+This is what DeepCritical actually does. No fake data. No canned responses.
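
The API Key Requirements table in the README diff above can be read as data. The following is a minimal sketch, assuming only what that table states; `DEMO_KEYS` and `runnable_demos` are illustrative names and do not exist in the repository.

```python
import os
from typing import Optional

# Key requirements copied from the README table. An empty tuple means
# "no key needed"; otherwise any ONE of the listed keys is enough.
# The dict layout is an assumption made for this sketch.
DEMO_KEYS: dict[str, tuple[str, ...]] = {
    "search_demo": (),
    "embeddings_demo": (),
    "orchestrator_demo": ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"),
    "run_magentic": ("OPENAI_API_KEY",),
    "hypothesis_demo": ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"),
    "full_stack_demo": ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"),
}


def runnable_demos(env: Optional[dict[str, str]] = None) -> list[str]:
    """Return the demos whose key requirement is satisfied by `env`."""
    env = dict(os.environ) if env is None else env
    return [
        name
        for name, keys in DEMO_KEYS.items()
        if not keys or any(env.get(key) for key in keys)
    ]


if __name__ == "__main__":
    print(runnable_demos())
```

Read this way, the table shows that the search and embeddings demos run without any keys, so the "You MUST have API keys configured" line under Prerequisites applies to the LLM-backed demos only.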

examples/embeddings_demo/run_embeddings.py ADDED

	@@ -0,0 +1,210 @@

+#!/usr/bin/env python3
+"""
+Demo: Semantic Search & Deduplication (Phase 6).
+This script demonstrates embedding-based capabilities:
+- Text embedding with sentence-transformers
+- Semantic similarity search via ChromaDB
+- Duplicate detection by meaning (not just URL)
+Usage:
+    uv run python examples/embeddings_demo/run_embeddings.py
+No API keys required - uses local sentence-transformers model.
+"""
+import asyncio
+from src.services.embeddings import EmbeddingService
+from src.utils.models import Citation, Evidence
+def create_sample_evidence() -> list[Evidence]:
+    """Create sample evidence with some semantic duplicates."""
+    return [
+        Evidence(
+            content="Metformin activates AMPK which inhibits mTOR signaling pathway.",
+            citation=Citation(
+                source="pubmed",
+                title="Metformin and AMPK activation",
+                url="https://pubmed.ncbi.nlm.nih.gov/11111/",
+                date="2023",
+                authors=["Smith J"],
+            ),
+        ),
+        Evidence(
+            content="The drug metformin works by turning on AMPK, blocking the mTOR pathway.",
+            citation=Citation(
+                source="pubmed",
+                title="AMPK-mTOR axis in diabetes treatment",
+                url="https://pubmed.ncbi.nlm.nih.gov/22222/",
+                date="2022",
+                authors=["Jones A"],
+            ),
+        ),
+        Evidence(
+            content="Sildenafil increases nitric oxide signaling for vasodilation.",
+            citation=Citation(
+                source="web",
+                title="How Viagra Works",
+                url="https://example.com/viagra-mechanism",
+                date="2023",
+                authors=["WebMD"],
+            ),
+        ),
+        Evidence(
+            content="Clinical trials show metformin reduces cancer incidence in diabetic patients.",
+            citation=Citation(
+                source="pubmed",
+                title="Metformin and cancer prevention",
+                url="https://pubmed.ncbi.nlm.nih.gov/33333/",
+                date="2024",
+                authors=["Lee K", "Park S"],
+            ),
+        ),
+        Evidence(
+            content="Metformin inhibits mTOR through AMPK activation mechanism.",
+            citation=Citation(
+                source="pubmed",
+                title="mTOR inhibition by Metformin",
+                url="https://pubmed.ncbi.nlm.nih.gov/44444/",
+                date="2023",
+                authors=["Brown M"],
+            ),
+        ),
+    ]
+def create_fresh_service(name_suffix: str = "") -> EmbeddingService:
+    """Create a fresh embedding service with unique collection name."""
+    import uuid
+    # Create service with unique collection by modifying the internal collection
+    service = EmbeddingService.__new__(EmbeddingService)
+    service._model = __import__("sentence_transformers").SentenceTransformer("all-MiniLM-L6-v2")
+    service._client = __import__("chromadb").Client()
+    collection_name = f"demo_{name_suffix}_{uuid.uuid4().hex[:8]}"
+    service._collection = service._client.create_collection(
+        name=collection_name, metadata={"hnsw:space": "cosine"}
+    )
+    return service
+async def demo_embedding() -> None:
+    """Demo single text embedding."""
+    print("\n" + "=" * 60)
+    print("1. TEXT EMBEDDING DEMO")
+    print("=" * 60)
+    service = create_fresh_service("embed")
+    texts = [
+        "Metformin activates AMPK",
+        "Aspirin reduces inflammation",
+        "Metformin turns on the AMPK enzyme",
+    ]
+    print("\nEmbedding sample texts...")
+    embeddings = await service.embed_batch(texts)
+    for text, emb in zip(texts, embeddings, strict=False):
+        print(f"  '{text[:40]}...' -> [{emb[0]:.4f}, {emb[1]:.4f}, ... ] (dim={len(emb)})")
+    # Calculate similarity between text 0 and text 2 (semantically similar)
+    import numpy as np
+    sim_0_2 = np.dot(embeddings[0], embeddings[2]) / (
+        np.linalg.norm(embeddings[0]) * np.linalg.norm(embeddings[2])
+    )
+    sim_0_1 = np.dot(embeddings[0], embeddings[1]) / (
+        np.linalg.norm(embeddings[0]) * np.linalg.norm(embeddings[1])
+    )
+    print(f"\nSimilarity (Metformin AMPK) vs (Metformin turns on AMPK): {sim_0_2:.3f}")
+    print(f"Similarity (Metformin AMPK) vs (Aspirin inflammation):    {sim_0_1:.3f}")
+    print("  -> Semantically similar texts have higher cosine similarity!")
+async def demo_semantic_search() -> None:
+    """Demo semantic similarity search."""
+    print("\n" + "=" * 60)
+    print("2. SEMANTIC SEARCH DEMO")
+    print("=" * 60)
+    service = create_fresh_service("search")
+    # Add some documents to the vector store
+    docs = [
+        ("doc1", "Metformin activates AMPK enzyme in liver cells", {"source": "pubmed"}),
+        ("doc2", "Aspirin inhibits COX-2 to reduce inflammation", {"source": "pubmed"}),
+        ("doc3", "Statins lower cholesterol by inhibiting HMG-CoA reductase", {"source": "web"}),
+        ("doc4", "AMPK activation leads to improved glucose metabolism", {"source": "pubmed"}),
+        ("doc5", "Sildenafil works via nitric oxide pathway", {"source": "web"}),
+    ]
+    print("\nIndexing documents...")
+    for doc_id, content, meta in docs:
+        await service.add_evidence(doc_id, content, meta)
+        print(f"  Added: {doc_id}")
+    # Search for semantically related content
+    query = "drugs that activate AMPK"
+    print(f"\nSearching for: '{query}'")
+    results = await service.search_similar(query, n_results=3)
+    print("\nTop 3 results:")
+    for i, r in enumerate(results, 1):
+        # Lower distance = more similar (cosine distance: 0=identical, 2=opposite)
+        similarity = 1 - r["distance"]
+        print(f"  {i}. [{similarity:.2%} similar] {r['content'][:60]}...")
+async def demo_deduplication() -> None:
+    """Demo semantic deduplication."""
+    print("\n" + "=" * 60)
+    print("3. SEMANTIC DEDUPLICATION DEMO")
+    print("=" * 60)
+    # Create fresh service for clean demo
+    service = create_fresh_service("dedup")
+    evidence = create_sample_evidence()
+    print(f"\nOriginal evidence count: {len(evidence)}")
+    for i, e in enumerate(evidence, 1):
+        print(f"  {i}. {e.citation.title}")
+    print("\nRunning semantic deduplication (threshold=0.85)...")
+    unique = await service.deduplicate(evidence, threshold=0.85)
+    print(f"\nUnique evidence count: {len(unique)}")
+    print(f"Removed {len(evidence) - len(unique)} semantic duplicates\n")
+    for i, e in enumerate(unique, 1):
+        print(f"  {i}. {e.citation.title}")
+    print("\n  -> Notice: Papers about 'Metformin AMPK mTOR' were deduplicated!")
+    print("     Different titles, same semantic meaning = duplicate removed.")
+async def main() -> None:
+    """Run all embedding demos."""
+    print("\n" + "=" * 60)
+    print("DeepCritical Embeddings Demo (Phase 6)")
+    print("Using: sentence-transformers + ChromaDB")
+    print("=" * 60)
+    await demo_embedding()
+    await demo_semantic_search()
+    await demo_deduplication()
+    print("\n" + "=" * 60)
+    print("Demo complete! Embeddings enable:")
+    print("  - Finding papers by MEANING, not just keywords")
+    print("  - Removing duplicate findings automatically")
+    print("  - Building diverse evidence sets for research")
+    print("=" * 60 + "\n")
+if __name__ == "__main__":
+    asyncio.run(main())
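
The diff calls `service.deduplicate(evidence, threshold=0.85)` but does not show how `EmbeddingService` implements it. As a rough sketch of one way threshold-based semantic deduplication can work, the code below keeps an item only if its cosine similarity to every already-kept item stays below the threshold. The greedy strategy and the names `cosine_similarity` and `greedy_dedup` are assumptions for illustration, not the project's actual method.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def greedy_dedup(vectors: list[list[float]], threshold: float = 0.85) -> list[int]:
    """Return indices of vectors to keep.

    A vector is dropped when it is at least `threshold` similar to any
    vector that was kept before it, so earlier items win ties.
    """
    kept: list[int] = []
    for i, vec in enumerate(vectors):
        if all(cosine_similarity(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Toy 2-D vectors: the second is nearly parallel to the first.
    print(greedy_dedup([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]))
```

The search demo in the diff converts ChromaDB's cosine distance with `similarity = 1 - r["distance"]`, so a similarity threshold of 0.85 corresponds to a distance of 0.15 under that convention.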

examples/full_stack_demo/run_full.py ADDED

	@@ -0,0 +1,235 @@

+#!/usr/bin/env python3
+"""
+Demo: Full Stack DeepCritical Agent (Phases 1-8).
+This script demonstrates the COMPLETE REAL drug repurposing research pipeline:
+- Phase 2: REAL Search (PubMed + Web API calls)
+- Phase 6: REAL Embeddings (sentence-transformers + ChromaDB)
+- Phase 7: REAL Hypothesis (LLM mechanistic reasoning)
+- Phase 3: REAL Judge (LLM evidence assessment)
+- Phase 8: REAL Report (LLM structured scientific report)
+NO MOCKS. NO FAKE DATA. REAL SCIENCE.
+Usage:
+    uv run python examples/full_stack_demo/run_full.py "metformin Alzheimer's"
+    uv run python examples/full_stack_demo/run_full.py "sildenafil heart failure" -i 3
+Requires: OPENAI_API_KEY or ANTHROPIC_API_KEY
+"""
+import argparse
+import asyncio
+import os
+import sys
+from typing import Any
+from src.utils.models import Evidence
+def print_header(title: str) -> None:
+    """Print a formatted section header."""
+    print(f"\n{'='*70}")
+    print(f"  {title}")
+    print(f"{'='*70}\n")
+def print_step(step: int, name: str) -> None:
+    """Print a step indicator."""
+    print(f"\n[Step {step}] {name}")
+    print("-" * 50)
+_MAX_DISPLAY_LEN = 600
+def _print_truncated(text: str) -> None:
+    """Print text, truncating if too long."""
+    if len(text) > _MAX_DISPLAY_LEN:
+        print(text[:_MAX_DISPLAY_LEN] + "\n... [truncated for display]")
+    else:
+        print(text)
+async def _run_search_iteration(
+    query: str,
+    iteration: int,
+    evidence_store: dict[str, Any],
+    all_evidence: list[Evidence],
+    search_handler: Any,
+    embedding_service: Any,
+) -> list[Evidence]:
+    """Run a single search iteration with deduplication."""
+    search_queries = [query]
+    if evidence_store.get("hypotheses"):
+        for h in evidence_store["hypotheses"][-2:]:
+            search_queries.extend(h.search_suggestions[:1])
+    for q in search_queries[:2]:
+        result = await search_handler.execute(q, max_results_per_tool=5)
+        print(f"  '{q}' -> {result.total_found} results")
+        new_unique = await embedding_service.deduplicate(result.evidence)
+        print(f"  After dedup: {len(new_unique)} unique")
+        all_evidence.extend(new_unique)
+    evidence_store["current"] = all_evidence
+    evidence_store["iteration_count"] = iteration
+    return all_evidence
+async def _handle_judge_step(
+    judge_handler: Any, query: str, all_evidence: list[Evidence], evidence_store: dict[str, Any]
+) -> tuple[bool, str]:
+    """Handle the judge assessment step. Returns (should_stop, next_query)."""
+    print("\n[Judge] Assessing evidence quality (REAL LLM)...")
+    assessment = await judge_handler.assess(query, all_evidence)
+    print(f"  Mechanism Score: {assessment.details.mechanism_score}/10")
+    print(f"  Clinical Score:  {assessment.details.clinical_evidence_score}/10")
+    print(f"  Confidence:      {assessment.confidence:.0%}")
+    print(f"  Recommendation:  {assessment.recommendation.upper()}")
+    if assessment.recommendation == "synthesize":
+        print("\n[Judge] Evidence sufficient! Proceeding to report generation...")
+        evidence_store["last_assessment"] = assessment.details.model_dump()
+        return True, query
+    next_queries = assessment.next_search_queries[:2] if assessment.next_search_queries else []
+    if next_queries:
+        print(f"\n[Judge] Need more evidence. Next queries: {next_queries}")
+        return False, next_queries[0]
+    print(
+        "\n[Judge] Need more evidence but no suggested queries. " "Continuing with original query."
+    )
+    return False, query
+async def run_full_demo(query: str, max_iterations: int) -> None:
+    """Run the REAL full stack pipeline."""
+    print_header("DeepCritical Full Stack Demo (REAL)")
+    print(f"Query: {query}")
+    print(f"Max iterations: {max_iterations}")
+    print("Mode: REAL (All live API calls - no mocks)\n")
+    # Import real components
+    from src.agent_factory.judges import JudgeHandler
+    from src.agents.hypothesis_agent import HypothesisAgent
+    from src.agents.report_agent import ReportAgent
+    from src.services.embeddings import EmbeddingService
+    from src.tools.pubmed import PubMedTool
+    from src.tools.search_handler import SearchHandler
+    from src.tools.websearch import WebTool
+    # Initialize REAL services
+    print("[Init] Loading embedding model...")
+    embedding_service = EmbeddingService()
+    search_handler = SearchHandler(tools=[PubMedTool(), WebTool()], timeout=30.0)
+    judge_handler = JudgeHandler()
+    # Shared evidence store
+    evidence_store: dict[str, Any] = {"current": [], "hypotheses": [], "iteration_count": 0}
+    all_evidence: list[Evidence] = []
+    for iteration in range(1, max_iterations + 1):
+        print_step(iteration, f"ITERATION {iteration}/{max_iterations}")
+        # Step 1: REAL Search
+        print("\n[Search] Querying PubMed and Web (REAL API calls)...")
+        all_evidence = await _run_search_iteration(
+            query, iteration, evidence_store, all_evidence, search_handler, embedding_service
+        )
+        if not all_evidence:
+            print("\nNo evidence found. Try a different query.")
+            return
+        # Step 2: REAL Hypothesis generation (first iteration only)
+        if iteration == 1:
+            print("\n[Hypothesis] Generating mechanistic hypotheses (REAL LLM)...")
+            hypothesis_agent = HypothesisAgent(evidence_store, embedding_service)
+            hyp_response = await hypothesis_agent.run(query)
+            _print_truncated(hyp_response.messages[0].text)
+        # Step 3: REAL Judge
+        should_stop, query = await _handle_judge_step(
+            judge_handler, query, all_evidence, evidence_store
+        )
+        if should_stop:
+            break
+    # Step 4: REAL Report generation
+    print_step(iteration + 1, "REPORT GENERATION (REAL LLM)")
+    report_agent = ReportAgent(evidence_store, embedding_service)
+    report_response = await report_agent.run(query)
+    print("\n" + "=" * 70)
+    print("  FINAL RESEARCH REPORT")
+    print("=" * 70)
+    print(report_response.messages[0].text)
+async def main() -> None:
+    """Entry point."""
+    parser = argparse.ArgumentParser(
+        description="DeepCritical Full Stack Demo - REAL, No Mocks",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+This demo runs the COMPLETE pipeline with REAL API calls:
+  1. REAL search: Actual PubMed + DuckDuckGo queries
+  2. REAL embeddings: Actual sentence-transformers model
+  3. REAL hypothesis: Actual LLM generating mechanistic chains
+  4. REAL judge: Actual LLM assessing evidence quality
+  5. REAL report: Actual LLM generating structured report
+Examples:
+    uv run python examples/full_stack_demo/run_full.py "metformin Alzheimer's"
+    uv run python examples/full_stack_demo/run_full.py "sildenafil heart failure" -i 3
+    uv run python examples/full_stack_demo/run_full.py "aspirin cancer prevention"
+        """,
+    )
+    parser.add_argument(
+        "query",
+        help="Research query (e.g., 'metformin Alzheimer's disease')",
+    )
+    parser.add_argument(
+        "-i",
+        "--iterations",
+        type=int,
+        default=2,
+        help="Max search iterations (default: 2)",
+    )
+    args = parser.parse_args()
+    if args.iterations < 1:
+        print("Error: iterations must be at least 1")
+        sys.exit(1)
+    # Fail fast: require API key
+    if not (os.getenv("OPENAI_API_KEY") or os.getenv("ANTHROPIC_API_KEY")):
+        print("=" * 70)
+        print("ERROR: This demo requires a real LLM.")
+        print()
+        print("Set one of the following in your .env file:")
+        print("  OPENAI_API_KEY=sk-...")
+        print("  ANTHROPIC_API_KEY=sk-ant-...")
+        print()
+        print("This is a REAL demo. No mocks. No fake data.")
+        print("=" * 70)
+        sys.exit(1)
+    await run_full_demo(args.query, args.iterations)
+    print("\n" + "=" * 70)
+    print("  DeepCritical Full Stack Demo Complete!")
+    print("  ")
+    print("  Everything you just saw was REAL:")
+    print("    - Real PubMed/Web searches")
+    print("    - Real embedding computations")
+    print("    - Real LLM reasoning")
+    print("    - Real scientific report")
+    print("=" * 70 + "\n")
+if __name__ == "__main__":
+    asyncio.run(main())
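
The branch logic in `_handle_judge_step` is easier to follow once the printing and the LLM call are stripped away. The sketch below isolates that decision as a pure function. `Assessment` is a hypothetical stand-in with only the two fields the decision reads; the real assessment object in `src` has more fields and is not shown in this diff.

```python
from dataclasses import dataclass, field


@dataclass
class Assessment:
    """Hypothetical stand-in for the judge's result (illustration only)."""

    recommendation: str
    next_search_queries: list[str] = field(default_factory=list)


def next_step(query: str, assessment: Assessment) -> tuple[bool, str]:
    """Return (should_stop, next_query), mirroring _handle_judge_step."""
    if assessment.recommendation == "synthesize":
        # Evidence is sufficient: stop and keep the current query.
        return True, query
    if assessment.next_search_queries:
        # More evidence needed: continue with the judge's first suggestion.
        return False, assessment.next_search_queries[0]
    # More evidence needed but nothing suggested: retry the same query.
    return False, query


if __name__ == "__main__":
    print(next_step("metformin Alzheimer's", Assessment("continue", ["metformin tau"])))
```

One consequence worth checking in review: `run_full_demo` assigns the returned string back to `query`, so after a "continue" round the final `report_agent.run(query)` receives the judge's suggested query, not the user's original one. That may be intended, but the diff does not say.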

examples/hypothesis_demo/run_hypothesis.py ADDED

	@@ -0,0 +1,139 @@

+#!/usr/bin/env python3
+"""
+Demo: Hypothesis Generation (Phase 7).
+This script demonstrates the REAL hypothesis generation pipeline:
+1. REAL search: PubMed + Web (actual API calls)
+2. REAL embeddings: Semantic deduplication
+3. REAL LLM: Mechanistic hypothesis generation
+Usage:
+    # Requires OPENAI_API_KEY or ANTHROPIC_API_KEY
+    uv run python examples/hypothesis_demo/run_hypothesis.py "metformin Alzheimer's"
+    uv run python examples/hypothesis_demo/run_hypothesis.py "sildenafil heart failure"
+"""
+import argparse
+import asyncio
+import os
+import sys
+from typing import Any
+from src.agents.hypothesis_agent import HypothesisAgent
+from src.services.embeddings import EmbeddingService
+from src.tools.pubmed import PubMedTool
+from src.tools.search_handler import SearchHandler
+from src.tools.websearch import WebTool
+async def run_hypothesis_demo(query: str) -> None:
+    """Run the REAL hypothesis generation pipeline."""
+    try:
+        print(f"\n{'='*60}")
+        print("DeepCritical Hypothesis Agent Demo (Phase 7)")
+        print(f"Query: {query}")
+        print("Mode: REAL (Live API calls)")
+        print(f"{'='*60}\n")
+        # Step 1: REAL Search
+        print("[Step 1] Searching PubMed + Web...")
+        search_handler = SearchHandler(tools=[PubMedTool(), WebTool()], timeout=30.0)
+        result = await search_handler.execute(query, max_results_per_tool=5)
+        print(f"  Found {result.total_found} results from {result.sources_searched}")
+        if result.errors:
+            print(f"  Warnings: {result.errors}")
+        if not result.evidence:
+            print("\nNo evidence found. Try a different query.")
+            return
+        # Step 2: REAL Embeddings - Deduplicate
+        print("\n[Step 2] Semantic deduplication...")
+        embedding_service = EmbeddingService()
+        unique_evidence = await embedding_service.deduplicate(result.evidence, threshold=0.85)
+        print(f"  {len(result.evidence)} -> {len(unique_evidence)} unique papers")
+        # Show what we found
+        print("\n[Evidence collected]")
+        max_title_len = 50
+        for i, e in enumerate(unique_evidence[:5], 1):
+            raw_title = e.citation.title
+            if len(raw_title) > max_title_len:
+                title = raw_title[:max_title_len] + "..."
+            else:
+                title = raw_title
+            print(f"  {i}. [{e.citation.source.upper()}] {title}")
+        # Step 3: REAL LLM - Generate hypotheses
+        print("\n[Step 3] Generating mechanistic hypotheses (LLM)...")
+        evidence_store: dict[str, Any] = {"current": unique_evidence, "hypotheses": []}
+        agent = HypothesisAgent(evidence_store, embedding_service)
+        print("-" * 60)
+        response = await agent.run(query)
+        print(response.messages[0].text)
+        print("-" * 60)
+        # Show stored hypotheses
+        hypotheses = evidence_store.get("hypotheses", [])
+        print(f"\n{len(hypotheses)} hypotheses stored")
+        if hypotheses:
+            print("\nGenerated search queries for further investigation:")
+            for h in hypotheses:
+                queries = h.to_search_queries()
+                print(f"  {h.drug} -> {h.target}:")
+                for q in queries[:3]:
+                    print(f"    - {q}")
+    except Exception as e:
+        print(f"\n❌ Error during hypothesis generation: {e}")
+        raise
+async def main() -> None:
+    """Entry point."""
+    parser = argparse.ArgumentParser(
+        description="Hypothesis Generation Demo (REAL - No Mocks)",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+    uv run python examples/hypothesis_demo/run_hypothesis.py "metformin Alzheimer's"
+    uv run python examples/hypothesis_demo/run_hypothesis.py "sildenafil heart failure"
+    uv run python examples/hypothesis_demo/run_hypothesis.py "aspirin cancer prevention"
+        """,
+    )
+    parser.add_argument(
+        "query",
+        nargs="?",
+        default="metformin Alzheimer's disease",
+        help="Research query",
+    )
+    args = parser.parse_args()
+    # Fail fast: require API key
+    if not (os.getenv("OPENAI_API_KEY") or os.getenv("ANTHROPIC_API_KEY")):
+        print("=" * 60)
+        print("ERROR: This demo requires a real LLM.")
+        print()
+        print("Set one of the following in your .env file:")
+        print("  OPENAI_API_KEY=sk-...")
+        print("  ANTHROPIC_API_KEY=sk-ant-...")
+        print()
+        print("This is a REAL demo, not a mock. No fake data.")
+        print("=" * 60)
+        sys.exit(1)
+    await run_hypothesis_demo(args.query)
+    print("\n" + "=" * 60)
+    print("Demo complete! This was a REAL pipeline:")
+    print("  1. REAL search: Actual PubMed + Web API calls")
+    print("  2. REAL embeddings: Actual sentence-transformers")
+    print("  3. REAL LLM: Actual hypothesis generation")
+    print("=" * 60 + "\n")
+if __name__ == "__main__":
+    asyncio.run(main())
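
The same fail-fast API key check appears in `run_full.py`, `run_hypothesis.py`, and `run_agent.py` in this commit. A possible shared helper is sketched below, assuming the scripts only need to know whether at least one of the two keys is set. `has_llm_key` and `require_llm_key` are hypothetical names, not existing project functions.

```python
import os
import sys
from typing import Mapping, Optional

LLM_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def has_llm_key(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if any supported LLM key is present and non-empty."""
    env = os.environ if env is None else env
    return any(env.get(name) for name in LLM_KEYS)


def require_llm_key(width: int = 60) -> None:
    """Print the same style of banner as the demos and exit with status 1."""
    if has_llm_key():
        return
    print("=" * width)
    print("ERROR: This demo requires a real LLM.")
    print()
    print("Set one of the following in your .env file:")
    for name in LLM_KEYS:
        print(f"  {name}=...")
    print("=" * width)
    sys.exit(1)
```

The demos read keys with `os.getenv`, so a value in `.env` only counts if something loads that file into the process environment before the check runs. Whether an imported `src` module does this is not visible in this diff.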

examples/orchestrator_demo/run_agent.py CHANGED

@@ -1,19 +1,20 @@
 #!/usr/bin/env python3
 """
-Demo: Full DeepCritical Agent Loop (Search + Judge + Orchestrator).
-This script demonstrates Phase 4 functionality:
-- Iterative Search (PubMed + Web)
-- Evidence Evaluation (Judge Agent)
-- Orchestration Loop
-- Final Synthesis
-Usage:
-    # Run with Mock Judge (No API Key needed)
-    uv run python examples/orchestrator_demo/run_agent.py "metformin cancer" --mock
-    # Run with Real Judge (Requires OPENAI_API_KEY or ANTHROPIC_API_KEY)
     uv run python examples/orchestrator_demo/run_agent.py "metformin cancer"
 """
 import argparse
@@ -21,61 +22,90 @@ import asyncio
 import os
 import sys
-from src.agent_factory.judges import JudgeHandler, MockJudgeHandler
 from src.orchestrator import Orchestrator
 from src.tools.pubmed import PubMedTool
 from src.tools.search_handler import SearchHandler
 from src.tools.websearch import WebTool
 from src.utils.models import OrchestratorConfig
 async def main() -> None:
-    """Run the agent demo."""
-    parser = argparse.ArgumentParser(description="Run DeepCritical Agent CLI")
     parser.add_argument("query", help="Research query (e.g., 'metformin cancer')")
-    parser.add_argument("--mock", action="store_true", help="Use Mock Judge (no API key needed)")
-    parser.add_argument("--iterations", type=int, default=3, help="Max iterations")
     args = parser.parse_args()
-    # Check for keys if not mocking
-    if not args.mock and not (os.getenv("OPENAI_API_KEY") or os.getenv("ANTHROPIC_API_KEY")):
-        print("Error: No API key found. Set OPENAI_API_KEY or ANTHROPIC_API_KEY, or use --mock.")
         sys.exit(1)
     print(f"\n{'='*60}")
-    print("DeepCritical Agent Demo")
     print(f"Query: {args.query}")
-    print(f"Mode: {'MOCK' if args.mock else 'REAL (LLM)'}")
-    print(f"{ '='*60}\n")
-    # 1. Setup Search Tools
     search_handler = SearchHandler(tools=[PubMedTool(), WebTool()], timeout=30.0)
-    # 2. Setup Judge
-    judge_handler: JudgeHandler | MockJudgeHandler
-    if args.mock:
-        judge_handler = MockJudgeHandler()
-    else:
-        judge_handler = JudgeHandler()
-    # 3. Setup Orchestrator
     config = OrchestratorConfig(max_iterations=args.iterations)
     orchestrator = Orchestrator(
         search_handler=search_handler, judge_handler=judge_handler, config=config
     )
-    # 4. Run Loop
     try:
         async for event in orchestrator.run(args.query):
-            # Print event with icon
             print(event.to_markdown().replace("**", ""))
-            # If we got data, print a snippet
             if event.type == "search_complete" and event.data:
                 print(f"   -> Found {event.data.get('new_count', 0)} new items")
     except Exception as e:
         print(f"\n❌ Error: {e}")
 if __name__ == "__main__":

 #!/usr/bin/env python3
 """
+Demo: DeepCritical Agent Loop (Search + Judge + Orchestrator).
+This script demonstrates the REAL Phase 4 orchestration:
+- REAL Iterative Search (PubMed + Web API calls)
+- REAL Evidence Evaluation (LLM Judge)
+- REAL Orchestration Loop
+- REAL Final Synthesis
+NO MOCKS. REAL API CALLS.
+Usage:
     uv run python examples/orchestrator_demo/run_agent.py "metformin cancer"
+    uv run python examples/orchestrator_demo/run_agent.py "sildenafil heart failure" --iterations 5
+Requires: OPENAI_API_KEY or ANTHROPIC_API_KEY
 """
 import argparse
 import os
 import sys
+from src.agent_factory.judges import JudgeHandler
 from src.orchestrator import Orchestrator
 from src.tools.pubmed import PubMedTool
 from src.tools.search_handler import SearchHandler
 from src.tools.websearch import WebTool
 from src.utils.models import OrchestratorConfig
+MAX_ITERATIONS = 10
 async def main() -> None:
+    """Run the REAL agent demo."""
+    parser = argparse.ArgumentParser(
+        description="DeepCritical Agent Demo - REAL, No Mocks",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+This demo runs the REAL search-judge-synthesize loop:
+  1. REAL search: Actual PubMed + DuckDuckGo queries
+  2. REAL judge: Actual LLM assessing evidence quality
+  3. REAL loop: Actual iterative refinement based on LLM decisions
+  4. REAL synthesis: Actual research summary generation
+Examples:
+    uv run python examples/orchestrator_demo/run_agent.py "metformin cancer"
+    uv run python examples/orchestrator_demo/run_agent.py "aspirin alzheimer" --iterations 5
+        """,
+    )
     parser.add_argument("query", help="Research query (e.g., 'metformin cancer')")
+    parser.add_argument("--iterations", type=int, default=3, help="Max iterations (default: 3)")
     args = parser.parse_args()
+    if not 1 <= args.iterations <= MAX_ITERATIONS:
+        print(f"Error: iterations must be between 1 and {MAX_ITERATIONS}")
+        sys.exit(1)
+    # Fail fast: require API key
+    if not (os.getenv("OPENAI_API_KEY") or os.getenv("ANTHROPIC_API_KEY")):
+        print("=" * 60)
+        print("ERROR: This demo requires a real LLM.")
+        print()
+        print("Set one of the following in your .env file:")
+        print("  OPENAI_API_KEY=sk-...")
+        print("  ANTHROPIC_API_KEY=sk-ant-...")
+        print()
+        print("This is a REAL demo. No mocks. No fake data.")
+        print("=" * 60)
         sys.exit(1)
     print(f"\n{'='*60}")
+    print("DeepCritical Agent Demo (REAL)")
     print(f"Query: {args.query}")
+    print(f"Max Iterations: {args.iterations}")
+    print("Mode: REAL (All live API calls)")
+    print(f"{'='*60}\n")
+    # Setup REAL components
     search_handler = SearchHandler(tools=[PubMedTool(), WebTool()], timeout=30.0)
+    judge_handler = JudgeHandler()  # REAL LLM judge
     config = OrchestratorConfig(max_iterations=args.iterations)
     orchestrator = Orchestrator(
         search_handler=search_handler, judge_handler=judge_handler, config=config
     )
+    # Run the REAL loop
     try:
         async for event in orchestrator.run(args.query):
+            # Print event with icon (remove markdown bold for CLI)
             print(event.to_markdown().replace("**", ""))
+            # Show search results count
             if event.type == "search_complete" and event.data:
                 print(f"   -> Found {event.data.get('new_count', 0)} new items")
     except Exception as e:
         print(f"\n❌ Error: {e}")
+        raise
+    print("\n" + "=" * 60)
+    print("Demo complete! Everything was REAL:")
+    print("  - Real PubMed/Web searches")
+    print("  - Real LLM judge decisions")
+    print("  - Real iterative refinement")
+    print("=" * 60 + "\n")
 if __name__ == "__main__":
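
The new `--iterations` bounds check runs after `parse_args()`. An alternative, sketched here with the standard library only, moves the check into an argparse `type` callable so the parser itself rejects out-of-range values. `bounded_iterations` and `build_parser` are illustrative names; `MAX_ITERATIONS = 10` and the default of 3 are taken from the diff.

```python
import argparse

MAX_ITERATIONS = 10


def bounded_iterations(value: str) -> int:
    """argparse type: parse an int and enforce 1 <= n <= MAX_ITERATIONS."""
    try:
        n = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid int value: {value!r}") from None
    if not 1 <= n <= MAX_ITERATIONS:
        raise argparse.ArgumentTypeError(
            f"iterations must be between 1 and {MAX_ITERATIONS}"
        )
    return n


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the same two arguments as the demo."""
    parser = argparse.ArgumentParser(description="DeepCritical Agent Demo (sketch)")
    parser.add_argument("query", help="Research query (e.g., 'metformin cancer')")
    parser.add_argument(
        "--iterations",
        type=bounded_iterations,
        default=3,
        help="Max iterations (default: 3)",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args(["metformin cancer", "--iterations", "5"]))
```

With this approach argparse prints the usage line with the error and exits with status 2, where the script in the diff exits with status 1, so the two are not drop-in equivalents if callers depend on the exit code.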

src/prompts/report.py CHANGED

@@ -25,6 +25,32 @@ A good report:
 Write in scientific but accessible language. Be specific about evidence strength.
 ─────────────────────────────────────────────────────────────────────────────
 🚨 CRITICAL CITATION REQUIREMENTS 🚨
 ─────────────────────────────────────────────────────────────────────────────

 Write in scientific but accessible language. Be specific about evidence strength.
+─────────────────────────────────────────────────────────────────────────────
+🚨 CRITICAL: REQUIRED JSON STRUCTURE 🚨
+─────────────────────────────────────────────────────────────────────────────
+The `hypotheses_tested` field MUST be a LIST of objects, each with these fields:
+- "hypothesis": the hypothesis text
+- "supported": count of supporting evidence (integer)
+- "contradicted": count of contradicting evidence (integer)
+Example:
+  hypotheses_tested: [
+    {"hypothesis": "Metformin -> AMPK -> reduced inflammation", "supported": 3, "contradicted": 1},
+    {"hypothesis": "Aspirin inhibits COX-2 pathway", "supported": 5, "contradicted": 0}
+  ]
+The `references` field MUST be a LIST of objects, each with these fields:
+- "title": paper title (string)
+- "authors": author names (string)
+- "source": "pubmed" or "web" (string)
+- "url": the EXACT URL from evidence (string)
+Example:
+  references: [
+    {"title": "Metformin and Cancer", "authors": "Smith et al.", "source": "pubmed", "url": "https://pubmed.ncbi.nlm.nih.gov/12345678/"}
+  ]
 ─────────────────────────────────────────────────────────────────────────────
 🚨 CRITICAL CITATION REQUIREMENTS 🚨
 ─────────────────────────────────────────────────────────────────────────────
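
The structure this prompt asks the LLM to produce can also be checked after the fact. The sketch below validates a parsed payload against the field lists given in the prompt text, using only the standard library. `validate_report` and the constants are hypothetical; the project's actual report model is not shown in this diff and may enforce these rules differently.

```python
from typing import Any

# Field names and types as stated in the prompt text above.
HYPOTHESIS_FIELDS: dict[str, type] = {"hypothesis": str, "supported": int, "contradicted": int}
REFERENCE_FIELDS: dict[str, type] = {"title": str, "authors": str, "source": str, "url": str}
ALLOWED_SOURCES = {"pubmed", "web"}


def _check_items(items: Any, fields: dict[str, type], label: str) -> list[str]:
    """Collect shape errors for one list-of-objects field."""
    if not isinstance(items, list):
        return [f"{label} must be a list"]
    errors: list[str] = []
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            errors.append(f"{label}[{i}] must be an object")
            continue
        for name, expected in fields.items():
            value = item.get(name)
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                errors.append(f"{label}[{i}].{name} must be {expected.__name__}")
    return errors


def validate_report(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the shape is valid."""
    errors = _check_items(payload.get("hypotheses_tested"), HYPOTHESIS_FIELDS, "hypotheses_tested")
    errors += _check_items(payload.get("references"), REFERENCE_FIELDS, "references")
    refs = payload.get("references")
    if isinstance(refs, list):
        for i, ref in enumerate(refs):
            if isinstance(ref, dict) and isinstance(ref.get("source"), str):
                if ref["source"] not in ALLOWED_SOURCES:
                    errors.append(f"references[{i}].source must be 'pubmed' or 'web'")
    return errors
```

Note that the prompt asks for `authors` as a single string, while the `Citation` objects in the embeddings demo carry `authors` as a list of strings, so some conversion between the two presumably happens somewhere outside this diff.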