# Phase 7 Implementation Spec: Hypothesis Agent

**Goal**: Add an agent that generates scientific hypotheses to guide targeted searches.

**Philosophy**: "Don't just find evidence—understand the mechanisms."

**Prerequisite**: Phase 6 complete (embeddings working)

---

## 1. Why a Hypothesis Agent?

Current limitation: **search is reactive, not hypothesis-driven.**

Current flow:

1. User asks about "metformin alzheimer"
2. Search finds papers
3. Judge says "need more evidence"
4. Search runs again with slightly different keywords

With the Hypothesis Agent:

1. User asks about "metformin alzheimer"
2. Search finds initial papers
3. **Hypothesis Agent analyzes**: "Evidence suggests metformin → AMPK activation → autophagy → amyloid clearance"
4. Search can now target: "metformin AMPK", "autophagy neurodegeneration", "amyloid clearance drugs"

**Key insight**: Scientific research is hypothesis-driven. The agent should think like a researcher.

---

## 2. Architecture

### Current (Phase 6)

```
User Query → Magentic Manager
             ├── SearchAgent → Evidence
             └── JudgeAgent → Sufficient? → Synthesize/Continue
```

### Phase 7

```
User Query → Magentic Manager
             ├── SearchAgent → Evidence
             ├── HypothesisAgent → Mechanistic Hypotheses   ← NEW
             └── JudgeAgent → Sufficient? → Synthesize/Continue
                      ↑ Uses hypotheses to guide next search
```

### Shared Context Enhancement

```python
evidence_store = {
    "current": [],
    "embeddings": {},
    "vector_index": None,
    "hypotheses": [],         # NEW: Generated hypotheses
    "tested_hypotheses": [],  # NEW: Hypotheses with supporting/contradicting evidence
}
```

---

## 3. Hypothesis Model

### 3.1 Data Model (`src/utils/models.py`)

```python
class MechanismHypothesis(BaseModel):
    """A scientific hypothesis about drug mechanism."""

    drug: str = Field(description="The drug being studied")
    target: str = Field(description="Molecular target (e.g., AMPK, mTOR)")
    pathway: str = Field(description="Biological pathway affected")
    effect: str = Field(description="Downstream effect on disease")
    confidence: float = Field(ge=0, le=1, description="Confidence in hypothesis")
    supporting_evidence: list[str] = Field(
        default_factory=list,
        description="PMIDs or URLs supporting this hypothesis",
    )
    contradicting_evidence: list[str] = Field(
        default_factory=list,
        description="PMIDs or URLs contradicting this hypothesis",
    )
    search_suggestions: list[str] = Field(
        default_factory=list,
        description="Suggested searches to test this hypothesis",
    )

    def to_search_queries(self) -> list[str]:
        """Generate search queries to test this hypothesis."""
        return [
            f"{self.drug} {self.target}",
            f"{self.target} {self.pathway}",
            f"{self.pathway} {self.effect}",
            *self.search_suggestions,
        ]
```

### 3.2 Hypothesis Assessment

```python
class HypothesisAssessment(BaseModel):
    """Assessment of evidence against hypotheses."""

    hypotheses: list[MechanismHypothesis]
    primary_hypothesis: MechanismHypothesis | None = Field(
        description="Most promising hypothesis based on current evidence"
    )
    knowledge_gaps: list[str] = Field(
        description="What we don't know yet"
    )
    recommended_searches: list[str] = Field(
        description="Searches to fill knowledge gaps"
    )
```
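To make the intended use of these models concrete, here is a minimal usage sketch. The field values simply mirror the unit-test fixture in Section 6, and the printed list is what `to_search_queries()` returns for them; nothing here is new API beyond the models defined above.

```python
from src.utils.models import MechanismHypothesis

# Illustrative values only (same as the Section 6 test fixture).
hypothesis = MechanismHypothesis(
    drug="Metformin",
    target="AMPK",
    pathway="mTOR inhibition",
    effect="Reduced cancer cell proliferation",
    confidence=0.75,  # validated against the ge=0 / le=1 bounds
    search_suggestions=["metformin AMPK cancer", "mTOR cancer therapy"],
)

print(hypothesis.to_search_queries())
# ['Metformin AMPK', 'AMPK mTOR inhibition',
#  'mTOR inhibition Reduced cancer cell proliferation',
#  'metformin AMPK cancer', 'mTOR cancer therapy']
```

The LLM-supplied `search_suggestions` complement the mechanical drug/target/pathway concatenations, which can read awkwardly on their own (as the third query above shows).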
---

## 4. Implementation

### 4.0 Text Utilities (`src/utils/text_utils.py`)

> **Why These Utilities?**
>
> The original spec used arbitrary truncation (`evidence[:10]` and `content[:300]`).
> This loses important information at random. These utilities provide:
>
> 1. **Sentence-aware truncation** - cuts at sentence boundaries, not mid-word
> 2. **Diverse evidence selection** - uses embeddings to select varied evidence (MMR)

```python
"""Text processing utilities for evidence handling."""

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from src.services.embeddings import EmbeddingService
    from src.utils.models import Evidence


def truncate_at_sentence(text: str, max_chars: int = 300) -> str:
    """Truncate text at a sentence boundary, preserving meaning.

    Args:
        text: The text to truncate
        max_chars: Maximum characters (default 300)

    Returns:
        Text truncated at the last complete sentence within the limit
    """
    if len(text) <= max_chars:
        return text

    # Find the truncation point
    truncated = text[:max_chars]

    # Look for sentence endings: . ! ? followed by a space or newline
    for sep in ['. ', '! ', '? ', '.\n', '!\n', '?\n']:
        last_sep = truncated.rfind(sep)
        if last_sep > max_chars // 2:  # Don't truncate too aggressively
            return text[:last_sep + 1].strip()

    # Fallback: find the last period
    last_period = truncated.rfind('.')
    if last_period > max_chars // 2:
        return text[:last_period + 1].strip()

    # Last resort: truncate at a word boundary
    last_space = truncated.rfind(' ')
    if last_space > 0:
        return text[:last_space].strip() + "..."

    return truncated + "..."


async def select_diverse_evidence(
    evidence: list["Evidence"],
    n: int,
    query: str,
    embeddings: "EmbeddingService | None" = None,
) -> list["Evidence"]:
    """Select the n most diverse and relevant evidence items.

    Uses Maximal Marginal Relevance (MMR) when embeddings are available,
    falls back to relevance_score sorting otherwise.

    Args:
        evidence: All available evidence
        n: Number of items to select
        query: Original query for relevance scoring
        embeddings: Optional EmbeddingService for semantic diversity

    Returns:
        Selected evidence items, diverse and relevant
    """
    if not evidence:
        return []
    if n >= len(evidence):
        return evidence

    # Fallback: sort by relevance score if no embeddings are available
    if embeddings is None:
        return sorted(
            evidence, key=lambda e: e.relevance_score, reverse=True
        )[:n]

    # MMR: Maximal Marginal Relevance for diverse selection
    # Score = λ * relevance - (1 - λ) * max_similarity_to_selected
    lambda_param = 0.7  # Balance relevance vs diversity

    from numpy import dot
    from numpy.linalg import norm

    def cosine(a, b) -> float:
        return float(dot(a, b) / (norm(a) * norm(b)))

    # Embed the query and all evidence content
    query_emb = await embeddings.embed(query)
    evidence_embs = await embeddings.embed_batch([e.content for e in evidence])

    # Compute relevance scores (cosine similarity to the query)
    relevance_scores = [cosine(query_emb, emb) for emb in evidence_embs]

    # Greedy MMR selection
    selected_indices: list[int] = []
    remaining = set(range(len(evidence)))

    for _ in range(n):
        best_score = float('-inf')
        best_idx = -1

        for idx in remaining:
            # Relevance component
            relevance = relevance_scores[idx]

            # Diversity component: max similarity to the already-selected items
            if selected_indices:
                max_sim = max(
                    cosine(evidence_embs[idx], evidence_embs[sel])
                    for sel in selected_indices
                )
            else:
                max_sim = 0.0

            # MMR score
            mmr_score = lambda_param * relevance - (1 - lambda_param) * max_sim
            if mmr_score > best_score:
                best_score = mmr_score
                best_idx = idx

        if best_idx >= 0:
            selected_indices.append(best_idx)
            remaining.remove(best_idx)

    return [evidence[i] for i in selected_indices]
```
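A quick usage sketch of the two utilities. The sample abstract is made up; `select_diverse_evidence` is shown only as a call, since it needs real `Evidence` objects and, for the MMR path, the Phase 6 `EmbeddingService`.

```python
from src.utils.text_utils import truncate_at_sentence

# Sentence-aware truncation keeps whole sentences under the limit
# instead of cutting mid-word.
abstract = "Metformin activates AMPK. AMPK inhibits mTOR. mTOR inhibition promotes autophagy."
print(truncate_at_sentence(abstract, max_chars=60))
# -> Metformin activates AMPK. AMPK inhibits mTOR.

# select_diverse_evidence() is async. With embeddings=None it sorts by
# relevance_score; with an EmbeddingService it runs the greedy MMR loop above:
#   selected = await select_diverse_evidence(evidence, n=10, query=query, embeddings=embeddings)
```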
### 4.1 Hypothesis Prompts (`src/prompts/hypothesis.py`)

```python
"""Prompts for the Hypothesis Agent."""

from src.utils.text_utils import truncate_at_sentence, select_diverse_evidence

SYSTEM_PROMPT = """You are a biomedical research scientist specializing in drug repurposing.
Your role is to generate mechanistic hypotheses based on evidence.

A good hypothesis:
1. Proposes a MECHANISM: Drug → Target → Pathway → Effect
2. Is TESTABLE: Can be supported or refuted by literature search
3. Is SPECIFIC: Names actual molecular targets and pathways
4. Generates SEARCH QUERIES: Helps find more evidence

Example hypothesis format:
- Drug: Metformin
- Target: AMPK (AMP-activated protein kinase)
- Pathway: mTOR inhibition → autophagy activation
- Effect: Enhanced clearance of amyloid-beta in Alzheimer's
- Confidence: 0.7
- Search suggestions: ["metformin AMPK brain", "autophagy amyloid clearance"]

Be specific. Use actual gene/protein names when possible."""


async def format_hypothesis_prompt(
    query: str,
    evidence: list,
    embeddings=None,
) -> str:
    """Format the prompt for hypothesis generation.

    Uses smart evidence selection instead of arbitrary truncation.

    Args:
        query: The research query
        evidence: All collected evidence
        embeddings: Optional EmbeddingService for diverse selection
    """
    # Select diverse, relevant evidence (not an arbitrary first 10)
    selected = await select_diverse_evidence(
        evidence, n=10, query=query, embeddings=embeddings
    )

    # Format with sentence-aware truncation
    evidence_text = "\n".join([
        f"- **{e.citation.title}** ({e.citation.source}): {truncate_at_sentence(e.content, 300)}"
        for e in selected
    ])

    return f"""Based on the following evidence about "{query}", generate mechanistic hypotheses.

## Evidence ({len(selected)} papers selected for diversity)
{evidence_text}

## Task
1. Identify potential drug targets mentioned in the evidence
2. Propose mechanism hypotheses (Drug → Target → Pathway → Effect)
3. Rate confidence based on evidence strength
4. Suggest searches to test each hypothesis

Generate 2-4 hypotheses, prioritized by confidence."""
```

### 4.2 Hypothesis Agent (`src/agents/hypothesis_agent.py`)

```python
"""Hypothesis agent for mechanistic reasoning."""

from collections.abc import AsyncIterable
from typing import TYPE_CHECKING, Any

from agent_framework import (
    AgentRunResponse,
    AgentRunResponseUpdate,
    AgentThread,
    BaseAgent,
    ChatMessage,
    Role,
)
from pydantic_ai import Agent

from src.prompts.hypothesis import SYSTEM_PROMPT, format_hypothesis_prompt
from src.utils.config import settings
from src.utils.models import Evidence, HypothesisAssessment

if TYPE_CHECKING:
    from src.services.embeddings import EmbeddingService


class HypothesisAgent(BaseAgent):
    """Generates mechanistic hypotheses based on evidence."""

    def __init__(
        self,
        evidence_store: dict[str, list[Evidence]],
        embedding_service: "EmbeddingService | None" = None,  # NEW: for diverse selection
    ) -> None:
        super().__init__(
            name="HypothesisAgent",
            description="Generates scientific hypotheses about drug mechanisms to guide research",
        )
        self._evidence_store = evidence_store
        self._embeddings = embedding_service  # Used for MMR evidence selection
        self._agent = Agent(
            model=settings.llm_provider,  # Uses the configured LLM
            output_type=HypothesisAssessment,
            system_prompt=SYSTEM_PROMPT,
        )

    async def run(
        self,
        messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
        *,
        thread: AgentThread | None = None,
        **kwargs: Any,
    ) -> AgentRunResponse:
        """Generate hypotheses based on current evidence."""
        # Extract the query
        query = self._extract_query(messages)

        # Get the current evidence
        evidence = self._evidence_store.get("current", [])
        if not evidence:
            return AgentRunResponse(
                messages=[ChatMessage(
                    role=Role.ASSISTANT,
                    text="No evidence available yet. Search for evidence first."
                )],
                response_id="hypothesis-no-evidence",
            )

        # Generate hypotheses with diverse evidence selection
        # NOTE: format_hypothesis_prompt is now async
        prompt = await format_hypothesis_prompt(
            query, evidence, embeddings=self._embeddings
        )
        result = await self._agent.run(prompt)
        assessment = result.output

        # Store hypotheses in the shared context
        existing = self._evidence_store.get("hypotheses", [])
        self._evidence_store["hypotheses"] = existing + assessment.hypotheses

        # Format the response
        response_text = self._format_response(assessment)

        return AgentRunResponse(
            messages=[ChatMessage(role=Role.ASSISTANT, text=response_text)],
            response_id=f"hypothesis-{len(assessment.hypotheses)}",
            additional_properties={"assessment": assessment.model_dump()},
        )

    def _format_response(self, assessment: HypothesisAssessment) -> str:
        """Format a hypothesis assessment as markdown."""
        lines = ["## Generated Hypotheses\n"]

        for i, h in enumerate(assessment.hypotheses, 1):
            lines.append(f"### Hypothesis {i} (Confidence: {h.confidence:.0%})")
            lines.append(f"**Mechanism**: {h.drug} → {h.target} → {h.pathway} → {h.effect}")
            lines.append(f"**Suggested searches**: {', '.join(h.search_suggestions)}\n")

        if assessment.primary_hypothesis:
            lines.append("### Primary Hypothesis")
            h = assessment.primary_hypothesis
            lines.append(f"{h.drug} → {h.target} → {h.pathway} → {h.effect}\n")

        if assessment.knowledge_gaps:
            lines.append("### Knowledge Gaps")
            for gap in assessment.knowledge_gaps:
                lines.append(f"- {gap}")

        if assessment.recommended_searches:
            lines.append("\n### Recommended Next Searches")
            for search in assessment.recommended_searches:
                lines.append(f"- `{search}`")

        return "\n".join(lines)

    def _extract_query(self, messages) -> str:
        """Extract the query from messages."""
        if isinstance(messages, str):
            return messages
        elif isinstance(messages, ChatMessage):
            return messages.text or ""
        elif isinstance(messages, list):
            for msg in reversed(messages):
                if isinstance(msg, ChatMessage) and msg.role == Role.USER:
                    return msg.text or ""
                elif isinstance(msg, str):
                    return msg
        return ""

    async def run_stream(
        self,
        messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
        *,
        thread: AgentThread | None = None,
        **kwargs: Any,
    ) -> AsyncIterable[AgentRunResponseUpdate]:
        """Streaming wrapper around run()."""
        result = await self.run(messages, thread=thread, **kwargs)
        yield AgentRunResponseUpdate(
            messages=result.messages,
            response_id=result.response_id,
        )
```

### 4.3 Update MagenticOrchestrator

Add the HypothesisAgent to the workflow:

```python
# In MagenticOrchestrator.__init__
# (optionally pass the Phase 6 embedding service so the agent can use MMR evidence selection)
self._hypothesis_agent = HypothesisAgent(self._evidence_store)

# In workflow building
workflow = (
    MagenticBuilder()
    .participants(
        searcher=search_agent,
        hypothesizer=self._hypothesis_agent,  # NEW
        judge=judge_agent,
    )
    .with_standard_manager(...)
    .build()
)

# Update the task instruction
task = f"""Research drug repurposing opportunities for: {query}

Workflow:
1. SearchAgent: Find initial evidence from PubMed and web
2. HypothesisAgent: Generate mechanistic hypotheses (Drug → Target → Pathway → Effect)
3. SearchAgent: Use hypothesis-suggested queries for targeted search
4. JudgeAgent: Evaluate if evidence supports hypotheses
5. Repeat until confident or max rounds

Focus on:
- Identifying specific molecular targets
- Understanding mechanism of action
- Finding supporting/contradicting evidence for hypotheses
"""
```
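The "↑ Uses hypotheses to guide next search" arrow in the Section 2 diagram (and step 3 of the task instruction) implies some glue that turns stored hypotheses into the next round's queries. One possible shape for that glue, sketched as a hypothetical `_hypothesis_queries` helper on `MagenticOrchestrator`; the helper name and the `already_run` bookkeeping are assumptions, not part of the spec above.

```python
# In MagenticOrchestrator -- hypothetical helper, one way to feed stored
# hypotheses back into the SearchAgent's next round.
def _hypothesis_queries(self, already_run: set[str], limit: int = 5) -> list[str]:
    """Collect de-duplicated queries from stored hypotheses, highest confidence first."""
    queries: list[str] = []
    hypotheses = self._evidence_store.get("hypotheses", [])
    for hypothesis in sorted(hypotheses, key=lambda h: h.confidence, reverse=True):
        for query in hypothesis.to_search_queries():
            if query not in already_run and query not in queries:
                queries.append(query)
    return queries[:limit]
```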
---

## 5. Directory Structure After Phase 7

```
src/
├── agents/
│   ├── search_agent.py
│   ├── judge_agent.py
│   └── hypothesis_agent.py      # NEW
├── prompts/
│   ├── judge.py
│   └── hypothesis.py            # NEW
├── services/
│   └── embeddings.py
└── utils/
    ├── models.py                # Updated with hypothesis models
    └── text_utils.py            # NEW: sentence-aware truncation + MMR selection
```

---

## 6. Tests

### 6.1 Unit Tests (`tests/unit/agents/test_hypothesis_agent.py`)

```python
"""Unit tests for HypothesisAgent."""

from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from src.agents.hypothesis_agent import HypothesisAgent
from src.utils.models import Citation, Evidence, HypothesisAssessment, MechanismHypothesis


@pytest.fixture
def sample_evidence():
    return [
        Evidence(
            content="Metformin activates AMPK, which inhibits mTOR signaling...",
            citation=Citation(
                source="pubmed",
                title="Metformin and AMPK",
                url="https://pubmed.ncbi.nlm.nih.gov/12345/",
                date="2023",
            ),
        )
    ]


@pytest.fixture
def mock_assessment():
    return HypothesisAssessment(
        hypotheses=[
            MechanismHypothesis(
                drug="Metformin",
                target="AMPK",
                pathway="mTOR inhibition",
                effect="Reduced cancer cell proliferation",
                confidence=0.75,
                search_suggestions=["metformin AMPK cancer", "mTOR cancer therapy"],
            )
        ],
        primary_hypothesis=None,
        knowledge_gaps=["Clinical trial data needed"],
        recommended_searches=["metformin clinical trial cancer"],
    )


@pytest.mark.asyncio
async def test_hypothesis_agent_generates_hypotheses(sample_evidence, mock_assessment):
    """HypothesisAgent should generate mechanistic hypotheses."""
    store = {"current": sample_evidence, "hypotheses": []}

    with patch("src.agents.hypothesis_agent.Agent") as MockAgent:
        mock_result = MagicMock()
        mock_result.output = mock_assessment
        MockAgent.return_value.run = AsyncMock(return_value=mock_result)

        agent = HypothesisAgent(store)
        response = await agent.run("metformin cancer")

        assert "AMPK" in response.messages[0].text
        assert len(store["hypotheses"]) == 1


@pytest.mark.asyncio
async def test_hypothesis_agent_no_evidence():
    """HypothesisAgent should handle empty evidence gracefully."""
    store = {"current": [], "hypotheses": []}

    # Patch Agent so the test never constructs a real LLM client
    with patch("src.agents.hypothesis_agent.Agent"):
        agent = HypothesisAgent(store)
        response = await agent.run("test query")

    assert "No evidence" in response.messages[0].text
```

---

## 7. Definition of Done

Phase 7 is **COMPLETE** when:

1. `MechanismHypothesis` and `HypothesisAssessment` models implemented
2. `HypothesisAgent` generates hypotheses from evidence
3. Hypotheses stored in shared context
4. Search queries generated from hypotheses
5. Magentic workflow includes HypothesisAgent
6. All unit tests pass

---

## 8. Value Delivered

| Before (Phase 6) | After (Phase 7) |
|------------------|-----------------|
| Reactive search | Hypothesis-driven search |
| Generic queries | Mechanism-targeted queries |
| No scientific reasoning | Drug → Target → Pathway → Effect |
| Judge says "need more" | Hypothesis says "search for X to test Y" |

**Real example improvement:**

- Query: "metformin alzheimer"
- Before: "metformin alzheimer mechanism", "metformin brain"
- After: "metformin AMPK activation", "AMPK autophagy neurodegeneration", "autophagy amyloid clearance"

The search becomes **scientifically targeted** rather than a series of keyword variations.
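Closing the "search for X to test Y" loop also means eventually attaching evidence to a hypothesis and moving it into the `tested_hypotheses` slot reserved in Section 2, a step the phases above leave open. A minimal sketch of what that could look like; the `record_hypothesis_test` helper is hypothetical, and it assumes `citation.url` is an acceptable evidence reference.

```python
# Hypothetical follow-up -- not required for Phase 7's Definition of Done.
from src.utils.models import Evidence, MechanismHypothesis


def record_hypothesis_test(
    evidence_store: dict,
    hypothesis: MechanismHypothesis,
    supporting: list[Evidence],
    contradicting: list[Evidence],
) -> None:
    """Attach evidence references to a hypothesis and mark it as tested."""
    hypothesis.supporting_evidence.extend(e.citation.url for e in supporting)
    hypothesis.contradicting_evidence.extend(e.citation.url for e in contradicting)
    evidence_store.setdefault("tested_hypotheses", []).append(hypothesis)
```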