Phase 13 Implementation Spec: Modal Pipeline Integration
Goal: Wire existing Modal code execution into the agent pipeline.
Philosophy: "Sandboxed execution makes AI-generated code trustworthy."
Prerequisite: Phase 12 complete (MCP server working)
Priority: P1 - HIGH VALUE ($2,500 Modal Innovation Award)
Estimated Time: 2-3 hours
1. Why Modal Integration?
Current State Analysis
Mario already implemented src/tools/code_execution.py:
| Component | Status | Notes |
|---|---|---|
| ModalCodeExecutor class | Built | Executes Python in Modal sandbox |
| SANDBOX_LIBRARIES | Defined | pandas, numpy, scipy, etc. |
| execute() method | Implemented | Stdout/stderr capture |
| execute_with_return() | Implemented | Returns result variable |
| AnalysisAgent | Built | Uses Modal for statistical analysis |
| Pipeline Integration | MISSING | Not wired into main orchestrator |
What's Missing
Current Flow:
User Query → Orchestrator → Search → Judge → [Report] → Done

With Modal:
User Query → Orchestrator → Search → Judge → [Analysis*] → Report → Done
                                                  ↓
                                      Modal Sandbox Execution

*The AnalysisAgent exists but is NOT called by either orchestrator.
2. Critical Dependency Analysis
The Problem (Senior Feedback)
# src/agents/analysis_agent.py - Line 8
from agent_framework import (
AgentRunResponse,
BaseAgent,
...
)
# pyproject.toml - agent-framework is OPTIONAL
[project.optional-dependencies]
magentic = [
"agent-framework-core",
]
If we import AnalysisAgent in the simple orchestrator without the magentic extra installed, the app CRASHES on startup.
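To see the failure concretely, here is a minimal reproduction (illustrative only; the import path is the module shown above, the guard itself is not project code):

```python
# Illustrative reproduction: in a base install (no "magentic" extra),
# this import raises ModuleNotFoundError because agent_framework is absent.
try:
    from src.agents.analysis_agent import AnalysisAgent  # noqa: F401
except ModuleNotFoundError as exc:
    print(f"Startup crash without the magentic extra: {exc}")
```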
The SOLID Solution
Single Responsibility Principle: Decouple Modal execution logic from agent_framework.
BEFORE (Coupled):

AnalysisAgent (requires agent_framework)
        ↓
ModalCodeExecutor

AFTER (Decoupled):

AnalysisAgent (wraps StatisticalAnalyzer)            ← Magentic mode uses this
        ↓
StatisticalAnalyzer (no agent_framework dependency)  ← Simple mode uses this
        ↓
ModalCodeExecutor
Key insight: Create src/services/statistical_analyzer.py with ZERO agent_framework imports.
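In code terms, the import discipline looks roughly like this (a sketch; `load_analysis_agent` is a hypothetical helper, not existing project code):

```python
# Simple mode imports only the service; nothing in this chain touches agent_framework.
from src.services.statistical_analyzer import get_statistical_analyzer

# Magentic mode keeps the agent import behind a guard, so a base install never pays for it.
def load_analysis_agent() -> type:
    try:
        from src.agents.analysis_agent import AnalysisAgent
    except ImportError as exc:  # agent-framework-core not installed
        raise RuntimeError("Install the 'magentic' extra for multi-agent mode") from exc
    return AnalysisAgent
```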
3. Prize Opportunity
Modal Innovation Award: $2,500
Judging Criteria:
- Sandbox Isolation - Code runs in container, not local
- Scientific Computing - Real pandas/scipy analysis
- Safety - Can't access local filesystem
- Speed - Modal's fast cold starts
What We Need to Show
# LLM generates analysis code
code = """
import pandas as pd
import scipy.stats as stats
data = pd.DataFrame({
'study': ['Study1', 'Study2', 'Study3'],
'effect_size': [0.45, 0.52, 0.38],
'sample_size': [120, 85, 200]
})
weighted_mean = (data['effect_size'] * data['sample_size']).sum() / data['sample_size'].sum()
t_stat, p_value = stats.ttest_1samp(data['effect_size'], 0)
print(f"Weighted Effect Size: {weighted_mean:.3f}")
print(f"P-value: {p_value:.4f}")
result = "SUPPORTED" if p_value < 0.05 else "INCONCLUSIVE"
"""
# Executed SAFELY in Modal sandbox
executor = get_code_executor()
output = executor.execute(code) # Runs in isolated container!
4. Technical Specification
4.1 Dependencies
# pyproject.toml - NO CHANGES to dependencies
# StatisticalAnalyzer uses only:
# - pydantic-ai (already in main deps)
# - modal (already in main deps)
# - src.tools.code_execution (no agent_framework)
4.2 Environment Variables
# .env
MODAL_TOKEN_ID=your-token-id
MODAL_TOKEN_SECRET=your-token-secret
4.3 Integration Points
| Integration Point | File | Change Required |
|---|---|---|
| New Service | src/services/statistical_analyzer.py | CREATE (no agent_framework) |
| Simple Orchestrator | src/orchestrator.py | Use StatisticalAnalyzer |
| Config | src/utils/config.py | Add enable_modal_analysis setting |
| AnalysisAgent | src/agents/analysis_agent.py | Refactor to wrap StatisticalAnalyzer |
| MCP Tool | src/mcp_tools.py | Add analyze_hypothesis tool |
5. Implementation
5.1 Configuration Update (src/utils/config.py)
class Settings(BaseSettings):
# ... existing settings ...
# Modal Configuration
modal_token_id: str | None = None
modal_token_secret: str | None = None
enable_modal_analysis: bool = False # Opt-in for hackathon demo
@property
def modal_available(self) -> bool:
"""Check if Modal credentials are configured."""
return bool(self.modal_token_id and self.modal_token_secret)
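A minimal gating sketch using these settings (it mirrors the orchestrator check in section 5.3; `should_run_analysis` is an illustrative helper, not existing code):

```python
from src.utils.config import settings

def should_run_analysis(enable_analysis: bool) -> bool:
    # Analysis runs only when explicitly enabled AND Modal credentials exist.
    return enable_analysis and settings.modal_available
```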
5.2 StatisticalAnalyzer Service (src/services/statistical_analyzer.py)
This is the key fix - NO agent_framework imports.
"""Statistical analysis service using Modal code execution.
This module provides Modal-based statistical analysis WITHOUT depending on
agent_framework. This allows it to be used in the simple orchestrator mode
without requiring the magentic optional dependency.
The AnalysisAgent (in src/agents/) wraps this service for magentic mode.
"""
import asyncio
import re
from functools import partial
from typing import Any
from pydantic import BaseModel, Field
from pydantic_ai import Agent
from src.agent_factory.judges import get_model
from src.tools.code_execution import (
CodeExecutionError,
get_code_executor,
get_sandbox_library_prompt,
)
from src.utils.models import Evidence
class AnalysisResult(BaseModel):
"""Result of statistical analysis."""
verdict: str = Field(
description="SUPPORTED, REFUTED, or INCONCLUSIVE",
)
confidence: float = Field(ge=0.0, le=1.0, description="Confidence in verdict (0-1)")
statistical_evidence: str = Field(
description="Summary of statistical findings from code execution"
)
code_generated: str = Field(description="Python code that was executed")
execution_output: str = Field(description="Output from code execution")
key_findings: list[str] = Field(default_factory=list, description="Key takeaways")
limitations: list[str] = Field(default_factory=list, description="Limitations")
class StatisticalAnalyzer:
"""Performs statistical analysis using Modal code execution.
This service:
1. Generates Python code for statistical analysis using LLM
2. Executes code in Modal sandbox
3. Interprets results
4. Returns verdict (SUPPORTED/REFUTED/INCONCLUSIVE)
Note: This class has NO agent_framework dependency, making it safe
to use in the simple orchestrator without the magentic extra.
"""
def __init__(self) -> None:
"""Initialize the analyzer."""
self._code_executor: Any = None
self._agent: Agent[None, str] | None = None
def _get_code_executor(self) -> Any:
"""Lazy initialization of code executor."""
if self._code_executor is None:
self._code_executor = get_code_executor()
return self._code_executor
def _get_agent(self) -> Agent[None, str]:
"""Lazy initialization of LLM agent for code generation."""
if self._agent is None:
library_versions = get_sandbox_library_prompt()
self._agent = Agent(
model=get_model(),
output_type=str,
system_prompt=f"""You are a biomedical data scientist.
Generate Python code to analyze research evidence and test hypotheses.
Guidelines:
1. Use pandas, numpy, scipy.stats for analysis
2. Print clear, interpretable results
3. Include statistical tests (t-tests, chi-square, etc.)
4. Calculate effect sizes and confidence intervals
5. Keep code concise (<50 lines)
6. Set 'result' variable to SUPPORTED, REFUTED, or INCONCLUSIVE
Available libraries:
{library_versions}
Output format: Return ONLY executable Python code, no explanations.""",
)
return self._agent
async def analyze(
self,
query: str,
evidence: list[Evidence],
hypothesis: dict[str, Any] | None = None,
) -> AnalysisResult:
"""Run statistical analysis on evidence.
Args:
query: The research question
evidence: List of Evidence objects to analyze
hypothesis: Optional hypothesis dict with drug, target, pathway, effect
Returns:
AnalysisResult with verdict and statistics
"""
# Build analysis prompt
evidence_summary = self._summarize_evidence(evidence[:10])
hypothesis_text = ""
if hypothesis:
hypothesis_text = f"""
Hypothesis: {hypothesis.get('drug', 'Unknown')} → {hypothesis.get('target', '?')} → {hypothesis.get('pathway', '?')} → {hypothesis.get('effect', '?')}
Confidence: {hypothesis.get('confidence', 0.5):.0%}
"""
prompt = f"""Generate Python code to statistically analyze:
**Research Question**: {query}
{hypothesis_text}
**Evidence Summary**:
{evidence_summary}
Generate executable Python code to analyze this evidence."""
try:
# Generate code
agent = self._get_agent()
code_result = await agent.run(prompt)
generated_code = code_result.output
# Execute in Modal sandbox
loop = asyncio.get_running_loop()
executor = self._get_code_executor()
execution = await loop.run_in_executor(
None, partial(executor.execute, generated_code, timeout=120)
)
if not execution["success"]:
return AnalysisResult(
verdict="INCONCLUSIVE",
confidence=0.0,
statistical_evidence=f"Execution failed: {execution['error']}",
code_generated=generated_code,
execution_output=execution.get("stderr", ""),
key_findings=[],
limitations=["Code execution failed"],
)
# Interpret results
return self._interpret_results(generated_code, execution)
except CodeExecutionError as e:
return AnalysisResult(
verdict="INCONCLUSIVE",
confidence=0.0,
statistical_evidence=str(e),
code_generated="",
execution_output="",
key_findings=[],
limitations=[f"Analysis error: {e}"],
)
def _summarize_evidence(self, evidence: list[Evidence]) -> str:
"""Summarize evidence for code generation prompt."""
if not evidence:
return "No evidence available."
lines = []
for i, ev in enumerate(evidence[:5], 1):
lines.append(f"{i}. {ev.content[:200]}...")
lines.append(f" Source: {ev.citation.title}")
lines.append(f" Relevance: {ev.relevance:.0%}\n")
return "\n".join(lines)
def _interpret_results(
self,
code: str,
execution: dict[str, Any],
) -> AnalysisResult:
"""Interpret code execution results."""
stdout = execution["stdout"]
stdout_upper = stdout.upper()
# Extract verdict with robust word-boundary matching
verdict = "INCONCLUSIVE"
if re.search(r"\bSUPPORTED\b", stdout_upper) and not re.search(
r"\b(?:NOT|UN)SUPPORTED\b", stdout_upper
):
verdict = "SUPPORTED"
elif re.search(r"\bREFUTED\b", stdout_upper):
verdict = "REFUTED"
# Extract key findings
key_findings = []
for line in stdout.split("\n"):
line_lower = line.lower()
if any(kw in line_lower for kw in ["p-value", "significant", "effect", "mean"]):
key_findings.append(line.strip())
# Calculate confidence from p-values
confidence = self._calculate_confidence(stdout)
return AnalysisResult(
verdict=verdict,
confidence=confidence,
statistical_evidence=stdout.strip(),
code_generated=code,
execution_output=stdout,
key_findings=key_findings[:5],
limitations=[
"Analysis based on summary data only",
"Limited to available evidence",
"Statistical tests assume data independence",
],
)
def _calculate_confidence(self, output: str) -> float:
"""Calculate confidence based on statistical results."""
p_values = re.findall(r"p[-\s]?value[:\s]+(\d+\.?\d*)", output.lower())
if p_values:
try:
min_p = min(float(p) for p in p_values)
if min_p < 0.001:
return 0.95
elif min_p < 0.01:
return 0.90
elif min_p < 0.05:
return 0.80
else:
return 0.60
except ValueError:
pass
return 0.70 # Default
# Singleton for reuse
_analyzer: StatisticalAnalyzer | None = None
def get_statistical_analyzer() -> StatisticalAnalyzer:
"""Get or create singleton StatisticalAnalyzer instance."""
global _analyzer
if _analyzer is None:
_analyzer = StatisticalAnalyzer()
return _analyzer
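As a quick sanity check of the p-value-to-confidence mapping, these assertions follow directly from the thresholds in `_calculate_confidence` above:

```python
analyzer = StatisticalAnalyzer()
assert analyzer._calculate_confidence("P-value: 0.0005") == 0.95  # p < 0.001
assert analyzer._calculate_confidence("p-value: 0.03") == 0.80    # 0.01 <= p < 0.05
assert analyzer._calculate_confidence("no statistics printed") == 0.70  # default
```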
5.3 Simple Orchestrator Update (src/orchestrator.py)
Uses StatisticalAnalyzer directly - NO agent_framework import.
"""Main orchestrator with optional Modal analysis."""
from src.utils.config import settings
# ... existing imports ...
class Orchestrator:
"""Search-Judge-Analyze orchestration loop."""
def __init__(
self,
search_handler: SearchHandlerProtocol,
judge_handler: JudgeHandlerProtocol,
config: OrchestratorConfig | None = None,
enable_analysis: bool = False, # New parameter
) -> None:
self.search = search_handler
self.judge = judge_handler
self.config = config or OrchestratorConfig()
self.history: list[dict[str, Any]] = []
self._enable_analysis = enable_analysis and settings.modal_available
# Lazy-load analysis (NO agent_framework dependency!)
self._analyzer: Any = None
def _get_analyzer(self) -> Any:
"""Lazy initialization of StatisticalAnalyzer.
Note: This imports from src.services, NOT src.agents,
so it works without the magentic optional dependency.
"""
if self._analyzer is None:
from src.services.statistical_analyzer import get_statistical_analyzer
self._analyzer = get_statistical_analyzer()
return self._analyzer
async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
"""Main orchestration loop with optional Modal analysis."""
# ... existing search/judge loop ...
# After judge says "synthesize", optionally run analysis
if self._enable_analysis and assessment.recommendation == "synthesize":
yield AgentEvent(
type="analyzing",
message="Running statistical analysis in Modal sandbox...",
data={},
iteration=iteration,
)
try:
analyzer = self._get_analyzer()
# Run Modal analysis (no agent_framework needed!)
analysis_result = await analyzer.analyze(
query=query,
evidence=all_evidence,
hypothesis=None, # Could add hypothesis generation later
)
yield AgentEvent(
type="analysis_complete",
message=f"Analysis verdict: {analysis_result.verdict}",
data=analysis_result.model_dump(),
iteration=iteration,
)
except Exception as e:
yield AgentEvent(
type="error",
message=f"Modal analysis failed: {e}",
data={"error": str(e)},
iteration=iteration,
)
# Continue to synthesis...
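Wiring this into app startup might look like the following sketch, where `search` and `judge` stand in for however the app already constructs its handler implementations:

```python
import asyncio

from src.orchestrator import Orchestrator
from src.utils.config import settings

async def main() -> None:
    orchestrator = Orchestrator(
        search_handler=search,  # stand-in: existing SearchHandlerProtocol implementation
        judge_handler=judge,    # stand-in: existing JudgeHandlerProtocol implementation
        enable_analysis=settings.enable_modal_analysis,  # opt-in flag from 5.1
    )
    async for event in orchestrator.run("Can metformin treat Alzheimer's disease?"):
        print(event.type, event.message)

asyncio.run(main())
```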
5.4 Refactor AnalysisAgent (src/agents/analysis_agent.py)
Wrap StatisticalAnalyzer for magentic mode.
"""Analysis agent for statistical analysis using Modal code execution.
This agent wraps StatisticalAnalyzer for use in magentic multi-agent mode.
The core logic is in src/services/statistical_analyzer.py to avoid
coupling agent_framework to the simple orchestrator.
"""
from collections.abc import AsyncIterable
from typing import TYPE_CHECKING, Any
from agent_framework import (
AgentRunResponse,
AgentRunResponseUpdate,
AgentThread,
BaseAgent,
ChatMessage,
Role,
)
from src.services.statistical_analyzer import (
AnalysisResult,
get_statistical_analyzer,
)
from src.utils.models import Evidence
if TYPE_CHECKING:
from src.services.embeddings import EmbeddingService
class AnalysisAgent(BaseAgent): # type: ignore[misc]
"""Wraps StatisticalAnalyzer for magentic multi-agent mode."""
def __init__(
self,
evidence_store: dict[str, Any],
embedding_service: "EmbeddingService | None" = None,
) -> None:
super().__init__(
name="AnalysisAgent",
description="Performs statistical analysis using Modal sandbox",
)
self._evidence_store = evidence_store
self._embeddings = embedding_service
self._analyzer = get_statistical_analyzer()
async def run(
self,
messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
*,
thread: AgentThread | None = None,
**kwargs: Any,
) -> AgentRunResponse:
"""Analyze evidence and return verdict."""
query = self._extract_query(messages)
hypotheses = self._evidence_store.get("hypotheses", [])
evidence = self._evidence_store.get("current", [])
if not evidence:
return self._error_response("No evidence available.")
# Get primary hypothesis if available
hypothesis_dict = None
if hypotheses:
h = hypotheses[0]
hypothesis_dict = {
"drug": getattr(h, "drug", "Unknown"),
"target": getattr(h, "target", "?"),
"pathway": getattr(h, "pathway", "?"),
"effect": getattr(h, "effect", "?"),
"confidence": getattr(h, "confidence", 0.5),
}
# Delegate to StatisticalAnalyzer
result = await self._analyzer.analyze(
query=query,
evidence=evidence,
hypothesis=hypothesis_dict,
)
# Store in shared context
self._evidence_store["analysis"] = result.model_dump()
# Format response
response_text = self._format_response(result)
return AgentRunResponse(
messages=[ChatMessage(role=Role.ASSISTANT, text=response_text)],
response_id=f"analysis-{result.verdict.lower()}",
additional_properties={"analysis": result.model_dump()},
)
def _format_response(self, result: AnalysisResult) -> str:
"""Format analysis result as markdown."""
lines = [
"## Statistical Analysis Complete\n",
f"### Verdict: **{result.verdict}**",
f"**Confidence**: {result.confidence:.0%}\n",
"### Key Findings",
]
for finding in result.key_findings:
lines.append(f"- {finding}")
lines.extend([
"\n### Statistical Evidence",
"```",
result.statistical_evidence,
"```",
])
return "\n".join(lines)
def _error_response(self, message: str) -> AgentRunResponse:
"""Create error response."""
return AgentRunResponse(
messages=[ChatMessage(role=Role.ASSISTANT, text=f"**Error**: {message}")],
response_id="analysis-error",
)
def _extract_query(
self, messages: str | ChatMessage | list[str] | list[ChatMessage] | None
) -> str:
"""Extract query from messages."""
if isinstance(messages, str):
return messages
elif isinstance(messages, ChatMessage):
return messages.text or ""
elif isinstance(messages, list):
for msg in reversed(messages):
if isinstance(msg, ChatMessage) and msg.role == Role.USER:
return msg.text or ""
elif isinstance(msg, str):
return msg
return ""
async def run_stream(
self,
messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
*,
thread: AgentThread | None = None,
**kwargs: Any,
) -> AsyncIterable[AgentRunResponseUpdate]:
"""Streaming wrapper."""
result = await self.run(messages, thread=thread, **kwargs)
yield AgentRunResponseUpdate(messages=result.messages, response_id=result.response_id)
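Constructing the agent for magentic mode is then just a matter of sharing the evidence store (a sketch; the store shape matches the keys read in run() above, and how the magentic orchestrator schedules the agent is outside this spec):

```python
# Shared context read by AnalysisAgent.run(): "current" evidence and "hypotheses".
evidence_store: dict = {"current": [], "hypotheses": []}
agent = AnalysisAgent(evidence_store=evidence_store)
# The magentic orchestrator can now schedule `agent` alongside Search/Judge agents.
```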
5.5 MCP Tool for Modal Analysis (src/mcp_tools.py)
Add to existing MCP tools:
async def analyze_hypothesis(
drug: str,
condition: str,
evidence_summary: str,
) -> str:
"""Perform statistical analysis of drug repurposing hypothesis using Modal.
Executes AI-generated Python code in a secure Modal sandbox to analyze
the statistical evidence for a drug repurposing hypothesis.
Args:
drug: The drug being evaluated (e.g., "metformin")
condition: The target condition (e.g., "Alzheimer's disease")
evidence_summary: Summary of evidence to analyze
Returns:
Analysis result with verdict (SUPPORTED/REFUTED/INCONCLUSIVE) and statistics
"""
from src.services.statistical_analyzer import get_statistical_analyzer
from src.utils.config import settings
from src.utils.models import Citation, Evidence
if not settings.modal_available:
return "Error: Modal credentials not configured. Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET."
# Create evidence from summary
evidence = [
Evidence(
content=evidence_summary,
citation=Citation(
source="pubmed",
title=f"Evidence for {drug} in {condition}",
url="https://example.com",
date="2024-01-01",
authors=["User Provided"],
),
relevance=0.9,
)
]
analyzer = get_statistical_analyzer()
result = await analyzer.analyze(
query=f"Can {drug} treat {condition}?",
evidence=evidence,
hypothesis={"drug": drug, "target": "unknown", "pathway": "unknown", "effect": condition},
)
return f"""## Statistical Analysis: {drug} for {condition}
### Verdict: **{result.verdict}**
**Confidence**: {result.confidence:.0%}
### Key Findings
{chr(10).join(f"- {f}" for f in result.key_findings) or "- No specific findings extracted"}
### Execution Output
{result.execution_output}
### Generated Code
```python
{result.code_generated}
```

Executed in Modal Sandbox - Isolated, secure, reproducible.
"""
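Invoking the tool in-process might look like this (a sketch; the evidence summary text is invented for illustration, and MCP clients would call the tool through the server instead):

```python
import asyncio

from src.mcp_tools import analyze_hypothesis

async def demo() -> None:
    report = await analyze_hypothesis(
        drug="metformin",
        condition="Alzheimer's disease",
        evidence_summary="Cohort studies report lower dementia incidence among metformin users.",
    )
    print(report)

asyncio.run(demo())
```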
5.6 Demo Scripts
examples/modal_demo/verify_sandbox.py
#!/usr/bin/env python3
"""Verify that Modal sandbox is properly isolated.
This script proves to judges that code runs in Modal, not locally.
NO agent_framework dependency - uses only src.tools.code_execution.
Usage:
uv run python examples/modal_demo/verify_sandbox.py
"""
import asyncio
from functools import partial
from src.tools.code_execution import get_code_executor
from src.utils.config import settings
async def main() -> None:
"""Verify Modal sandbox isolation."""
if not settings.modal_available:
print("Error: Modal credentials not configured.")
print("Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET in .env")
return
executor = get_code_executor()
loop = asyncio.get_running_loop()
print("=" * 60)
print("Modal Sandbox Isolation Verification")
print("=" * 60 + "\n")
# Test 1: Hostname
print("Test 1: Check hostname (should NOT be your machine)")
code1 = "import socket; print(f'Hostname: {socket.gethostname()}')"
result1 = await loop.run_in_executor(None, partial(executor.execute, code1))
print(f" {result1['stdout'].strip()}\n")
# Test 2: Scientific libraries
print("Test 2: Verify scientific libraries")
code2 = """
import pandas as pd
import numpy as np
import scipy
print(f"pandas: {pd.__version__}")
print(f"numpy: {np.__version__}")
print(f"scipy: {scipy.__version__}")
"""
result2 = await loop.run_in_executor(None, partial(executor.execute, code2))
print(f" {result2['stdout'].strip()}\n")
# Test 3: Network blocked
print("Test 3: Verify network isolation")
code3 = """
import urllib.request
try:
urllib.request.urlopen("https://google.com", timeout=2)
print("Network: ALLOWED (unexpected!)")
except Exception:
print("Network: BLOCKED (as expected)")
"""
result3 = await loop.run_in_executor(None, partial(executor.execute, code3))
print(f" {result3['stdout'].strip()}\n")
# Test 4: Real statistics
print("Test 4: Execute statistical analysis")
code4 = """
import pandas as pd
import scipy.stats as stats
data = pd.DataFrame({'effect': [0.42, 0.38, 0.51]})
mean = data['effect'].mean()
t_stat, p_val = stats.ttest_1samp(data['effect'], 0)
print(f"Mean Effect: {mean:.3f}")
print(f"P-value: {p_val:.4f}")
print(f"Verdict: {'SUPPORTED' if p_val < 0.05 else 'INCONCLUSIVE'}")
"""
result4 = await loop.run_in_executor(None, partial(executor.execute, code4))
print(f" {result4['stdout'].strip()}\n")
print("=" * 60)
print("All tests complete - Modal sandbox verified!")
print("=" * 60)
if __name__ == "__main__":
asyncio.run(main())
examples/modal_demo/run_analysis.py
#!/usr/bin/env python3
"""Demo: Modal-powered statistical analysis.
This script uses StatisticalAnalyzer directly (NO agent_framework dependency).
Usage:
uv run python examples/modal_demo/run_analysis.py "metformin alzheimer"
"""
import argparse
import asyncio
import os
import sys
from src.services.statistical_analyzer import get_statistical_analyzer
from src.tools.pubmed import PubMedTool
from src.utils.config import settings
async def main() -> None:
"""Run the Modal analysis demo."""
parser = argparse.ArgumentParser(description="Modal Analysis Demo")
parser.add_argument("query", help="Research query")
args = parser.parse_args()
if not settings.modal_available:
print("Error: Modal credentials not configured.")
sys.exit(1)
if not (os.getenv("OPENAI_API_KEY") or os.getenv("ANTHROPIC_API_KEY")):
print("Error: No LLM API key found.")
sys.exit(1)
print(f"\n{'=' * 60}")
print("DeepCritical Modal Analysis Demo")
print(f"Query: {args.query}")
print(f"{'=' * 60}\n")
# Step 1: Gather Evidence
print("Step 1: Gathering evidence from PubMed...")
pubmed = PubMedTool()
evidence = await pubmed.search(args.query, max_results=5)
print(f" Found {len(evidence)} papers\n")
# Step 2: Run Modal Analysis
print("Step 2: Running statistical analysis in Modal sandbox...")
analyzer = get_statistical_analyzer()
result = await analyzer.analyze(query=args.query, evidence=evidence)
# Step 3: Display Results
print("\n" + "=" * 60)
print("ANALYSIS RESULTS")
print("=" * 60)
print(f"\nVerdict: {result.verdict}")
print(f"Confidence: {result.confidence:.0%}")
print("\nKey Findings:")
for finding in result.key_findings:
print(f" - {finding}")
print("\n[Demo Complete - Code executed in Modal, not locally]")
if __name__ == "__main__":
asyncio.run(main())
6. TDD Test Suite
6.1 Unit Tests (tests/unit/services/test_statistical_analyzer.py)
"""Unit tests for StatisticalAnalyzer service."""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from src.services.statistical_analyzer import (
AnalysisResult,
StatisticalAnalyzer,
get_statistical_analyzer,
)
from src.utils.models import Citation, Evidence
@pytest.fixture
def sample_evidence() -> list[Evidence]:
"""Sample evidence for testing."""
return [
Evidence(
content="Metformin shows effect size of 0.45.",
citation=Citation(
source="pubmed",
title="Metformin Study",
url="https://pubmed.ncbi.nlm.nih.gov/12345/",
date="2024-01-15",
authors=["Smith J"],
),
relevance=0.9,
)
]
class TestStatisticalAnalyzer:
"""Tests for StatisticalAnalyzer (no agent_framework dependency)."""
    def test_no_agent_framework_import(self) -> None:
        """StatisticalAnalyzer must NOT import agent_framework."""
        import src.services.statistical_analyzer as module

        # Check import statements only: the module docstring legitimately
        # mentions agent_framework, so a bare substring check would fail.
        with open(module.__file__) as f:
            source = f.read()
        assert "from agent_framework" not in source
        assert "import agent_framework" not in source
        assert "BaseAgent" not in source
@pytest.mark.asyncio
async def test_analyze_returns_result(
self, sample_evidence: list[Evidence]
) -> None:
"""analyze() should return AnalysisResult."""
analyzer = StatisticalAnalyzer()
with patch.object(analyzer, "_get_agent") as mock_agent, \
patch.object(analyzer, "_get_code_executor") as mock_executor:
# Mock LLM
mock_agent.return_value.run = AsyncMock(
return_value=MagicMock(output="print('SUPPORTED')")
)
# Mock Modal
mock_executor.return_value.execute.return_value = {
"stdout": "SUPPORTED\np-value: 0.01",
"stderr": "",
"success": True,
}
result = await analyzer.analyze("test query", sample_evidence)
assert isinstance(result, AnalysisResult)
assert result.verdict == "SUPPORTED"
def test_singleton(self) -> None:
"""get_statistical_analyzer should return singleton."""
a1 = get_statistical_analyzer()
a2 = get_statistical_analyzer()
assert a1 is a2
class TestAnalysisResult:
"""Tests for AnalysisResult model."""
def test_verdict_values(self) -> None:
"""Verdict should be one of the expected values."""
for verdict in ["SUPPORTED", "REFUTED", "INCONCLUSIVE"]:
result = AnalysisResult(
verdict=verdict,
confidence=0.8,
statistical_evidence="test",
code_generated="print('test')",
execution_output="test",
)
assert result.verdict == verdict
def test_confidence_bounds(self) -> None:
"""Confidence must be 0.0-1.0."""
with pytest.raises(ValueError):
AnalysisResult(
verdict="SUPPORTED",
confidence=1.5, # Invalid
statistical_evidence="test",
code_generated="test",
execution_output="test",
)
6.2 Integration Test (tests/integration/test_modal.py)
"""Integration tests for Modal (requires credentials)."""
import pytest
from src.utils.config import settings
@pytest.mark.integration
@pytest.mark.skipif(not settings.modal_available, reason="Modal not configured")
class TestModalIntegration:
"""Integration tests requiring Modal credentials."""
@pytest.mark.asyncio
async def test_sandbox_executes_code(self) -> None:
"""Modal sandbox should execute Python code."""
import asyncio
from functools import partial
from src.tools.code_execution import get_code_executor
executor = get_code_executor()
code = "import pandas as pd; print(pd.DataFrame({'a': [1,2,3]})['a'].sum())"
loop = asyncio.get_running_loop()
result = await loop.run_in_executor(
None, partial(executor.execute, code, timeout=30)
)
assert result["success"]
assert "6" in result["stdout"]
@pytest.mark.asyncio
async def test_statistical_analyzer_works(self) -> None:
"""StatisticalAnalyzer should work end-to-end."""
from src.services.statistical_analyzer import get_statistical_analyzer
from src.utils.models import Citation, Evidence
evidence = [
Evidence(
content="Drug shows 40% improvement in trial.",
citation=Citation(
source="pubmed",
title="Test",
url="https://test.com",
date="2024-01-01",
authors=["Test"],
),
relevance=0.9,
)
]
analyzer = get_statistical_analyzer()
result = await analyzer.analyze("test drug efficacy", evidence)
assert result.verdict in ["SUPPORTED", "REFUTED", "INCONCLUSIVE"]
assert 0.0 <= result.confidence <= 1.0
7. Verification Commands
# 1. Verify NO agent_framework imports in StatisticalAnalyzer
grep -E "^(from|import) agent_framework" src/services/statistical_analyzer.py
# Should return nothing (the docstring may mention agent_framework; imports must not)
# 2. Run unit tests (no Modal needed)
uv run pytest tests/unit/services/test_statistical_analyzer.py -v
# 3. Run verification script (requires Modal)
uv run python examples/modal_demo/verify_sandbox.py
# 4. Run analysis demo (requires Modal + LLM)
uv run python examples/modal_demo/run_analysis.py "metformin alzheimer"
# 5. Run integration tests
uv run pytest tests/integration/test_modal.py -v -m integration
# 6. Full test suite
make check
8. Definition of Done
Phase 13 is COMPLETE when:
- src/services/statistical_analyzer.py created (NO agent_framework)
- src/utils/config.py has enable_modal_analysis setting
- src/orchestrator.py uses StatisticalAnalyzer directly
- src/agents/analysis_agent.py refactored to wrap StatisticalAnalyzer
- src/mcp_tools.py has analyze_hypothesis tool
- examples/modal_demo/verify_sandbox.py working
- examples/modal_demo/run_analysis.py working
- Unit tests pass WITHOUT magentic extra installed
- Integration tests pass WITH Modal credentials
- All lints pass
9. Architecture After Phase 13
┌───────────────────────────────────────────────────────────────┐
│                          MCP Clients                          │
│                (Claude Desktop, Cursor, etc.)                 │
└───────────────────────────────┬───────────────────────────────┘
                                │ MCP Protocol
                                ▼
┌───────────────────────────────────────────────────────────────┐
│                    Gradio App + MCP Server                    │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ MCP Tools: search_pubmed, search_trials, search_biorxiv │  │
│  │            search_all, analyze_hypothesis               │  │
│  └─────────────────────────────────────────────────────────┘  │
└───────────────────────────────┬───────────────────────────────┘
                                │
              ┌─────────────────┴─────────────────┐
              │                                   │
              ▼                                   ▼
┌──────────────────────────┐      ┌──────────────────────────────┐
│   Simple Orchestrator    │      │    Magentic Orchestrator     │
│   (no agent_framework)   │      │    (with agent_framework)    │
│                          │      │                              │
│   SearchHandler          │      │   SearchAgent                │
│   JudgeHandler           │      │   JudgeAgent                 │
│   StatisticalAnalyzer    │      │   AnalysisAgent              │
│            │             │      │   (wraps StatisticalAnalyzer)│
└────────────┼─────────────┘      └──────────────┬───────────────┘
             │                                   │
             └────────────────┬──────────────────┘
                              ▼
┌───────────────────────────────────────────────────────────────┐
│                      StatisticalAnalyzer                      │
│            (src/services/statistical_analyzer.py)             │
│                 NO agent_framework dependency                 │
│                                                               │
│   1. Generate code with pydantic-ai                           │
│   2. Execute in Modal sandbox                                 │
│   3. Return AnalysisResult                                    │
└───────────────────────────────┬───────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│                         Modal Sandbox                         │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │  - pandas, numpy, scipy, sklearn, statsmodels           │  │
│  │  - Network: BLOCKED                                     │  │
│  │  - Filesystem: ISOLATED                                 │  │
│  │  - Timeout: ENFORCED                                    │  │
│  └─────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────┘
This is the dependency-safe Modal stack.
10. Files Summary
| File | Action | Purpose |
|---|---|---|
| src/services/statistical_analyzer.py | CREATE | Core analysis (no agent_framework) |
| src/utils/config.py | MODIFY | Add enable_modal_analysis |
| src/orchestrator.py | MODIFY | Use StatisticalAnalyzer |
| src/agents/analysis_agent.py | MODIFY | Wrap StatisticalAnalyzer |
| src/mcp_tools.py | MODIFY | Add analyze_hypothesis |
| examples/modal_demo/verify_sandbox.py | CREATE | Sandbox verification |
| examples/modal_demo/run_analysis.py | CREATE | Demo script |
| tests/unit/services/test_statistical_analyzer.py | CREATE | Unit tests |
| tests/integration/test_modal.py | CREATE | Integration tests |
Key Fix: StatisticalAnalyzer has ZERO agent_framework imports, making it safe for the simple orchestrator.