| # Phase 10 Implementation Spec: ClinicalTrials.gov Integration | |
| **Goal**: Add clinical trial search for drug repurposing evidence. | |
| **Philosophy**: "Clinical trials are the bridge from hypothesis to therapy." | |
| **Prerequisite**: Phase 9 complete (DuckDuckGo removed) | |
| **Estimated Time**: 2-3 hours | |
| --- | |
| ## 1. Why ClinicalTrials.gov? | |
| ### Scientific Value | |
| | Feature | Value for Drug Repurposing | | |
| |---------|---------------------------| | |
| | **400,000+ studies** | Massive evidence base | | |
| | **Trial phase data** | Phase I/II/III = evidence strength | | |
| | **Intervention details** | Exact drug + dosing | | |
| | **Outcome measures** | What was measured | | |
| | **Status tracking** | Completed vs recruiting | | |
| | **Free API** | No cost, no key required | | |
| ### Example Query Response | |
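| Query: "metformin Alzheimer's" (response simplified for readability; the real API nests these fields under `protocolSection`, as the parser in section 4.1 shows) | |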
| ```json | |
| { | |
|   "studies": [ | |
|     { | |
|       "nctId": "NCT04098666", | |
|       "briefTitle": "Metformin in Alzheimer's Dementia Prevention", | |
|       "phase": "Phase 2", | |
|       "status": "Recruiting", | |
|       "conditions": ["Alzheimer Disease"], | |
|       "interventions": ["Drug: Metformin"] | |
|     } | |
|   ] | |
| } | |
| ``` | |
| **This is GOLD for drug repurposing** - actual trials testing the hypothesis! | |
| --- | |
| ## 2. API Specification | |
| ### Endpoint | |
| ``` | |
| Base URL: https://clinicaltrials.gov/api/v2/studies | |
| ``` | |
| ### Key Parameters | |
| | Parameter | Description | Example | | |
| |-----------|-------------|---------| | |
| | `query.cond` | Condition/disease | `Alzheimer` | | |
| | `query.intr` | Intervention/drug | `Metformin` | | |
| | `query.term` | General search | `metformin alzheimer` | | |
| | `pageSize` | Results per page | `20` | | |
| | `fields` | Fields to return | See below | | |
| ### Fields We Need | |
| ``` | |
| NCTId, BriefTitle, Phase, OverallStatus, Condition, | |
| InterventionName, StartDate, CompletionDate, BriefSummary | |
| ``` | |
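A minimal sketch of how the parameters and field list above combine into a request URL. `build_search_url` is a hypothetical helper, not part of the codebase; it joins fields with `|` to match the tool in section 4.1 and leaves URL encoding to the standard library.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

BASE_URL = "https://clinicaltrials.gov/api/v2/studies"

# Field names copied from the "Fields We Need" list.
FIELDS = [
    "NCTId", "BriefTitle", "Phase", "OverallStatus", "Condition",
    "InterventionName", "StartDate", "CompletionDate", "BriefSummary",
]


def build_search_url(
    condition: Optional[str] = None,
    intervention: Optional[str] = None,
    term: Optional[str] = None,
    page_size: int = 20,
    fields: Sequence[str] = FIELDS,
) -> str:
    """Build a studies search URL (hypothetical helper for illustration)."""
    params = {}
    # Only send the query parameters that were actually supplied.
    if condition:
        params["query.cond"] = condition
    if intervention:
        params["query.intr"] = intervention
    if term:
        params["query.term"] = term
    params["pageSize"] = page_size
    params["fields"] = "|".join(fields)
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url(condition="Alzheimer", intervention="Metformin"))
```

Splitting the query into `query.cond` and `query.intr` should give tighter matches than a single `query.term`, at the cost of having to parse the user's question into a disease and a drug first.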
| ### Rate Limits | |
| - ~50 requests/minute per IP | |
| - No authentication required | |
| - Paginated (100 results max per call) | |
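One way to stay under the approximate request budget listed above is to space calls evenly. The sketch below is an assumption about how a client could do that, not something the spec requires; `MinIntervalThrottle` is a hypothetical name, and the clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Space calls so that at most `max_per_minute` happen per minute."""

    def __init__(
        self,
        max_per_minute: int = 50,  # approximate limit quoted in this spec
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            # Time still owed since the previous call, never negative.
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `throttle.wait()` immediately before each request keeps a single process at or below the budget. It does not coordinate across processes that share an IP address.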
| ### Documentation | |
| - [API v2 Docs](https://clinicaltrials.gov/data-api/api) | |
| - [Migration Guide](https://www.nlm.nih.gov/pubs/techbull/ma24/ma24_clinicaltrials_api.html) | |
| --- | |
| ## 3. Data Model | |
| ### 3.1 Update Citation Source Type (`src/utils/models.py`) | |
| ```python | |
| # BEFORE | |
| source: Literal["pubmed", "web"] | |
| # AFTER | |
| source: Literal["pubmed", "clinicaltrials", "biorxiv"] | |
| ``` | |
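A small sketch of checking a source name at runtime against the updated `Literal`. `SourceName` and `is_valid_source` are illustrative names, not taken from `models.py`.

```python
from typing import Literal, get_args

# Mirrors the AFTER definition above.
SourceName = Literal["pubmed", "clinicaltrials", "biorxiv"]


def is_valid_source(name: str) -> bool:
    """Return True if `name` is one of the allowed citation sources."""
    return name in get_args(SourceName)


if __name__ == "__main__":
    print(is_valid_source("clinicaltrials"))
    print(is_valid_source("web"))
```

Any stored citations that still carry `source="web"` would fail this check after the change, so they may need migrating or dropping.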
| ### 3.2 Evidence from Clinical Trials | |
| Clinical trial data maps to our existing `Evidence` model: | |
| ```python | |
| Evidence( | |
|     content=f"{brief_summary}. Phase: {phase}. Status: {status}.", | |
|     citation=Citation( | |
|         source="clinicaltrials", | |
|         title=brief_title, | |
|         url=f"https://clinicaltrials.gov/study/{nct_id}", | |
|         date=start_date or "Unknown", | |
|         authors=[],  # Trials don't have authors in the same way | |
|     ), | |
|     relevance=0.85,  # Trials are highly relevant for repurposing; same value as the tool in 4.1 | |
| ) | |
| ``` | |
| --- | |
| ## 4. Implementation | |
| ### 4.0 Important: HTTP Client Selection | |
| **ClinicalTrials.gov's WAF blocks `httpx`'s TLS fingerprint.** Use `requests` instead. | |
| | Library | Status | Notes | | |
| |---------|--------|-------| | |
| | `httpx` | ❌ 403 Blocked | TLS/JA3 fingerprint flagged | | |
| | `httpx[http2]` | ❌ 403 Blocked | HTTP/2 doesn't help | | |
| | `requests` | ✅ Works | Industry standard, not blocked | | |
| | `urllib` | ✅ Works | Stdlib alternative | | |
| We use `requests` wrapped in `asyncio.to_thread()` for async compatibility. | |
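To illustrate why the thread wrapper matters, here is a minimal sketch in which a blocking function stands in for `requests.get`. `blocking_fetch` and `fetch_all` are hypothetical names; the `time.sleep` call simulates network latency.

```python
import asyncio
import time
from typing import List


def blocking_fetch(name: str, delay: float = 0.2) -> str:
    """Stand-in for a blocking HTTP call such as requests.get."""
    time.sleep(delay)  # simulates network latency
    return f"{name}: done"


async def fetch_all(names: List[str]) -> List[str]:
    # Each blocking call runs in a worker thread, so the event loop
    # stays free and the calls overlap instead of running back to back.
    tasks = [asyncio.to_thread(blocking_fetch, n) for n in names]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(fetch_all(["pubmed", "clinicaltrials"])))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

`asyncio.to_thread` needs Python 3.9 or newer. Calling `requests.get` directly inside a coroutine would instead block the event loop for the whole request.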
| ### 4.1 ClinicalTrials Tool (`src/tools/clinicaltrials.py`) | |
| ```python | |
| """ClinicalTrials.gov search tool using API v2.""" | |
| import asyncio | |
| from typing import Any, ClassVar | |
| import requests | |
| from tenacity import retry, stop_after_attempt, wait_exponential | |
| from src.utils.exceptions import SearchError | |
| from src.utils.models import Citation, Evidence | |
| class ClinicalTrialsTool: | |
|     """Search tool for ClinicalTrials.gov. | |
|     Note: Uses `requests` library instead of `httpx` because ClinicalTrials.gov's | |
|     WAF blocks httpx's TLS fingerprint. The `requests` library is not blocked. | |
|     """ | |
|     BASE_URL = "https://clinicaltrials.gov/api/v2/studies" | |
|     FIELDS: ClassVar[list[str]] = [ | |
|         "NCTId", | |
|         "BriefTitle", | |
|         "Phase", | |
|         "OverallStatus", | |
|         "Condition", | |
|         "InterventionName", | |
|         "StartDate", | |
|         "BriefSummary", | |
|     ] | |
|     @property | |
|     def name(self) -> str: | |
|         return "clinicaltrials" | |
|     @retry( | |
|         stop=stop_after_attempt(3), | |
|         wait=wait_exponential(multiplier=1, min=1, max=10), | |
|         reraise=True, | |
|     ) | |
|     async def search(self, query: str, max_results: int = 10) -> list[Evidence]: | |
|         """Search ClinicalTrials.gov for studies.""" | |
|         params = { | |
|             "query.term": query, | |
|             "pageSize": min(max_results, 100), | |
|             "fields": "|".join(self.FIELDS), | |
|         } | |
|         try: | |
|             # Run blocking requests.get in a separate thread for async compatibility | |
|             response = await asyncio.to_thread( | |
|                 requests.get, | |
|                 self.BASE_URL, | |
|                 params=params, | |
|                 headers={"User-Agent": "DeepCritical-Research-Agent/1.0"}, | |
|                 timeout=30, | |
|             ) | |
|             response.raise_for_status() | |
|             data = response.json() | |
|             studies = data.get("studies", []) | |
|             return [self._study_to_evidence(study) for study in studies[:max_results]] | |
|         except requests.HTTPError as e: | |
|             raise SearchError(f"ClinicalTrials.gov API error: {e}") from e | |
|         except requests.RequestException as e: | |
|             raise SearchError(f"ClinicalTrials.gov request failed: {e}") from e | |
|     def _study_to_evidence(self, study: dict[str, Any]) -> Evidence: | |
|         """Convert a clinical trial study to Evidence.""" | |
|         # Navigate nested structure | |
|         protocol = study.get("protocolSection", {}) | |
|         id_module = protocol.get("identificationModule", {}) | |
|         status_module = protocol.get("statusModule", {}) | |
|         desc_module = protocol.get("descriptionModule", {}) | |
|         design_module = protocol.get("designModule", {}) | |
|         conditions_module = protocol.get("conditionsModule", {}) | |
|         arms_module = protocol.get("armsInterventionsModule", {}) | |
|         nct_id = id_module.get("nctId", "Unknown") | |
|         title = id_module.get("briefTitle", "Untitled Study") | |
|         status = status_module.get("overallStatus", "Unknown") | |
|         start_date = status_module.get("startDateStruct", {}).get("date", "Unknown") | |
|         # Get phase (might be a list) | |
|         phases = design_module.get("phases", []) | |
|         phase = phases[0] if phases else "Not Applicable" | |
|         # Get conditions | |
|         conditions = conditions_module.get("conditions", []) | |
|         conditions_str = ", ".join(conditions[:3]) if conditions else "Unknown" | |
|         # Get interventions | |
|         interventions = arms_module.get("interventions", []) | |
|         intervention_names = [i.get("name", "") for i in interventions[:3]] | |
|         interventions_str = ", ".join(intervention_names) if intervention_names else "Unknown" | |
|         # Get summary; append an ellipsis only when the text is actually truncated | |
|         summary = desc_module.get("briefSummary", "No summary available.") | |
|         if len(summary) > 500: | |
|             summary = f"{summary[:500]}..." | |
|         # Build content with key trial info | |
|         content = ( | |
|             f"{summary} " | |
|             f"Trial Phase: {phase}. " | |
|             f"Status: {status}. " | |
|             f"Conditions: {conditions_str}. " | |
|             f"Interventions: {interventions_str}." | |
|         ) | |
|         return Evidence( | |
|             content=content[:2000], | |
|             citation=Citation( | |
|                 source="clinicaltrials", | |
|                 title=title[:500], | |
|                 url=f"https://clinicaltrials.gov/study/{nct_id}", | |
|                 date=start_date, | |
|                 authors=[],  # Trials don't have traditional authors | |
|             ), | |
|             relevance=0.85,  # Trials are highly relevant for repurposing | |
|         ) | |
| ``` | |
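The parser above keeps only `phases[0]`, so a combined Phase 1/Phase 2 trial would be reported as Phase 1. A possible refinement is sketched below. `format_phases` is a hypothetical helper; `PHASE2` appears in this spec's test fixture, while the other enum-style values (`PHASE1`, `EARLY_PHASE1`, `NA`) are assumptions about the API's vocabulary.

```python
from typing import List


def format_phases(phases: List[str]) -> str:
    """Turn API phase codes into a readable label (illustrative sketch)."""
    if not phases:
        return "Not Applicable"
    labels = []
    for raw in phases:
        code = raw.upper()
        if code == "NA":
            labels.append("Not Applicable")
        elif code.startswith("EARLY_PHASE"):
            labels.append("Early Phase " + code[len("EARLY_PHASE"):])
        elif code.startswith("PHASE"):
            labels.append("Phase " + code[len("PHASE"):])
        else:
            # Unknown value: pass it through unchanged rather than guess.
            labels.append(raw)
    return " / ".join(labels)


if __name__ == "__main__":
    print(format_phases(["PHASE1", "PHASE2"]))
```

Because phase is used here as a proxy for evidence strength, keeping every listed phase avoids understating how far a trial has progressed.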
| --- | |
| ## 5. TDD Test Suite | |
| ### 5.1 Unit Tests (`tests/unit/tools/test_clinicaltrials.py`) | |
| The tests mock `requests.get` with `unittest.mock.patch`; `respx` does not apply because the tool does not use `httpx`. | |
| ```python | |
| """Unit tests for ClinicalTrials.gov tool.""" | |
| from unittest.mock import MagicMock, patch | |
| import pytest | |
| import requests | |
| from src.tools.clinicaltrials import ClinicalTrialsTool | |
| from src.utils.exceptions import SearchError | |
| from src.utils.models import Evidence | |
| @pytest.fixture | |
| def mock_clinicaltrials_response() -> dict: | |
|     """Mock ClinicalTrials.gov API response.""" | |
|     return { | |
|         "studies": [ | |
|             { | |
|                 "protocolSection": { | |
|                     "identificationModule": { | |
|                         "nctId": "NCT04098666", | |
|                         "briefTitle": "Metformin in Alzheimer's Dementia Prevention", | |
|                     }, | |
|                     "statusModule": { | |
|                         "overallStatus": "Recruiting", | |
|                         "startDateStruct": {"date": "2020-01-15"}, | |
|                     }, | |
|                     "descriptionModule": { | |
|                         "briefSummary": "This study evaluates metformin for Alzheimer's prevention." | |
|                     }, | |
|                     "designModule": {"phases": ["PHASE2"]}, | |
|                     "conditionsModule": {"conditions": ["Alzheimer Disease", "Dementia"]}, | |
|                     "armsInterventionsModule": { | |
|                         "interventions": [{"name": "Metformin", "type": "Drug"}] | |
|                     }, | |
|                 } | |
|             } | |
|         ] | |
|     } | |
| class TestClinicalTrialsTool: | |
|     """Tests for ClinicalTrialsTool.""" | |
|     def test_tool_name(self) -> None: | |
|         """Tool should have correct name.""" | |
|         tool = ClinicalTrialsTool() | |
|         assert tool.name == "clinicaltrials" | |
|     @pytest.mark.asyncio | |
|     async def test_search_returns_evidence( | |
|         self, mock_clinicaltrials_response: dict | |
|     ) -> None: | |
|         """Search should return Evidence objects.""" | |
|         with patch("src.tools.clinicaltrials.requests.get") as mock_get: | |
|             mock_response = MagicMock() | |
|             mock_response.json.return_value = mock_clinicaltrials_response | |
|             mock_response.raise_for_status = MagicMock() | |
|             mock_get.return_value = mock_response | |
|             tool = ClinicalTrialsTool() | |
|             results = await tool.search("metformin alzheimer", max_results=5) | |
|             assert len(results) == 1 | |
|             assert isinstance(results[0], Evidence) | |
|             assert results[0].citation.source == "clinicaltrials" | |
|             assert "NCT04098666" in results[0].citation.url | |
|             assert "Metformin" in results[0].citation.title | |
|     @pytest.mark.asyncio | |
|     async def test_search_api_error(self) -> None: | |
|         """Search should raise SearchError on API failure.""" | |
|         with patch("src.tools.clinicaltrials.requests.get") as mock_get: | |
|             mock_response = MagicMock() | |
|             mock_response.raise_for_status.side_effect = requests.HTTPError( | |
|                 "500 Server Error" | |
|             ) | |
|             mock_get.return_value = mock_response | |
|             tool = ClinicalTrialsTool() | |
|             with pytest.raises(SearchError): | |
|                 await tool.search("metformin alzheimer") | |
| class TestClinicalTrialsIntegration: | |
|     """Integration tests (marked for separate run).""" | |
|     @pytest.mark.integration | |
|     @pytest.mark.asyncio | |
|     async def test_real_api_call(self) -> None: | |
|         """Test actual API call (requires network).""" | |
|         tool = ClinicalTrialsTool() | |
|         results = await tool.search("metformin diabetes", max_results=3) | |
|         assert len(results) > 0 | |
|         assert all(isinstance(r, Evidence) for r in results) | |
|         assert all(r.citation.source == "clinicaltrials" for r in results) | |
| ``` | |
| --- | |
| ## 6. Integration with SearchHandler | |
| ### 6.1 Update Example Files | |
| ```python | |
| # examples/search_demo/run_search.py | |
| from src.tools.clinicaltrials import ClinicalTrialsTool | |
| from src.tools.pubmed import PubMedTool | |
| from src.tools.search_handler import SearchHandler | |
| search_handler = SearchHandler( | |
|     tools=[PubMedTool(), ClinicalTrialsTool()], | |
|     timeout=30.0, | |
| ) | |
| ``` | |
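The internals of `SearchHandler` are not shown in this spec. As a rough mental model only, a handler that fans a query out to several tools with a per-tool timeout might look like the sketch below. `StubTool` and `search_all` are hypothetical stand-ins, and plain strings take the place of `Evidence` objects.

```python
import asyncio
from typing import List, Tuple


class StubTool:
    """Hypothetical stand-in for PubMedTool / ClinicalTrialsTool."""

    def __init__(self, name: str, results: List[str], fail: bool = False) -> None:
        self.name = name
        self._results = results
        self._fail = fail

    async def search(self, query: str, max_results: int = 10) -> List[str]:
        await asyncio.sleep(0)  # yield control, as a real network call would
        if self._fail:
            raise RuntimeError("upstream unavailable")
        return self._results[:max_results]


async def search_all(
    tools: List[StubTool], query: str, timeout: float = 30.0
) -> Tuple[List[str], List[str], List[str]]:
    """Query every tool concurrently; one failing tool does not sink the rest."""

    async def run(tool: StubTool) -> List[str]:
        return await asyncio.wait_for(tool.search(query), timeout=timeout)

    outcomes = await asyncio.gather(*(run(t) for t in tools), return_exceptions=True)
    evidence: List[str] = []
    searched: List[str] = []
    errors: List[str] = []
    for tool, outcome in zip(tools, outcomes):
        if isinstance(outcome, Exception):
            errors.append(f"{tool.name}: {outcome}")
        else:
            searched.append(tool.name)
            evidence.extend(outcome)
    return evidence, searched, errors
```

Collecting failures separately means a ClinicalTrials.gov outage would still leave the PubMed results usable, and the list of tools that succeeded corresponds to the `sources_searched` field described in 6.2.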
| ### 6.2 Update SearchResult Type | |
| ```python | |
| # src/utils/models.py | |
| sources_searched: list[Literal["pubmed", "clinicaltrials"]] | |
| ``` | |
| --- | |
| ## 7. Definition of Done | |
| Phase 10 is **COMPLETE** when: | |
| - [ ] `src/tools/clinicaltrials.py` implemented | |
| - [ ] Unit tests in `tests/unit/tools/test_clinicaltrials.py` | |
| - [ ] Integration test marked with `@pytest.mark.integration` | |
| - [ ] SearchHandler updated to include ClinicalTrialsTool | |
| - [ ] Type definitions updated in models.py | |
| - [ ] Example files updated | |
| - [ ] All unit tests pass | |
| - [ ] Lints pass | |
| - [ ] Manual verification with real API | |
| --- | |
| ## 8. Verification Commands | |
| ```bash | |
| # 1. Run unit tests | |
| uv run pytest tests/unit/tools/test_clinicaltrials.py -v | |
| # 2. Run integration test (requires network) | |
| uv run pytest tests/unit/tools/test_clinicaltrials.py -v -m integration | |
| # 3. Run full test suite | |
| uv run pytest tests/unit/ -v | |
| # 4. Run example | |
| source .env && uv run python examples/search_demo/run_search.py "metformin alzheimer" | |
| # Should show results from BOTH PubMed AND ClinicalTrials.gov | |
| ``` | |
| --- | |
| ## 9. Value Delivered | |
| | Before | After | | |
| |--------|-------| | |
| | Papers only | Papers + Clinical Trials | | |
| | "Drug X might help" | "Drug X is in Phase II trial" | | |
| | No trial status | Recruiting/Completed/Terminated | | |
| | No phase info | Phase I/II/III evidence strength | | |
| **Demo pitch addition**: | |
| > "DeepCritical searches PubMed for peer-reviewed evidence AND ClinicalTrials.gov for 400,000+ clinical trials." | |