Spaces:

DataQuests
/

DeepCritical

Running

App Files Files Community

DeepCritical / docs /bugs /P0_MAGENTIC_AND_SEARCH_AUDIT.md

VibecoderMcSwaggins

refactor(tools): replace BioRxiv with Europe PMC (Phase 01)

2f8ae1f 21 days ago

preview code

raw

history blame

6.68 kB

P0 Audit: Microsoft Agent Framework (Magentic) & Search Tools

Date: November 27, 2025 Auditor: Claude Code Status: VERIFIED

TL;DR

Component	Status	Verdict
Microsoft Agent Framework	✅ WORKING	Correctly wired, no bugs
GPT-5.1 Model Config	✅ CORRECT	Using `gpt-5.1` as configured
Search Tools	❌ BROKEN	Root cause of garbage results

The orchestration framework is fine. The search layer is garbage.

Microsoft Agent Framework Verification

Import Test: PASSED

from agent_framework import MagenticBuilder, ChatAgent
from agent_framework.openai import OpenAIChatClient
# All imports successful

Agent Creation Test: PASSED

from src.agents.magentic_agents import create_search_agent
search_agent = create_search_agent()
# SearchAgent created: SearchAgent
# Description: Searches biomedical databases (PubMed, ClinicalTrials.gov, bioRxiv)

Workflow Build Test: PASSED

from src.orchestrator_magentic import MagenticOrchestrator
orchestrator = MagenticOrchestrator(max_rounds=2)
workflow = orchestrator._build_workflow()
# Workflow built successfully: <class 'agent_framework._workflows._workflow.Workflow'>

Model Configuration: CORRECT

settings.openai_model = "gpt-5.1"  # ✅ Using GPT-5.1, not GPT-4o
settings.openai_api_key = True     # ✅ API key is set

What Magentic Provides (Working)

Multi-Agent Coordination
- Manager agent orchestrates SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent
- Uses MagenticBuilder().with_standard_manager() for coordination
ChatAgent Pattern
- Each agent has internal LLM (GPT-5.1)
- Can call tools via @ai_function decorator
- Has proper instructions for domain-specific tasks
Workflow Streaming
- Events: MagenticAgentMessageEvent, MagenticFinalResultEvent, etc.
- Real-time UI updates via workflow.run_stream(task)
State Management
- MagenticState persists evidence across agents
- get_bibliography() tool for ReportAgent

What's Actually Broken: The Search Tools

File: `src/agents/tools.py`

The Magentic agents call these tools:

search_pubmed → Uses PubMedTool
search_clinical_trials → Uses ClinicalTrialsTool
search_preprints → Uses BioRxivTool

These tools are the problem, not the framework.

Search Tool Bugs (Detailed)

BUG 1: BioRxiv API Does Not Support Search

File: src/tools/biorxiv.py:248-286

# This fetches the FIRST 100 papers from the last 90 days
# It does NOT search by keyword - the API doesn't support that
url = f"{self.BASE_URL}/{self.server}/{interval}/0/json"

# Then filters client-side for keywords
matching = self._filter_by_keywords(papers, query_terms, max_results)

Problem:

Fetches 100 random chronological papers
Filters for ANY keyword match in title/abstract
"Long COVID medications" returns papers about "calf muscles" because they mention "COVID" once

Fix: Remove BioRxiv or use Europe PMC (which has actual search)

BUG 2: PubMed Query Not Optimized

File: src/tools/pubmed.py:54-71

search_params = self._build_params(
    db="pubmed",
    term=query,  # RAW USER QUERY - no preprocessing!
    retmax=max_results,
    sort="relevance",
)

Problem:

User enters: "What medications show promise for Long COVID?"
PubMed receives: What medications show promise for Long COVID?
Should receive: ("long covid"[Title/Abstract] OR "PASC"[Title/Abstract]) AND (treatment[Title/Abstract] OR drug[Title/Abstract])

Fix: Add query preprocessing:

Strip question words (what, which, how, etc.)
Expand medical synonyms (Long COVID → PASC, Post-COVID)
Use MeSH terms for better recall

BUG 3: ClinicalTrials.gov No Filtering

File: src/tools/clinicaltrials.py

Returns ALL trials including:

Withdrawn trials
Terminated trials
Observational studies (not drug interventions)
Phase 1 (no efficacy data)

Fix: Filter by:

studyType=INTERVENTIONAL
phase=PHASE2,PHASE3,PHASE4
status=COMPLETED,ACTIVE_NOT_RECRUITING,RECRUITING

Evidence: Garbage In → Garbage Out

When the Magentic SearchAgent calls these tools:

SearchAgent: "Find evidence for Long COVID medications"
    │
    ▼
search_pubmed("Long COVID medications")
    → Returns 1 semi-relevant paper (raw query hits)

search_preprints("Long COVID medications")
    → Returns garbage (BioRxiv API doesn't search)
    → "Calf muscle adaptations" (has "COVID" somewhere)
    → "Ophthalmologist work-life balance" (mentions COVID)

search_clinical_trials("Long COVID medications")
    → Returns all trials, no filtering
    │
    ▼
JudgeAgent receives garbage evidence
    │
    ▼
HypothesisAgent can't generate good hypotheses from garbage
    │
    ▼
ReportAgent produces garbage report

The framework is doing its job. It's orchestrating agents correctly. But the agents are being fed garbage data.

Recommended Fixes

Priority 1: Delete or Fix BioRxiv (30 min)

Option A: Delete it

# In src/agents/tools.py, remove:
# from src.tools.biorxiv import BioRxivTool
# _biorxiv = BioRxivTool()
# @ai_function search_preprints(...)

Option B: Replace with Europe PMC Europe PMC has preprints AND proper search API:

https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=long+covid+treatment&format=json

Priority 2: Fix PubMed Query (1 hour)

Add query preprocessor:

def preprocess_query(raw_query: str) -> str:
    """Convert natural language to PubMed query syntax."""
    # Strip question words
    # Expand medical synonyms
    # Add field tags [Title/Abstract]
    # Return optimized query

Priority 3: Filter ClinicalTrials (30 min)

Add parameters to API call:

params = {
    "query.term": query,
    "filter.overallStatus": "COMPLETED,RECRUITING",
    "filter.studyType": "INTERVENTIONAL",
    "pageSize": max_results,
}

Conclusion

Microsoft Agent Framework: NO BUGS FOUND

Imports work ✅
Agent creation works ✅
Workflow building works ✅
Model config correct (GPT-5.1) ✅
Streaming events work ✅

Search Tools: CRITICALLY BROKEN

BioRxiv: API doesn't support search (fundamental)
PubMed: No query optimization (fixable)
ClinicalTrials: No filtering (fixable)

Recommendation:

Delete BioRxiv immediately (unusable)
Add PubMed query preprocessing
Add ClinicalTrials filtering
Then the Magentic multi-agent system will work as designed