P0 Audit: Microsoft Agent Framework (Magentic) & Search Tools
Date: November 27, 2025 | Auditor: Claude Code | Status: VERIFIED
TL;DR
| Component | Status | Verdict |
|---|---|---|
| Microsoft Agent Framework | ✅ WORKING | Correctly wired, no bugs |
| GPT-5.1 Model Config | ✅ CORRECT | Using gpt-5.1 as configured |
| Search Tools | ❌ BROKEN | Root cause of garbage results |
The orchestration framework is fine. The search layer is garbage.
Microsoft Agent Framework Verification
Import Test: PASSED
from agent_framework import MagenticBuilder, ChatAgent
from agent_framework.openai import OpenAIChatClient
# All imports successful
Agent Creation Test: PASSED
from src.agents.magentic_agents import create_search_agent
search_agent = create_search_agent()
# SearchAgent created: SearchAgent
# Description: Searches biomedical databases (PubMed, ClinicalTrials.gov, bioRxiv)
Workflow Build Test: PASSED
from src.orchestrator_magentic import MagenticOrchestrator
orchestrator = MagenticOrchestrator(max_rounds=2)
workflow = orchestrator._build_workflow()
# Workflow built successfully: <class 'agent_framework._workflows._workflow.Workflow'>
Model Configuration: CORRECT
settings.openai_model = "gpt-5.1"  # ✅ Using GPT-5.1, not GPT-4o
settings.openai_api_key = True  # ✅ API key is set
What Magentic Provides (Working)
Multi-Agent Coordination
- Manager agent orchestrates SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent
- Uses `MagenticBuilder().with_standard_manager()` for coordination
ChatAgent Pattern
- Each agent has internal LLM (GPT-5.1)
- Can call tools via `@ai_function` decorator
- Has proper instructions for domain-specific tasks
Workflow Streaming
- Events: `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, etc.
- Real-time UI updates via `workflow.run_stream(task)`
State Management
- `MagenticState` persists evidence across agents
- `get_bibliography()` tool for ReportAgent
What's Actually Broken: The Search Tools
File: src/agents/tools.py
The Magentic agents call these tools:
- `search_pubmed` → Uses `PubMedTool`
- `search_clinical_trials` → Uses `ClinicalTrialsTool`
- `search_preprints` → Uses `BioRxivTool`
These tools are the problem, not the framework.
Search Tool Bugs (Detailed)
BUG 1: BioRxiv API Does Not Support Search
File: src/tools/biorxiv.py:248-286
# This fetches the FIRST 100 papers from the last 90 days
# It does NOT search by keyword - the API doesn't support that
url = f"{self.BASE_URL}/{self.server}/{interval}/0/json"
# Then filters client-side for keywords
matching = self._filter_by_keywords(papers, query_terms, max_results)
Problem:
- Fetches the first 100 papers in the date interval, regardless of topic
- Filters for ANY keyword match in title/abstract
- "Long COVID medications" returns papers about "calf muscles" because they mention "COVID" once
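A minimal sketch of why any-term matching misfires, assuming the client-side filter behaves as described above. The function names and the two paper records are hypothetical and are not taken from src/tools/biorxiv.py:

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def filter_any(papers: list[dict], query: str) -> list[dict]:
    """Keep a paper if ANY query term appears (the behavior described above)."""
    terms = tokenize(query)
    return [p for p in papers if terms & tokenize(p["title"] + " " + p["abstract"])]


def filter_all(papers: list[dict], query: str) -> list[dict]:
    """Keep a paper only if EVERY query term appears (a stricter alternative)."""
    terms = tokenize(query)
    return [p for p in papers if terms <= tokenize(p["title"] + " " + p["abstract"])]


# Hypothetical records, invented for illustration only.
papers = [
    {"title": "Calf muscle adaptations in runners",
     "abstract": "Training was interrupted by COVID restrictions."},
    {"title": "Candidate medications for long COVID",
     "abstract": "We review drugs under study for long COVID symptoms."},
]

any_hits = filter_any(papers, "long COVID medications")
all_hits = filter_all(papers, "long COVID medications")
print(len(any_hits), len(all_hits))
```

Requiring every term drops the calf-muscle paper in this toy case, but it does not address the underlying limit: only the fetched page of papers is ever scanned, so relevant preprints outside that page are never seen.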
Fix: Remove BioRxiv or use Europe PMC (which has actual search)
BUG 2: PubMed Query Not Optimized
File: src/tools/pubmed.py:54-71
search_params = self._build_params(
    db="pubmed",
    term=query,  # RAW USER QUERY - no preprocessing!
    retmax=max_results,
    sort="relevance",
)
Problem:
- User enters: "What medications show promise for Long COVID?"
- PubMed receives: `What medications show promise for Long COVID?`
- Should receive: `("long covid"[Title/Abstract] OR "PASC"[Title/Abstract]) AND (treatment[Title/Abstract] OR drug[Title/Abstract])`
Fix: Add query preprocessing:
- Strip question words (what, which, how, etc.)
- Expand medical synonyms (Long COVID → PASC, Post-COVID)
- Use MeSH terms for better recall
BUG 3: ClinicalTrials.gov No Filtering
File: src/tools/clinicaltrials.py
Returns ALL trials including:
- Withdrawn trials
- Terminated trials
- Observational studies (not drug interventions)
- Phase 1 (no efficacy data)
Fix: Filter by:
- `studyType=INTERVENTIONAL`
- `phase=PHASE2,PHASE3,PHASE4`
- `status=COMPLETED,ACTIVE_NOT_RECRUITING,RECRUITING`
Evidence: Garbage In → Garbage Out
When the Magentic SearchAgent calls these tools:
SearchAgent: "Find evidence for Long COVID medications"
│
▼
search_pubmed("Long COVID medications")
  → Returns 1 semi-relevant paper (raw query hits)
search_preprints("Long COVID medications")
  → Returns garbage (BioRxiv API doesn't search)
  → "Calf muscle adaptations" (has "COVID" somewhere)
  → "Ophthalmologist work-life balance" (mentions COVID)
search_clinical_trials("Long COVID medications")
  → Returns all trials, no filtering
│
▼
JudgeAgent receives garbage evidence
│
▼
HypothesisAgent can't generate good hypotheses from garbage
│
▼
ReportAgent produces garbage report
The framework is doing its job. It's orchestrating agents correctly. But the agents are being fed garbage data.
Recommended Fixes
Priority 1: Delete or Fix BioRxiv (30 min)
Option A: Delete it
# In src/agents/tools.py, remove:
# from src.tools.biorxiv import BioRxivTool
# _biorxiv = BioRxivTool()
# @ai_function search_preprints(...)
Option B: Replace with Europe PMC
Europe PMC has preprints AND a proper search API:
https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=long+covid+treatment&format=json
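A rough sketch of how a replacement tool might build that request and read the response. The helper names are hypothetical. Two details are assumptions to check against the Europe PMC documentation: the `SRC:PPR` clause for restricting results to preprints, and the `resultList.result` response layout. No network call is made here.

```python
from urllib.parse import urlencode

EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_preprint_search_url(query: str, max_results: int = 10) -> str:
    """Build a Europe PMC search URL restricted to preprints."""
    params = {
        # Assumption: SRC:PPR selects the preprint source in Europe PMC.
        "query": f"({query}) AND SRC:PPR",
        "format": "json",
        "pageSize": max_results,
    }
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


def extract_titles(payload: dict) -> list[str]:
    """Pull titles from a decoded response (assumes resultList.result layout)."""
    results = payload.get("resultList", {}).get("result", [])
    return [r["title"] for r in results if "title" in r]


url = build_preprint_search_url("long covid treatment", max_results=5)
print(url)
```

Unlike the BioRxiv approach, the keyword matching happens on the server, so results are not limited to whichever papers appear on the first chronological page.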
Priority 2: Fix PubMed Query (1 hour)
Add query preprocessor:
def preprocess_query(raw_query: str) -> str:
    """Convert natural language to PubMed query syntax."""
    # Strip question words
    # Expand medical synonyms
    # Add field tags [Title/Abstract]
    # Return optimized query
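The outline above could be filled in along these lines. This is a sketch only: the stop-word set and synonym table are hypothetical placeholders, and MeSH expansion is not attempted. A real implementation would need a curated vocabulary.

```python
import re

# Hypothetical word lists, chosen for illustration only.
STOP_WORDS = {"what", "which", "how", "why", "does", "do", "is", "are", "can",
              "show", "shows", "promise", "for", "the", "a", "an", "of", "in"}
SYNONYMS = {
    "long covid": ["long covid", "PASC", "post-COVID"],
    "medications": ["treatment", "drug", "medication"],
}


def preprocess_query(raw_query: str) -> str:
    """Convert a natural-language question into a fielded PubMed query."""
    # Lowercase and replace punctuation (except hyphens) with spaces.
    text = re.sub(r"[^\w\s-]", " ", raw_query.lower())
    groups = []
    # Pull out known concepts first and expand each into an OR group.
    for phrase, variants in SYNONYMS.items():
        if phrase in text:
            text = text.replace(phrase, " ")
            ors = " OR ".join(f'"{v}"[Title/Abstract]' for v in variants)
            groups.append(f"({ors})")
    # Remaining words that are not stop words become single fielded terms.
    for word in text.split():
        if word not in STOP_WORDS:
            groups.append(f"{word}[Title/Abstract]")
    return " AND ".join(groups)


optimized = preprocess_query("What medications show promise for Long COVID?")
print(optimized)
```

The plain substring match on concept phrases is deliberately naive. It keeps the sketch short, but it would need word-boundary handling before production use.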
Priority 3: Filter ClinicalTrials (30 min)
Add parameters to API call:
params = {
    "query.term": query,
    "filter.overallStatus": "COMPLETED,RECRUITING",
    "filter.studyType": "INTERVENTIONAL",
    "pageSize": max_results,
}
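The parameters above do not cover phase, so a client-side pass can enforce the remaining criteria whatever the server accepts. A hedged sketch follows. The nested field names (`protocolSection`, `statusModule`, `designModule`) are an assumption about the response layout and should be checked against the actual API output. The sample records are invented.

```python
ALLOWED_STATUS = {"COMPLETED", "ACTIVE_NOT_RECRUITING", "RECRUITING"}
ALLOWED_PHASES = {"PHASE2", "PHASE3", "PHASE4"}


def is_relevant_trial(study: dict) -> bool:
    """Return True for interventional Phase 2-4 trials with an allowed status."""
    protocol = study.get("protocolSection", {})
    status = protocol.get("statusModule", {}).get("overallStatus")
    design = protocol.get("designModule", {})
    phases = set(design.get("phases", []))
    return (
        status in ALLOWED_STATUS
        and design.get("studyType") == "INTERVENTIONAL"
        and bool(phases & ALLOWED_PHASES)
    )


def make_study(status: str, study_type: str, phases: list[str]) -> dict:
    """Build a hypothetical study record in the assumed nested layout."""
    return {"protocolSection": {
        "statusModule": {"overallStatus": status},
        "designModule": {"studyType": study_type, "phases": phases},
    }}


# Hypothetical records, invented for illustration only.
studies = [
    make_study("RECRUITING", "INTERVENTIONAL", ["PHASE2"]),
    make_study("WITHDRAWN", "INTERVENTIONAL", ["PHASE3"]),
    make_study("COMPLETED", "OBSERVATIONAL", []),
    make_study("COMPLETED", "INTERVENTIONAL", ["PHASE1"]),
]
kept = [s for s in studies if is_relevant_trial(s)]
print(len(kept))
```

Only the first record survives: the others are withdrawn, observational, or Phase 1 only, matching the exclusions listed under BUG 3.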
Conclusion
Microsoft Agent Framework: NO BUGS FOUND
- Imports work ✅
- Agent creation works ✅
- Workflow building works ✅
- Model config correct (GPT-5.1) ✅
- Streaming events work ✅
Search Tools: CRITICALLY BROKEN
- BioRxiv: API doesn't support search (fundamental)
- PubMed: No query optimization (fixable)
- ClinicalTrials: No filtering (fixable)
Recommendation:
- Delete BioRxiv immediately (unusable)
- Add PubMed query preprocessing
- Add ClinicalTrials filtering
- Then the Magentic multi-agent system will work as designed