
P0 Audit: Microsoft Agent Framework (Magentic) & Search Tools

Date: November 27, 2025
Auditor: Claude Code
Status: VERIFIED


TL;DR

| Component | Status | Verdict |
|---|---|---|
| Microsoft Agent Framework | ✅ WORKING | Correctly wired, no bugs |
| GPT-5.1 Model Config | ✅ CORRECT | Using gpt-5.1 as configured |
| Search Tools | ❌ BROKEN | Root cause of garbage results |

The orchestration framework is fine. The search layer is garbage.


Microsoft Agent Framework Verification

Import Test: PASSED

from agent_framework import MagenticBuilder, ChatAgent
from agent_framework.openai import OpenAIChatClient
# All imports successful

Agent Creation Test: PASSED

from src.agents.magentic_agents import create_search_agent
search_agent = create_search_agent()
# SearchAgent created: SearchAgent
# Description: Searches biomedical databases (PubMed, ClinicalTrials.gov, bioRxiv)

Workflow Build Test: PASSED

from src.orchestrator_magentic import MagenticOrchestrator
orchestrator = MagenticOrchestrator(max_rounds=2)
workflow = orchestrator._build_workflow()
# Workflow built successfully: <class 'agent_framework._workflows._workflow.Workflow'>

Model Configuration: CORRECT

settings.openai_model == "gpt-5.1"     # ✅ Using GPT-5.1, not GPT-4o
bool(settings.openai_api_key) is True  # ✅ API key is set

What Magentic Provides (Working)

  1. Multi-Agent Coordination

    • Manager agent orchestrates SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent
    • Uses MagenticBuilder().with_standard_manager() for coordination
  2. ChatAgent Pattern

    • Each agent has internal LLM (GPT-5.1)
    • Can call tools via @ai_function decorator
    • Has proper instructions for domain-specific tasks
  3. Workflow Streaming

    • Events: MagenticAgentMessageEvent, MagenticFinalResultEvent, etc.
    • Real-time UI updates via workflow.run_stream(task)
  4. State Management

    • MagenticState persists evidence across agents
    • get_bibliography() tool for ReportAgent

What's Actually Broken: The Search Tools

File: src/agents/tools.py

The Magentic agents call these tools:

  • search_pubmed → Uses PubMedTool
  • search_clinical_trials → Uses ClinicalTrialsTool
  • search_preprints → Uses BioRxivTool

These tools are the problem, not the framework.


Search Tool Bugs (Detailed)

BUG 1: BioRxiv API Does Not Support Search

File: src/tools/biorxiv.py:248-286

# This fetches the FIRST 100 papers from the last 90 days
# It does NOT search by keyword - the API doesn't support that
url = f"{self.BASE_URL}/{self.server}/{interval}/0/json"

# Then filters client-side for keywords
matching = self._filter_by_keywords(papers, query_terms, max_results)

Problem:

  • Fetches the first 100 papers in the date window, whatever their topic
  • Filters for ANY keyword match in title/abstract
  • "Long COVID medications" returns papers about "calf muscles" because they mention "COVID" once
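The failure mode can be reproduced in a few lines. This is a minimal sketch, not the project's code: `filter_any` is a hypothetical stand-in for `_filter_by_keywords` (whose body is not shown in this audit), `filter_all` is a stricter variant added for contrast, and the two sample papers are invented for illustration.

```python
def filter_any(papers, query_terms):
    """Keep a paper if ANY query term appears in its title or abstract."""
    matches = []
    for paper in papers:
        text = f"{paper['title']} {paper['abstract']}".lower()
        if any(term.lower() in text for term in query_terms):
            matches.append(paper)
    return matches


def filter_all(papers, query_terms):
    """Stricter variant: keep a paper only if EVERY query term appears."""
    matches = []
    for paper in papers:
        text = f"{paper['title']} {paper['abstract']}".lower()
        if all(term.lower() in text for term in query_terms):
            matches.append(paper)
    return matches


# Invented sample data: one off-topic paper that mentions COVID in passing,
# one paper that is actually about the query.
papers = [
    {
        "title": "Calf muscle adaptations after training",
        "abstract": "Recruitment was paused during the COVID pandemic.",
    },
    {
        "title": "Antivirals in long COVID",
        "abstract": "We review candidate medications for long COVID.",
    },
]
terms = ["long", "covid", "medications"]

any_hits = [p["title"] for p in filter_any(papers, terms)]
all_hits = [p["title"] for p in filter_all(papers, terms)]
```

Requiring every term removes the off-topic hit here, but it does not address the underlying limitation: only the first 100 papers are ever fetched, so relevant preprints outside that page are never seen.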

Fix: Remove BioRxiv or use Europe PMC (which has actual search)


BUG 2: PubMed Query Not Optimized

File: src/tools/pubmed.py:54-71

search_params = self._build_params(
    db="pubmed",
    term=query,  # RAW USER QUERY - no preprocessing!
    retmax=max_results,
    sort="relevance",
)

Problem:

  • User enters: "What medications show promise for Long COVID?"
  • PubMed receives: What medications show promise for Long COVID?
  • Should receive: ("long covid"[Title/Abstract] OR "PASC"[Title/Abstract]) AND (treatment[Title/Abstract] OR drug[Title/Abstract])

Fix: Add query preprocessing:

  1. Strip question words (what, which, how, etc.)
  2. Expand medical synonyms (Long COVID → PASC, Post-COVID)
  3. Use MeSH terms for better recall

BUG 3: ClinicalTrials.gov No Filtering

File: src/tools/clinicaltrials.py

Returns ALL trials including:

  • Withdrawn trials
  • Terminated trials
  • Observational studies (not drug interventions)
  • Phase 1 (no efficacy data)

Fix: Filter by:

  • studyType=INTERVENTIONAL
  • phase=PHASE2,PHASE3,PHASE4
  • status=COMPLETED,ACTIVE_NOT_RECRUITING,RECRUITING
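Where a criterion cannot be pushed to the server, the same rules can be applied client-side. The following is a minimal sketch, assuming study records shaped like the ClinicalTrials.gov API v2 JSON (`protocolSection.statusModule.overallStatus`, `protocolSection.designModule.studyType` and `phases`). The names `is_relevant_trial`, `filter_trials`, and `make_study`, and the NCT IDs in the sample data, are hypothetical.

```python
ALLOWED_STATUSES = {"COMPLETED", "ACTIVE_NOT_RECRUITING", "RECRUITING"}
ALLOWED_PHASES = {"PHASE2", "PHASE3", "PHASE4"}


def is_relevant_trial(study: dict) -> bool:
    """Apply the three filters from the fix list to one study record."""
    protocol = study.get("protocolSection", {})
    status = protocol.get("statusModule", {}).get("overallStatus")
    design = protocol.get("designModule", {})
    # Observational studies usually have no "phases" key at all
    phases = set(design.get("phases", []))
    return (
        status in ALLOWED_STATUSES
        and design.get("studyType") == "INTERVENTIONAL"
        and bool(phases & ALLOWED_PHASES)
    )


def filter_trials(studies: list) -> list:
    """Keep only studies that pass every filter."""
    return [s for s in studies if is_relevant_trial(s)]


def make_study(nct_id, status, study_type, phases=None):
    """Build a sample record (illustration only, IDs are made up)."""
    design = {"studyType": study_type}
    if phases is not None:
        design["phases"] = phases
    return {
        "protocolSection": {
            "identificationModule": {"nctId": nct_id},
            "statusModule": {"overallStatus": status},
            "designModule": design,
        }
    }


studies = [
    make_study("NCT00000001", "COMPLETED", "INTERVENTIONAL", ["PHASE2", "PHASE3"]),
    make_study("NCT00000002", "WITHDRAWN", "INTERVENTIONAL", ["PHASE3"]),
    make_study("NCT00000003", "RECRUITING", "OBSERVATIONAL"),
    make_study("NCT00000004", "RECRUITING", "INTERVENTIONAL", ["PHASE1"]),
]
kept_ids = [
    s["protocolSection"]["identificationModule"]["nctId"]
    for s in filter_trials(studies)
]
```

A combined Phase 1/Phase 2 trial passes this check because its phase list overlaps the allowed set; tighten the comparison if such trials should be excluded.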

Evidence: Garbage In β†’ Garbage Out

When the Magentic SearchAgent calls these tools:

SearchAgent: "Find evidence for Long COVID medications"
    │
    ▼
search_pubmed("Long COVID medications")
    → Returns 1 semi-relevant paper (raw query hits)

search_preprints("Long COVID medications")
    → Returns garbage (BioRxiv API doesn't search)
    → "Calf muscle adaptations" (has "COVID" somewhere)
    → "Ophthalmologist work-life balance" (mentions COVID)

search_clinical_trials("Long COVID medications")
    → Returns all trials, no filtering
    │
    ▼
JudgeAgent receives garbage evidence
    │
    ▼
HypothesisAgent can't generate good hypotheses from garbage
    │
    ▼
ReportAgent produces garbage report

The framework is doing its job. It's orchestrating agents correctly. But the agents are being fed garbage data.


Recommended Fixes

Priority 1: Delete or Fix BioRxiv (30 min)

Option A: Delete it

# In src/agents/tools.py, remove:
# from src.tools.biorxiv import BioRxivTool
# _biorxiv = BioRxivTool()
# @ai_function search_preprints(...)

Option B: Replace with Europe PMC

Europe PMC indexes preprints and has a proper search API:

https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=long+covid+treatment&format=json
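A replacement tool could be built around that endpoint roughly as follows. This is a sketch under stated assumptions, not the project's implementation: the function names `build_search_url` and `parse_results` are hypothetical, the `SRC:PPR` clause is used to restrict hits to preprint records, and the response is assumed to carry hits under `resultList.result` with `abstractText` present when `resultType=core` is requested. The block builds the URL and parses a payload without touching the network; fetching is left to whichever HTTP client the project already uses.

```python
from urllib.parse import urlencode

EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query: str, max_results: int = 10, preprints_only: bool = True) -> str:
    """Build a Europe PMC search URL for a keyword query."""
    # SRC:PPR limits results to preprint records (assumption stated above)
    full_query = f"({query}) AND SRC:PPR" if preprints_only else query
    params = {
        "query": full_query,
        "format": "json",
        "resultType": "core",  # "core" results include the abstract
        "pageSize": max_results,
    }
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


def parse_results(payload: dict) -> list[dict]:
    """Reduce a decoded JSON response to the fields the agents need."""
    results = payload.get("resultList", {}).get("result", [])
    return [
        {
            "id": r.get("id"),
            "title": r.get("title", ""),
            "abstract": r.get("abstractText", ""),
            "year": r.get("pubYear"),
        }
        for r in results
    ]


url = build_search_url("long covid treatment", max_results=5)

# Invented payload in the assumed response shape, for illustration only
sample_payload = {
    "resultList": {
        "result": [
            {"id": "PPR000001", "title": "Example preprint", "pubYear": "2025"}
        ]
    }
}
parsed = parse_results(sample_payload)
```

Unlike the bioRxiv endpoint, the query is evaluated server-side, so no client-side keyword filtering is needed.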

Priority 2: Fix PubMed Query (1 hour)

Add query preprocessor:

def preprocess_query(raw_query: str) -> str:
    """Convert natural language to PubMed query syntax."""
    # Strip question words
    # Expand medical synonyms
    # Add field tags [Title/Abstract]
    # Return optimized query
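One way to fill in the stub above is sketched here. It covers the first three commented steps only; MeSH mapping is left out. The `FILLER_WORDS` set and the `SYNONYMS` table are hypothetical and deliberately tiny, so treat them as placeholders for a curated or MeSH-backed vocabulary.

```python
import re

# Question and filler words to drop (illustrative, not exhaustive)
FILLER_WORDS = {
    "what", "which", "how", "why", "who", "when", "where",
    "does", "do", "is", "are", "can", "show", "shows",
    "promise", "for", "the", "a", "an", "of", "in", "to",
}

# Hypothetical synonym table keyed by lowercase phrase
SYNONYMS = {
    "long covid": ["long covid", "PASC", "post-COVID"],
    "medications": ["treatment", "drug", "medication"],
}


def preprocess_query(raw_query: str) -> str:
    """Convert natural language to PubMed query syntax."""
    # Lowercase and replace punctuation (except hyphens) with spaces
    text = re.sub(r"[^\w\s-]", " ", raw_query.lower())

    groups = []
    # Expand known concepts first, removing them from the remaining text
    for phrase, variants in SYNONYMS.items():
        pattern = rf"\b{re.escape(phrase)}\b"
        if re.search(pattern, text):
            groups.append(variants)
            text = re.sub(pattern, " ", text)

    # Every remaining non-filler word becomes its own single-term group
    for word in text.split():
        if word not in FILLER_WORDS:
            groups.append([word])

    # OR within a concept, AND between concepts, each term field-tagged
    clauses = []
    for variants in groups:
        tagged = [f'"{v}"[Title/Abstract]' for v in variants]
        clauses.append("(" + " OR ".join(tagged) + ")" if len(tagged) > 1 else tagged[0])

    # Fall back to the raw query if everything was stripped
    return " AND ".join(clauses) or raw_query


optimized = preprocess_query("What medications show promise for Long COVID?")
```

Quoting each term disables PubMed's automatic term mapping, which is why the synonym table has to supply the variants explicitly.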

Priority 3: Filter ClinicalTrials (30 min)

Add parameters to API call:

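params = {
    "query.term": query,
    # Same statuses as the Bug 3 fix list
    "filter.overallStatus": "COMPLETED,ACTIVE_NOT_RECRUITING,RECRUITING",
    # Assumes ClinicalTrials.gov API v2, which has no filter.studyType parameter;
    # study type and phase are filtered through a filter.advanced expression
    "filter.advanced": "AREA[StudyType]INTERVENTIONAL AND AREA[Phase](PHASE2 OR PHASE3 OR PHASE4)",
    "pageSize": max_results,
}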

Conclusion

Microsoft Agent Framework: NO BUGS FOUND

  • Imports work ✅
  • Agent creation works ✅
  • Workflow building works ✅
  • Model config correct (GPT-5.1) ✅
  • Streaming events work ✅

Search Tools: CRITICALLY BROKEN

  • BioRxiv: API doesn't support search (fundamental)
  • PubMed: No query optimization (fixable)
  • ClinicalTrials: No filtering (fixable)

Recommendation:

  1. Delete BioRxiv or replace it with Europe PMC immediately (the current tool is unusable)
  2. Add PubMed query preprocessing
  3. Add ClinicalTrials filtering
  4. Then the Magentic multi-agent system will work as designed