VibecoderMcSwaggins committed on
Commit fb7b8d7 · 1 Parent(s): 7fab6d4

feat: add documentation for Magentic mode bug and implementation spec

- Introduced a new bug report for Magentic mode, detailing its non-functionality and root causes.
- Updated the implementation specification for Magentic integration, emphasizing the architecture, critical insights, and necessary changes for agent coordination.
- Enhanced clarity on the roles of various agents and their interactions within the Magentic workflow.
- Provided recommendations for fixing or abandoning the Magentic mode based on observed issues.

This commit aims to improve understanding and troubleshooting of the Magentic mode within the project.

docs/bugs/006_magentic_mode_broken.md ADDED
@@ -0,0 +1,211 @@
+ # Bug 006: Magentic Mode Deeply Broken
+
+ **Date:** November 26, 2025
+ **Severity:** HIGH
+ **Status:** Open (Low Priority - Simple Mode Works)
+
+ ## 1. The Problem
+
+ Magentic mode (`mode="magentic"`) is **non-functional**. When enabled:
+ - Workflow hangs indefinitely (observed in local testing)
+ - No events beyond the initial `started` event are yielded to the UI
+ - API calls may be made but responses are not processed
+
+ ## 2. Root Cause Analysis
+
+ ### 2.1 Architecture Complexity
+
+ ```
+ ┌─────────────────────────────────────────────────────────────────┐
+ │                      MagenticOrchestrator                       │
+ │                                                                 │
+ │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
+ │  │ SearchAgent │  │ HypothesisAg│  │ JudgeAgent  │              │
+ │  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘              │
+ │         │                │                │                     │
+ │         ▼                ▼                ▼                     │
+ │  ┌─────────────────────────────────────────────────────────┐    │
+ │  │            MagenticBuilder Standard Manager             │    │
+ │  │            (OpenAIChatClient orchestration)             │    │
+ │  │                                                         │    │
+ │  │  - Decides which agent to call                          │    │
+ │  │  - Parses agent responses                               │    │
+ │  │  - Loops until "final result"                           │    │
+ │  └─────────────────────────────────────────────────────────┘    │
+ └─────────────────────────────────────────────────────────────────┘
+ ```
+
+ The issue is in the **Standard Manager** layer from `agent-framework-core`:
+ - It uses an LLM to decide which agent to call next
+ - The LLM response parsing is fragile
+ - The loop can stall or hang if parsing fails
+
+ ### 2.2 Specific Issues
+
+ | Issue | Location | Impact |
+ |-------|----------|--------|
+ | OpenAI-only | `orchestrator_magentic.py:103` | Can't use Anthropic |
+ | Manager parsing | `agent-framework` library | Hangs on malformed responses |
+ | No timeout | `MagenticBuilder` | Workflow runs forever |
+ | Round limits insufficient | `max_round_count=10` | Still hangs within rounds |
+
+ ### 2.3 Observed Behavior
+
+ ```bash
+ # Test magentic mode
+ uv run python -c "
+ from src.orchestrator_factory import create_orchestrator
+ ...
+ orch = create_orchestrator(mode='magentic')
+ async for event in orch.run('metformin alzheimer'):
+     print(event.type)
+ "
+
+ # Result: Hangs indefinitely after "started" event
+ # No search, no judge, no completion
+ ```
+
+ ## 3. Technical Deep Dive
+
+ ### 3.1 The Manager's Role
+
+ `MagenticBuilder.with_standard_manager()` creates an LLM-powered router:
+
+ ```python
+ # Excerpt from orchestrator_magentic.py, lines 94-111
+ # (the enclosing parenthesized expression is omitted)
+ MagenticBuilder()
+     .participants(
+         searcher=search_agent,
+         hypothesizer=hypothesis_agent,
+         judge=judge_agent,
+         reporter=report_agent,
+     )
+     .with_standard_manager(
+         chat_client=OpenAIChatClient(
+             model_id=settings.openai_model,
+             api_key=settings.openai_api_key
+         ),
+         max_round_count=self._max_rounds,  # 10
+         max_stall_count=3,
+         max_reset_count=2,
+     )
+ ```
+
+ The manager:
+ 1. Receives the task
+ 2. Calls OpenAI to decide: "Which agent should handle this?"
+ 3. Parses the response to extract the agent name
+ 4. Calls that agent
+ 5. Receives the result
+ 6. Calls OpenAI again: "What next?"
+ 7. Repeats until it reaches a "final result"
+
+ ### 3.2 Where It Breaks
+
+ The manager's response parsing expects specific formats. If OpenAI returns:
+ - Unexpected JSON structure → parse error → stall
+ - Agent name with typo → agent not found → reset
+ - Verbose explanation → extraction fails → hang
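
The three failure modes listed here can be illustrated without the framework. The following is a minimal sketch, not the library's actual parser: `parse_next_speaker`, `RoutingParseError`, and the `next_speaker` JSON key are hypothetical names chosen for illustration, and the agent names are the participant keys from section 3.1. It tries strict JSON first, falls back to a text scan for verbose replies, and raises on anything else so the caller can fail fast instead of stalling.

```python
import json
import re

# Participant names taken from the MagenticBuilder call in section 3.1.
KNOWN_AGENTS = ("searcher", "hypothesizer", "judge", "reporter")


class RoutingParseError(ValueError):
    """Raised when no known agent can be extracted from a manager reply."""


def parse_next_speaker(reply: str) -> str:
    """Hypothetical helper: extract the next agent name from an LLM reply."""
    try:
        payload = json.loads(reply)
        if isinstance(payload, dict):
            name = str(payload.get("next_speaker", "")).strip().lower()
            if name in KNOWN_AGENTS:
                return name
    except json.JSONDecodeError:
        pass  # Not JSON (e.g. a verbose explanation): use the text scan below.

    # Fallback: return the first name, in KNOWN_AGENTS order, that occurs
    # in the reply as a whole word (case-insensitive).
    for name in KNOWN_AGENTS:
        if re.search(rf"\b{name}\b", reply, flags=re.IGNORECASE):
            return name

    # Typos and unknown names end up here: raise rather than return None.
    raise RoutingParseError(f"no known agent in reply: {reply[:80]!r}")
```

Raising a dedicated exception is a design choice in this sketch: a caller can convert it into an `error` event for the UI, which is easier to diagnose than a silent stall.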
+
+ ### 3.3 The Event Processing
+
+ ```python
+ # orchestrator_magentic.py lines 178-191
+ async for event in workflow.run_stream(task):
+     agent_event = self._process_event(event, iteration)
+     if agent_event:
+         # Events are processed but may never arrive
+         yield agent_event
+ ```
+
+ If `workflow.run_stream()` never yields events (manager stuck), the UI sees nothing.
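
One way to make a silent stream visible is a per-event timeout around the async iterator. The sketch below uses only the standard library; `with_event_timeout`, `stalled_stream`, and `collect` are hypothetical names, and `stalled_stream` merely imitates a workflow that emits `started` and then hangs. It assumes, as the spec states, that `run_stream()` returns an async generator of events.

```python
import asyncio
from collections.abc import AsyncIterator
from typing import TypeVar

T = TypeVar("T")


async def with_event_timeout(
    stream: AsyncIterator[T], timeout_s: float
) -> AsyncIterator[T]:
    """Re-yield items from `stream`; raise asyncio.TimeoutError when the
    wait for the next item exceeds `timeout_s` seconds."""
    iterator = stream.__aiter__()
    while True:
        try:
            item = await asyncio.wait_for(iterator.__anext__(), timeout_s)
        except StopAsyncIteration:
            return  # the underlying stream finished normally
        yield item


async def stalled_stream() -> AsyncIterator[str]:
    """Stand-in for a workflow that emits one event and then hangs."""
    yield "started"
    await asyncio.sleep(3600)  # simulates a manager that never responds
    yield "unreachable"


async def collect(timeout_s: float) -> list[str]:
    """Consume the stalled stream and record a visible error on timeout."""
    seen: list[str] = []
    try:
        async for event in with_event_timeout(stalled_stream(), timeout_s):
            seen.append(event)
    except asyncio.TimeoutError:
        seen.append("error: timed out")
    return seen
```

In the orchestrator, the same wrapper could be applied to `workflow.run_stream(task)` so that a stuck manager surfaces as an error event; a suitable timeout value would have to be chosen from observed LLM latencies.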
+
+ ## 4. Why Simple Mode Works
+
+ Simple mode bypasses all of this:
+
+ ```python
+ # orchestrator.py
+ while iteration < self.config.max_iterations:
+     # Direct calls - no LLM routing
+     search_results = await self.search.execute(query)
+     assessment = await self.judge.assess(query, evidence)
+
+     if assessment.sufficient:
+         return synthesis
+     else:
+         continue  # Deterministic loop
+ ```
+
+ No LLM-powered routing. No parsing. No hangs.
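
For readers who want to run the idea, here is a self-contained sketch of the same deterministic loop. It is not the project's `orchestrator.py`: `StubSearch`, `StubJudge`, `Assessment`, and `run_simple` are hypothetical stand-ins, and the "sufficient at three items" rule is invented purely so the loop terminates.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Assessment:
    sufficient: bool


class StubSearch:
    """Hypothetical stand-in for the real search handler."""

    async def execute(self, query: str) -> list[str]:
        return [f"evidence for {query!r}"]


class StubJudge:
    """Hypothetical stand-in: evidence counts as sufficient at 3 items."""

    async def assess(self, query: str, evidence: list[str]) -> Assessment:
        return Assessment(sufficient=len(evidence) >= 3)


async def run_simple(query: str, max_iterations: int = 10) -> tuple[int, list[str]]:
    """Search, judge, repeat. The control flow never depends on parsing
    free-form LLM output, so it cannot stall on a malformed reply."""
    search, judge = StubSearch(), StubJudge()
    evidence: list[str] = []
    for iteration in range(1, max_iterations + 1):
        evidence += await search.execute(query)
        assessment = await judge.assess(query, evidence)
        if assessment.sufficient:
            return iteration, evidence
    # Iteration budget exhausted: return whatever was collected.
    return max_iterations, evidence
```

The bounded `for` loop makes the termination guarantee explicit: the function returns after at most `max_iterations` rounds regardless of what the judge reports.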
+
+ ## 5. Fix Options
+
+ ### Option A: Abandon Magentic (Recommended)
+
+ Simple mode with `HFInferenceJudgeHandler` provides:
+ - Free AI analysis
+ - Reliable execution
+ - No complex dependencies
+
+ Mark magentic mode as "experimental" or remove it entirely.
+
+ ### Option B: Fix the Manager (Hard)
+
+ 1. Add a timeout to `workflow.run_stream()`
+ 2. Implement a custom manager without LLM routing
+ 3. Use deterministic agent selection
+ 4. Add better error handling in event processing
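
Items 2 and 3 of Option B could take the shape of a small rule-based router. The sketch below is only one possible design under stated assumptions: `WorkflowState` and `next_agent` are hypothetical names, the fields are invented for illustration, and nothing here uses or replaces the `agent-framework` manager API. The agent names are the participant keys from section 3.1.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkflowState:
    """Hypothetical state a custom manager could track between turns."""

    evidence_count: int = 0
    has_hypothesis: bool = False
    judged_sufficient: bool = False
    rounds_used: int = 0
    history: list = field(default_factory=list)  # agent names, in call order


def next_agent(state: WorkflowState, max_rounds: int = 10) -> Optional[str]:
    """Choose the next participant with fixed rules; None means finished."""
    last = state.history[-1] if state.history else None
    if last == "reporter":
        return None  # report written, workflow is done
    if state.judged_sufficient or state.rounds_used >= max_rounds:
        return "reporter"  # enough evidence, or round budget exhausted
    if state.evidence_count == 0:
        return "searcher"
    if not state.has_hypothesis:
        return "hypothesizer"
    if last != "judge":
        return "judge"
    return "searcher"  # judge just ran and was not satisfied: gather more
```

Because the routing is a pure function of the state, it can be unit-tested exhaustively, which is not possible for an LLM-driven selection step.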
+
+ ### Option C: Replace agent-framework (Medium)
+
+ Use a different multi-agent framework:
+ - LangGraph
+ - AutoGen
+ - Custom implementation
+
+ ## 6. Recommendation
+
+ **Do not use magentic mode for the hackathon.**
+
+ Simple mode with HFInferenceJudgeHandler:
+ - Works reliably
+ - Provides real AI analysis
+ - No extra dependencies
+ - No API routing issues
+
+ ## 7. Files Involved
+
+ ```
+ src/orchestrator_magentic.py       ← Main orchestrator (broken)
+ src/agents/search_agent.py         ← Works in isolation
+ src/agents/judge_agent.py          ← Works in isolation
+ src/agents/hypothesis_agent.py     ← Works in isolation
+ src/agents/report_agent.py         ← Works in isolation
+ ```
+
+ The agents themselves work. The **manager** coordination is broken.
+
+ ## 8. Verification
+
+ To verify this bug still exists:
+
+ ```bash
+ # This should hang
+ uv run python -c "
+ import asyncio
+ from src.app import configure_orchestrator
+
+ orch, name = configure_orchestrator(mode='magentic', use_mock=False)
+ print(f'Backend: {name}')
+
+ async def test():
+     async for event in orch.run('test query'):
+         print(event.type)
+
+ asyncio.run(test())
+ "
+ ```
+
+ - Expected while the bug exists: hangs after `started`
+ - Once fixed: would show `search_complete`, `judge_complete`, etc.
docs/implementation/05_phase_magentic.md CHANGED
@@ -1,4 +1,4 @@
1
- # Phase 5 Implementation Spec: Magentic Integration (Optional)
2
 
3
  **Goal**: Upgrade orchestrator to use Microsoft Agent Framework's Magentic-One pattern.
4
  **Philosophy**: "Same API, Better Engine."
@@ -15,385 +15,744 @@ Magentic-One provides:
15
  - **Event streaming** for real-time UI updates
16
  - **Multi-agent coordination** with round limits and reset logic
17
 
18
- This is **NOT required for MVP**. Only implement if time permits after Phase 4.
19
-
20
  ---
21
 
22
- ## 2. Architecture Alignment
23
 
24
- ### Current Phase 4 Architecture
25
- ```
26
- User Query
27
-     ↓
28
- Orchestrator (while loop)
29
-     ├── SearchHandler.execute() → Evidence
30
-     ├── JudgeHandler.assess() → JudgeAssessment
31
-     └── Loop/Synthesize decision
32
-     ↓
33
- Research Report
34
- ```
35
 
36
- ### Phase 5 Magentic Architecture
37
  ```
38
- User Query
39
-     ↓
40
- MagenticBuilder
41
-     ├── SearchAgent (wraps SearchHandler)
42
-     ├── JudgeAgent (wraps JudgeHandler)
43
-     └── StandardMagenticManager (LLM coordinator)
44
-     ↓
45
- Research Report (same output format)
46
  ```
47
 
48
- **Key Insight**: We wrap existing handlers as `AgentProtocol` implementations. The domain logic stays the same.
49
 
50
  ---
51
 
52
- ## 3. Design for Seamless Integration
53
 
54
- ### 3.1 Protocol-Based Design (Phase 4 prep)
55
 
56
- In Phase 4, define handlers using Protocols so they can be wrapped later:
 
 
57
 
58
  ```python
59
- # src/orchestrator.py (Phase 4)
60
- from typing import Protocol, List
61
- from src.utils.models import Evidence, SearchResult, JudgeAssessment
62
 
 
 
 
 
 
63
 
64
- class SearchHandlerProtocol(Protocol):
65
- """Protocol for search handler - can be wrapped as Agent later."""
66
- async def execute(self, query: str, max_results_per_tool: int = 10) -> SearchResult:
67
- ...
68
 
 
69
 
70
- class JudgeHandlerProtocol(Protocol):
71
- """Protocol for judge handler - can be wrapped as Agent later."""
72
- async def assess(self, question: str, evidence: List[Evidence]) -> JudgeAssessment:
73
- ...
74
 
 
75
 
76
- class OrchestratorProtocol(Protocol):
77
- """Protocol for orchestrator - allows swapping implementations."""
78
- async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
79
- ...
80
- ```
81
 
82
- ### 3.2 Facade Pattern
83
 
84
- The `Orchestrator` class is a facade. In Phase 5, we create `MagenticOrchestrator` with the same interface:
 
85
 
86
- ```python
87
- # Phase 4: Simple orchestrator
88
- orchestrator = Orchestrator(search_handler, judge_handler)
 
89
 
90
- # Phase 5: Magentic orchestrator (SAME API)
91
- orchestrator = MagenticOrchestrator(search_handler, judge_handler)
 
 
92
 
93
- # Usage is identical
94
- async for event in orchestrator.run("metformin alzheimer"):
95
- print(event.to_markdown())
96
- ```
 
 
 
 
 
 
97
 
98
- ---
99
 
100
- ## 4. Phase 5 Implementation
101
 
102
- ### 4.1 Install Agent Framework
 
 
 
103
 
104
- Add to `pyproject.toml`:
105
 
106
- ```toml
107
- [project.optional-dependencies]
108
- magentic = [
109
- "agent-framework-core>=0.1.0",
110
- ]
111
  ```
112
 
113
- ### 4.2 Agent Wrappers (`src/agents/search_agent.py`)
114
 
115
- Wrap `SearchHandler` as an `AgentProtocol`.
116
- **Note**: `AgentProtocol` requires `id`, `name`, `display_name`, `description`, `run`, `run_stream`, and `get_new_thread`.
117
 
118
  ```python
119
- """Search agent wrapper for Magentic integration."""
120
- from typing import Any, AsyncIterable
121
- from agent_framework import AgentProtocol, AgentRunResponse, AgentRunResponseUpdate, ChatMessage, Role, AgentThread
122
 
123
- from src.tools.search_handler import SearchHandler
124
- from src.utils.models import SearchResult
 
125
 
 
 
 
126
 
127
- class SearchAgent:
128
- """Wraps SearchHandler as an AgentProtocol for Magentic."""
 
 
129
 
130
- def __init__(self, search_handler: SearchHandler):
131
- self._handler = search_handler
132
- self._id = "search-agent"
133
- self._name = "SearchAgent"
134
- self._description = "Searches PubMed and web for drug repurposing evidence"
135
 
136
- @property
137
- def id(self) -> str:
138
- return self._id
139
 
140
- @property
141
- def name(self) -> str | None:
142
- return self._name
 
143
 
144
- @property
145
- def display_name(self) -> str:
146
- return self._name
 
 
 
147
 
148
- @property
149
- def description(self) -> str | None:
150
- return self._description
151
 
152
- async def run(
153
- self,
154
- messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
155
- *,
156
- thread: AgentThread | None = None,
157
- **kwargs: Any,
158
- ) -> AgentRunResponse:
159
- """Execute search based on the last user message."""
160
- # Extract query from messages
161
- query = ""
162
- if isinstance(messages, list):
163
- for msg in reversed(messages):
164
- if isinstance(msg, ChatMessage) and msg.role == Role.USER and msg.text:
165
- query = msg.text
166
- break
167
- elif isinstance(msg, str):
168
- query = msg
169
- break
170
- elif isinstance(messages, str):
171
- query = messages
172
-
173
- if not query:
174
- return AgentRunResponse(
175
- messages=[ChatMessage(role=Role.ASSISTANT, text="No query provided")],
176
- response_id="search-no-query",
177
- )
178
 
179
- # Execute search
180
- result: SearchResult = await self._handler.execute(query, max_results_per_tool=10)
 
181
 
182
- # Format response
183
- evidence_text = "\n".join([
184
- f"- [{e.citation.title}]({e.citation.url}): {e.content[:200]}..."
185
- for e in result.evidence[:5]
186
- ])
187
 
188
- response_text = f"Found {result.total_found} sources:\n\n{evidence_text}"
 
 
189
 
190
- return AgentRunResponse(
191
- messages=[ChatMessage(role=Role.ASSISTANT, text=response_text)],
192
- response_id=f"search-{result.total_found}",
193
- additional_properties={"evidence": [e.model_dump() for e in result.evidence]},
194
- )
195
 
196
- async def run_stream(
197
- self,
198
- messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
199
- *,
200
- thread: AgentThread | None = None,
201
- **kwargs: Any,
202
- ) -> AsyncIterable[AgentRunResponseUpdate]:
203
- """Streaming wrapper for search (search itself isn't streaming)."""
204
- result = await self.run(messages, thread=thread, **kwargs)
205
- # Yield single update with full result
206
- yield AgentRunResponseUpdate(
207
- messages=result.messages,
208
- response_id=result.response_id
209
  )
210
 
211
- def get_new_thread(self, **kwargs: Any) -> AgentThread:
212
- """Create a new thread."""
213
- return AgentThread(**kwargs)
214
  ```
215
 
216
- ### 4.3 Judge Agent Wrapper (`src/agents/judge_agent.py`)
217
 
218
  ```python
219
- """Judge agent wrapper for Magentic integration."""
220
- from typing import Any, List, AsyncIterable
221
- from agent_framework import AgentProtocol, AgentRunResponse, AgentRunResponseUpdate, ChatMessage, Role, AgentThread
222
 
223
- from src.agent_factory.judges import JudgeHandler
224
- from src.utils.models import Evidence, JudgeAssessment
 
 
 
 
 
 
225
 
226
 
227
- class JudgeAgent:
228
- """Wraps JudgeHandler as an AgentProtocol for Magentic."""
229
 
230
- def __init__(self, judge_handler: JudgeHandler, evidence_store: dict[str, List[Evidence]]):
231
- self._handler = judge_handler
232
- self._evidence_store = evidence_store # Shared state for evidence
233
- self._id = "judge-agent"
234
- self._name = "JudgeAgent"
235
- self._description = "Evaluates evidence quality and determines if sufficient for synthesis"
236
 
237
- @property
238
- def id(self) -> str:
239
- return self._id
 
 
 
 
240
 
241
- @property
242
- def name(self) -> str | None:
243
- return self._name
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
244
 
245
- @property
246
- def display_name(self) -> str:
247
- return self._name
248
 
249
- @property
250
- def description(self) -> str | None:
251
- return self._description
252
 
253
- async def run(
254
- self,
255
- messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
256
- *,
257
- thread: AgentThread | None = None,
258
- **kwargs: Any,
259
- ) -> AgentRunResponse:
260
- """Assess evidence quality."""
261
- # Extract original question from messages
262
- question = ""
263
- if isinstance(messages, list):
264
- for msg in reversed(messages):
265
- if isinstance(msg, ChatMessage) and msg.role == Role.USER and msg.text:
266
- question = msg.text
267
- break
268
- elif isinstance(msg, str):
269
- question = msg
270
- break
271
- elif isinstance(messages, str):
272
- question = messages
273
-
274
- # Get evidence from shared store
275
- evidence = self._evidence_store.get("current", [])
276
-
277
- # Assess
278
- assessment: JudgeAssessment = await self._handler.assess(question, evidence)
279
-
280
- # Format response
281
- response_text = f"""## Assessment
282
-
283
- **Sufficient**: {assessment.sufficient}
284
- **Confidence**: {assessment.confidence:.0%}
285
- **Recommendation**: {assessment.recommendation}
286
-
287
- ### Scores
288
- - Mechanism: {assessment.details.mechanism_score}/10
289
- - Clinical: {assessment.details.clinical_evidence_score}/10
290
-
291
- ### Reasoning
292
- {assessment.reasoning}
293
- """
294
 
295
- if assessment.next_search_queries:
296
- response_text += f"\n### Next Queries\n" + "\n".join(
297
- f"- {q}" for q in assessment.next_search_queries
298
- )
 
 
 
299
 
300
- return AgentRunResponse(
301
- messages=[ChatMessage(role=Role.ASSISTANT, text=response_text)],
302
- response_id=f"judge-{assessment.recommendation}",
303
- additional_properties={"assessment": assessment.model_dump()},
304
- )
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
305
 
306
- async def run_stream(
307
- self,
308
- messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
309
- *,
310
- thread: AgentThread | None = None,
311
- **kwargs: Any,
312
- ) -> AsyncIterable[AgentRunResponseUpdate]:
313
- """Streaming wrapper for judge."""
314
- result = await self.run(messages, thread=thread, **kwargs)
315
- yield AgentRunResponseUpdate(
316
- messages=result.messages,
317
- response_id=result.response_id
318
- )
319
-
320
- def get_new_thread(self, **kwargs: Any) -> AgentThread:
321
- """Create a new thread."""
322
- return AgentThread(**kwargs)
323
  ```
324
 
325
- ### 4.4 Magentic Orchestrator (`src/orchestrator_magentic.py`)
326
 
327
  ```python
328
- """Magentic-based orchestrator for DeepCritical."""
329
- from typing import AsyncGenerator, List
330
- import structlog
331
 
 
332
  from agent_framework import (
 
 
333
  MagenticBuilder,
334
  MagenticFinalResultEvent,
335
- MagenticAgentMessageEvent,
336
  MagenticOrchestratorMessageEvent,
337
- MagenticAgentDeltaEvent,
338
  WorkflowOutputEvent,
339
  )
340
  from agent_framework.openai import OpenAIChatClient
341
 
342
- from src.agents.search_agent import SearchAgent
343
- from src.agents.judge_agent import JudgeAgent
344
- from src.tools.search_handler import SearchHandler
345
- from src.agent_factory.judges import JudgeHandler
346
- from src.utils.models import AgentEvent, Evidence
 
 
 
 
 
347
 
348
  logger = structlog.get_logger()
349
 
350
 
351
  class MagenticOrchestrator:
352
  """
353
- Magentic-based orchestrator - same API as Orchestrator.
354
 
355
- Uses Microsoft Agent Framework's MagenticBuilder for multi-agent coordination.
 
356
  """
357
 
358
  def __init__(
359
  self,
360
- search_handler: SearchHandler,
361
- judge_handler: JudgeHandler,
362
  max_rounds: int = 10,
363
- ):
364
- self._search_handler = search_handler
365
- self._judge_handler = judge_handler
366
- self._max_rounds = max_rounds
367
- self._evidence_store: dict[str, List[Evidence]] = {"current": []}
368
-
369
- async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
370
- """
371
- Run the Magentic workflow - same API as simple Orchestrator.
372
 
373
- Yields AgentEvent objects for real-time UI updates.
 
 
374
  """
375
- logger.info("Starting Magentic orchestrator", query=query)
 
 
 
 
376
 
377
- yield AgentEvent(
378
- type="started",
379
- message=f"Starting research (Magentic mode): {query}",
380
- iteration=0,
 
 
 
 
 
 
 
 
 
 
 
381
  )
382
 
383
- # Create agent wrappers
384
- search_agent = SearchAgent(self._search_handler)
385
- judge_agent = JudgeAgent(self._judge_handler, self._evidence_store)
386
-
387
- # Build Magentic workflow
388
- # Note: MagenticBuilder.participants takes named arguments for agent instances
389
- workflow = (
390
  MagenticBuilder()
391
  .participants(
392
  searcher=search_agent,
 
393
  judge=judge_agent,
 
394
  )
395
  .with_standard_manager(
396
- chat_client=OpenAIChatClient(),
397
  max_round_count=self._max_rounds,
398
  max_stall_count=3,
399
  max_reset_count=2,
@@ -401,139 +760,173 @@ class MagenticOrchestrator:
401
  .build()
402
  )
403
 
404
- # Task instruction for the manager
405
  task = f"""Research drug repurposing opportunities for: {query}
406
 
407
- Instructions:
408
- 1. Use SearcherAgent to find evidence from PubMed and web
409
- 2. Use JudgeAgent to evaluate if evidence is sufficient
410
- 3. If JudgeAgent says "continue", search with refined queries
411
- 4. If JudgeAgent says "synthesize", provide final synthesis
412
- 5. Stop when synthesis is ready or max rounds reached
413
-
414
- Focus on finding:
415
- - Mechanism of action evidence
416
- - Clinical/preclinical studies
417
- - Specific drug candidates
418
- """
 
419
 
420
  iteration = 0
421
  try:
422
- # workflow.run_stream returns an async generator of workflow events
423
  async for event in workflow.run_stream(task):
424
- if isinstance(event, MagenticOrchestratorMessageEvent):
425
- # Manager events (planning, instruction, ledger)
426
- message_text = event.message.text if event.message else ""
427
- yield AgentEvent(
428
- type="judging",
429
- message=f"Manager ({event.kind}): {message_text[:100]}...",
430
- iteration=iteration,
431
- )
432
-
433
- elif isinstance(event, MagenticAgentMessageEvent):
434
- # Complete agent response
435
- iteration += 1
436
- agent_name = event.agent_id or "unknown"
437
- msg_text = event.message.text if event.message else ""
438
-
439
- if "search" in agent_name.lower():
440
- # Check if we found evidence (based on SearchAgent logic)
441
- # In a real implementation we might extract metadata
442
- yield AgentEvent(
443
- type="search_complete",
444
- message=f"Search agent: {msg_text[:100]}...",
445
- iteration=iteration,
446
- )
447
- elif "judge" in agent_name.lower():
448
- yield AgentEvent(
449
- type="judge_complete",
450
- message=f"Judge agent: {msg_text[:100]}...",
451
- iteration=iteration,
452
- )
453
-
454
- elif isinstance(event, MagenticFinalResultEvent):
455
- # Final workflow result
456
- final_text = event.message.text if event.message else "No result"
457
- yield AgentEvent(
458
- type="complete",
459
- message=final_text,
460
- data={"iterations": iteration},
461
- iteration=iteration,
462
- )
463
-
464
- elif isinstance(event, MagenticAgentDeltaEvent):
465
- # Streaming token chunks from agents (optional "typing" effect)
466
- # Only emit if we have actual text content
467
- if event.text:
468
- yield AgentEvent(
469
- type="streaming",
470
- message=event.text,
471
- data={"agent_id": event.agent_id},
472
- iteration=iteration,
473
- )
474
-
475
- elif isinstance(event, WorkflowOutputEvent):
476
- # Alternative final output event
477
- if event.data:
478
- yield AgentEvent(
479
- type="complete",
480
- message=str(event.data),
481
- iteration=iteration,
482
- )
483
 
484
  except Exception as e:
485
  logger.error("Magentic workflow failed", error=str(e))
486
  yield AgentEvent(
487
  type="error",
488
- message=f"Workflow error: {str(e)}",
489
  iteration=iteration,
490
  )
491
- ```
492
 
493
- ### 4.5 Factory Pattern (`src/orchestrator_factory.py`)
494
 
495
- Allow switching between implementations:
496
 
497
  ```python
498
  """Factory for creating orchestrators."""
499
- from typing import Literal
500
 
501
- from src.orchestrator import Orchestrator
502
- from src.tools.search_handler import SearchHandler
503
- from src.agent_factory.judges import JudgeHandler
504
  from src.utils.models import OrchestratorConfig
505
 
506
 
507
  def create_orchestrator(
508
- search_handler: SearchHandler,
509
- judge_handler: JudgeHandler,
510
  config: OrchestratorConfig | None = None,
511
  mode: Literal["simple", "magentic"] = "simple",
512
- ):
513
  """
514
  Create an orchestrator instance.
515
 
516
  Args:
517
- search_handler: The search handler
518
- judge_handler: The judge handler
519
  config: Optional configuration
520
- mode: "simple" for Phase 4 loop, "magentic" for Phase 5 multi-agent
521
 
522
  Returns:
523
- Orchestrator instance (same interface regardless of mode)
 
 
 
 
524
  """
525
  if mode == "magentic":
526
  try:
527
  from src.orchestrator_magentic import MagenticOrchestrator
 
528
  return MagenticOrchestrator(
529
- search_handler=search_handler,
530
- judge_handler=judge_handler,
531
  max_rounds=config.max_iterations if config else 10,
532
  )
533
  except ImportError:
534
  # Fallback to simple if agent-framework not installed
535
  pass
536
 
 
 
 
 
537
  return Orchestrator(
538
  search_handler=search_handler,
539
  judge_handler=judge_handler,
@@ -543,96 +936,156 @@ def create_orchestrator(
543
 
544
  ---
545
 
546
- ## 5. Directory Structure After Phase 5
 
 
547
 
548
  ```
549
- src/
550
- ├── app.py                      # Gradio UI (unchanged)
551
- ├── orchestrator.py             # Phase 4 simple orchestrator
552
- ├── orchestrator_magentic.py    # Phase 5 Magentic orchestrator
553
- ├── orchestrator_factory.py     # Factory to switch implementations
554
- ├── agents/                     # NEW: Agent wrappers
555
- │   ├── __init__.py
556
- │   ├── search_agent.py         # SearchHandler as AgentProtocol
557
- │   └── judge_agent.py          # JudgeHandler as AgentProtocol
558
- ├── agent_factory/
559
- │   └── judges.py               # JudgeHandler (unchanged)
560
- ├── tools/
561
- │   ├── pubmed.py               # PubMed tool (unchanged)
562
- │   ├── websearch.py            # Web tool (unchanged)
563
- │   └── search_handler.py       # SearchHandler (unchanged)
564
- └── utils/
565
-     └── models.py               # Models (unchanged)
 
 
 
 
 
 
 
 
566
  ```
567
 
568
  ---
569
 
570
- ## 6. Implementation Checklist
571
 
572
- - [ ] Ensure Phase 4 uses Protocol-based handler interfaces
573
- - [ ] Add `agent-framework-core` to optional dependencies
574
- - [ ] Create `src/agents/` directory
575
- - [ ] Implement `SearchAgent` wrapper
576
- - [ ] Implement `JudgeAgent` wrapper
577
- - [ ] Implement `MagenticOrchestrator`
578
- - [ ] Implement `orchestrator_factory.py`
579
- - [ ] Add tests for agent wrappers
580
- - [ ] Test Magentic flow end-to-end
581
- - [ ] Update `src/app.py` to use factory with mode toggle
 
 
 
582
 
583
  ---
584
 
585
- ## 7. Definition of Done
586
 
587
- Phase 5 is **COMPLETE** when:
 
 
 
 
 
 
 
 
 
 
588
 
589
- 1. All Phase 4 tests still pass (no regression)
590
- 2. `MagenticOrchestrator` has same API as `Orchestrator`
591
- 3. Can switch between modes via factory:
592
 
593
- ```python
594
- # Simple mode (Phase 4)
595
- orchestrator = create_orchestrator(search, judge, mode="simple")
 
596
 
597
- # Magentic mode (Phase 5)
598
- orchestrator = create_orchestrator(search, judge, mode="magentic")
599
 
600
- # Same usage!
601
- async for event in orchestrator.run("metformin alzheimer"):
602
- print(event.to_markdown())
603
- ```
604
 
605
- 4. UI works with both modes
606
- 5. Graceful fallback if agent-framework not installed
607
 
608
- ---
 
 
 
 
 
 
 
 
 
609
 
610
- ## 8. When to Implement
611
 
612
- **Priority**: LOW (optional enhancement)
613
 
614
- Implement ONLY after:
615
- 1. ✅ Phase 1: Foundation
616
- 2. ✅ Phase 2: Search
617
- 3. ✅ Phase 3: Judge
618
- 4. ✅ Phase 4: Orchestrator + UI (MVP SHIPPED)
619
 
620
- If hackathon deadline is approaching, **SKIP Phase 5**. Ship the MVP.
 
 
 
 
 
621
 
622
  ---
623
 
624
- ## 9. Benefits of This Design
625
 
626
- 1. **No breaking changes** - Phase 4 code works unchanged
627
- 2. **Same API** - `run()` returns `AsyncGenerator[AgentEvent, None]`
628
- 3. **Gradual adoption** - Optional dependency, factory fallback
629
- 4. **Testable** - Each component can be tested independently
630
- 5. **Aligns with Tonic's vision** - Uses Microsoft Agent Framework patterns
631
 
632
- ---
 
 
 
 
 
 
 
633
 
634
- ## 10. Reference
 
 
 
 
 
 
 
 
 
 
 
635
 
636
- - Microsoft Agent Framework: `reference_repos/agent-framework/`
637
- - Magentic samples: `reference_repos/agent-framework/python/samples/getting_started/workflows/orchestration/magentic.py`
638
- - AgentProtocol: `reference_repos/agent-framework/python/packages/core/agent_framework/_agents.py`
1
+ # Phase 5 Implementation Spec: Magentic Integration
2
 
3
  **Goal**: Upgrade orchestrator to use Microsoft Agent Framework's Magentic-One pattern.
4
  **Philosophy**: "Same API, Better Engine."
 
15
  - **Event streaming** for real-time UI updates
16
  - **Multi-agent coordination** with round limits and reset logic
17
 
 
 
18
  ---
19
 
20
+ ## 2. Critical Architecture Understanding
21
 
22
+ ### 2.1 How Magentic Actually Works
 
 
 
 
 
 
 
 
 
 
23
 
 
24
  ```
25
+ ┌─────────────────────────────────────────────────────────────────────────┐
26
+ │                        MagenticBuilder Workflow                         │
27
+ ├─────────────────────────────────────────────────────────────────────────┤
28
+ │                                                                         │
29
+ │  User Task: "Research drug repurposing for metformin alzheimer"         │
30
+ │                                    ↓                                    │
31
+ │  ┌───────────────────────────────────────────────────────────────────┐  │
32
+ │  │                      StandardMagenticManager                      │  │
33
+ │  │                                                                   │  │
34
+ │  │  1. plan() → LLM generates facts & plan                           │  │
35
+ │  │  2. create_progress_ledger() → LLM decides:                       │  │
36
+ │  │     - is_request_satisfied?                                       │  │
37
+ │  │     - next_speaker: "searcher"                                    │  │
38
+ │  │     - instruction_or_question: "Search for clinical trials..."    │  │
39
+ │  │                                                                   │  │
40
+ │  └───────────────────────────────────────────────────────────────────┘  │
41
+ │                                    ↓                                    │
42
+ │  NATURAL LANGUAGE INSTRUCTION sent to agent                             │
43
+ │  "Search for clinical trials about metformin..."                        │
44
+ │                                    ↓                                    │
45
+ │  ┌───────────────────────────────────────────────────────────────────┐  │
46
+ │  │                       ChatAgent (searcher)                        │  │
47
+ │  │                                                                   │  │
48
+ │  │  chat_client (INTERNAL LLM) ← understands instruction             │  │
49
+ │  │      ↓                                                            │  │
50
+ │  │  "I'll search for metformin alzheimer clinical trials"            │  │
51
+ │  │      ↓                                                            │  │
52
+ │  │  tools=[search_pubmed, search_clinicaltrials] ← calls tools       │  │
53
+ │  │      ↓                                                            │  │
54
+ │  │  Returns natural language response to manager                     │  │
55
+ │  │                                                                   │  │
56
+ │  └───────────────────────────────────────────────────────────────────┘  │
57
+ │                                    ↓                                    │
58
+ │  Manager evaluates response                                             │
59
+ │  Decides next agent or completion                                       │
60
+ │                                                                         │
61
+ └─────────────────────────────────────────────────────────────────────────┘
62
  ```
63
 
64
+ ### 2.2 The Critical Insight
65
+
66
+ **Microsoft's ChatAgent has an INTERNAL LLM (`chat_client`) that:**
67
+ 1. Receives natural language instructions from the manager
68
+ 2. Understands what action to take
69
+ 3. Calls attached tools (functions)
70
+ 4. Returns natural language responses
71
+
72
+ **Our previous implementation was WRONG because:**
+ - We wrapped handlers as bare `BaseAgent` subclasses
+ - Those subclasses had no internal LLM to interpret instructions
+ - Raw instruction text was passed directly to the search APIs (PubMed cannot interpret "Search for clinical trials...")
76
+
77
+ ### 2.3 Correct Pattern: ChatAgent with Tools
78
+
79
+ ```python
+ # CORRECT: Agent backed by LLM that calls tools
+ from agent_framework import ChatAgent, AIFunction
+ from agent_framework.openai import OpenAIChatClient
+
+ # Define tool that ChatAgent can call.
+ # `pubmed_tool` and `format_results` stand in for the PubMedTool instance
+ # and the result formatter defined in section 3.2.
+ @AIFunction
+ async def search_pubmed(query: str, max_results: int = 10) -> str:
+     """Search PubMed for biomedical literature.
+
+     Args:
+         query: Search keywords (e.g., "metformin alzheimer mechanism")
+         max_results: Maximum number of results to return
+     """
+     result = await pubmed_tool.search(query, max_results)
+     return format_results(result)
+
+ # ChatAgent with internal LLM + tools
+ # (search_clinical_trials and search_preprints are defined in section 3.2)
+ search_agent = ChatAgent(
+     name="SearchAgent",
+     description="Searches biomedical databases for drug repurposing evidence",
+     instructions="You search PubMed, ClinicalTrials.gov, and bioRxiv for evidence.",
+     chat_client=OpenAIChatClient(model_id="gpt-4o-mini"),  # INTERNAL LLM
+     tools=[search_pubmed, search_clinical_trials, search_preprints],  # TOOLS
+ )
+ ```
105
 
106
  ---
107
 
108
+ ## 3. Correct Implementation
109
 
110
+ ### 3.1 Shared State Module (`src/agents/state.py`)
111
 
112
+ **CRITICAL**: Tools must update shared state so:
113
+ 1. EmbeddingService can deduplicate across searches
114
+ 2. ReportAgent can access structured Evidence objects for citations
115
 
116
  ```python
117
+ """Shared state for Magentic agents.
 
 
118
 
119
+ This module provides global state that tools update as a side effect.
120
+ ChatAgent tools return strings to the LLM, but also update this state
121
+ for semantic deduplication and structured citation access.
122
+ """
123
+ from __future__ import annotations
124
 
125
+ from typing import TYPE_CHECKING
 
 
 
126
 
127
+ import structlog
128
 
129
+ if TYPE_CHECKING:
130
+ from src.services.embeddings import EmbeddingService
 
 
131
 
132
+ from src.utils.models import Evidence
133
 
134
+ logger = structlog.get_logger()
 
 
 
 
135
 
 
136
 
137
+ class MagenticState:
138
+ """Shared state container for Magentic workflow.
139
 
140
+ Maintains:
141
+ - evidence_store: All collected Evidence objects (for citations)
142
+ - embedding_service: Optional semantic search (for deduplication)
143
+ """
144
 
145
+ def __init__(self) -> None:
146
+ self.evidence_store: list[Evidence] = []
147
+ self.embedding_service: EmbeddingService | None = None
148
+ self._seen_urls: set[str] = set()
149
 
150
+ def init_embedding_service(self) -> None:
151
+ """Lazy-initialize embedding service if available."""
152
+ if self.embedding_service is not None:
153
+ return
154
+ try:
155
+ from src.services.embeddings import get_embedding_service
156
+ self.embedding_service = get_embedding_service()
157
+ logger.info("Embedding service enabled for Magentic mode")
158
+ except Exception as e:
159
+ logger.warning("Embedding service unavailable", error=str(e))
160
 
161
+ async def add_evidence(self, evidence_list: list[Evidence]) -> list[Evidence]:
162
+ """Add evidence with semantic deduplication.
163
+
164
+ Args:
165
+ evidence_list: New evidence from search
166
+
167
+ Returns:
168
+ List of unique evidence (not duplicates)
169
+ """
170
+ if not evidence_list:
171
+ return []
172
+
173
+ # URL-based deduplication first (fast)
174
+ url_unique = [
175
+ e for e in evidence_list
176
+ if e.citation.url not in self._seen_urls
177
+ ]
178
+
179
+ # Semantic deduplication if available
180
+ if self.embedding_service and url_unique:
181
+ try:
182
+ unique = await self.embedding_service.deduplicate(url_unique, threshold=0.85)
183
+ logger.info(
184
+ "Semantic deduplication",
185
+ before=len(url_unique),
186
+ after=len(unique),
187
+ )
188
+ except Exception as e:
189
+ logger.warning("Deduplication failed, using URL-based", error=str(e))
190
+ unique = url_unique
191
+ else:
192
+ unique = url_unique
193
+
194
+ # Update state
195
+ for e in unique:
196
+ self._seen_urls.add(e.citation.url)
197
+ self.evidence_store.append(e)
198
+
199
+ return unique
200
+
201
+ async def search_related(self, query: str, n_results: int = 5) -> list[Evidence]:
202
+ """Find semantically related evidence from vector store.
203
+
204
+ Args:
205
+ query: Search query
206
+ n_results: Number of related items
207
+
208
+ Returns:
209
+ Related Evidence objects (reconstructed from vector store)
210
+ """
211
+ if not self.embedding_service:
212
+ return []
213
 
214
+ try:
215
+ from src.utils.models import Citation
216
+
217
+ related = await self.embedding_service.search_similar(query, n_results)
218
+ evidence = []
219
+
220
+ for item in related:
221
+ if item["id"] in self._seen_urls:
222
+ continue # Already in results
223
+
224
+ meta = item.get("metadata", {})
225
+ authors_str = meta.get("authors", "")
226
+ authors = [a.strip() for a in authors_str.split(",") if a.strip()]
227
+
228
+ ev = Evidence(
229
+ content=item["content"],
230
+ citation=Citation(
231
+ title=meta.get("title", "Related Evidence"),
232
+ url=item["id"],
233
+ source=meta.get("source", "pubmed"),
234
+ date=meta.get("date", "n.d."),
235
+ authors=authors,
236
+ ),
237
+ relevance=max(0.0, 1.0 - item.get("distance", 0.5)),
238
+ )
239
+ evidence.append(ev)
240
+
241
+ return evidence
242
+ except Exception as e:
243
+ logger.warning("Related search failed", error=str(e))
244
+ return []
245
 
246
+ def reset(self) -> None:
247
+ """Reset state for new workflow run."""
248
+ self.evidence_store.clear()
249
+ self._seen_urls.clear()
250
 
 
251
 
252
+ # Global singleton for workflow
253
+ _state: MagenticState | None = None
254
+
255
+
256
+ def get_magentic_state() -> MagenticState:
257
+ """Get or create the global Magentic state."""
258
+ global _state
259
+ if _state is None:
260
+ _state = MagenticState()
261
+ return _state
262
+
263
+
264
+ def reset_magentic_state() -> None:
265
+ """Reset state for a fresh workflow run."""
266
+ global _state
267
+ if _state is not None:
268
+ _state.reset()
269
+ else:
270
+ _state = MagenticState()
271
  ```
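
To make the URL-based half of `add_evidence` concrete, here is a minimal, self-contained sketch. It assumes stand-in `Citation` and `Evidence` dataclasses (simplified for illustration, not the real `src.utils.models` classes), uses a hypothetical `UrlDedupStore` name, and leaves out the embedding service entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:  # hypothetical stand-in for src.utils.models.Citation
    title: str
    url: str


@dataclass
class Evidence:  # hypothetical stand-in for src.utils.models.Evidence
    content: str
    citation: Citation


@dataclass
class UrlDedupStore:
    """Simplified MagenticState: URL-based deduplication only."""

    evidence_store: list[Evidence] = field(default_factory=list)
    _seen_urls: set[str] = field(default_factory=set)

    def add_evidence(self, evidence_list: list[Evidence]) -> list[Evidence]:
        # Keep only items whose URL has not been stored by an earlier call.
        unique = [e for e in evidence_list if e.citation.url not in self._seen_urls]
        for e in unique:
            self._seen_urls.add(e.citation.url)
            self.evidence_store.append(e)
        return unique


store = UrlDedupStore()
first = store.add_evidence([Evidence("a", Citation("A", "https://example.org/1"))])
second = store.add_evidence([
    Evidence("a again", Citation("A", "https://example.org/1")),  # already seen
    Evidence("b", Citation("B", "https://example.org/2")),        # new
])
print(len(first), len(second), len(store.evidence_store))
```

As in the spec's version, `_seen_urls` is only updated after the filter runs, so two items sharing a URL inside a single batch would both pass the URL check. In the full implementation only the semantic deduplication step could catch that case.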
272
 
273
+ ### 3.2 Tool Functions (`src/agents/tools.py`)
274
 
275
+ Each tool calls its API AND updates shared state: it returns a formatted string to the LLM and stores the structured Evidence objects as a side effect.
 
276
 
277
  ```python
278
+ """Tool functions for Magentic agents.
 
 
279
 
280
+ IMPORTANT: These tools do TWO things:
281
+ 1. Return formatted strings to the ChatAgent's internal LLM
282
+ 2. Update shared state (evidence_store, embeddings) as a side effect
283
 
284
+ This preserves semantic deduplication and structured citation access.
285
+ """
286
+ from agent_framework import AIFunction
287
 
288
+ from src.agents.state import get_magentic_state
289
+ from src.tools.biorxiv import BioRxivTool
290
+ from src.tools.clinicaltrials import ClinicalTrialsTool
291
+ from src.tools.pubmed import PubMedTool
292
 
293
+ # Singleton tool instances
294
+ _pubmed = PubMedTool()
295
+ _clinicaltrials = ClinicalTrialsTool()
296
+ _biorxiv = BioRxivTool()
 
297
 
 
 
 
298
 
299
+ def _format_results(results: list, source_name: str, query: str) -> str:
300
+ """Format search results for LLM consumption."""
301
+ if not results:
302
+ return f"No {source_name} results found for: {query}"
303
 
304
+ output = [f"Found {len(results)} {source_name} results:\n"]
305
+ for i, r in enumerate(results[:10], 1):
306
+ output.append(f"{i}. **{r.citation.title}**")
307
+ output.append(f" Source: {r.citation.source} | Date: {r.citation.date}")
308
+ output.append(f" {r.content[:300]}...")
309
+ output.append(f" URL: {r.citation.url}\n")
310
 
311
+ return "\n".join(output)


+ @AIFunction
315
+ async def search_pubmed(query: str, max_results: int = 10) -> str:
316
+ """Search PubMed for biomedical research papers.
317
 
318
+ Use this tool to find peer-reviewed scientific literature about
319
+ drugs, diseases, mechanisms of action, and clinical studies.
 
 
 
320
 
321
+ Args:
322
+ query: Search keywords (e.g., "metformin alzheimer mechanism")
323
+ max_results: Maximum results to return (default 10)
324
 
325
+ Returns:
326
+ Formatted list of papers with titles, abstracts, and citations
327
+ """
328
+ # 1. Execute search
329
+ results = await _pubmed.search(query, max_results)
330
 
331
+ # 2. Update shared state (semantic dedup + evidence store)
332
+ state = get_magentic_state()
333
+ unique = await state.add_evidence(results)
334
+
335
+ # 3. Also get related evidence from vector store
336
+ related = await state.search_related(query, n_results=3)
337
+ if related:
338
+ await state.add_evidence(related)
339
+
340
+ # 4. Return formatted string for LLM
341
+ total_new = len(unique)
342
+ total_stored = len(state.evidence_store)
343
+
344
+ output = _format_results(results, "PubMed", query)
345
+ output += f"\n[State: {total_new} new, {total_stored} total in evidence store]"
346
+
347
+ if related:
348
+ output += f"\n[Also found {len(related)} semantically related items from previous searches]"
349
+
350
+ return output
351
+
352
+
353
+ @AIFunction
354
+ async def search_clinical_trials(query: str, max_results: int = 10) -> str:
355
+ """Search ClinicalTrials.gov for clinical studies.
356
+
357
+ Use this tool to find ongoing and completed clinical trials
358
+ for drug repurposing candidates.
359
+
360
+ Args:
361
+ query: Search terms (e.g., "metformin cancer phase 3")
362
+ max_results: Maximum results to return (default 10)
363
+
364
+ Returns:
365
+ Formatted list of clinical trials with status and details
366
+ """
367
+ # 1. Execute search
368
+ results = await _clinicaltrials.search(query, max_results)
369
+
370
+ # 2. Update shared state
371
+ state = get_magentic_state()
372
+ unique = await state.add_evidence(results)
373
+
374
+ # 3. Return formatted string
375
+ total_new = len(unique)
376
+ total_stored = len(state.evidence_store)
377
+
378
+ output = _format_results(results, "ClinicalTrials.gov", query)
379
+ output += f"\n[State: {total_new} new, {total_stored} total in evidence store]"
380
+
381
+ return output
382
+
383
+
384
+ @AIFunction
385
+ async def search_preprints(query: str, max_results: int = 10) -> str:
386
+ """Search bioRxiv/medRxiv for preprint papers.
387
+
388
+ Use this tool to find the latest research that hasn't been
389
+ peer-reviewed yet. Good for cutting-edge findings.
390
+
391
+ Args:
392
+ query: Search terms (e.g., "long covid treatment")
393
+ max_results: Maximum results to return (default 10)
394
+
395
+ Returns:
396
+ Formatted list of preprints with abstracts and links
397
+ """
398
+ # 1. Execute search
399
+ results = await _biorxiv.search(query, max_results)
400
+
401
+ # 2. Update shared state
402
+ state = get_magentic_state()
403
+ unique = await state.add_evidence(results)
404
+
405
+ # 3. Return formatted string
406
+ total_new = len(unique)
407
+ total_stored = len(state.evidence_store)
408
+
409
+ output = _format_results(results, "bioRxiv/medRxiv", query)
410
+ output += f"\n[State: {total_new} new, {total_stored} total in evidence store]"
411
+
412
+ return output
413
+
414
+
415
+ @AIFunction
416
+ async def get_evidence_summary() -> str:
417
+ """Get summary of all collected evidence.
418
+
419
+ Use this tool when you need to review what evidence has been collected
420
+ before making an assessment or generating a report.
421
+
422
+ Returns:
423
+ Summary of evidence store with counts and key citations
424
+ """
425
+ state = get_magentic_state()
426
+ evidence = state.evidence_store
427
+
428
+ if not evidence:
429
+ return "No evidence collected yet."
430
+
431
+ # Group by source
432
+ by_source: dict[str, list] = {}
433
+ for e in evidence:
434
+ src = e.citation.source
435
+ if src not in by_source:
436
+ by_source[src] = []
437
+ by_source[src].append(e)
438
+
439
+ output = [f"**Evidence Store Summary** ({len(evidence)} total items)\n"]
440
+
441
+ for source, items in by_source.items():
442
+ output.append(f"\n### {source.upper()} ({len(items)} items)")
443
+ for e in items[:5]: # First 5 per source
444
+ output.append(f"- {e.citation.title[:80]}...")
445
+
446
+ return "\n".join(output)
447
+
448
+
449
+ @AIFunction
450
+ async def get_bibliography() -> str:
451
+ """Get full bibliography of all collected evidence.
452
+
453
+ Use this tool when generating a final report to get properly
454
+ formatted citations for all evidence.
455
+
456
+ Returns:
457
+ Numbered bibliography with full citation details
458
+ """
459
+ state = get_magentic_state()
460
+ evidence = state.evidence_store
461
+
462
+ if not evidence:
463
+ return "No evidence collected for bibliography."
464
+
465
+ output = ["## References\n"]
466
+
467
+ for i, e in enumerate(evidence, 1):
468
+ # Format: Authors (Year). Title. Source. URL
469
+ authors = ", ".join(e.citation.authors[:3]) if e.citation.authors else "Unknown"
470
+ if e.citation.authors and len(e.citation.authors) > 3:
471
+ authors += " et al."
472
+
473
+ year = e.citation.date[:4] if e.citation.date else "n.d."
474
+
475
+ output.append(
476
+ f"{i}. {authors} ({year}). {e.citation.title}. "
477
+ f"*{e.citation.source.upper()}*. [{e.citation.url}]({e.citation.url})"
478
  )
479
 
480
+ return "\n".join(output)
 
 
481
  ```
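
The citation format used by `get_bibliography` can be checked in isolation. The sketch below restates that formatting rule as a pure function; `format_citation` is a hypothetical helper name, and the sample author list, date, and URL are made up for illustration.

```python
def format_citation(
    index: int,
    authors: list[str],
    date: str,
    title: str,
    source: str,
    url: str,
) -> str:
    """Format one entry as: N. Authors (Year). Title. *SOURCE*. [url](url)"""
    # At most three authors are listed; longer lists get "et al."
    author_text = ", ".join(authors[:3]) if authors else "Unknown"
    if len(authors) > 3:
        author_text += " et al."
    # The year is taken as the first four characters of the date string.
    year = date[:4] if date else "n.d."
    return f"{index}. {author_text} ({year}). {title}. *{source.upper()}*. [{url}]({url})"


entry = format_citation(
    1,
    ["Smith J", "Lee K", "Patel R", "Gomez M"],
    "2023-05-01",
    "Example title",
    "pubmed",
    "https://example.org/paper",
)
print(entry)
```

Because the year is a plain four-character slice, the `"n.d."` default that `search_related` assigns to undated items passes through unchanged.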
482
 
483
+ ### 3.3 ChatAgent-Based Agents (`src/agents/magentic_agents.py`)
484
 
485
  ```python
486
+ """Magentic-compatible agents using ChatAgent pattern."""
487
+ from agent_framework import ChatAgent
488
+ from agent_framework.openai import OpenAIChatClient
489
 
490
+ from src.agents.tools import (
491
+ get_bibliography,
492
+ get_evidence_summary,
493
+ search_clinical_trials,
494
+ search_preprints,
495
+ search_pubmed,
496
+ )
497
+ from src.utils.config import settings
498
 
499
 
500
+ def create_search_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgent:
501
+ """Create a search agent with internal LLM and search tools.
502
 
503
+ Args:
504
+ chat_client: Optional custom chat client. If None, uses default.
 
 
 
 
505
 
506
+ Returns:
507
+ ChatAgent configured for biomedical search
508
+ """
509
+ client = chat_client or OpenAIChatClient(
510
+ model_id="gpt-4o-mini", # Fast, cheap for tool orchestration
511
+ api_key=settings.openai_api_key,
512
+ )
513
 
514
+ return ChatAgent(
515
+ name="SearchAgent",
516
+ description="Searches biomedical databases (PubMed, ClinicalTrials.gov, bioRxiv) for drug repurposing evidence",
517
+ instructions="""You are a biomedical search specialist. When asked to find evidence:
518
+
519
+ 1. Analyze the request to determine what to search for
520
+ 2. Extract key search terms (drug names, disease names, mechanisms)
521
+ 3. Use the appropriate search tools:
522
+ - search_pubmed for peer-reviewed papers
523
+ - search_clinical_trials for clinical studies
524
+ - search_preprints for cutting-edge findings
525
+ 4. Summarize what you found and highlight key evidence
526
+
527
+ Be thorough - search multiple databases when appropriate.
528
+ Focus on finding: mechanisms of action, clinical evidence, and specific drug candidates.""",
529
+ chat_client=client,
530
+ tools=[search_pubmed, search_clinical_trials, search_preprints],
531
+ temperature=0.3, # More deterministic for tool use
532
+ )
533
 
 
 
 
534
 
535
+ def create_judge_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgent:
536
+ """Create a judge agent that evaluates evidence quality.
 
537
 
538
+ Args:
539
+ chat_client: Optional custom chat client. If None, uses default.
 
541
+ Returns:
542
+ ChatAgent configured for evidence assessment
543
+ """
544
+ client = chat_client or OpenAIChatClient(
545
+ model_id="gpt-4o", # Better model for nuanced judgment
546
+ api_key=settings.openai_api_key,
547
+ )
548
 
549
+ return ChatAgent(
550
+ name="JudgeAgent",
551
+ description="Evaluates evidence quality and determines if sufficient for synthesis",
552
+ instructions="""You are an evidence quality assessor. When asked to evaluate:
553
+
554
+ 1. First, call get_evidence_summary() to see all collected evidence
555
+ 2. Score on two dimensions (0-10 each):
556
+ - Mechanism Score: How well is the biological mechanism explained?
557
+ - Clinical Score: How strong is the clinical/preclinical evidence?
558
+ 3. Determine if evidence is SUFFICIENT for a final report:
559
+ - Sufficient: Clear mechanism + supporting clinical data
560
+ - Insufficient: Gaps in mechanism OR weak clinical evidence
561
+ 4. If insufficient, suggest specific search queries to fill gaps
562
+
563
+ Be rigorous but fair. Look for:
564
+ - Molecular targets and pathways
565
+ - Animal model studies
566
+ - Human clinical trials
567
+ - Safety data
568
+ - Drug-drug interactions""",
569
+ chat_client=client,
570
+ tools=[get_evidence_summary], # Can review collected evidence
571
+ temperature=0.2, # Consistent judgments
572
+ )
573
 
574
+
575
+ def create_hypothesis_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgent:
576
+ """Create a hypothesis generation agent.
577
+
578
+ Args:
579
+ chat_client: Optional custom chat client. If None, uses default.
580
+
581
+ Returns:
582
+ ChatAgent configured for hypothesis generation
583
+ """
584
+ client = chat_client or OpenAIChatClient(
585
+ model_id="gpt-4o",
586
+ api_key=settings.openai_api_key,
587
+ )
588
+
589
+ return ChatAgent(
590
+ name="HypothesisAgent",
591
+ description="Generates mechanistic hypotheses for drug repurposing",
592
+ instructions="""You are a biomedical hypothesis generator. Based on evidence:
593
+
594
+ 1. Identify the key molecular targets involved
595
+ 2. Map the biological pathways affected
596
+ 3. Generate testable hypotheses in this format:
597
+
598
+ DRUG → TARGET → PATHWAY → THERAPEUTIC EFFECT
+
+ Example:
+ Metformin → AMPK activation → mTOR inhibition → Reduced tau phosphorylation
602
+
603
+ 4. Explain the rationale for each hypothesis
604
+ 5. Suggest what additional evidence would support or refute it
605
+
606
+ Focus on mechanistic plausibility and existing evidence.""",
607
+ chat_client=client,
608
+ temperature=0.5, # Some creativity for hypothesis generation
609
+ )
610
+
611
+
612
+ def create_report_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgent:
613
+ """Create a report synthesis agent.
614
+
615
+ Args:
616
+ chat_client: Optional custom chat client. If None, uses default.
617
+
618
+ Returns:
619
+ ChatAgent configured for report generation
620
+ """
621
+ client = chat_client or OpenAIChatClient(
622
+ model_id="gpt-4o",
623
+ api_key=settings.openai_api_key,
624
+ )
625
+
626
+ return ChatAgent(
627
+ name="ReportAgent",
628
+ description="Synthesizes research findings into structured reports",
629
+ instructions="""You are a scientific report writer. When asked to synthesize:
630
+
631
+ 1. First, call get_evidence_summary() to review all collected evidence
632
+ 2. Then call get_bibliography() to get properly formatted citations
633
+
634
+ Generate a structured report with these sections:
635
+
636
+ ## Executive Summary
637
+ Brief overview of findings and recommendation
638
+
639
+ ## Methodology
640
+ Databases searched, queries used, evidence reviewed
641
+
642
+ ## Key Findings
643
+ ### Mechanism of Action
644
+ - Molecular targets
645
+ - Biological pathways
646
+ - Proposed mechanism
647
+
648
+ ### Clinical Evidence
649
+ - Preclinical studies
650
+ - Clinical trials
651
+ - Safety profile
652
+
653
+ ## Drug Candidates
654
+ List specific drugs with repurposing potential
655
+
656
+ ## Limitations
657
+ Gaps in evidence, conflicting data, caveats
658
+
659
+ ## Conclusion
660
+ Final recommendation with confidence level
661
+
662
+ ## References
663
+ Use the output from get_bibliography() - do not make up citations!
664
+
665
+ Be comprehensive but concise. Cite evidence for all claims.""",
666
+ chat_client=client,
667
+ tools=[get_evidence_summary, get_bibliography], # Access to collected evidence
668
+ temperature=0.3,
669
+ )
670
  ```
671
 
672
+ ### 3.4 Magentic Orchestrator (`src/orchestrator_magentic.py`)
673
 
674
  ```python
675
+ """Magentic-based orchestrator using ChatAgent pattern."""
676
+ from collections.abc import AsyncGenerator
677
+ from typing import Any
678
 
679
+ import structlog
680
  from agent_framework import (
681
+ MagenticAgentDeltaEvent,
682
+ MagenticAgentMessageEvent,
683
  MagenticBuilder,
684
  MagenticFinalResultEvent,
 
685
  MagenticOrchestratorMessageEvent,
 
686
  WorkflowOutputEvent,
687
  )
688
  from agent_framework.openai import OpenAIChatClient
689
 
690
+ from src.agents.magentic_agents import (
691
+ create_hypothesis_agent,
692
+ create_judge_agent,
693
+ create_report_agent,
694
+ create_search_agent,
695
+ )
696
+ from src.agents.state import get_magentic_state, reset_magentic_state
697
+ from src.utils.config import settings
698
+ from src.utils.exceptions import ConfigurationError
699
+ from src.utils.models import AgentEvent
700
 
701
  logger = structlog.get_logger()
702
 
703
 
704
  class MagenticOrchestrator:
705
  """
706
+ Magentic-based orchestrator using ChatAgent pattern.
707
 
708
+ Each agent has an internal LLM that understands natural language
709
+ instructions from the manager and can call tools appropriately.
710
  """
711
 
712
  def __init__(
713
  self,
 
 
714
  max_rounds: int = 10,
715
+ chat_client: OpenAIChatClient | None = None,
716
+ ) -> None:
717
+ """Initialize orchestrator.
 
 
 
 
 
 
718
 
719
+ Args:
720
+ max_rounds: Maximum coordination rounds
721
+ chat_client: Optional shared chat client for agents
722
  """
723
+ if not settings.openai_api_key:
724
+ raise ConfigurationError(
725
+ "Magentic mode requires OPENAI_API_KEY. "
726
+ "Set the key or use mode='simple'."
727
+ )
728
 
729
+ self._max_rounds = max_rounds
730
+ self._chat_client = chat_client
731
+
732
+ def _build_workflow(self) -> Any:
733
+ """Build the Magentic workflow with ChatAgent participants."""
734
+ # Create agents with internal LLMs
735
+ search_agent = create_search_agent(self._chat_client)
736
+ judge_agent = create_judge_agent(self._chat_client)
737
+ hypothesis_agent = create_hypothesis_agent(self._chat_client)
738
+ report_agent = create_report_agent(self._chat_client)
739
+
740
+ # Manager chat client (orchestrates the agents)
741
+ manager_client = OpenAIChatClient(
742
+ model_id="gpt-4o", # Good model for planning/coordination
743
+ api_key=settings.openai_api_key,
744
  )
745
 
746
+ return (
 
 
 
 
 
 
747
  MagenticBuilder()
748
  .participants(
749
  searcher=search_agent,
750
+ hypothesizer=hypothesis_agent,
751
  judge=judge_agent,
752
+ reporter=report_agent,
753
  )
754
  .with_standard_manager(
755
+ chat_client=manager_client,
756
  max_round_count=self._max_rounds,
757
  max_stall_count=3,
758
  max_reset_count=2,
 
760
  .build()
761
  )
762
 
763
+ async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
764
+ """
765
+ Run the Magentic workflow.
766
+
767
+ Args:
768
+ query: User's research question
769
+
770
+ Yields:
771
+ AgentEvent objects for real-time UI updates
772
+ """
773
+ logger.info("Starting Magentic orchestrator", query=query)
774
+
775
+ # CRITICAL: Reset state for fresh workflow run
776
+ reset_magentic_state()
777
+
778
+ # Initialize embedding service if available
779
+ state = get_magentic_state()
780
+ state.init_embedding_service()
781
+
782
+ yield AgentEvent(
783
+ type="started",
784
+ message=f"Starting research (Magentic mode): {query}",
785
+ iteration=0,
786
+ )
787
+
788
+ workflow = self._build_workflow()
789
+
790
  task = f"""Research drug repurposing opportunities for: {query}
791
 
792
+ Workflow:
793
+ 1. SearchAgent: Find evidence from PubMed, ClinicalTrials.gov, and bioRxiv
794
+ 2. HypothesisAgent: Generate mechanistic hypotheses (Drug → Target → Pathway → Effect)
+ 3. JudgeAgent: Evaluate if evidence is sufficient
+ 4. If insufficient → SearchAgent refines search based on gaps
+ 5. If sufficient → ReportAgent synthesizes final report
798
+
799
+ Focus on:
800
+ - Identifying specific molecular targets
801
+ - Understanding mechanism of action
802
+ - Finding clinical evidence supporting hypotheses
803
+
804
+ The final output should be a structured research report."""
805
 
806
  iteration = 0
807
  try:
 
808
  async for event in workflow.run_stream(task):
809
+ agent_event = self._process_event(event, iteration)
810
+ if agent_event:
811
+ if isinstance(event, MagenticAgentMessageEvent):
812
+ iteration += 1
813
+ yield agent_event
 
815
  except Exception as e:
816
  logger.error("Magentic workflow failed", error=str(e))
817
  yield AgentEvent(
818
  type="error",
819
+ message=f"Workflow error: {e!s}",
820
+ iteration=iteration,
821
+ )
822
+
823
+ def _process_event(self, event: Any, iteration: int) -> AgentEvent | None:
824
+ """Process workflow event into AgentEvent."""
825
+ if isinstance(event, MagenticOrchestratorMessageEvent):
826
+ text = event.message.text if event.message else ""
827
+ if text:
828
+ return AgentEvent(
829
+ type="judging",
830
+ message=f"Manager ({event.kind}): {text[:200]}...",
831
+ iteration=iteration,
832
+ )
833
+
834
+ elif isinstance(event, MagenticAgentMessageEvent):
835
+ agent_name = event.agent_id or "unknown"
836
+ text = event.message.text if event.message else ""
837
+
838
+ event_type = "judging"
839
+ if "search" in agent_name.lower():
840
+ event_type = "search_complete"
841
+ elif "judge" in agent_name.lower():
842
+ event_type = "judge_complete"
843
+ elif "hypothes" in agent_name.lower():
844
+ event_type = "hypothesizing"
845
+ elif "report" in agent_name.lower():
846
+ event_type = "synthesizing"
847
+
848
+ return AgentEvent(
849
+ type=event_type,
850
+ message=f"{agent_name}: {text[:200]}...",
851
+ iteration=iteration + 1,
852
+ )
853
+
854
+ elif isinstance(event, MagenticFinalResultEvent):
855
+ text = event.message.text if event.message else "No result"
856
+ return AgentEvent(
857
+ type="complete",
858
+ message=text,
859
+ data={"iterations": iteration},
860
  iteration=iteration,
861
  )
 
862
 
863
+ elif isinstance(event, MagenticAgentDeltaEvent):
864
+ if event.text:
865
+ return AgentEvent(
866
+ type="streaming",
867
+ message=event.text,
868
+ data={"agent_id": event.agent_id},
869
+ iteration=iteration,
870
+ )
871
+
872
+ elif isinstance(event, WorkflowOutputEvent):
873
+ if event.data:
874
+ return AgentEvent(
875
+ type="complete",
876
+ message=str(event.data),
877
+ iteration=iteration,
878
+ )
879
+
880
+ return None
881
+ ```
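
The agent-name routing inside `_process_event` can be pulled out as a pure function, which makes it testable without the framework. The sketch below is a restatement under that assumption: `classify_agent_event` is a hypothetical helper name, while the substrings and event-type strings are copied from the code above.

```python
def classify_agent_event(agent_name: str) -> str:
    """Map an agent identifier to a UI event type by substring match."""
    name = agent_name.lower()
    # Order mirrors the if/elif chain in _process_event: first match wins.
    routes = (
        ("search", "search_complete"),
        ("judge", "judge_complete"),
        ("hypothes", "hypothesizing"),
        ("report", "synthesizing"),
    )
    for needle, event_type in routes:
        if needle in name:
            return event_type
    return "judging"  # default used for unrecognised agents


for agent in ("searcher", "HypothesisAgent", "judge", "ReportAgent", "unknown"):
    print(agent, "->", classify_agent_event(agent))
```

Substring matching works for both the participant keys (`searcher`, `hypothesizer`, `judge`, `reporter`) and the agent names (`SearchAgent`, and so on). It is also loose: a name such as `ResearchAgent` contains `search` and would be reported as a search event.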
882
 
883
+ ### 3.5 Updated Factory (`src/orchestrator_factory.py`)
884
 
885
  ```python
886
  """Factory for creating orchestrators."""
887
+ from typing import Any, Literal
888
 
889
+ from src.orchestrator import JudgeHandlerProtocol, Orchestrator, SearchHandlerProtocol
 
 
890
  from src.utils.models import OrchestratorConfig
891
 
892
 
893
  def create_orchestrator(
894
+ search_handler: SearchHandlerProtocol | None = None,
895
+ judge_handler: JudgeHandlerProtocol | None = None,
896
  config: OrchestratorConfig | None = None,
897
  mode: Literal["simple", "magentic"] = "simple",
898
+ ) -> Any:
899
  """
900
  Create an orchestrator instance.
901
 
902
  Args:
903
+ search_handler: The search handler (required for simple mode)
904
+ judge_handler: The judge handler (required for simple mode)
905
  config: Optional configuration
906
+ mode: "simple" for Phase 4 loop, "magentic" for ChatAgent-based multi-agent
907
 
908
  Returns:
909
+ Orchestrator instance
910
+
911
+ Note:
912
+ Magentic mode does NOT use search_handler/judge_handler.
913
+ It creates ChatAgent instances with internal LLMs that call tools directly.
914
  """
915
  if mode == "magentic":
916
  try:
917
  from src.orchestrator_magentic import MagenticOrchestrator
918
+
919
  return MagenticOrchestrator(
 
 
920
  max_rounds=config.max_iterations if config else 10,
921
  )
922
  except ImportError:
923
  # Fallback to simple if agent-framework not installed
924
  pass
925
 
926
+ # Simple mode requires handlers
927
+ if search_handler is None or judge_handler is None:
928
+ raise ValueError("Simple mode requires search_handler and judge_handler")
929
+
930
  return Orchestrator(
931
  search_handler=search_handler,
932
  judge_handler=judge_handler,
 
936
 
937
  ---
938
 
939
+ ## 4. Why This Works
940
+
941
+ ### 4.1 The Manager → Agent Communication
942
 
943
  ```
944
+ Manager LLM decides: "Tell SearchAgent to find clinical trials for metformin"
+     ↓
+ Sends instruction: "Search for clinical trials about metformin and cancer"
+     ↓
+ SearchAgent's INTERNAL LLM receives this
+     ↓
+ Internal LLM understands: "I should call search_clinical_trials('metformin cancer')"
+     ↓
+ Tool executes: ClinicalTrials.gov API
+     ↓
+ Internal LLM formats response: "I found 15 trials. Here are the key ones..."
+     ↓
+ Manager receives natural language response
957
+ ```
958
+
959
+ ### 4.2 Why Our Old Implementation Failed
960
+
961
+ ```
962
+ Manager sends: "Search for clinical trials about metformin..."
+     ↓
+ OLD SearchAgent.run() extracts: query = "Search for clinical trials about metformin..."
+     ↓
+ Passes to PubMed: pubmed.search("Search for clinical trials about metformin...")
+     ↓
+ PubMed doesn't understand English instructions → garbage results or error
969
  ```
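
The difference between the two flows shows up in the search term that reaches the API. The sketch below builds request URLs for NCBI's E-utilities `esearch` endpoint. Whether the project's `PubMedTool` uses this endpoint is an assumption, `esearch_url` is a hypothetical helper, and the keyword query is hard-coded because no LLM is called here.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, retmax: int = 10) -> str:
    """Build a PubMed esearch URL for the given search term."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


instruction = "Search for clinical trials about metformin and cancer"

# Old flow: the manager's whole sentence is sent as the search term.
old_url = esearch_url(instruction)

# New flow: the agent's internal LLM would pick keywords first.
# Hard-coded here to stand in for that step.
new_url = esearch_url("metformin cancer")

print(old_url)
print(new_url)
```

Only the second URL carries a term made of search keywords; the first asks the API to match filler words such as "Search" and "about".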
970
 
971
  ---
972
 
973
+ ## 5. Directory Structure
974
 
975
+ ```text
976
+ src/
977
+ ├── agents/
+ │   ├── __init__.py
+ │   ├── state.py              # MagenticState (evidence_store + embeddings)
+ │   ├── tools.py              # AIFunction tool definitions (update state)
+ │   └── magentic_agents.py    # ChatAgent factory functions
+ ├── services/
+ │   └── embeddings.py         # EmbeddingService (semantic dedup)
+ ├── orchestrator.py           # Simple mode (unchanged)
+ ├── orchestrator_magentic.py  # Magentic mode with ChatAgents
+ └── orchestrator_factory.py   # Mode selection
987
+ ```

  ---

+ ## 6. Dependencies

+ ```toml
+ [project.optional-dependencies]
+ magentic = [
+     "agent-framework-core>=1.0.0b",
+     "agent-framework-openai>=1.0.0b",  # For OpenAIChatClient
+ ]
+ embeddings = [
+     "chromadb>=0.4.0",
+     "sentence-transformers>=2.2.0",
+ ]
+ ```

+ **IMPORTANT: Magentic mode REQUIRES an OpenAI API key.**

+ The Microsoft Agent Framework's standard manager and ChatAgent use OpenAIChatClient internally.
+ There is no AnthropicChatClient in the framework. If only `ANTHROPIC_API_KEY` is set:
+ - `mode="simple"` works fine
+ - `mode="magentic"` raises `ConfigurationError`

+ This is enforced in `MagenticOrchestrator.__init__`.
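
A minimal sketch of what such a check might look like, assuming the key is read from the environment. The `ConfigurationError` class below is defined locally for illustration, and `require_openai_key` is a hypothetical helper name; the project's actual exception and the code inside `MagenticOrchestrator.__init__` may differ.

```python
import os
from typing import Mapping, Optional


class ConfigurationError(Exception):
    """Hypothetical local exception; the project's real class may live elsewhere."""


def require_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI key, or raise if Magentic mode cannot be configured."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # Give a more specific hint when only an Anthropic key is present.
        hint = ""
        if env.get("ANTHROPIC_API_KEY"):
            hint = " (ANTHROPIC_API_KEY is set, but only mode='simple' can use it)"
        raise ConfigurationError("Magentic mode requires OPENAI_API_KEY" + hint)
    return key
```

Calling a helper like this at the top of the constructor would make the failure immediate and explicit instead of surfacing later as a hung workflow.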
 

+ ---

+ ## 7. Implementation Checklist

+ - [ ] Create `src/agents/state.py` with MagenticState class
+ - [ ] Create `src/agents/tools.py` with AIFunction search tools + state updates
+ - [ ] Create `src/agents/magentic_agents.py` with ChatAgent factories
+ - [ ] Rewrite `src/orchestrator_magentic.py` to use ChatAgent pattern
+ - [ ] Update `src/orchestrator_factory.py` for new signature
+ - [ ] Test with real OpenAI API
+ - [ ] Verify manager properly coordinates agents
+ - [ ] Ensure tools are called with correct parameters
+ - [ ] Verify semantic deduplication works (evidence_store populates)
+ - [ ] Verify bibliography generation in final reports

+ ---

+ ## 8. Definition of Done

+ Phase 5 is **COMPLETE** when:

+ 1. Magentic mode runs without hanging
+ 2. Manager successfully coordinates agents via natural language
+ 3. SearchAgent calls tools with proper search keywords (not raw instructions)
+ 4. JudgeAgent evaluates evidence from conversation history
+ 5. ReportAgent generates structured final report
+ 6. Events stream to UI correctly
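
Criteria 1 and 6 can be checked mechanically by draining the event stream under a timeout. The sketch below is one possible approach, not the project's test suite: `collect_events` is a hypothetical helper, and `fake_run` / `hanging_run` are stand-ins for `orch.run(...)` that yield plain strings instead of real event objects.

```python
import asyncio
from typing import AsyncIterator


async def collect_events(stream: AsyncIterator, timeout_s: float) -> list:
    """Drain an async event stream, failing fast if it stalls."""

    async def _drain() -> list:
        return [event async for event in stream]

    # wait_for cancels the drain and raises asyncio.TimeoutError on a hang.
    return await asyncio.wait_for(_drain(), timeout=timeout_s)


async def fake_run(query: str):
    """Stand-in for a healthy orchestrator run."""
    for event_type in ("started", "search_complete", "complete"):
        await asyncio.sleep(0)
        yield event_type


async def hanging_run(query: str):
    """Stand-in for the broken behaviour: one event, then no progress."""
    yield "started"
    await asyncio.sleep(3600)


events = asyncio.run(collect_events(fake_run("metformin alzheimer"), timeout_s=1.0))
```

Against the real orchestrator the timeout would need to be far more generous, since each round involves several LLM calls.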

  ---

+ ## 9. Testing Magentic Mode

+ ```bash
+ # Test with real API
+ OPENAI_API_KEY=sk-... uv run python -c "
+ import asyncio
+ from src.orchestrator_factory import create_orchestrator

+ async def test():
+     orch = create_orchestrator(mode='magentic')
+     async for event in orch.run('metformin alzheimer'):
+         print(f'[{event.type}] {event.message[:100]}')
+
+ asyncio.run(test())
+ "
+ ```

+ Expected output:
+ ```
+ [started] Starting research (Magentic mode): metformin alzheimer
+ [judging] Manager (plan): I will coordinate the agents to research...
+ [search_complete] SearchAgent: Found 25 PubMed results for metformin alzheimer...
+ [hypothesizing] HypothesisAgent: Based on the evidence, I propose...
+ [judge_complete] JudgeAgent: Mechanism Score: 7/10, Clinical Score: 6/10...
+ [synthesizing] ReportAgent: ## Executive Summary...
+ [complete] <full research report>
+ ```
+
+ ---

+ ## 10. Key Differences from Old Spec
+
+ | Aspect | OLD (Wrong) | NEW (Correct) |
+ |--------|-------------|---------------|
+ | Agent type | `BaseAgent` subclass | `ChatAgent` with `chat_client` |
+ | Internal LLM | None | OpenAIChatClient |
+ | How tools work | Handler.execute(raw_instruction) | LLM understands instruction, calls AIFunction |
+ | Message handling | Extract text → pass to API | LLM interprets → extracts keywords → calls tool |
+ | State management | Passed to agent constructors | Global MagenticState singleton |
+ | Evidence storage | In agent instance | In MagenticState.evidence_store |
+ | Semantic search | Coupled to agents | Tools call state.add_evidence() |
+ | Citations for report | From agent's store | Via get_bibliography() tool |
+
+ **Key Insights:**
+ 1. Magentic agents must have internal LLMs to understand natural language instructions
+ 2. Tools must update shared state as a side effect (return strings, but also store Evidence)
+ 3. ReportAgent uses `get_bibliography()` tool to access structured citations
+ 4. State is reset at start of each workflow run via `reset_magentic_state()`
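
Insights 2 to 4 can be sketched with a small shared-state module. This is an illustration under stated assumptions, not the contents of `src/agents/state.py`: the `Evidence` fields, `get_magentic_state`, and `search_tool` are hypothetical, the search result is fabricated in-process rather than fetched, and exact-URL matching stands in for the embedding-based semantic deduplication the spec describes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    # Hypothetical minimal shape; the real Evidence model likely has more fields.
    title: str
    url: str


@dataclass
class MagenticState:
    evidence_store: list[Evidence] = field(default_factory=list)

    def add_evidence(self, items: list[Evidence]) -> int:
        """Store unseen items and return how many were new.

        Assumption: URL equality replaces semantic (embedding) dedup here.
        """
        seen = {e.url for e in self.evidence_store}
        added = 0
        for item in items:
            if item.url not in seen:
                self.evidence_store.append(item)
                seen.add(item.url)
                added += 1
        return added


_state: Optional[MagenticState] = None


def get_magentic_state() -> MagenticState:
    """Return the process-wide state, creating it on first use."""
    global _state
    if _state is None:
        _state = MagenticState()
    return _state


def reset_magentic_state() -> None:
    """Start each workflow run with an empty store."""
    global _state
    _state = MagenticState()


def search_tool(query: str) -> str:
    """Tool body: returns a string for the LLM, stores Evidence as a side effect."""
    slug = query.replace(" ", "-")
    hits = [Evidence(f"Result for {query}", f"https://example.org/{slug}")]
    new = get_magentic_state().add_evidence(hits)
    return f"Found {len(hits)} results ({new} new) for {query}"


def get_bibliography() -> str:
    """Tool body: numbered citations built from the shared store."""
    store = get_magentic_state().evidence_store
    return "\n".join(f"{i}. {e.title} ({e.url})" for i, e in enumerate(store, 1))
```

Because the tools return plain strings, the structured `Evidence` objects never pass through the conversation; the report step reads them back from the store instead.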