VibecoderMcSwaggins commited on
Commit
a941b78
·
1 Parent(s): 474538c

chore: remove resolved bug documentation

Browse files

- Delete 005_services_not_integrated.md - embeddings now wired to simple orchestrator
(enable_embeddings=True is the default in orchestrator.py)
- Delete 006_magentic_mode_broken.md - magentic mode is experimental/optional,
documented as requiring OpenAI (not a bug)

docs/bugs/005_services_not_integrated.md DELETED
@@ -1,142 +0,0 @@
1
- # Bug 005: Embedding Services Built But Not Wired to Default Orchestrator
2
-
3
- **Date:** November 26, 2025
4
- **Severity:** CRITICAL
5
- **Status:** Open
6
-
7
- ## 1. The Problem
8
-
9
- Two complete semantic search services exist but are **NOT USED** by the default orchestrator:
10
-
11
- | Service | Location | Status |
12
- | ------- | -------- | ------ |
13
- | EmbeddingService | `src/services/embeddings.py` | BUILT, not wired to simple mode |
14
- | LlamaIndexRAGService | `src/services/llamaindex_rag.py` | BUILT, not wired to simple mode |
15
-
16
- ## 2. Root Cause: Two Orchestrators
17
-
18
- ```
19
- ┌─────────────────────────────────────────────────────────────────┐
20
- │ orchestrator.py (SIMPLE MODE - DEFAULT) │
21
- │ - Basic search → judge → loop │
22
- │ - NO embeddings │
23
- │ - NO semantic search │
24
- │ - Hand-rolled keyword matching │
25
- └─────────────────────────────────────────────────────────────────┘
26
-
27
- ┌─────────────────────────────────────────────────────────────────┐
28
- │ orchestrator_magentic.py (MAGENTIC MODE) │
29
- │ - Multi-agent architecture │
30
- │ - USES EmbeddingService │
31
- │ - USES semantic search │
32
- │ - Requires agent-framework (optional dep) │
33
- │ - OpenAI only │
34
- └─────────────────────────────────────────────────────────────────┘
35
- ```
36
-
37
- **The UI defaults to simple mode**, which bypasses all the semantic search infrastructure.
38
-
39
- ## 3. What's Built (Not Wired)
40
-
41
- ### EmbeddingService (NO API KEY NEEDED)
42
-
43
- ```python
44
- # src/services/embeddings.py
45
- class EmbeddingService:
46
- async def embed(text) -> list[float]
47
- async def search_similar(query) -> list[dict] # SEMANTIC SEARCH
48
- async def deduplicate(evidence) -> list # DEDUPLICATION
49
- ```
50
-
51
- - Uses local sentence-transformers
52
- - ChromaDB vector store
53
- - **Works without API keys**
54
-
55
- ### LlamaIndexRAGService
56
-
57
- ```python
58
- # src/services/llamaindex_rag.py
59
- class LlamaIndexRAGService:
60
- def ingest_evidence(evidence_list)
61
- def retrieve(query) -> list[dict] # Semantic retrieval
62
- def query(query_str) -> str # Synthesized response
63
- ```
64
-
65
- ## 4. Where Services ARE Used
66
-
67
- ```
68
- src/orchestrator_magentic.py ← Uses EmbeddingService
69
- src/agents/search_agent.py ← Uses EmbeddingService
70
- src/agents/report_agent.py ← Uses EmbeddingService
71
- src/agents/hypothesis_agent.py ← Uses EmbeddingService
72
- src/agents/analysis_agent.py ← Uses EmbeddingService
73
- ```
74
-
75
- All in magentic mode agents, NOT in simple orchestrator.
76
-
77
- ## 5. The Fix Options
78
-
79
- ### Option A: Add Embeddings to Simple Orchestrator (RECOMMENDED)
80
-
81
- Modify `src/orchestrator.py` to optionally use EmbeddingService:
82
-
83
- ```python
84
- class Orchestrator:
85
- def __init__(self, ..., use_embeddings: bool = True):
86
- if use_embeddings:
87
- from src.services.embeddings import get_embedding_service
88
- self.embeddings = get_embedding_service()
89
- else:
90
- self.embeddings = None
91
-
92
- async def run(self, query):
93
- # ... search phase ...
94
-
95
- if self.embeddings:
96
- # Semantic ranking
97
- all_evidence = await self._rank_by_relevance(all_evidence, query)
98
- # Deduplication
99
- all_evidence = await self.embeddings.deduplicate(all_evidence)
100
- ```
101
-
102
- ### Option B: Make Magentic Mode Default
103
-
104
- Change app.py to default to "magentic" mode when deps available.
105
-
106
- ### Option C: Merge Best of Both
107
-
108
- Create a new orchestrator that:
109
- - Has the simplicity of simple mode
110
- - Uses embeddings for ranking/dedup
111
- - Doesn't require agent-framework
112
-
113
- ## 6. Implementation Plan
114
-
115
- ### Phase 1: Wire EmbeddingService to Simple Orchestrator
116
-
117
- 1. Import EmbeddingService in orchestrator.py
118
- 2. Add semantic ranking after search
119
- 3. Add deduplication before judge
120
- 4. Test end-to-end
121
-
122
- ### Phase 2: Add Relevance to Evidence
123
-
124
- 1. Use embedding similarity as relevance score
125
- 2. Sort evidence by relevance
126
- 3. Only send top-K to judge
127
-
128
- ## 7. Files to Modify
129
-
130
- ```
131
- src/orchestrator.py ← Add embedding integration
132
- src/orchestrator_factory.py ← Pass embeddings flag
133
- src/app.py ← Enable embeddings by default
134
- ```
135
-
136
- ## 8. Success Criteria
137
-
138
- - [ ] Default mode uses semantic search
139
- - [ ] Evidence ranked by relevance
140
- - [ ] Duplicates removed
141
- - [ ] No new API keys required (sentence-transformers is local)
142
- - [ ] Magentic mode still works as before
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
docs/bugs/006_magentic_mode_broken.md DELETED
@@ -1,211 +0,0 @@
1
- # Bug 006: Magentic Mode Deeply Broken
2
-
3
- **Date:** November 26, 2025
4
- **Severity:** HIGH
5
- **Status:** Open (Low Priority - Simple Mode Works)
6
-
7
- ## 1. The Problem
8
-
9
- Magentic mode (`mode="magentic"`) is **non-functional**. When enabled:
10
- - Workflow hangs indefinitely (observed in local testing)
11
- - No events are yielded to the UI
12
- - API calls may be made but responses are not processed
13
-
14
- ## 2. Root Cause Analysis
15
-
16
- ### 2.1 Architecture Complexity
17
-
18
- ```
19
- ┌─────────────────────────────────────────────────────────────────┐
20
- │ MagenticOrchestrator │
21
- │ │
22
- │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
23
- │ │ SearchAgent │ │ HypothesisAg│ │ JudgeAgent │ │
24
- │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
25
- │ │ │ │ │
26
- │ ▼ ▼ ▼ │
27
- │ ┌─────────────────────────────────────────────────────────┐ │
28
- │ │ MagenticBuilder Standard Manager │ │
29
- │ │ (OpenAIChatClient orchestration) │ │
30
- │ │ │ │
31
- │ │ - Decides which agent to call │ │
32
- │ │ - Parses agent responses │ │
33
- │ │ - Loops until "final result" │ │
34
- │ └─────────────────────────────────────────────────────────┘ │
35
- └─────────────────────────────────────────────────────────────────┘
36
- ```
37
-
38
- The issue is in the **Standard Manager** layer from `agent-framework-core`:
39
- - It uses an LLM to decide which agent to call next
40
- - The LLM response parsing is fragile
41
- - The loop can stall or hang if parsing fails
42
-
43
- ### 2.2 Specific Issues
44
-
45
- | Issue | Location | Impact |
46
- |-------|----------|--------|
47
- | OpenAI-only | `orchestrator_magentic.py:103` | Can't use Anthropic |
48
- | Manager parsing | `agent-framework` library | Hangs on malformed responses |
49
- | No timeout | `MagenticBuilder` | Workflow runs forever |
50
- | Round limits insufficient | `max_round_count=10` | Still hangs within rounds |
51
-
52
- ### 2.3 Observed Behavior
53
-
54
- ```bash
55
- # Test magentic mode
56
- uv run python -c "
57
- from src.orchestrator_factory import create_orchestrator
58
- ...
59
- orch = create_orchestrator(mode='magentic')
60
- async for event in orch.run('metformin alzheimer'):
61
- print(event.type)
62
- "
63
-
64
- # Result: Hangs indefinitely after "started" event
65
- # No search, no judge, no completion
66
- ```
67
-
68
- ## 3. Technical Deep Dive
69
-
70
- ### 3.1 The Manager's Role
71
-
72
- The `MagenticBuilder.with_standard_manager()` creates an LLM-powered router:
73
-
74
- ```python
75
- # From orchestrator_magentic.py lines 94-111
76
- MagenticBuilder()
77
- .participants(
78
- searcher=search_agent,
79
- hypothesizer=hypothesis_agent,
80
- judge=judge_agent,
81
- reporter=report_agent,
82
- )
83
- .with_standard_manager(
84
- chat_client=OpenAIChatClient(
85
- model_id=settings.openai_model,
86
- api_key=settings.openai_api_key
87
- ),
88
- max_round_count=self._max_rounds, # 10
89
- max_stall_count=3,
90
- max_reset_count=2,
91
- )
92
- ```
93
-
94
- The manager:
95
- 1. Receives the task
96
- 2. Calls OpenAI to decide: "Which agent should handle this?"
97
- 3. Parses response to extract agent name
98
- 4. Calls that agent
99
- 5. Receives result
100
- 6. Calls OpenAI again: "What next?"
101
- 7. Repeat until "final result"
102
-
103
- ### 3.2 Where It Breaks
104
-
105
- The manager's LLM parsing expects specific response formats. If OpenAI returns:
106
- - Unexpected JSON structure → parse error → stall
107
- - Agent name with typo → agent not found → reset
108
- - Verbose explanation → extraction fails → hang
109
-
110
- ### 3.3 The Event Processing
111
-
112
- ```python
113
- # orchestrator_magentic.py lines 178-191
114
- async for event in workflow.run_stream(task):
115
- agent_event = self._process_event(event, iteration)
116
- if agent_event:
117
- # Events are processed but may never arrive
118
- yield agent_event
119
- ```
120
-
121
- If `workflow.run_stream()` never yields events (manager stuck), the UI sees nothing.
122
-
123
- ## 4. Why Simple Mode Works
124
-
125
- Simple mode bypasses all of this:
126
-
127
- ```python
128
- # orchestrator.py
129
- while iteration < self.config.max_iterations:
130
- # Direct calls - no LLM routing
131
- search_results = await self.search.execute(query)
132
- assessment = await self.judge.assess(query, evidence)
133
-
134
- if assessment.sufficient:
135
- return synthesis
136
- else:
137
- continue # Deterministic loop
138
- ```
139
-
140
- No LLM-powered routing. No parsing. No hangs.
141
-
142
- ## 5. Fix Options
143
-
144
- ### Option A: Abandon Magentic (Recommended)
145
-
146
- Simple mode + HFInferenceJudgeHandler provides:
147
- - Free AI analysis
148
- - Reliable execution
149
- - No complex dependencies
150
-
151
- Mark magentic as "experimental" or remove entirely.
152
-
153
- ### Option B: Fix the Manager (Hard)
154
-
155
- 1. Add timeout to `workflow.run_stream()`
156
- 2. Implement custom manager without LLM routing
157
- 3. Use deterministic agent selection
158
- 4. Add better error handling in event processing
159
-
160
- ### Option C: Replace agent-framework (Medium)
161
-
162
- Use a different multi-agent framework:
163
- - LangGraph
164
- - AutoGen
165
- - Custom implementation
166
-
167
- ## 6. Recommendation
168
-
169
- **Do not use magentic mode for the hackathon.**
170
-
171
- Simple mode with HFInferenceJudgeHandler:
172
- - Works reliably
173
- - Provides real AI analysis
174
- - No extra dependencies
175
- - No API routing issues
176
-
177
- ## 7. Files Involved
178
-
179
- ```
180
- src/orchestrator_magentic.py ← Main orchestrator (broken)
181
- src/agents/search_agent.py ← Works in isolation
182
- src/agents/judge_agent.py ← Works in isolation
183
- src/agents/hypothesis_agent.py ← Works in isolation
184
- src/agents/report_agent.py ← Works in isolation
185
- ```
186
-
187
- The agents themselves work. The **manager** coordination is broken.
188
-
189
- ## 8. Verification
190
-
191
- To verify this bug still exists:
192
-
193
- ```bash
194
- # This should hang
195
- uv run python -c "
196
- import asyncio
197
- from src.app import configure_orchestrator
198
-
199
- orch, name = configure_orchestrator(mode='magentic', use_mock=False)
200
- print(f'Backend: {name}')
201
-
202
- async def test():
203
- async for event in orch.run('test query'):
204
- print(event.type)
205
-
206
- asyncio.run(test())
207
- "
208
- ```
209
-
210
- Expected: Hangs after "started"
211
- Working: Would show search_complete, judge_complete, etc.