Commit a941b78 · Parent(s): 474538c

chore: remove resolved bug documentation

- Delete 005_services_not_integrated.md - embeddings now wired to simple orchestrator
  (enable_embeddings=True is the default in orchestrator.py)
- Delete 006_magentic_mode_broken.md - magentic mode is experimental/optional,
  documented as requiring OpenAI (not a bug)

docs/bugs/005_services_not_integrated.md (DELETED)
@@ -1,142 +0,0 @@

# Bug 005: Embedding Services Built But Not Wired to Default Orchestrator

**Date:** November 26, 2025
**Severity:** CRITICAL
**Status:** Open

## 1. The Problem

Two complete semantic search services exist but are **NOT USED** by the default orchestrator:

| Service | Location | Status |
| ------- | -------- | ------ |
| EmbeddingService | `src/services/embeddings.py` | BUILT, not wired to simple mode |
| LlamaIndexRAGService | `src/services/llamaindex_rag.py` | BUILT, not wired to simple mode |

## 2. Root Cause: Two Orchestrators

```
┌─────────────────────────────────────────────────────────────────┐
│ orchestrator.py (SIMPLE MODE - DEFAULT)                         │
│ - Basic search → judge → loop                                   │
│ - NO embeddings                                                 │
│ - NO semantic search                                            │
│ - Hand-rolled keyword matching                                  │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ orchestrator_magentic.py (MAGENTIC MODE)                        │
│ - Multi-agent architecture                                      │
│ - USES EmbeddingService                                         │
│ - USES semantic search                                          │
│ - Requires agent-framework (optional dep)                       │
│ - OpenAI only                                                   │
└─────────────────────────────────────────────────────────────────┘
```

**The UI defaults to simple mode**, which bypasses all the semantic search infrastructure.

## 3. What's Built (Not Wired)

### EmbeddingService (NO API KEY NEEDED)

```python
# src/services/embeddings.py (interface summary)
class EmbeddingService:
    async def embed(self, text) -> list[float]: ...
    async def search_similar(self, query) -> list[dict]: ...  # SEMANTIC SEARCH
    async def deduplicate(self, evidence) -> list: ...  # DEDUPLICATION
```

- Uses local sentence-transformers
- ChromaDB vector store
- **Works without API keys**
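
To make "semantic search" concrete: a vector store answers a query by ranking stored embeddings by a similarity measure such as cosine similarity. The sketch below shows only that ranking step, in plain Python. It is a hypothetical stand-in, not the service's code: it takes precomputed vectors instead of calling sentence-transformers, keeps them in a list instead of ChromaDB, and its `search_similar` accepts a query vector, whereas the real method accepts query text.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector store such as a ChromaDB collection."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, vector: list[float]) -> None:
        self._items.append((doc_id, vector))

    def search_similar(self, query_vector: list[float], k: int = 3) -> list[dict]:
        """Return the k stored items most similar to the query, best first."""
        scored = [
            {"id": doc_id, "score": cosine_similarity(query_vector, vector)}
            for doc_id, vector in self._items
        ]
        scored.sort(key=lambda hit: hit["score"], reverse=True)
        return scored[:k]
```

In the real service an embedding model would first turn the query and each evidence snippet into vectors; only the scoring and sorting are illustrated here.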

### LlamaIndexRAGService

```python
# src/services/llamaindex_rag.py (interface summary)
class LlamaIndexRAGService:
    def ingest_evidence(self, evidence_list): ...
    def retrieve(self, query) -> list[dict]: ...  # Semantic retrieval
    def query(self, query_str) -> str: ...  # Synthesized response
```

## 4. Where Services ARE Used

```
src/orchestrator_magentic.py    ← Uses EmbeddingService
src/agents/search_agent.py      ← Uses EmbeddingService
src/agents/report_agent.py      ← Uses EmbeddingService
src/agents/hypothesis_agent.py  ← Uses EmbeddingService
src/agents/analysis_agent.py    ← Uses EmbeddingService
```

All of these are magentic-mode components, NOT the simple orchestrator.

## 5. The Fix Options

### Option A: Add Embeddings to Simple Orchestrator (RECOMMENDED)

Modify `src/orchestrator.py` to optionally use EmbeddingService:

```python
class Orchestrator:
    def __init__(self, ..., use_embeddings: bool = True):
        if use_embeddings:
            from src.services.embeddings import get_embedding_service
            self.embeddings = get_embedding_service()
        else:
            self.embeddings = None

    async def run(self, query):
        # ... search phase ...

        if self.embeddings:
            # Semantic ranking
            all_evidence = await self._rank_by_relevance(all_evidence, query)
            # Deduplication
            all_evidence = await self.embeddings.deduplicate(all_evidence)
```
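
The `deduplicate` call in Option A could work by dropping near-duplicates in embedding space. A minimal sketch, assuming each evidence item already carries an embedding vector; the `Evidence` dataclass below is a hypothetical simplification of the project's model, and the 0.9 cosine threshold is an illustrative value, not one taken from the codebase.

```python
import math
from dataclasses import dataclass


@dataclass
class Evidence:
    # Hypothetical simplification: the real evidence model may differ.
    title: str
    vector: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def deduplicate(evidence: list[Evidence], threshold: float = 0.9) -> list[Evidence]:
    """Greedy near-duplicate removal: an item is kept only if its similarity to
    every item kept so far is below `threshold` (first occurrence wins)."""
    kept: list[Evidence] = []
    for item in evidence:
        if all(cosine_similarity(item.vector, other.vector) < threshold for other in kept):
            kept.append(item)
    return kept
```

Because the first occurrence wins, ranking before deduplicating (the order Option A uses) keeps the more relevant copy of each near-duplicate pair. The pairwise comparison is O(n²), which is acceptable for small evidence lists.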

### Option B: Make Magentic Mode Default

Change `app.py` to default to `"magentic"` mode when its dependencies are available.

### Option C: Merge Best of Both

Create a new orchestrator that:
- Has the simplicity of simple mode
- Uses embeddings for ranking/dedup
- Doesn't require agent-framework

## 6. Implementation Plan

### Phase 1: Wire EmbeddingService to Simple Orchestrator

1. Import EmbeddingService in orchestrator.py
2. Add semantic ranking after search
3. Add deduplication before judge
4. Test end-to-end

### Phase 2: Add Relevance to Evidence

1. Use embedding similarity as relevance score
2. Sort evidence by relevance
3. Only send top-K to judge
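
Phase 2 boils down to score, sort, truncate. A minimal sketch, assuming query and evidence embeddings are already available; `ScoredEvidence`, `select_top_k` and the default `k=5` are hypothetical names and values, not ones from the project.

```python
import heapq
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScoredEvidence:
    # Hypothetical evidence record; `relevance` is filled in by select_top_k.
    title: str
    vector: tuple[float, ...]
    relevance: float = 0.0


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def select_top_k(query_vector, evidence, k: int = 5) -> list[ScoredEvidence]:
    """Score each item against the query and keep the k most relevant."""
    scored = [
        replace(item, relevance=cosine_similarity(query_vector, item.vector))
        for item in evidence
    ]
    # nlargest returns its result ordered from most to least relevant.
    return heapq.nlargest(k, scored, key=lambda item: item.relevance)
```

`heapq.nlargest` avoids sorting the whole list when only a handful of items go to the judge; for short lists `sorted(scored, key=..., reverse=True)[:k]` is equivalent.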

## 7. Files to Modify

```
src/orchestrator.py          ← Add embedding integration
src/orchestrator_factory.py  ← Pass embeddings flag
src/app.py                   ← Enable embeddings by default
```

## 8. Success Criteria

- [ ] Default mode uses semantic search
- [ ] Evidence ranked by relevance
- [ ] Duplicates removed
- [ ] No new API keys required (sentence-transformers is local)
- [ ] Magentic mode still works as before

docs/bugs/006_magentic_mode_broken.md (DELETED)
@@ -1,211 +0,0 @@

# Bug 006: Magentic Mode Deeply Broken

**Date:** November 26, 2025
**Severity:** HIGH
**Status:** Open (Low Priority - Simple Mode Works)

## 1. The Problem

Magentic mode (`mode="magentic"`) is **non-functional**. When enabled:
- Workflow hangs indefinitely (observed in local testing)
- No events are yielded to the UI
- API calls may be made but responses are not processed

## 2. Root Cause Analysis

### 2.1 Architecture Complexity

```
┌─────────────────────────────────────────────────────────────────┐
│                      MagenticOrchestrator                       │
│                                                                 │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │ SearchAgent │  │ HypothesisAg│  │ JudgeAgent  │              │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘              │
│         │                │                │                     │
│         ▼                ▼                ▼                     │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │          MagenticBuilder Standard Manager               │    │
│  │          (OpenAIChatClient orchestration)               │    │
│  │                                                         │    │
│  │  - Decides which agent to call                          │    │
│  │  - Parses agent responses                               │    │
│  │  - Loops until "final result"                           │    │
│  └─────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘
```

The issue is in the **Standard Manager** layer from `agent-framework-core`:
- It uses an LLM to decide which agent to call next
- The LLM response parsing is fragile
- The loop can stall or hang if parsing fails

### 2.2 Specific Issues

| Issue | Location | Impact |
|-------|----------|--------|
| OpenAI-only | `orchestrator_magentic.py:103` | Can't use Anthropic |
| Manager parsing | `agent-framework` library | Hangs on malformed responses |
| No timeout | `MagenticBuilder` | Workflow runs forever |
| Round limits insufficient | `max_round_count=10` | Still hangs within rounds |

### 2.3 Observed Behavior

```bash
# Test magentic mode
uv run python -c "
from src.orchestrator_factory import create_orchestrator
...
orch = create_orchestrator(mode='magentic')
async for event in orch.run('metformin alzheimer'):
    print(event.type)
"

# Result: Hangs indefinitely after "started" event
# No search, no judge, no completion
```

## 3. Technical Deep Dive

### 3.1 The Manager's Role

The `MagenticBuilder.with_standard_manager()` creates an LLM-powered router:

```python
# From orchestrator_magentic.py lines 94-111
MagenticBuilder()
    .participants(
        searcher=search_agent,
        hypothesizer=hypothesis_agent,
        judge=judge_agent,
        reporter=report_agent,
    )
    .with_standard_manager(
        chat_client=OpenAIChatClient(
            model_id=settings.openai_model,
            api_key=settings.openai_api_key
        ),
        max_round_count=self._max_rounds,  # 10
        max_stall_count=3,
        max_reset_count=2,
    )
```

The manager:
1. Receives the task
2. Calls OpenAI to decide: "Which agent should handle this?"
3. Parses response to extract agent name
4. Calls that agent
5. Receives result
6. Calls OpenAI again: "What next?"
7. Repeat until "final result"
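
To make the control flow concrete, here is a deterministic toy model of the loop described above, with the LLM call replaced by a plain `router` function. It is a hypothetical sketch, not `agent-framework`'s implementation: the stall and reset handling below is only an assumed reading of `max_round_count`, `max_stall_count` and `max_reset_count`, and `run_manager`, `ManagerGaveUp` and `FINAL` are illustrative names.

```python
from typing import Callable

FINAL = "FINAL"  # hypothetical marker for "final result"


class ManagerGaveUp(RuntimeError):
    """Raised when the toy manager exhausts its round or reset budget."""


def run_manager(
    task: str,
    agents: dict[str, Callable[[str], str]],
    router: Callable[[list[str]], str],
    max_round_count: int = 10,
    max_stall_count: int = 3,
    max_reset_count: int = 2,
) -> list[str]:
    """Ask the router who acts next, call that agent, repeat until FINAL."""
    history = [task]
    stalls = resets = 0
    for _ in range(max_round_count):
        choice = router(history)  # stands in for "Which agent should handle this?"
        if choice == FINAL:
            return history
        agent = agents.get(choice)
        if agent is None:  # unknown or unparseable agent name: no progress
            stalls += 1
            if stalls > max_stall_count:
                resets += 1
                if resets > max_reset_count:
                    raise ManagerGaveUp("reset budget exhausted")
                history, stalls = [task], 0  # start over from the original task
            continue
        history.append(agent(history[-1]))  # each agent sees the latest message
    raise ManagerGaveUp("round budget exhausted")
```

Every path in this model terminates because each round is counted. The observation in 2.2 that round limits do not prevent the hang suggests the real stall happens inside a single round (for example, while waiting on a response), which counters like these cannot see.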

### 3.2 Where It Breaks

The manager's LLM parsing expects specific response formats. If OpenAI returns:
- Unexpected JSON structure → parse error → stall
- Agent name with typo → agent not found → reset
- Verbose explanation → extraction fails → hang

### 3.3 The Event Processing

```python
# orchestrator_magentic.py lines 178-191
async for event in workflow.run_stream(task):
    agent_event = self._process_event(event, iteration)
    if agent_event:
        # Events are processed but may never arrive
        yield agent_event
```

If `workflow.run_stream()` never yields events (manager stuck), the UI sees nothing.

## 4. Why Simple Mode Works

Simple mode bypasses all of this:

```python
# orchestrator.py (simplified)
while iteration < self.config.max_iterations:
    iteration += 1
    # Direct calls - no LLM routing
    evidence = await self.search.execute(query)
    assessment = await self.judge.assess(query, evidence)

    if assessment.sufficient:
        return synthesis
    else:
        continue  # Deterministic loop
```

No LLM-powered routing. No parsing. No hangs.

## 5. Fix Options

### Option A: Abandon Magentic (Recommended)

Simple mode + HFInferenceJudgeHandler provides:
- Free AI analysis
- Reliable execution
- No complex dependencies

Mark magentic mode as "experimental" or remove it entirely.

### Option B: Fix the Manager (Hard)

1. Add a timeout to `workflow.run_stream()`
2. Implement custom manager without LLM routing
3. Use deterministic agent selection
4. Add better error handling in event processing
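
Steps 2 and 3 replace the LLM router with a fixed policy. A hypothetical sketch of what deterministic selection could look like: a fixed search → hypothesize → judge cycle that stops when the judge reports the evidence as sufficient, then hands off to the reporter. The dictionary keys mirror the `participants(...)` keywords in 3.1, but the plain-callable agents, the string-passing protocol and the `"sufficient"` verdict are assumptions.

```python
from typing import Callable

Agent = Callable[[str], str]


def run_fixed_pipeline(task: str, agents: dict[str, Agent], max_rounds: int = 10) -> str:
    """Deterministic routing: the order is fixed, no LLM decides who speaks next."""
    context = task
    for _ in range(max_rounds):
        context = agents["searcher"](context)
        context = agents["hypothesizer"](context)
        verdict = agents["judge"](context)
        if verdict == "sufficient":  # hypothetical judge signal
            break
    return agents["reporter"](context)
```

The trade-off is flexibility: the order can no longer adapt to the task, but every run is bounded by `max_rounds` and is easy to reproduce.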

### Option C: Replace agent-framework (Medium)

Use a different multi-agent framework:
- LangGraph
- AutoGen
- Custom implementation

## 6. Recommendation

**Do not use magentic mode for the hackathon.**

Simple mode with HFInferenceJudgeHandler:
- Works reliably
- Provides real AI analysis
- No extra dependencies
- No API routing issues

## 7. Files Involved

```
src/orchestrator_magentic.py    ← Main orchestrator (broken)
src/agents/search_agent.py      ← Works in isolation
src/agents/judge_agent.py       ← Works in isolation
src/agents/hypothesis_agent.py  ← Works in isolation
src/agents/report_agent.py      ← Works in isolation
```

The agents themselves work. The **manager** coordination is broken.

## 8. Verification

To verify this bug still exists:

```bash
# This should hang
uv run python -c "
import asyncio
from src.app import configure_orchestrator

orch, name = configure_orchestrator(mode='magentic', use_mock=False)
print(f'Backend: {name}')

async def test():
    async for event in orch.run('test query'):
        print(event.type)

asyncio.run(test())
"
```

- Expected: Hangs after "started"
- Working: Would show search_complete, judge_complete, etc.