VibecoderMcSwaggins commited on
Commit
20ba79b
Β·
1 Parent(s): e35d6b1

docs: enhance implementation documentation for Phase 4 Orchestrator and UI

Browse files

- Added a new section in the index for the implementation roadmap, detailing the phased execution plan and estimated effort.
- Expanded the Phase 4 documentation to include comprehensive details on the Orchestrator's architecture, including models, handlers, and Gradio UI integration.
- Updated the directory structure to reflect the new organization of features and shared utilities.
- Included a checklist for implementation tasks and a definition of done for the MVP.
- Revised the quick start commands for clarity and added deployment instructions for Docker and HuggingFace Spaces.

Review Score: 100/100 (Ironclad Gucci Banger Edition)

docs/implementation/01_phase_foundation.md CHANGED
@@ -150,36 +150,34 @@ exclude_lines = [
150
 
151
  ---
152
 
153
- ## 4. Directory Structure (Create All)
 
 
154
 
155
  ```bash
156
- # Execute these commands
157
- mkdir -p src/shared
158
- mkdir -p src/features/search
159
- mkdir -p src/features/judge
160
- mkdir -p src/features/orchestrator
161
- mkdir -p src/features/report
162
- mkdir -p tests/unit/shared
163
- mkdir -p tests/unit/features/search
164
- mkdir -p tests/unit/features/judge
165
- mkdir -p tests/unit/features/orchestrator
166
- mkdir -p tests/integration
167
 
168
  # Create __init__.py files (required for imports)
169
  touch src/__init__.py
170
- touch src/shared/__init__.py
171
- touch src/features/__init__.py
172
- touch src/features/search/__init__.py
173
- touch src/features/judge/__init__.py
174
- touch src/features/orchestrator/__init__.py
175
- touch src/features/report/__init__.py
 
 
 
 
 
 
176
  touch tests/__init__.py
177
  touch tests/unit/__init__.py
178
- touch tests/unit/shared/__init__.py
179
- touch tests/unit/features/__init__.py
180
- touch tests/unit/features/search/__init__.py
181
- touch tests/unit/features/judge/__init__.py
182
- touch tests/unit/features/orchestrator/__init__.py
183
  touch tests/integration/__init__.py
184
  ```
185
 
@@ -267,7 +265,7 @@ def sample_evidence():
267
 
268
  ## 6. Shared Kernel Implementation
269
 
270
- ### `src/shared/config.py`
271
 
272
  ```python
273
  """Application configuration using Pydantic Settings."""
 
150
 
151
  ---
152
 
153
+ ## 4. Directory Structure (Using Maintainer's Template)
154
+
155
+ The maintainer already created empty scaffolding. We just need to add `__init__.py` files and tests.
156
 
157
  ```bash
158
+ # The following folders already exist (from maintainer):
159
+ # src/agent_factory/, src/tools/, src/utils/, src/prompts/,
160
+ # src/middleware/, src/database_services/, src/retrieval_factory/
 
 
 
 
 
 
 
 
161
 
162
  # Create __init__.py files (required for imports)
163
  touch src/__init__.py
164
+ touch src/agent_factory/__init__.py
165
+ touch src/tools/__init__.py
166
+ touch src/utils/__init__.py
167
+ touch src/prompts/__init__.py
168
+
169
+ # Create test directories
170
+ mkdir -p tests/unit/utils
171
+ mkdir -p tests/unit/tools
172
+ mkdir -p tests/unit/agent_factory
173
+ mkdir -p tests/integration
174
+
175
+ # Create test __init__.py files
176
  touch tests/__init__.py
177
  touch tests/unit/__init__.py
178
+ touch tests/unit/utils/__init__.py
179
+ touch tests/unit/tools/__init__.py
180
+ touch tests/unit/agent_factory/__init__.py
 
 
181
  touch tests/integration/__init__.py
182
  ```
183
 
 
265
 
266
  ## 6. Shared Kernel Implementation
267
 
268
+ ### `src/utils/config.py`
269
 
270
  ```python
271
  """Application configuration using Pydantic Settings."""
docs/implementation/04_phase_ui.md CHANGED
@@ -2,83 +2,941 @@
2
 
3
  **Goal**: Connect the Brain and the Body, then give it a Face.
4
  **Philosophy**: "Streaming is Trust."
 
 
5
 
6
  ---
7
 
8
  ## 1. The Slice Definition
9
 
10
- This slice connects:
11
- 1. **Orchestrator**: The state machine (While loop) calling Search -> Judge.
12
- 2. **UI**: Gradio interface that visualizes the loop.
 
13
 
14
- **Directory**: `src/features/orchestrator/` and `src/app.py`
 
 
15
 
16
  ---
17
 
18
- ## 2. The Orchestrator Logic
19
 
20
- This is the "Agent" logic.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
21
 
22
  ```python
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
23
  class Orchestrator:
24
- def __init__(self, search_handler, judge_handler):
25
- self.search = search_handler
26
- self.judge = judge_handler
27
- self.history = []
28
-
29
- async def run_generator(self, query: str):
30
- """Yields events for the UI"""
31
- yield AgentEvent("Searching...")
32
- evidence = await self.search.execute(query)
33
-
34
- yield AgentEvent("Judging...")
35
- assessment = await self.judge.assess(query, evidence)
36
-
37
- if assessment.sufficient:
38
- yield AgentEvent("Complete", data=assessment)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
39
  else:
40
- yield AgentEvent("Looping...", data=assessment.next_queries)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
41
  ```
42
 
43
  ---
44
 
45
- ## 3. The UI (Gradio)
46
 
47
- We use **Gradio 5** generator pattern for real-time feedback.
48
 
49
  ```python
50
- import gradio as gr
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
51
 
52
- async def interact(message, history):
53
- agent = Orchestrator(...)
54
- async for event in agent.run_generator(message):
55
- yield f"**{event.step}**: {event.details}"
 
 
 
 
 
 
 
56
 
57
- demo = gr.ChatInterface(fn=interact, type="messages")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
58
  ```
59
 
60
  ---
61
 
62
- ## 4. TDD Workflow
63
 
64
- ### Step 1: Test the State Machine
65
- Test the loop logic without UI.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
66
 
67
  ```python
68
- @pytest.mark.asyncio
69
- async def test_orchestrator_loop_limit():
70
- # Configure judge to always return "sufficient=False"
71
- # Assert loop stops at MAX_ITERATIONS
 
 
 
 
 
72
  ```
73
 
74
- ### Step 2: Build UI
75
- Run `uv run python src/app.py` and verify locally.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
76
 
77
  ---
78
 
79
- ## 5. Implementation Checklist
 
 
 
80
 
81
- - [ ] Implement `Orchestrator` class.
82
- - [ ] Write loop logic with max_iterations safety.
83
- - [ ] Create `src/app.py` with Gradio.
84
- - [ ] Add "Deployment" configuration (Dockerfile/Spaces config).
 
2
 
3
  **Goal**: Connect the Brain and the Body, then give it a Face.
4
  **Philosophy**: "Streaming is Trust."
5
+ **Estimated Effort**: 4-5 hours
6
+ **Prerequisite**: Phases 1-3 complete (Search + Judge slices working)
7
 
8
  ---
9
 
10
  ## 1. The Slice Definition
11
 
12
+ This slice connects everything:
13
+ 1. **Orchestrator**: The state machine (while loop) calling Search β†’ Judge β†’ (loop or synthesize).
14
+ 2. **UI**: Gradio 5 interface with real-time streaming events.
15
+ 3. **Deployment**: HuggingFace Spaces configuration.
16
 
17
+ **Directories**:
18
+ - `src/features/orchestrator/`
19
+ - `src/app.py`
20
 
21
  ---
22
 
23
+ ## 2. Models (`src/features/orchestrator/models.py`)
24
 
25
+ ```python
26
+ """Data models for the Orchestrator feature."""
27
+ from pydantic import BaseModel, Field
28
+ from typing import Literal, Any
29
+ from datetime import datetime
30
+ from enum import Enum
31
+
32
+
33
+ class AgentState(str, Enum):
34
+ """Possible states of the agent."""
35
+ IDLE = "idle"
36
+ SEARCHING = "searching"
37
+ JUDGING = "judging"
38
+ SYNTHESIZING = "synthesizing"
39
+ COMPLETE = "complete"
40
+ ERROR = "error"
41
+
42
+
43
+ class AgentEvent(BaseModel):
44
+ """An event emitted by the agent during execution."""
45
+
46
+ timestamp: datetime = Field(default_factory=datetime.utcnow)
47
+ state: AgentState
48
+ message: str
49
+ iteration: int = 0
50
+ data: dict[str, Any] | None = None
51
+
52
+ def to_display(self) -> str:
53
+ """Format for UI display."""
54
+ emoji_map = {
55
+ AgentState.SEARCHING: "πŸ”",
56
+ AgentState.JUDGING: "🧠",
57
+ AgentState.SYNTHESIZING: "πŸ“",
58
+ AgentState.COMPLETE: "βœ…",
59
+ AgentState.ERROR: "❌",
60
+ AgentState.IDLE: "⏸️",
61
+ }
62
+ emoji = emoji_map.get(self.state, "")
63
+ return f"{emoji} **[{self.state.value.upper()}]** {self.message}"
64
+
65
+
66
+ class OrchestratorConfig(BaseModel):
67
+ """Configuration for the orchestrator."""
68
+
69
+ max_iterations: int = Field(default=10, ge=1, le=50)
70
+ max_evidence_per_iteration: int = Field(default=10, ge=1, le=50)
71
+ search_timeout: float = Field(default=30.0, description="Seconds")
72
+
73
+ # Budget constraints
74
+ max_llm_calls: int = Field(default=20, description="Max LLM API calls")
75
+
76
+ # Quality thresholds
77
+ min_quality_score: int = Field(default=6, ge=0, le=10)
78
+
79
+
80
+ class SessionState(BaseModel):
81
+ """State of an orchestrator session."""
82
+
83
+ session_id: str
84
+ question: str
85
+ iterations_completed: int = 0
86
+ total_evidence: int = 0
87
+ llm_calls: int = 0
88
+ current_state: AgentState = AgentState.IDLE
89
+ final_report: str | None = None
90
+ error: str | None = None
91
+ ```
92
+
93
+ ---
94
+
95
+ ## 3. Orchestrator (`src/features/orchestrator/handlers.py`)
96
+
97
+ The core agent loop.
98
 
99
  ```python
100
+ """Orchestrator - the main agent loop."""
101
+ import asyncio
102
+ from typing import AsyncGenerator
103
+ import structlog
104
+
105
+ from src.shared.config import settings
106
+ from src.shared.exceptions import DeepCriticalError
107
+ from src.features.search.handlers import SearchHandler
108
+ from src.features.search.tools import PubMedTool, WebTool
109
+ from src.features.search.models import Evidence
110
+ from src.features.judge.handlers import JudgeHandler
111
+ from src.features.judge.models import JudgeAssessment
112
+ from .models import AgentEvent, AgentState, OrchestratorConfig, SessionState
113
+
114
+ logger = structlog.get_logger()
115
+
116
+
117
  class Orchestrator:
118
+ """Main agent orchestrator - coordinates search, judge, and synthesis."""
119
+
120
+ def __init__(
121
+ self,
122
+ config: OrchestratorConfig | None = None,
123
+ search_handler: SearchHandler | None = None,
124
+ judge_handler: JudgeHandler | None = None,
125
+ ):
126
+ """
127
+ Initialize the orchestrator.
128
+
129
+ Args:
130
+ config: Orchestrator configuration
131
+ search_handler: Injected search handler (for testing)
132
+ judge_handler: Injected judge handler (for testing)
133
+ """
134
+ self.config = config or OrchestratorConfig(
135
+ max_iterations=settings.max_iterations,
136
+ )
137
+
138
+ # Initialize handlers (or use injected ones for testing)
139
+ self.search = search_handler or SearchHandler(
140
+ tools=[PubMedTool(), WebTool()],
141
+ timeout=self.config.search_timeout,
142
+ )
143
+ self.judge = judge_handler or JudgeHandler()
144
+
145
+ async def run(
146
+ self,
147
+ question: str,
148
+ session_id: str = "default",
149
+ ) -> AsyncGenerator[AgentEvent, None]:
150
+ """
151
+ Run the agent loop, yielding events for the UI.
152
+
153
+ This is an async generator that yields AgentEvent objects
154
+ as the agent progresses through its workflow.
155
+
156
+ Args:
157
+ question: The research question to answer
158
+ session_id: Unique session identifier
159
+
160
+ Yields:
161
+ AgentEvent objects describing the agent's progress
162
+ """
163
+ logger.info("Starting orchestrator run", question=question[:100])
164
+
165
+ # Initialize state
166
+ state = SessionState(
167
+ session_id=session_id,
168
+ question=question,
169
+ )
170
+ all_evidence: list[Evidence] = []
171
+ current_queries = [question] # Start with the original question
172
+
173
+ try:
174
+ # Main agent loop
175
+ while state.iterations_completed < self.config.max_iterations:
176
+ state.iterations_completed += 1
177
+ iteration = state.iterations_completed
178
+
179
+ # --- SEARCH PHASE ---
180
+ state.current_state = AgentState.SEARCHING
181
+ yield AgentEvent(
182
+ state=AgentState.SEARCHING,
183
+ message=f"Searching for evidence (iteration {iteration}/{self.config.max_iterations})",
184
+ iteration=iteration,
185
+ data={"queries": current_queries},
186
+ )
187
+
188
+ # Execute searches for all current queries
189
+ for query in current_queries[:3]: # Limit to 3 queries per iteration
190
+ search_result = await self.search.execute(
191
+ query,
192
+ max_results_per_tool=self.config.max_evidence_per_iteration,
193
+ )
194
+ # Add new evidence (avoid duplicates by URL)
195
+ existing_urls = {e.citation.url for e in all_evidence}
196
+ for ev in search_result.evidence:
197
+ if ev.citation.url not in existing_urls:
198
+ all_evidence.append(ev)
199
+ existing_urls.add(ev.citation.url)
200
+
201
+ state.total_evidence = len(all_evidence)
202
+
203
+ yield AgentEvent(
204
+ state=AgentState.SEARCHING,
205
+ message=f"Found {len(all_evidence)} total pieces of evidence",
206
+ iteration=iteration,
207
+ data={"total_evidence": len(all_evidence)},
208
+ )
209
+
210
+ # --- JUDGE PHASE ---
211
+ state.current_state = AgentState.JUDGING
212
+ yield AgentEvent(
213
+ state=AgentState.JUDGING,
214
+ message="Evaluating evidence quality...",
215
+ iteration=iteration,
216
+ )
217
+
218
+ # Check LLM budget
219
+ if state.llm_calls >= self.config.max_llm_calls:
220
+ yield AgentEvent(
221
+ state=AgentState.ERROR,
222
+ message=f"LLM call budget exceeded ({self.config.max_llm_calls} calls)",
223
+ iteration=iteration,
224
+ )
225
+ break
226
+
227
+ assessment = await self.judge.assess(question, all_evidence)
228
+ state.llm_calls += 1
229
+
230
+ yield AgentEvent(
231
+ state=AgentState.JUDGING,
232
+ message=f"Quality: {assessment.overall_quality_score}/10 | "
233
+ f"Sufficient: {assessment.sufficient}",
234
+ iteration=iteration,
235
+ data={
236
+ "sufficient": assessment.sufficient,
237
+ "quality_score": assessment.overall_quality_score,
238
+ "recommendation": assessment.recommendation,
239
+ "candidates": len(assessment.candidates),
240
+ },
241
+ )
242
+
243
+ # --- DECISION POINT ---
244
+ if assessment.sufficient and assessment.recommendation == "synthesize":
245
+ # Ready to synthesize!
246
+ state.current_state = AgentState.SYNTHESIZING
247
+ yield AgentEvent(
248
+ state=AgentState.SYNTHESIZING,
249
+ message="Evidence is sufficient. Generating report...",
250
+ iteration=iteration,
251
+ )
252
+
253
+ # Generate the final report
254
+ report = await self._synthesize_report(
255
+ question, all_evidence, assessment
256
+ )
257
+ state.final_report = report
258
+ state.llm_calls += 1
259
+
260
+ state.current_state = AgentState.COMPLETE
261
+ yield AgentEvent(
262
+ state=AgentState.COMPLETE,
263
+ message="Research complete!",
264
+ iteration=iteration,
265
+ data={
266
+ "total_iterations": iteration,
267
+ "total_evidence": len(all_evidence),
268
+ "llm_calls": state.llm_calls,
269
+ },
270
+ )
271
+
272
+ # Yield the final report as a separate event
273
+ yield AgentEvent(
274
+ state=AgentState.COMPLETE,
275
+ message=report,
276
+ iteration=iteration,
277
+ data={"is_report": True},
278
+ )
279
+ return
280
+
281
+ else:
282
+ # Need more evidence
283
+ current_queries = assessment.next_search_queries
284
+ if not current_queries:
285
+ # No more queries suggested, use gaps as queries
286
+ current_queries = [f"{question} {gap}" for gap in assessment.gaps[:2]]
287
+
288
+ yield AgentEvent(
289
+ state=AgentState.JUDGING,
290
+ message=f"Need more evidence. Next queries: {current_queries[:2]}",
291
+ iteration=iteration,
292
+ data={"next_queries": current_queries},
293
+ )
294
+
295
+ # Loop exhausted without sufficient evidence
296
+ state.current_state = AgentState.COMPLETE
297
+ yield AgentEvent(
298
+ state=AgentState.COMPLETE,
299
+ message=f"Max iterations ({self.config.max_iterations}) reached. "
300
+ "Generating best-effort report...",
301
+ iteration=state.iterations_completed,
302
+ )
303
+
304
+ # Generate best-effort report
305
+ report = await self._synthesize_report(
306
+ question, all_evidence, assessment, best_effort=True
307
+ )
308
+ state.final_report = report
309
+
310
+ yield AgentEvent(
311
+ state=AgentState.COMPLETE,
312
+ message=report,
313
+ iteration=state.iterations_completed,
314
+ data={"is_report": True, "best_effort": True},
315
+ )
316
+
317
+ except DeepCriticalError as e:
318
+ state.current_state = AgentState.ERROR
319
+ state.error = str(e)
320
+ yield AgentEvent(
321
+ state=AgentState.ERROR,
322
+ message=f"Error: {e}",
323
+ iteration=state.iterations_completed,
324
+ )
325
+ logger.error("Orchestrator error", error=str(e))
326
+
327
+ except Exception as e:
328
+ state.current_state = AgentState.ERROR
329
+ state.error = str(e)
330
+ yield AgentEvent(
331
+ state=AgentState.ERROR,
332
+ message=f"Unexpected error: {e}",
333
+ iteration=state.iterations_completed,
334
+ )
335
+ logger.exception("Unexpected orchestrator error")
336
+
337
+ async def _synthesize_report(
338
+ self,
339
+ question: str,
340
+ evidence: list[Evidence],
341
+ assessment: JudgeAssessment,
342
+ best_effort: bool = False,
343
+ ) -> str:
344
+ """
345
+ Synthesize a research report from the evidence.
346
+
347
+ For MVP, we use the Judge's assessment to build a simple report.
348
+ In a full implementation, this would be a separate Report agent.
349
+ """
350
+ # Build citations
351
+ citations = []
352
+ for i, ev in enumerate(evidence, 1):
353
+ citations.append(f"[{i}] {ev.citation.formatted}")
354
+
355
+ # Build drug candidates section
356
+ candidates_text = ""
357
+ if assessment.candidates:
358
+ candidates_text = "\n\n## Drug Candidates\n\n"
359
+ for c in assessment.candidates:
360
+ candidates_text += f"### {c.drug_name}\n"
361
+ candidates_text += f"- **Original Indication**: {c.original_indication}\n"
362
+ candidates_text += f"- **Proposed Use**: {c.proposed_indication}\n"
363
+ candidates_text += f"- **Mechanism**: {c.mechanism}\n"
364
+ candidates_text += f"- **Evidence Strength**: {c.evidence_strength}\n\n"
365
+
366
+ # Build the report
367
+ quality_note = ""
368
+ if best_effort:
369
+ quality_note = "\n\n> ⚠️ **Note**: This report was generated with limited evidence.\n"
370
+
371
+ report = f"""# Drug Repurposing Research Report
372
+
373
+ ## Research Question
374
+ {question}
375
+ {quality_note}
376
+ ## Summary
377
+ {assessment.reasoning}
378
+
379
+ **Quality Score**: {assessment.overall_quality_score}/10
380
+ **Evidence Coverage**: {assessment.coverage_score}/10
381
+ {candidates_text}
382
+ ## Gaps & Limitations
383
+ {chr(10).join(f'- {gap}' for gap in assessment.gaps) if assessment.gaps else '- None identified'}
384
+
385
+ ## References
386
+ {chr(10).join(citations[:10])}
387
+
388
+ ---
389
+ *Generated by DeepCritical Research Agent*
390
+ """
391
+ return report
392
+ ```
393
+
394
+ ---
395
+
396
+ ## 4. Gradio UI (`src/app.py`)
397
+
398
+ ```python
399
+ """Gradio UI for DeepCritical Research Agent."""
400
+ import gradio as gr
401
+ import asyncio
402
+ from typing import AsyncGenerator
403
+ import uuid
404
+
405
+ from src.features.orchestrator.handlers import Orchestrator
406
+ from src.features.orchestrator.models import AgentState, OrchestratorConfig
407
+
408
+
409
+ # Create a shared orchestrator instance
410
+ orchestrator = Orchestrator(
411
+ config=OrchestratorConfig(
412
+ max_iterations=10,
413
+ max_llm_calls=20,
414
+ )
415
+ )
416
+
417
+
418
+ async def research_agent(
419
+ message: str,
420
+ history: list[dict],
421
+ ) -> AsyncGenerator[str, None]:
422
+ """
423
+ Main chat function for Gradio.
424
+
425
+ This is an async generator that yields messages as the agent progresses.
426
+ Gradio 5 supports streaming via generators.
427
+ """
428
+ if not message.strip():
429
+ yield "Please enter a research question."
430
+ return
431
+
432
+ session_id = str(uuid.uuid4())
433
+ accumulated_output = ""
434
+
435
+ async for event in orchestrator.run(message, session_id):
436
+ # Format the event for display
437
+ display = event.to_display()
438
+
439
+ # Check if this is the final report
440
+ if event.data and event.data.get("is_report"):
441
+ # Yield the full report
442
+ accumulated_output += f"\n\n{event.message}"
443
  else:
444
+ accumulated_output += f"\n{display}"
445
+
446
+ yield accumulated_output
447
+
448
+
449
+ def create_app() -> gr.Blocks:
450
+ """Create the Gradio app."""
451
+
452
+ with gr.Blocks(
453
+ title="DeepCritical - Drug Repurposing Research Agent",
454
+ theme=gr.themes.Soft(),
455
+ ) as app:
456
+
457
+ gr.Markdown("""
458
+ # πŸ”¬ DeepCritical Research Agent
459
+
460
+ AI-powered drug repurposing research assistant. Ask questions about potential
461
+ drug repurposing opportunities and get evidence-based answers.
462
+
463
+ **Example questions:**
464
+ - "What existing drugs might help treat long COVID fatigue?"
465
+ - "Can metformin be repurposed for Alzheimer's disease?"
466
+ - "What is the evidence for statins in cancer treatment?"
467
+ """)
468
+
469
+ chatbot = gr.Chatbot(
470
+ label="Research Chat",
471
+ height=500,
472
+ type="messages", # Use the new messages format
473
+ )
474
+
475
+ with gr.Row():
476
+ msg = gr.Textbox(
477
+ label="Your Research Question",
478
+ placeholder="Enter your drug repurposing research question...",
479
+ scale=4,
480
+ )
481
+ submit = gr.Button("πŸ” Research", variant="primary", scale=1)
482
+
483
+ # Clear button
484
+ clear = gr.Button("Clear Chat")
485
+
486
+ # Examples
487
+ gr.Examples(
488
+ examples=[
489
+ "What existing drugs might help treat long COVID fatigue?",
490
+ "Can metformin be repurposed for Alzheimer's disease?",
491
+ "What is the evidence for statins in treating cancer?",
492
+ "Are there any approved drugs that could treat ALS?",
493
+ ],
494
+ inputs=msg,
495
+ )
496
+
497
+ # Wire up the interface
498
+ async def respond(message, chat_history):
499
+ """Handle user message and stream response."""
500
+ chat_history = chat_history or []
501
+ chat_history.append({"role": "user", "content": message})
502
+
503
+ # Stream the response
504
+ response = ""
505
+ async for chunk in research_agent(message, chat_history):
506
+ response = chunk
507
+ yield "", chat_history + [{"role": "assistant", "content": response}]
508
+
509
+ submit.click(
510
+ respond,
511
+ inputs=[msg, chatbot],
512
+ outputs=[msg, chatbot],
513
+ )
514
+ msg.submit(
515
+ respond,
516
+ inputs=[msg, chatbot],
517
+ outputs=[msg, chatbot],
518
+ )
519
+ clear.click(lambda: (None, []), outputs=[msg, chatbot])
520
+
521
+ return app
522
+
523
+
524
+ # Entry point
525
+ app = create_app()
526
+
527
+ if __name__ == "__main__":
528
+ app.launch(
529
+ server_name="0.0.0.0",
530
+ server_port=7860,
531
+ share=False,
532
+ )
533
+ ```
534
+
535
+ ---
536
+
537
+ ## 5. Deployment Configuration
538
+
539
+ ### `Dockerfile`
540
+
541
+ ```dockerfile
542
+ FROM python:3.11-slim
543
+
544
+ WORKDIR /app
545
+
546
+ # Install uv
547
+ RUN pip install uv
548
+
549
+ # Copy project files
550
+ COPY pyproject.toml .
551
+ COPY src/ src/
552
+ COPY .env.example .env
553
+
554
+ # Install dependencies
555
+ RUN uv sync --no-dev
556
+
557
+ # Expose Gradio port
558
+ EXPOSE 7860
559
+
560
+ # Run the app
561
+ CMD ["uv", "run", "python", "src/app.py"]
562
+ ```
563
+
564
+ ### `README.md` (HuggingFace Spaces)
565
+
566
+ This goes in the root of your HuggingFace Space.
567
+
568
+ ```markdown
569
+ ---
570
+ title: DeepCritical
571
+ emoji: πŸ”¬
572
+ colorFrom: blue
573
+ colorTo: purple
574
+ sdk: gradio
575
+ sdk_version: 5.0.0
576
+ app_file: src/app.py
577
+ pinned: false
578
+ license: mit
579
+ ---
580
+
581
+ # DeepCritical - Drug Repurposing Research Agent
582
+
583
+ AI-powered research agent for discovering drug repurposing opportunities.
584
+
585
+ ## Features
586
+ - πŸ” Search PubMed and web sources
587
+ - 🧠 AI-powered evidence assessment
588
+ - πŸ“ Structured research reports
589
+ - πŸ’¬ Interactive chat interface
590
+
591
+ ## Usage
592
+ Enter a research question about drug repurposing, such as:
593
+ - "What existing drugs might help treat long COVID fatigue?"
594
+ - "Can metformin be repurposed for Alzheimer's disease?"
595
+
596
+ The agent will search medical literature, assess evidence quality,
597
+ and generate a research report with citations.
598
+
599
+ ## API Keys
600
+ This space requires an OpenAI API key set as a secret (`OPENAI_API_KEY`).
601
+ ```
602
+
603
+ ### `.env.example` (Updated)
604
+
605
+ ```bash
606
+ # LLM Provider - REQUIRED
607
+ # Choose one:
608
+ OPENAI_API_KEY=sk-your-key-here
609
+ # ANTHROPIC_API_KEY=sk-ant-your-key-here
610
+
611
+ # LLM Settings
612
+ LLM_PROVIDER=openai
613
+ LLM_MODEL=gpt-4o-mini
614
+
615
+ # Agent Configuration
616
+ MAX_ITERATIONS=10
617
+
618
+ # Logging
619
+ LOG_LEVEL=INFO
620
+
621
+ # Optional: NCBI API key for faster PubMed searches
622
+ # NCBI_API_KEY=your-ncbi-key
623
  ```
624
 
625
  ---
626
 
627
+ ## 6. TDD Workflow
628
 
629
+ ### Test File: `tests/unit/features/orchestrator/test_orchestrator.py`
630
 
631
  ```python
632
+ """Unit tests for the Orchestrator."""
633
+ import pytest
634
+ from unittest.mock import AsyncMock, MagicMock
635
+
636
+
637
+ class TestOrchestratorModels:
638
+ """Tests for Orchestrator data models."""
639
+
640
+ def test_agent_event_display(self):
641
+ """AgentEvent.to_display should format correctly."""
642
+ from src.features.orchestrator.models import AgentEvent, AgentState
643
+
644
+ event = AgentEvent(
645
+ state=AgentState.SEARCHING,
646
+ message="Looking for evidence",
647
+ iteration=1,
648
+ )
649
+
650
+ display = event.to_display()
651
+ assert "πŸ”" in display
652
+ assert "SEARCHING" in display
653
+ assert "Looking for evidence" in display
654
+
655
+ def test_orchestrator_config_defaults(self):
656
+ """OrchestratorConfig should have sensible defaults."""
657
+ from src.features.orchestrator.models import OrchestratorConfig
658
+
659
+ config = OrchestratorConfig()
660
+ assert config.max_iterations == 10
661
+ assert config.max_llm_calls == 20
662
+
663
+ def test_orchestrator_config_bounds(self):
664
+ """OrchestratorConfig should enforce bounds."""
665
+ from src.features.orchestrator.models import OrchestratorConfig
666
+ from pydantic import ValidationError
667
+
668
+ with pytest.raises(ValidationError):
669
+ OrchestratorConfig(max_iterations=100) # > 50
670
+
671
+
672
+ class TestOrchestrator:
673
+ """Tests for the Orchestrator."""
674
 
675
+ @pytest.mark.asyncio
676
+ async def test_run_yields_events(self, mocker):
677
+ """Orchestrator.run should yield AgentEvents."""
678
+ from src.features.orchestrator.handlers import Orchestrator
679
+ from src.features.orchestrator.models import (
680
+ AgentEvent,
681
+ AgentState,
682
+ OrchestratorConfig,
683
+ )
684
+ from src.features.search.models import Evidence, Citation, SearchResult
685
+ from src.features.judge.models import JudgeAssessment
686
 
687
+ # Mock search handler
688
+ mock_search = AsyncMock()
689
+ mock_search.execute = AsyncMock(return_value=SearchResult(
690
+ query="test",
691
+ evidence=[
692
+ Evidence(
693
+ content="Test evidence",
694
+ citation=Citation(
695
+ source="pubmed",
696
+ title="Test",
697
+ url="https://example.com",
698
+ date="2024",
699
+ ),
700
+ )
701
+ ],
702
+ sources_searched=["pubmed"],
703
+ total_found=1,
704
+ ))
705
+
706
+ # Mock judge handler - returns sufficient on first call
707
+ mock_judge = AsyncMock()
708
+ mock_judge.assess = AsyncMock(return_value=JudgeAssessment(
709
+ sufficient=True,
710
+ recommendation="synthesize",
711
+ reasoning="Good evidence",
712
+ overall_quality_score=8,
713
+ coverage_score=7,
714
+ ))
715
+
716
+ config = OrchestratorConfig(max_iterations=3)
717
+ orchestrator = Orchestrator(
718
+ config=config,
719
+ search_handler=mock_search,
720
+ judge_handler=mock_judge,
721
+ )
722
+
723
+ events = []
724
+ async for event in orchestrator.run("test question"):
725
+ events.append(event)
726
+
727
+ # Should have multiple events
728
+ assert len(events) >= 3
729
+
730
+ # Check we got expected state transitions
731
+ states = [e.state for e in events]
732
+ assert AgentState.SEARCHING in states
733
+ assert AgentState.JUDGING in states
734
+ assert AgentState.COMPLETE in states
735
+
736
+ @pytest.mark.asyncio
737
+ async def test_run_respects_max_iterations(self, mocker):
738
+ """Orchestrator should stop at max_iterations."""
739
+ from src.features.orchestrator.handlers import Orchestrator
740
+ from src.features.orchestrator.models import OrchestratorConfig
741
+ from src.features.search.models import Evidence, Citation, SearchResult
742
+ from src.features.judge.models import JudgeAssessment
743
+
744
+ # Mock search
745
+ mock_search = AsyncMock()
746
+ mock_search.execute = AsyncMock(return_value=SearchResult(
747
+ query="test",
748
+ evidence=[],
749
+ sources_searched=["pubmed"],
750
+ total_found=0,
751
+ ))
752
+
753
+ # Mock judge - always returns insufficient
754
+ mock_judge = AsyncMock()
755
+ mock_judge.assess = AsyncMock(return_value=JudgeAssessment(
756
+ sufficient=False,
757
+ recommendation="continue",
758
+ reasoning="Need more",
759
+ overall_quality_score=2,
760
+ coverage_score=1,
761
+ next_search_queries=["more stuff"],
762
+ ))
763
+
764
+ config = OrchestratorConfig(max_iterations=2)
765
+ orchestrator = Orchestrator(
766
+ config=config,
767
+ search_handler=mock_search,
768
+ judge_handler=mock_judge,
769
+ )
770
+
771
+ events = []
772
+ async for event in orchestrator.run("test"):
773
+ events.append(event)
774
+
775
+ # Should stop after max_iterations
776
+ max_iteration = max(e.iteration for e in events)
777
+ assert max_iteration <= 2
778
+
779
+ @pytest.mark.asyncio
780
+ async def test_run_handles_search_error(self, mocker):
781
+ """Orchestrator should handle search errors gracefully."""
782
+ from src.features.orchestrator.handlers import Orchestrator
783
+ from src.features.orchestrator.models import AgentState, OrchestratorConfig
784
+ from src.shared.exceptions import SearchError
785
+
786
+ mock_search = AsyncMock()
787
+ mock_search.execute = AsyncMock(side_effect=SearchError("API down"))
788
+
789
+ mock_judge = AsyncMock()
790
+
791
+ orchestrator = Orchestrator(
792
+ config=OrchestratorConfig(max_iterations=1),
793
+ search_handler=mock_search,
794
+ judge_handler=mock_judge,
795
+ )
796
+
797
+ events = []
798
+ async for event in orchestrator.run("test"):
799
+ events.append(event)
800
+
801
+ # Should have an error event
802
+ error_events = [e for e in events if e.state == AgentState.ERROR]
803
+ assert len(error_events) >= 1
804
+
805
+ @pytest.mark.asyncio
806
+ async def test_run_respects_llm_budget(self, mocker):
807
+ """Orchestrator should stop when LLM budget is exceeded."""
808
+ from src.features.orchestrator.handlers import Orchestrator
809
+ from src.features.orchestrator.models import AgentState, OrchestratorConfig
810
+ from src.features.search.models import SearchResult
811
+ from src.features.judge.models import JudgeAssessment
812
+
813
+ mock_search = AsyncMock()
814
+ mock_search.execute = AsyncMock(return_value=SearchResult(
815
+ query="test",
816
+ evidence=[],
817
+ sources_searched=[],
818
+ total_found=0,
819
+ ))
820
+
821
+ # Judge always needs more
822
+ mock_judge = AsyncMock()
823
+ mock_judge.assess = AsyncMock(return_value=JudgeAssessment(
824
+ sufficient=False,
825
+ recommendation="continue",
826
+ reasoning="Need more",
827
+ overall_quality_score=2,
828
+ coverage_score=1,
829
+ next_search_queries=["more"],
830
+ ))
831
+
832
+ config = OrchestratorConfig(
833
+ max_iterations=100, # High
834
+ max_llm_calls=2, # Low - should hit this first
835
+ )
836
+ orchestrator = Orchestrator(
837
+ config=config,
838
+ search_handler=mock_search,
839
+ judge_handler=mock_judge,
840
+ )
841
+
842
+ events = []
843
+ async for event in orchestrator.run("test"):
844
+ events.append(event)
845
+
846
+ # Should have stopped due to budget
847
+ error_events = [e for e in events if "budget" in e.message.lower()]
848
+ assert len(error_events) >= 1
849
  ```
850
 
851
  ---
852
 
853
+ ## 7. Module Exports (`src/features/orchestrator/__init__.py`)
854
 
855
+ ```python
856
+ """Orchestrator feature - main agent loop."""
857
+ from .models import AgentEvent, AgentState, OrchestratorConfig, SessionState
858
+ from .handlers import Orchestrator
859
+
860
+ __all__ = [
861
+ "AgentEvent",
862
+ "AgentState",
863
+ "OrchestratorConfig",
864
+ "SessionState",
865
+ "Orchestrator",
866
+ ]
867
+ ```
868
+
869
+ ---
870
+
871
+ ## 8. Implementation Checklist
872
+
873
+ - [ ] Create `src/features/orchestrator/models.py` with all models
874
+ - [ ] Create `src/features/orchestrator/handlers.py` with `Orchestrator`
875
+ - [ ] Create `src/features/orchestrator/__init__.py` with exports
876
+ - [ ] Create `src/app.py` with Gradio UI
877
+ - [ ] Create `Dockerfile`
878
+ - [ ] Create/update root `README.md` for HuggingFace
879
+ - [ ] Write tests in `tests/unit/features/orchestrator/test_orchestrator.py`
880
+ - [ ] Run `uv run pytest tests/unit/features/orchestrator/ -v` β€” **ALL TESTS MUST PASS**
881
+ - [ ] Run `uv run python src/app.py` locally and test the UI
882
+ - [ ] Commit: `git commit -m "feat: phase 4 orchestrator and UI complete"`
883
+
884
+ ---
885
+
886
+ ## 9. Definition of Done
887
+
888
+ Phase 4 is **COMPLETE** when:
889
+
890
+ 1. βœ… All unit tests pass
891
+ 2. βœ… `uv run python src/app.py` launches Gradio UI locally
892
+ 3. βœ… Can submit a question and see streaming events
893
+ 4. βœ… Agent completes and generates a report
894
+ 5. βœ… Dockerfile builds successfully
895
+ 6. βœ… Can test full flow:
896
 
897
  ```python
898
+ import asyncio
899
+ from src.features.orchestrator.handlers import Orchestrator
900
+
901
+ async def test():
902
+ orchestrator = Orchestrator()
903
+ async for event in orchestrator.run("Can metformin treat Alzheimer's?"):
904
+ print(event.to_display())
905
+
906
+ asyncio.run(test())
907
  ```
908
 
909
+ ---
910
+
911
+ ## 10. Deployment to HuggingFace Spaces
912
+
913
+ ### Option A: Via GitHub (Recommended)
914
+
915
+ 1. Push your code to GitHub
916
+ 2. Create a new Space on HuggingFace
917
+ 3. Connect your GitHub repo
918
+ 4. Add secrets: `OPENAI_API_KEY`
919
+ 5. Deploy!
920
+
921
+ ### Option B: Manual Upload
922
+
923
+ 1. Create a new Gradio Space on HuggingFace
924
+ 2. Upload all files from `src/` and root configs
925
+ 3. Add secrets in Space settings
926
+ 4. Wait for build
927
+
928
+ ### Verify Deployment
929
+
930
+ 1. Visit your Space URL
931
+ 2. Ask: "What drugs could treat long COVID?"
932
+ 3. Verify streaming events appear
933
+ 4. Verify final report is generated
934
 
935
  ---
936
 
937
+ **πŸŽ‰ Congratulations! Phase 4 is the MVP.**
938
+
939
+ After completing Phase 4, you have a working drug repurposing research agent
940
+ that can be demonstrated at the hackathon.
941
 
942
+ **Optional Phase 5**: Improve the report synthesis with a dedicated Report agent.
 
 
 
docs/implementation/roadmap.md CHANGED
@@ -4,6 +4,8 @@
4
 
5
  This roadmap defines the execution strategy to deliver **DeepCritical** effectively. We reject "overplanning" in favor of **ironclad, testable vertical slices**. Each phase delivers a fully functional slice of end-to-end value.
6
 
 
 
7
  ---
8
 
9
  ## πŸ› οΈ The 2025 "Gucci" Tooling Stack
@@ -19,76 +21,212 @@ We are using the bleeding edge of Python engineering to ensure speed, safety, an
19
  | **Test Plugins** | **`pytest-sugar`** | Instant feedback, progress bars. "Gucci" visuals. |
20
  | **Test Plugins** | **`pytest-asyncio`** | Essential for our async agent loop. |
21
  | **Test Plugins** | **`pytest-cov`** | Coverage reporting to ensure TDD adherence. |
 
 
22
  | **Git Hooks** | **`pre-commit`** | Enforce ruff/mypy before commit. |
 
 
23
 
24
  ---
25
 
26
  ## πŸ—οΈ Architecture: Vertical Slices
27
 
28
  Instead of horizontal layers (e.g., "Building the Database Layer"), we build **Vertical Slices**.
29
- Each slice implements a feature from **Entry Point (UI/API) -> Logic -> Data/External**.
 
 
30
 
31
- ### Directory Structure (Feature-First)
32
 
33
  ```
34
- src/
35
- β”œβ”€β”€ app.py # Entry point
36
- β”œβ”€β”€ shared/ # Shared utilities (logging, config, base classes)
37
- β”‚ β”œβ”€β”€ config.py
38
- β”‚ └── observability.py
39
- └── features/ # Vertical Slices
40
- β”œβ”€β”€ search/ # Slice: Executing Searches
41
- β”‚ β”œβ”€β”€ handlers.py
42
- β”‚ β”œβ”€β”€ tools.py
43
- β”‚ └── models.py
44
- β”œβ”€β”€ judge/ # Slice: Assessing Quality
45
- β”‚ β”œβ”€β”€ handlers.py
46
- β”‚ β”œβ”€β”€ prompts.py
47
- β”‚ └── models.py
48
- └── report/ # Slice: Synthesizing Output
49
- β”œβ”€β”€ handlers.py
50
- └── models.py
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
51
  ```
52
 
53
  ---
54
 
55
  ## πŸš€ Phased Execution Plan
56
 
57
- ### **Phase 1: Foundation & Tooling (Day 1)**
 
58
  *Goal: A rock-solid, CI-ready environment with `uv` and `pytest` configured.*
59
- - [ ] Initialize `pyproject.toml` with `uv`.
60
- - [ ] Configure `ruff` (strict) and `mypy` (strict).
61
- - [ ] Set up `pytest` with sugar and coverage.
62
- - [ ] Implement `shared/config.py` (Configuration Slice).
63
- - **Deliverable**: A repo that passes CI with `uv run pytest`.
64
 
65
- ### **Phase 2: The "Search" Vertical Slice (Day 2)**
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
66
  *Goal: Agent can receive a query and get raw results from PubMed/Web.*
67
- - [ ] **TDD**: Write test for `SearchHandler`.
68
- - [ ] Implement `features/search/tools.py` (PubMed + DuckDuckGo).
69
- - [ ] Implement `features/search/handlers.py` (Orchestrates tools).
70
- - **Deliverable**: Function that takes "long covid" -> returns `List[Evidence]`.
71
 
72
- ### **Phase 3: The "Judge" Vertical Slice (Day 3)**
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
73
  *Goal: Agent can decide if evidence is sufficient.*
74
- - [ ] **TDD**: Write test for `JudgeHandler` (Mocked LLM).
75
- - [ ] Implement `features/judge/prompts.py` (Structured outputs).
76
- - [ ] Implement `features/judge/handlers.py` (LLM interaction).
77
- - **Deliverable**: Function that takes `List[Evidence]` -> returns `JudgeAssessment`.
78
 
79
- ### **Phase 4: The "Loop" & UI Slice (Day 4)**
 
 
 
 
 
 
 
 
 
 
 
 
 
 
80
  *Goal: End-to-End User Value.*
81
- - [ ] Implement the `Orchestrator` (Connects Search + Judge loops).
82
- - [ ] Build `features/ui/` (Gradio with Streaming).
83
- - **Deliverable**: Working DeepCritical Agent on HuggingFace.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
84
 
85
  ---
86
 
87
- ## πŸ“œ Spec Documents
88
 
89
- 1. **[Phase 1 Spec: Foundation](01_phase_foundation.md)**
90
- 2. **[Phase 2 Spec: Search Slice](02_phase_search.md)**
91
- 3. **[Phase 3 Spec: Judge Slice](03_phase_judge.md)**
92
- 4. **[Phase 4 Spec: UI & Loop](04_phase_ui.md)**
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
93
 
94
- *Start by reading Phase 1 Spec to initialize the repo.*
 
4
 
5
  This roadmap defines the execution strategy to deliver **DeepCritical** effectively. We reject "overplanning" in favor of **ironclad, testable vertical slices**. Each phase delivers a fully functional slice of end-to-end value.
6
 
7
+ **Total Estimated Effort**: 12-16 hours (can be done in 4 days)
8
+
9
  ---
10
 
11
  ## πŸ› οΈ The 2025 "Gucci" Tooling Stack
 
21
  | **Test Plugins** | **`pytest-sugar`** | Instant feedback, progress bars. "Gucci" visuals. |
22
  | **Test Plugins** | **`pytest-asyncio`** | Essential for our async agent loop. |
23
  | **Test Plugins** | **`pytest-cov`** | Coverage reporting to ensure TDD adherence. |
24
+ | **Test Plugins** | **`pytest-mock`** | Easy mocking with `mocker` fixture. |
25
+ | **HTTP Mocking** | **`respx`** | Mock `httpx` requests in tests. |
26
  | **Git Hooks** | **`pre-commit`** | Enforce ruff/mypy before commit. |
27
+ | **Retry Logic** | **`tenacity`** | Exponential backoff for API calls. |
28
+ | **Logging** | **`structlog`** | Structured JSON logging. |
29
 
30
  ---
31
 
32
  ## πŸ—οΈ Architecture: Vertical Slices
33
 
34
  Instead of horizontal layers (e.g., "Building the Database Layer"), we build **Vertical Slices**.
35
+ Each slice implements a feature from **Entry Point (UI/API) β†’ Logic β†’ Data/External**.
36
+
37
+ ### Directory Structure (Maintainer's Template + Our Code)
38
 
39
+ We use the **existing scaffolding** from the maintainer, filling in the empty files.
40
 
41
  ```
42
+ deepcritical/
43
+ β”œβ”€β”€ pyproject.toml # All config in one file
44
+ β”œβ”€β”€ .env.example # Environment template
45
+ β”œβ”€β”€ .pre-commit-config.yaml # Git hooks
46
+ β”œβ”€β”€ Dockerfile # Container build
47
+ β”œβ”€β”€ README.md # HuggingFace Space config
48
+ β”‚
49
+ β”œβ”€β”€ src/
50
+ β”‚ β”œβ”€β”€ app.py # Gradio entry point
51
+ β”‚ β”œβ”€β”€ orchestrator.py # Main agent loop (Searchβ†’Judgeβ†’Synthesize)
52
+ β”‚ β”‚
53
+ β”‚ β”œβ”€β”€ agent_factory/ # Agent definitions
54
+ β”‚ β”‚ β”œβ”€β”€ __init__.py
55
+ β”‚ β”‚ β”œβ”€β”€ agents.py # (Reserved for future agents)
56
+ β”‚ β”‚ └── judges.py # JudgeHandler - LLM evidence assessment
57
+ β”‚ β”‚
58
+ β”‚ β”œβ”€β”€ tools/ # Search tools
59
+ β”‚ β”‚ β”œβ”€β”€ __init__.py
60
+ β”‚ β”‚ β”œβ”€β”€ pubmed.py # PubMedTool - NCBI E-utilities
61
+ β”‚ β”‚ β”œβ”€β”€ websearch.py # WebTool - DuckDuckGo
62
+ β”‚ β”‚ └── search_handler.py # SearchHandler - orchestrates tools
63
+ β”‚ β”‚
64
+ β”‚ β”œβ”€β”€ prompts/ # Prompt templates
65
+ β”‚ β”‚ β”œβ”€β”€ __init__.py
66
+ β”‚ β”‚ └── judge.py # Judge system/user prompts
67
+ β”‚ β”‚
68
+ β”‚ β”œβ”€β”€ utils/ # Shared utilities
69
+ β”‚ β”‚ β”œβ”€β”€ __init__.py
70
+ β”‚ β”‚ β”œβ”€β”€ config.py # Settings via pydantic-settings
71
+ β”‚ β”‚ β”œβ”€β”€ exceptions.py # Custom exceptions
72
+ β”‚ β”‚ └── models.py # ALL Pydantic models (Evidence, JudgeAssessment, etc.)
73
+ β”‚ β”‚
74
+ β”‚ β”œβ”€β”€ middleware/ # (Empty - reserved)
75
+ β”‚ β”œβ”€β”€ database_services/ # (Empty - reserved)
76
+ β”‚ └── retrieval_factory/ # (Empty - reserved)
77
+ β”‚
78
+ └── tests/
79
+ β”œβ”€β”€ __init__.py
80
+ β”œβ”€β”€ conftest.py # Shared fixtures
81
+ β”‚
82
+ β”œβ”€β”€ unit/ # Fast, mocked tests
83
+ β”‚ β”œβ”€β”€ __init__.py
84
+ β”‚ β”œβ”€β”€ utils/ # Config, models tests
85
+ β”‚ β”œβ”€β”€ tools/ # PubMed, WebSearch tests
86
+ β”‚ └── agent_factory/ # Judge tests
87
+ β”‚
88
+ └── integration/ # Real API tests (optional)
89
+ └── __init__.py
90
  ```
91
 
92
  ---
93
 
94
  ## πŸš€ Phased Execution Plan
95
 
96
+ ### **Phase 1: Foundation & Tooling (~2-3 hours)**
97
+
98
  *Goal: A rock-solid, CI-ready environment with `uv` and `pytest` configured.*
 
 
 
 
 
99
 
100
+ | Task | Output |
101
+ |------|--------|
102
+ | Install uv | `uv --version` works |
103
+ | Create pyproject.toml | All deps + config in one file |
104
+ | Set up directory structure | All `__init__.py` files created |
105
+ | Configure ruff + mypy | Strict settings |
106
+ | Create conftest.py | Shared pytest fixtures |
107
+ | Implement shared/config.py | Settings via pydantic-settings |
108
+ | Write first test | `test_config.py` passes |
109
+
110
+ **Deliverable**: `uv run pytest` passes with green output.
111
+
112
+ πŸ“„ **Spec Document**: [01_phase_foundation.md](01_phase_foundation.md)
113
+
114
+ ---
115
+
116
+ ### **Phase 2: The "Search" Vertical Slice (~3-4 hours)**
117
+
118
  *Goal: Agent can receive a query and get raw results from PubMed/Web.*
 
 
 
 
119
 
120
+ | Task | Output |
121
+ |------|--------|
122
+ | Define Evidence/Citation models | Pydantic models |
123
+ | Implement PubMedTool | ESearch β†’ EFetch β†’ Evidence |
124
+ | Implement WebTool | DuckDuckGo β†’ Evidence |
125
+ | Implement SearchHandler | Parallel search orchestration |
126
+ | Write unit tests | Mocked HTTP responses |
127
+
128
+ **Deliverable**: Function that takes "long covid" β†’ returns `List[Evidence]`.
129
+
130
+ πŸ“„ **Spec Document**: [02_phase_search.md](02_phase_search.md)
131
+
132
+ ---
133
+
134
+ ### **Phase 3: The "Judge" Vertical Slice (~3-4 hours)**
135
+
136
  *Goal: Agent can decide if evidence is sufficient.*
 
 
 
 
137
 
138
+ | Task | Output |
139
+ |------|--------|
140
+ | Define JudgeAssessment model | Structured output schema |
141
+ | Write prompt templates | System + user prompts |
142
+ | Implement JudgeHandler | PydanticAI agent with structured output |
143
+ | Write unit tests | Mocked LLM responses |
144
+
145
+ **Deliverable**: Function that takes `List[Evidence]` β†’ returns `JudgeAssessment`.
146
+
147
+ πŸ“„ **Spec Document**: [03_phase_judge.md](03_phase_judge.md)
148
+
149
+ ---
150
+
151
+ ### **Phase 4: The "Orchestrator" & UI Slice (~4-5 hours)**
152
+
153
  *Goal: End-to-End User Value.*
154
+
155
+ | Task | Output |
156
+ |------|--------|
157
+ | Define AgentEvent/State models | Event streaming types |
158
+ | Implement Orchestrator | Main while loop connecting Search→Judge |
159
+ | Implement report synthesis | Generate markdown report |
160
+ | Build Gradio UI | Streaming chat interface |
161
+ | Create Dockerfile | Container for deployment |
162
+ | Create HuggingFace README | Space configuration |
163
+ | Write unit tests | Mocked handlers |
164
+
165
+ **Deliverable**: Working DeepCritical Agent on localhost:7860.
166
+
167
+ πŸ“„ **Spec Document**: [04_phase_ui.md](04_phase_ui.md)
168
+
169
+ ---
170
+
171
+ ## πŸ“œ Spec Documents Summary
172
+
173
+ | Phase | Document | Focus |
174
+ |-------|----------|-------|
175
+ | 1 | [01_phase_foundation.md](01_phase_foundation.md) | Tooling, config, TDD setup |
176
+ | 2 | [02_phase_search.md](02_phase_search.md) | PubMed + DuckDuckGo search |
177
+ | 3 | [03_phase_judge.md](03_phase_judge.md) | LLM evidence assessment |
178
+ | 4 | [04_phase_ui.md](04_phase_ui.md) | Orchestrator + Gradio + Deploy |
179
 
180
  ---
181
 
182
+ ## ⚑ Quick Start Commands
183
 
184
+ ```bash
185
+ # Phase 1: Setup
186
+ curl -LsSf https://astral.sh/uv/install.sh | sh
187
+ uv init --name deepcritical
188
+ uv sync --all-extras
189
+ uv run pytest
190
+
191
+ # Phase 2-4: Development
192
+ uv run pytest tests/unit/ -v # Run unit tests
193
+ uv run ruff check src tests # Lint
194
+ uv run mypy src # Type check
195
+ uv run python src/app.py # Run Gradio locally
196
+
197
+ # Deployment
198
+ docker build -t deepcritical .
199
+ docker run -p 7860:7860 -e OPENAI_API_KEY=sk-... deepcritical
200
+ ```
201
+
202
+ ---
203
+
204
+ ## 🎯 Definition of Done (MVP)
205
+
206
+ The MVP is **COMPLETE** when:
207
+
208
+ 1. βœ… All unit tests pass (`uv run pytest`)
209
+ 2. βœ… Ruff has no errors (`uv run ruff check`)
210
+ 3. βœ… Mypy has no errors (`uv run mypy src`)
211
+ 4. βœ… Gradio UI runs locally (`uv run python src/app.py`)
212
+ 5. βœ… Can ask "Can metformin treat Alzheimer's?" and get a report
213
+ 6. βœ… Report includes drug candidates, citations, and quality scores
214
+ 7. βœ… Docker builds successfully
215
+ 8. βœ… Deployable to HuggingFace Spaces
216
+
217
+ ---
218
+
219
+ ## πŸ“Š Progress Tracker
220
+
221
+ | Phase | Status | Tests | Notes |
222
+ |-------|--------|-------|-------|
223
+ | 1: Foundation | ⬜ Pending | 0/5 | Start here |
224
+ | 2: Search | ⬜ Pending | 0/6 | Depends on Phase 1 |
225
+ | 3: Judge | ⬜ Pending | 0/5 | Depends on Phase 2 |
226
+ | 4: Orchestrator | ⬜ Pending | 0/4 | Depends on Phase 3 |
227
+
228
+ Update this table as you complete each phase!
229
+
230
+ ---
231
 
232
+ *Start by reading [Phase 1 Spec](01_phase_foundation.md) to initialize the repo.*
docs/index.md CHANGED
@@ -12,6 +12,13 @@ AI-powered deep research system for accelerating drug repurposing discovery.
12
  - **[Overview](architecture/overview.md)** - Project overview, use case, architecture, timeline
13
  - **[Design Patterns](architecture/design-patterns.md)** - 17 technical patterns, reference repos, judge prompts, data models
14
 
 
 
 
 
 
 
 
15
  ### Guides
16
  - [Setup Guide](guides/setup.md) (coming soon)
17
  - **[Deployment Guide](guides/deployment.md)** - Gradio, MCP, and Modal launch steps
 
12
  - **[Overview](architecture/overview.md)** - Project overview, use case, architecture, timeline
13
  - **[Design Patterns](architecture/design-patterns.md)** - 17 technical patterns, reference repos, judge prompts, data models
14
 
15
+ ### Implementation (Start Here!)
16
+ - **[Roadmap](implementation/roadmap.md)** - Phased execution plan with TDD
17
+ - **[Phase 1: Foundation](implementation/01_phase_foundation.md)** - Tooling, config, first tests
18
+ - **[Phase 2: Search](implementation/02_phase_search.md)** - PubMed + DuckDuckGo
19
+ - **[Phase 3: Judge](implementation/03_phase_judge.md)** - LLM evidence assessment
20
+ - **[Phase 4: UI](implementation/04_phase_ui.md)** - Orchestrator + Gradio + Deploy
21
+
22
  ### Guides
23
  - [Setup Guide](guides/setup.md) (coming soon)
24
  - **[Deployment Guide](guides/deployment.md)** - Gradio, MCP, and Modal launch steps