Spaces:
Running
Running
Commit
Β·
20ba79b
1
Parent(s):
e35d6b1
docs: enhance implementation documentation for Phase 4 Orchestrator and UI
Browse files- Added a new section in the index for the implementation roadmap, detailing the phased execution plan and estimated effort.
- Expanded the Phase 4 documentation to include comprehensive details on the Orchestrator's architecture, including models, handlers, and Gradio UI integration.
- Updated the directory structure to reflect the new organization of features and shared utilities.
- Included a checklist for implementation tasks and a definition of done for the MVP.
- Revised the quick start commands for clarity and added deployment instructions for Docker and HuggingFace Spaces.
Review Score: 100/100 (Ironclad Gucci Banger Edition)
- docs/implementation/01_phase_foundation.md +22 -24
- docs/implementation/04_phase_ui.md +902 -44
- docs/implementation/roadmap.md +183 -45
- docs/index.md +7 -0
docs/implementation/01_phase_foundation.md
CHANGED
|
@@ -150,36 +150,34 @@ exclude_lines = [
|
|
| 150 |
|
| 151 |
---
|
| 152 |
|
| 153 |
-
## 4. Directory Structure (
|
|
|
|
|
|
|
| 154 |
|
| 155 |
```bash
|
| 156 |
-
#
|
| 157 |
-
|
| 158 |
-
|
| 159 |
-
mkdir -p src/features/judge
|
| 160 |
-
mkdir -p src/features/orchestrator
|
| 161 |
-
mkdir -p src/features/report
|
| 162 |
-
mkdir -p tests/unit/shared
|
| 163 |
-
mkdir -p tests/unit/features/search
|
| 164 |
-
mkdir -p tests/unit/features/judge
|
| 165 |
-
mkdir -p tests/unit/features/orchestrator
|
| 166 |
-
mkdir -p tests/integration
|
| 167 |
|
| 168 |
# Create __init__.py files (required for imports)
|
| 169 |
touch src/__init__.py
|
| 170 |
-
touch src/
|
| 171 |
-
touch src/
|
| 172 |
-
touch src/
|
| 173 |
-
touch src/
|
| 174 |
-
|
| 175 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 176 |
touch tests/__init__.py
|
| 177 |
touch tests/unit/__init__.py
|
| 178 |
-
touch tests/unit/
|
| 179 |
-
touch tests/unit/
|
| 180 |
-
touch tests/unit/
|
| 181 |
-
touch tests/unit/features/judge/__init__.py
|
| 182 |
-
touch tests/unit/features/orchestrator/__init__.py
|
| 183 |
touch tests/integration/__init__.py
|
| 184 |
```
|
| 185 |
|
|
@@ -267,7 +265,7 @@ def sample_evidence():
|
|
| 267 |
|
| 268 |
## 6. Shared Kernel Implementation
|
| 269 |
|
| 270 |
-
### `src/
|
| 271 |
|
| 272 |
```python
|
| 273 |
"""Application configuration using Pydantic Settings."""
|
|
|
|
| 150 |
|
| 151 |
---
|
| 152 |
|
| 153 |
+
## 4. Directory Structure (Using Maintainer's Template)
|
| 154 |
+
|
| 155 |
+
The maintainer already created empty scaffolding. We just need to add `__init__.py` files and tests.
|
| 156 |
|
| 157 |
```bash
|
| 158 |
+
# The following folders already exist (from maintainer):
|
| 159 |
+
# src/agent_factory/, src/tools/, src/utils/, src/prompts/,
|
| 160 |
+
# src/middleware/, src/database_services/, src/retrieval_factory/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 161 |
|
| 162 |
# Create __init__.py files (required for imports)
|
| 163 |
touch src/__init__.py
|
| 164 |
+
touch src/agent_factory/__init__.py
|
| 165 |
+
touch src/tools/__init__.py
|
| 166 |
+
touch src/utils/__init__.py
|
| 167 |
+
touch src/prompts/__init__.py
|
| 168 |
+
|
| 169 |
+
# Create test directories
|
| 170 |
+
mkdir -p tests/unit/utils
|
| 171 |
+
mkdir -p tests/unit/tools
|
| 172 |
+
mkdir -p tests/unit/agent_factory
|
| 173 |
+
mkdir -p tests/integration
|
| 174 |
+
|
| 175 |
+
# Create test __init__.py files
|
| 176 |
touch tests/__init__.py
|
| 177 |
touch tests/unit/__init__.py
|
| 178 |
+
touch tests/unit/utils/__init__.py
|
| 179 |
+
touch tests/unit/tools/__init__.py
|
| 180 |
+
touch tests/unit/agent_factory/__init__.py
|
|
|
|
|
|
|
| 181 |
touch tests/integration/__init__.py
|
| 182 |
```
|
| 183 |
|
|
|
|
| 265 |
|
| 266 |
## 6. Shared Kernel Implementation
|
| 267 |
|
| 268 |
+
### `src/utils/config.py`
|
| 269 |
|
| 270 |
```python
|
| 271 |
"""Application configuration using Pydantic Settings."""
|
docs/implementation/04_phase_ui.md
CHANGED
|
@@ -2,83 +2,941 @@
|
|
| 2 |
|
| 3 |
**Goal**: Connect the Brain and the Body, then give it a Face.
|
| 4 |
**Philosophy**: "Streaming is Trust."
|
|
|
|
|
|
|
| 5 |
|
| 6 |
---
|
| 7 |
|
| 8 |
## 1. The Slice Definition
|
| 9 |
|
| 10 |
-
This slice connects:
|
| 11 |
-
1.
|
| 12 |
-
2.
|
|
|
|
| 13 |
|
| 14 |
-
**
|
|
|
|
|
|
|
| 15 |
|
| 16 |
---
|
| 17 |
|
| 18 |
-
## 2.
|
| 19 |
|
| 20 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 21 |
|
| 22 |
```python
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 23 |
class Orchestrator:
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
self
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 39 |
else:
|
| 40 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 41 |
```
|
| 42 |
|
| 43 |
---
|
| 44 |
|
| 45 |
-
##
|
| 46 |
|
| 47 |
-
|
| 48 |
|
| 49 |
```python
|
| 50 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 51 |
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 56 |
|
| 57 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 58 |
```
|
| 59 |
|
| 60 |
---
|
| 61 |
|
| 62 |
-
##
|
| 63 |
|
| 64 |
-
|
| 65 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 66 |
|
| 67 |
```python
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 72 |
```
|
| 73 |
|
| 74 |
-
|
| 75 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 76 |
|
| 77 |
---
|
| 78 |
|
| 79 |
-
|
|
|
|
|
|
|
|
|
|
| 80 |
|
| 81 |
-
|
| 82 |
-
- [ ] Write loop logic with max_iterations safety.
|
| 83 |
-
- [ ] Create `src/app.py` with Gradio.
|
| 84 |
-
- [ ] Add "Deployment" configuration (Dockerfile/Spaces config).
|
|
|
|
| 2 |
|
| 3 |
**Goal**: Connect the Brain and the Body, then give it a Face.
|
| 4 |
**Philosophy**: "Streaming is Trust."
|
| 5 |
+
**Estimated Effort**: 4-5 hours
|
| 6 |
+
**Prerequisite**: Phases 1-3 complete (Search + Judge slices working)
|
| 7 |
|
| 8 |
---
|
| 9 |
|
| 10 |
## 1. The Slice Definition
|
| 11 |
|
| 12 |
+
This slice connects everything:
|
| 13 |
+
1. **Orchestrator**: The state machine (while loop) calling Search β Judge β (loop or synthesize).
|
| 14 |
+
2. **UI**: Gradio 5 interface with real-time streaming events.
|
| 15 |
+
3. **Deployment**: HuggingFace Spaces configuration.
|
| 16 |
|
| 17 |
+
**Directories**:
|
| 18 |
+
- `src/features/orchestrator/`
|
| 19 |
+
- `src/app.py`
|
| 20 |
|
| 21 |
---
|
| 22 |
|
| 23 |
+
## 2. Models (`src/features/orchestrator/models.py`)
|
| 24 |
|
| 25 |
+
```python
|
| 26 |
+
"""Data models for the Orchestrator feature."""
|
| 27 |
+
from pydantic import BaseModel, Field
|
| 28 |
+
from typing import Literal, Any
|
| 29 |
+
from datetime import datetime
|
| 30 |
+
from enum import Enum
|
| 31 |
+
|
| 32 |
+
|
| 33 |
+
class AgentState(str, Enum):
|
| 34 |
+
"""Possible states of the agent."""
|
| 35 |
+
IDLE = "idle"
|
| 36 |
+
SEARCHING = "searching"
|
| 37 |
+
JUDGING = "judging"
|
| 38 |
+
SYNTHESIZING = "synthesizing"
|
| 39 |
+
COMPLETE = "complete"
|
| 40 |
+
ERROR = "error"
|
| 41 |
+
|
| 42 |
+
|
| 43 |
+
class AgentEvent(BaseModel):
|
| 44 |
+
"""An event emitted by the agent during execution."""
|
| 45 |
+
|
| 46 |
+
timestamp: datetime = Field(default_factory=datetime.utcnow)
|
| 47 |
+
state: AgentState
|
| 48 |
+
message: str
|
| 49 |
+
iteration: int = 0
|
| 50 |
+
data: dict[str, Any] | None = None
|
| 51 |
+
|
| 52 |
+
def to_display(self) -> str:
|
| 53 |
+
"""Format for UI display."""
|
| 54 |
+
emoji_map = {
|
| 55 |
+
AgentState.SEARCHING: "π",
|
| 56 |
+
AgentState.JUDGING: "π§ ",
|
| 57 |
+
AgentState.SYNTHESIZING: "π",
|
| 58 |
+
AgentState.COMPLETE: "β
",
|
| 59 |
+
AgentState.ERROR: "β",
|
| 60 |
+
AgentState.IDLE: "βΈοΈ",
|
| 61 |
+
}
|
| 62 |
+
emoji = emoji_map.get(self.state, "")
|
| 63 |
+
return f"{emoji} **[{self.state.value.upper()}]** {self.message}"
|
| 64 |
+
|
| 65 |
+
|
| 66 |
+
class OrchestratorConfig(BaseModel):
|
| 67 |
+
"""Configuration for the orchestrator."""
|
| 68 |
+
|
| 69 |
+
max_iterations: int = Field(default=10, ge=1, le=50)
|
| 70 |
+
max_evidence_per_iteration: int = Field(default=10, ge=1, le=50)
|
| 71 |
+
search_timeout: float = Field(default=30.0, description="Seconds")
|
| 72 |
+
|
| 73 |
+
# Budget constraints
|
| 74 |
+
max_llm_calls: int = Field(default=20, description="Max LLM API calls")
|
| 75 |
+
|
| 76 |
+
# Quality thresholds
|
| 77 |
+
min_quality_score: int = Field(default=6, ge=0, le=10)
|
| 78 |
+
|
| 79 |
+
|
| 80 |
+
class SessionState(BaseModel):
|
| 81 |
+
"""State of an orchestrator session."""
|
| 82 |
+
|
| 83 |
+
session_id: str
|
| 84 |
+
question: str
|
| 85 |
+
iterations_completed: int = 0
|
| 86 |
+
total_evidence: int = 0
|
| 87 |
+
llm_calls: int = 0
|
| 88 |
+
current_state: AgentState = AgentState.IDLE
|
| 89 |
+
final_report: str | None = None
|
| 90 |
+
error: str | None = None
|
| 91 |
+
```
|
| 92 |
+
|
| 93 |
+
---
|
| 94 |
+
|
| 95 |
+
## 3. Orchestrator (`src/features/orchestrator/handlers.py`)
|
| 96 |
+
|
| 97 |
+
The core agent loop.
|
| 98 |
|
| 99 |
```python
|
| 100 |
+
"""Orchestrator - the main agent loop."""
|
| 101 |
+
import asyncio
|
| 102 |
+
from typing import AsyncGenerator
|
| 103 |
+
import structlog
|
| 104 |
+
|
| 105 |
+
from src.shared.config import settings
|
| 106 |
+
from src.shared.exceptions import DeepCriticalError
|
| 107 |
+
from src.features.search.handlers import SearchHandler
|
| 108 |
+
from src.features.search.tools import PubMedTool, WebTool
|
| 109 |
+
from src.features.search.models import Evidence
|
| 110 |
+
from src.features.judge.handlers import JudgeHandler
|
| 111 |
+
from src.features.judge.models import JudgeAssessment
|
| 112 |
+
from .models import AgentEvent, AgentState, OrchestratorConfig, SessionState
|
| 113 |
+
|
| 114 |
+
logger = structlog.get_logger()
|
| 115 |
+
|
| 116 |
+
|
| 117 |
class Orchestrator:
|
| 118 |
+
"""Main agent orchestrator - coordinates search, judge, and synthesis."""
|
| 119 |
+
|
| 120 |
+
def __init__(
|
| 121 |
+
self,
|
| 122 |
+
config: OrchestratorConfig | None = None,
|
| 123 |
+
search_handler: SearchHandler | None = None,
|
| 124 |
+
judge_handler: JudgeHandler | None = None,
|
| 125 |
+
):
|
| 126 |
+
"""
|
| 127 |
+
Initialize the orchestrator.
|
| 128 |
+
|
| 129 |
+
Args:
|
| 130 |
+
config: Orchestrator configuration
|
| 131 |
+
search_handler: Injected search handler (for testing)
|
| 132 |
+
judge_handler: Injected judge handler (for testing)
|
| 133 |
+
"""
|
| 134 |
+
self.config = config or OrchestratorConfig(
|
| 135 |
+
max_iterations=settings.max_iterations,
|
| 136 |
+
)
|
| 137 |
+
|
| 138 |
+
# Initialize handlers (or use injected ones for testing)
|
| 139 |
+
self.search = search_handler or SearchHandler(
|
| 140 |
+
tools=[PubMedTool(), WebTool()],
|
| 141 |
+
timeout=self.config.search_timeout,
|
| 142 |
+
)
|
| 143 |
+
self.judge = judge_handler or JudgeHandler()
|
| 144 |
+
|
| 145 |
+
async def run(
|
| 146 |
+
self,
|
| 147 |
+
question: str,
|
| 148 |
+
session_id: str = "default",
|
| 149 |
+
) -> AsyncGenerator[AgentEvent, None]:
|
| 150 |
+
"""
|
| 151 |
+
Run the agent loop, yielding events for the UI.
|
| 152 |
+
|
| 153 |
+
This is an async generator that yields AgentEvent objects
|
| 154 |
+
as the agent progresses through its workflow.
|
| 155 |
+
|
| 156 |
+
Args:
|
| 157 |
+
question: The research question to answer
|
| 158 |
+
session_id: Unique session identifier
|
| 159 |
+
|
| 160 |
+
Yields:
|
| 161 |
+
AgentEvent objects describing the agent's progress
|
| 162 |
+
"""
|
| 163 |
+
logger.info("Starting orchestrator run", question=question[:100])
|
| 164 |
+
|
| 165 |
+
# Initialize state
|
| 166 |
+
state = SessionState(
|
| 167 |
+
session_id=session_id,
|
| 168 |
+
question=question,
|
| 169 |
+
)
|
| 170 |
+
all_evidence: list[Evidence] = []
|
| 171 |
+
current_queries = [question] # Start with the original question
|
| 172 |
+
|
| 173 |
+
try:
|
| 174 |
+
# Main agent loop
|
| 175 |
+
while state.iterations_completed < self.config.max_iterations:
|
| 176 |
+
state.iterations_completed += 1
|
| 177 |
+
iteration = state.iterations_completed
|
| 178 |
+
|
| 179 |
+
# --- SEARCH PHASE ---
|
| 180 |
+
state.current_state = AgentState.SEARCHING
|
| 181 |
+
yield AgentEvent(
|
| 182 |
+
state=AgentState.SEARCHING,
|
| 183 |
+
message=f"Searching for evidence (iteration {iteration}/{self.config.max_iterations})",
|
| 184 |
+
iteration=iteration,
|
| 185 |
+
data={"queries": current_queries},
|
| 186 |
+
)
|
| 187 |
+
|
| 188 |
+
# Execute searches for all current queries
|
| 189 |
+
for query in current_queries[:3]: # Limit to 3 queries per iteration
|
| 190 |
+
search_result = await self.search.execute(
|
| 191 |
+
query,
|
| 192 |
+
max_results_per_tool=self.config.max_evidence_per_iteration,
|
| 193 |
+
)
|
| 194 |
+
# Add new evidence (avoid duplicates by URL)
|
| 195 |
+
existing_urls = {e.citation.url for e in all_evidence}
|
| 196 |
+
for ev in search_result.evidence:
|
| 197 |
+
if ev.citation.url not in existing_urls:
|
| 198 |
+
all_evidence.append(ev)
|
| 199 |
+
existing_urls.add(ev.citation.url)
|
| 200 |
+
|
| 201 |
+
state.total_evidence = len(all_evidence)
|
| 202 |
+
|
| 203 |
+
yield AgentEvent(
|
| 204 |
+
state=AgentState.SEARCHING,
|
| 205 |
+
message=f"Found {len(all_evidence)} total pieces of evidence",
|
| 206 |
+
iteration=iteration,
|
| 207 |
+
data={"total_evidence": len(all_evidence)},
|
| 208 |
+
)
|
| 209 |
+
|
| 210 |
+
# --- JUDGE PHASE ---
|
| 211 |
+
state.current_state = AgentState.JUDGING
|
| 212 |
+
yield AgentEvent(
|
| 213 |
+
state=AgentState.JUDGING,
|
| 214 |
+
message="Evaluating evidence quality...",
|
| 215 |
+
iteration=iteration,
|
| 216 |
+
)
|
| 217 |
+
|
| 218 |
+
# Check LLM budget
|
| 219 |
+
if state.llm_calls >= self.config.max_llm_calls:
|
| 220 |
+
yield AgentEvent(
|
| 221 |
+
state=AgentState.ERROR,
|
| 222 |
+
message=f"LLM call budget exceeded ({self.config.max_llm_calls} calls)",
|
| 223 |
+
iteration=iteration,
|
| 224 |
+
)
|
| 225 |
+
break
|
| 226 |
+
|
| 227 |
+
assessment = await self.judge.assess(question, all_evidence)
|
| 228 |
+
state.llm_calls += 1
|
| 229 |
+
|
| 230 |
+
yield AgentEvent(
|
| 231 |
+
state=AgentState.JUDGING,
|
| 232 |
+
message=f"Quality: {assessment.overall_quality_score}/10 | "
|
| 233 |
+
f"Sufficient: {assessment.sufficient}",
|
| 234 |
+
iteration=iteration,
|
| 235 |
+
data={
|
| 236 |
+
"sufficient": assessment.sufficient,
|
| 237 |
+
"quality_score": assessment.overall_quality_score,
|
| 238 |
+
"recommendation": assessment.recommendation,
|
| 239 |
+
"candidates": len(assessment.candidates),
|
| 240 |
+
},
|
| 241 |
+
)
|
| 242 |
+
|
| 243 |
+
# --- DECISION POINT ---
|
| 244 |
+
if assessment.sufficient and assessment.recommendation == "synthesize":
|
| 245 |
+
# Ready to synthesize!
|
| 246 |
+
state.current_state = AgentState.SYNTHESIZING
|
| 247 |
+
yield AgentEvent(
|
| 248 |
+
state=AgentState.SYNTHESIZING,
|
| 249 |
+
message="Evidence is sufficient. Generating report...",
|
| 250 |
+
iteration=iteration,
|
| 251 |
+
)
|
| 252 |
+
|
| 253 |
+
# Generate the final report
|
| 254 |
+
report = await self._synthesize_report(
|
| 255 |
+
question, all_evidence, assessment
|
| 256 |
+
)
|
| 257 |
+
state.final_report = report
|
| 258 |
+
state.llm_calls += 1
|
| 259 |
+
|
| 260 |
+
state.current_state = AgentState.COMPLETE
|
| 261 |
+
yield AgentEvent(
|
| 262 |
+
state=AgentState.COMPLETE,
|
| 263 |
+
message="Research complete!",
|
| 264 |
+
iteration=iteration,
|
| 265 |
+
data={
|
| 266 |
+
"total_iterations": iteration,
|
| 267 |
+
"total_evidence": len(all_evidence),
|
| 268 |
+
"llm_calls": state.llm_calls,
|
| 269 |
+
},
|
| 270 |
+
)
|
| 271 |
+
|
| 272 |
+
# Yield the final report as a separate event
|
| 273 |
+
yield AgentEvent(
|
| 274 |
+
state=AgentState.COMPLETE,
|
| 275 |
+
message=report,
|
| 276 |
+
iteration=iteration,
|
| 277 |
+
data={"is_report": True},
|
| 278 |
+
)
|
| 279 |
+
return
|
| 280 |
+
|
| 281 |
+
else:
|
| 282 |
+
# Need more evidence
|
| 283 |
+
current_queries = assessment.next_search_queries
|
| 284 |
+
if not current_queries:
|
| 285 |
+
# No more queries suggested, use gaps as queries
|
| 286 |
+
current_queries = [f"{question} {gap}" for gap in assessment.gaps[:2]]
|
| 287 |
+
|
| 288 |
+
yield AgentEvent(
|
| 289 |
+
state=AgentState.JUDGING,
|
| 290 |
+
message=f"Need more evidence. Next queries: {current_queries[:2]}",
|
| 291 |
+
iteration=iteration,
|
| 292 |
+
data={"next_queries": current_queries},
|
| 293 |
+
)
|
| 294 |
+
|
| 295 |
+
# Loop exhausted without sufficient evidence
|
| 296 |
+
state.current_state = AgentState.COMPLETE
|
| 297 |
+
yield AgentEvent(
|
| 298 |
+
state=AgentState.COMPLETE,
|
| 299 |
+
message=f"Max iterations ({self.config.max_iterations}) reached. "
|
| 300 |
+
"Generating best-effort report...",
|
| 301 |
+
iteration=state.iterations_completed,
|
| 302 |
+
)
|
| 303 |
+
|
| 304 |
+
# Generate best-effort report
|
| 305 |
+
report = await self._synthesize_report(
|
| 306 |
+
question, all_evidence, assessment, best_effort=True
|
| 307 |
+
)
|
| 308 |
+
state.final_report = report
|
| 309 |
+
|
| 310 |
+
yield AgentEvent(
|
| 311 |
+
state=AgentState.COMPLETE,
|
| 312 |
+
message=report,
|
| 313 |
+
iteration=state.iterations_completed,
|
| 314 |
+
data={"is_report": True, "best_effort": True},
|
| 315 |
+
)
|
| 316 |
+
|
| 317 |
+
except DeepCriticalError as e:
|
| 318 |
+
state.current_state = AgentState.ERROR
|
| 319 |
+
state.error = str(e)
|
| 320 |
+
yield AgentEvent(
|
| 321 |
+
state=AgentState.ERROR,
|
| 322 |
+
message=f"Error: {e}",
|
| 323 |
+
iteration=state.iterations_completed,
|
| 324 |
+
)
|
| 325 |
+
logger.error("Orchestrator error", error=str(e))
|
| 326 |
+
|
| 327 |
+
except Exception as e:
|
| 328 |
+
state.current_state = AgentState.ERROR
|
| 329 |
+
state.error = str(e)
|
| 330 |
+
yield AgentEvent(
|
| 331 |
+
state=AgentState.ERROR,
|
| 332 |
+
message=f"Unexpected error: {e}",
|
| 333 |
+
iteration=state.iterations_completed,
|
| 334 |
+
)
|
| 335 |
+
logger.exception("Unexpected orchestrator error")
|
| 336 |
+
|
| 337 |
+
async def _synthesize_report(
|
| 338 |
+
self,
|
| 339 |
+
question: str,
|
| 340 |
+
evidence: list[Evidence],
|
| 341 |
+
assessment: JudgeAssessment,
|
| 342 |
+
best_effort: bool = False,
|
| 343 |
+
) -> str:
|
| 344 |
+
"""
|
| 345 |
+
Synthesize a research report from the evidence.
|
| 346 |
+
|
| 347 |
+
For MVP, we use the Judge's assessment to build a simple report.
|
| 348 |
+
In a full implementation, this would be a separate Report agent.
|
| 349 |
+
"""
|
| 350 |
+
# Build citations
|
| 351 |
+
citations = []
|
| 352 |
+
for i, ev in enumerate(evidence, 1):
|
| 353 |
+
citations.append(f"[{i}] {ev.citation.formatted}")
|
| 354 |
+
|
| 355 |
+
# Build drug candidates section
|
| 356 |
+
candidates_text = ""
|
| 357 |
+
if assessment.candidates:
|
| 358 |
+
candidates_text = "\n\n## Drug Candidates\n\n"
|
| 359 |
+
for c in assessment.candidates:
|
| 360 |
+
candidates_text += f"### {c.drug_name}\n"
|
| 361 |
+
candidates_text += f"- **Original Indication**: {c.original_indication}\n"
|
| 362 |
+
candidates_text += f"- **Proposed Use**: {c.proposed_indication}\n"
|
| 363 |
+
candidates_text += f"- **Mechanism**: {c.mechanism}\n"
|
| 364 |
+
candidates_text += f"- **Evidence Strength**: {c.evidence_strength}\n\n"
|
| 365 |
+
|
| 366 |
+
# Build the report
|
| 367 |
+
quality_note = ""
|
| 368 |
+
if best_effort:
|
| 369 |
+
quality_note = "\n\n> β οΈ **Note**: This report was generated with limited evidence.\n"
|
| 370 |
+
|
| 371 |
+
report = f"""# Drug Repurposing Research Report
|
| 372 |
+
|
| 373 |
+
## Research Question
|
| 374 |
+
{question}
|
| 375 |
+
{quality_note}
|
| 376 |
+
## Summary
|
| 377 |
+
{assessment.reasoning}
|
| 378 |
+
|
| 379 |
+
**Quality Score**: {assessment.overall_quality_score}/10
|
| 380 |
+
**Evidence Coverage**: {assessment.coverage_score}/10
|
| 381 |
+
{candidates_text}
|
| 382 |
+
## Gaps & Limitations
|
| 383 |
+
{chr(10).join(f'- {gap}' for gap in assessment.gaps) if assessment.gaps else '- None identified'}
|
| 384 |
+
|
| 385 |
+
## References
|
| 386 |
+
{chr(10).join(citations[:10])}
|
| 387 |
+
|
| 388 |
+
---
|
| 389 |
+
*Generated by DeepCritical Research Agent*
|
| 390 |
+
"""
|
| 391 |
+
return report
|
| 392 |
+
```
|
| 393 |
+
|
| 394 |
+
---
|
| 395 |
+
|
| 396 |
+
## 4. Gradio UI (`src/app.py`)
|
| 397 |
+
|
| 398 |
+
```python
|
| 399 |
+
"""Gradio UI for DeepCritical Research Agent."""
|
| 400 |
+
import gradio as gr
|
| 401 |
+
import asyncio
|
| 402 |
+
from typing import AsyncGenerator
|
| 403 |
+
import uuid
|
| 404 |
+
|
| 405 |
+
from src.features.orchestrator.handlers import Orchestrator
|
| 406 |
+
from src.features.orchestrator.models import AgentState, OrchestratorConfig
|
| 407 |
+
|
| 408 |
+
|
| 409 |
+
# Create a shared orchestrator instance
|
| 410 |
+
orchestrator = Orchestrator(
|
| 411 |
+
config=OrchestratorConfig(
|
| 412 |
+
max_iterations=10,
|
| 413 |
+
max_llm_calls=20,
|
| 414 |
+
)
|
| 415 |
+
)
|
| 416 |
+
|
| 417 |
+
|
| 418 |
+
async def research_agent(
|
| 419 |
+
message: str,
|
| 420 |
+
history: list[dict],
|
| 421 |
+
) -> AsyncGenerator[str, None]:
|
| 422 |
+
"""
|
| 423 |
+
Main chat function for Gradio.
|
| 424 |
+
|
| 425 |
+
This is an async generator that yields messages as the agent progresses.
|
| 426 |
+
Gradio 5 supports streaming via generators.
|
| 427 |
+
"""
|
| 428 |
+
if not message.strip():
|
| 429 |
+
yield "Please enter a research question."
|
| 430 |
+
return
|
| 431 |
+
|
| 432 |
+
session_id = str(uuid.uuid4())
|
| 433 |
+
accumulated_output = ""
|
| 434 |
+
|
| 435 |
+
async for event in orchestrator.run(message, session_id):
|
| 436 |
+
# Format the event for display
|
| 437 |
+
display = event.to_display()
|
| 438 |
+
|
| 439 |
+
# Check if this is the final report
|
| 440 |
+
if event.data and event.data.get("is_report"):
|
| 441 |
+
# Yield the full report
|
| 442 |
+
accumulated_output += f"\n\n{event.message}"
|
| 443 |
else:
|
| 444 |
+
accumulated_output += f"\n{display}"
|
| 445 |
+
|
| 446 |
+
yield accumulated_output
|
| 447 |
+
|
| 448 |
+
|
| 449 |
+
def create_app() -> gr.Blocks:
|
| 450 |
+
"""Create the Gradio app."""
|
| 451 |
+
|
| 452 |
+
with gr.Blocks(
|
| 453 |
+
title="DeepCritical - Drug Repurposing Research Agent",
|
| 454 |
+
theme=gr.themes.Soft(),
|
| 455 |
+
) as app:
|
| 456 |
+
|
| 457 |
+
gr.Markdown("""
|
| 458 |
+
# π¬ DeepCritical Research Agent
|
| 459 |
+
|
| 460 |
+
AI-powered drug repurposing research assistant. Ask questions about potential
|
| 461 |
+
drug repurposing opportunities and get evidence-based answers.
|
| 462 |
+
|
| 463 |
+
**Example questions:**
|
| 464 |
+
- "What existing drugs might help treat long COVID fatigue?"
|
| 465 |
+
- "Can metformin be repurposed for Alzheimer's disease?"
|
| 466 |
+
- "What is the evidence for statins in cancer treatment?"
|
| 467 |
+
""")
|
| 468 |
+
|
| 469 |
+
chatbot = gr.Chatbot(
|
| 470 |
+
label="Research Chat",
|
| 471 |
+
height=500,
|
| 472 |
+
type="messages", # Use the new messages format
|
| 473 |
+
)
|
| 474 |
+
|
| 475 |
+
with gr.Row():
|
| 476 |
+
msg = gr.Textbox(
|
| 477 |
+
label="Your Research Question",
|
| 478 |
+
placeholder="Enter your drug repurposing research question...",
|
| 479 |
+
scale=4,
|
| 480 |
+
)
|
| 481 |
+
submit = gr.Button("π Research", variant="primary", scale=1)
|
| 482 |
+
|
| 483 |
+
# Clear button
|
| 484 |
+
clear = gr.Button("Clear Chat")
|
| 485 |
+
|
| 486 |
+
# Examples
|
| 487 |
+
gr.Examples(
|
| 488 |
+
examples=[
|
| 489 |
+
"What existing drugs might help treat long COVID fatigue?",
|
| 490 |
+
"Can metformin be repurposed for Alzheimer's disease?",
|
| 491 |
+
"What is the evidence for statins in treating cancer?",
|
| 492 |
+
"Are there any approved drugs that could treat ALS?",
|
| 493 |
+
],
|
| 494 |
+
inputs=msg,
|
| 495 |
+
)
|
| 496 |
+
|
| 497 |
+
# Wire up the interface
|
| 498 |
+
async def respond(message, chat_history):
|
| 499 |
+
"""Handle user message and stream response."""
|
| 500 |
+
chat_history = chat_history or []
|
| 501 |
+
chat_history.append({"role": "user", "content": message})
|
| 502 |
+
|
| 503 |
+
# Stream the response
|
| 504 |
+
response = ""
|
| 505 |
+
async for chunk in research_agent(message, chat_history):
|
| 506 |
+
response = chunk
|
| 507 |
+
yield "", chat_history + [{"role": "assistant", "content": response}]
|
| 508 |
+
|
| 509 |
+
submit.click(
|
| 510 |
+
respond,
|
| 511 |
+
inputs=[msg, chatbot],
|
| 512 |
+
outputs=[msg, chatbot],
|
| 513 |
+
)
|
| 514 |
+
msg.submit(
|
| 515 |
+
respond,
|
| 516 |
+
inputs=[msg, chatbot],
|
| 517 |
+
outputs=[msg, chatbot],
|
| 518 |
+
)
|
| 519 |
+
clear.click(lambda: (None, []), outputs=[msg, chatbot])
|
| 520 |
+
|
| 521 |
+
return app
|
| 522 |
+
|
| 523 |
+
|
| 524 |
+
# Entry point
|
| 525 |
+
app = create_app()
|
| 526 |
+
|
| 527 |
+
if __name__ == "__main__":
|
| 528 |
+
app.launch(
|
| 529 |
+
server_name="0.0.0.0",
|
| 530 |
+
server_port=7860,
|
| 531 |
+
share=False,
|
| 532 |
+
)
|
| 533 |
+
```
|
| 534 |
+
|
| 535 |
+
---
|
| 536 |
+
|
| 537 |
+
## 5. Deployment Configuration
|
| 538 |
+
|
| 539 |
+
### `Dockerfile`
|
| 540 |
+
|
| 541 |
+
```dockerfile
|
| 542 |
+
FROM python:3.11-slim
|
| 543 |
+
|
| 544 |
+
WORKDIR /app
|
| 545 |
+
|
| 546 |
+
# Install uv
|
| 547 |
+
RUN pip install uv
|
| 548 |
+
|
| 549 |
+
# Copy project files
|
| 550 |
+
COPY pyproject.toml .
|
| 551 |
+
COPY src/ src/
|
| 552 |
+
COPY .env.example .env
|
| 553 |
+
|
| 554 |
+
# Install dependencies
|
| 555 |
+
RUN uv sync --no-dev
|
| 556 |
+
|
| 557 |
+
# Expose Gradio port
|
| 558 |
+
EXPOSE 7860
|
| 559 |
+
|
| 560 |
+
# Run the app
|
| 561 |
+
CMD ["uv", "run", "python", "src/app.py"]
|
| 562 |
+
```
|
| 563 |
+
|
| 564 |
+
### `README.md` (HuggingFace Spaces)
|
| 565 |
+
|
| 566 |
+
This goes in the root of your HuggingFace Space.
|
| 567 |
+
|
| 568 |
+
```markdown
|
| 569 |
+
---
|
| 570 |
+
title: DeepCritical
|
| 571 |
+
emoji: π¬
|
| 572 |
+
colorFrom: blue
|
| 573 |
+
colorTo: purple
|
| 574 |
+
sdk: gradio
|
| 575 |
+
sdk_version: 5.0.0
|
| 576 |
+
app_file: src/app.py
|
| 577 |
+
pinned: false
|
| 578 |
+
license: mit
|
| 579 |
+
---
|
| 580 |
+
|
| 581 |
+
# DeepCritical - Drug Repurposing Research Agent
|
| 582 |
+
|
| 583 |
+
AI-powered research agent for discovering drug repurposing opportunities.
|
| 584 |
+
|
| 585 |
+
## Features
|
| 586 |
+
- π Search PubMed and web sources
|
| 587 |
+
- π§ AI-powered evidence assessment
|
| 588 |
+
- π Structured research reports
|
| 589 |
+
- π¬ Interactive chat interface
|
| 590 |
+
|
| 591 |
+
## Usage
|
| 592 |
+
Enter a research question about drug repurposing, such as:
|
| 593 |
+
- "What existing drugs might help treat long COVID fatigue?"
|
| 594 |
+
- "Can metformin be repurposed for Alzheimer's disease?"
|
| 595 |
+
|
| 596 |
+
The agent will search medical literature, assess evidence quality,
|
| 597 |
+
and generate a research report with citations.
|
| 598 |
+
|
| 599 |
+
## API Keys
|
| 600 |
+
This space requires an OpenAI API key set as a secret (`OPENAI_API_KEY`).
|
| 601 |
+
```
|
| 602 |
+
|
| 603 |
+
### `.env.example` (Updated)
|
| 604 |
+
|
| 605 |
+
```bash
|
| 606 |
+
# LLM Provider - REQUIRED
|
| 607 |
+
# Choose one:
|
| 608 |
+
OPENAI_API_KEY=sk-your-key-here
|
| 609 |
+
# ANTHROPIC_API_KEY=sk-ant-your-key-here
|
| 610 |
+
|
| 611 |
+
# LLM Settings
|
| 612 |
+
LLM_PROVIDER=openai
|
| 613 |
+
LLM_MODEL=gpt-4o-mini
|
| 614 |
+
|
| 615 |
+
# Agent Configuration
|
| 616 |
+
MAX_ITERATIONS=10
|
| 617 |
+
|
| 618 |
+
# Logging
|
| 619 |
+
LOG_LEVEL=INFO
|
| 620 |
+
|
| 621 |
+
# Optional: NCBI API key for faster PubMed searches
|
| 622 |
+
# NCBI_API_KEY=your-ncbi-key
|
| 623 |
```
|
| 624 |
|
| 625 |
---
|
| 626 |
|
| 627 |
+
## 6. TDD Workflow
|
| 628 |
|
| 629 |
+
### Test File: `tests/unit/features/orchestrator/test_orchestrator.py`
|
| 630 |
|
| 631 |
```python
|
| 632 |
+
"""Unit tests for the Orchestrator."""
|
| 633 |
+
import pytest
|
| 634 |
+
from unittest.mock import AsyncMock, MagicMock
|
| 635 |
+
|
| 636 |
+
|
| 637 |
+
class TestOrchestratorModels:
|
| 638 |
+
"""Tests for Orchestrator data models."""
|
| 639 |
+
|
| 640 |
+
def test_agent_event_display(self):
|
| 641 |
+
"""AgentEvent.to_display should format correctly."""
|
| 642 |
+
from src.features.orchestrator.models import AgentEvent, AgentState
|
| 643 |
+
|
| 644 |
+
event = AgentEvent(
|
| 645 |
+
state=AgentState.SEARCHING,
|
| 646 |
+
message="Looking for evidence",
|
| 647 |
+
iteration=1,
|
| 648 |
+
)
|
| 649 |
+
|
| 650 |
+
display = event.to_display()
|
| 651 |
+
assert "π" in display
|
| 652 |
+
assert "SEARCHING" in display
|
| 653 |
+
assert "Looking for evidence" in display
|
| 654 |
+
|
| 655 |
+
def test_orchestrator_config_defaults(self):
|
| 656 |
+
"""OrchestratorConfig should have sensible defaults."""
|
| 657 |
+
from src.features.orchestrator.models import OrchestratorConfig
|
| 658 |
+
|
| 659 |
+
config = OrchestratorConfig()
|
| 660 |
+
assert config.max_iterations == 10
|
| 661 |
+
assert config.max_llm_calls == 20
|
| 662 |
+
|
| 663 |
+
def test_orchestrator_config_bounds(self):
|
| 664 |
+
"""OrchestratorConfig should enforce bounds."""
|
| 665 |
+
from src.features.orchestrator.models import OrchestratorConfig
|
| 666 |
+
from pydantic import ValidationError
|
| 667 |
+
|
| 668 |
+
with pytest.raises(ValidationError):
|
| 669 |
+
OrchestratorConfig(max_iterations=100) # > 50
|
| 670 |
+
|
| 671 |
+
|
| 672 |
+
class TestOrchestrator:
|
| 673 |
+
"""Tests for the Orchestrator."""
|
| 674 |
|
| 675 |
+
@pytest.mark.asyncio
|
| 676 |
+
async def test_run_yields_events(self, mocker):
|
| 677 |
+
"""Orchestrator.run should yield AgentEvents."""
|
| 678 |
+
from src.features.orchestrator.handlers import Orchestrator
|
| 679 |
+
from src.features.orchestrator.models import (
|
| 680 |
+
AgentEvent,
|
| 681 |
+
AgentState,
|
| 682 |
+
OrchestratorConfig,
|
| 683 |
+
)
|
| 684 |
+
from src.features.search.models import Evidence, Citation, SearchResult
|
| 685 |
+
from src.features.judge.models import JudgeAssessment
|
| 686 |
|
| 687 |
+
# Mock search handler
|
| 688 |
+
mock_search = AsyncMock()
|
| 689 |
+
mock_search.execute = AsyncMock(return_value=SearchResult(
|
| 690 |
+
query="test",
|
| 691 |
+
evidence=[
|
| 692 |
+
Evidence(
|
| 693 |
+
content="Test evidence",
|
| 694 |
+
citation=Citation(
|
| 695 |
+
source="pubmed",
|
| 696 |
+
title="Test",
|
| 697 |
+
url="https://example.com",
|
| 698 |
+
date="2024",
|
| 699 |
+
),
|
| 700 |
+
)
|
| 701 |
+
],
|
| 702 |
+
sources_searched=["pubmed"],
|
| 703 |
+
total_found=1,
|
| 704 |
+
))
|
| 705 |
+
|
| 706 |
+
# Mock judge handler - returns sufficient on first call
|
| 707 |
+
mock_judge = AsyncMock()
|
| 708 |
+
mock_judge.assess = AsyncMock(return_value=JudgeAssessment(
|
| 709 |
+
sufficient=True,
|
| 710 |
+
recommendation="synthesize",
|
| 711 |
+
reasoning="Good evidence",
|
| 712 |
+
overall_quality_score=8,
|
| 713 |
+
coverage_score=7,
|
| 714 |
+
))
|
| 715 |
+
|
| 716 |
+
config = OrchestratorConfig(max_iterations=3)
|
| 717 |
+
orchestrator = Orchestrator(
|
| 718 |
+
config=config,
|
| 719 |
+
search_handler=mock_search,
|
| 720 |
+
judge_handler=mock_judge,
|
| 721 |
+
)
|
| 722 |
+
|
| 723 |
+
events = []
|
| 724 |
+
async for event in orchestrator.run("test question"):
|
| 725 |
+
events.append(event)
|
| 726 |
+
|
| 727 |
+
# Should have multiple events
|
| 728 |
+
assert len(events) >= 3
|
| 729 |
+
|
| 730 |
+
# Check we got expected state transitions
|
| 731 |
+
states = [e.state for e in events]
|
| 732 |
+
assert AgentState.SEARCHING in states
|
| 733 |
+
assert AgentState.JUDGING in states
|
| 734 |
+
assert AgentState.COMPLETE in states
|
| 735 |
+
|
| 736 |
+
@pytest.mark.asyncio
|
| 737 |
+
async def test_run_respects_max_iterations(self, mocker):
|
| 738 |
+
"""Orchestrator should stop at max_iterations."""
|
| 739 |
+
from src.features.orchestrator.handlers import Orchestrator
|
| 740 |
+
from src.features.orchestrator.models import OrchestratorConfig
|
| 741 |
+
from src.features.search.models import Evidence, Citation, SearchResult
|
| 742 |
+
from src.features.judge.models import JudgeAssessment
|
| 743 |
+
|
| 744 |
+
# Mock search
|
| 745 |
+
mock_search = AsyncMock()
|
| 746 |
+
mock_search.execute = AsyncMock(return_value=SearchResult(
|
| 747 |
+
query="test",
|
| 748 |
+
evidence=[],
|
| 749 |
+
sources_searched=["pubmed"],
|
| 750 |
+
total_found=0,
|
| 751 |
+
))
|
| 752 |
+
|
| 753 |
+
# Mock judge - always returns insufficient
|
| 754 |
+
mock_judge = AsyncMock()
|
| 755 |
+
mock_judge.assess = AsyncMock(return_value=JudgeAssessment(
|
| 756 |
+
sufficient=False,
|
| 757 |
+
recommendation="continue",
|
| 758 |
+
reasoning="Need more",
|
| 759 |
+
overall_quality_score=2,
|
| 760 |
+
coverage_score=1,
|
| 761 |
+
next_search_queries=["more stuff"],
|
| 762 |
+
))
|
| 763 |
+
|
| 764 |
+
config = OrchestratorConfig(max_iterations=2)
|
| 765 |
+
orchestrator = Orchestrator(
|
| 766 |
+
config=config,
|
| 767 |
+
search_handler=mock_search,
|
| 768 |
+
judge_handler=mock_judge,
|
| 769 |
+
)
|
| 770 |
+
|
| 771 |
+
events = []
|
| 772 |
+
async for event in orchestrator.run("test"):
|
| 773 |
+
events.append(event)
|
| 774 |
+
|
| 775 |
+
# Should stop after max_iterations
|
| 776 |
+
max_iteration = max(e.iteration for e in events)
|
| 777 |
+
assert max_iteration <= 2
|
| 778 |
+
|
| 779 |
+
@pytest.mark.asyncio
|
| 780 |
+
async def test_run_handles_search_error(self, mocker):
|
| 781 |
+
"""Orchestrator should handle search errors gracefully."""
|
| 782 |
+
from src.features.orchestrator.handlers import Orchestrator
|
| 783 |
+
from src.features.orchestrator.models import AgentState, OrchestratorConfig
|
| 784 |
+
from src.shared.exceptions import SearchError
|
| 785 |
+
|
| 786 |
+
mock_search = AsyncMock()
|
| 787 |
+
mock_search.execute = AsyncMock(side_effect=SearchError("API down"))
|
| 788 |
+
|
| 789 |
+
mock_judge = AsyncMock()
|
| 790 |
+
|
| 791 |
+
orchestrator = Orchestrator(
|
| 792 |
+
config=OrchestratorConfig(max_iterations=1),
|
| 793 |
+
search_handler=mock_search,
|
| 794 |
+
judge_handler=mock_judge,
|
| 795 |
+
)
|
| 796 |
+
|
| 797 |
+
events = []
|
| 798 |
+
async for event in orchestrator.run("test"):
|
| 799 |
+
events.append(event)
|
| 800 |
+
|
| 801 |
+
# Should have an error event
|
| 802 |
+
error_events = [e for e in events if e.state == AgentState.ERROR]
|
| 803 |
+
assert len(error_events) >= 1
|
| 804 |
+
|
| 805 |
+
@pytest.mark.asyncio
|
| 806 |
+
async def test_run_respects_llm_budget(self, mocker):
|
| 807 |
+
"""Orchestrator should stop when LLM budget is exceeded."""
|
| 808 |
+
from src.features.orchestrator.handlers import Orchestrator
|
| 809 |
+
from src.features.orchestrator.models import AgentState, OrchestratorConfig
|
| 810 |
+
from src.features.search.models import SearchResult
|
| 811 |
+
from src.features.judge.models import JudgeAssessment
|
| 812 |
+
|
| 813 |
+
mock_search = AsyncMock()
|
| 814 |
+
mock_search.execute = AsyncMock(return_value=SearchResult(
|
| 815 |
+
query="test",
|
| 816 |
+
evidence=[],
|
| 817 |
+
sources_searched=[],
|
| 818 |
+
total_found=0,
|
| 819 |
+
))
|
| 820 |
+
|
| 821 |
+
# Judge always needs more
|
| 822 |
+
mock_judge = AsyncMock()
|
| 823 |
+
mock_judge.assess = AsyncMock(return_value=JudgeAssessment(
|
| 824 |
+
sufficient=False,
|
| 825 |
+
recommendation="continue",
|
| 826 |
+
reasoning="Need more",
|
| 827 |
+
overall_quality_score=2,
|
| 828 |
+
coverage_score=1,
|
| 829 |
+
next_search_queries=["more"],
|
| 830 |
+
))
|
| 831 |
+
|
| 832 |
+
config = OrchestratorConfig(
|
| 833 |
+
max_iterations=100, # High
|
| 834 |
+
max_llm_calls=2, # Low - should hit this first
|
| 835 |
+
)
|
| 836 |
+
orchestrator = Orchestrator(
|
| 837 |
+
config=config,
|
| 838 |
+
search_handler=mock_search,
|
| 839 |
+
judge_handler=mock_judge,
|
| 840 |
+
)
|
| 841 |
+
|
| 842 |
+
events = []
|
| 843 |
+
async for event in orchestrator.run("test"):
|
| 844 |
+
events.append(event)
|
| 845 |
+
|
| 846 |
+
# Should have stopped due to budget
|
| 847 |
+
error_events = [e for e in events if "budget" in e.message.lower()]
|
| 848 |
+
assert len(error_events) >= 1
|
| 849 |
```
|
| 850 |
|
| 851 |
---
|
| 852 |
|
| 853 |
+
## 7. Module Exports (`src/features/orchestrator/__init__.py`)
|
| 854 |
|
| 855 |
+
```python
|
| 856 |
+
"""Orchestrator feature - main agent loop."""
|
| 857 |
+
from .models import AgentEvent, AgentState, OrchestratorConfig, SessionState
|
| 858 |
+
from .handlers import Orchestrator
|
| 859 |
+
|
| 860 |
+
__all__ = [
|
| 861 |
+
"AgentEvent",
|
| 862 |
+
"AgentState",
|
| 863 |
+
"OrchestratorConfig",
|
| 864 |
+
"SessionState",
|
| 865 |
+
"Orchestrator",
|
| 866 |
+
]
|
| 867 |
+
```
|
| 868 |
+
|
| 869 |
+
---
|
| 870 |
+
|
| 871 |
+
## 8. Implementation Checklist
|
| 872 |
+
|
| 873 |
+
- [ ] Create `src/features/orchestrator/models.py` with all models
|
| 874 |
+
- [ ] Create `src/features/orchestrator/handlers.py` with `Orchestrator`
|
| 875 |
+
- [ ] Create `src/features/orchestrator/__init__.py` with exports
|
| 876 |
+
- [ ] Create `src/app.py` with Gradio UI
|
| 877 |
+
- [ ] Create `Dockerfile`
|
| 878 |
+
- [ ] Create/update root `README.md` for HuggingFace
|
| 879 |
+
- [ ] Write tests in `tests/unit/features/orchestrator/test_orchestrator.py`
|
| 880 |
+
- [ ] Run `uv run pytest tests/unit/features/orchestrator/ -v` β **ALL TESTS MUST PASS**
|
| 881 |
+
- [ ] Run `uv run python src/app.py` locally and test the UI
|
| 882 |
+
- [ ] Commit: `git commit -m "feat: phase 4 orchestrator and UI complete"`
|
| 883 |
+
|
| 884 |
+
---
|
| 885 |
+
|
| 886 |
+
## 9. Definition of Done
|
| 887 |
+
|
| 888 |
+
Phase 4 is **COMPLETE** when:
|
| 889 |
+
|
| 890 |
+
1. β
All unit tests pass
|
| 891 |
+
2. β
`uv run python src/app.py` launches Gradio UI locally
|
| 892 |
+
3. β
Can submit a question and see streaming events
|
| 893 |
+
4. β
Agent completes and generates a report
|
| 894 |
+
5. β
Dockerfile builds successfully
|
| 895 |
+
6. β
Can test full flow:
|
| 896 |
|
| 897 |
```python
|
| 898 |
+
import asyncio
|
| 899 |
+
from src.features.orchestrator.handlers import Orchestrator
|
| 900 |
+
|
| 901 |
+
async def test():
|
| 902 |
+
orchestrator = Orchestrator()
|
| 903 |
+
async for event in orchestrator.run("Can metformin treat Alzheimer's?"):
|
| 904 |
+
print(event.to_display())
|
| 905 |
+
|
| 906 |
+
asyncio.run(test())
|
| 907 |
```
|
| 908 |
|
| 909 |
+
---
|
| 910 |
+
|
| 911 |
+
## 10. Deployment to HuggingFace Spaces
|
| 912 |
+
|
| 913 |
+
### Option A: Via GitHub (Recommended)
|
| 914 |
+
|
| 915 |
+
1. Push your code to GitHub
|
| 916 |
+
2. Create a new Space on HuggingFace
|
| 917 |
+
3. Connect your GitHub repo
|
| 918 |
+
4. Add secrets: `OPENAI_API_KEY`
|
| 919 |
+
5. Deploy!
|
| 920 |
+
|
| 921 |
+
### Option B: Manual Upload
|
| 922 |
+
|
| 923 |
+
1. Create a new Gradio Space on HuggingFace
|
| 924 |
+
2. Upload all files from `src/` and root configs
|
| 925 |
+
3. Add secrets in Space settings
|
| 926 |
+
4. Wait for build
|
| 927 |
+
|
| 928 |
+
### Verify Deployment
|
| 929 |
+
|
| 930 |
+
1. Visit your Space URL
|
| 931 |
+
2. Ask: "What drugs could treat long COVID?"
|
| 932 |
+
3. Verify streaming events appear
|
| 933 |
+
4. Verify final report is generated
|
| 934 |
|
| 935 |
---
|
| 936 |
|
| 937 |
+
**π Congratulations! Phase 4 is the MVP.**
|
| 938 |
+
|
| 939 |
+
After completing Phase 4, you have a working drug repurposing research agent
|
| 940 |
+
that can be demonstrated at the hackathon.
|
| 941 |
|
| 942 |
+
**Optional Phase 5**: Improve the report synthesis with a dedicated Report agent.
|
|
|
|
|
|
|
|
|
docs/implementation/roadmap.md
CHANGED
|
@@ -4,6 +4,8 @@
|
|
| 4 |
|
| 5 |
This roadmap defines the execution strategy to deliver **DeepCritical** effectively. We reject "overplanning" in favor of **ironclad, testable vertical slices**. Each phase delivers a fully functional slice of end-to-end value.
|
| 6 |
|
|
|
|
|
|
|
| 7 |
---
|
| 8 |
|
| 9 |
## π οΈ The 2025 "Gucci" Tooling Stack
|
|
@@ -19,76 +21,212 @@ We are using the bleeding edge of Python engineering to ensure speed, safety, an
|
|
| 19 |
| **Test Plugins** | **`pytest-sugar`** | Instant feedback, progress bars. "Gucci" visuals. |
|
| 20 |
| **Test Plugins** | **`pytest-asyncio`** | Essential for our async agent loop. |
|
| 21 |
| **Test Plugins** | **`pytest-cov`** | Coverage reporting to ensure TDD adherence. |
|
|
|
|
|
|
|
| 22 |
| **Git Hooks** | **`pre-commit`** | Enforce ruff/mypy before commit. |
|
|
|
|
|
|
|
| 23 |
|
| 24 |
---
|
| 25 |
|
| 26 |
## ποΈ Architecture: Vertical Slices
|
| 27 |
|
| 28 |
Instead of horizontal layers (e.g., "Building the Database Layer"), we build **Vertical Slices**.
|
| 29 |
-
Each slice implements a feature from **Entry Point (UI/API)
|
|
|
|
|
|
|
| 30 |
|
| 31 |
-
|
| 32 |
|
| 33 |
```
|
| 34 |
-
|
| 35 |
-
βββ
|
| 36 |
-
βββ
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 51 |
```
|
| 52 |
|
| 53 |
---
|
| 54 |
|
| 55 |
## π Phased Execution Plan
|
| 56 |
|
| 57 |
-
### **Phase 1: Foundation & Tooling (
|
|
|
|
| 58 |
*Goal: A rock-solid, CI-ready environment with `uv` and `pytest` configured.*
|
| 59 |
-
- [ ] Initialize `pyproject.toml` with `uv`.
|
| 60 |
-
- [ ] Configure `ruff` (strict) and `mypy` (strict).
|
| 61 |
-
- [ ] Set up `pytest` with sugar and coverage.
|
| 62 |
-
- [ ] Implement `shared/config.py` (Configuration Slice).
|
| 63 |
-
- **Deliverable**: A repo that passes CI with `uv run pytest`.
|
| 64 |
|
| 65 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 66 |
*Goal: Agent can receive a query and get raw results from PubMed/Web.*
|
| 67 |
-
- [ ] **TDD**: Write test for `SearchHandler`.
|
| 68 |
-
- [ ] Implement `features/search/tools.py` (PubMed + DuckDuckGo).
|
| 69 |
-
- [ ] Implement `features/search/handlers.py` (Orchestrates tools).
|
| 70 |
-
- **Deliverable**: Function that takes "long covid" -> returns `List[Evidence]`.
|
| 71 |
|
| 72 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 73 |
*Goal: Agent can decide if evidence is sufficient.*
|
| 74 |
-
- [ ] **TDD**: Write test for `JudgeHandler` (Mocked LLM).
|
| 75 |
-
- [ ] Implement `features/judge/prompts.py` (Structured outputs).
|
| 76 |
-
- [ ] Implement `features/judge/handlers.py` (LLM interaction).
|
| 77 |
-
- **Deliverable**: Function that takes `List[Evidence]` -> returns `JudgeAssessment`.
|
| 78 |
|
| 79 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 80 |
*Goal: End-to-End User Value.*
|
| 81 |
-
|
| 82 |
-
|
| 83 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 84 |
|
| 85 |
---
|
| 86 |
|
| 87 |
-
##
|
| 88 |
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 93 |
|
| 94 |
-
*Start by reading Phase 1 Spec to initialize the repo.*
|
|
|
|
| 4 |
|
| 5 |
This roadmap defines the execution strategy to deliver **DeepCritical** effectively. We reject "overplanning" in favor of **ironclad, testable vertical slices**. Each phase delivers a fully functional slice of end-to-end value.
|
| 6 |
|
| 7 |
+
**Total Estimated Effort**: 12-16 hours (can be done in 4 days)
|
| 8 |
+
|
| 9 |
---
|
| 10 |
|
| 11 |
## π οΈ The 2025 "Gucci" Tooling Stack
|
|
|
|
| 21 |
| **Test Plugins** | **`pytest-sugar`** | Instant feedback, progress bars. "Gucci" visuals. |
|
| 22 |
| **Test Plugins** | **`pytest-asyncio`** | Essential for our async agent loop. |
|
| 23 |
| **Test Plugins** | **`pytest-cov`** | Coverage reporting to ensure TDD adherence. |
|
| 24 |
+
| **Test Plugins** | **`pytest-mock`** | Easy mocking with `mocker` fixture. |
|
| 25 |
+
| **HTTP Mocking** | **`respx`** | Mock `httpx` requests in tests. |
|
| 26 |
| **Git Hooks** | **`pre-commit`** | Enforce ruff/mypy before commit. |
|
| 27 |
+
| **Retry Logic** | **`tenacity`** | Exponential backoff for API calls. |
|
| 28 |
+
| **Logging** | **`structlog`** | Structured JSON logging. |
|
| 29 |
|
| 30 |
---
|
| 31 |
|
| 32 |
## ποΈ Architecture: Vertical Slices
|
| 33 |
|
| 34 |
Instead of horizontal layers (e.g., "Building the Database Layer"), we build **Vertical Slices**.
|
| 35 |
+
Each slice implements a feature from **Entry Point (UI/API) β Logic β Data/External**.
|
| 36 |
+
|
| 37 |
+
### Directory Structure (Maintainer's Template + Our Code)
|
| 38 |
|
| 39 |
+
We use the **existing scaffolding** from the maintainer, filling in the empty files.
|
| 40 |
|
| 41 |
```
|
| 42 |
+
deepcritical/
|
| 43 |
+
βββ pyproject.toml # All config in one file
|
| 44 |
+
βββ .env.example # Environment template
|
| 45 |
+
βββ .pre-commit-config.yaml # Git hooks
|
| 46 |
+
βββ Dockerfile # Container build
|
| 47 |
+
βββ README.md # HuggingFace Space config
|
| 48 |
+
β
|
| 49 |
+
βββ src/
|
| 50 |
+
β βββ app.py # Gradio entry point
|
| 51 |
+
β βββ orchestrator.py # Main agent loop (SearchβJudgeβSynthesize)
|
| 52 |
+
β β
|
| 53 |
+
β βββ agent_factory/ # Agent definitions
|
| 54 |
+
β β βββ __init__.py
|
| 55 |
+
β β βββ agents.py # (Reserved for future agents)
|
| 56 |
+
β β βββ judges.py # JudgeHandler - LLM evidence assessment
|
| 57 |
+
β β
|
| 58 |
+
β βββ tools/ # Search tools
|
| 59 |
+
β β βββ __init__.py
|
| 60 |
+
β β βββ pubmed.py # PubMedTool - NCBI E-utilities
|
| 61 |
+
β β βββ websearch.py # WebTool - DuckDuckGo
|
| 62 |
+
β β βββ search_handler.py # SearchHandler - orchestrates tools
|
| 63 |
+
β β
|
| 64 |
+
β βββ prompts/ # Prompt templates
|
| 65 |
+
β β βββ __init__.py
|
| 66 |
+
β β βββ judge.py # Judge system/user prompts
|
| 67 |
+
β β
|
| 68 |
+
β βββ utils/ # Shared utilities
|
| 69 |
+
β β βββ __init__.py
|
| 70 |
+
β β βββ config.py # Settings via pydantic-settings
|
| 71 |
+
β β βββ exceptions.py # Custom exceptions
|
| 72 |
+
β β βββ models.py # ALL Pydantic models (Evidence, JudgeAssessment, etc.)
|
| 73 |
+
β β
|
| 74 |
+
β βββ middleware/ # (Empty - reserved)
|
| 75 |
+
β βββ database_services/ # (Empty - reserved)
|
| 76 |
+
β βββ retrieval_factory/ # (Empty - reserved)
|
| 77 |
+
β
|
| 78 |
+
βββ tests/
|
| 79 |
+
βββ __init__.py
|
| 80 |
+
βββ conftest.py # Shared fixtures
|
| 81 |
+
β
|
| 82 |
+
βββ unit/ # Fast, mocked tests
|
| 83 |
+
β βββ __init__.py
|
| 84 |
+
β βββ utils/ # Config, models tests
|
| 85 |
+
β βββ tools/ # PubMed, WebSearch tests
|
| 86 |
+
β βββ agent_factory/ # Judge tests
|
| 87 |
+
β
|
| 88 |
+
βββ integration/ # Real API tests (optional)
|
| 89 |
+
βββ __init__.py
|
| 90 |
```
|
| 91 |
|
| 92 |
---
|
| 93 |
|
| 94 |
## π Phased Execution Plan
|
| 95 |
|
| 96 |
+
### **Phase 1: Foundation & Tooling (~2-3 hours)**
|
| 97 |
+
|
| 98 |
*Goal: A rock-solid, CI-ready environment with `uv` and `pytest` configured.*
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 99 |
|
| 100 |
+
| Task | Output |
|
| 101 |
+
|------|--------|
|
| 102 |
+
| Install uv | `uv --version` works |
|
| 103 |
+
| Create pyproject.toml | All deps + config in one file |
|
| 104 |
+
| Set up directory structure | All `__init__.py` files created |
|
| 105 |
+
| Configure ruff + mypy | Strict settings |
|
| 106 |
+
| Create conftest.py | Shared pytest fixtures |
|
| 107 |
+
| Implement shared/config.py | Settings via pydantic-settings |
|
| 108 |
+
| Write first test | `test_config.py` passes |
|
| 109 |
+
|
| 110 |
+
**Deliverable**: `uv run pytest` passes with green output.
|
| 111 |
+
|
| 112 |
+
π **Spec Document**: [01_phase_foundation.md](01_phase_foundation.md)
|
| 113 |
+
|
| 114 |
+
---
|
| 115 |
+
|
| 116 |
+
### **Phase 2: The "Search" Vertical Slice (~3-4 hours)**
|
| 117 |
+
|
| 118 |
*Goal: Agent can receive a query and get raw results from PubMed/Web.*
|
|
|
|
|
|
|
|
|
|
|
|
|
| 119 |
|
| 120 |
+
| Task | Output |
|
| 121 |
+
|------|--------|
|
| 122 |
+
| Define Evidence/Citation models | Pydantic models |
|
| 123 |
+
| Implement PubMedTool | ESearch β EFetch β Evidence |
|
| 124 |
+
| Implement WebTool | DuckDuckGo β Evidence |
|
| 125 |
+
| Implement SearchHandler | Parallel search orchestration |
|
| 126 |
+
| Write unit tests | Mocked HTTP responses |
|
| 127 |
+
|
| 128 |
+
**Deliverable**: Function that takes "long covid" β returns `List[Evidence]`.
|
| 129 |
+
|
| 130 |
+
π **Spec Document**: [02_phase_search.md](02_phase_search.md)
|
| 131 |
+
|
| 132 |
+
---
|
| 133 |
+
|
| 134 |
+
### **Phase 3: The "Judge" Vertical Slice (~3-4 hours)**
|
| 135 |
+
|
| 136 |
*Goal: Agent can decide if evidence is sufficient.*
|
|
|
|
|
|
|
|
|
|
|
|
|
| 137 |
|
| 138 |
+
| Task | Output |
|
| 139 |
+
|------|--------|
|
| 140 |
+
| Define JudgeAssessment model | Structured output schema |
|
| 141 |
+
| Write prompt templates | System + user prompts |
|
| 142 |
+
| Implement JudgeHandler | PydanticAI agent with structured output |
|
| 143 |
+
| Write unit tests | Mocked LLM responses |
|
| 144 |
+
|
| 145 |
+
**Deliverable**: Function that takes `List[Evidence]` β returns `JudgeAssessment`.
|
| 146 |
+
|
| 147 |
+
π **Spec Document**: [03_phase_judge.md](03_phase_judge.md)
|
| 148 |
+
|
| 149 |
+
---
|
| 150 |
+
|
| 151 |
+
### **Phase 4: The "Orchestrator" & UI Slice (~4-5 hours)**
|
| 152 |
+
|
| 153 |
*Goal: End-to-End User Value.*
|
| 154 |
+
|
| 155 |
+
| Task | Output |
|
| 156 |
+
|------|--------|
|
| 157 |
+
| Define AgentEvent/State models | Event streaming types |
|
| 158 |
+
| Implement Orchestrator | Main while loop connecting SearchβJudge |
|
| 159 |
+
| Implement report synthesis | Generate markdown report |
|
| 160 |
+
| Build Gradio UI | Streaming chat interface |
|
| 161 |
+
| Create Dockerfile | Container for deployment |
|
| 162 |
+
| Create HuggingFace README | Space configuration |
|
| 163 |
+
| Write unit tests | Mocked handlers |
|
| 164 |
+
|
| 165 |
+
**Deliverable**: Working DeepCritical Agent on localhost:7860.
|
| 166 |
+
|
| 167 |
+
π **Spec Document**: [04_phase_ui.md](04_phase_ui.md)
|
| 168 |
+
|
| 169 |
+
---
|
| 170 |
+
|
| 171 |
+
## π Spec Documents Summary
|
| 172 |
+
|
| 173 |
+
| Phase | Document | Focus |
|
| 174 |
+
|-------|----------|-------|
|
| 175 |
+
| 1 | [01_phase_foundation.md](01_phase_foundation.md) | Tooling, config, TDD setup |
|
| 176 |
+
| 2 | [02_phase_search.md](02_phase_search.md) | PubMed + DuckDuckGo search |
|
| 177 |
+
| 3 | [03_phase_judge.md](03_phase_judge.md) | LLM evidence assessment |
|
| 178 |
+
| 4 | [04_phase_ui.md](04_phase_ui.md) | Orchestrator + Gradio + Deploy |
|
| 179 |
|
| 180 |
---
|
| 181 |
|
| 182 |
+
## β‘ Quick Start Commands
|
| 183 |
|
| 184 |
+
```bash
|
| 185 |
+
# Phase 1: Setup
|
| 186 |
+
curl -LsSf https://astral.sh/uv/install.sh | sh
|
| 187 |
+
uv init --name deepcritical
|
| 188 |
+
uv sync --all-extras
|
| 189 |
+
uv run pytest
|
| 190 |
+
|
| 191 |
+
# Phase 2-4: Development
|
| 192 |
+
uv run pytest tests/unit/ -v # Run unit tests
|
| 193 |
+
uv run ruff check src tests # Lint
|
| 194 |
+
uv run mypy src # Type check
|
| 195 |
+
uv run python src/app.py # Run Gradio locally
|
| 196 |
+
|
| 197 |
+
# Deployment
|
| 198 |
+
docker build -t deepcritical .
|
| 199 |
+
docker run -p 7860:7860 -e OPENAI_API_KEY=sk-... deepcritical
|
| 200 |
+
```
|
| 201 |
+
|
| 202 |
+
---
|
| 203 |
+
|
| 204 |
+
## π― Definition of Done (MVP)
|
| 205 |
+
|
| 206 |
+
The MVP is **COMPLETE** when:
|
| 207 |
+
|
| 208 |
+
1. β
All unit tests pass (`uv run pytest`)
|
| 209 |
+
2. β
Ruff has no errors (`uv run ruff check`)
|
| 210 |
+
3. β
Mypy has no errors (`uv run mypy src`)
|
| 211 |
+
4. β
Gradio UI runs locally (`uv run python src/app.py`)
|
| 212 |
+
5. β
Can ask "Can metformin treat Alzheimer's?" and get a report
|
| 213 |
+
6. β
Report includes drug candidates, citations, and quality scores
|
| 214 |
+
7. β
Docker builds successfully
|
| 215 |
+
8. β
Deployable to HuggingFace Spaces
|
| 216 |
+
|
| 217 |
+
---
|
| 218 |
+
|
| 219 |
+
## π Progress Tracker
|
| 220 |
+
|
| 221 |
+
| Phase | Status | Tests | Notes |
|
| 222 |
+
|-------|--------|-------|-------|
|
| 223 |
+
| 1: Foundation | β¬ Pending | 0/5 | Start here |
|
| 224 |
+
| 2: Search | β¬ Pending | 0/6 | Depends on Phase 1 |
|
| 225 |
+
| 3: Judge | β¬ Pending | 0/5 | Depends on Phase 2 |
|
| 226 |
+
| 4: Orchestrator | β¬ Pending | 0/4 | Depends on Phase 3 |
|
| 227 |
+
|
| 228 |
+
Update this table as you complete each phase!
|
| 229 |
+
|
| 230 |
+
---
|
| 231 |
|
| 232 |
+
*Start by reading [Phase 1 Spec](01_phase_foundation.md) to initialize the repo.*
|
docs/index.md
CHANGED
|
@@ -12,6 +12,13 @@ AI-powered deep research system for accelerating drug repurposing discovery.
|
|
| 12 |
- **[Overview](architecture/overview.md)** - Project overview, use case, architecture, timeline
|
| 13 |
- **[Design Patterns](architecture/design-patterns.md)** - 17 technical patterns, reference repos, judge prompts, data models
|
| 14 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 15 |
### Guides
|
| 16 |
- [Setup Guide](guides/setup.md) (coming soon)
|
| 17 |
- **[Deployment Guide](guides/deployment.md)** - Gradio, MCP, and Modal launch steps
|
|
|
|
| 12 |
- **[Overview](architecture/overview.md)** - Project overview, use case, architecture, timeline
|
| 13 |
- **[Design Patterns](architecture/design-patterns.md)** - 17 technical patterns, reference repos, judge prompts, data models
|
| 14 |
|
| 15 |
+
### Implementation (Start Here!)
|
| 16 |
+
- **[Roadmap](implementation/roadmap.md)** - Phased execution plan with TDD
|
| 17 |
+
- **[Phase 1: Foundation](implementation/01_phase_foundation.md)** - Tooling, config, first tests
|
| 18 |
+
- **[Phase 2: Search](implementation/02_phase_search.md)** - PubMed + DuckDuckGo
|
| 19 |
+
- **[Phase 3: Judge](implementation/03_phase_judge.md)** - LLM evidence assessment
|
| 20 |
+
- **[Phase 4: UI](implementation/04_phase_ui.md)** - Orchestrator + Gradio + Deploy
|
| 21 |
+
|
| 22 |
### Guides
|
| 23 |
- [Setup Guide](guides/setup.md) (coming soon)
|
| 24 |
- **[Deployment Guide](guides/deployment.md)** - Gradio, MCP, and Modal launch steps
|