Commit · 5e7604a
Parent(s): 521d97d

docs: add AI agent context files for team collaboration

Added AGENTS.md, CLAUDE.md, and GEMINI.md, AI-assisted scoping documents
that help AI coding tools understand the project architecture,
coding standards, and development workflow.
Requested by @Tonic for team acceleration.

AGENTS.md
ADDED
@@ -0,0 +1,102 @@

# AGENTS.md

This file provides guidance to AI agents when working with code in this repository.

## Project Overview

DeepCritical is an AI-native drug repurposing research agent for a HuggingFace hackathon. It uses a search-and-judge loop to autonomously search biomedical databases (PubMed) and synthesize evidence for queries like "What existing drugs might help treat long COVID fatigue?".

## Development Commands

```bash
# Install all dependencies (including dev)
make install  # or: uv sync --all-extras && uv run pre-commit install

# Run all quality checks (lint + typecheck + test) - MUST PASS BEFORE COMMIT
make check

# Individual commands
make test       # uv run pytest tests/unit/ -v
make lint       # uv run ruff check src tests
make format     # uv run ruff format src tests
make typecheck  # uv run mypy src
make test-cov   # uv run pytest --cov=src --cov-report=term-missing

# Run single test
uv run pytest tests/unit/utils/test_config.py::TestSettings::test_default_max_iterations -v

# Integration tests (real APIs)
uv run pytest -m integration
```

## Architecture

**Pattern**: Search-and-judge loop with multi-tool orchestration.

```
User Question → Orchestrator
    ↓
Search Loop:
  1. Query PubMed
  2. Gather evidence
  3. Judge quality ("Do we have enough?")
  4. If NO → Refine query, search more
  5. If YES → Synthesize findings
    ↓
Research Report with Citations
```

**Key Components**:
- `src/orchestrator.py` - Main agent loop
- `src/tools/pubmed.py` - PubMed E-utilities search
- `src/tools/search_handler.py` - Scatter-gather orchestration
- `src/services/embeddings.py` - Semantic search & deduplication (ChromaDB)
- `src/agent_factory/judges.py` - LLM-based evidence assessment
- `src/agents/` - Magentic multi-agent mode (SearchAgent, JudgeAgent, etc.)
- `src/utils/config.py` - Pydantic Settings (loads from `.env`)
- `src/utils/models.py` - Evidence, Citation, SearchResult models
- `src/utils/exceptions.py` - Exception hierarchy
- `src/app.py` - Gradio UI (HuggingFace Spaces)

**Break Conditions**: Judge approval, token budget (50K max), or max iterations (default 10).

## Configuration

Settings via pydantic-settings from `.env`:
- `LLM_PROVIDER`: "openai" or "anthropic"
- `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`: LLM keys
- `NCBI_API_KEY`: Optional, for higher PubMed rate limits
- `MAX_ITERATIONS`: 1-50, default 10
- `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR

## Exception Hierarchy

```
DeepCriticalError (base)
├── SearchError
│   └── RateLimitError
├── JudgeError
└── ConfigurationError
```
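
Here is a minimal sketch of how the tree above maps onto Python classes. It assumes plain `Exception` subclasses with no extra fields; the real `src/utils/exceptions.py` may add attributes. The `describe` helper is hypothetical. It only shows that ordering `except` clauses from most to least specific lets callers handle `RateLimitError` separately from other search failures.

```python
class DeepCriticalError(Exception):
    """Base class for all project-specific errors."""


class SearchError(DeepCriticalError):
    """A search tool (e.g. PubMed) failed."""


class RateLimitError(SearchError):
    """The upstream API rejected a request for exceeding its rate limit."""


class JudgeError(DeepCriticalError):
    """The LLM-based evidence assessment failed."""


class ConfigurationError(DeepCriticalError):
    """Settings are missing or invalid."""


def describe(exc: Exception) -> str:
    """Hypothetical handler: catch narrow subclasses first, then broader ones."""
    try:
        raise exc
    except RateLimitError:
        return "back off and retry the search"
    except SearchError:
        return "skip this source"
    except DeepCriticalError:
        return "report a project error"
```
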

## Testing

- **TDD**: Write tests first in `tests/unit/`, implement in `src/`
- **Markers**: `unit`, `integration`, `slow`
- **Mocking**: `respx` for httpx, `pytest-mock` for general mocking
- **Fixtures**: `tests/conftest.py` has `mock_httpx_client`, `mock_llm_response`

## Coding Standards

- Python 3.11+, strict mypy, ruff (100-char lines)
- Type all functions, use Pydantic models for data
- Use `structlog` for logging, not print
- Conventional commits: `feat(scope):`, `fix:`, `docs:`

## Git Workflow

- `main`: Production-ready
- `dev`: Development
- `vcms-dev`: HuggingFace Spaces sandbox
- Remote `origin`: GitHub
- Remote `huggingface-upstream`: HuggingFace Spaces


CLAUDE.md
ADDED
@@ -0,0 +1,95 @@

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

DeepCritical is an AI-native drug repurposing research agent for a HuggingFace hackathon. It uses a search-and-judge loop to autonomously search biomedical databases (PubMed) and synthesize evidence for queries like "What existing drugs might help treat long COVID fatigue?".

## Development Commands

```bash
# Install all dependencies (including dev)
make install  # or: uv sync --all-extras && uv run pre-commit install

# Run all quality checks (lint + typecheck + test) - MUST PASS BEFORE COMMIT
make check

# Individual commands
make test       # uv run pytest tests/unit/ -v
make lint       # uv run ruff check src tests
make format     # uv run ruff format src tests
make typecheck  # uv run mypy src
make test-cov   # uv run pytest --cov=src --cov-report=term-missing

# Run single test
uv run pytest tests/unit/utils/test_config.py::TestSettings::test_default_max_iterations -v

# Integration tests (real APIs)
uv run pytest -m integration
```

## Architecture

**Pattern**: Search-and-judge loop with multi-tool orchestration.

```
User Question → Orchestrator
    ↓
Search Loop:
  1. Query PubMed
  2. Gather evidence
  3. Judge quality ("Do we have enough?")
  4. If NO → Refine query, search more
  5. If YES → Synthesize findings
    ↓
Research Report with Citations
```

**Key Components**:
- `src/orchestrator.py` - Main agent loop
- `src/tools/pubmed.py` - PubMed E-utilities search
- `src/tools/search_handler.py` - Scatter-gather orchestration
- `src/services/embeddings.py` - Semantic search & deduplication (ChromaDB)
- `src/agent_factory/judges.py` - LLM-based evidence assessment
- `src/agents/` - Magentic multi-agent mode (SearchAgent, JudgeAgent, etc.)
- `src/utils/config.py` - Pydantic Settings (loads from `.env`)
- `src/utils/models.py` - Evidence, Citation, SearchResult models
- `src/utils/exceptions.py` - Exception hierarchy
- `src/app.py` - Gradio UI (HuggingFace Spaces)

**Break Conditions**: Judge approval, token budget (50K max), or max iterations (default 10).

## Configuration

Settings via pydantic-settings from `.env`:
- `LLM_PROVIDER`: "openai" or "anthropic"
- `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`: LLM keys
- `NCBI_API_KEY`: Optional, for higher PubMed rate limits
- `MAX_ITERATIONS`: 1-50, default 10
- `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR

## Exception Hierarchy

```
DeepCriticalError (base)
├── SearchError
│   └── RateLimitError
├── JudgeError
└── ConfigurationError
```

## Testing

- **TDD**: Write tests first in `tests/unit/`, implement in `src/`
- **Markers**: `unit`, `integration`, `slow`
- **Mocking**: `respx` for httpx, `pytest-mock` for general mocking
- **Fixtures**: `tests/conftest.py` has `mock_httpx_client`, `mock_llm_response`

## Git Workflow

- `main`: Production-ready
- `dev`: Development
- `vcms-dev`: HuggingFace Spaces sandbox
- Remote `origin`: GitHub
- Remote `huggingface-upstream`: HuggingFace Spaces

GEMINI.md
ADDED
@@ -0,0 +1,54 @@

# DeepCritical Context

## Project Overview
**DeepCritical** is an AI-native Medical Drug Repurposing Research Agent.
**Goal:** To accelerate the discovery of new uses for existing drugs by intelligently searching biomedical literature (PubMed), evaluating evidence, and hypothesizing potential applications.

**Architecture:**
The project follows a **Vertical Slice Architecture** (Search -> Judge -> Orchestrator) and adheres to **Strict TDD** (Test-Driven Development).

**Current Status:**
- **Phases 1-8:** COMPLETE. Foundation, Search, Judge, UI, Orchestrator, Embeddings, Hypothesis, Report.
- **Phase 9 (Source Cleanup):** COMPLETE. Removed DuckDuckGo web search (unreliable for scientific research).
- **Phases 10-11:** PLANNED. ClinicalTrials.gov and bioRxiv integration.

## Tech Stack & Tooling
- **Language:** Python 3.11 (Pinned)
- **Package Manager:** `uv` (Rust-based, extremely fast)
- **Frameworks:** `pydantic`, `pydantic-ai`, `httpx`, `gradio`
- **Vector DB:** `chromadb` with `sentence-transformers` for semantic search
- **Testing:** `pytest`, `pytest-asyncio`, `respx` (for mocking)
- **Quality:** `ruff` (linting/formatting), `mypy` (strict type checking), `pre-commit`

## Building & Running
We use a `Makefile` to standardize developer commands.

| Command | Description |
| :--- | :--- |
| `make install` | Install dependencies and pre-commit hooks. |
| `make test` | Run unit tests. |
| `make lint` | Run Ruff linter. |
| `make format` | Run Ruff formatter. |
| `make typecheck` | Run Mypy static type checker. |
| `make check` | **The Golden Gate:** Runs lint, typecheck, and test. Must pass before committing. |
| `make clean` | Clean up cache and artifacts. |

## Directory Structure
- `src/`: Source code
  - `utils/`: Shared utilities (`config.py`, `exceptions.py`, `models.py`)
  - `tools/`: Search tools (`pubmed.py`, `base.py`, `search_handler.py`)
  - `services/`: Services (`embeddings.py` - ChromaDB vector store)
  - `agents/`: Magentic multi-agent mode agents
  - `agent_factory/`: Agent definitions (judges, prompts)
- `tests/`: Test suite
  - `unit/`: Isolated unit tests (Mocked)
  - `integration/`: Real API tests (Marked as slow/integration)
- `docs/`: Documentation and Implementation Specs
- `examples/`: Working demos for each phase

## Development Conventions
1. **Strict TDD:** Write failing tests in `tests/unit/` *before* implementing logic in `src/`.
2. **Type Safety:** All code must pass `mypy --strict`. Use Pydantic models for data exchange.
3. **Linting:** Zero tolerance for Ruff errors.
4. **Mocking:** Use `respx` or `unittest.mock` for all external API calls in unit tests. Real calls go in `tests/integration`.
5. **Vertical Slices:** Implement features end-to-end (Search -> Judge -> UI) rather than layer-by-layer.

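
The following is one way a mocked unit test could look under conventions 1 and 4. `fetch_pmids` is hypothetical; the project's real search code lives in `src/tools/pubmed.py` and may differ. The client is typed as `Any` to keep the sketch short. The ESearch URL and its `db`, `term`, `retmax`, and `retmode` parameters come from NCBI's public E-utilities interface. The test uses only `unittest.mock`, so it never touches the network. `respx` could do the same job by intercepting the httpx request instead.

```python
import asyncio
from typing import Any
from unittest.mock import AsyncMock, MagicMock

# NCBI's public ESearch endpoint (E-utilities).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


async def fetch_pmids(client: Any, term: str, max_results: int = 10) -> list[str]:
    """Hypothetical helper: return PubMed IDs matching `term` via ESearch.

    `client` is expected to behave like httpx.AsyncClient (an awaitable `get`).
    """
    response = await client.get(
        ESEARCH_URL,
        params={"db": "pubmed", "term": term, "retmax": max_results, "retmode": "json"},
    )
    response.raise_for_status()
    ids: list[str] = response.json()["esearchresult"]["idlist"]
    return ids


def test_fetch_pmids_parses_idlist_without_network() -> None:
    # Arrange: a fake response shaped like ESearch's JSON output.
    response = MagicMock()
    response.json.return_value = {"esearchresult": {"idlist": ["111", "222"]}}
    client = MagicMock()
    client.get = AsyncMock(return_value=response)

    # Act: asyncio.run keeps the test synchronous (no pytest-asyncio needed).
    ids = asyncio.run(fetch_pmids(client, "metformin long covid", max_results=2))

    # Assert: parsed IDs, plus the exact request the helper would have sent.
    assert ids == ["111", "222"]
    expected_params = {
        "db": "pubmed",
        "term": "metformin long covid",
        "retmax": 2,
        "retmode": "json",
    }
    client.get.assert_awaited_once_with(ESEARCH_URL, params=expected_params)
```

Under pytest, the `test_` function is collected automatically. The same helper could also be exercised against the real API from `tests/integration`.
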