Spaces: DataQuests/DeepCritical

VibecoderMcSwaggins committed 11 days ago

Commit 5e7604a (1 parent: 521d97d)

docs: add AI agent context files for team collaboration

Added AGENTS.md, CLAUDE.md, GEMINI.md - AI-assisted scoping documents
that help AI coding tools understand the project architecture,
coding standards, and development workflow.

Requested by @Tonic for team acceleration.

Files changed (3)

AGENTS.md +102 -0
CLAUDE.md +95 -0
GEMINI.md +54 -0

AGENTS.md ADDED

	@@ -0,0 +1,102 @@

+# AGENTS.md
+This file provides guidance to AI agents when working with code in this repository.
+## Project Overview
+DeepCritical is an AI-native drug repurposing research agent for a HuggingFace hackathon. It uses a search-and-judge loop to autonomously search biomedical databases (PubMed) and synthesize evidence for queries like "What existing drugs might help treat long COVID fatigue?"
+## Development Commands
+```bash
+# Install all dependencies (including dev)
+make install   # or: uv sync --all-extras && uv run pre-commit install
+# Run all quality checks (lint + typecheck + test) - MUST PASS BEFORE COMMIT
+make check
+# Individual commands
+make test        # uv run pytest tests/unit/ -v
+make lint        # uv run ruff check src tests
+make format      # uv run ruff format src tests
+make typecheck   # uv run mypy src
+make test-cov    # uv run pytest --cov=src --cov-report=term-missing
+# Run single test
+uv run pytest tests/unit/utils/test_config.py::TestSettings::test_default_max_iterations -v
+# Integration tests (real APIs)
+uv run pytest -m integration
+```
+## Architecture
+**Pattern**: Search-and-judge loop with multi-tool orchestration.
+```
+User Question → Orchestrator
+    ↓
+Search Loop:
+  1. Query PubMed
+  2. Gather evidence
+  3. Judge quality ("Do we have enough?")
+  4. If NO → Refine query, search more
+  5. If YES → Synthesize findings
+    ↓
+Research Report with Citations
+```
+**Key Components**:
+- `src/orchestrator.py` - Main agent loop
+- `src/tools/pubmed.py` - PubMed E-utilities search
+- `src/tools/search_handler.py` - Scatter-gather orchestration
+- `src/services/embeddings.py` - Semantic search & deduplication (ChromaDB)
+- `src/agent_factory/judges.py` - LLM-based evidence assessment
+- `src/agents/` - Magentic multi-agent mode (SearchAgent, JudgeAgent, etc.)
+- `src/utils/config.py` - Pydantic Settings (loads from `.env`)
+- `src/utils/models.py` - Evidence, Citation, SearchResult models
+- `src/utils/exceptions.py` - Exception hierarchy
+- `src/app.py` - Gradio UI (HuggingFace Spaces)
+**Break Conditions**: The loop stops on judge approval, when the token budget (50K max) is exhausted, or after the maximum number of iterations (default 10).
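The loop and its three break conditions can be sketched roughly as follows. This is a minimal illustration, not the contents of `src/orchestrator.py`: `LoopResult`, `run_loop`, and the `search`/`judge` callables are hypothetical names, and the judge is assumed to return an approval flag, a refined query, and a token cost.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical names throughout; only the limits (50K tokens, 10 iterations)
# come from the break conditions stated in AGENTS.md.
MAX_TOKENS = 50_000


@dataclass
class LoopResult:
    evidence: list[str] = field(default_factory=list)
    iterations: int = 0
    tokens_used: int = 0
    stop_reason: str = ""


def run_loop(
    question: str,
    search: Callable[[str], list[str]],
    judge: Callable[[str, list[str]], tuple[bool, str, int]],
    max_iterations: int = 10,
    max_tokens: int = MAX_TOKENS,
) -> LoopResult:
    """Search, judge, and refine until one of the three break conditions fires."""
    result = LoopResult()
    query = question
    while True:
        if result.iterations >= max_iterations:
            result.stop_reason = "max_iterations"
            break
        if result.tokens_used >= max_tokens:
            result.stop_reason = "token_budget"
            break
        result.iterations += 1
        # Steps 1-2: query the source and gather evidence.
        result.evidence.extend(search(query))
        # Step 3: the judge reports (enough?, refined query, tokens spent).
        enough, refined, cost = judge(question, result.evidence)
        result.tokens_used += cost
        if enough:
            # Step 5: hand off to synthesis.
            result.stop_reason = "judge_approved"
            break
        # Step 4: refine the query and search again.
        query = refined
    return result


def fake_search(query: str) -> list[str]:
    return [f"paper about {query}"]


def fake_judge(question: str, evidence: list[str]) -> tuple[bool, str, int]:
    # Approves once three pieces of evidence exist; charges 1,000 tokens per call.
    return len(evidence) >= 3, question + " mechanism", 1_000


result = run_loop("long COVID fatigue", fake_search, fake_judge)
print(result.stop_reason, result.iterations, result.tokens_used)
```

Checking the limits at the top of each pass means a run never starts a new search once the budget or iteration cap is already reached; the real orchestrator may order these checks differently.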
+## Configuration
+Settings via pydantic-settings from `.env`:
+- `LLM_PROVIDER`: "openai" or "anthropic"
+- `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`: LLM keys
+- `NCBI_API_KEY`: Optional, for higher PubMed rate limits
+- `MAX_ITERATIONS`: 1-50, default 10
+- `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR
+## Exception Hierarchy
+```
+DeepCriticalError (base)
+├── SearchError
+│   └── RateLimitError
+├── JudgeError
+└── ConfigurationError
+```
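Expressed as Python classes, the tree above might look like the sketch below. The class names are taken from the tree; the docstrings and the `recovery_action` helper are assumptions added for illustration and are not taken from `src/utils/exceptions.py`.

```python
class DeepCriticalError(Exception):
    """Base class for project errors."""


class SearchError(DeepCriticalError):
    """Assumed meaning: a search tool failed."""


class RateLimitError(SearchError):
    """Assumed meaning: a search backend throttled the request."""


class JudgeError(DeepCriticalError):
    """Assumed meaning: evidence assessment failed."""


class ConfigurationError(DeepCriticalError):
    """Assumed meaning: settings are missing or invalid."""


def recovery_action(exc: DeepCriticalError) -> str:
    """Hypothetical helper: pick a response, matching the most specific class first."""
    try:
        raise exc
    except RateLimitError:
        return "back off and retry"
    except SearchError:
        return "skip this source"
    except DeepCriticalError:
        return "abort the run"


print(recovery_action(RateLimitError("HTTP 429")))
print(recovery_action(JudgeError("malformed verdict")))
```

Because `RateLimitError` subclasses `SearchError`, its `except` clause has to come first; a handler written only for `SearchError` would still catch rate-limit failures.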
+## Testing
+- **TDD**: Write tests first in `tests/unit/`, implement in `src/`
+- **Markers**: `unit`, `integration`, `slow`
+- **Mocking**: `respx` for httpx, `pytest-mock` for general mocking
+- **Fixtures**: `tests/conftest.py` has `mock_httpx_client`, `mock_llm_response`
+## Coding Standards
+- Python 3.11+, strict mypy, ruff (100-char lines)
+- Type all functions, use Pydantic models for data
+- Use `structlog` for logging, not print
+- Conventional commits: `feat(scope):`, `fix:`, `docs:`
+## Git Workflow
+- `main`: Production-ready
+- `dev`: Development
+- `vcms-dev`: HuggingFace Spaces sandbox
+- Remote `origin`: GitHub
+- Remote `huggingface-upstream`: HuggingFace Spaces

CLAUDE.md ADDED

	@@ -0,0 +1,95 @@

+# CLAUDE.md
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+## Project Overview
+DeepCritical is an AI-native drug repurposing research agent for a HuggingFace hackathon. It uses a search-and-judge loop to autonomously search biomedical databases (PubMed) and synthesize evidence for queries like "What existing drugs might help treat long COVID fatigue?"
+## Development Commands
+```bash
+# Install all dependencies (including dev)
+make install   # or: uv sync --all-extras && uv run pre-commit install
+# Run all quality checks (lint + typecheck + test) - MUST PASS BEFORE COMMIT
+make check
+# Individual commands
+make test        # uv run pytest tests/unit/ -v
+make lint        # uv run ruff check src tests
+make format      # uv run ruff format src tests
+make typecheck   # uv run mypy src
+make test-cov    # uv run pytest --cov=src --cov-report=term-missing
+# Run single test
+uv run pytest tests/unit/utils/test_config.py::TestSettings::test_default_max_iterations -v
+# Integration tests (real APIs)
+uv run pytest -m integration
+```
+## Architecture
+**Pattern**: Search-and-judge loop with multi-tool orchestration.
+```
+User Question → Orchestrator
+    ↓
+Search Loop:
+  1. Query PubMed
+  2. Gather evidence
+  3. Judge quality ("Do we have enough?")
+  4. If NO → Refine query, search more
+  5. If YES → Synthesize findings
+    ↓
+Research Report with Citations
+```
+**Key Components**:
+- `src/orchestrator.py` - Main agent loop
+- `src/tools/pubmed.py` - PubMed E-utilities search
+- `src/tools/search_handler.py` - Scatter-gather orchestration
+- `src/services/embeddings.py` - Semantic search & deduplication (ChromaDB)
+- `src/agent_factory/judges.py` - LLM-based evidence assessment
+- `src/agents/` - Magentic multi-agent mode (SearchAgent, JudgeAgent, etc.)
+- `src/utils/config.py` - Pydantic Settings (loads from `.env`)
+- `src/utils/models.py` - Evidence, Citation, SearchResult models
+- `src/utils/exceptions.py` - Exception hierarchy
+- `src/app.py` - Gradio UI (HuggingFace Spaces)
+**Break Conditions**: The loop stops on judge approval, when the token budget (50K max) is exhausted, or after the maximum number of iterations (default 10).
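As a rough illustration of the scatter-gather orchestration listed for `src/tools/search_handler.py`, the sketch below fans one query out to several async tools and collects results and failures separately. It assumes the tools are plain coroutines returning lists of strings; `scatter_gather`, `fake_pubmed`, and `flaky_source` are hypothetical names, not the project's API.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical tool signature: an async callable from query to result strings.
SearchFn = Callable[[str], Awaitable[list[str]]]


async def scatter_gather(
    query: str, tools: dict[str, SearchFn]
) -> tuple[list[str], dict[str, str]]:
    """Run every tool concurrently; return merged results and per-tool errors."""
    names = list(tools)
    # return_exceptions=True keeps one failing tool from cancelling the rest.
    outcomes = await asyncio.gather(
        *(tools[name](query) for name in names), return_exceptions=True
    )
    results: list[str] = []
    errors: dict[str, str] = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            errors[name] = str(outcome)
        else:
            results.extend(outcome)
    return results, errors


async def fake_pubmed(query: str) -> list[str]:
    return [f"PubMed hit for {query!r}"]


async def flaky_source(query: str) -> list[str]:
    raise RuntimeError("upstream timeout")


results, errors = asyncio.run(
    scatter_gather("metformin", {"pubmed": fake_pubmed, "flaky": flaky_source})
)
print(results)
print(errors)
```

Collecting errors by tool name rather than raising lets the caller still judge whatever evidence did arrive.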
+## Configuration
+Settings via pydantic-settings from `.env`:
+- `LLM_PROVIDER`: "openai" or "anthropic"
+- `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`: LLM keys
+- `NCBI_API_KEY`: Optional, for higher PubMed rate limits
+- `MAX_ITERATIONS`: 1-50, default 10
+- `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR
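The settings listed above could be loaded and validated along these lines. This sketch deliberately swaps pydantic-settings for a standard-library dataclass so that it runs without third-party packages; it reads an environment mapping and does not parse `.env`. The `Settings` fields and `load_settings` are hypothetical, and the defaults for `LLM_PROVIDER` and `LOG_LEVEL` are assumptions (only the `MAX_ITERATIONS` default of 10 and its 1-50 range are stated in the file).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_PROVIDERS = {"openai", "anthropic"}
VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


@dataclass(frozen=True)
class Settings:
    llm_provider: str = "openai"      # default is an assumption
    max_iterations: int = 10          # documented default
    log_level: str = "INFO"           # default is an assumption
    ncbi_api_key: Optional[str] = None  # optional per the list above

    def __post_init__(self) -> None:
        if self.llm_provider not in VALID_PROVIDERS:
            raise ValueError(f"LLM_PROVIDER must be one of {sorted(VALID_PROVIDERS)}")
        if not 1 <= self.max_iterations <= 50:
            raise ValueError("MAX_ITERATIONS must be between 1 and 50")
        if self.log_level not in VALID_LOG_LEVELS:
            raise ValueError(f"LOG_LEVEL must be one of {sorted(VALID_LOG_LEVELS)}")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment-style mapping, applying defaults."""
    return Settings(
        llm_provider=env.get("LLM_PROVIDER", "openai"),
        max_iterations=int(env.get("MAX_ITERATIONS", "10")),
        log_level=env.get("LOG_LEVEL", "INFO"),
        ncbi_api_key=env.get("NCBI_API_KEY"),
    )


print(load_settings({"LLM_PROVIDER": "anthropic", "MAX_ITERATIONS": "5"}))
```

Passing the mapping explicitly keeps tests independent of the real process environment.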
+## Exception Hierarchy
+```
+DeepCriticalError (base)
+├── SearchError
+│   └── RateLimitError
+├── JudgeError
+└── ConfigurationError
+```
+## Testing
+- **TDD**: Write tests first in `tests/unit/`, implement in `src/`
+- **Markers**: `unit`, `integration`, `slow`
+- **Mocking**: `respx` for httpx, `pytest-mock` for general mocking
+- **Fixtures**: `tests/conftest.py` has `mock_httpx_client`, `mock_llm_response`
+## Git Workflow
+- `main`: Production-ready
+- `dev`: Development
+- `vcms-dev`: HuggingFace Spaces sandbox
+- Remote `origin`: GitHub
+- Remote `huggingface-upstream`: HuggingFace Spaces

GEMINI.md ADDED

	@@ -0,0 +1,54 @@

+# DeepCritical Context
+## Project Overview
+**DeepCritical** is an AI-native Medical Drug Repurposing Research Agent.
+**Goal:** To accelerate the discovery of new uses for existing drugs by intelligently searching biomedical literature (PubMed), evaluating evidence, and hypothesizing potential applications.
+**Architecture:**
+The project follows a **Vertical Slice Architecture** (Search -> Judge -> Orchestrator) and adheres to **Strict TDD** (Test-Driven Development).
+**Current Status:**
+- **Phases 1-8:** COMPLETE. Foundation, Search, Judge, UI, Orchestrator, Embeddings, Hypothesis, Report.
+- **Phase 9 (Source Cleanup):** COMPLETE. Removed DuckDuckGo web search (unreliable for scientific research).
+- **Phases 10-11:** PLANNED. ClinicalTrials.gov and bioRxiv integration.
+## Tech Stack & Tooling
+- **Language:** Python 3.11 (Pinned)
+- **Package Manager:** `uv` (Rust-based, extremely fast)
+- **Frameworks:** `pydantic`, `pydantic-ai`, `httpx`, `gradio`
+- **Vector DB:** `chromadb` with `sentence-transformers` for semantic search
+- **Testing:** `pytest`, `pytest-asyncio`, `respx` (for mocking)
+- **Quality:** `ruff` (linting/formatting), `mypy` (strict type checking), `pre-commit`
+## Building & Running
+We use a `Makefile` to standardize developer commands.
+| Command | Description |
+| :--- | :--- |
+| `make install` | Install dependencies and pre-commit hooks. |
+| `make test` | Run unit tests. |
+| `make lint` | Run Ruff linter. |
+| `make format` | Run Ruff formatter. |
+| `make typecheck` | Run Mypy static type checker. |
+| `make check` | **The Golden Gate:** Runs lint, typecheck, and test. Must pass before committing. |
+| `make clean` | Clean up cache and artifacts. |
+## Directory Structure
+- `src/`: Source code
+  - `utils/`: Shared utilities (`config.py`, `exceptions.py`, `models.py`)
+  - `tools/`: Search tools (`pubmed.py`, `base.py`, `search_handler.py`)
+  - `services/`: Services (`embeddings.py` - ChromaDB vector store)
+  - `agents/`: Magentic multi-agent mode agents
+  - `agent_factory/`: Agent definitions (judges, prompts)
+- `tests/`: Test suite
+  - `unit/`: Isolated unit tests (Mocked)
+  - `integration/`: Real API tests (Marked as slow/integration)
+- `docs/`: Documentation and Implementation Specs
+- `examples/`: Working demos for each phase
+## Development Conventions
+1.  **Strict TDD:** Write failing tests in `tests/unit/` *before* implementing logic in `src/`.
+2.  **Type Safety:** All code must pass `mypy --strict`. Use Pydantic models for data exchange.
+3.  **Linting:** Zero tolerance for Ruff errors.
+4.  **Mocking:** Use `respx` or `unittest.mock` for all external API calls in unit tests. Real calls go in `tests/integration`.
+5.  **Vertical Slices:** Implement features end-to-end (Search -> Judge -> UI) rather than layer-by-layer.
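Conventions 1 and 4 can be illustrated with a small unit test that replaces the HTTP client with a `unittest.mock.Mock`, so no network call is made. The function under test, `count_results`, and the `example.invalid` URL are hypothetical stand-ins, not code from this repository; the client is only assumed to expose an httpx-style `get(...)` returning a response with `raise_for_status()` and `json()`.

```python
from unittest.mock import Mock


def count_results(client, term: str) -> int:
    """Hypothetical function under test: ask a search endpoint for a hit count."""
    response = client.get("https://example.invalid/search", params={"term": term})
    response.raise_for_status()
    return int(response.json()["count"])


def test_count_results_uses_mocked_client() -> None:
    # Arrange: the mock stands in for the external API.
    client = Mock()
    client.get.return_value.json.return_value = {"count": "7"}

    # Act and assert on the return value.
    assert count_results(client, "metformin") == 7

    # Assert the external call was made exactly once with the expected arguments.
    client.get.assert_called_once_with(
        "https://example.invalid/search", params={"term": "metformin"}
    )


test_count_results_uses_mocked_client()
```

Under strict TDD the test would be written first and fail until `count_results` exists; the equivalent test against a real endpoint would live in `tests/integration`.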