Contributing to DeepCritical

Thank you for your interest in contributing to DeepCritical! This guide will help you get started.

Git Workflow

  • main: Production-ready (GitHub)
  • dev: Development integration (GitHub)
  • Use feature branches: yourname-dev
  • NEVER push directly to main or dev on HuggingFace
  • GitHub is source of truth; HuggingFace is for deployment

Getting Started

  1. Fork the repository on GitHub

  2. Clone your fork:

    git clone https://github.com/yourusername/GradioDemo.git
    cd GradioDemo
    
  3. Install dependencies:

    make install
    
  4. Create a feature branch:

    git checkout -b yourname-feature-name
    
  5. Make your changes following the guidelines below

  6. Run checks:

    make check
    
  7. Commit and push:

    git commit -m "Description of changes"
    git push origin yourname-feature-name
    
  8. Create a pull request on GitHub

Development Commands

make install      # Install dependencies + pre-commit
make check        # Lint + typecheck + test (MUST PASS)
make test         # Run unit tests
make lint         # Run ruff
make format       # Format with ruff
make typecheck    # Run mypy
make test-cov     # Test with coverage
make docs-build   # Build documentation
make docs-serve   # Serve documentation locally

Code Style & Conventions

Type Safety

  • ALWAYS use type hints for all function parameters and return types
  • Use mypy --strict compliance (no Any unless absolutely necessary)
  • Use TYPE_CHECKING imports for circular dependencies:
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from src.services.embeddings import EmbeddingService

Pydantic Models

  • All data exchange uses Pydantic models (src/utils/models.py)
  • Models are frozen (model_config = {"frozen": True}) for immutability
  • Use Field() with descriptions for all model fields
  • Validate with ge=, le=, min_length=, max_length= constraints
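
For example, a minimal sketch of a frozen model following these conventions (the field names here are illustrative, not the project's actual schema in src/utils/models.py):

from pydantic import BaseModel, Field


class EvidenceExample(BaseModel):
    """Illustrative frozen model; see src/utils/models.py for the real definitions."""

    model_config = {"frozen": True}

    title: str = Field(..., min_length=1, description="Title of the source document")
    url: str = Field(..., description="Canonical URL of the source")
    relevance: float = Field(..., ge=0.0, le=1.0, description="Relevance score in [0, 1]")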

Async Patterns

  • ALL I/O operations must be async (async def, await)
  • Use asyncio.gather() for parallel operations
  • CPU-bound work (embeddings, parsing) must use run_in_executor():
loop = asyncio.get_running_loop()
result = await loop.run_in_executor(None, cpu_bound_function, args)
  • Never block the event loop with synchronous I/O
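
A minimal sketch of running several search tools in parallel with asyncio.gather() (the tool objects and their search signature are assumed for illustration):

import asyncio
from typing import Any


async def search_all(query: str, tools: list[Any]) -> list[Any]:
    # Run every tool's search concurrently; return_exceptions=True keeps one
    # failing tool from cancelling the others.
    results = await asyncio.gather(
        *(tool.search(query) for tool in tools),
        return_exceptions=True,
    )
    return [r for r in results if not isinstance(r, BaseException)]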

Linting

  • Ruff with 100-char line length
  • Ignore rules documented in pyproject.toml:
    • PLR0913: Too many arguments (agents need many params)
    • PLR0912: Too many branches (complex orchestrator logic)
    • PLR0911: Too many return statements (complex agent logic)
    • PLR2004: Magic values (statistical constants)
    • PLW0603: Global statement (singleton pattern)
    • PLC0415: Lazy imports for optional dependencies

Pre-commit

  • Run make check before committing
  • Must pass: lint + typecheck + test-cov
  • Pre-commit hooks installed via make install
  • CRITICAL: Run the full pre-commit checks before opening a non-draft PR; otherwise Obstacle is the Way will lose his mind

Error Handling & Logging

Exception Hierarchy

Use custom exception hierarchy (src/utils/exceptions.py):

  • DeepCriticalError (base)
  • SearchError → RateLimitError
  • JudgeError
  • ConfigurationError
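
A sketch of what that hierarchy looks like (the real classes live in src/utils/exceptions.py and may carry extra fields):

class DeepCriticalError(Exception):
    """Base class for all DeepCritical errors."""


class SearchError(DeepCriticalError):
    """Raised when a search tool fails."""


class RateLimitError(SearchError):
    """Raised when an external API rate limit is hit."""


class JudgeError(DeepCriticalError):
    """Raised when evidence assessment fails."""


class ConfigurationError(DeepCriticalError):
    """Raised when required configuration is missing or invalid."""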

Error Handling Rules

  • Always chain exceptions: raise SearchError(...) from e
  • Log errors with context using structlog:
logger.error("Operation failed", error=str(e), context=value)
  • Never silently swallow exceptions
  • Provide actionable error messages

Logging

  • Use structlog for all logging (NOT print or logging)
  • Import: import structlog; logger = structlog.get_logger()
  • Log with structured data: logger.info("event", key=value)
  • Use appropriate levels: DEBUG, INFO, WARNING, ERROR

Logging Examples

logger.info("Starting search", query=query, tools=[t.name for t in tools])
logger.warning("Search tool failed", tool=tool.name, error=str(result))
logger.error("Assessment failed", error=str(e))

Error Chaining

Always preserve exception context:

try:
    result = await api_call()
except httpx.HTTPError as e:
    raise SearchError(f"API call failed: {e}") from e

Testing Requirements

Test Structure

  • Unit tests in tests/unit/ (mocked, fast)
  • Integration tests in tests/integration/ (real APIs, marked @pytest.mark.integration)
  • Use markers: unit, integration, slow

Mocking

  • Use respx for httpx mocking
  • Use pytest-mock for general mocking
  • Mock LLM calls in unit tests (use MockJudgeHandler)
  • Fixtures in tests/conftest.py: mock_httpx_client, mock_llm_response
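
As an illustration, a unit test that stubs an httpx call with respx (the URL and payload are made up for the example):

import httpx
import pytest
import respx


@pytest.mark.unit
@respx.mock
async def test_search_returns_parsed_results() -> None:
    # Stub the upstream API so no real network call is made.
    respx.get("https://example.org/api/search").mock(
        return_value=httpx.Response(200, json={"results": []})
    )
    async with httpx.AsyncClient() as client:
        response = await client.get("https://example.org/api/search")
    assert response.status_code == 200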

TDD Workflow

  1. Write failing test in tests/unit/
  2. Implement in src/
  3. Ensure test passes
  4. Run make check (lint + typecheck + test)

Test Examples

@pytest.mark.unit
async def test_pubmed_search(mock_httpx_client):
    tool = PubMedTool()
    results = await tool.search("metformin", max_results=5)
    assert len(results) > 0
    assert all(isinstance(r, Evidence) for r in results)

@pytest.mark.integration
async def test_real_pubmed_search():
    tool = PubMedTool()
    results = await tool.search("metformin", max_results=3)
    assert len(results) <= 3

Test Coverage

  • Run make test-cov for coverage report
  • Aim for >80% coverage on critical paths
  • Exclude: __init__.py, TYPE_CHECKING blocks

Implementation Patterns

Search Tools

All tools implement SearchTool protocol (src/tools/base.py):

  • Must have name property
  • Must implement async def search(query, max_results) -> list[Evidence]
  • Use @retry decorator from tenacity for resilience
  • Rate limiting: Implement _rate_limit() for APIs with limits (e.g., PubMed)
  • Error handling: Raise SearchError or RateLimitError on failures

Example pattern:

from tenacity import retry, stop_after_attempt, wait_exponential

from src.utils.models import Evidence  # Evidence model (see src/utils/models.py)


class MySearchTool:
    @property
    def name(self) -> str:
        return "mytool"

    @retry(stop=stop_after_attempt(3), wait=wait_exponential(...))
    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        # Implementation: query the API and build Evidence objects
        evidence_list: list[Evidence] = []
        return evidence_list

Judge Handlers

  • Implement JudgeHandlerProtocol (async def assess(question, evidence) -> JudgeAssessment)
  • Use pydantic-ai Agent with output_type=JudgeAssessment
  • System prompts in src/prompts/judge.py
  • Support fallback handlers: MockJudgeHandler, HFInferenceJudgeHandler
  • Always return valid JudgeAssessment (never raise exceptions)
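
A rough sketch of the shape of a judge handler under these rules (the model name is a placeholder, and the imports assume the project's model locations; see the real handlers for the fallback logic):

from pydantic_ai import Agent

from src.utils.models import Evidence, JudgeAssessment  # assumed locations


class MyJudgeHandler:
    """Illustrative judge handler sketch."""

    def __init__(self) -> None:
        # output_type makes pydantic-ai validate the LLM response into JudgeAssessment.
        self._agent = Agent("openai:gpt-4o", output_type=JudgeAssessment)

    async def assess(self, question: str, evidence: list[Evidence]) -> JudgeAssessment:
        result = await self._agent.run(f"Question: {question}\nEvidence: {evidence}")
        # Real handlers wrap this in try/except and return a safe default
        # JudgeAssessment instead of raising.
        return result.output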

Agent Factory Pattern

  • Use factory functions for creating agents (src/agent_factory/)
  • Lazy initialization for optional dependencies (e.g., embeddings, Modal)
  • Check requirements before initialization:
def check_magentic_requirements() -> None:
    if not settings.has_openai_key:
        raise ConfigurationError("Magentic requires OpenAI")

State Management

  • Magentic Mode: Use ContextVar for thread-safe state (src/agents/state.py)
  • Simple Mode: Pass state via function parameters
  • Never use global mutable state (except singletons via @lru_cache)
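
A minimal sketch of the ContextVar pattern (the variable name is illustrative; the real state lives in src/agents/state.py):

from contextvars import ContextVar

# Hypothetical per-run state; each asyncio task sees its own value.
current_query: ContextVar[str | None] = ContextVar("current_query", default=None)


def set_query(query: str) -> None:
    current_query.set(query)


def get_query() -> str | None:
    return current_query.get()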

Singleton Pattern

Use @lru_cache(maxsize=1) for singletons:

from functools import lru_cache

@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService:
    return EmbeddingService()

  • Lazy initialization to avoid requiring dependencies at import time

Code Quality & Documentation

Docstrings

  • Google-style docstrings for all public functions
  • Include Args, Returns, Raises sections
  • Use type hints in docstrings only if needed for clarity

Example:

async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
    """Search PubMed and return evidence.

    Args:
        query: The search query string
        max_results: Maximum number of results to return

    Returns:
        List of Evidence objects

    Raises:
        SearchError: If the search fails
        RateLimitError: If we hit rate limits
    """

Code Comments

  • Explain WHY, not WHAT
  • Document non-obvious patterns (e.g., why requests not httpx for ClinicalTrials)
  • Mark critical sections: # CRITICAL: ...
  • Document rate limiting rationale
  • Explain async patterns when non-obvious

Prompt Engineering & Citation Validation

Judge Prompts

  • System prompt in src/prompts/judge.py
  • Format evidence with truncation (1500 chars per item)
  • Handle empty evidence case separately
  • Always request structured JSON output
  • Use format_user_prompt() and format_empty_evidence_prompt() helpers

Hypothesis Prompts

  • Use diverse evidence selection (MMR algorithm)
  • Sentence-aware truncation (truncate_at_sentence())
  • Format: Drug → Target → Pathway → Effect
  • System prompt emphasizes mechanistic reasoning
  • Use format_hypothesis_prompt() with embeddings for diversity

Report Prompts

  • Include full citation details for validation
  • Use diverse evidence selection (n=20)
  • CRITICAL: Emphasize citation validation rules
  • Format hypotheses with support/contradiction counts
  • System prompt includes explicit JSON structure requirements

Citation Validation

  • ALWAYS validate references before returning reports
  • Use validate_references() from src/utils/citation_validator.py
  • Remove hallucinated citations (URLs not in evidence)
  • Log warnings for removed citations
  • Never trust LLM-generated citations without validation
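
The core idea, sketched with a hypothetical signature (the real implementation is validate_references() in src/utils/citation_validator.py):

import structlog

logger = structlog.get_logger()


def filter_references(
    references: list[dict[str, str]], evidence_urls: set[str]
) -> list[dict[str, str]]:
    # Keep only references whose URL exactly matches a provided evidence URL;
    # everything else is treated as a hallucination and dropped.
    valid = [ref for ref in references if ref.get("url") in evidence_urls]
    removed = len(references) - len(valid)
    if removed:
        logger.warning("Removed hallucinated citations", count=removed)
    return valid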

Citation Validation Rules

  1. Every reference URL must EXACTLY match a provided evidence URL
  2. Do NOT invent, fabricate, or hallucinate any references
  3. Do NOT modify paper titles, authors, dates, or URLs
  4. If unsure about a citation, OMIT it rather than guess
  5. Copy URLs exactly as provided - do not create similar-looking URLs

Evidence Selection

  • Use select_diverse_evidence() for MMR-based selection
  • Balance relevance vs diversity (lambda=0.7 default)
  • Sentence-aware truncation preserves meaning
  • Limit evidence per prompt to avoid context overflow
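
For reference, the MMR idea behind select_diverse_evidence(), sketched over plain embedding vectors (a simplified version, not the project implementation):

import numpy as np


def mmr_select(
    query_vec: np.ndarray, doc_vecs: np.ndarray, k: int, lambda_: float = 0.7
) -> list[int]:
    """Return indices of k vectors balancing relevance against diversity."""

    def cos(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cos(query_vec, doc_vecs[i])
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_ * relevance - (1 - lambda_) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected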

MCP Integration

MCP Tools

  • Functions in src/mcp_tools.py for Claude Desktop
  • Full type hints required
  • Google-style docstrings with Args/Returns sections
  • Formatted string returns (markdown)
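
A sketch of what such a function looks like (the function name and behavior are hypothetical):

async def summarize_evidence(query: str, max_results: int = 10) -> str:
    """Search for evidence about a query and return a markdown summary.

    Args:
        query: The research question to search for.
        max_results: Maximum number of evidence items to include.

    Returns:
        A markdown-formatted string summarizing the evidence found.
    """
    # Real tools in src/mcp_tools.py call the search pipeline here; this
    # placeholder only shows the expected shape of the return value.
    return f"## Evidence for: {query}\n\n(no results in this sketch)"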

Gradio MCP Server

  • Enable with mcp_server=True in demo.launch()
  • Endpoint: /gradio_api/mcp/
  • Use ssr_mode=False to fix hydration issues in HF Spaces
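
Roughly, the launch call looks like this (assuming a gr.Blocks app named demo):

import gradio as gr

with gr.Blocks() as demo:
    ...  # UI components

demo.launch(
    mcp_server=True,   # expose the app's tools over MCP
    ssr_mode=False,    # avoid hydration issues on HF Spaces
)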

Common Pitfalls

  1. Blocking the event loop: Never use sync I/O in async functions
  2. Missing type hints: All functions must have complete type annotations
  3. Hallucinated citations: Always validate references
  4. Global mutable state: Use ContextVar or pass via parameters
  5. Import errors: Lazy-load optional dependencies (magentic, modal, embeddings)
  6. Rate limiting: Always implement for external APIs
  7. Error chaining: Always use from e when raising exceptions

Key Principles

  1. Type Safety First: All code must pass mypy --strict
  2. Async Everything: All I/O must be async
  3. Test-Driven: Write tests before implementation
  4. No Hallucinations: Validate all citations
  5. Graceful Degradation: Support free tier (HF Inference) when no API keys
  6. Lazy Loading: Don't require optional dependencies at import time
  7. Structured Logging: Use structlog, never print()
  8. Error Chaining: Always preserve exception context

Pull Request Process

  1. Ensure all checks pass: make check
  2. Update documentation if needed
  3. Add tests for new features
  4. Update CHANGELOG if applicable
  5. Request review from maintainers
  6. Address review feedback
  7. Wait for approval before merging

Questions?

  • Open an issue on GitHub
  • Check existing documentation
  • Review code examples in the codebase

Thank you for contributing to DeepCritical!