Contributing to DeepCritical
Thank you for your interest in contributing to DeepCritical! This guide will help you get started.
Table of Contents
- Git Workflow
- Getting Started
- Development Commands
- Code Style & Conventions
- Type Safety
- Error Handling & Logging
- Testing Requirements
- Implementation Patterns
- Code Quality & Documentation
- Prompt Engineering & Citation Validation
- MCP Integration
- Common Pitfalls
- Key Principles
- Pull Request Process
Git Workflow
- `main`: Production-ready (GitHub)
- `dev`: Development integration (GitHub)
- Use feature branches: `yourname-dev`
- NEVER push directly to `main` or `dev` on HuggingFace
- GitHub is source of truth; HuggingFace is for deployment
Getting Started
1. Fork the repository on GitHub
2. Clone your fork: `git clone https://github.com/yourusername/GradioDemo.git`, then `cd GradioDemo`
3. Install dependencies: `make install`
4. Create a feature branch: `git checkout -b yourname-feature-name`
5. Make your changes following the guidelines below
6. Run checks: `make check`
7. Commit and push: `git commit -m "Description of changes"`, then `git push origin yourname-feature-name`
8. Create a pull request on GitHub
Development Commands
```bash
make install      # Install dependencies + pre-commit
make check        # Lint + typecheck + test (MUST PASS)
make test         # Run unit tests
make lint         # Run ruff
make format       # Format with ruff
make typecheck    # Run mypy
make test-cov     # Test with coverage
make docs-build   # Build documentation
make docs-serve   # Serve documentation locally
```
Code Style & Conventions
Type Safety
- ALWAYS use type hints for all function parameters and return types
- Maintain `mypy --strict` compliance (no `Any` unless absolutely necessary)
- Use `TYPE_CHECKING` imports for circular dependencies:
```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from src.services.embeddings import EmbeddingService
```
Pydantic Models
- All data exchange uses Pydantic models (`src/utils/models.py`)
- Models are frozen (`model_config = {"frozen": True}`) for immutability
- Use `Field()` with descriptions for all model fields
- Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints
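As an illustration of these conventions, here is a minimal frozen model sketch (the model name and fields are invented for the example; the real models live in `src/utils/models.py`):

```python
from pydantic import BaseModel, Field


class PaperRef(BaseModel):
    """Illustrative model only; see src/utils/models.py for the real definitions."""

    model_config = {"frozen": True}  # instances are immutable after construction

    title: str = Field(..., min_length=1, description="Title of the cited paper")
    url: str = Field(..., description="Canonical URL of the source")
    relevance: float = Field(..., ge=0.0, le=1.0, description="Relevance score in [0, 1]")
```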
Async Patterns
- ALL I/O operations must be async (`async def`, `await`)
- Use `asyncio.gather()` for parallel operations
- CPU-bound work (embeddings, parsing) must use `run_in_executor()`:

```python
loop = asyncio.get_running_loop()
result = await loop.run_in_executor(None, cpu_bound_function, args)
```
- Never block the event loop with synchronous I/O
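For example, fanning several searches out concurrently with `asyncio.gather()` could look like this (a sketch only; the loose `Any` types stand in for the project's real `Evidence` objects):

```python
import asyncio
from typing import Any


async def run_all_searches(tools: list[Any], query: str) -> list[Any]:
    # return_exceptions=True keeps one failing tool from cancelling the rest.
    results = await asyncio.gather(
        *(tool.search(query) for tool in tools),
        return_exceptions=True,
    )
    return [r for r in results if not isinstance(r, Exception)]
```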
Linting
- Ruff with 100-char line length
- Ignore rules documented in `pyproject.toml`:
  - `PLR0913`: Too many arguments (agents need many params)
  - `PLR0912`: Too many branches (complex orchestrator logic)
  - `PLR0911`: Too many return statements (complex agent logic)
  - `PLR2004`: Magic values (statistical constants)
  - `PLW0603`: Global statement (singleton pattern)
  - `PLC0415`: Lazy imports for optional dependencies
Pre-commit
- Run `make check` before committing
- Must pass: lint + typecheck + test-cov
- Pre-commit hooks installed via `make install`
- CRITICAL: Run the full pre-commit checks before opening a PR (not draft), otherwise Obstacle is the Way will lose his mind
Error Handling & Logging
Exception Hierarchy
Use the custom exception hierarchy (`src/utils/exceptions.py`):

- `DeepCriticalError` (base)
  - `SearchError`
    - `RateLimitError`
  - `JudgeError`
  - `ConfigurationError`
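A minimal sketch of this hierarchy (the authoritative definitions are in `src/utils/exceptions.py`; treating `RateLimitError` as a subclass of `SearchError` is an assumption here):

```python
class DeepCriticalError(Exception):
    """Base class for all DeepCritical errors."""


class SearchError(DeepCriticalError):
    """A search tool failed."""


class RateLimitError(SearchError):
    """An external API rejected us for exceeding its rate limits."""


class JudgeError(DeepCriticalError):
    """The judge/assessment step failed."""


class ConfigurationError(DeepCriticalError):
    """Required configuration (e.g., API keys) is missing or invalid."""
```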
Error Handling Rules
- Always chain exceptions: `raise SearchError(...) from e`
- Log errors with context using `structlog`: `logger.error("Operation failed", error=str(e), context=value)`
- Never silently swallow exceptions
- Provide actionable error messages
Logging
- Use `structlog` for all logging (NOT `print` or `logging`)
- Import: `import structlog; logger = structlog.get_logger()`
- Log with structured data: `logger.info("event", key=value)`
- Use appropriate levels: DEBUG, INFO, WARNING, ERROR
Logging Examples
logger.info("Starting search", query=query, tools=[t.name for t in tools])
logger.warning("Search tool failed", tool=tool.name, error=str(result))
logger.error("Assessment failed", error=str(e))
Error Chaining
Always preserve exception context:
```python
try:
    result = await api_call()
except httpx.HTTPError as e:
    raise SearchError(f"API call failed: {e}") from e
```
Testing Requirements
Test Structure
- Unit tests in `tests/unit/` (mocked, fast)
- Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`)
- Use markers: `unit`, `integration`, `slow`
Mocking
- Use `respx` for httpx mocking
- Use `pytest-mock` for general mocking
- Mock LLM calls in unit tests (use `MockJudgeHandler`)
- Fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`
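A rough sketch of the `respx` pattern (the URL and payload are placeholders, not a real API):

```python
import httpx
import pytest
import respx


@pytest.mark.unit
@respx.mock
async def test_search_returns_canned_payload() -> None:
    # Intercept the outbound request and return a canned response.
    respx.get("https://example.org/api/search").mock(
        return_value=httpx.Response(200, json={"results": []})
    )
    async with httpx.AsyncClient() as client:
        response = await client.get("https://example.org/api/search")
    assert response.status_code == 200
```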
TDD Workflow
- Write failing test in `tests/unit/`
- Implement in `src/`
- Ensure test passes
- Run `make check` (lint + typecheck + test)
Test Examples
```python
@pytest.mark.unit
async def test_pubmed_search(mock_httpx_client):
    tool = PubMedTool()
    results = await tool.search("metformin", max_results=5)
    assert len(results) > 0
    assert all(isinstance(r, Evidence) for r in results)


@pytest.mark.integration
async def test_real_pubmed_search():
    tool = PubMedTool()
    results = await tool.search("metformin", max_results=3)
    assert len(results) <= 3
```
Test Coverage
- Run `make test-cov` for a coverage report
- Aim for >80% coverage on critical paths
- Exclude: `__init__.py`, `TYPE_CHECKING` blocks
Implementation Patterns
Search Tools
All tools implement the `SearchTool` protocol (`src/tools/base.py`):

- Must have a `name` property
- Must implement `async def search(query, max_results) -> list[Evidence]`
- Use the `@retry` decorator from tenacity for resilience
- Rate limiting: Implement `_rate_limit()` for APIs with limits (e.g., PubMed)
- Error handling: Raise `SearchError` or `RateLimitError` on failures
Example pattern:
```python
class MySearchTool:
    @property
    def name(self) -> str:
        return "mytool"

    @retry(stop=stop_after_attempt(3), wait=wait_exponential(...))
    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        # Implementation
        return evidence_list
```
Judge Handlers
- Implement `JudgeHandlerProtocol` (`async def assess(question, evidence) -> JudgeAssessment`)
- Use a pydantic-ai `Agent` with `output_type=JudgeAssessment`
- System prompts in `src/prompts/judge.py`
- Support fallback handlers: `MockJudgeHandler`, `HFInferenceJudgeHandler`
- Always return a valid `JudgeAssessment` (never raise exceptions)
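A rough sketch of the pydantic-ai wiring (model name, prompt text, import path, and the result accessor are all assumptions; the real handlers also add fallbacks so they never raise):

```python
from pydantic_ai import Agent

from src.utils.models import JudgeAssessment  # assumed location of the model

# output_type forces the agent to return a validated JudgeAssessment.
judge_agent = Agent(
    "openai:gpt-4o",  # placeholder model identifier
    output_type=JudgeAssessment,
    system_prompt="...",  # real system prompt lives in src/prompts/judge.py
)


async def assess(question: str, evidence: str) -> JudgeAssessment:
    result = await judge_agent.run(f"Question: {question}\n\nEvidence:\n{evidence}")
    return result.output  # accessor name may differ across pydantic-ai versions
```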
Agent Factory Pattern
- Use factory functions for creating agents (`src/agent_factory/`)
- Lazy initialization for optional dependencies (e.g., embeddings, Modal)
- Check requirements before initialization:
```python
def check_magentic_requirements() -> None:
    if not settings.has_openai_key:
        raise ConfigurationError("Magentic requires OpenAI")
```
State Management
- Magentic Mode: Use `ContextVar` for thread-safe state (`src/agents/state.py`)
- Simple Mode: Pass state via function parameters
- Never use global mutable state (except singletons via `@lru_cache`)
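A minimal sketch of the `ContextVar` pattern (the variable name and state type are illustrative, not what `src/agents/state.py` actually defines):

```python
from contextvars import ContextVar

# Each asyncio task sees its own value, so concurrent runs don't clobber each other.
_current_run_id: ContextVar[str | None] = ContextVar("current_run_id", default=None)


def set_run_id(run_id: str) -> None:
    _current_run_id.set(run_id)


def get_run_id() -> str | None:
    return _current_run_id.get()
```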
Singleton Pattern
Use `@lru_cache(maxsize=1)` for singletons:

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService:
    return EmbeddingService()
```
- Lazy initialization to avoid requiring dependencies at import time
Code Quality & Documentation
Docstrings
- Google-style docstrings for all public functions
- Include Args, Returns, Raises sections
- Use type hints in docstrings only if needed for clarity
Example:
```python
async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
    """Search PubMed and return evidence.

    Args:
        query: The search query string
        max_results: Maximum number of results to return

    Returns:
        List of Evidence objects

    Raises:
        SearchError: If the search fails
        RateLimitError: If we hit rate limits
    """
```
Code Comments
- Explain WHY, not WHAT
- Document non-obvious patterns (e.g., why `requests` not `httpx` for ClinicalTrials)
- Mark critical sections: `# CRITICAL: ...`
- Document rate limiting rationale
- Explain async patterns when non-obvious
Prompt Engineering & Citation Validation
Judge Prompts
- System prompt in `src/prompts/judge.py`
- Format evidence with truncation (1500 chars per item)
- Handle the empty-evidence case separately
- Always request structured JSON output
- Use the `format_user_prompt()` and `format_empty_evidence_prompt()` helpers
Hypothesis Prompts
- Use diverse evidence selection (MMR algorithm)
- Sentence-aware truncation (`truncate_at_sentence()`)
- Format: Drug → Target → Pathway → Effect
- System prompt emphasizes mechanistic reasoning
- Use `format_hypothesis_prompt()` with embeddings for diversity
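For intuition, sentence-aware truncation cuts at a sentence boundary instead of mid-word; a sketch (the real `truncate_at_sentence()` may differ):

```python
def truncate_at_sentence(text: str, max_chars: int) -> str:
    """Cut text to at most max_chars, preferring to end on a sentence boundary."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    # Back up to the last sentence break, if one exists reasonably close to the limit.
    last_period = cut.rfind(". ")
    if last_period > max_chars // 2:
        return cut[: last_period + 1]
    return cut
```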
Report Prompts
- Include full citation details for validation
- Use diverse evidence selection (n=20)
- CRITICAL: Emphasize citation validation rules
- Format hypotheses with support/contradiction counts
- System prompt includes explicit JSON structure requirements
Citation Validation
- ALWAYS validate references before returning reports
- Use `validate_references()` from `src/utils/citation_validator.py`
- Remove hallucinated citations (URLs not in evidence)
- Log warnings for removed citations
- Never trust LLM-generated citations without validation
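The check boils down to dropping any reference whose URL is not in the evidence set; a sketch (the real `validate_references()` signature may differ):

```python
import structlog

logger = structlog.get_logger()


def validate_references(references: list[dict], evidence_urls: set[str]) -> list[dict]:
    """Keep only references whose URL exactly matches a provided evidence URL."""
    valid = [ref for ref in references if ref.get("url") in evidence_urls]
    removed = len(references) - len(valid)
    if removed:
        logger.warning("Removed hallucinated citations", count=removed)
    return valid
```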
Citation Validation Rules
- Every reference URL must EXACTLY match a provided evidence URL
- Do NOT invent, fabricate, or hallucinate any references
- Do NOT modify paper titles, authors, dates, or URLs
- If unsure about a citation, OMIT it rather than guess
- Copy URLs exactly as provided - do not create similar-looking URLs
Evidence Selection
- Use `select_diverse_evidence()` for MMR-based selection
- Balance relevance vs diversity (lambda=0.7 default)
- Sentence-aware truncation preserves meaning
- Limit evidence per prompt to avoid context overflow
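For reference, MMR greedily maximizes `lambda * relevance - (1 - lambda) * redundancy`; a toy sketch over precomputed scores (not the project's `select_diverse_evidence()`):

```python
def mmr_select(
    relevance: list[float],
    similarity: list[list[float]],
    k: int,
    lam: float = 0.7,
) -> list[int]:
    """Greedy MMR: pick items that are relevant but not redundant with earlier picks."""
    selected: list[int] = []
    candidates = set(range(len(relevance)))
    while candidates and len(selected) < k:
        def mmr_score(i: int) -> float:
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```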
MCP Integration
MCP Tools
- Functions in `src/mcp_tools.py` for Claude Desktop
- Full type hints required
- Google-style docstrings with Args/Returns sections
- Formatted string returns (markdown)
Gradio MCP Server
- Enable with `mcp_server=True` in `demo.launch()`
- Endpoint: `/gradio_api/mcp/`
- Use `ssr_mode=False` to fix hydration issues in HF Spaces
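Putting those flags together (the Blocks content is a placeholder, not the app's real UI):

```python
import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("DeepCritical")  # placeholder UI

if __name__ == "__main__":
    # mcp_server=True exposes the app's tools at /gradio_api/mcp/;
    # ssr_mode=False avoids SSR hydration issues on HF Spaces.
    demo.launch(mcp_server=True, ssr_mode=False)
```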
Common Pitfalls
- Blocking the event loop: Never use sync I/O in async functions
- Missing type hints: All functions must have complete type annotations
- Hallucinated citations: Always validate references
- Global mutable state: Use ContextVar or pass via parameters
- Import errors: Lazy-load optional dependencies (magentic, modal, embeddings)
- Rate limiting: Always implement for external APIs
- Error chaining: Always use `from e` when raising exceptions
Key Principles
- Type Safety First: All code must pass `mypy --strict`
- Async Everything: All I/O must be async
- Test-Driven: Write tests before implementation
- No Hallucinations: Validate all citations
- Graceful Degradation: Support free tier (HF Inference) when no API keys
- Lazy Loading: Don't require optional dependencies at import time
- Structured Logging: Use structlog, never print()
- Error Chaining: Always preserve exception context
Pull Request Process
- Ensure all checks pass: `make check`
- Update documentation if needed
- Add tests for new features
- Update CHANGELOG if applicable
- Request review from maintainers
- Address review feedback
- Wait for approval before merging
Questions?
- Open an issue on GitHub
- Check existing documentation
- Review code examples in the codebase
Thank you for contributing to DeepCritical!