---
title: BirdScope AI - MCP Multi-Agent System
emoji: πŸ¦…
colorFrom: green
colorTo: blue
sdk: gradio
python_version: 3.11
app_file: app.py
pinned: false
---
# πŸ¦… BirdScope AI - Multi-Agent Bird Identification System
**AI-powered bird identification with specialized MCP agents**
Built for the [MCP 1st Birthday Hackathon](https://huggingface.co/MCP-1st-Birthday)
---
## 🎯 Overview
BirdScope AI is a production-ready multi-agent system that combines **Modal GPU classification** with the **Nuthatch species database** to provide comprehensive bird identification and exploration. Users can upload photos, search species, explore taxonomic families, and access rich multimedia content (images, audio recordings, conservation data).
**Two Agent Modes:**
1. **Specialized Subagents (3 Specialists)** - A router orchestrates an image identifier, a species explorer, and a taxonomy specialist
2. **Audio Finder Agent** - Specialized agent for discovering bird audio recordings
---
## ✨ Features
- πŸ” **Image Classification**: Upload bird photos for instant GPU-powered identification
- πŸ“Έ **Reference Images**: High-quality Unsplash photos for each species
- 🎡 **Audio Recordings**: Bird calls and songs from xeno-canto.org
- 🌍 **Conservation Data**: IUCN status and taxonomic information
- 🧠 **Multi-Agent Architecture**: Specialized agents with focused tool subsets
- πŸ”„ **Dual Streaming**: Separate outputs for chat responses and tool execution logs
- πŸ€– **Multi-Provider**: OpenAI (GPT-4), Anthropic (Claude), HuggingFace (Qwen)
---
## πŸš€ Quick Start (For Users)
### Option 1: OpenAI (Recommended)
1. Get your OpenAI API key from [platform.openai.com/api-keys](https://platform.openai.com/api-keys)
2. Select **OpenAI** as provider in the sidebar
3. Enter your API key
4. Model used: `gpt-4o-mini`
### Option 2: Anthropic (Claude)
1. Get your Anthropic API key from [console.anthropic.com/settings/keys](https://console.anthropic.com/settings/keys)
2. Select **Anthropic** as provider
3. Enter your API key
4. Model used: `claude-sonnet-4-5`
### Option 3: HuggingFace
⚠️ **Note**: HuggingFace Inference API has limited function calling support. OpenAI or Anthropic recommended for full functionality.
---
## πŸ› οΈ Environment Setup (For Developers)
### Prerequisites
- Python 3.11+
- Modal account (for GPU classifier)
- Nuthatch API key
- LLM API key (OpenAI, Anthropic, or HuggingFace)
---
### 🏠 Local Development Setup
#### Step 1: Clone and Install
```bash
# Clone the repository, then enter your local copy
cd /path/to/hackathon_draft
# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```
#### Step 2: Configure Environment Variables
Create a `.env` file from the example:
```bash
cp .env.example .env
```
Edit `.env` with your API keys:
```bash
# ================================================
# REQUIRED: Modal Bird Classifier (GPU)
# ================================================
MODAL_MCP_URL=https://your-modal-app--mcp-server.modal.run/mcp
BIRD_CLASSIFIER_API_KEY=your-modal-api-key-here
# ================================================
# REQUIRED: Nuthatch Species Database
# ================================================
NUTHATCH_API_KEY=your-nuthatch-api-key-here
NUTHATCH_BASE_URL=https://nuthatch.lastelm.software/v2 # Default, can omit
# Nuthatch Transport Mode (STDIO or HTTP)
NUTHATCH_USE_STDIO=true # Recommended for local development
# Only needed if NUTHATCH_USE_STDIO=false:
# NUTHATCH_MCP_URL=http://localhost:8001/mcp
# NUTHATCH_MCP_AUTH_KEY=your-auth-key-here
# ================================================
# LLM Provider (Choose ONE)
# ================================================
# OpenAI (Recommended)
OPENAI_API_KEY=sk-your-openai-key-here
DEFAULT_OPENAI_MODEL=gpt-4o-mini
OPENAI_TEMPERATURE=0.0
# OR Anthropic
# ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here
# DEFAULT_ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
# ANTHROPIC_TEMPERATURE=0.0
# OR HuggingFace (Limited function calling support)
# HF_API_KEY=hf_your-huggingface-token-here
# DEFAULT_HF_MODEL=Qwen/Qwen2.5-Coder-32B-Instruct
# HF_TEMPERATURE=0.1
```
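For reference, a minimal sketch of how these values can be read at startup (illustrative only; the project's actual loader is `langgraph_agent/config.py`):

```python
# Sketch of a config loader (illustrative, not the project's config.py).
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current directory

MODAL_MCP_URL = os.environ["MODAL_MCP_URL"]        # required
NUTHATCH_API_KEY = os.environ["NUTHATCH_API_KEY"]  # required
NUTHATCH_BASE_URL = os.getenv(
    "NUTHATCH_BASE_URL", "https://nuthatch.lastelm.software/v2"
)
NUTHATCH_USE_STDIO = os.getenv("NUTHATCH_USE_STDIO", "true").lower() == "true"
OPENAI_TEMPERATURE = float(os.getenv("OPENAI_TEMPERATURE", "0.0"))
```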
#### Step 3: Understanding Nuthatch Transport Modes
**STDIO Mode (Recommended for Local):**
- Nuthatch MCP server runs as subprocess
- Automatically started by the app
- No separate server process needed
- Set `NUTHATCH_USE_STDIO=true`
**HTTP Mode (Alternative for Local):**
- Nuthatch MCP server runs as separate HTTP server
- Useful for debugging or multiple clients
- Requires running server in separate terminal
To use HTTP mode:
```bash
# Terminal 1: Run Nuthatch MCP server
python nuthatch_tools.py --http --port 8001
# Terminal 2: Run the app
# Set in .env:
# NUTHATCH_USE_STDIO=false
# NUTHATCH_MCP_URL=http://localhost:8001/mcp
python app.py
```
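Conceptually, the client side only needs to branch on `NUTHATCH_USE_STDIO`. A minimal sketch using the FastMCP client, which infers the transport from its argument (illustrative, not the actual `mcp_clients.py`):

```python
# Sketch: select the Nuthatch transport based on NUTHATCH_USE_STDIO.
import asyncio
import os

from fastmcp import Client  # pip install fastmcp

use_stdio = os.getenv("NUTHATCH_USE_STDIO", "true").lower() == "true"

# FastMCP infers the transport: a .py path spawns a STDIO subprocess,
# an http(s) URL uses streamable HTTP.
target = "nuthatch_tools.py" if use_stdio else os.environ["NUTHATCH_MCP_URL"]

async def main() -> None:
    async with Client(target) as client:
        tools = await client.list_tools()
        print([t.name for t in tools])

asyncio.run(main())
```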
#### Step 4: Run the App
```bash
# With STDIO mode (default, easiest):
python app.py
# Or using Gradio CLI:
gradio app.py
```
The app will be available at: `http://127.0.0.1:7860`
---
### ☁️ HuggingFace Spaces Deployment
#### Step 1: Create a New Space
1. Go to [huggingface.co/new-space](https://huggingface.co/new-space)
2. Choose:
- **SDK**: Gradio
- **Hardware**: CPU Basic (free) or CPU Upgrade (faster)
- **Visibility**: Public or Private
#### Step 2: Upload Your Code
**Option A: Using `upload_to_space.py` (Recommended)**
```bash
# 1. Install HuggingFace CLI
pip install huggingface_hub
# 2. Login
huggingface-cli login
# 3. Update upload_to_space.py with your Space name
# Edit line with repo_id:
# repo_id="YOUR-USERNAME/YOUR-SPACE-NAME"
# 4. Upload
python upload_to_space.py
```
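Under the hood, such a script boils down to one `huggingface_hub` call; a hedged sketch (the real `upload_to_space.py` may differ in details):

```python
# Rough equivalent of upload_to_space.py (illustrative only).
from huggingface_hub import HfApi

api = HfApi()  # uses the token saved by `huggingface-cli login`
api.upload_folder(
    folder_path=".",                          # project root
    repo_id="YOUR-USERNAME/YOUR-SPACE-NAME",  # edit, as in step 3 above
    repo_type="space",
    ignore_patterns=[".venv/*", ".env", "__pycache__/*"],
)
```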
**Option B: Using Git**
```bash
git remote add hf-space https://huggingface.co/spaces/YOUR-USERNAME/YOUR-SPACE-NAME
git push hf-space main
```
#### Step 3: Configure Secrets in HuggingFace Spaces
⚠️ **CRITICAL**: Spaces use **Secrets**, not `.env` files!
Go to your Space β†’ **Settings** β†’ **Variables and secrets**
**Add these secrets:**
```bash
# REQUIRED: Modal Bird Classifier
MODAL_MCP_URL = https://your-modal-app--mcp-server.modal.run/mcp
BIRD_CLASSIFIER_API_KEY = your-modal-api-key-here
# REQUIRED: Nuthatch Species Database
NUTHATCH_API_KEY = your-nuthatch-api-key-here
NUTHATCH_BASE_URL = https://nuthatch.lastelm.software/v2 # Optional
NUTHATCH_USE_STDIO = true # MUST be "true" for Spaces
# OPTIONAL: Backend-provided LLM keys (users can provide their own)
# Only add if you want to provide default keys:
# OPENAI_API_KEY = sk-your-key-here
# ANTHROPIC_API_KEY = sk-ant-your-key-here
```
**Important Notes:**
- βœ… **ALWAYS** use `NUTHATCH_USE_STDIO=true` on Spaces (subprocess mode)
- βœ… HTTP mode not supported on Spaces (port binding restrictions)
- βœ… Users can provide their own LLM keys via the UI
- βœ… Environment variables from Spaces **do not** auto-inherit to subprocesses
- The app explicitly passes `NUTHATCH_API_KEY` and `NUTHATCH_BASE_URL` to the subprocess (see `mcp_clients.py`; sketched below)
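The last point matters because a STDIO MCP server is a child process and only sees the environment its parent hands it. A sketch of the idea using the low-level `mcp` Python SDK (illustrative; the project's actual client setup lives in `mcp_clients.py`):

```python
# Sketch: explicitly forward secrets to the STDIO subprocess.
import os

from mcp import StdioServerParameters

server_params = StdioServerParameters(
    command="python",
    args=["nuthatch_tools.py"],
    env={
        # Spaces secrets are visible to the app process, but the
        # subprocess only receives what is passed here.
        "NUTHATCH_API_KEY": os.environ["NUTHATCH_API_KEY"],
        "NUTHATCH_BASE_URL": os.getenv(
            "NUTHATCH_BASE_URL", "https://nuthatch.lastelm.software/v2"
        ),
    },
)
```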
#### Step 4: Verify Deployment
1. Wait for the Space to build (2-5 minutes)
2. Check **Logs** tab for errors
3. Try the app - upload a bird photo or ask about species
---
## πŸ“ Project Structure
```
hackathon_draft/
β”œβ”€β”€ app.py                       # Main Gradio app
β”œβ”€β”€ upload_to_space.py           # HF Spaces upload script
β”œβ”€β”€ requirements.txt             # Python dependencies
β”œβ”€β”€ .env.example                 # Environment template
β”œβ”€β”€ langgraph_agent/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ agents.py                # Agent factory (single/multi-agent)
β”‚   β”œβ”€β”€ config.py                # Configuration loader
β”‚   β”œβ”€β”€ mcp_clients.py           # MCP client setup
β”‚   β”œβ”€β”€ subagent_config.py       # Agent mode definitions
β”‚   β”œβ”€β”€ prompts.py               # System prompts
β”‚   └── structured_output.py     # Response formatting
β”œβ”€β”€ nuthatch_tools.py            # Nuthatch MCP server
└── agent_cache.py               # Session-based agent caching
```
---
## πŸ—οΈ Architecture
### MCP Servers
**1. Modal Bird Classifier (GPU)**
- Hosted on Modal (serverless GPU)
- ResNet50 trained on 555 bird species
- Tools: `classify_from_url`, `classify_from_base64`
- Transport: Streamable HTTP
**2. Nuthatch Species Database**
- Species reference API (1000+ birds)
- Tools: `search_birds`, `get_bird_info`, `get_bird_images`, `get_bird_audio`, `search_by_family`, `filter_by_status`, `get_all_families`
- Transport: **STDIO** (subprocess on Spaces), STDIO or HTTP (local)
- Data sources: Unsplash (images), xeno-canto (audio)
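To make the tool surface concrete, here is a hedged example of invoking the Modal classifier over streamable HTTP with the FastMCP client (the argument name `url` is an assumption; the real client also authenticates with `BIRD_CLASSIFIER_API_KEY`, omitted here):

```python
# Sketch: call the GPU classifier tool (argument names are assumed).
import asyncio
import os

from fastmcp import Client

async def main() -> None:
    # An http(s) URL selects the streamable HTTP transport.
    async with Client(os.environ["MODAL_MCP_URL"]) as client:
        result = await client.call_tool(
            "classify_from_url",
            {"url": "https://example.com/bird.jpg"},  # hypothetical image
        )
        print(result)

asyncio.run(main())
```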
### Agent Modes
**Mode 1: Specialized Subagents (3 Specialists)**
- **Router** orchestrates 3 specialized agents:
1. **Image Identifier**: classify images, show reference photos
2. **Species Explorer**: search by name, provide multimedia
3. **Taxonomy Specialist**: conservation status, family search
- Each specialist has focused tool subset
**Mode 2: Audio Finder Agent**
- Single specialized agent for finding bird audio
- Tools: `search_birds`, `get_bird_info`, `get_bird_audio`
- Optimized workflow for xeno-canto recordings
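As a rough illustration of the pattern (not the actual `agents.py`), each specialist is essentially an LLM paired with its focused tool subset; stub tools stand in for the real MCP-backed ones:

```python
# Sketch: a specialist agent = LLM + focused tool subset (stub tools shown;
# the real ones are loaded from the Nuthatch MCP server).
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

@tool
def search_birds(query: str) -> str:
    """Search the species database by common or scientific name."""
    return f"stub result for {query!r}"

@tool
def get_bird_audio(species: str) -> str:
    """Return xeno-canto recording URLs for a species."""
    return f"stub audio for {species!r}"

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)
audio_finder = create_react_agent(
    llm,
    tools=[search_birds, get_bird_audio],
    prompt="You are the Audio Finder: locate bird recordings on xeno-canto.",
)
```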
### Tech Stack
- **Frontend**: Gradio 6.0 with custom CSS (cloud/sky theme)
- **Agent Framework**: LangGraph with streaming
- **MCP Integration**: FastMCP client library
- **LLM Support**: OpenAI, Anthropic, HuggingFace
- **Session Management**: In-memory agent caching
- **Output Parsing**: LlamaIndex Pydantic + regex (optimized)
---
## 🎨 Special Features
### Dual Streaming Output
- **Chat Panel**: LLM responses with markdown rendering
- **Tool Log Panel**: Real-time tool execution traces (inputs/outputs)
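One way to realize this split (a sketch, not the project's exact streaming code) is to fan out LangGraph's event stream by event type:

```python
# Sketch: send LLM tokens to the chat panel, tool traces to the log panel.
async def stream_dual(agent, messages):
    chat, tool_log = "", ""
    async for event in agent.astream_events({"messages": messages}, version="v2"):
        kind = event["event"]
        if kind == "on_chat_model_stream":
            chat += event["data"]["chunk"].content or ""
        elif kind == "on_tool_start":
            tool_log += f"▢ {event['name']}({event['data'].get('input')})\n"
        elif kind == "on_tool_end":
            tool_log += f"βœ” {event['name']} β†’ {event['data'].get('output')}\n"
        yield chat, tool_log  # two Gradio outputs, updated in lockstep
```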
### Dynamic Examples
- Examples change based on selected agent mode
- Photo examples always visible
- Text examples adapt to Audio Finder vs Multi-Agent
### Structured Output
- Automatic image/audio URL extraction
- Markdown formatting for media
- xeno-canto audio links (browser-friendly)
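The extraction step itself can be a simple regex pass; a hedged sketch of the idea (the project's real parsing lives in `structured_output.py` and uses LlamaIndex Pydantic as well):

```python
# Sketch: pull media URLs out of a response and render them as markdown.
import re

URL_RE = re.compile(r"https?://\S+")

def format_media(text: str) -> str:
    lines = []
    for url in URL_RE.findall(text):
        url = url.rstrip(").,")  # trim trailing punctuation
        if "xeno-canto.org" in url and not url.endswith("/download"):
            url += "/download"   # direct, browser-friendly audio link
        if url.endswith((".jpg", ".jpeg", ".png")):
            lines.append(f"![bird]({url})")
        else:
            lines.append(f"[🎡 listen]({url})")
    return "\n".join(lines)
```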
---
## πŸ“ API Key Sources
| Service | Get Key From | Purpose |
|---------|-------------|---------|
| **Modal** | [modal.com](https://modal.com) | GPU bird classifier |
| **Nuthatch** | [nuthatch.lastelm.software](https://nuthatch.lastelm.software) | Species database |
| **OpenAI** | [platform.openai.com/api-keys](https://platform.openai.com/api-keys) | LLM (recommended) |
| **Anthropic** | [console.anthropic.com/settings/keys](https://console.anthropic.com/settings/keys) | LLM (Claude) |
| **HuggingFace** | [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) | LLM (limited support) |
---
## πŸ› Troubleshooting
### Space stuck on "Building"
- Check **Logs** tab for errors
- Verify all required secrets are set
- Try Factory Reboot (Settings β†’ Factory Reboot)
### "Invalid API key" errors
- Ensure secrets are set correctly (no quotes needed)
- Check secret names match exactly (case-sensitive)
### HuggingFace provider fails with "function calling not supported"
- HuggingFace Inference API has limited tool calling
- Use OpenAI or Anthropic instead
### Nuthatch server not starting (local)
- Check `NUTHATCH_API_KEY` is set in `.env`
- Verify API key is valid
- Try STDIO mode: `NUTHATCH_USE_STDIO=true`
### Audio links broken
- Check that the `AUDIO_FINDER_PROMPT` system prompt is being used
- Verify xeno-canto URLs include `/download`
- Check structured output parsing logs
---
## πŸ“š Documentation
For detailed implementation docs, see:
- `project_docs/implementation/phase_5_final.md` - Complete agent architecture
- `project_docs/commands_guide/git_spaces_cheatsheet.md` - Deployment guide
---
## πŸ† Credits
- **Bird Species Data**: [Nuthatch API](https://nuthatch.lastelm.software) by Last Elm Software
- **Bird Audio**: [xeno-canto.org](https://xeno-canto.org) - Community bird recordings
- **Reference Images**: [Unsplash](https://unsplash.com) + curated collections
- **MCP Protocol**: [Anthropic Model Context Protocol](https://github.com/anthropics/mcp)
- **Hackathon**: [HuggingFace MCP-1st-Birthday](https://huggingface.co/MCP-1st-Birthday)
---
## πŸ“„ License
MIT License - Built for educational and research purposes