---
title: BirdScope AI - MCP Multi-Agent System
emoji: 🦅
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 6.0.1
python_version: 3.11
app_file: app.py
pinned: false
license: mit
short_description: AI-powered bird identification with MCP multi-agent system
tags:
- building-mcp-track-enterprise
- building-mcp-track-consumer
- building-mcp-track-creative
- mcp-in-action-track-enterprise
- mcp-in-action-track-consumer
- mcp-in-action-track-creative
---
# 🦅 BirdScope AI - MCP Multi-Agent System

**AI-powered bird identification with specialized MCP agents**

Built for the [MCP 1st Birthday Hackathon](https://huggingface.co/MCP-1st-Birthday)

---
## 📢 Hackathon Submission

**Social Media:** [Twitter/X Post](https://x.com/zulucoconuts/status/1995255281064755708)

**Demo Video:** [Watch on YouTube](https://youtu.be/V_ZoOkyjEyU)

**Track Submissions:**
- 🔧 **Track 1 (Building MCP)**: Two custom MCP servers
  - **Nuthatch MCP Server** - 7 tools for the bird species database, covering search, species info, images, audio, family search, and conservation filtering
  - **Modal Bird Classifier MCP** - 2 GPU-powered image classification tools hosted on Modal (base64 and URL inputs)
  - Categories: Enterprise (wildlife conservation) | Consumer (bird enthusiasts and education) | Creative (multimedia exploration)
- 🤖 **Track 2 (MCP in Action)**: Full multi-agent system with supervisor routing
  - LangGraph-based supervisor orchestrating 3 specialized subagents
  - Integrates both MCP servers with intelligent tool routing
  - Categories: Enterprise (conservation orgs) | Consumer (bird watchers) | Creative (educational multimedia)

**Author:** [@facemelter](https://huggingface.co/facemelter)

**Built with:** Gradio 6 | LangGraph | FastMCP | Modal (GPU) | OpenAI/Anthropic/HuggingFace LLMs

---
## 🌐 Project Overview
BirdScope AI showcases an advanced multi-agent system powered by **Gradio 6** and **LangGraph**, designed to identify bird species, explore multimedia content, and provide educational information about birds worldwide.
**Our innovation:** We built **two complete systems in one**:
- πŸ”§ **Two Custom MCP Servers** (Track 1): Nuthatch species database (7 tools) + Modal GPU classifier (2 tools)
- πŸ€– **Multi-Agent Application** (Track 2): Supervisor-orchestrated specialist agents
This dual approach demonstrates both **building MCP infrastructure** and **leveraging MCP for autonomous agents**.
---
## ✨ Key Features
### πŸ€– Multi-Agent Orchestration
- **LangGraph Supervisor Pattern** with intelligent LLM-based routing
- **3 Specialized Subagents** (Image Identifier, Species Explorer, Taxonomy Specialist)
- **Session-based Agent Caching** - Agents reused within user sessions for 10x faster responses
- **Provider-Specific Prompts** - Optimized system prompts for OpenAI, Anthropic, and HuggingFace
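
The session-based caching idea can be illustrated with a small sketch. It assumes agents are keyed by session, provider, and agent mode; the class name, the key layout, and the stand-in `fake_build` function are hypothetical and are not taken from the project code.

```python
from typing import Callable, Dict, Tuple

# Hypothetical cache key: (session id, provider, agent mode).
CacheKey = Tuple[str, str, str]


class AgentCache:
    """Keeps one agent per (session, provider, mode) so it is built only once."""

    def __init__(self, build_agent: Callable[[str, str], object]) -> None:
        self._build_agent = build_agent
        self._agents: Dict[CacheKey, object] = {}

    def get(self, session_id: str, provider: str, mode: str) -> object:
        key = (session_id, provider, mode)
        if key not in self._agents:
            # In a real app this is the slow step (LLM client, MCP tool loading).
            self._agents[key] = self._build_agent(provider, mode)
        return self._agents[key]

    def drop_session(self, session_id: str) -> None:
        """Forget every agent that belongs to one session."""
        for key in [k for k in self._agents if k[0] == session_id]:
            del self._agents[key]


build_calls = []


def fake_build(provider: str, mode: str) -> dict:
    # Stand-in builder that records how often it is called.
    build_calls.append((provider, mode))
    return {"provider": provider, "mode": mode}


cache = AgentCache(fake_build)
first = cache.get("session-a", "openai", "supervisor")
second = cache.get("session-a", "openai", "supervisor")  # reuses the first agent
```

Here the second lookup returns the same object without calling the builder again, which is where the speedup within a session would come from.
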
### 🔧 Dual MCP Server Architecture
- **Modal Bird Classifier** ([modal.com](https://modal.com))
  - [prithivMLmods/Bird-Species-Classifier-526](https://huggingface.co/prithivMLmods/Bird-Species-Classifier-526) from HuggingFace
  - 526 bird species classification on Modal T4 GPU
  - Serverless GPU deployment for on-demand classification
  - Streamable HTTP transport with base64 and URL input support
- **Nuthatch MCP Server** (Custom Built - Track 1)
  - FastMCP framework with 7 specialized tools
  - Integrates [Nuthatch API](https://nuthatch.lastelm.software) (1000+ species)
  - **Dual Transport Support**: STDIO (subprocess) for HF Spaces + HTTP for local debugging
  - Data sources: Nuthatch DB, Unsplash (images), xeno-canto (audio)
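
One way the dual transport choice could be made at startup is sketched below. The environment variable name `NUTHATCH_TRANSPORT` and the function are assumptions for illustration only; the project may select its transport differently.

```python
import os
from typing import Mapping

# Hypothetical variable name, not taken from the project.
TRANSPORT_ENV_VAR = "NUTHATCH_TRANSPORT"
VALID_TRANSPORTS = ("stdio", "http")


def choose_transport(env: Mapping[str, str] = os.environ) -> str:
    """Return 'stdio' unless the environment explicitly asks for 'http'."""
    value = env.get(TRANSPORT_ENV_VAR, "stdio").strip().lower()
    if value not in VALID_TRANSPORTS:
        raise ValueError(f"unsupported transport: {value!r}")
    return value
```

Defaulting to STDIO matches the hosted case described above, so a local developer would opt in to HTTP explicitly when debugging.
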
### 📡 Dual Streaming Output
- **Chat Response Stream** - Real-time markdown rendering with embedded media
- **Tool Execution Log Stream** - Parallel visibility into MCP tool calls (inputs/outputs)
- **Async Progress Indicators** - Immediate user feedback before processing begins
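
A minimal sketch of the two parallel streams, assuming an async generator that yields one `(chat, tool_log)` pair per update. The event list and the `search_birds` tool name are stand-ins, not the project's actual agent events.

```python
import asyncio
from typing import AsyncIterator, List, Tuple

PLACEHOLDER = "⏳ Starting..."


async def stream_reply(question: str) -> AsyncIterator[Tuple[str, str]]:
    """Yield (chat_markdown, tool_log) pairs as both panes grow."""
    chat, log = "", ""
    # Immediate feedback before any real work starts.
    yield PLACEHOLDER, log

    # Stand-in for agent events; a real agent would emit these as it runs.
    events = [
        ("tool", f"search_birds(query={question!r})"),
        ("token", "Found "),
        ("token", "one match."),
    ]
    for kind, payload in events:
        await asyncio.sleep(0)  # hand control back to the event loop
        if kind == "tool":
            log += payload + "\n"
        else:
            chat += payload
        # Keep the placeholder visible until the first chat token arrives.
        yield chat or PLACEHOLDER, log


async def collect(question: str) -> List[Tuple[str, str]]:
    return [update async for update in stream_reply(question)]


updates = asyncio.run(collect("cardinal"))
```

Each yielded pair maps naturally onto two output components, one for the chat response and one for the tool log, so both refresh from the same stream.
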
### 🎨 Structured Output Parsing
- **LlamaIndex Pydantic Models** - Type-safe response formatting
- **Regex URL Extraction** - Automatic detection of image and audio URLs
- **Smart Audio Normalization** - xeno-canto links converted to browser-friendly format (`/download` β†’ playable)
- **Markdown Media Embedding** - Images and audio automatically formatted
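
The URL extraction and media embedding steps might look roughly like this. The sketch assumes media type can be inferred from the file extension and that the renderer accepts an inline HTML audio tag; the regex, the extension lists, and the function names are illustrative, and the xeno-canto normalization rule is left out because its exact form is not spelled out here.

```python
import re
from typing import Dict, List

# Match http(s) URLs up to whitespace or a closing bracket/quote.
URL_RE = re.compile(r"https?://[^\s)\]>\"']+")
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif", ".webp")
AUDIO_EXTS = (".mp3", ".wav", ".ogg")


def extract_media(text: str) -> Dict[str, List[str]]:
    """Split URLs found in text into image and audio lists by file extension."""
    media: Dict[str, List[str]] = {"images": [], "audio": []}
    for url in URL_RE.findall(text):
        url = url.rstrip(".,;")  # drop trailing sentence punctuation
        path = url.split("?", 1)[0].lower()  # ignore any query string
        if path.endswith(IMAGE_EXTS):
            media["images"].append(url)
        elif path.endswith(AUDIO_EXTS):
            media["audio"].append(url)
    return media


def to_markdown(media: Dict[str, List[str]]) -> str:
    """Render images as markdown and audio as inline HTML players."""
    lines = [f"![bird image]({url})" for url in media["images"]]
    lines += [f'<audio controls src="{url}"></audio>' for url in media["audio"]]
    return "\n".join(lines)


sample = "Photo: https://example.com/cardinal.jpg and song: https://example.com/cardinal.mp3."
found = extract_media(sample)
embedded = to_markdown(found)
```

Extension matching is a simplification: image hosts that serve extensionless URLs would need a host-based rule instead.
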
### 🌐 Multi-Provider LLM Support
- **OpenAI** (GPT-4o-mini) - Recommended for reliability
- **Anthropic** (Claude Sonnet 4) - Best for complex reasoning
- **HuggingFace Inference API** - Open-source models (limited tool calling)
- **User-Provided Keys** - No backend API key required, users supply their own
### πŸ’… Production UI/UX
- **Gradio 6.0 SSR** - Server-side rendering for enhanced performance
- **Custom Cloud Theme** - Sky-inspired CSS with mobile-responsive design
- **Dynamic Examples** - Example queries adapt to selected agent mode
- **Instant Feedback** - "⏳ Starting..." indicator appears immediately on submit
---
## 🗂️ Data Sources & MCP Servers

We built **two custom MCP servers** that integrate with bird data APIs and GPU-powered classification:

**Data Sources:**
- **Nuthatch API** ([nuthatch.lastelm.software](https://nuthatch.lastelm.software)) - 1000+ bird species database by Last Elm Software
- **Unsplash** - High-quality reference images for visual identification
- **xeno-canto.org** - Community-contributed bird audio recordings worldwide
- **HuggingFace Model** - [prithivMLmods/Bird-Species-Classifier-526](https://huggingface.co/prithivMLmods/Bird-Species-Classifier-526) for GPU classification

**MCP Servers:**
1. **Nuthatch MCP Server** (Track 1 - Building MCP)
   - 7 specialized tools covering search, species info, images, audio, family search, and conservation filtering
   - STDIO transport for HF Spaces, HTTP option for local debugging
   - FastMCP framework with async API integration
2. **Modal Bird Classifier** (GPU-powered)
   - Image classification tools: URL and base64 input support
   - Serverless GPU deployment via Modal
   - Streamable HTTP transport

---
## 🧩 Core Components
**Multi-Agent Orchestration:**
- **LangGraph Supervisor Pattern** - LLM-based routing between specialist agents
- **3 Specialized Subagents** - Each with focused tool subset (image ID, species exploration, taxonomy)
- **Session-based Caching** - Agent instances reused within user sessions for performance
- **Dual Streaming** - Parallel chat response + tool execution log streams
**Agent Architecture:**
- `subagent_supervisor.py` - Creates supervisor workflow with LangGraph
- `subagent_factory.py` - Builds specialists with filtered tool access
- `subagent_config.py` - Defines agent modes and tool allocations
- `prompts.py` - Provider-specific system prompts (OpenAI, Anthropic, HuggingFace)
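
How a tool allocation table and a filtering helper could fit together is sketched below. The agent keys, tool names, and the `tools_for` function are hypothetical and only illustrate the idea of giving each specialist a filtered tool subset; they are not the contents of `subagent_config.py` or `subagent_factory.py`.

```python
from typing import Dict, List

# Hypothetical tool names and allocations, for illustration only.
AGENT_TOOLS: Dict[str, List[str]] = {
    "image_identifier": ["classify_from_url", "classify_from_base64"],
    "species_explorer": ["search_birds", "get_bird_info", "get_bird_images", "get_bird_audio"],
    "taxonomy_specialist": ["search_by_family", "filter_by_status"],
}


def tools_for(agent: str, available: Dict[str, object]) -> Dict[str, object]:
    """Return only the tools allocated to one agent, failing loudly on gaps."""
    allowed = AGENT_TOOLS[agent]
    missing = [name for name in allowed if name not in available]
    if missing:
        raise KeyError(f"{agent} expects missing tools: {missing}")
    return {name: available[name] for name in allowed}


# Stand-in tool objects; a real app would pass the tools loaded from the MCP servers.
all_tools = {name: object() for names in AGENT_TOOLS.values() for name in names}
explorer_tools = tools_for("species_explorer", all_tools)
```
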
**UI & UX:**
- **Gradio 6.0** with SSR for enhanced performance
- Custom cloud-themed CSS with mobile-responsive design
- Dynamic examples that adapt to agent mode selection
- Immediate processing feedback with async streaming updates
---
## 🚀 Quick Start

**Try the Live Demo:** Just provide your LLM API key (OpenAI, Anthropic, or HuggingFace) in the sidebar and start exploring!

**For Developers:**
```bash
# Clone and install
git clone <repo-url>
cd hackathon_draft
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Edit .env with your API keys
# Run locally
python app.py
```
**Deploy to HuggingFace Spaces:**
```bash
python upload_to_space.py
# Configure Secrets in Space Settings (see docs/dev/main-README.md)
```
**Full Setup Guide:** See [docs/dev/main-README.md](docs/dev/main-README.md) for comprehensive deployment instructions.

---
## 🏆 Credits & License

Built for the [HuggingFace MCP 1st Birthday Hackathon](https://huggingface.co/MCP-1st-Birthday)

**Data Sources:** [Nuthatch API](https://nuthatch.lastelm.software) (Last Elm Software) | [xeno-canto.org](https://xeno-canto.org) | [Unsplash](https://unsplash.com)

**Technology:** [Model Context Protocol](https://github.com/anthropics/mcp) | [LangGraph](https://github.com/langchain-ai/langgraph) | [Gradio 6](https://gradio.app) | [Modal](https://modal.com)

MIT License - Educational and research purposes