	---
	title: BirdScope AI - MCP Multi-Agent System
	emoji: 🦅
	colorFrom: green
	colorTo: blue
	sdk: gradio
	sdk_version: 6.0.1
	python_version: 3.11
	app_file: app.py
	pinned: false
	license: mit
	short_description: AI-powered bird identification with MCP multi-agent system
	tags:
	- building-mcp-track-enterprise
	- building-mcp-track-consumer
	- building-mcp-track-creative
	- mcp-in-action-track-enterprise
	- mcp-in-action-track-consumer
	- mcp-in-action-track-creative
	---

	# 🦅 BirdScope AI - MCP Multi-Agent System

	AI-powered bird identification with specialized MCP agents

	Built for the [MCP 1st Birthday Hackathon](https://huggingface.co/MCP-1st-Birthday)

	---

	## 📢 Hackathon Submission

	Social Media: [Twitter/X Post](https://x.com/zulucoconuts/status/1995255281064755708)

	Demo Video: [Watch on YouTube](https://youtu.be/V_ZoOkyjEyU)

	Track Submissions:
	- 🔧 Track 1 (Building MCP): Two custom MCP servers
	  - Nuthatch MCP Server - 7 tools for bird species database (search, species info, images, audio, family search, conservation filtering)
	  - Modal Bird Classifier MCP - 2 Modal-hosted GPU-powered image classification tools (base64 & URL inputs)
	  - Categories: Enterprise (wildlife conservation) | Consumer (bird enthusiasts and education) | Creative (multimedia exploration)
	- 🤖 Track 2 (MCP in Action): Full multi-agent system with supervisor routing
	  - LangGraph-based supervisor orchestrating 3 specialized subagents
	  - Integrates both MCP servers with intelligent tool routing
	  - Categories: Enterprise (conservation orgs) | Consumer (bird watchers) | Creative (educational multimedia)

	Author: [@facemelter](https://huggingface.co/facemelter)

	Built with: Gradio 6 | LangGraph | FastMCP | Modal (GPU) | OpenAI/Anthropic/HuggingFace LLMs

	---

	## 🌐 Project Overview

	BirdScope AI showcases an advanced multi-agent system powered by Gradio 6 and LangGraph, designed to identify bird species, explore multimedia content, and provide educational information about birds worldwide.

	Our innovation is two complete systems in one:
	- 🔧 Two Custom MCP Servers (Track 1): Nuthatch species database (7 tools) + Modal GPU classifier (2 tools)
	- 🤖 Multi-Agent Application (Track 2): Supervisor-orchestrated specialist agents

	This dual approach demonstrates both building MCP infrastructure and leveraging MCP for autonomous agents.

	---

	## ✨ Key Features

	### 🤖 Multi-Agent Orchestration
	- LangGraph Supervisor Pattern with intelligent LLM-based routing
	- 3 Specialized Subagents (Image Identifier, Species Explorer, Taxonomy Specialist)
	- Session-based Agent Caching - Agents reused within user sessions for 10x faster responses
	- Provider-Specific Prompts - Optimized system prompts for OpenAI, Anthropic, and HuggingFace
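
The supervisor pattern above can be sketched without any framework. This is a minimal illustration, not the project's LangGraph code: the three handler functions and the keyword-based `route` function are hypothetical stand-ins for the subagents and for the LLM routing decision.

```python
from typing import Callable, Dict


# Hypothetical specialist handlers; in the real app these would be LangGraph subagents.
def image_identifier(query: str) -> str:
    return f"[image_identifier] classifying: {query}"


def species_explorer(query: str) -> str:
    return f"[species_explorer] looking up: {query}"


def taxonomy_specialist(query: str) -> str:
    return f"[taxonomy_specialist] tracing family for: {query}"


SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "image_identifier": image_identifier,
    "species_explorer": species_explorer,
    "taxonomy_specialist": taxonomy_specialist,
}


def route(query: str) -> str:
    """Stand-in for the LLM routing step: return the name of one specialist."""
    q = query.lower()
    if any(word in q for word in ("http://", "https://", "photo", "image")):
        return "image_identifier"
    if any(word in q for word in ("family", "taxonomy", "conservation")):
        return "taxonomy_specialist"
    return "species_explorer"  # default when nothing more specific matches


def supervise(query: str) -> str:
    """Pick a specialist, delegate the query to it, and return its answer."""
    return SPECIALISTS[route(query)](query)


if __name__ == "__main__":
    print(supervise("What bird is in this photo?"))
```

Swapping `route` for a model call that returns one of the keys in `SPECIALISTS` keeps the rest of the flow unchanged, which is the main appeal of the pattern.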

	### 🔧 Dual MCP Server Architecture
	- Modal Bird Classifier ([modal.com](https://modal.com))
	  - [prithivMLmods/Bird-Species-Classifier-526](https://huggingface.co/prithivMLmods/Bird-Species-Classifier-526) from HuggingFace
	  - 526 bird species classification on Modal T4 GPU
	  - Serverless GPU deployment for on-demand classification
	  - Streamable HTTP transport with base64 and URL input support
	- Nuthatch MCP Server (Custom Built - Track 1)
	  - FastMCP framework with 7 specialized tools
	  - Integrates [Nuthatch API](https://nuthatch.lastelm.software) (1000+ species)
	  - Dual Transport Support: STDIO (subprocess) for HF Spaces + HTTP for local debugging
	  - Data sources: Nuthatch DB, Unsplash (images), xeno-canto (audio)
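
One way to support both transports from a single entry point is to pick the transport from the environment at startup. The sketch below only shows that selection step. The variable names `NUTHATCH_MCP_TRANSPORT`, `NUTHATCH_MCP_HOST`, and `NUTHATCH_MCP_PORT` and their defaults are assumptions made for illustration, not settings taken from this project.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TransportConfig:
    transport: str              # "stdio" or "http"
    host: Optional[str] = None  # only used for HTTP
    port: Optional[int] = None  # only used for HTTP


def select_transport(env: Mapping[str, str] = os.environ) -> TransportConfig:
    """Choose a transport from (hypothetical) environment variables.

    STDIO is the default, matching the subprocess setup described for HF Spaces.
    """
    mode = env.get("NUTHATCH_MCP_TRANSPORT", "stdio").strip().lower()
    if mode == "stdio":
        return TransportConfig("stdio")
    if mode == "http":
        host = env.get("NUTHATCH_MCP_HOST", "127.0.0.1")
        port = int(env.get("NUTHATCH_MCP_PORT", "8000"))
        return TransportConfig("http", host, port)
    raise ValueError(f"Unsupported transport: {mode!r}")


if __name__ == "__main__":
    # The resulting config would then be handed to the server's run call.
    print(select_transport())
```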

	### 📡 Dual Streaming Output
	- Chat Response Stream - Real-time markdown rendering with embedded media
	- Tool Execution Log Stream - Parallel visibility into MCP tool calls (inputs/outputs)
	- Async Progress Indicators - Immediate user feedback before processing begins
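
The dual-stream idea can be illustrated with a plain async generator that yields a `(chat, tool_log)` pair on every update, so a UI can refresh two panels from one stream. This is a sketch under stated assumptions: `fake_agent_events` and the tool call strings inside it are hypothetical and stand in for the real agent run.

```python
import asyncio
from typing import AsyncIterator, List, Tuple


async def fake_agent_events() -> AsyncIterator[Tuple[str, str]]:
    """Hypothetical event source standing in for the agent run."""
    events = [
        ("tool", "search_birds(name='cardinal')"),
        ("text", "The Northern Cardinal "),
        ("tool", "get_bird_audio(name='cardinal')"),
        ("text", "is a songbird."),
    ]
    for kind, payload in events:
        await asyncio.sleep(0)  # yield control, as real network I/O would
        yield kind, payload


async def stream_response() -> AsyncIterator[Tuple[str, str]]:
    """Yield (chat_markdown, tool_log) after every event."""
    chat = ""
    log: List[str] = []
    placeholder = "⏳ Starting..."
    yield placeholder, ""  # immediate feedback before any work is done
    async for kind, payload in fake_agent_events():
        if kind == "text":
            chat += payload
        else:
            log.append(payload)
        # Keep the placeholder visible until the first text chunk arrives.
        yield chat or placeholder, "\n".join(log)


async def collect() -> List[Tuple[str, str]]:
    return [update async for update in stream_response()]


updates = asyncio.run(collect())
```

Each yielded tuple maps naturally onto two output components, which is how generator-based handlers with multiple outputs are usually wired in Gradio.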

	### 🎨 Structured Output Parsing
	- LlamaIndex Pydantic Models - Type-safe response formatting
	- Regex URL Extraction - Automatic detection of image and audio URLs
	- Smart Audio Normalization - xeno-canto links converted to browser-friendly format (`/download` → playable)
	- Markdown Media Embedding - Images and audio automatically formatted
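
As a rough illustration of the URL extraction and media embedding steps, the sketch below finds URLs with a regular expression, classifies them by file extension, and appends markdown or HTML for each media link. The extension lists and the use of an `<audio>` tag are assumptions. Extension matching is also a simplification, since image hosts often serve URLs without a file extension. The xeno-canto normalization is not shown because its exact rule is not described here.

```python
import re
from typing import List

URL_RE = re.compile(r'https?://[^\s()<>"]+')
IMAGE_EXT = (".jpg", ".jpeg", ".png", ".webp", ".gif")
AUDIO_EXT = (".mp3", ".wav", ".ogg")


def extract_urls(text: str) -> List[str]:
    """Return all URLs in text, without trailing sentence punctuation."""
    return [url.rstrip(".,;:!?") for url in URL_RE.findall(text)]


def classify(url: str) -> str:
    """Classify a URL as 'image', 'audio', or 'link' by its file extension."""
    path = url.split("?", 1)[0].lower()  # ignore any query string
    if path.endswith(IMAGE_EXT):
        return "image"
    if path.endswith(AUDIO_EXT):
        return "audio"
    return "link"


def embed_media(text: str) -> str:
    """Append an embed block for every image or audio URL found in text."""
    blocks = []
    for url in extract_urls(text):
        kind = classify(url)
        if kind == "image":
            blocks.append(f"![bird image]({url})")
        elif kind == "audio":
            blocks.append(f'<audio controls src="{url}"></audio>')
    return "\n\n".join([text] + blocks)


if __name__ == "__main__":
    print(embed_media("Photo: https://example.com/cardinal.jpg"))
```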

	### 🌐 Multi-Provider LLM Support
	- OpenAI (GPT-4o-mini) - Recommended for reliability
	- Anthropic (Claude Sonnet 4) - Best for complex reasoning
	- HuggingFace Inference API - Open-source models (limited tool calling)
	- User-Provided Keys - No backend API key required, users supply their own

	### 💅 Production UI/UX
	- Gradio 6.0 SSR - Server-side rendering for enhanced performance
	- Custom Cloud Theme - Sky-inspired CSS with mobile-responsive design
	- Dynamic Examples - Example queries adapt to selected agent mode
	- Instant Feedback - "⏳ Starting..." indicator appears immediately on submit

	---

	## 🗂️ Data Sources & MCP Servers

	We built two custom MCP servers that integrate with bird data APIs and GPU-powered classification:

	Data Sources:
	- Nuthatch API ([nuthatch.lastelm.software](https://nuthatch.lastelm.software)) - 1000+ bird species database by Last Elm Software
	- Unsplash - High-quality reference images for visual identification
	- xeno-canto.org - Community-contributed bird audio recordings worldwide
	- HuggingFace Model - [prithivMLmods/Bird-Species-Classifier-526](https://huggingface.co/prithivMLmods/Bird-Species-Classifier-526) for GPU classification

	MCP Servers:
	1. Nuthatch MCP Server (Track 1 - Building MCP)
	   - 7 specialized tools: search, species info, images, audio, family search, conservation filtering
	   - STDIO transport for HF Spaces, HTTP option for local debugging
	   - FastMCP framework with async API integration

	2. Modal Bird Classifier (GPU-powered)
	   - Image classification tools: URL and base64 input support
	   - Serverless GPU deployment via Modal
	   - Streamable HTTP transport
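
To make the two classifier input styles concrete, here is a small sketch of how a client might prepare arguments for each. The argument names `image_base64` and `image_url` are hypothetical, and the real tool schemas may differ.

```python
import base64
from typing import Dict


def build_base64_args(image_bytes: bytes) -> Dict[str, str]:
    """Encode raw image bytes as ASCII base64 so they can travel inside JSON."""
    return {"image_base64": base64.b64encode(image_bytes).decode("ascii")}


def build_url_args(image_url: str) -> Dict[str, str]:
    """Pass the image by reference; the server would fetch it itself."""
    if not image_url.startswith(("http://", "https://")):
        raise ValueError("image_url must be an http(s) URL")
    return {"image_url": image_url}


def decode_image(args: Dict[str, str]) -> bytes:
    """Server-side counterpart: recover the original bytes, rejecting bad input."""
    return base64.b64decode(args["image_base64"], validate=True)


if __name__ == "__main__":
    png_header = b"\x89PNG\r\n\x1a\n"
    print(build_base64_args(png_header))
```

The URL form keeps requests small, while the base64 form works for uploads that have no public address.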

	---

	## 🧩 Core Components

	Multi-Agent Orchestration:
	- LangGraph Supervisor Pattern - LLM-based routing between specialist agents
	- 3 Specialized Subagents - Each with focused tool subset (image ID, species exploration, taxonomy)
	- Session-based Caching - Agent instances reused within user sessions for performance
	- Dual Streaming - Parallel chat response + tool execution log streams
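
Session-based caching can be sketched as a dictionary keyed by session, provider, and agent mode, so that an expensive agent build runs once per combination. The `AgentCache` class and its key layout are assumptions for illustration, not the project's actual implementation.

```python
from typing import Any, Callable, Dict, Tuple

CacheKey = Tuple[str, str, str]  # (session_id, provider, mode)


class AgentCache:
    """Reuse agent instances within a session instead of rebuilding them."""

    def __init__(self, factory: Callable[[str, str], Any]) -> None:
        self._factory = factory
        self._agents: Dict[CacheKey, Any] = {}
        self.builds = 0  # counts how often the expensive factory ran

    def get(self, session_id: str, provider: str, mode: str) -> Any:
        key = (session_id, provider, mode)
        if key not in self._agents:
            self._agents[key] = self._factory(provider, mode)
            self.builds += 1
        return self._agents[key]

    def drop_session(self, session_id: str) -> None:
        """Forget every agent that belongs to one session."""
        for key in [k for k in self._agents if k[0] == session_id]:
            del self._agents[key]


if __name__ == "__main__":
    cache = AgentCache(lambda provider, mode: object())
    first = cache.get("session-1", "openai", "supervisor")
    second = cache.get("session-1", "openai", "supervisor")
    print(first is second, cache.builds)
```

Including the provider and mode in the key means a user who switches models mid-session gets a fresh agent rather than a stale one.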

	Agent Architecture:
	- `subagent_supervisor.py` - Creates supervisor workflow with LangGraph
	- `subagent_factory.py` - Builds specialists with filtered tool access
	- `subagent_config.py` - Defines agent modes and tool allocations
	- `prompts.py` - Provider-specific system prompts (OpenAI, Anthropic, HuggingFace)

	UI & UX:
	- Gradio 6.0 with SSR for enhanced performance
	- Custom cloud-themed CSS with mobile-responsive design
	- Dynamic examples that adapt to agent mode selection
	- Immediate processing feedback with async streaming updates

	---

	## 🚀 Quick Start

	Try the Live Demo: Just provide your LLM API key (OpenAI, Anthropic, or HuggingFace) in the sidebar and start exploring!

	For Developers:
	```bash
	# Clone and install
	git clone <repo-url>
	cd hackathon_draft
	pip install -r requirements.txt

	# Configure environment
	cp .env.example .env
	# Edit .env with your API keys

	# Run locally
	python app.py
	```

	Deploy to HuggingFace Spaces:
	```bash
	python upload_to_space.py
	# Configure Secrets in Space Settings (see docs/dev/main-README.md)
	```

	Full Setup Guide: See [docs/dev/main-README.md](docs/dev/main-README.md) for comprehensive deployment instructions

	---

	## 🏆 Credits & License

	Built for the [HuggingFace MCP 1st Birthday Hackathon](https://huggingface.co/MCP-1st-Birthday)

	Data Sources: [Nuthatch API](https://nuthatch.lastelm.software) (Last Elm Software) | [xeno-canto.org](https://xeno-canto.org) | [Unsplash](https://unsplash.com)

	Technology: [Model Context Protocol](https://github.com/anthropics/mcp) | [LangGraph](https://github.com/langchain-ai/langgraph) | [Gradio 6](https://gradio.app) | [Modal](https://modal.com)

	MIT License - Educational and research purposes