---
title: BirdScope AI - MCP Multi-Agent System
emoji: 🦅
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 6.0.1
python_version: 3.11
app_file: app.py
pinned: false
license: mit
short_description: AI-powered bird identification with MCP multi-agent system
tags:
  - building-mcp-track-enterprise
  - building-mcp-track-consumer
  - building-mcp-track-creative
  - mcp-in-action-track-enterprise
  - mcp-in-action-track-consumer
  - mcp-in-action-track-creative
---

# 🦅 BirdScope AI - MCP Multi-Agent System

AI-powered bird identification with specialized MCP agents

Built for the MCP 1st Birthday Hackathon


## 📢 Hackathon Submission

Social Media: Twitter/X Post

Demo Video: Watch on YouTube/Loom

Track Submissions:

  • 🔧 Track 1 (Building MCP): Two custom MCP servers
    • Nuthatch MCP Server - 7 tools for bird species database (search, species info, images, audio, family search, conservation filtering)
    • Modal Bird Classifier MCP - 2 Modal-hosted GPU-powered image classification tools (base64 & URL inputs)
    • Categories: Enterprise (wildlife conservation) | Consumer (bird enthusiasts and education) | Creative (multimedia exploration)
  • 🤖 Track 2 (MCP in Action): Full multi-agent system with supervisor routing
    • LangGraph-based supervisor orchestrating 3 specialized subagents
    • Integrates both MCP servers with intelligent tool routing
    • Categories: Enterprise (conservation orgs) | Consumer (bird watchers) | Creative (educational multimedia)

Author: @facemelter

Built with: Gradio 6 | LangGraph | FastMCP | Modal (GPU) | OpenAI/Anthropic/HuggingFace LLMs


## 🌐 Project Overview

BirdScope AI showcases an advanced multi-agent system powered by Gradio 6 and LangGraph, designed to identify bird species, explore multimedia content, and provide educational information about birds worldwide.

Our innovation: We built two complete systems in one:

  • 🔧 Two Custom MCP Servers (Track 1): Nuthatch species database (7 tools) + Modal GPU classifier (2 tools)
  • 🤖 Multi-Agent Application (Track 2): Supervisor-orchestrated specialist agents

This dual approach demonstrates both building MCP infrastructure and leveraging MCP for autonomous agents.


## ✨ Key Features

### 🤖 Multi-Agent Orchestration

  • LangGraph Supervisor Pattern with intelligent LLM-based routing
  • 3 Specialized Subagents (Image Identifier, Species Explorer, Taxonomy Specialist)
  • Session-based Agent Caching - Agents reused within user sessions for 10x faster responses
  • Provider-Specific Prompts - Optimized system prompts for OpenAI, Anthropic, and HuggingFace
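
The session-based caching idea above can be sketched as a small keyed cache; names here are illustrative, not the actual BirdScope implementation:

```python
# Minimal sketch of session-scoped agent caching: build an agent once per
# (session, provider) pair, then reuse it for later turns in that session.
from typing import Callable, Dict, Tuple

_agent_cache: Dict[Tuple[str, str], object] = {}

def get_agent(session_id: str, provider: str, build: Callable[[], object]) -> object:
    """Return the cached agent for this (session, provider), building it only once."""
    key = (session_id, provider)
    if key not in _agent_cache:
        _agent_cache[key] = build()  # expensive step: wiring up the LLM + MCP tools
    return _agent_cache[key]

build_calls = 0
def build_agent():
    """Stand-in for the real agent factory (hypothetical)."""
    global build_calls
    build_calls += 1
    return {"provider": "openai"}

a1 = get_agent("sess-1", "openai", build_agent)
a2 = get_agent("sess-1", "openai", build_agent)  # cache hit: same instance, no rebuild
```

The win comes from skipping agent construction (LLM client setup plus MCP tool discovery) on every message after the first.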

### 🔧 Dual MCP Server Architecture

  • Modal Bird Classifier (modal.com)
    • prithivMLmods/Bird-Species-Classifier-526 from HuggingFace
    • 526 bird species classification on Modal T4 GPU
    • Serverless GPU deployment for on-demand classification
    • Streamable HTTP transport with base64 and URL input support
  • Nuthatch MCP Server (Custom Built - Track 1)
    • FastMCP framework with 7 specialized tools
    • Integrates Nuthatch API (1000+ species)
    • Dual Transport Support: STDIO (subprocess) for HF Spaces + HTTP for local debugging
    • Data sources: Nuthatch DB, Unsplash (images), xeno-canto (audio)
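
For the classifier's base64 input path, preparing a payload looks roughly like this; the request shape and field names are assumptions for illustration, not the server's actual schema:

```python
import base64

def encode_image_payload(image_bytes: bytes) -> dict:
    """Package raw image bytes as a base64 payload for an image-classification tool.
    The dict shape here is illustrative only."""
    return {
        "input_type": "base64",
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
    }

# Round-trip sanity check with fake image bytes
payload = encode_image_payload(b"\x89PNG fake image bytes")
```

The URL input variant would instead pass the image location and let the Modal server fetch it, which avoids shipping large base64 strings over the wire.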

### 📡 Dual Streaming Output

  • Chat Response Stream - Real-time markdown rendering with embedded media
  • Tool Execution Log Stream - Parallel visibility into MCP tool calls (inputs/outputs)
  • Async Progress Indicators - Immediate user feedback before processing begins
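
A minimal sketch of the dual-stream idea: an async generator yields (chat, tool-log) pairs so the UI can update both panes in parallel. The real app streams LangGraph events; tool names and messages below are illustrative:

```python
import asyncio

async def dual_stream(query: str):
    """Yield (chat_markdown, tool_log) frames for the two UI streams (sketch)."""
    yield ("⏳ Starting...", "")                                       # immediate feedback
    yield ("⏳ Starting...", "→ calling search_birds(query='cardinal')")  # tool log advances
    yield ("**Northern Cardinal** - a striking red songbird...",
           "← search_birds returned 3 results")                        # final chat content

async def collect():
    return [frame async for frame in dual_stream("cardinal")]

frames = asyncio.run(collect())
```

Because the first frame is yielded before any tool work begins, the user sees the "⏳ Starting..." indicator immediately rather than waiting on the first LLM round-trip.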

### 🎨 Structured Output Parsing

  • LlamaIndex Pydantic Models - Type-safe response formatting
  • Regex URL Extraction - Automatic detection of image and audio URLs
  • Smart Audio Normalization - xeno-canto links converted to browser-friendly format (/download → playable)
  • Markdown Media Embedding - Images and audio automatically formatted
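
The URL-extraction step can be sketched with regexes like the following; the patterns and the `![bird](...)` embedding are illustrative, not the app's exact ones:

```python
import re

# Illustrative patterns: pull image/audio URLs out of an LLM response.
IMAGE_URL = re.compile(r"https?://\S+?\.(?:jpg|jpeg|png|webp)", re.I)
AUDIO_URL = re.compile(r"https?://\S+?\.(?:mp3|wav|ogg)", re.I)

def extract_media(text: str) -> dict:
    """Collect image and audio URLs found in free-form model output."""
    return {"images": IMAGE_URL.findall(text), "audio": AUDIO_URL.findall(text)}

def embed_images(text: str) -> str:
    """Rewrite bare image URLs as markdown image tags so Gradio renders them inline."""
    return IMAGE_URL.sub(lambda m: f"![bird]({m.group(0)})", text)

sample = ("See https://images.example.com/cardinal.jpg "
          "and hear https://cdn.example.com/song.mp3")
media = extract_media(sample)
```

Extracted audio URLs would then go through the xeno-canto normalization step before being embedded as playable elements.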

### 🌐 Multi-Provider LLM Support

  • OpenAI (GPT-4o-mini) - Recommended for reliability
  • Anthropic (Claude Sonnet 4) - Best for complex reasoning
  • HuggingFace Inference API - Open-source models (limited tool calling)
  • User-Provided Keys - No backend API key required, users supply their own

### 💅 Production UI/UX

  • Gradio 6.0 SSR - Server-side rendering for enhanced performance
  • Custom Cloud Theme - Sky-inspired CSS with mobile-responsive design
  • Dynamic Examples - Example queries adapt to selected agent mode
  • Instant Feedback - "⏳ Starting..." indicator appears immediately on submit

πŸ—‚οΈ Data Sources & MCP Servers

We built two custom MCP servers that integrate with bird data APIs and GPU-powered classification:

Data Sources:

  • Nuthatch API (Last Elm Software) - bird species database (1000+ species)
  • Unsplash - bird images
  • xeno-canto.org - bird audio recordings

MCP Servers:

  1. Nuthatch MCP Server (Track 1 - Building MCP)
     • 7 specialized tools: search, species info, images, audio, family search, conservation filtering
     • STDIO transport for HF Spaces, HTTP option for local debugging
     • FastMCP framework with async API integration
  2. Modal Bird Classifier (GPU-powered)
     • Image classification tools: URL and base64 input support
     • Serverless GPU deployment via Modal
     • Streamable HTTP transport

## 🧩 Core Components

Multi-Agent Orchestration:

  • LangGraph Supervisor Pattern - LLM-based routing between specialist agents
  • 3 Specialized Subagents - Each with focused tool subset (image ID, species exploration, taxonomy)
  • Session-based Caching - Agent instances reused within user sessions for performance
  • Dual Streaming - Parallel chat response + tool execution log streams

Agent Architecture:

  • subagent_supervisor.py - Creates supervisor workflow with LangGraph
  • subagent_factory.py - Builds specialists with filtered tool access
  • subagent_config.py - Defines agent modes and tool allocations
  • prompts.py - Provider-specific system prompts (OpenAI, Anthropic, HuggingFace)
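
The "filtered tool access" idea from subagent_config.py can be sketched as a per-mode allow-list; the mode and tool names below are hypothetical examples, not the real configuration:

```python
# Illustrative per-agent tool allocation: each specialist sees only its subset
# of the combined MCP tool registry.
AGENT_TOOLS = {
    "image_identifier": {"classify_bird_url", "classify_bird_base64"},
    "species_explorer": {"search_birds", "get_bird_images", "get_bird_audio"},
    "taxonomy_specialist": {"search_by_family", "get_species_info"},
}

def tools_for(mode: str, all_tools: list) -> list:
    """Filter the full MCP tool registry down to one agent's allowed subset."""
    allowed = AGENT_TOOLS[mode]
    return [t for t in all_tools if t["name"] in allowed]

registry = [{"name": n} for n in
            ["classify_bird_url", "search_birds", "search_by_family", "get_species_info"]]
subset = tools_for("taxonomy_specialist", registry)
```

Giving each subagent a narrow tool set keeps the supervisor's routing decisions meaningful and reduces the chance of a specialist calling an irrelevant tool.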

UI & UX:

  • Gradio 6.0 with SSR for enhanced performance
  • Custom cloud-themed CSS with mobile-responsive design
  • Dynamic examples that adapt to agent mode selection
  • Immediate processing feedback with async streaming updates

## 🚀 Quick Start

Try the Live Demo: Just provide your LLM API key (OpenAI, Anthropic, or HuggingFace) in the sidebar and start exploring!

For Developers:

```bash
# Clone and install
git clone <repo-url>
cd hackathon_draft
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your API keys

# Run locally
python app.py
```

Deploy to HuggingFace Spaces:

```bash
python upload_to_space.py
# Configure Secrets in Space Settings (see docs/dev/main-README.md)
```

Full Setup Guide: See docs/dev/main-README.md for comprehensive deployment instructions


πŸ† Credits & License

Built for the HuggingFace MCP 1st Birthday Hackathon

Data Sources: Nuthatch API (Last Elm Software) | xeno-canto.org | Unsplash

Technology: Model Context Protocol | LangGraph | Gradio 6 | Modal

MIT License - Educational and research purposes