Krishna Chaitanya Cheedella committed
Commit f3045de · Parent: 1e90386

Update deployment guides for OpenAI + HuggingFace setup

Files changed:
- DEPLOYMENT_GUIDE.md +190 -148
- QUICKSTART.md +64 -54
DEPLOYMENT_GUIDE.md
CHANGED
# LLM Council - Comprehensive Guide

## 📝 Overview

The LLM Council is a sophisticated multi-agent system that uses multiple Large Language Models (LLMs) to collectively answer questions through a 3-stage deliberation process:

1. **Stage 1 - Initial Responses**: Each council member independently answers the question in parallel
2. **Stage 2 - Peer Review**: Council members rank each other's anonymized responses
3. **Stage 3 - Synthesis**: A chairman model synthesizes the final answer based on all inputs

**Current Implementation**: Uses FREE HuggingFace models (60%) + cheap OpenAI models (40%)

## 🏗️ Architecture

### Current Implementation

## 🔧 Current Models (FREE HuggingFace + OpenAI)

### Council Members (5 models)

**FREE HuggingFace Models** (via Inference API):
- `meta-llama/Llama-3.3-70B-Instruct` - Meta's latest Llama (FREE)
- `Qwen/Qwen2.5-72B-Instruct` - Alibaba's Qwen (FREE)
- `mistralai/Mixtral-8x7B-Instruct-v0.1` - Mistral MoE (FREE)

**OpenAI Models** (paid but cheap):
- `gpt-4o-mini` - Fast, affordable GPT-4 variant
- `gpt-3.5-turbo` - Ultra cheap, still capable

### Chairman

- `gpt-4o-mini` - Final synthesis model

**Benefits of Current Setup:**
- 60% of models are completely FREE (HuggingFace)
- 40% use cheap OpenAI models ($0.001-0.01 per query)
- 90-99% cost reduction compared to all-paid alternatives
- No experimental/beta endpoints - all stable APIs
- Diverse model providers for varied perspectives
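The 3-stage flow described above can be sketched with `asyncio`. This is a minimal illustration, not the repo's actual code: `deliberate`, `query_model`, and the prompt wording are all hypothetical names chosen for the example.

```python
import asyncio

async def deliberate(question, council, chairman, query_model):
    # Stage 1: all council members answer in parallel
    answers = await asyncio.gather(*(query_model(m, question) for m in council))

    # Stage 2: each member ranks the anonymized answers
    anonymized = [f"Response {i + 1}: {a}" for i, a in enumerate(answers)]
    rank_prompt = "Rank these responses:\n" + "\n".join(anonymized)
    rankings = await asyncio.gather(*(query_model(m, rank_prompt) for m in council))

    # Stage 3: the chairman synthesizes a final answer from all inputs
    synth_prompt = (
        f"Question: {question}\n"
        + "\n".join(anonymized)
        + "\nRankings:\n" + "\n".join(rankings)
        + "\nSynthesize the best final answer."
    )
    return await query_model(chairman, synth_prompt)

async def fake_query(model, prompt):
    # offline stand-in for a real API call, so the sketch runs without keys
    return f"{model}: {prompt[:20]}"

if __name__ == "__main__":
    print(asyncio.run(deliberate("Why is the sky blue?", ["m1", "m2"], "chair", fake_query)))
```

Swapping `fake_query` for a real API client gives the production behavior; the parallel `gather` calls are what keep Stage 1 and Stage 2 fast.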
## ✨ Alternative Model Configurations

### All-FREE Council (100% HuggingFace)

```python
COUNCIL_MODELS = [
    {"provider": "huggingface", "model": "meta-llama/Llama-3.3-70B-Instruct"},
    {"provider": "huggingface", "model": "Qwen/Qwen2.5-72B-Instruct"},
    {"provider": "huggingface", "model": "mistralai/Mixtral-8x7B-Instruct-v0.1"},
    {"provider": "huggingface", "model": "meta-llama/Llama-3.1-405B-Instruct"},
    {"provider": "huggingface", "model": "microsoft/Phi-3.5-MoE-instruct"},
]
CHAIRMAN_MODEL = {"provider": "huggingface", "model": "meta-llama/Llama-3.3-70B-Instruct"}
```

**Cost**: $0.00 per query!
### Premium Council (OpenAI + HuggingFace)

```python
COUNCIL_MODELS = [
    {"provider": "openai", "model": "gpt-4o"},
    {"provider": "openai", "model": "gpt-4-turbo"},
    {"provider": "huggingface", "model": "meta-llama/Llama-3.3-70B-Instruct"},
    {"provider": "huggingface", "model": "Qwen/Qwen2.5-72B-Instruct"},
    {"provider": "openai", "model": "gpt-3.5-turbo"},
]
CHAIRMAN_MODEL = {"provider": "openai", "model": "gpt-4o"}
```

**Cost**: ~$0.05-0.15 per query
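A small dispatch table is enough to route the provider-tagged entries above to the right endpoint and API-key variable. This is a sketch of the idea; the repo's real client lives in `backend/api_client.py` and the helper name `resolve` is illustrative.

```python
# Map each provider to its endpoint and the env var holding its key.
PROVIDERS = {
    "openai": {
        "base_url": "https://api.openai.com/v1/chat/completions",
        "key_env": "OPENAI_API_KEY",
    },
    "huggingface": {
        "base_url": "https://router.huggingface.co/v1/chat/completions",
        "key_env": "HUGGINGFACE_API_KEY",
    },
}

def resolve(entry: dict) -> dict:
    """Expand a {"provider": ..., "model": ...} config entry into call settings."""
    p = PROVIDERS[entry["provider"]]
    return {"url": p["base_url"], "key_env": p["key_env"], "model": entry["model"]}

council = [
    {"provider": "openai", "model": "gpt-4o-mini"},
    {"provider": "huggingface", "model": "Qwen/Qwen2.5-72B-Instruct"},
]
calls = [resolve(e) for e in council]
```

Because both providers speak the same chat-completions format, nothing beyond the URL and key needs to change per entry.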
## 🚀 Running on Hugging Face Spaces

### Prerequisites

1. **OpenAI API Key**:
   - Sign up at [platform.openai.com](https://platform.openai.com/)
   - Go to API Keys → Create new secret key
   - Copy your key (starts with `sk-`)
   - Add billing info and credits ($5-10 is plenty)

2. **HuggingFace API Token**:
   - Sign up at [huggingface.co](https://huggingface.co/)
   - Go to Settings → Access Tokens → New token
   - Copy your token (starts with `hf_`)
   - FREE! No billing required

3. **HuggingFace Account**: For deploying Spaces
### Step-by-Step Deployment

#### Method 1: Deploy Your Existing Code

1. **Create New Space**
   - Go to huggingface.co/new-space
   - Choose "Gradio" as SDK
   - Select SDK version: 6.0.0
   - Choose hardware: CPU (free)

2. **Push Your Code**
   ```bash
   # Clone your space
   git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
   cd YOUR_SPACE_NAME

   # Copy your LLM Council code
   cp -r /path/to/llm_council/* .

   # Commit and push
   git add .
   git commit -m "Initial deployment"
   git push
   ```

3. **Configure Secrets**
   - Go to your space → Settings → Repository secrets
   - Add secret #1:
     - Name: `OPENAI_API_KEY`
     - Value: (your OpenAI key starting with `sk-`)
   - Add secret #2:
     - Name: `HUGGINGFACE_API_KEY`
     - Value: (your HuggingFace token starting with `hf_`)

4. **Space Auto-Restarts**
   - HF Spaces will automatically rebuild and deploy
   - Check the "Logs" tab to verify successful startup

### Required Files Structure
## 🔐 Environment Variables

### Required Variables

**For Local Development** (`.env` file, DO NOT commit to git):
```bash
OPENAI_API_KEY=sk-proj-your-key-here
HUGGINGFACE_API_KEY=hf_your-token-here
```

**For HuggingFace Spaces** (Settings → Repository secrets):
- Secret 1: `OPENAI_API_KEY` = `sk-proj-...`
- Secret 2: `HUGGINGFACE_API_KEY` = `hf_...`

### API Endpoints Used

**HuggingFace Inference API**:
- Endpoint: `https://router.huggingface.co/v1/chat/completions`
- Format: OpenAI-compatible
- Cost: FREE for inference API
- Models: Llama, Qwen, Mixtral, etc.

**OpenAI API**:
- Endpoint: `https://api.openai.com/v1/chat/completions`
- Format: Native OpenAI
- Cost: Pay-per-token (very cheap for mini/3.5-turbo)
- Models: GPT-4o-mini, GPT-3.5-turbo, GPT-4o

For Hugging Face Spaces, use Repository Secrets instead of a `.env` file.
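Because both endpoints above speak the same OpenAI-style chat format, a single request builder can serve either one. A stdlib-only sketch (the helper name `build_chat_request` is illustrative, not the repo's API):

```python
import json
import os

ENDPOINTS = {
    "openai": "https://api.openai.com/v1/chat/completions",
    "huggingface": "https://router.huggingface.co/v1/chat/completions",
}
KEY_ENV = {"openai": "OPENAI_API_KEY", "huggingface": "HUGGINGFACE_API_KEY"}

def build_chat_request(provider: str, model: str, question: str):
    """Return (url, headers, body) for an OpenAI-style chat completion call."""
    headers = {
        "Authorization": f"Bearer {os.environ[KEY_ENV[provider]]}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": question}],
    })
    return ENDPOINTS[provider], headers, body
```

The resulting tuple can be sent with any HTTP client, e.g. `httpx.post(url, headers=headers, content=body)` as in the project's requirements.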
```
gradio>=6.0.0
httpx>=0.27.0
python-dotenv>=1.0.0
openai>=1.0.0       # For OpenAI API
```

**Note**: The system uses:
- `httpx` for async HTTP requests to the HuggingFace API
- `openai` SDK for OpenAI API calls
- `python-dotenv` to load environment variables from `.env`
## 💻 Running Locally

```bash
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Create .env file with both API keys
echo OPENAI_API_KEY=sk-proj-your-key-here > .env
echo HUGGINGFACE_API_KEY=hf_your-token-here >> .env

# 5. Run the app
python app.py
```

The app will be available at `http://localhost:7860`
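A fail-fast check at startup catches a missing key before the first council run wastes a minute timing out. A sketch using only the standard library (`check_env` is an illustrative name, not an existing function in the repo):

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "HUGGINGFACE_API_KEY")

def check_env() -> list:
    """Return the names of any required API keys that are missing or empty."""
    return [k for k in REQUIRED_KEYS if not os.environ.get(k)]

missing = check_env()
if missing:
    print(f"Missing keys: {', '.join(missing)} - add them to .env or Space secrets")
```

Calling this at the top of `app.py` would surface configuration problems immediately instead of as mid-run 401 errors.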
## 🔧 Code Architecture

### Key Components

**1. Dual API Client** (`backend/api_client.py`):
- Supports both HuggingFace and OpenAI APIs
- Automatic retry logic with exponential backoff
- Graceful error handling and fallbacks
- Parallel model querying for efficiency

**2. FREE Model Configuration** (`backend/config_free.py`):
- Mix of FREE HuggingFace + cheap OpenAI models
- Configurable timeouts and retries
- Easy to customize and extend

**3. Council Orchestration** (`backend/council_free.py`):
- Stage 1: Parallel response collection
- Stage 2: Peer ranking system
- Stage 3: Chairman synthesis with streaming

### Error Handling Features

- Retry logic with exponential backoff (3 attempts)
- Graceful handling of individual model failures
- Detailed error logging for debugging
- Timeout management (60s default)

### Benefits of Current Architecture

- **Cost Efficient**: 60% FREE models, 40% ultra-cheap
- **Robust**: Retry logic handles transient failures
- **Fast**: Parallel execution minimizes wait time
- **Flexible**: Easy to add/remove models
- **Observable**: Detailed logging for debugging
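The retry behavior described above (3 attempts, exponential backoff) can be sketched as follows. Delays are shortened for illustration, and `with_retries` is a stand-in name; the actual implementation in `backend/api_client.py` may differ.

```python
import time

def with_retries(call, attempts=3, base_delay=0.01):
    """Retry `call` up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, ...
```

Wrapping each model call this way is what lets a transient HuggingFace rate limit or network blip resolve without failing the whole council run.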
## 📊 Performance Characteristics

### Typical Response Times (Current Setup)

- **Stage 1**: 10-30 seconds (5 models in parallel)
- **Stage 2**: 15-45 seconds (peer rankings)
- **Stage 3**: 15-40 seconds (synthesis with streaming)
- **Total**: ~40-115 seconds per question

### Cost per Query (Current Setup)

- **FREE HuggingFace portion**: $0.00 (3 models)
- **OpenAI portion**: $0.001-0.01 (2 models)
- **Total**: ~$0.001-0.01 per query

**Comparison to alternatives**:
- 90-99% cheaper than all-paid services
- Similar quality to premium setups
- Faster than sequential execution

*Costs vary based on prompt length and response complexity*
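The per-query figure comes from simple arithmetic: only the two OpenAI members cost anything. A sketch of that calculation; the per-token rates and token counts below are illustrative assumptions, not quoted prices - check current pricing at platform.openai.com.

```python
# Illustrative per-1K-token rates (assumed, NOT official pricing).
ILLUSTRATIVE_RATE_PER_1K_TOKENS = {"gpt-4o-mini": 0.0006, "gpt-3.5-turbo": 0.0015}

def query_cost(tokens_per_model: int, paid_models: list) -> float:
    """Total cost of one query: FREE HuggingFace members contribute $0."""
    return sum(
        tokens_per_model / 1000 * ILLUSTRATIVE_RATE_PER_1K_TOKENS[m]
        for m in paid_models
    )

# e.g. ~2000 tokens through each paid model across all three stages:
cost = query_cost(2000, ["gpt-4o-mini", "gpt-3.5-turbo"])
```

Even with generous token counts, the paid portion stays in the fraction-of-a-cent range, which is where the $0.001-0.01 estimate comes from.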
### Common Issues

1. **"401 Unauthorized" errors**
   - Check both API keys are set correctly
   - Verify OpenAI key starts with `sk-`
   - Verify HuggingFace key starts with `hf_`
   - Ensure OpenAI account has billing/credits enabled
   - Check Space secrets are named exactly: `OPENAI_API_KEY` and `HUGGINGFACE_API_KEY`

2. **Timeout errors**
   - Increase timeout in `backend/config_free.py`
   - Check network connectivity
   - Some models may be slow - consider replacing them

3. **Space won't start**
   - Verify `requirements.txt` includes all dependencies
   - Check logs in Space → Logs tab
   - Ensure both secrets are added (not just one)
   - Verify Python version compatibility (3.10+)

4. **Some models fail, others work**
   - Normal! The system is designed to handle partial failures
   - Check logs to see which models failed
   - HuggingFace API may have rate limits (rare)
   - OpenAI API requires billing setup

5. **HuggingFace 410 error**
   - The old endpoint is deprecated
   - Ensure you are using `router.huggingface.co/v1/chat/completions`
   - Update `backend/api_client.py` if needed
## 🎯 Best Practices

1. **Model Selection**
   - Use 3-5 council members (sweet spot for quality vs speed)
   - Mix FREE HuggingFace + cheap OpenAI for best value
   - Choose diverse models for varied perspectives
   - Match chairman to task complexity

2. **Cost Management**
   - Start with the current setup ($0.001-0.01 per query)
   - Consider the all-FREE HuggingFace config for $0 cost
   - Monitor OpenAI usage at platform.openai.com/usage
   - Set spending limits in OpenAI billing settings

3. **Quality Optimization**
   - Use more council members for important queries (5-7)
   - Use a better chairman (gpt-4o instead of gpt-4o-mini)
   - Adjust timeouts based on model speed
   - Test different model combinations

4. **Security**
   - NEVER commit `.env` to git (use `.gitignore`)
   - Use HuggingFace Space secrets for production
   - Rotate API keys periodically
   - Monitor usage for anomalies
## 📚 Additional Resources

- [OpenAI API Documentation](https://platform.openai.com/docs)
- [HuggingFace Inference API](https://huggingface.co/docs/api-inference)
- [Gradio Documentation](https://gradio.app/docs)
- [Hugging Face Spaces Guide](https://huggingface.co/docs/hub/spaces)
QUICKSTART.md
CHANGED
## ⚡ Quick Setup (5 minutes)

### 1️⃣ Get API Keys

**OpenAI API Key** (for GPT models):
1. Go to [platform.openai.com](https://platform.openai.com/)
2. Sign up / Login
3. Go to API Keys → Create new secret key
4. Copy your API key (starts with `sk-`)

**HuggingFace API Key** (for FREE models):
1. Go to [huggingface.co](https://huggingface.co/)
2. Sign up / Login
3. Go to Settings → Access Tokens → Create new token
4. Copy your token (starts with `hf_`)
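The key prefixes from the steps above make a quick sanity check possible before anything hits the network. A tiny sketch (`looks_valid` is an illustrative helper, not part of the project):

```python
def looks_valid(openai_key: str, hf_token: str) -> bool:
    """OpenAI keys start with 'sk-', HuggingFace tokens with 'hf_'."""
    return openai_key.startswith("sk-") and hf_token.startswith("hf_")

# Catches the classic mistake of pasting the keys into the wrong slots:
assert looks_valid("sk-proj-abc123", "hf_abc123")
assert not looks_valid("hf_abc123", "sk-proj-abc123")  # swapped by mistake
```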
### 2️⃣ Set Up Locally

```bash
pip install -r requirements.txt

# Create environment file
echo OPENAI_API_KEY=your_openai_key_here > .env
echo HUGGINGFACE_API_KEY=your_hf_token_here >> .env
```

### 3️⃣ Run It!

Visit `http://localhost:7860` 🎉
## 🌐 Deploy to Hugging Face Spaces (FREE)

### Step 1: Create New Space

1. Go to [huggingface.co/new-space](https://huggingface.co/new-space)
2. Choose Gradio SDK 6.0.0
3. Clone and push your code:

```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
cd YOUR_SPACE_NAME
cp -r /path/to/llm_council/* .
git add .
git commit -m "Initial commit"
git push
```
### Step 2: Add API Keys as Secrets

1. Go to your Space → Settings → Repository secrets
2. Add first secret:
   - Name: `OPENAI_API_KEY`
   - Value: (your OpenAI API key starting with `sk-`)
3. Add second secret:
   - Name: `HUGGINGFACE_API_KEY`
   - Value: (your HuggingFace token starting with `hf_`)
4. Space will auto-restart and deploy!
## 🎯 Usage Examples
|
| 71 |
|
| 72 |
### Simple Question
|
| 73 |
```
|
| 74 |
Question: What is the capital of France?
|
| 75 |
+
⏱️ Response time: ~20-40 seconds
|
| 76 |
+
💰 Cost: ~$0.001-0.005 (90% cheaper with FREE HF models!)
|
| 77 |
```
|
| 78 |
|
| 79 |
### Complex Analysis
|
| 80 |
```
|
| 81 |
Question: Compare pros and cons of renewable energy
|
| 82 |
+
⏱️ Response time: ~60-120 seconds
|
| 83 |
+
💰 Cost: ~$0.005-0.015 (3 FREE HF models + 2 cheap OpenAI models)
|
| 84 |
```
|
| 85 |
|
| 86 |
+
## 🤖 Current Models
|
| 87 |
|
| 88 |
+
**FREE HuggingFace Models** (60% of council):
|
| 89 |
+
- Meta Llama 3.3 70B Instruct
|
| 90 |
+
- Qwen 2.5 72B Instruct
|
| 91 |
+
- Mistral Mixtral 8x7B Instruct
|
| 92 |
|
| 93 |
+
**OpenAI Models** (40% of council):
|
| 94 |
+
- GPT-4o-mini (very cheap)
|
| 95 |
+
- GPT-3.5-turbo (ultra cheap)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 96 |
|
| 97 |
+
**Chairman**: GPT-4o-mini (final synthesis)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 98 |
|
## 📊 Monitor Usage

**OpenAI Costs**: Check at [platform.openai.com/usage](https://platform.openai.com/usage)

**HuggingFace**: FREE! No monitoring needed

Typical costs per query:
- **Current Setup**: $0.001-0.01 (90-99% cheaper than alternatives!)
- 3 models are completely FREE (HuggingFace)
- Only pay for the OpenAI models (GPT-4o-mini, GPT-3.5-turbo)
## ❓ Troubleshooting

**"401 Unauthorized" errors**
- ✅ Check both API keys in `.env` (locally) or Space secrets (HuggingFace)
- ✅ Verify OpenAI key starts with `sk-`
- ✅ Verify HuggingFace key starts with `hf_`
- ✅ Ensure OpenAI account has credits (check billing)

**Space won't start on HF**
- ✅ Check logs in Space → Logs tab
- ✅ Verify secret names are exact: `OPENAI_API_KEY` and `HUGGINGFACE_API_KEY`
- ✅ Ensure requirements.txt is present
- ✅ Both secrets must be added (not just one)

**Slow responses**
- ✅ Normal! 3 stages take 45-135 seconds
## 💡 Tips

1. **Already using FREE models!** - 3 out of 5 models cost nothing
2. **Very cheap**: Only ~$0.001-0.01 per query (OpenAI portion)
3. **Monitor OpenAI usage** at platform.openai.com/usage
4. **Set OpenAI spending limits** in billing settings to avoid surprises

## 🎨 Customization

Edit `backend/config_free.py` to:
- Change council models (add more FREE HuggingFace models!)
- Adjust chairman model
- Modify timeouts
- Configure retries

**FREE HuggingFace models you can add**:
- `meta-llama/Llama-3.1-405B-Instruct` (huge!)
- `mistralai/Mistral-Nemo-Instruct-2407`
- `microsoft/Phi-3.5-MoE-instruct`

See `backend/config_free.py` for examples!

---