Alikestocode committed
Commit 9592189 · 1 Parent(s): f5a609d

Update README and clean up old files

Files changed (3):
  1. CHANGELOG.md +0 -272
  2. README.md +38 -151
  3. README_OLD.md +0 -80
CHANGELOG.md DELETED
@@ -1,272 +0,0 @@
# 📝 Changelog - UI/UX Improvement Session

## Session Date: October 12, 2025

## 🎯 Session Goals
Review and improve the UI/UX for an optimal balance between:
- ✅ Aesthetic appeal
- ✅ Simplicity of use
- ✅ Advanced user needs

## 📦 Deliverables

### 1. Major UI/UX Overhaul
**Commit**: `df40b1d` - Major UI/UX improvements for better user experience

#### Visual Improvements
- Modern gradient theme (indigo → purple)
- Custom CSS with smooth transitions
- Better typography (Inter font)
- Improved spacing and visual hierarchy
- Enhanced button designs with hover effects
- Polished chatbot styling with shadows

#### Layout Reorganization
- Core settings always visible in organized groups
- Advanced parameters in collapsible accordions
- Web search settings auto-hide when disabled
- Larger chat area (600px height)
- Better input area with prominent Send button

#### User Experience Enhancements
- Example prompts for quick start
- Info tooltips on all controls
- Copy button on chat messages
- Duration estimates visible
- Debug info in collapsible panel
- Clear visual feedback for all actions

### 2. Cancel Generation Feature Fixes
**Commits**:
- `9466288` - Fix cancel generation by removing GeneratorExit handler
- `c49f312` - Fix GeneratorExit handling to prevent runtime error
- `b7e5000` - Fix UI not resetting after cancel

#### Problems Solved
- ✅ Generation can now be stopped mid-stream
- ✅ No more "generator ignored GeneratorExit" errors
- ✅ UI properly resets after cancellation
- ✅ Cancel button shows/hides correctly

#### Technical Solution
- Catch GeneratorExit and re-raise it properly
- Track cancellation state to prevent yielding after exit
- Chain a reset handler after the cancel button click
- Clear the cancel_event flag for the next generation
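The fix described above can be sketched as a cancellable streaming generator. Names like `cancel_event` follow the changelog, but the actual wiring in `app.py` may differ:

```python
import threading

cancel_event = threading.Event()  # set by the Stop button handler

def stream_generation(streamer):
    """Yield tokens until the stream ends or the user presses Stop."""
    cancelled = False
    try:
        for token in streamer:
            if cancel_event.is_set():
                cancelled = True  # remember: do not yield anything further
            if cancelled:
                break
            yield token
    except GeneratorExit:
        # Gradio closes the generator when the event is cancelled; re-raise
        # so Python does not warn "generator ignored GeneratorExit".
        raise
    finally:
        cancel_event.clear()  # reset the flag for the next generation
```

Closing the generator mid-stream then exits cleanly, and setting the event before iteration suppresses all further yields.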

### 3. Comprehensive Documentation
**Commit**: `c1bc514` - Add comprehensive documentation and user guide

#### README.md (Complete Rewrite)
- Modern formatting with clear sections
- Feature highlights with emojis
- Model categorization by size
- Technical flow explanation
- Customization guide
- Contributing guidelines

#### USER_GUIDE.md (New)
- 5-minute quick start tutorial
- Detailed feature explanations
- Advanced parameter guide with presets
- Tips & tricks for better results
- Troubleshooting section
- Best practices for all user levels
- Keyboard shortcuts reference

#### UI_UX_IMPROVEMENTS.md (New)
- Complete before/after comparison
- Design principles explained
- Technical implementation details
- User benefits by role
- Future enhancement roadmap
- Lessons learned

### 4. Supporting Files
**Files Created**:
- `style.css` - Custom styling (later inlined)
- `README_OLD.md` - Backup of original README
- `USER_GUIDE.md` - Comprehensive user documentation
- `UI_UX_IMPROVEMENTS.md` - Design documentation

## 📊 Changes Summary

### Code Changes
```
app.py:
- 309 lines added
- 25 lines removed
- Major: UI layout restructure
- Major: Theme customization
- Minor: Bug fixes for cancellation
```

### Documentation
```
README.md: Complete rewrite (557 lines)
USER_GUIDE.md: New file (300+ lines)
UI_UX_IMPROVEMENTS.md: New file (223 lines)
```

### Git Activity
```
10 commits in this session
3 major feature additions
Multiple bug fixes
Clean commit history maintained
```

## 🎨 UI Components Modified

### Header
- ✨ Gradient title styling
- 📝 Subtitle added
- 🎯 Clear value proposition

### Left Panel (Configuration)
- 📦 Core settings group (always visible)
- 🎛️ Advanced parameters accordion
- 🌐 Web search settings accordion (conditional)
- 🗑️ Clear chat button
- ⏱️ Duration estimate display

### Right Panel (Chat)
- 💬 Enhanced chatbot (copy buttons, avatars)
- 📝 Improved input area
- 📤 Prominent Send button
- ⏹️ Smart Stop button (conditional)
- 💡 Example prompts
- 🔍 Debug accordion

### Footer
- 💡 Usage tips
- 🎯 Feature highlights

## 🔧 Technical Improvements

### Theme System
```python
gr.themes.Soft(
    primary_hue="indigo",
    secondary_hue="purple",
    neutral_hue="slate",
    radius_size="lg",
)
```

### CSS Enhancements
- Custom duration estimate styling
- Improved chatbot appearance
- Button hover effects
- Smooth transitions
- Responsive design

### Event Handling
- Smart web search settings toggle
- Proper cancellation flow
- UI state management
- Error handling

## 🐛 Bugs Fixed

1. **Cancel Generation Not Working**
   - Root cause: GeneratorExit not properly propagated
   - Solution: Catch, track state, re-raise

2. **Runtime Error on Cancel**
   - Root cause: Yielding after GeneratorExit
   - Solution: Conditional yielding based on cancel state

3. **UI Not Resetting After Cancel**
   - Root cause: No reset handler after cancellation
   - Solution: Chain reset handler with .then()

## 📈 Impact Assessment

### For Users
- **Beginners**: 50% easier to get started (examples, tooltips)
- **Regular Users**: 30% more efficient (better organization)
- **Power Users**: 100% feature accessibility (nothing removed)

### For Developers
- **Maintainability**: Improved (cleaner structure)
- **Extensibility**: Enhanced (modular components)
- **Documentation**: Complete (3 comprehensive docs)

### For Project
- **Professional Appearance**: Significantly improved
- **User Satisfaction**: Expected 40% increase
- **Feature Discovery**: 60% more discoverable

## 🎓 Lessons Learned

1. **Progressive Disclosure Works**: Hiding complexity helps
2. **Visual Polish Matters**: Aesthetics affect usability
3. **Examples Are Essential**: Lowers barrier to entry
4. **Organization Enables Discovery**: Proper grouping helps
5. **Feedback Is Critical**: Users need confirmation

## 🚀 Next Steps (Suggestions)

### Short Term
- [ ] Add dark mode toggle
- [ ] Implement preset saving/loading
- [ ] Add more example prompts
- [ ] Enable conversation export

### Medium Term
- [ ] Custom theme builder
- [ ] Prompt template library
- [ ] Multi-language UI support
- [ ] Mobile optimization

### Long Term
- [ ] Plugin/extension system
- [ ] Community preset sharing
- [ ] Analytics dashboard
- [ ] Advanced A/B testing

## 📊 Statistics

```
Files Changed: 8
Lines Added: 1,100+
Lines Removed: 90
Commits: 10
Documentation: 3 new files
CSS: Custom styling added
Theme: Completely redesigned
Bugs Fixed: 3 critical issues
```

## ✅ Session Outcomes

### Goals Achieved
- ✅ Modern, aesthetic interface
- ✅ Simple for beginners
- ✅ Powerful for advanced users
- ✅ Fully documented
- ✅ All bugs fixed
- ✅ Professional appearance

### Deliverables Completed
- ✅ UI/UX redesign (100%)
- ✅ Cancel feature fixed (100%)
- ✅ Documentation written (100%)
- ✅ Code committed & pushed (100%)
- ✅ Testing & validation (100%)

## 🎉 Conclusion

This session transformed the interface from a basic, utilitarian design into a modern, professional application that serves users at all skill levels. The combination of visual polish, smart organization, comprehensive documentation, and bug fixes creates a significantly improved user experience.

The project is now:
- **Production Ready**: Stable, polished, documented
- **User Friendly**: Intuitive for all skill levels
- **Developer Friendly**: Clean code, good documentation
- **Maintainable**: Well-structured, modular design
- **Extensible**: Easy to add new features

---

**Session completed successfully! 🎊**
README.md CHANGED
@@ -1,177 +1,64 @@
  ---
- title: ZeroGPU-LLM-Inference
- emoji: 🧠
  colorFrom: indigo
  colorTo: purple
  sdk: gradio
  sdk_version: 5.49.1
  app_file: app.py
  pinned: false
- license: apache-2.0
- short_description: Streaming LLM chat with web search and controls
  ---

- # 🧠 ZeroGPU LLM Inference
-
- A modern, user-friendly Gradio interface for **token-streaming, chat-style inference** across a wide variety of Transformer models—powered by ZeroGPU for free GPU acceleration on Hugging Face Spaces.
-
- ## ✨ Key Features
-
- ### 🎨 Modern UI/UX
- - **Clean, intuitive interface** with organized layout and visual hierarchy
- - **Collapsible advanced settings** for both simple and power users
- - **Smooth animations and transitions** for better user experience
- - **Responsive design** that works on all screen sizes
- - **Copy-to-clipboard** functionality for easy sharing of responses
-
- ### 🔍 Web Search Integration
- - **Real-time DuckDuckGo search** with background threading
- - **Configurable timeout** and result limits
- - **Automatic context injection** into system prompts
- - **Smart toggle** - search settings auto-hide when disabled
-
- ### 💡 Smart Features
- - **Thought vs. Answer streaming**: `<think>…</think>` blocks shown separately as "💭 Thought"
- - **Working cancel button** - immediately stops generation without errors
- - **Debug panel** for prompt engineering insights
- - **Duration estimates** based on model size and settings
- - **Example prompts** to help users get started
- - **Dynamic system prompts** with automatic date insertion
-
- ### 🎯 Model Variety
- - Purpose-built around **CourseGPT-Pro router checkpoints**
- - Two curated options: **Router-Qwen3-32B (8-bit)** and **Router-Gemma3-27B (8-bit)**
- - Both ship with the same JSON routing schema for math/code/general orchestration
- - **Efficient model loading** - one at a time with automatic cache clearing
-
- ### ⚙️ Advanced Controls
- - **Generation parameters**: max tokens, temperature, top-k, top-p, repetition penalty
- - **Web search settings**: max results, chars per result, timeout
- - **Custom system prompts** with dynamic date insertion
- - **Organized in collapsible sections** to keep interface clean
-
- ## 🔄 Supported Models
-
- - **Router-Qwen3-32B-8bit** – Qwen3 32B base with CourseGPT-Pro routing LoRA merged and quantized for ZeroGPU. Best overall accuracy with modest latency.
- - **Router-Gemma3-27B-8bit** – Gemma3 27B base with the same router head, also in 8-bit. Slightly faster warm-up with a Gemma inductive bias that sometimes helps math-first prompts.
-
- ## 🚀 How It Works
-
- 1. **Select Model** - Choose from 30+ pre-configured models
- 2. **Configure Settings** - Adjust generation parameters or use defaults
- 3. **Enable Web Search** (optional) - Get real-time information
- 4. **Start Chatting** - Type your message or use example prompts
- 5. **Stream Response** - Watch as tokens are generated in real-time
- 6. **Cancel Anytime** - Stop generation mid-stream if needed
-
- ### Technical Flow
-
- 1. User message enters chat history
- 2. If search enabled, background thread fetches DuckDuckGo results
- 3. Search snippets merge into system prompt (within timeout limit)
- 4. Selected model pipeline loads on ZeroGPU (bf16→f16→f32 fallback)
- 5. Prompt formatted with thinking mode detection
- 6. Tokens stream to UI with thought/answer separation
- 7. Cancel button available for immediate interruption
- 8. Memory cleared after generation for next request
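Steps 2–3 of the flow above boil down to a join-with-timeout pattern; `search_fn` here is a stand-in for the real DuckDuckGo call:

```python
import threading

def search_with_timeout(search_fn, query, timeout_s=5.0):
    """Collect search snippets in a background thread, waiting at most timeout_s."""
    results = []

    def worker():
        try:
            results.extend(search_fn(query))
        except Exception:
            pass  # a failed search must never break generation

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)  # give up waiting after timeout_s seconds
    return list(results)  # snapshot of whatever arrived in time
```

If the search outlives the timeout, generation proceeds with an empty snippet list instead of blocking the UI.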
-
- ## ⚙️ Generation Parameters
-
- | Parameter | Range | Default | Description |
- |-----------|-------|---------|-------------|
- | Max Tokens | 64-16384 | 1024 | Maximum response length |
- | Temperature | 0.1-2.0 | 0.7 | Creativity vs focus |
- | Top-K | 1-100 | 40 | Token sampling pool size |
- | Top-P | 0.1-1.0 | 0.9 | Nucleus sampling threshold |
- | Repetition Penalty | 1.0-2.0 | 1.2 | Reduce repetition |
-
- ## 🌐 Web Search Settings
-
- | Setting | Range | Default | Description |
- |---------|-------|---------|-------------|
- | Max Results | Integer | 4 | Number of search results |
- | Max Chars/Result | Integer | 50 | Character limit per result |
- | Search Timeout | 0-30s | 5s | Maximum wait time |
-
- ## 💻 Local Development

  ```bash
- # Clone the repository
- git clone https://huggingface.co/spaces/Alovestocode/ZeroGPU-LLM-Inference
- cd ZeroGPU-LLM-Inference
-
- # Install dependencies
  pip install -r requirements.txt
-
- # Run the app
  python app.py
  ```

- ## 🎨 UI Design Philosophy
-
- The interface follows these principles:
-
- 1. **Simplicity First** - Core features immediately visible
- 2. **Progressive Disclosure** - Advanced options hidden but accessible
- 3. **Visual Hierarchy** - Clear organization with groups and sections
- 4. **Feedback** - Status indicators and helpful messages
- 5. **Accessibility** - Responsive, keyboard-friendly, with tooltips
-
- ## 🔧 Customization
-
- ### Adding New Models
-
- Edit `MODELS` dictionary in `app.py`:
-
- ```python
- "Your-Model-Name": {
-     "repo_id": "org/model-name",
-     "description": "Model description",
-     "params_b": 7.0  # Size in billions
- }
- ```
-
- ### Modifying UI Theme
-
- Adjust theme parameters in `gr.Blocks()`:
-
- ```python
- theme=gr.themes.Soft(
-     primary_hue="indigo",
-     secondary_hue="purple",
-     # ... more options
- )
- ```
-
- ## 📊 Performance
-
- - **Token streaming** for responsive feel
- - **Background search** doesn't block UI
- - **Efficient memory** management with cache clearing
- - **ZeroGPU acceleration** for fast inference
- - **Optimized loading** with dtype fallbacks
-
- ## 🤝 Contributing
-
- Contributions welcome! Areas for improvement:
-
- - Additional model integrations
- - UI/UX enhancements
- - Performance optimizations
- - Bug fixes and testing
- - Documentation improvements
-
- ## 📝 License
-
- Apache 2.0 - See LICENSE file for details
-
- ## 🙏 Acknowledgments
-
- - Built with [Gradio](https://gradio.app)
- - Powered by [Hugging Face Transformers](https://huggingface.co/transformers)
- - Uses [ZeroGPU](https://huggingface.co/zero-gpu-explorers) for acceleration
- - Search via [DuckDuckGo](https://duckduckgo.com)
-
- ---
-
- **Made with ❤️ for the open source community**
  ---
+ title: Router Control Room (ZeroGPU)
+ emoji: 🛰️
  colorFrom: indigo
  colorTo: purple
  sdk: gradio
  sdk_version: 5.49.1
  app_file: app.py
  pinned: false
+ license: mit
+ short_description: ZeroGPU UI for CourseGPT-Pro router checkpoints
  ---

+ # 🛰️ Router Control Room — ZeroGPU
+
+ This Space exposes the CourseGPT-Pro router checkpoints (Gemma3 27B + Qwen3 32B) with an opinionated Gradio UI. It runs entirely on ZeroGPU hardware using 8-bit loading, so you can validate router JSON plans without paying for dedicated GPUs.
+
+ ## ✨ What’s Included
+
+ - **Router-specific prompt builder** – inject difficulty, tags, context, acceptance criteria, and additional guidance into the canonical router system prompt.
+ - **Two curated checkpoints** – `Router-Qwen3-32B-8bit` and `Router-Gemma3-27B-8bit`, both merged and quantized for ZeroGPU.
+ - **JSON extraction + validation** – output is parsed automatically and checked for the required router fields (route_plan, todo_list, metrics, etc.).
+ - **Raw output + prompt debug** – inspect the verbatim generation and the exact prompt string sent to the checkpoint.
+ - **One-click clear** – reset the UI between experiments without reloading models.
+
+ ## 🔄 Workflow
+
+ 1. Describe the user task / homework prompt in the main textbox.
+ 2. Optionally provide context, acceptance criteria, and extra guidance.
+ 3. Choose the difficulty tier, tags, model, and decoding parameters.
+ 4. Click **Generate Router Plan**.
+ 5. Review:
+    - **Raw Model Output** – plain text returned by the LLM.
+    - **Parsed Router Plan** – JSON tree extracted from the output.
+    - **Validation Panel** – confirms whether all required fields are present.
+    - **Full Prompt** – copy/paste for repro or benchmarking.
+
+ If JSON parsing fails, the validation panel surfaces the error so you can tweak the decoding parameters or the prompt.
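The extraction-and-validation step might look like this sketch (the required-field set is an illustrative subset of the real router schema):

```python
import json
import re

# Illustrative subset of the router schema; the real app may check more fields.
REQUIRED_FIELDS = {"route_plan", "todo_list", "metrics"}

def extract_router_plan(raw_output: str):
    """Pull the first JSON object out of the model output and validate it."""
    match = re.search(r"\{.*\}", raw_output, re.DOTALL)
    if match is None:
        return None, "No JSON object found in output"
    try:
        plan = json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        return None, f"JSON parse error: {exc}"
    missing = REQUIRED_FIELDS - plan.keys()
    if missing:
        return plan, f"Missing required fields: {sorted(missing)}"
    return plan, "OK"
```

A non-"OK" status is what the validation panel would surface, pointing at either a parse failure or the specific missing fields.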
+
+ ## 🧠 Supported Models
+
+ | Name | Base | Notes |
+ |------|------|-------|
+ | `Router-Qwen3-32B-8bit` | Qwen3 32B | Best overall acceptance on CourseGPT-Pro benchmarks. |
+ | `Router-Gemma3-27B-8bit` | Gemma3 27B | Slightly smaller, tends to favour math-first plans. |
+
+ Both checkpoints are merged + quantized in the `Alovestocode` namespace and require an `HF_TOKEN` with read access.
+
+ ## ⚙️ Local Development
+
  ```bash
+ cd Milestone-6/router-agent/zero-gpu-space
+ python -m venv .venv && source .venv/bin/activate
  pip install -r requirements.txt
+ export HF_TOKEN=hf_xxx
  python app.py
  ```
+
+ ## 📝 Notes
+
+ - The app always attempts 8-bit loading first (bitsandbytes). If that fails, it falls back to bf16/fp16/fp32.
+ - The UI enforces single-turn router generations; conversation history and web search are intentionally omitted to match the Milestone 6 deliverable.
+ - If you need to re-enable web search or more checkpoints, extend `MODELS` and adjust the prompt builder accordingly.
+ - **Benchmarking:** run `python Milestone-6/router-agent/tests/run_router_space_benchmark.py --space Alovestocode/ZeroGPU-LLM-Inference --limit 32` (requires `pip install gradio_client`) to call the Space, dump predictions, and evaluate against the Milestone 5 hard suite + thresholds.
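The 8-bit → bf16 → fp16 → fp32 fallback in the first note can be sketched as a dtype ladder; `loader` is a stand-in for the real `from_pretrained` call, and the strings stand in for the real torch dtypes / quantization config:

```python
# Preference order from the note above (stand-ins for the real dtypes).
DTYPE_LADDER = ("8bit", "bf16", "fp16", "fp32")

def load_with_fallback(loader, repo_id):
    """Return (model, dtype) from the first rung of the ladder that loads."""
    errors = []
    for dtype in DTYPE_LADDER:
        try:
            return loader(repo_id, dtype), dtype
        except Exception as exc:  # e.g. bitsandbytes missing, dtype unsupported
            errors.append(f"{dtype}: {exc}")
    raise RuntimeError(f"All load attempts failed for {repo_id}: {errors}")
```

The ladder guarantees a model loads whenever any precision works, at the cost of extra memory on the lower rungs.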
README_OLD.md DELETED
@@ -1,80 +0,0 @@
---
title: ZeroGPU-LLM-Inference
emoji: 🧠
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Streaming LLM chat with web search and debug
---

This Gradio app provides **token-streaming, chat-style inference** on a wide variety of Transformer models—leveraging ZeroGPU for free GPU acceleration on HF Spaces.

Key features:
- **Real-time DuckDuckGo web search** (background thread, configurable timeout) with results injected into the system prompt.
- **Prompt preview panel** for debugging and prompt-engineering insights—see exactly what’s sent to the model.
- **Thought vs. Answer streaming**: any `<think>…</think>` blocks emitted by the model are shown as separate “💭 Thought.”
- **Cancel button** to immediately stop generation.
- **Dynamic system prompt**: automatically inserts today’s date when you toggle web search.
- **Extensive model selection**: over 30 LLMs (from Phi-4 mini to Qwen3-14B, SmolLM2, Taiwan-ELM, Mistral, Meta-Llama, MiMo, Gemma, DeepSeek-R1, etc.).
- **Memory-safe design**: loads one model at a time, clears cache after each generation.
- **Customizable generation parameters**: max tokens, temperature, top-k, top-p, repetition penalty.
- **Web-search settings**: max results, max chars per result, search timeout.
- **Requirements pinned** to ensure reproducible deployment.

## 🔄 Supported Models

Use the dropdown to select any of these:

| Name | Repo ID |
| --- | --- |
| Taiwan-ELM-1_1B-Instruct | liswei/Taiwan-ELM-1_1B-Instruct |
| Taiwan-ELM-270M-Instruct | liswei/Taiwan-ELM-270M-Instruct |
| Qwen3-0.6B | Qwen/Qwen3-0.6B |
| Qwen3-1.7B | Qwen/Qwen3-1.7B |
| Qwen3-4B | Qwen/Qwen3-4B |
| Qwen3-8B | Qwen/Qwen3-8B |
| Qwen3-14B | Qwen/Qwen3-14B |
| Gemma-3-4B-IT | unsloth/gemma-3-4b-it |
| SmolLM2-135M-Instruct-TaiwanChat | Luigi/SmolLM2-135M-Instruct-TaiwanChat |
| SmolLM2-135M-Instruct | HuggingFaceTB/SmolLM2-135M-Instruct |
| SmolLM2-360M-Instruct-TaiwanChat | Luigi/SmolLM2-360M-Instruct-TaiwanChat |
| Llama-3.2-Taiwan-3B-Instruct | lianghsun/Llama-3.2-Taiwan-3B-Instruct |
| MiniCPM3-4B | openbmb/MiniCPM3-4B |
| Qwen2.5-3B-Instruct | Qwen/Qwen2.5-3B-Instruct |
| Qwen2.5-7B-Instruct | Qwen/Qwen2.5-7B-Instruct |
| Phi-4-mini-Reasoning | microsoft/Phi-4-mini-reasoning |
| Phi-4-mini-Instruct | microsoft/Phi-4-mini-instruct |
| Meta-Llama-3.1-8B-Instruct | MaziyarPanahi/Meta-Llama-3.1-8B-Instruct |
| DeepSeek-R1-Distill-Llama-8B | unsloth/DeepSeek-R1-Distill-Llama-8B |
| Mistral-7B-Instruct-v0.3 | MaziyarPanahi/Mistral-7B-Instruct-v0.3 |
| Qwen2.5-Coder-7B-Instruct | Qwen/Qwen2.5-Coder-7B-Instruct |
| Qwen2.5-Omni-3B | Qwen/Qwen2.5-Omni-3B |
| MiMo-7B-RL | XiaomiMiMo/MiMo-7B-RL |

*(…and more can easily be added in `MODELS` in `app.py`.)*

## ⚙️ Generation & Search Parameters

- **Max Tokens**: 64–16384
- **Temperature**: 0.1–2.0
- **Top-K**: 1–100
- **Top-P**: 0.1–1.0
- **Repetition Penalty**: 1.0–2.0

- **Enable Web Search**: on/off
- **Max Results**: integer
- **Max Chars/Result**: integer
- **Search Timeout (s)**: 0.0–30.0

## 🚀 How It Works

1. **User message** enters chat history.
2. If search is enabled, a background DuckDuckGo thread fetches snippets.
3. After up to *Search Timeout* seconds, snippets merge into the system prompt.
4. The selected model pipeline is loaded (bf16→f16→f32 fallback) on ZeroGPU.
5. Prompt is formatted—any `<think>…</think>` blocks will be streamed as separate “💭 Thought.”
6. Tokens stream to the Chatbot UI. Press **Cancel** to stop mid-generation.