Alikestocode committed
Commit 9592189 · 1 Parent(s): f5a609d

Update README and clean up old files

Files changed (3):
  1. CHANGELOG.md +0 -272
  2. README.md +38 -151
  3. README_OLD.md +0 -80
CHANGELOG.md DELETED
@@ -1,272 +0,0 @@
# 📝 Changelog - UI/UX Improvement Session

## Session Date: October 12, 2025

## 🎯 Session Goals
Review and improve the UI/UX for an optimal balance between:
- ✅ Aesthetic appeal
- ✅ Simplicity of use
- ✅ Advanced user needs

## 📦 Deliverables

### 1. Major UI/UX Overhaul
**Commit**: `df40b1d` - Major UI/UX improvements for better user experience

#### Visual Improvements
- Modern gradient theme (indigo → purple)
- Custom CSS with smooth transitions
- Better typography (Inter font)
- Improved spacing and visual hierarchy
- Enhanced button designs with hover effects
- Polished chatbot styling with shadows

#### Layout Reorganization
- Core settings always visible in organized groups
- Advanced parameters in collapsible accordions
- Web search settings auto-hide when disabled
- Larger chat area (600px height)
- Better input area with prominent Send button

#### User Experience Enhancements
- Example prompts for quick start
- Info tooltips on all controls
- Copy button on chat messages
- Duration estimates visible
- Debug info in collapsible panel
- Clear visual feedback for all actions

### 2. Cancel Generation Feature Fixes
**Commits**:
- `9466288` - Fix cancel generation by removing GeneratorExit handler
- `c49f312` - Fix GeneratorExit handling to prevent runtime error
- `b7e5000` - Fix UI not resetting after cancel

#### Problems Solved
- ✅ Generation can now be stopped mid-stream
- ✅ No more "generator ignored GeneratorExit" errors
- ✅ UI properly resets after cancellation
- ✅ Cancel button shows/hides correctly

#### Technical Solution
- Catch GeneratorExit and re-raise it properly
- Track cancellation state to prevent yielding after exit
- Chain a reset handler after the cancel button click
- Clear the cancel_event flag for the next generation
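The fix described above can be sketched as a cancellable streaming generator. Names like `cancel_event` follow the changelog, but the actual wiring in `app.py` may differ:

```python
import threading

cancel_event = threading.Event()  # set by the Stop button handler

def stream_generation(streamer):
    """Yield tokens until the stream ends or the user presses Stop."""
    cancelled = False
    try:
        for token in streamer:
            if cancel_event.is_set():
                cancelled = True  # remember: do not yield anything further
            if cancelled:
                break
            yield token
    except GeneratorExit:
        # Gradio closes the generator when the event is cancelled; re-raise
        # so Python does not warn "generator ignored GeneratorExit".
        raise
    finally:
        cancel_event.clear()  # reset the flag for the next generation
```

Closing the generator mid-stream then exits cleanly, and setting the event before iteration suppresses all further yields.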

### 3. Comprehensive Documentation
**Commit**: `c1bc514` - Add comprehensive documentation and user guide

#### README.md (Complete Rewrite)
- Modern formatting with clear sections
- Feature highlights with emojis
- Model categorization by size
- Technical flow explanation
- Customization guide
- Contributing guidelines

#### USER_GUIDE.md (New)
- 5-minute quick start tutorial
- Detailed feature explanations
- Advanced parameter guide with presets
- Tips & tricks for better results
- Troubleshooting section
- Best practices for all user levels
- Keyboard shortcuts reference

#### UI_UX_IMPROVEMENTS.md (New)
- Complete before/after comparison
- Design principles explained
- Technical implementation details
- User benefits by role
- Future enhancement roadmap
- Lessons learned

### 4. Supporting Files
**Files Created**:
- `style.css` - Custom styling (later inlined)
- `README_OLD.md` - Backup of original README
- `USER_GUIDE.md` - Comprehensive user documentation
- `UI_UX_IMPROVEMENTS.md` - Design documentation

## 📊 Changes Summary

### Code Changes
```
app.py:
- 309 lines added
- 25 lines removed
- Major: UI layout restructure
- Major: Theme customization
- Minor: Bug fixes for cancellation
```

### Documentation
```
README.md: Complete rewrite (557 lines)
USER_GUIDE.md: New file (300+ lines)
UI_UX_IMPROVEMENTS.md: New file (223 lines)
```

### Git Activity
```
10 commits in this session
3 major feature additions
Multiple bug fixes
Clean commit history maintained
```

## 🎨 UI Components Modified

### Header
- ✨ Gradient title styling
- 📝 Subtitle added
- 🎯 Clear value proposition

### Left Panel (Configuration)
- 📦 Core settings group (always visible)
- 🎛️ Advanced parameters accordion
- 🌐 Web search settings accordion (conditional)
- 🗑️ Clear chat button
- ⏱️ Duration estimate display

### Right Panel (Chat)
- 💬 Enhanced chatbot (copy buttons, avatars)
- 📝 Improved input area
- 📤 Prominent Send button
- ⏹️ Smart Stop button (conditional)
- 💡 Example prompts
- 🔍 Debug accordion

### Footer
- 💡 Usage tips
- 🎯 Feature highlights

## 🔧 Technical Improvements

### Theme System
```python
gr.themes.Soft(
    primary_hue="indigo",
    secondary_hue="purple",
    neutral_hue="slate",
    radius_size="lg",
)
```

### CSS Enhancements
- Custom duration estimate styling
- Improved chatbot appearance
- Button hover effects
- Smooth transitions
- Responsive design

### Event Handling
- Smart web search settings toggle
- Proper cancellation flow
- UI state management
- Error handling

## 🐛 Bugs Fixed

1. **Cancel Generation Not Working**
   - Root cause: GeneratorExit not properly propagated
   - Solution: Catch, track state, re-raise

2. **Runtime Error on Cancel**
   - Root cause: Yielding after GeneratorExit
   - Solution: Conditional yielding based on cancel state

3. **UI Not Resetting After Cancel**
   - Root cause: No reset handler after cancellation
   - Solution: Chain reset handler with .then()

## 📈 Impact Assessment

### For Users
- **Beginners**: 50% easier to get started (examples, tooltips)
- **Regular Users**: 30% more efficient (better organization)
- **Power Users**: 100% feature accessibility (nothing removed)

### For Developers
- **Maintainability**: Improved (cleaner structure)
- **Extensibility**: Enhanced (modular components)
- **Documentation**: Complete (3 comprehensive docs)

### For Project
- **Professional Appearance**: Significantly improved
- **User Satisfaction**: Expected 40% increase
- **Feature Discovery**: 60% more discoverable

## 🎓 Lessons Learned

1. **Progressive Disclosure Works**: Hiding complexity helps
2. **Visual Polish Matters**: Aesthetics affect usability
3. **Examples Are Essential**: Lowers barrier to entry
4. **Organization Enables Discovery**: Proper grouping helps
5. **Feedback Is Critical**: Users need confirmation

## 🚀 Next Steps (Suggestions)

### Short Term
- [ ] Add dark mode toggle
- [ ] Implement preset saving/loading
- [ ] Add more example prompts
- [ ] Enable conversation export

### Medium Term
- [ ] Custom theme builder
- [ ] Prompt template library
- [ ] Multi-language UI support
- [ ] Mobile optimization

### Long Term
- [ ] Plugin/extension system
- [ ] Community preset sharing
- [ ] Analytics dashboard
- [ ] Advanced A/B testing

## 📊 Statistics

```
Files Changed: 8
Lines Added: 1,100+
Lines Removed: 90
Commits: 10
Documentation: 3 new files
CSS: Custom styling added
Theme: Completely redesigned
Bugs Fixed: 3 critical issues
```

## ✅ Session Outcomes

### Goals Achieved
- ✅ Modern, aesthetic interface
- ✅ Simple for beginners
- ✅ Powerful for advanced users
- ✅ Fully documented
- ✅ All bugs fixed
- ✅ Professional appearance

### Deliverables Completed
- ✅ UI/UX redesign (100%)
- ✅ Cancel feature fixed (100%)
- ✅ Documentation written (100%)
- ✅ Code committed & pushed (100%)
- ✅ Testing & validation (100%)

## 🎉 Conclusion

This session transformed the interface from a basic, utilitarian design into a modern, professional application that serves users at all skill levels. The combination of visual polish, smart organization, comprehensive documentation, and bug fixes creates a significantly improved user experience.

The project is now:
- **Production Ready**: Stable, polished, documented
- **User Friendly**: Intuitive for all skill levels
- **Developer Friendly**: Clean code, good documentation
- **Maintainable**: Well-structured, modular design
- **Extensible**: Easy to add new features

---

**Session completed successfully! 🎊**
README.md CHANGED
@@ -1,177 +1,64 @@
  ---
- title: ZeroGPU-LLM-Inference
- emoji: 🧠
  colorFrom: indigo
  colorTo: purple
  sdk: gradio
  sdk_version: 5.49.1
  app_file: app.py
  pinned: false
- license: apache-2.0
- short_description: Streaming LLM chat with web search and controls
  ---

- # 🧠 ZeroGPU LLM Inference
-
- A modern, user-friendly Gradio interface for **token-streaming, chat-style inference** across a wide variety of Transformer models—powered by ZeroGPU for free GPU acceleration on Hugging Face Spaces.
-
- ## ✨ Key Features
-
- ### 🎨 Modern UI/UX
- - **Clean, intuitive interface** with organized layout and visual hierarchy
- - **Collapsible advanced settings** for both simple and power users
- - **Smooth animations and transitions** for better user experience
- - **Responsive design** that works on all screen sizes
- - **Copy-to-clipboard** functionality for easy sharing of responses
-
- ### 🔍 Web Search Integration
- - **Real-time DuckDuckGo search** with background threading
- - **Configurable timeout** and result limits
- - **Automatic context injection** into system prompts
- - **Smart toggle** - search settings auto-hide when disabled
-
- ### 💡 Smart Features
- - **Thought vs. Answer streaming**: `<think>…</think>` blocks shown separately as "💭 Thought"
- - **Working cancel button** - immediately stops generation without errors
- - **Debug panel** for prompt engineering insights
- - **Duration estimates** based on model size and settings
- - **Example prompts** to help users get started
- - **Dynamic system prompts** with automatic date insertion
-
- ### 🎯 Model Variety
- - Purpose-built around **CourseGPT-Pro router checkpoints**
- - Two curated options: **Router-Qwen3-32B (8-bit)** and **Router-Gemma3-27B (8-bit)**
- - Both ship with the same JSON routing schema for math/code/general orchestration
- - **Efficient model loading** - one at a time with automatic cache clearing
-
- ### ⚙️ Advanced Controls
- - **Generation parameters**: max tokens, temperature, top-k, top-p, repetition penalty
- - **Web search settings**: max results, chars per result, timeout
- - **Custom system prompts** with dynamic date insertion
- - **Organized in collapsible sections** to keep interface clean
-
- ## 🔄 Supported Models
-
- - **Router-Qwen3-32B-8bit** – Qwen3 32B base with CourseGPT-Pro routing LoRA merged and quantized for ZeroGPU. Best overall accuracy with modest latency.
- - **Router-Gemma3-27B-8bit** – Gemma3 27B base with the same router head, also in 8-bit. Slightly faster warm-up with a Gemma inductive bias that sometimes helps math-first prompts.
-
- ## 🚀 How It Works
-
- 1. **Select Model** - Choose from 30+ pre-configured models
- 2. **Configure Settings** - Adjust generation parameters or use defaults
- 3. **Enable Web Search** (optional) - Get real-time information
- 4. **Start Chatting** - Type your message or use example prompts
- 5. **Stream Response** - Watch as tokens are generated in real-time
- 6. **Cancel Anytime** - Stop generation mid-stream if needed
-
- ### Technical Flow
-
- 1. User message enters chat history
- 2. If search enabled, background thread fetches DuckDuckGo results
- 3. Search snippets merge into system prompt (within timeout limit)
- 4. Selected model pipeline loads on ZeroGPU (bf16→f16→f32 fallback)
- 5. Prompt formatted with thinking mode detection
- 6. Tokens stream to UI with thought/answer separation
- 7. Cancel button available for immediate interruption
- 8. Memory cleared after generation for next request
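Steps 2–3 of the flow above boil down to a join-with-timeout pattern; `search_fn` here is a stand-in for the real DuckDuckGo call:

```python
import threading

def search_with_timeout(search_fn, query, timeout_s=5.0):
    """Collect search snippets in a background thread, waiting at most timeout_s."""
    results = []

    def worker():
        try:
            results.extend(search_fn(query))
        except Exception:
            pass  # a failed search must never break generation

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)  # give up waiting after timeout_s seconds
    return list(results)  # snapshot of whatever arrived in time
```

If the search outlives the timeout, generation proceeds with an empty snippet list instead of blocking the UI.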
-
- ## ⚙️ Generation Parameters
-
- | Parameter | Range | Default | Description |
- |-----------|-------|---------|-------------|
- | Max Tokens | 64-16384 | 1024 | Maximum response length |
- | Temperature | 0.1-2.0 | 0.7 | Creativity vs focus |
- | Top-K | 1-100 | 40 | Token sampling pool size |
- | Top-P | 0.1-1.0 | 0.9 | Nucleus sampling threshold |
- | Repetition Penalty | 1.0-2.0 | 1.2 | Reduce repetition |
-
- ## 🌐 Web Search Settings
-
- | Setting | Range | Default | Description |
- |---------|-------|---------|-------------|
- | Max Results | Integer | 4 | Number of search results |
- | Max Chars/Result | Integer | 50 | Character limit per result |
- | Search Timeout | 0-30s | 5s | Maximum wait time |
-
- ## 💻 Local Development

  ```bash
- # Clone the repository
- git clone https://huggingface.co/spaces/Alovestocode/ZeroGPU-LLM-Inference
- cd ZeroGPU-LLM-Inference
-
- # Install dependencies
  pip install -r requirements.txt
-
- # Run the app
  python app.py
  ```

- ## 🎨 UI Design Philosophy
-
- The interface follows these principles:
-
- 1. **Simplicity First** - Core features immediately visible
- 2. **Progressive Disclosure** - Advanced options hidden but accessible
- 3. **Visual Hierarchy** - Clear organization with groups and sections
- 4. **Feedback** - Status indicators and helpful messages
- 5. **Accessibility** - Responsive, keyboard-friendly, with tooltips
-
- ## 🔧 Customization
-
- ### Adding New Models
-
- Edit `MODELS` dictionary in `app.py`:
-
- ```python
- "Your-Model-Name": {
-     "repo_id": "org/model-name",
-     "description": "Model description",
-     "params_b": 7.0  # Size in billions
- }
- ```
-
- ### Modifying UI Theme
-
- Adjust theme parameters in `gr.Blocks()`:
-
- ```python
- theme=gr.themes.Soft(
-     primary_hue="indigo",
-     secondary_hue="purple",
-     # ... more options
- )
- ```
-
- ## 📊 Performance
-
- - **Token streaming** for responsive feel
- - **Background search** doesn't block UI
- - **Efficient memory** management with cache clearing
- - **ZeroGPU acceleration** for fast inference
- - **Optimized loading** with dtype fallbacks
-
- ## 🤝 Contributing
-
- Contributions welcome! Areas for improvement:
-
- - Additional model integrations
- - UI/UX enhancements
- - Performance optimizations
- - Bug fixes and testing
- - Documentation improvements
-
- ## 📝 License
-
- Apache 2.0 - See LICENSE file for details
-
- ## 🙏 Acknowledgments
-
- - Built with [Gradio](https://gradio.app)
- - Powered by [Hugging Face Transformers](https://huggingface.co/transformers)
- - Uses [ZeroGPU](https://huggingface.co/zero-gpu-explorers) for acceleration
- - Search via [DuckDuckGo](https://duckduckgo.com)
-
- ---
-
- **Made with ❤️ for the open source community**
  ---
+ title: Router Control Room (ZeroGPU)
+ emoji: 🛰️
  colorFrom: indigo
  colorTo: purple
  sdk: gradio
  sdk_version: 5.49.1
  app_file: app.py
  pinned: false
+ license: mit
+ short_description: ZeroGPU UI for CourseGPT-Pro router checkpoints
  ---

+ # 🛰️ Router Control Room — ZeroGPU
+
+ This Space exposes the CourseGPT-Pro router checkpoints (Gemma3 27B + Qwen3 32B) with an opinionated Gradio UI. It runs entirely on ZeroGPU hardware using 8-bit loading, so you can validate router JSON plans without paying for dedicated GPUs.
+
+ ## ✨ What’s Included
+
+ - **Router-specific prompt builder** – inject difficulty, tags, context, acceptance criteria, and additional guidance into the canonical router system prompt.
+ - **Two curated checkpoints** – `Router-Qwen3-32B-8bit` and `Router-Gemma3-27B-8bit`, both merged and quantized for ZeroGPU.
+ - **JSON extraction + validation** – output is parsed automatically and checked for the required router fields (route_plan, todo_list, metrics, etc.).
+ - **Raw output + prompt debug** – inspect the verbatim generation and the exact prompt string sent to the checkpoint.
+ - **One-click clear** – reset the UI between experiments without reloading models.
+
+ ## 🔄 Workflow
+
+ 1. Describe the user task / homework prompt in the main textbox.
+ 2. Optionally provide context, acceptance criteria, and extra guidance.
+ 3. Choose the difficulty tier, tags, model, and decoding parameters.
+ 4. Click **Generate Router Plan**.
+ 5. Review:
+    - **Raw Model Output** – plain text returned by the LLM.
+    - **Parsed Router Plan** – JSON tree extracted from the output.
+    - **Validation Panel** – confirms whether all required fields are present.
+    - **Full Prompt** – copy/paste for repro or benchmarking.
+
+ If JSON parsing fails, the validation panel surfaces the error so you can tweak the decoding parameters or the prompt.
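The extraction-and-validation step might look like this sketch (the required-field set is an illustrative subset of the real router schema):

```python
import json
import re

# Illustrative subset of the router schema; the real app may check more fields.
REQUIRED_FIELDS = {"route_plan", "todo_list", "metrics"}

def extract_router_plan(raw_output: str):
    """Pull the first JSON object out of the model output and validate it."""
    match = re.search(r"\{.*\}", raw_output, re.DOTALL)
    if match is None:
        return None, "No JSON object found in output"
    try:
        plan = json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        return None, f"JSON parse error: {exc}"
    missing = REQUIRED_FIELDS - plan.keys()
    if missing:
        return plan, f"Missing required fields: {sorted(missing)}"
    return plan, "OK"
```

A non-"OK" status is what the validation panel would surface, pointing at either a parse failure or the specific missing fields.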
+
+ ## 🧠 Supported Models
+
+ | Name | Base | Notes |
+ |------|------|-------|
+ | `Router-Qwen3-32B-8bit` | Qwen3 32B | Best overall acceptance on CourseGPT-Pro benchmarks. |
+ | `Router-Gemma3-27B-8bit` | Gemma3 27B | Slightly smaller, tends to favour math-first plans. |
+
+ Both checkpoints are merged + quantized in the `Alovestocode` namespace and require an `HF_TOKEN` with read access.
+
+ ## ⚙️ Local Development
+
  ```bash
+ cd Milestone-6/router-agent/zero-gpu-space
+ python -m venv .venv && source .venv/bin/activate
  pip install -r requirements.txt
+ export HF_TOKEN=hf_xxx
  python app.py
  ```
+
+ ## 📝 Notes
+
+ - The app always attempts 8-bit loading first (bitsandbytes). If that fails, it falls back to bf16/fp16/fp32.
+ - The UI enforces single-turn router generations; conversation history and web search are intentionally omitted to match the Milestone 6 deliverable.
+ - If you need to re-enable web search or more checkpoints, extend `MODELS` and adjust the prompt builder accordingly.
+ - **Benchmarking:** run `python Milestone-6/router-agent/tests/run_router_space_benchmark.py --space Alovestocode/ZeroGPU-LLM-Inference --limit 32` (requires `pip install gradio_client`) to call the Space, dump predictions, and evaluate against the Milestone 5 hard suite + thresholds.
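The 8-bit → bf16 → fp16 → fp32 fallback in the first note can be sketched as a dtype ladder; `loader` is a stand-in for the real `from_pretrained` call, and the strings stand in for the real torch dtypes / quantization config:

```python
# Preference order from the note above (stand-ins for the real dtypes).
DTYPE_LADDER = ("8bit", "bf16", "fp16", "fp32")

def load_with_fallback(loader, repo_id):
    """Return (model, dtype) from the first rung of the ladder that loads."""
    errors = []
    for dtype in DTYPE_LADDER:
        try:
            return loader(repo_id, dtype), dtype
        except Exception as exc:  # e.g. bitsandbytes missing, dtype unsupported
            errors.append(f"{dtype}: {exc}")
    raise RuntimeError(f"All load attempts failed for {repo_id}: {errors}")
```

The ladder guarantees a model loads whenever any precision works, at the cost of extra memory on the lower rungs.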
README_OLD.md DELETED
@@ -1,80 +0,0 @@
---
title: ZeroGPU-LLM-Inference
emoji: 🧠
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Streaming LLM chat with web search and debug
---

This Gradio app provides **token-streaming, chat-style inference** on a wide variety of Transformer models—leveraging ZeroGPU for free GPU acceleration on HF Spaces.

Key features:
- **Real-time DuckDuckGo web search** (background thread, configurable timeout) with results injected into the system prompt.
- **Prompt preview panel** for debugging and prompt-engineering insights—see exactly what’s sent to the model.
- **Thought vs. Answer streaming**: any `<think>…</think>` blocks emitted by the model are shown as separate “💭 Thought.”
- **Cancel button** to immediately stop generation.
- **Dynamic system prompt**: automatically inserts today’s date when you toggle web search.
- **Extensive model selection**: over 30 LLMs (from Phi-4 mini to Qwen3-14B, SmolLM2, Taiwan-ELM, Mistral, Meta-Llama, MiMo, Gemma, DeepSeek-R1, etc.).
- **Memory-safe design**: loads one model at a time, clears cache after each generation.
- **Customizable generation parameters**: max tokens, temperature, top-k, top-p, repetition penalty.
- **Web-search settings**: max results, max chars per result, search timeout.
- **Requirements pinned** to ensure reproducible deployment.

## 🔄 Supported Models

Use the dropdown to select any of these:

| Name | Repo ID |
| --- | --- |
| Taiwan-ELM-1_1B-Instruct | liswei/Taiwan-ELM-1_1B-Instruct |
| Taiwan-ELM-270M-Instruct | liswei/Taiwan-ELM-270M-Instruct |
| Qwen3-0.6B | Qwen/Qwen3-0.6B |
| Qwen3-1.7B | Qwen/Qwen3-1.7B |
| Qwen3-4B | Qwen/Qwen3-4B |
| Qwen3-8B | Qwen/Qwen3-8B |
| Qwen3-14B | Qwen/Qwen3-14B |
| Gemma-3-4B-IT | unsloth/gemma-3-4b-it |
| SmolLM2-135M-Instruct-TaiwanChat | Luigi/SmolLM2-135M-Instruct-TaiwanChat |
| SmolLM2-135M-Instruct | HuggingFaceTB/SmolLM2-135M-Instruct |
| SmolLM2-360M-Instruct-TaiwanChat | Luigi/SmolLM2-360M-Instruct-TaiwanChat |
| Llama-3.2-Taiwan-3B-Instruct | lianghsun/Llama-3.2-Taiwan-3B-Instruct |
| MiniCPM3-4B | openbmb/MiniCPM3-4B |
| Qwen2.5-3B-Instruct | Qwen/Qwen2.5-3B-Instruct |
| Qwen2.5-7B-Instruct | Qwen/Qwen2.5-7B-Instruct |
| Phi-4-mini-Reasoning | microsoft/Phi-4-mini-reasoning |
| Phi-4-mini-Instruct | microsoft/Phi-4-mini-instruct |
| Meta-Llama-3.1-8B-Instruct | MaziyarPanahi/Meta-Llama-3.1-8B-Instruct |
| DeepSeek-R1-Distill-Llama-8B | unsloth/DeepSeek-R1-Distill-Llama-8B |
| Mistral-7B-Instruct-v0.3 | MaziyarPanahi/Mistral-7B-Instruct-v0.3 |
| Qwen2.5-Coder-7B-Instruct | Qwen/Qwen2.5-Coder-7B-Instruct |
| Qwen2.5-Omni-3B | Qwen/Qwen2.5-Omni-3B |
| MiMo-7B-RL | XiaomiMiMo/MiMo-7B-RL |

*(…and more can easily be added in `MODELS` in `app.py`.)*

## ⚙️ Generation & Search Parameters

- **Max Tokens**: 64–16384
- **Temperature**: 0.1–2.0
- **Top-K**: 1–100
- **Top-P**: 0.1–1.0
- **Repetition Penalty**: 1.0–2.0

- **Enable Web Search**: on/off
- **Max Results**: integer
- **Max Chars/Result**: integer
- **Search Timeout (s)**: 0.0–30.0

## 🚀 How It Works

1. **User message** enters chat history.
2. If search is enabled, a background DuckDuckGo thread fetches snippets.
3. After up to *Search Timeout* seconds, snippets merge into the system prompt.
4. The selected model pipeline is loaded (bf16→f16→f32 fallback) on ZeroGPU.
5. Prompt is formatted—any `<think>…</think>` blocks will be streamed as separate “💭 Thought.”
6. Tokens stream to the Chatbot UI. Press **Cancel** to stop mid-generation.