Alovestocode/ZeroGPU-LLM-Inference

Commit History

Fix QuantizationConfig: use config_groups with BaseQuantizationConfig

ecf6a69

Alikestocode committed on Nov 10, 2025

Fix AWQModifier: use quantization_config with num_bits

022b2da

Alikestocode committed on Nov 10, 2025

Add note about restarting kernel if AWQModifier errors occur

33a1d2e

Alikestocode committed on Nov 10, 2025

Simplify AWQModifier usage - remove try/except wrapper

e08f8c4

Alikestocode committed on Nov 10, 2025

Fix AWQModifier parameters - use default configuration

cef8ecd

Alikestocode committed on Nov 10, 2025

Fix delete_revisions import with fallback cache cleanup

7a2a590

Alikestocode committed on Nov 10, 2025

Fix delete_revisions import - use fallback cache cleanup method

4be72e0

Alikestocode committed on Nov 10, 2025

Fix AWQModifier import path: use modifiers.awq instead of modifiers.quantization

f0033ab

Alikestocode committed on Nov 10, 2025

Fix LLM Compressor package name: llmcompressor (no hyphen)

2326498

Alikestocode committed on Nov 10, 2025

Remove duplicate LLM Compressor section - now primary method

d4bc333

Alikestocode committed on Nov 10, 2025

Replace AutoAWQ with LLM Compressor (vLLM native) in Colab notebook

ae07f77

Alikestocode committed on Nov 10, 2025

Add advanced vLLM and LLM Compressor optimizations

808203f

Alikestocode committed on Nov 10, 2025

Add disk space cleanup after quantization in Colab notebook

24107f3

Alikestocode committed on Nov 10, 2025

Fix linter error: use %pip instead of !pip in Colab notebook

2dff966

Alikestocode committed on Nov 10, 2025

Add Colab notebook for AWQ quantization of router models

a79bc8f

Alikestocode committed on Nov 10, 2025

Clarify LLM Compressor optional status - vLLM has native AWQ support

b2bf767

Alikestocode committed on Nov 10, 2025

Fix vLLM device detection for ZeroGPU

2ddfeca

Alikestocode committed on Nov 10, 2025

Fix vLLM token parameter and improve streaming error handling

b4fd5e9

Alikestocode committed on Nov 10, 2025

Add debug logging for model loading and generation issues

54880b1

Alikestocode committed on Nov 9, 2025

Fix streaming loop break condition - only break when finished is True

d6f9002

Alikestocode committed on Nov 9, 2025

Add deployment status document after re-authentication

1fb66ec

Alikestocode committed on Nov 9, 2025

Add permission fix guide for spherical-gate-477614-q7 project

162c75a

Alikestocode committed on Nov 8, 2025

Add Cloud Build deployment script and permission setup helper

fd26b3d

Alikestocode committed on Nov 8, 2025

Add Cloud Run PORT environment variable support

1b04006

Alikestocode committed on Nov 8, 2025

Add Google Cloud Platform deployment configurations

aa65d00

Alikestocode committed on Nov 8, 2025

Fix Gradio UI structure and add comprehensive fallback logging

03689e3

Alikestocode committed on Nov 8, 2025

Fix all indentation errors in Gradio UI components

06aef1b

Alikestocode committed on Nov 8, 2025

Fix syntax error: correct indentation in BitsAndBytes fallback block

f43bdac

Alikestocode committed on Nov 8, 2025

Suppress AutoAWQ deprecation warnings and improve vLLM logging

83a232d

Alikestocode committed on Nov 8, 2025

Implement vLLM with LLM Compressor and performance optimizations

a79facb

Alikestocode committed on Nov 8, 2025

Migrate to AWQ quantization with FlashAttention-2

06b4cf5

Alikestocode committed on Nov 8, 2025

Fix: Pre-create GPU wrappers at module load time for startup detection

cdac920

Alikestocode committed on Nov 8, 2025

Make GPU duration slider functional with dynamic wrapper creation

fc0ab14

Alikestocode committed on Nov 8, 2025

Fix indentation errors in _generate_router_plan_streaming_internal

c454e43

Alikestocode committed on Nov 8, 2025

Fix: Remove context manager usage for spaces.GPU decorator

a217627

Alikestocode committed on Nov 8, 2025

Add user-configurable GPU duration slider (60-1800 seconds)

9a4d6d3

Alikestocode committed on Nov 8, 2025

Fix: Move trim_at_stop_sequences function before it's used

597f1a9

Alikestocode committed on Nov 8, 2025

Add Gradio client API test script

de18e95

Alikestocode committed on Nov 8, 2025

Fix API launch configuration

9773e4b

Alikestocode committed on Nov 8, 2025

Enable API in Gradio launch configuration

1b16b00

Alikestocode committed on Nov 8, 2025

Update README and clean up old files

9592189

Alikestocode committed on Nov 7, 2025

Improve streaming with incremental JSON parsing and plan end token

f5a609d

Alikestocode committed on Nov 7, 2025

Add streaming support and increase max tokens to 20000

4f65341

Alikestocode committed on Nov 7, 2025

Fix deprecation warnings and improve error handling

bf2fdae

Alikestocode committed on Nov 7, 2025

Update app.py and requirements.txt for CourseGPT-Pro router models

4c3d05b

Alikestocode committed on Nov 7, 2025

Update README: Focus on CourseGPT-Pro router checkpoints

4706b45

Alikestocode committed on Nov 7, 2025

Update README with correct space URL

9af4b77

Alikestocode committed on Nov 7, 2025

Add .gitignore and remove cache files

7bc8a45

Alikestocode committed on Nov 7, 2025

Initial commit: ZeroGPU LLM Inference Space

f91e906

Alikestocode committed on Nov 7, 2025