Spaces: Alovestocode/ZeroGPU-LLM-Inference

103 kB

1 contributor

History: 35 commits

Latest commit (Alikestocode): Add Colab notebook for AWQ quantization of router models, a79bc8f, about 1 month ago

| File | Size | Last commit | Updated |
| --- | --- | --- | --- |
| .dockerignore | 104 Bytes | Add Google Cloud Platform deployment configurations | about 2 months ago |
| .gitattributes | 1.52 kB | Initial commit: ZeroGPU LLM Inference Space | about 2 months ago |
| .gitignore | 27 Bytes | Add .gitignore and remove cache files | about 2 months ago |
| DEPLOYMENT_STATUS.md | 2.21 kB | Add deployment status document after re-authentication | about 1 month ago |
| Dockerfile | 680 Bytes | Add Google Cloud Platform deployment configurations | about 2 months ago |
| FIX_PERMISSIONS.md | 2.05 kB | Add permission fix guide for spherical-gate-477614-q7 project | about 2 months ago |
| QUANTIZE_AWQ.md | 3.22 kB | Add Colab notebook for AWQ quantization of router models | about 1 month ago |
| QUICK_DEPLOY.md | 2.86 kB | Add Cloud Build deployment script and permission setup helper | about 2 months ago |
| README.md | 4.23 kB | Implement vLLM with LLM Compressor and performance optimizations | about 2 months ago |
| app.py | 39.4 kB | Clarify LLM Compressor optional status - vLLM has native AWQ support | about 1 month ago |
| apt.txt | 11 Bytes | Initial commit: ZeroGPU LLM Inference Space | about 2 months ago |
| cloudbuild.yaml | 1.36 kB | Add Cloud Build deployment script and permission setup helper | about 2 months ago |
| deploy-cloud-build.sh | 3.31 kB | Add Cloud Build deployment script and permission setup helper | about 2 months ago |
| deploy-compute-engine.sh | 4.23 kB | Add Google Cloud Platform deployment configurations | about 2 months ago |
| deploy-gcp.sh | 2.67 kB | Add Google Cloud Platform deployment configurations | about 2 months ago |
| gcp-deployment.md | 5.32 kB | Add Google Cloud Platform deployment configurations | about 2 months ago |
| quantize_to_awq_colab.ipynb | 13.8 kB | Add Colab notebook for AWQ quantization of router models | about 1 month ago |
| requirements.txt | 397 Bytes | Clarify LLM Compressor optional status - vLLM has native AWQ support | about 1 month ago |
| setup-gcp-permissions.sh | 1.8 kB | Add Cloud Build deployment script and permission setup helper | about 2 months ago |
| style.css | 2.84 kB | Initial commit: ZeroGPU LLM Inference Space | about 2 months ago |
| test_api.py | 3.43 kB | Migrate to AWQ quantization with FlashAttention-2 | about 2 months ago |
| test_api_gradio_client.py | 7.2 kB | Implement vLLM with LLM Compressor and performance optimizations | about 2 months ago |