Space: Alovestocode/ZeroGPU-LLM-Inference

Alikestocode committed on Nov 10

Commit 2dff966 (1 parent: a79bc8f)

Fix linter error: use %pip instead of !pip in Colab notebook
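The fix rests on a behavioral difference. `!pip` passes the line to a shell, and that shell may resolve a `pip` that belongs to a different interpreter than the running kernel. IPython's `%pip` magic instead runs pip through the kernel's own interpreter (`sys.executable -m pip`). The following is a minimal sketch of the same idea in plain Python, for contexts where magics are unavailable. The helper names `pip_install_command` and `pip_install` are hypothetical, and the example only builds the command without running it:

```python
import shlex
import subprocess
import sys


def pip_install_command(args: str) -> list[str]:
    """Return an argv that runs pip with the *current* interpreter.

    Using sys.executable ties the install to the running kernel's
    environment, which is the reason to prefer %pip over !pip.
    """
    return [sys.executable, "-m", "pip", "install", *shlex.split(args)]


def pip_install(args: str) -> None:
    """Run the install; raises CalledProcessError if pip exits non-zero."""
    subprocess.check_call(pip_install_command(args))


# Equivalent of the notebook's first install line (command built, not executed).
cmd = pip_install_command("-q autoawq transformers accelerate huggingface_hub")
print(cmd)
```

Calling `pip_install(...)` instead of `pip_install_command(...)` performs the install.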


Files changed (1)

quantize_to_awq_colab.ipynb +27 -2

quantize_to_awq_colab.ipynb CHANGED

@@ -29,8 +29,8 @@
       "outputs": [],
       "source": [
         "# Install required packages\n",
-        "!pip install -q autoawq transformers accelerate huggingface_hub\n",
-        "!pip install -q torch --index-url https://download.pytorch.org/whl/cu118\n"
+        "%pip install -q autoawq transformers accelerate huggingface_hub\n",
+        "%pip install -q torch --index-url https://download.pytorch.org/whl/cu118\n"
       ]
     },
     {
@@ -354,6 +354,31 @@
         "for model_key, model_info in MODELS_TO_QUANTIZE.items():\n",
         "    verify_awq_model(model_info[\"output_repo\"])\n"
       ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Notes\n",
+        "\n",
+        "- **GPU Required**: This quantization requires a GPU with at least 40GB VRAM (A100/H100 recommended)\n",
+        "- **Time**: Each model takes approximately 30-60 minutes to quantize\n",
+        "- **Memory**: Ensure you have enough disk space (models are ~20-30GB each)\n",
+        "- **Output Repos**: You can either create new repos (with `-awq` suffix) or upload to existing repos\n",
+        "- **Usage**: After quantization, update your `app.py` to use the AWQ repos:\n",
+        "  ```python\n",
+        "  MODELS = {\n",
+        "      \"Router-Gemma3-27B-AWQ\": {\n",
+        "          \"repo_id\": \"Alovestocode/router-gemma3-merged-awq\",\n",
+        "          \"quantization\": \"awq\"\n",
+        "      },\n",
+        "      \"Router-Qwen3-32B-AWQ\": {\n",
+        "          \"repo_id\": \"Alovestocode/router-qwen3-32b-merged-awq\",\n",
+        "          \"quantization\": \"awq\"\n",
+        "      }\n",
+        "  }\n",
+        "  ```\n"
+      ]
     }
   ],
   "metadata": {