richardyoung committed
Commit f425c2a · verified · 1 Parent(s): 2361f4c

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +34 -1
README.md CHANGED
@@ -51,7 +51,40 @@ This is a **high-quality 8-bit quantized version** of Kimi K2 Instruct, optimize
 
 ## 🎯 Quick Start
 
- ### Installation
+ ## Hardware Requirements
+
+ Kimi K2 is a massive 1T-parameter MoE model (~32B parameters active per token). Choose your quantization based on available unified memory:
+
+ | Quantization | Model Size | Min RAM | Quality |
+ |:------------:|:----------:|:-------:|:--------|
+ | **2-bit** | ~84 GB | 96 GB | Acceptable - some quality loss |
+ | **3-bit** | ~126 GB | 128 GB | Good - recommended minimum |
+ | **4-bit** | ~168 GB | 192 GB | Very good - best quality/size balance |
+ | **5-bit** | ~210 GB | 256 GB | Excellent |
+ | **6-bit** | ~252 GB | 288 GB | Near original |
+ | **8-bit** | ~336 GB | 384 GB | Original quality |
+
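+ As a rough, unofficial sketch (not part of mlx-lm), you can combine macOS's `sysctl hw.memsize` with the Min RAM column above to pick the largest quantization that fits; the thresholds below simply mirror the table:
+
+ ```python
+ import subprocess
+
+ # "Min RAM" column from the table above, in GB (illustrative thresholds).
+ MIN_RAM_GB = {2: 96, 3: 128, 4: 192, 5: 256, 6: 288, 8: 384}
+
+ def unified_memory_gb() -> float:
+     """Total unified memory on macOS, via `sysctl -n hw.memsize`."""
+     out = subprocess.run(["sysctl", "-n", "hw.memsize"],
+                          capture_output=True, text=True, check=True)
+     return int(out.stdout) / 1024**3
+
+ def pick_quantization(available_gb: float) -> int | None:
+     """Highest bit-width whose minimum RAM requirement fits, else None."""
+     fits = [bits for bits, need in MIN_RAM_GB.items() if available_gb >= need]
+     return max(fits) if fits else None
+
+ print(pick_quantization(unified_memory_gb()))  # e.g. 4 on a 192 GB machine
+ ```
+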
+ ### Recommended Configurations
+
+ | Mac Model | Max RAM | Recommended Quantization |
+ |:----------|:-------:|:-------------------------|
+ | Mac Studio M2 Ultra | 192 GB | 4-bit |
+ | Mac Studio M3 Ultra | 512 GB | 8-bit |
+ | Mac Pro M2 Ultra | 192 GB | 4-bit |
+ | MacBook Pro M3 Max | 128 GB | 3-bit |
+ | MacBook Pro M4 Max | 128 GB | 3-bit |
+
+ ### Performance Notes
+
+ - **Inference Speed**: Expect roughly 5-15 tokens/sec, depending on quantization and hardware
+ - **First Token Latency**: 10-30 seconds on a cold start, dominated by model loading
+ - **Context Window**: The full 128K context is supported
+ - **Active Parameters**: Only ~32B parameters are active per token (MoE architecture)
+
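+ Once the package is installed (see the next section), a minimal generation call with mlx-lm looks like the sketch below. The repo id is a placeholder assumption; substitute this repository's actual model id:
+
+ ```python
+ from mlx_lm import load, generate
+
+ # Hypothetical repo id - replace with this repository's actual model id.
+ model, tokenizer = load("mlx-community/Kimi-K2-Instruct-8bit")
+
+ prompt = "Explain mixture-of-experts models in one paragraph."
+ # Loading a model this large can take tens of seconds (see notes above).
+ text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
+ print(text)
+ ```
+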
+ ## Installation
 
 ```bash
 pip install mlx-lm