See IQuest-Coder-V1-40B-Loop-Instruct MLX in action - demonstration video
In our testing, the q6.5-bit mixed quant typically achieves a perplexity of 1.128.
| Quantization | Perplexity |
|---|---|
| q2.5 | 41.293 |
| q3.5 | 1.900 |
| q4.5 | 1.168 |
| q4.8 | 1.140 |
| q5.5 | 1.141 |
| q6.5 | 1.128 |
| q8.5 | 1.128 |
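To help read the table, here is a minimal sketch of how perplexity is conventionally defined (the exponential of the mean negative log-likelihood per token) and how each quant compares with q8.5. It is not the evaluation script behind these numbers: the dataset, context length, and tooling used for the measurements are not stated here. The function names `perplexity` and `relative_increase` are illustrative, not part of MLX or the Inferencer app.

```python
import math


def perplexity(token_log_probs):
    """Standard definition: exp of the mean negative log-probability per token.

    `token_log_probs` holds natural-log probabilities that the model
    assigned to each reference token.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Perplexity values copied from the table above.
TABLE = {
    "q2.5": 41.293,
    "q3.5": 1.900,
    "q4.5": 1.168,
    "q4.8": 1.140,
    "q5.5": 1.141,
    "q6.5": 1.128,
    "q8.5": 1.128,
}


def relative_increase(quant, baseline="q8.5"):
    """Percent increase in perplexity of `quant` over `baseline`."""
    return (TABLE[quant] - TABLE[baseline]) / TABLE[baseline] * 100.0


if __name__ == "__main__":
    for name in TABLE:
        print(f"{name}: {relative_increase(name):+.1f}% vs q8.5")
```

Read this way, q6.5 shows no measurable increase over q8.5, while q4.5 is roughly 3.5% higher and q2.5 is far outside the usable range.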
Usage Notes
Tested on an M3 Ultra using Inferencer app v1.9.1
- Single inference ~9 tokens/s @ 1000 tokens
- Batched inference ~14 total tokens/s across two inferences
- Memory usage: ~30 GB
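The figures above can be sanity-checked with some quick arithmetic. This sketch rests on assumptions not stated in the notes: that the model has about 40 billion weights (inferred from the "40B" in the name), that the q6.5 quant averages 6.5 bits per weight, and that memory is dominated by weights (KV cache and runtime overhead are ignored). The helper names are hypothetical.

```python
def per_stream_rate(total_tokens_per_s, streams):
    """Average tokens/s seen by each stream when the total is shared evenly."""
    return total_tokens_per_s / streams


def weight_memory_gib(params, bits_per_weight):
    """Approximate weight storage in GiB (ignores KV cache and overhead)."""
    return params * bits_per_weight / 8 / 2**30


SINGLE_TOKENS_PER_S = 9.0    # from the notes: single inference
BATCHED_TOKENS_PER_S = 14.0  # from the notes: total across two inferences
STREAMS = 2

ASSUMED_PARAMS = 40e9        # assumption: "40B" taken literally
ASSUMED_BITS = 6.5           # assumption: average bits per weight for q6.5

if __name__ == "__main__":
    print(f"per-stream rate when batched: "
          f"{per_stream_rate(BATCHED_TOKENS_PER_S, STREAMS):.1f} tokens/s")
    print(f"aggregate speedup from batching: "
          f"{BATCHED_TOKENS_PER_S / SINGLE_TOKENS_PER_S:.2f}x")
    print(f"estimated weight memory: "
          f"{weight_memory_gib(ASSUMED_PARAMS, ASSUMED_BITS):.1f} GiB")
```

Under these assumptions, each batched stream runs at about 7 tokens/s, slower than a single inference, while aggregate throughput rises by roughly 1.5x. The weight-size estimate of about 30 GiB is consistent with the reported memory usage.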
Quantized with a modified version of MLX 0.30
For more details, see the demonstration video or visit IQuest-Coder-V1-40B-Instruct.