Performance Discussion
#1 opened by IndenScale
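Can this replace the Qwen3 30B series?
A model of this size is very well suited for production use, or for running as pre-commit/push hooks to build a multi-layered CI guardrail.
I'll post performance results on MLX later.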
I wanted to know the speed in tokens per second.
I'll try running it with vLLM on my AMD Ryzen AI Max+ 395.
Speed on a single RTX A6000: https://github.com/ikawrakow/ik_llama.cpp/issues/1167#issuecomment-3775037120