What's your recipe?
Hello,
could you please share your recipe for properly converting a model to F32 and then making smaller quants from it? I tried to reproduce this with a small model, and while the F32 version seemed fine, the smaller quants I made from it showed more quality loss than they should have.
I'm quite convinced that your conversion to F32 actually made the quality of the responses much better, which is why I'd like to try that method with other models. Thanks in advance.
Hi - I just ran the standard conversion and quantisation scripts, using the (at the time) latest version of llama.cpp on a Windows laptop.
I am actually working on a better version of the Q3 quants in particular, but it's a very slow process.
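For anyone wanting to reproduce this, the "standard scripts" workflow in llama.cpp looks roughly like the sketch below. The model directory name, output filenames, and the Q4_K_M quant type are just illustrative placeholders; the flag names match recent llama.cpp builds.

```shell
# Step 1: convert the original Hugging Face model to a full-precision F32 GGUF.
# "my-model/" is a placeholder for the downloaded model directory.
python convert_hf_to_gguf.py my-model/ \
    --outtype f32 \
    --outfile my-model-F32.gguf

# Step 2: quantise the F32 GGUF down to a smaller quant
# (Q4_K_M here purely as an example - substitute Q3_K_M etc. as needed).
./llama-quantize my-model-F32.gguf my-model-Q4_K_M.gguf Q4_K_M
```

As a side note, for the very small quants (Q3 and below) llama.cpp also provides the `llama-imatrix` tool to generate an importance matrix, which can then be passed to `llama-quantize` via `--imatrix` and often reduces quality loss at those sizes - the original posts don't say whether one was used here.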