What's your recipe?
Hello,
could you please share your recipe for properly converting a model to F32 and then making smaller quants from it? I tried to reproduce this with a small model, and while the F32 version seemed fine, the smaller quants I made from it showed more quality loss than they should have.
I'm quite convinced that your conversion to F32 actually made the quality of the responses much better, which is why I'd like to try that method with other models. Thanks in advance.
Hi - I just ran the standard conversion and quantisation scripts, using the (at the time) latest version of llama.cpp on a Windows laptop.
I am actually working on a better version of the Q3 quants in particular, but it's a very slow process.
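For anyone wanting to reproduce this, the "standard scripts" workflow in llama.cpp looks roughly like the sketch below. The model directory name, output filenames, and the Q4_K_M quant type are just illustrative placeholders; the flag names match recent llama.cpp builds.

```shell
# Step 1: convert the original Hugging Face model to a full-precision F32 GGUF.
# "my-model/" is a placeholder for the downloaded model directory.
python convert_hf_to_gguf.py my-model/ \
    --outtype f32 \
    --outfile my-model-F32.gguf

# Step 2: quantise the F32 GGUF down to a smaller quant
# (Q4_K_M here purely as an example - substitute Q3_K_M etc. as needed).
./llama-quantize my-model-F32.gguf my-model-Q4_K_M.gguf Q4_K_M
```

As a side note, for the very small quants (Q3 and below) llama.cpp also provides the `llama-imatrix` tool to generate an importance matrix, which can then be passed to `llama-quantize` via `--imatrix` and often reduces quality loss at those sizes - the original posts don't say whether one was used here.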