Why is nothing smaller than the Q4 quants?
#2, opened by sbeltz
I was waiting for your quants, suspecting Unsloth's were larger due to their dynamic quantization. But it looks like nothing gets smaller than Q4 (18GB), even in your quants. Why is this?
Thanks!
Ah, that's a curious one! It looks like it's because most of the layers have dimensions that aren't divisible by 256, the block size most of the smaller quant types rely on, so those layers get bumped up to IQ4_NL instead...
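To make the divisibility point concrete, here is a minimal sketch of the fallback idea. It is not the quantizer's actual code: `pick_quant` is a hypothetical helper, the row sizes are made-up examples, and the fallback rule is simplified to the single case mentioned above.

```python
# Illustrative sketch only; not taken from any quantization tool's source.

SUPER_BLOCK = 256  # block size the smaller quant types need (per the reply above)


def pick_quant(requested: str, row_size: int) -> str:
    """Return the quant type a tensor would end up with (hypothetical helper).

    If the row size divides evenly into 256-element blocks, the requested
    type can be used. Otherwise this sketch falls back to IQ4_NL, which is
    the behaviour described in the reply above.
    """
    if row_size % SUPER_BLOCK == 0:
        return requested
    return "IQ4_NL"


# Hypothetical row sizes: 4096 is a multiple of 256, 2880 is not.
for row_size in (4096, 2880):
    print(row_size, "->", pick_quant("Q2_K", row_size))
```

If most tensors in a model hit the fallback branch, a "Q2" or "Q3" file ends up holding mostly 4-bit data, which would explain why the smaller quants are barely smaller than Q4.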
sbeltz changed discussion status to closed