
Draft/Speculative decoding

#1
by yano2mch - opened

Heh, I've been looking for one of these to try out, and I hadn't seen your stuff since the command-r v2 releases a few months back. Time to give this a try :)

I'll give my impressions once I get a chance to try this.

Ahhh... I see. command-a isn't yours, you just made the draft model. Sorry for the confusion. Oh well.

yano2mch changed discussion status to closed

No worries :)

Well, I tried it anyway. But the extra 40B (of the base model) is just too much, so it runs at about 0.5 T/s. 100B+ models are simply too large/slow to be useful, even when reduced to Q2.

It might work considerably better on the 16 GB video card I have, but it's hard to say. We really need more VRAM and consumer-friendly options for these larger models over 70B.
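For anyone else wanting to try this pairing, speculative decoding with a GGUF draft model can be run through llama.cpp's `llama-speculative` tool. This is a sketch, not a verified recipe: the model filenames are placeholders, and the exact flag names (draft-token count, GPU offload) vary between llama.cpp versions, so check `llama-speculative --help` for your build.

```shell
# Target (large) model plus small draft model for speculative decoding.
# File names are placeholders; flags assumed from common llama.cpp usage.
./llama-speculative \
  -m  command-a-Q2_K.gguf \     # large target model (the 100B+ base)
  -md draft-model-Q8_0.gguf \   # small draft model proposing tokens
  -ngl 99 \                     # offload target layers to GPU (VRAM permitting)
  -p "Write a short story about a lighthouse keeper."
```

The speedup depends on how often the target model accepts the draft's proposed tokens; if the target barely fits in memory, as described above, the draft model's savings can't overcome the base model's own slow forward passes.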

Anyways. I'll still keep an eye out for your other -r writer models you might put out :)

yano2mch changed discussion status to open
