Draft/Speculative decoding
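For context, draft/speculative decoding pairs a small "draft" model with the large target model: the draft cheaply proposes a few tokens, and the target verifies them, accepting the longest prefix that matches its own choices. A minimal toy sketch of the greedy variant (the `target` and `draft` functions here are illustrative stand-ins, not a real inference API):

```python
def speculative_decode(target, draft, prompt, k=4, max_new=12):
    """Greedy speculative decoding with a toy draft model.

    `target(seq)` and `draft(seq)` each return the next token id for a
    sequence. The output is identical to target-only greedy decoding;
    the speedup comes from the target verifying k draft tokens in what
    would be one batched forward pass in a real implementation.
    """
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1. Draft proposes k tokens autoregressively (the cheap model).
        ctx = seq[:]
        proposal = []
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. Target verifies: accept draft tokens while they match the
        #    target's own greedy choice; on the first mismatch, take the
        #    target's token instead and resample a new draft window.
        for t in proposal:
            want = target(seq)
            if t == want:
                seq.append(t)      # draft token accepted
            else:
                seq.append(want)   # mismatch: fall back to target's token
                break
    return seq[len(prompt):]
```

Because every appended token is the target's own greedy choice, the result is bit-identical to running the big model alone; a good draft model just lets more tokens through per verification step.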
Heh, I've been looking for one to try out, and I hadn't seen your stuff since the command-r v2 release a few months back. Time to give this a try :)
I'll share my impressions once I get a chance to try it.
Ahh, I see. command-a isn't yours; you just made the draft model. Sorry for the confusion. Oh well.
No worries :)
Well, I tried it anyway. But the extra 40B (of the base model) is just too much, so it runs at around 0.5 T/s. 100B+ models are just too large and slow to be useful for me, even when quantized down to Q2.
It might run considerably better on the 16 GB video card I have, but it's hard to say. We really need more VRAM and consumer-friendly options for these larger models over 70B.
Anyway, I'll still keep an eye out for any other -r writer models you might put out :)