Will this one get the hybrid attention model treatment?
#7 opened by TomLucidor
There are a lot of linear attention models, but not many of them do reasoning. Could this one have some of its regular attention layers converted into linear attention layers?