GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding • Paper • 2605.15250 • Published May 14
MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference • Paper • 2605.07363 • Published May 8
HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention • Paper • 2603.28458 • Published Mar 30