MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU Paper • 2604.05091 • Published 8 days ago • 43
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression Paper • 2604.04921 • Published 8 days ago • 104
Mistral Small 4 Collection A state-of-the-art model, open-weight, with a granular Mixture-of-Experts architecture that fuses instruct, reasoning and agentic skills. • 3 items • Updated 28 days ago • 65
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 Text Generation • 67B • Updated 3 days ago • 1.66M • 264