Article: How Long Prompts Block Other Requests - Optimizing LLM Performance (Jun 12, 2025)
Post: You can now run Kimi K2 Thinking locally with our Dynamic 1-bit GGUFs: unsloth/Kimi-K2-Thinking-GGUF. We shrank the 1T model to 245GB (-62%) and retained ~85% of accuracy on Aider Polyglot. Run on >247GB RAM for fast inference. We also collaborated with the Moonshot AI Kimi team on a system prompt fix! 🥰 Guide + fix details: https://docs.unsloth.ai/models/kimi-k2-thinking-how-to-run-locally