MiniMax M3 Sparse Attention and Long-Context Efficiency

AI Innovation Radar

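MiniMax has teased its upcoming M3 model, which uses a new sparse attention mechanism reported to deliver a 15.6x decoding speedup at 1M tokens of context while preserving reasoning quality. The mechanism performs block-level selection on the real KV cache rather than on a compressed representation, which avoids the pitfalls of compression. The design challenges the usual trade-off between efficiency and reasoning quality in long-context applications, and it builds on lessons from the M2 series' failed sub-quadratic attention experiments.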
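To make "block-level selection on real KV" concrete, here is a minimal sketch of one decoding step, written under stated assumptions. The function name `block_sparse_decode`, the parameters `block_size` and `top_k`, and the mean-key scoring rule are all hypothetical illustrations; the source does not describe how M3 scores or selects blocks, so this should not be read as MiniMax's method.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def block_sparse_decode(query, keys, values, block_size, top_k):
    """One decoding step that attends only to the top_k highest-scoring KV blocks.

    Assumption: blocks are scored by the query dotted with the block's mean key.
    This scoring rule is illustrative, not taken from the source.
    """
    n = len(keys)
    dim = len(query)

    # Partition the cache into contiguous blocks of token positions.
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    def block_score(block):
        mean_key = [sum(keys[i][d] for i in block) / len(block) for d in range(dim)]
        return dot(query, mean_key)

    # Keep the top_k blocks, then restore positional order.
    ranked = sorted(range(len(blocks)), key=lambda b: block_score(blocks[b]), reverse=True)
    chosen = sorted(ranked[:top_k])
    positions = [i for b in chosen for i in blocks[b]]

    # Exact softmax attention over the original (uncompressed) K/V rows
    # of the chosen blocks only.
    scale = 1.0 / math.sqrt(dim)
    logits = [dot(query, keys[i]) * scale for i in positions]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    out_dim = len(values[0])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, positions)) / total
        for d in range(out_dim)
    ]
    return output, positions
```

In this sketch the per-step attention cost scales with `top_k * block_size` rather than with the full cache length, which is the general reason block selection can speed up decoding at long context. Setting `top_k` to the total number of blocks recovers ordinary dense attention, since every original key and value is then attended to.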

Sources (2)

Updated May 28, 2026

@rasbt: The MiniMax M2 series was one of the most widely used open-weight LLM series earlier this year. Now,...

MiniMax teases upcoming M3 model with new sparse attention mechanism and 15.6X long-context response speed boost