MiniMax M3 Sparse Attention and Long-Context Efficiency
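MiniMax has teased its M3 model, which uses a new sparse attention mechanism that the company says delivers a 15.6x decoding speedup at a 1M-token context while preserving reasoning quality. The mechanism selects at the block level from the real, uncompressed KV cache, which avoids the pitfalls associated with KV compression. MiniMax positions this as a challenge to the usual trade-off between efficiency and reasoning quality in long-context applications. The design reportedly builds on lessons from M2's failed sub-quadratic attention experiments.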
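To make "block-level selection on real KV" concrete, here is a minimal sketch of one decoding step. It is not MiniMax's implementation, and the teaser does not describe how M3 scores or selects blocks. The function name `block_sparse_decode`, the parameters `block_size` and `top_k`, and the routing rule (query dotted with each block's mean key) are all assumptions chosen for illustration. The point it shows is that the mean key is used only to pick blocks, while the attention itself runs over the original keys and values.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def block_sparse_decode(q, keys, values, block_size, top_k):
    """One decoding step that attends only to the top_k highest-scoring KV blocks.

    Hypothetical sketch: the routing score is an assumption, not M3's method.
    """
    n = len(keys)
    dim = len(q)
    scale = 1.0 / math.sqrt(dim)

    # Split token positions into contiguous blocks.
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    # Assumed routing score: query dotted with the block's mean key.
    def block_score(block):
        mean_key = [sum(keys[i][d] for i in block) / len(block) for d in range(dim)]
        return dot(q, mean_key)

    ranked = sorted(range(len(blocks)), key=lambda b: block_score(blocks[b]), reverse=True)
    chosen = sorted(ranked[:top_k])

    # Exact softmax attention over the uncompressed keys/values of chosen blocks.
    idx = [i for b in chosen for i in blocks[b]]
    logits = [dot(q, keys[i]) * scale for i in idx]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    out = [
        sum(w * values[i][d] for w, i in zip(weights, idx)) / total
        for d in range(len(values[0]))
    ]
    return out, chosen


# Toy cache: 8 tokens, 4 blocks of 2; only block 2 aligns with the query.
keys = [[0, 1], [0, 1], [0, 1], [0, 1], [5, 0], [5, 0], [0, 1], [0, 1]]
values = [[float(i)] for i in range(8)]
out, chosen = block_sparse_decode([1, 0], keys, values, block_size=2, top_k=1)
print(chosen, out)  # [2] [4.5]
```

In this sketch the per-step cost of the exact attention scales with `top_k * block_size` rather than with the full context length, which is the general reason block selection can speed up decoding. Setting `top_k` to the number of blocks recovers dense attention.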
Updated May 28, 2026