LLM Insight Tracker

Inference/post-training architecture breakthroughs: Gemma 4 MoE open (RTX 4090 162 t/s; Jetson crushes Llama 4/Qwen 3.5/3.6) + Meta OSS upcoming + TAPS/HISA + post-transformer + Salomi + MLPerf v6 + DataFlex + Unsloth + TPU fine-tune + LeJEPA + Swift-SVD compression + test-time scaling + NVIDIA OSS push + Geometric Alignment Tax + TriAttention/LightThinker++/Qualcomm MX mobile

Key Questions

How does Gemma 4 perform compared to other models?

Gemma 4 31B MoE beats GPT-5.4 on AIME and Codeforces, runs at 162 tokens/s on an RTX 4090, and outperforms Llama 4 and Qwen 3.5/3.6 on Jetson.

What is Meta planning for open-source models?

Meta will open-source safety-stripped versions of new models soon.

What is Qwen-3.6-Plus's achievement?

Qwen-3.6-Plus processes 1T tokens per day.
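To put the headline figure in perspective, the unit conversion below (my own sanity arithmetic, not from the article) shows what 1T tokens/day means as a sustained per-second rate:

```python
# Convert the reported daily token volume to a sustained per-second rate.
tokens_per_day = 1_000_000_000_000        # 1 trillion tokens
seconds_per_day = 24 * 60 * 60            # 86,400 seconds
tokens_per_second = tokens_per_day / seconds_per_day

print(f"{tokens_per_second:,.0f} tokens/s")  # roughly 11.6 million tokens/s
```

That is about 11.6 million tokens every second, around the clock.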

What are recent post-training architecture advances?

Advances include TAPS/HISA, post-transformer architectures, Salomi, MLPerf v6, DataFlex, Unsloth, TPU fine-tuning, LeJEPA, Swift-SVD compression, and test-time scaling.

What is TriAttention?

TriAttention uses trigonometric KV compression for efficient long reasoning.
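The article names the technique but gives no details, so the following is an illustrative sketch of one plausible reading: projecting a KV cache onto a truncated cosine (DCT-II) basis along the time axis, so only m coefficient rows are stored instead of T cached rows. The function names and the choice of DCT are my own assumptions.

```python
import numpy as np

def cosine_basis(T, m):
    """First m orthonormal DCT-II basis vectors over T time steps, shape (m, T)."""
    n = np.arange(T)
    k = np.arange(m)[:, None]
    B = np.cos(np.pi * (n[None, :] + 0.5) * k / T)
    B[0] /= np.sqrt(T)          # constant (DC) vector
    B[1:] *= np.sqrt(2.0 / T)   # remaining cosines, orthonormal scaling
    return B

def compress_kv(K, m):
    """Compress a (T, d) key/value cache to (m, d) cosine coefficients."""
    return cosine_basis(K.shape[0], m) @ K

def decompress_kv(C, T):
    """Reconstruct an approximate (T, d) cache from (m, d) coefficients."""
    return cosine_basis(T, C.shape[0]).T @ C

T, d, m = 64, 8, 8
K = np.ones((T, d))             # a smooth (here constant) cache compresses losslessly
K_hat = decompress_kv(compress_kv(K, m), T)
print(np.max(np.abs(K_hat - K)))  # near machine epsilon
```

The 8x reduction here (64 rows down to 8 coefficient rows) is exact only because the toy cache is constant; for real activations the truncation is lossy, which is presumably where the efficiency/accuracy trade-off of such a scheme lives.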

What is the Geometric Alignment Tax?

It refers to the mismatch between discrete tokenization and the continuous geometry of scientific data, a core challenge for scientific foundation models.
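A toy illustration of that mismatch (my own example, not from the article): two numerically adjacent values share almost no tokens once serialized as digit strings, so token-space distance badly distorts continuous distance.

```python
def digit_tokens(x):
    """Tokenize a number the way a character-level LM would see it."""
    return list(f"{x:.3f}")

a, b = 0.999, 1.000                       # Euclidean distance: 0.001
tok_a, tok_b = digit_tokens(a), digit_tokens(b)
shared = sum(u == v for u, v in zip(tok_a, tok_b))

print(tok_a, tok_b, shared)
# tok_a = ['0', '.', '9', '9', '9'], tok_b = ['1', '.', '0', '0', '0']
# only the '.' matches: tiny continuous distance, near-maximal token distance
```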

How is fine-tuning Gemma on TPU done?

The tutorial uses Kinetic with Keras and JAX on TPU v5, presented as the easiest way to get full leverage from the hardware.
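The JAX core of such a workflow can be sketched as below. This is a toy example of my own: Kinetic and the Keras model-loading steps are not shown, and a real run would load pretrained Gemma weights rather than this linear model; only the `jax.grad`/`jax.jit` training-step pattern carries over.

```python
import jax
import jax.numpy as jnp

def loss_fn(w, x, y):
    """Mean squared error of a toy linear model standing in for the real network."""
    return jnp.mean((x @ w - y) ** 2)

@jax.jit  # on TPU, jit compiles the step via XLA
def sgd_step(w, x, y, lr=0.1):
    """One fine-tuning step: gradient via jax.grad, plain SGD update."""
    return w - lr * jax.grad(loss_fn)(w, x, y)

x = jnp.ones((4, 3))   # stand-in batch
y = jnp.zeros((4,))    # stand-in targets
w = jnp.ones((3,))     # stand-in "pretrained" weights
for _ in range(50):
    w = sgd_step(w, x, y)

print(float(loss_fn(w, x, y)))  # loss driven to ~0
```

The same step function runs unchanged on CPU, GPU, or TPU; only the device JAX is backed by differs.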

What hardware supports these models?

The RTX 4090, Jetson, Qualcomm MX mobile chips, and TPUs all show strong performance, alongside NVIDIA's open-source push.

Gemma 4 31B beats GPT-5.4 on AIME/Codeforces versus Llama/Qwen 3.5; Meta to open-source new models soon (safety-stripped); Qwen-3.6-Plus processes 1T tokens/day; strong results on RTX, Jetson, Qualcomm MX, and TPU hardware (with @ylecun pushing back on hype); Unsloth, Salomi, MLPerf, Swift-SVD, and TriAttention's trigonometric KV compression; LightThinker++ memory savings; test-time compute scaling; JEPA; the Geometric Alignment Tax of tokenization in scientific models; NVIDIA's open-source evolution; and a flow-map LM update (non-autoregressive continuous-flow generation).

Sources (31)
Updated Apr 8, 2026