Efficiency & reasoning primitives + new evals (dMoE, Light Interaction, StateKV, OSP-Next, Early Stopping, DeepSWE, llama.cpp MTP, Q-ARVD, SCRL, SEGA, PiD, DAR, Lens, RankE, VGenST-Bench, TerminalWorld, Claw-Anything, WBench, SpatialBench, AKBE, Stanford HAI, Scale vectors, DenoiseRL, BES, HRBench, Triplet-Block Diffusion RWKV, SAE-guided post-training, LiteCoder-Terminal, BeliefTrack, SIA, minWM, YoCausal, Netflix Wiz, ARC White-Box Estimation Challenge, OPRD, VideoKR, General Instinct compression, ToolMaze, multi-objective prompt optimization, WorldBench, Compress-Distill, FlashMemory-DeepSeek-V4, SpatialWorld, FrontierCode, CADGenBench, From Pixels to Words, MMAE, MoE-to-dense pruning, DRPO, FlowTracer, DiffusionGemma, AMD inference optimization, Cohere Transcribe, SG-OPD, JANG HSA/GGUF for MLX, OpenRouter cost optimization)

Key Questions

What efficiency improvements does dMoE deliver for diffusion LLMs?

dMoE block-level routing achieves 76-80% memory reduction and 1.66x speedup. The technique is open-sourced and targets diffusion-based language models.
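
To put those two figures in the same terms, a minimal Python sketch can convert a percentage reduction into the fraction of memory still required, and a speedup factor into wall-clock time saved. The helper names are illustrative only, and the arithmetic assumes both reported numbers are measured against the same baseline, which the summary above does not state.

```python
def remaining_fraction(reduction_pct: float) -> float:
    """Fraction of the baseline still needed after a percentage reduction."""
    return 1.0 - reduction_pct / 100.0


def time_saved_pct(speedup: float) -> float:
    """Percentage of baseline wall-clock time saved at a given speedup factor."""
    return (1.0 - 1.0 / speedup) * 100.0


# Reported dMoE range: 76-80% memory reduction, 1.66x speedup.
for pct in (76, 80):
    frac = remaining_fraction(pct)
    print(f"{pct}% reduction -> {frac:.2f} of baseline memory ({1 / frac:.2f}x smaller)")

print(f"1.66x speedup -> {time_saved_pct(1.66):.1f}% less time per run")
```

Read this way, the reported range leaves roughly a quarter to a fifth of the baseline memory footprint, and a 1.66x speedup corresponds to about 40% less time for the same work.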

What new benchmarks evaluate agentic coding and spatial reasoning?

DeepSWE contains 113 tasks across 91 repositories while SpatialWorld tests interactive spatial reasoning. GPT-5 scores only 17.4% TSR on the latter benchmark.

How does DiffusionGemma accelerate text generation?

DiffusionGemma uses text diffusion on Gemma 4 to generate multiple tokens in parallel. It reports up to 4x faster inference and is released under an Apache 2.0 license.
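
Why parallel token generation can translate into a speedup is easiest to see by counting sequential model calls. The sketch below is a toy cost model, not DiffusionGemma's actual decoding algorithm: it assumes every decoding step costs the same and that each step commits a fixed number of tokens (`tokens_per_step` is a hypothetical parameter). Real diffusion decoders may need several refinement passes per block, so the result is best read as an upper bound.

```python
import math


def decoding_steps(num_tokens: int, tokens_per_step: int) -> int:
    """Sequential model calls needed if each call commits tokens_per_step tokens."""
    if num_tokens < 1 or tokens_per_step < 1:
        raise ValueError("num_tokens and tokens_per_step must both be >= 1")
    return math.ceil(num_tokens / tokens_per_step)


def ideal_speedup(num_tokens: int, tokens_per_step: int) -> float:
    """Upper bound on speedup over one-token-per-step decoding (equal step cost assumed)."""
    return decoding_steps(num_tokens, 1) / decoding_steps(num_tokens, tokens_per_step)


print(decoding_steps(256, 1))  # one token per step: 256 sequential calls
print(decoding_steps(256, 4))  # four tokens per step: 64 sequential calls
print(ideal_speedup(256, 4))   # 4.0 under the equal-cost assumption
```

Under these assumptions, committing four tokens per step is what it would take to reach the "up to 4x" figure; any extra per-step cost lowers the realized speedup.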

New efficiency papers: dMoE block-level routing for diffusion LLMs (76-80% memory reduction, 1.66x speedup, open-source). Light Interaction: training-free acceleration for interactive video world models (2.59x speedup). StateKV: linear scaling for video VLM prefill (no fine-tuning). llama.cpp MTP in LM Studio (99 t/s). Q-ARVD video quantization. SCRL curriculum RL (+4.1 on Qwen3). SEGA spectral-energy attention. MTP ~2x local speedup. OSP-Next video generation efficiency (1.64x speedup on H200). Early Stopping Rollout improves on-policy distillation.

DeepSWE benchmark (113 tasks, 91 repos) for agentic coding. TerminalWorld (62.5% max) reveals agent gaps. Claw-Anything (GPT-5.5 only 34.5% pass@1). WBench multi-turn benchmark. SpatialBench for spatial foundation models. AKBE reduces tool calls by 18%. Stanford HAI one-shot scaling law. Scale vectors paper.

Also: NVIDIA PiD, DAR, Microsoft Lens, RankE, VGenST-Bench.

New signals: formal theorem proving study, on-policy distillation is hot, KV cache compression teaser, SomaliBench.

DenoiseRL: self-supervised RL from incorrect reasoning traces. BES: bidirectional evolutionary search. HRBench: hybrid-reasoning benchmark. Triplet-Block Diffusion RWKV (1.6x speedup). SAE-guided post-training (SAERL improves GRPO by 3%). LiteCoder-Terminal: synthetic environment generator. BeliefTrack benchmark (RL reduces failures by 70.9%). SIA: self-improving AI framework. minWM: open-source framework for real-time interactive video world models. YoCausal: benchmark for causality in video diffusion models. Netflix Wiz token compression (90% reduction, $700k savings). The ARC White-Box Estimation Challenge introduces a new alignment signal for open-weight models. OPRD (on-policy representation distillation) eliminates sampling variance, is faster and more memory-efficient, and closes the student-teacher gap on AIME. VideoKR: new benchmark for knowledge-intensive video reasoning.

General Instinct (YC P26) compresses a 245GB MoE to 48GB while beating Gemma-4-26B, using on-policy distillation and aggressive expert quantization. ToolMaze: benchmark for dynamic replanning and anomaly recovery (fault-tolerance scales 3.66x slower than basic execution). A new paper covers multi-objective prompt optimization failure modes (gradient dilution, instruction interference) for LLM judges. WorldBench: challenging multimodal reasoning benchmark with visual diversity (top model 64%). Compress-Distill: reasoning trace compression for distillation, 2-7.6x speedup with up to 96% accuracy retention. FlashMemory-DeepSeek-V4 introduces Lookahead Sparse Attention, reducing the KV cache to 13.5% of full context. SpatialWorld benchmark tests interactive spatial reasoning (GPT-5 only 17.4% TSR). FrontierCode benchmark measures code quality (mergeability). CADGenBench evaluates AI on engineering-grade 3D CAD parts. The paper 'From Pixels to Words' proposes an encoder-free native vision-language architecture. MMAE: new benchmark for audio editing models (<5% exact match).

A new paper on pruning MoE to dense models shows MoE-to-dense outperforms dense-to-dense pruning, relevant for Qwen3 and DeepSeek-V2 deployment. DRPO (smooth divergence regularizer) is a new RL regularization that improves training stability. FlowTracer (attention flow for targeted RL) enables token-level credit assignment. DiffusionGemma claims up to 4x faster text generation via text diffusion, building on Gemma 4. A new article on maximizing frontier models on AMD hardware notes that compression frees HBM for a larger KV cache, a key optimization for open-source AI deployment. Cohere Transcribe: open-source speech recognition model, #1 on the Hugging Face Far-Field ASR benchmark. SG-OPD (Sign-Gated On-Policy Distillation) improves on-policy distillation for LLMs using sign-consistency gating, outperforming standard OPD on math reasoning.

JANG: tool for Hash-Sparse Attention and GGUF for MLX, enabling efficient inference on Apple Silicon. A practical guide on getting lowest-cost LLM inference on OpenRouter highlights 10-50x savings with open-weight models like DeepSeek V4 and Llama 3.3 70B.
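
Several items above turn on KV cache size (the 13.5% figure for Lookahead Sparse Attention, and the point that compression frees HBM for a larger KV cache). As a rough sketch, the cache for one sequence under standard multi-head or grouped-query attention can be estimated as 2 x layers x KV heads x head dimension x sequence length x bytes per element. The model shape below is hypothetical and is not DeepSeek-V4's configuration; architectures with compressed or latent KV representations do not follow this formula, and applying the 13.5% ratio to it is purely illustrative.

```python
GIB = 2**30


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Per-sequence KV cache size; the factor 2 covers keys plus values."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical model shape (assumption for illustration only), 16-bit cache.
full = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=131_072)

# Apply the reported "13.5% of full context" ratio to the estimate.
reduced = full * 0.135

print(f"full cache:    {full / GIB:.2f} GiB")
print(f"reduced cache: {reduced / GIB:.2f} GiB")
print(f"memory freed:  {(full - reduced) / GIB:.2f} GiB per sequence")
```

For this made-up shape the full cache comes to 16 GiB per 128K-token sequence, so a cut to 13.5% would free most of that for larger batches or longer contexts.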

Updated Jun 16, 2026

Open Source AI Digest
