Open Source AI Digest

Efficiency & reasoning primitives + new evals (dMoE, Light Interaction, StateKV, OSP-Next, Early Stopping, DeepSWE, llama.cpp MTP, Q-ARVD, SCRL, SEGA, PiD, DAR, Lens, RankE, VGenST-Bench, TerminalWorld, Claw-Anything, WBench, SpatialBench, AKBE, Stanford HAI, Scale vectors, DenoiseRL, BES, HRBench, Triplet-Block Diffusion RWKV, SAE-guided post-training, LiteCoder-Terminal, BeliefTrack, SIA, minWM, YoCausal, Netflix Wiz, ARC White-Box Estimation Challenge, OPRD, VideoKR, General Instinct compression, ToolMaze, multi-objective prompt optimization, WorldBench, Compress-Distill, FlashMemory-DeepSeek-V4, SpatialWorld, FrontierCode, CADGenBench, From Pixels to Words, MMAE, MoE-to-dense pruning, DRPO, FlowTracer, DiffusionGemma, AMD inference optimization, Cohere Transcribe, SG-OPD, JANG HSA/GGUF for MLX, OpenRouter cost optimization)

Key Questions

What efficiency improvements does dMoE deliver for diffusion LLMs?

dMoE block-level routing achieves 76-80% memory reduction and 1.66x speedup. The technique is open-sourced and targets diffusion-based language models.

What new benchmarks evaluate agentic coding and spatial reasoning?

DeepSWE contains 113 tasks across 91 repositories while SpatialWorld tests interactive spatial reasoning. GPT-5 scores only 17.4% TSR on the latter benchmark.

How does DiffusionGemma accelerate text generation?

DiffusionGemma applies text diffusion to Gemma 4 to generate multiple tokens in parallel. It reports up to 4x faster inference and is released under the Apache 2.0 license.

New efficiency papers: dMoE block-level routing for diffusion LLMs (76-80% memory reduction, 1.66x speedup, open-source). Light Interaction, a training-free acceleration method for interactive video world models (2.59x speedup). StateKV, linear scaling for video VLM prefill with no fine-tuning. llama.cpp MTP in LM Studio (99 t/s). Q-ARVD video quantization. SCRL curriculum RL (+4.1 on Qwen3). SEGA spectral-energy attention. MTP gives roughly a 2x local speedup. OSP-Next video generation efficiency (1.64x speedup on H200). Early Stopping Rollout improves on-policy distillation.

New benchmarks: DeepSWE (113 tasks, 91 repos) for agentic coding. TerminalWorld (62.5% max) reveals agent gaps. Claw-Anything (GPT-5.5 reaches only 34.5% pass@1). WBench multi-turn benchmark. SpatialBench for spatial foundation models.

Also: AKBE reduces tool calls by 18%. Stanford HAI one-shot scaling law. Scale vectors paper. NVIDIA PiD, DAR, Microsoft Lens, RankE, VGenST-Bench.

New signals: a formal theorem proving study, on-policy distillation as a hot topic, a KV cache compression teaser, SomaliBench.

DenoiseRL, self-supervised RL from incorrect reasoning traces. BES bidirectional evolutionary search. HRBench hybrid-reasoning benchmark. Triplet-Block Diffusion RWKV (1.6x speedup). SAE-guided post-training (SAERL improves GRPO by 3%). LiteCoder-Terminal synthetic environment generator. BeliefTrack benchmark (RL reduces failures by 70.9%). SIA self-improving AI framework. minWM, an open-source framework for real-time interactive video world models. YoCausal benchmark for causality in video diffusion models. Netflix Wiz token compression (90% reduction, $700k savings). The ARC White-Box Estimation Challenge introduces a new alignment signal for open-weight models. OPRD (on-policy representation distillation) eliminates sampling variance, is faster and more memory-efficient, and closes the student-teacher gap on AIME. VideoKR, a new benchmark for knowledge-intensive video reasoning.

General Instinct (YC P26) compresses a 245GB MoE to 48GB while beating Gemma-4-26B, using on-policy distillation and aggressive expert quantization. ToolMaze, a benchmark for dynamic replanning and anomaly recovery (fault-tolerance scales 3.66x slower than basic execution). A paper on failure modes of multi-objective prompt optimization for LLM judges (gradient dilution, instruction interference). WorldBench, a challenging multimodal reasoning benchmark with visual diversity (top model 64%). Compress-Distill, reasoning trace compression for distillation (2-7.6x speedup with up to 96% accuracy retention). FlashMemory-DeepSeek-V4 introduces Lookahead Sparse Attention, reducing the KV cache to 13.5% of full context. SpatialWorld benchmark tests interactive spatial reasoning (GPT-5 reaches only 17.4% TSR). FrontierCode benchmark measures code quality (mergeability). CADGenBench evaluates AI on engineering-grade 3D CAD parts. The paper 'From Pixels to Words' proposes an encoder-free native vision-language architecture. MMAE, a new benchmark for audio editing models (<5% exact match).

A paper on pruning MoE to dense models shows that MoE-to-dense pruning outperforms dense-to-dense pruning, which is relevant for Qwen3 and DeepSeek-V2 deployment. DRPO, a new RL regularization method (smooth divergence regularizer), improves training stability. FlowTracer (attention flow for targeted RL) enables token-level credit assignment. DiffusionGemma claims up to 4x faster text generation via text diffusion, building on Gemma 4. An article on maximizing frontier models on AMD hardware: compression frees HBM for a larger KV cache, a key optimization for open-source AI deployment. Cohere Transcribe, an open-source speech recognition model, ranks #1 on the Hugging Face Far-Field ASR benchmark. SG-OPD (Sign-Gated On-Policy Distillation) improves on-policy distillation for LLMs using sign-consistency gating, outperforming standard OPD on math reasoning. JANG, a tool for Hash-Sparse Attention and GGUF for MLX, enables efficient inference on Apple Silicon. A practical guide to getting the lowest-cost LLM inference on OpenRouter highlights 10-50x savings with open-weight models like DeepSeek V4 and Llama 3.3 70B.
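
The size and memory figures above are quoted in mixed units (percent reduction, percent remaining, before/after sizes). As a rough aid for comparing them, the sketch below converts each into a "times smaller" factor. It is plain arithmetic on the numbers quoted in this digest, not a reproduction of any paper's measurement, and it assumes each percentage is relative to the uncompressed baseline. The helper name `reduction_factor` is hypothetical.

```python
def reduction_factor(remaining_fraction: float) -> float:
    """Return how many times smaller a result is, given the fraction that remains."""
    if not 0 < remaining_fraction <= 1:
        raise ValueError("remaining_fraction must be in (0, 1]")
    return 1.0 / remaining_fraction


# Figures as quoted in the digest; each is assumed to be relative to the baseline.
moe_factor = 245 / 48                    # General Instinct: 245GB MoE compressed to 48GB
kv_factor = reduction_factor(0.135)      # FlashMemory-DeepSeek-V4: KV cache at 13.5% of full
dmoe_low = reduction_factor(1 - 0.76)    # dMoE: 76% memory reduction (lower bound)
dmoe_high = reduction_factor(1 - 0.80)   # dMoE: 80% memory reduction (upper bound)
wiz_factor = reduction_factor(1 - 0.90)  # Netflix Wiz: 90% token reduction

for name, factor in [
    ("General Instinct model size", moe_factor),
    ("FlashMemory KV cache", kv_factor),
    ("dMoE memory (low)", dmoe_low),
    ("dMoE memory (high)", dmoe_high),
    ("Netflix Wiz tokens", wiz_factor),
]:
    print(f"{name}: about {factor:.1f}x smaller")
```

Under these assumptions the factors come out to roughly 5.1x, 7.4x, 4.2x to 5x, and 10x respectively. They measure different things (model weights, KV cache, activation memory, prompt tokens), so they are not directly comparable as a ranking.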

Sources (2)
Updated Jun 16, 2026