Novel Architectures, Agents, and Theory

Key Questions

What improvement does Parallax attention provide over softmax?

Parallax introduces a parameterized local linear attention mechanism that is reported to be a Pareto improvement over standard softmax attention. It has been scaled to 1.7B-parameter models trained with the Muon optimizer.

How does GASP enhance vision-language models?

GASP injects 3D priors into VLMs, delivering 18-29% gains on spatial reasoning benchmarks. It improves geometric understanding without requiring full 3D supervision.

What is the Command A+ model and its licensing?

Command A+ is an open-source 218B MoE model released under Apache 2.0. It joins other releases such as GLM5.1-NVFP4 on Hugging Face.

How does FluxMem represent agent memory?

FluxMem models memory as an evolving graph topology and achieves state-of-the-art results on LoCoMo, Mind2Web, and GAIA. This reframing improves long-term agent coherence.
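
The idea of memory as an evolving graph can be pictured with a toy sketch. This is not FluxMem's algorithm: the `GraphMemory` class, its method names, and the decay-and-prune rule are all hypothetical, chosen only to show a memory whose topology changes as an agent observes new items.

```python
from collections import defaultdict


class GraphMemory:
    """Toy agent memory: items are nodes, co-occurrence creates weighted edges."""

    def __init__(self, decay=0.5, prune_below=0.2):
        self.decay = decay              # hypothetical: multiplier applied to every edge per step
        self.prune_below = prune_below  # hypothetical: edges weaker than this are dropped
        self.edges = defaultdict(dict)  # node -> {neighbor: weight}

    def observe(self, items):
        """Decay old links, prune weak ones, then link items seen together."""
        for node in list(self.edges):
            for nbr in list(self.edges[node]):
                weight = self.edges[node][nbr] * self.decay
                if weight < self.prune_below:
                    del self.edges[node][nbr]      # topology shrinks
                else:
                    self.edges[node][nbr] = weight
        for a in items:
            for b in items:
                if a != b:                          # topology grows or is reinforced
                    self.edges[a][b] = self.edges[a].get(b, 0.0) + 1.0

    def recall(self, node, k=3):
        """Return up to k neighbors of `node`, strongest link first."""
        nbrs = self.edges.get(node, {})
        return sorted(nbrs, key=lambda n: (-nbrs[n], n))[:k]


mem = GraphMemory()
mem.observe(["paris", "flight"])
mem.observe(["paris", "hotel"])
print(mem.recall("paris"))  # the more recent link ranks first
```

In this sketch, links that are not reinforced fade and eventually disappear, so recall follows the current shape of the graph rather than a fixed store.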

What decoding technique yields up to 27% gains?

Thinking Before Constraining is a unified decoding framework reported to improve LLM outputs by up to 27%. It separates the model's reasoning steps from the application of output constraints.
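
One way to picture separating reasoning from constraint application is a two-phase decoding loop: decode freely until a stop marker, then apply a hard mask only to the answer. The sketch below is an assumption about the general shape of such a scheme, not the paper's method. The function names, the `</think>` marker, and `toy_model` (a scripted stand-in for a language model) are all hypothetical.

```python
def constrained_pick(scores, allowed):
    """Pick the highest-scoring token among `allowed` (a simple hard mask)."""
    masked = {tok: s for tok, s in scores.items() if tok in allowed}
    if not masked:
        raise ValueError("no allowed token has a score")
    return max(masked, key=masked.get)


def think_then_constrain(next_scores, prompt, allowed, stop="</think>", max_steps=16):
    """Phase 1: greedy decoding with no constraint until `stop`.
    Phase 2: choose one answer token under the constraint.

    `next_scores(tokens) -> dict[token, score]` stands in for a language model.
    """
    tokens = list(prompt)
    for _ in range(max_steps):               # phase 1: unconstrained reasoning
        scores = next_scores(tokens)
        tok = max(scores, key=scores.get)
        tokens.append(tok)
        if tok == stop:
            break
    answer = constrained_pick(next_scores(tokens), allowed)  # phase 2: constrained
    return tokens[len(prompt):], answer


def toy_model(tokens):
    """Hypothetical stand-in for an LLM: a fixed script, then answer preferences."""
    script = ["2", "+", "2", "=", "4", "</think>"]
    produced = len(tokens) - 1               # tokens emitted after the one-token prompt
    if produced < len(script):
        return {script[produced]: 1.0, "yes": 0.1}
    return {"maybe": 0.9, "yes": 0.6, "no": 0.2}


reasoning, answer = think_then_constrain(toy_model, ["Q"], allowed={"yes", "no"})
print(reasoning, answer)
```

Here the mask never touches the reasoning tokens. It only rules out `maybe` at the answer step, so the constrained answer is `yes` even though the unconstrained top choice would have been `maybe`.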

Which new agent was launched by Mistral?

Mistral released Vibe, an AI agent for work and code. It complements other open releases such as Command A+ and GLM5.1-NVFP4.

What compression level does Bonsai Image 4B achieve?

Bonsai Image 4B delivers an 8.3x compression ratio while maintaining competitive performance. It exemplifies the trend toward efficient specialized models.
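
To put the 8.3x figure in concrete terms, here is a back-of-the-envelope sketch. It assumes the ratio is measured against a 16-bit baseline and applies uniformly to all 4B parameters. Neither assumption is stated in the source, and the helper names are hypothetical.

```python
def compressed_size_gb(n_params, baseline_bits, ratio):
    """Size in GB (1e9 bytes) after dividing the baseline footprint by `ratio`."""
    baseline_bytes = n_params * baseline_bits / 8
    return baseline_bytes / ratio / 1e9


def effective_bits_per_param(baseline_bits, ratio):
    """Average bits stored per parameter implied by the compression ratio."""
    return baseline_bits / ratio


# Assumed inputs: 4e9 parameters, a 16-bit baseline, and the 8.3x ratio from the text.
size_gb = compressed_size_gb(4e9, 16, 8.3)
bits = effective_bits_per_param(16, 8.3)
print(f"{size_gb:.2f} GB, {bits:.2f} bits per parameter")
```

Under these assumptions, an 8 GB baseline shrinks to just under 1 GB, which works out to a little under 2 bits per parameter on average.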

How does AdaState support streaming video generation?

AdaState uses self-evolving anchors to improve streaming video generation quality and consistency. It adapts dynamically to changing content during generation.

Parallax: parameterized local linear attention (Pareto improvement over softmax, scaled to 1.7B with Muon). GASP: injects 3D priors into VLMs (+18-29% on spatial benchmarks). AdaState: self-evolving anchors for streaming video generation. Thinking Before Constraining: decoding trick (up to 27% gain). Why Larger Models Learn: theory (interference mechanism). Aleph Prover: formalizes OpenAI's Erdős disproof.

FluxMem: memory as evolving graph topology (SOTA on LoCoMo/Mind2Web/GAIA). Mistral Vibe: agent for work and code. Command A+: open-source (218B MoE, Apache 2.0). GLM5.1-NVFP4: on Hugging Face. NEO-ov: encoder-free VLM. SenseNova-U1: native mixture of transformers. Bonsai Image 4B: 8.3x compression. MiniMax M3: sparse attention.

SAM: state-adaptive memory. GRAM: stochastic latent trajectories. AutoScientists: self-organizing agent teams. ScientistOne: Chain-of-Evidence. BES: bidirectional evolutionary search. Learn from Weaknesses: domain specialization.

Also: MASTER, PiD, Geo-Align, RankE, Lens 3.8B, SkillOpt, SPD, VPO, MAESTRO, LT2, IBM Granite-20B-Code-QK, SMART, on-policy distillation (REOPOLD, Uni-OPD, EffOPD).

Sources (80)

Updated May 29, 2026

Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention

AdaState: Self-Evolving Anchors for Streaming Video Generation

Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning

Thinking Before Constraining: A Unified Decoding Framework for Large Language Models

Parallax: Parameterized Local Linear Attention for Language Modeling

Geometry Matters: 3D Foundation Priors for Learning Semantic Correspondence

@huggingface reposted: Official @NVIDIAAI GLM5.1-NVFP4 spotted on @huggingface 🤩 https://t.co/A2ycGBIp...

@sophiamyang: Announcing @MistralAI Vibe - AI agent for work and code!

@omarsar0: // Memory as Connectivity // One of the cleaner reframings of agent memory I have seen this month. ...

Chap10: How Native Multimodal AI is Rewriting Reality in 2026! 🚀

Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents

Self-Improving Language Models with Bidirectional Evolutionary Search

ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence

@megthescientist reposted: ESMFold2 and the ESM-C family, now available for use! We’ve partnered with @bi...

@ylecun reposted: Today we're announcing ESMFold2, an open scientific engine to power prediction, ...

@MimansaJ reposted: New paper! LLM memory keeps improving, but this makes them *worse* as user sims....

From Pixels to Words -- Towards Native One-Vision Models at Scale

AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation

LLaVA-OV-2: Advanced Video-Language Model

SAM: State-Adaptive Memory for Long-Horizon Reasoning Agent

EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration

CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations

Understanding Data Temporality Impact on Large Language Models Pre-training

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Can LLMs Introspect? A Reality Check

Beyond Determinism: Generative Recursive Reasoning Models (GRAM) & Latent Trajectories. TRM vs HRM.

MiniMax teases upcoming M3 model with new sparse attention mechanism and 15.6X long-context response speed boost

Why Language Models Need Sleep: New Research Shows Offline Recurrence Unlocks Deeper Reasoning for AI Agents

Self-Tuning and Distributed Optimization Algorithms for ...

Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models

LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence

Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling

Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration

MobileMoE: Scaling On-Device Mixture of Experts

MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

@StanfordHAI: One-shot LLM training demands reliable scaling laws to predict model behavior, but current scaling t...

@EMostaque reposted: Today we’re releasing 1-bit and Ternary Bonsai Image 4B. A new family of image-...

PrismML Releases Bonsai Image 4B

Enhanced and Efficient Reasoning in Large Language Models

This 100% Free Open-Source Model Generates Perfect Text ...

RT-Lynx: Putting the GEMM Sparsity In a Right Way for Diffusion Models

Improving Frozen LLMs via Inference Looping

Towards AI-Powered Research Automation for Scientific Discovery

Toward Native Multimodal Modeling: A Roadmap

QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks

Your Embedding Model is SMARTer Than You Think

LT2: Linear-Time Looped Transformers

MAESTRO: Reinforcement Learning for Multimodal Agent ...

Why Do General AI Models Struggle with Quantum Computing ...

SPD: Boosting LLMs via Self-Distillation

Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers

From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

Geo-Align: Video Generation Alignment via Metric Geometry Reward

PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

Hierarchical Multi-agent Large Language Model Reasoning for ...

DeepSeek-V4 Explained: The End of Standard Attention in LLMs?

@rasbt: Added a DeepSeek Sparse Attention (DSA) from-scratch implementation to my LLMs-from-scratch repo tha...

Command A+

@gdb: GPT-5.5 is a very good model

Gemini Omni is getting incredibly good at scene consistency.

Runway launches Aleph 2.0 video editing model inside new Edit Studio

@tkipf reposted: Gemini Omni can create action replays from different angles. I referenced a vid...

@AnimaAnandkumar: I am thrilled that my article in @americanacad Daedalus special issue on AI & Science: What Is the ...

@jeremyphoward reposted: Gated DeltaNet-2 is almost exactly RWKV-7's DPLR recurrence, not acknowledging t...

GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation
