Vision Research Tracker

**LeCun: Beyond LLMs (multimodal world models, latent planning and video SSL)** [developing]

Key Questions

What is Yann LeCun's vision for AI beyond large language models?

Yann LeCun advocates multimodal world models built on joint text+image+video pretraining, mixture-of-experts (MoE) layers with conditional compute, and attention to latent geometry, meaning the structure of the learned embedding space. The aim is better physical understanding and planning, pursued through models such as V-JEPA, ThinkJEPA, and Joint-Embedding Predictive World Models.
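
To make the MoE/conditional-compute idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is a generic illustration, not code from any of the models named above: the linear router, the three toy experts, and the name `moe_forward` are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_weights, experts, top_k=2):
    """Route one input vector to its top_k experts and mix their outputs.

    x              -- input vector (list of floats)
    router_weights -- one weight vector per expert (hypothetical linear router)
    experts        -- list of callables, each mapping a vector to a vector
    Returns (output_vector, chosen_expert_indices).
    """
    # Router logits: dot product of the input with each expert's router row.
    logits = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    # Conditional compute: keep only the top_k experts for this input.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalise the gate over the chosen experts only.
    gates = softmax([logits[i] for i in chosen])
    output = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        expert_out = experts[idx](x)  # unchosen experts are never evaluated
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, chosen


# Toy experts (hypothetical); real experts would be feed-forward sub-networks.
experts = [
    lambda v: [2.0 * a for a in v],
    lambda v: [a + 1.0 for a in v],
    lambda v: [-a for a in v],
]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

output, chosen = moe_forward([3.0, 1.0], router_weights, experts, top_k=2)
print(chosen, output)
```

Only the experts in `chosen` run for a given input, so compute per input scales with `top_k` rather than with the total number of experts.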

What is SIGReg/LeWorldModel?

SIGReg/LeWorldModel (LeWM) is a new 15M-parameter JEPA world model reported to give a 48x speedup on the Push-T task; it is covered in a YouTube video and an arXiv paper. It contributes to efficient joint-embedding predictive world models.

What are Joint-Embedding Predictive World Models?

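Joint-Embedding Predictive World Models is work reposted by LeCun that targets physical planning: the world model predicts the outcomes of candidate actions, and a planner chooses among them. The tracker groups it with related advances such as Temporal Straightening and HyDRA, the latter aimed at dynamic memory.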
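
As a rough illustration of planning by predicting outcomes in a world model, the sketch below runs a random-shooting planner entirely in latent space. Everything in it is an assumption for the example: the identity `encode`, the additive `predict`, the 2-D state, and the squared-distance cost stand in for learned components and are not taken from the reposted work.

```python
import random


def encode(observation):
    """Hypothetical encoder: here the latent is just the 2-D position itself."""
    return list(observation)


def predict(latent, action):
    """Hypothetical latent dynamics: the action is a displacement in latent space."""
    return [z + a for z, a in zip(latent, action)]


def latent_cost(latent, goal_latent):
    """Squared Euclidean distance to the goal embedding."""
    return sum((z - g) ** 2 for z, g in zip(latent, goal_latent))


def plan(observation, goal, horizon=5, num_candidates=256, seed=0):
    """Random-shooting planner: sample action sequences, roll each one out
    with the latent predictor, and return the cheapest sequence found."""
    rng = random.Random(seed)
    start = encode(observation)
    goal_latent = encode(goal)
    best_actions, best_cost = None, float("inf")
    for _ in range(num_candidates):
        actions = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
                   for _ in range(horizon)]
        latent = start
        for action in actions:
            latent = predict(latent, action)  # rollout stays in latent space
        cost = latent_cost(latent, goal_latent)
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost


start_obs, goal_obs = (0.0, 0.0), (2.0, -1.0)
best_actions, best_cost = plan(start_obs, goal_obs)
baseline_cost = latent_cost(encode(start_obs), encode(goal_obs))
print(f"cost without acting: {baseline_cost:.2f}, best planned cost: {best_cost:.3f}")
```

Because the rollout never decodes back to pixels, each candidate costs only a few predictor calls. Random shooting is the simplest possible search, used here for brevity; a real system could swap in a stronger optimiser.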

What is TrackMAE and its achievements?

TrackMAE is a motion-aware masked autoencoder (MAE) that reports state-of-the-art results on six datasets. Its video representation learning is relevant to LeCun's multimodal world models.
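
For readers unfamiliar with masked autoencoding, the following sketch shows the generic recipe only: hide a large fraction of patches, reconstruct them from the visible ones, and score the reconstruction on the hidden patches alone. It uses uniform random masking and a trivial stand-in decoder, both assumptions for illustration. This entry does not describe TrackMAE's motion-aware mechanism, so none is modelled here.

```python
import random


def random_mask(num_patches, mask_ratio=0.75, seed=0):
    """Pick which patch indices to hide; the encoder sees only the rest."""
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(rng.sample(range(num_patches), num_masked))
    masked_set = set(masked)
    visible = [i for i in range(num_patches) if i not in masked_set]
    return visible, masked


def masked_mse(patches, reconstruction, masked):
    """Mean squared error computed over the masked patches only."""
    total, count = 0.0, 0
    for i in masked:
        for target, guess in zip(patches[i], reconstruction[i]):
            total += (target - guess) ** 2
            count += 1
    return total / count


# Toy "video": 8 patches, each flattened to 2 values.
patches = [[float(i), float(i) + 0.5] for i in range(8)]
visible, masked = random_mask(len(patches), mask_ratio=0.75)

# Stand-in decoder (hypothetical): predict the mean visible patch everywhere.
mean_patch = [sum(patches[i][d] for i in visible) / len(visible) for d in range(2)]
reconstruction = [mean_patch for _ in patches]

loss = masked_mse(patches, reconstruction, masked)
print(f"visible={visible} masked={masked} loss={loss:.3f}")
```

A motion-aware variant would change what is masked or what is reconstructed, for example by using motion cues, but the specifics would need to come from the TrackMAE paper itself.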

What does the 'Latent Space' survey cover?

The 'Latent Space' survey, shared by @_akhaliq, covers the foundation, evolution, mechanism, ability, and outlook of latent spaces in AI. It provides context for the latent-geometry theme in world models.

Yann LeCun (Apr 2026): joint text+image+video pretraining, MoE/conditional compute, latent geometry.

New: SIGReg/LeWorldModel (15M-parameter JEPA, 48x speedup on Push-T; YouTube/arXiv); Joint-Embedding Predictive World Models (physical planning; LeCun repost); Temporal Straightening; HyDRA/Out-of-Sight dynamic memory (HM-World); V-JEPA 2.1; ThinkJEPA; Stereo WM/WorldAgents; WildWorld (108M-frame game); WorldCache/PackForcing; Omni-WorldBench/QuantiPhy/GameplayQA; Yilun Du; Pulse; DiT animal motion (300h dataset); TrackMAE motion-aware MAE (SOTA on 6 datasets); 'Latent Space' survey.

High-value repro: ablations, V-JEPA/SIGReg/LeWM/ThinkJEPA/Pulse/HyDRA/Out-of-Sight/DiT/TrackMAE/Joint-Embedding in MoE/TTT/GameplayQA with latency/power/QuantiPhy/WildBench.

Updated Apr 8, 2026