******Multimodal efficiency, long-video, agent robustness (SpecEyes, ThinkJEPA, MuSEAgent, ZenBrain, Omni-SimpleMem, Tufts VLA)******
Key Questions
What is the geometric alignment tax in multimodal models?
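The geometric alignment tax is the cost that discrete tokenization imposes on scientific models whose inputs have continuous geometry. The comparison is tokenization versus continuous geometry, with continuous representations argued to preserve that geometry better than discrete tokens.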
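To make the idea of a tokenization cost concrete, here is a minimal sketch of a generic quantization round trip: a continuous value is binned into a discrete token and mapped back to the bin center. This is an illustration only, not the method from the Geometric Alignment Tax work; the function names, the uniform binning scheme, and the choice of an angle in [0, 2π] are all assumptions.

```python
import math

def tokenize(x, lo, hi, n_bins):
    """Map a continuous value in [lo, hi] to an integer bin index (the 'token')."""
    x = min(max(x, lo), hi)                      # clamp into range
    idx = int((x - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)                  # x == hi falls in the last bin

def detokenize(idx, lo, hi, n_bins):
    """Map a bin index back to the center of its bin."""
    width = (hi - lo) / n_bins
    return lo + (idx + 0.5) * width

def max_roundtrip_error(values, lo, hi, n_bins):
    """Largest absolute error after tokenize -> detokenize."""
    return max(
        abs(v - detokenize(tokenize(v, lo, hi, n_bins), lo, hi, n_bins))
        for v in values
    )

# Hypothetical input: angles sampled every 0.01 rad in [0, 6.28].
angles = [i * 0.01 for i in range(629)]

for n_bins in (16, 64, 256):
    err = max_roundtrip_error(angles, 0.0, 2 * math.pi, n_bins)
    print(f"{n_bins:4d} bins -> max round-trip error {err:.4f} rad")
```

The error is bounded by half a bin width, so it shrinks as the vocabulary grows but never reaches zero for a finite number of bins. A continuous representation avoids this particular loss by construction.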
How does ONE-SHOT enable video synthesis?
ONE-SHOT targets compositional human-environment video synthesis. It combines spatial-decoupled motion injection with hybrid context integration to keep the result realistic.
What is Token Warping in MLLMs?
Token Warping adjusts (warps) an MLLM's tokens so the model can reason about a scene as seen from nearby viewpoints. The aim is better spatial understanding, a known limitation of multimodal perception.
Do VLMs prioritize visuals or semantics?
Studies suggest VLMs often rely on semantics and ignore the visual input, which hurts robustness. A related open question is whether World Action Models generalize better than VLAs.
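One generic way to probe whether a model is ignoring its visual input is to check how often its answer stays the same when the image is swapped for a blank one. The sketch below assumes this simple protocol; it is not the procedure used in the studies referenced above, and `image_blind_rate`, the two toy models, and the dictionary-based "images" are hypothetical stand-ins.

```python
def image_blind_rate(model, examples, blank_image):
    """Fraction of examples whose answer is unchanged when the image is blanked.

    `model` is any callable (image, question) -> answer.
    A rate near 1.0 suggests the answers come from the text alone.
    """
    if not examples:
        raise ValueError("examples must not be empty")
    unchanged = sum(
        1
        for image, question in examples
        if model(image, question) == model(blank_image, question)
    )
    return unchanged / len(examples)

# Toy stand-ins for real models (hypothetical, for illustration only).
def text_only_model(image, question):
    # Answers from a semantic prior and never looks at the image.
    return "red" if "apple" in question else "unknown"

def image_aware_model(image, question):
    # Reads the answer from the image content.
    return image.get("color", "unknown")

examples = [
    ({"color": "green"}, "What color is the apple?"),
    ({"color": "blue"}, "What color is the car?"),
]
blank = {}

print(image_blind_rate(text_only_model, examples, blank))    # 1.0
print(image_blind_rate(image_aware_model, examples, blank))  # 0.0
```

A real evaluation would use actual images and a softer answer-matching rule, but the structure of the check stays the same.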
What efficiency gains does Tufts VLA achieve?
Tufts VLA is reported to cut energy use by about 100x while reaching 95% on Tower of Hanoi tasks. Event-augmented processing for robotics is a separate item, E-VLA.
What is Vero's contribution to visual reasoning?
Vero provides an open RL recipe for general visual reasoning across charts, science, and spatial tasks. It unifies training for multimodal agents.
How does LIBERO-Para test VLA robustness?
LIBERO-Para benchmarks paraphrase robustness in VLA models with diagnostic metrics. It exposes generalization gaps in vision-language-action systems.
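A paraphrase-robustness check can be sketched as a per-task comparison of success rates under the original instruction and under paraphrased instructions. The code below is a minimal illustration of that idea, not LIBERO-Para's actual diagnostic metrics; the function names, the result layout, and the toy outcomes are assumptions.

```python
def success_rate(outcomes):
    """Mean of a list of booleans; 0.0 for an empty list."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def paraphrase_gap(results):
    """Compute the per-task drop in success rate under paraphrased instructions.

    `results` maps task name -> {"original": [bool, ...], "paraphrased": [bool, ...]}.
    A positive gap means the policy does worse when the instruction is reworded.
    """
    report = {}
    for task, runs in results.items():
        orig = success_rate(runs["original"])
        para = success_rate(runs["paraphrased"])
        report[task] = {"original": orig, "paraphrased": para, "gap": orig - para}
    return report

# Toy outcomes, invented for illustration; not benchmark results.
toy_results = {
    "stack_blocks": {
        "original": [True, True, True, True],
        "paraphrased": [True, False, True, False],
    },
    "open_drawer": {
        "original": [True, True, False, True],
        "paraphrased": [True, True, False, True],
    },
}

for task, row in paraphrase_gap(toy_results).items():
    print(task, row)
```

Reporting the gap per task, rather than one pooled number, makes it easier to see which instructions a policy has memorized instead of understood.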
What spatial language advancements are noted?
Communicating about Space studies language-mediated spatial integration, where information from partial views is combined through language. Salt is a separate item on video: it uses cache-aware training for fast video generation.
Geometric Alignment Tax (tokenization vs. continuous geometry in scientific models); ONE-SHOT (compositional video synthesis via spatial-decoupled motion injection); Token Warping for MLLMs; VLMs ignoring visuals in favor of semantics; World Action Models vs. VLAs (open question); spatial language; Salt (video generation); Tufts VLA (100x energy, 95% on Hanoi); Gemma4 31B; E-VLA (event-augmented robotics); Vero (RL for visual reasoning); LIBERO-Para (VLA paraphrase robustness).