Multimodal/research: GPT-Image-2/Meta Wang OSS/VLMs/LIBERO-Para/Agentic-MME/3D GaussianGPT/math/Altman/LeCun evals
Key Questions
What is GPT-Image-2?
GPT-Image-2 is a leaked OpenAI image model that surfaced on Arena. It shows advanced capabilities and has been described as "100% production-ready."
What are Meta's new OSS models under Wang?
Meta plans to open-source versions of its new models, developed under Alexandr Wang, alongside a safety-related reorganization.
What is LIBERO-Para?
LIBERO-Para is a diagnostic benchmark for paraphrase robustness in vision-language-action (VLA) models.
What biases affect VLMs?
Vision-language models (VLMs) show semantic and geometric biases; recent papers explore the alignment tax of tokenization versus continuous geometry.
What is Agentic-MME?
Agentic-MME is a benchmark that evaluates agentic capabilities in multimodal models. It appears alongside ViGoR-Bench, which targets visual reasoning.
What are GaussianGPT and EgoSim?
GaussianGPT and EgoSim advance 3D generation and egocentric world simulation for embodied interaction.
What math benchmarks involve Erdős and ARC-AGI-4?
Gemini posts solutions to ARC-AGI-4, and the discussion focuses on math reasoning such as Erdős problems.
What are views from Altman and LeCun on future AI?
Altman predicts a post-transformer era; LeCun advances LpJEPA and Gemma4 evals, addressing paper hallucinations and DARPA zero agents.
Meta new OSS models (Wang reorg/safety); LIBERO-Para VLA paraphrase robustness; GPT-Image-2 leak; VLMs semantic/geometric bias; Agentic-MME; GaussianGPT/EgoSim; Erdős/ARC-AGI-4; Altman post-transformers; LeCun LpJEPA/Gemma4; paper hallucinations/DARPA zero agent.