Embodied AI & World Models & Benchmark Unification

Key Questions

What did Google announce at I/O 2026 regarding world models?

Google introduced Gemini 3.5 Omni, which adds video generation and world-model capabilities alongside multi-agent orchestration. Its benchmark results reportedly surpass GPT-5.5 on multimodal tasks.

How does Li Xiang define the "halves" of embodied intelligence?

Li Xiang describes embodied intelligence in terms of a "first half" and a "second half", a transition he compares to the electrification of vehicles. The shift involves migrating capabilities vertically from existing automotive technology stacks.

What is the contribution of the returning ChatGPT researcher?

A former core contributor to ChatGPT has returned to China to found a startup that brings LLM scaling laws to embodied vision-language-action (VLA) models. The venture focuses on end-to-end VLA architectures.

Which companies are advancing world models and simulation?

Lightwheel Intelligence, Runway, and NVIDIA continue to advance world models and physical AI simulation standards, while Tencent is pushing forward with HY-World 2.0. These efforts aim to unify benchmarks across 3D spatial tasks.

What does ESI-Bench reveal about frontier models?

Fei-Fei Li's ESI-Bench shows that frontier models fail at active 3D spatial exploration, where seeing and acting diverge. It highlights gaps in unified multimodal understanding of the physical world.

What new model did ByteDance release for multimodal tasks?

ByteDance open-sourced Lance under the Apache 2.0 license. It is a lightweight unified model that handles image and video understanding, generation, and editing, and it outperforms several 7B models across benchmarks.

How are Chinese embodied model efforts organized?

Five major Chinese firms are developing original VLA architectures, with varying degrees of open-source contribution. Collaborations include joint labs with Peking University and the Zhiyuan Institute (BAAI).

What benchmark progress is noted in spatial AI?

Efforts such as CHEERS, from Tsinghua University and Xi'an Jiaotong University, offer approaches to unified multimodal modeling. Hangzhou's Qianjiang district is building data and evaluation infrastructure to support these capabilities.

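At Google I/O 2026, Gemini 3.5 Omni debuted with video generation and a world model, along with multi-agent and Spark agents, and its benchmarks surpass GPT-5.5. Li Xiang defined the "first half" and "second half" of embodied intelligence, and a Chinese GPT contributor returned to China to found a startup that brings scaling laws to embodied VLA. Lightwheel Intelligence, Runway, NVIDIA, and Tencent's HY-World 2.0 continue to advance. ESI-Bench highlights the failure of frontier multimodal models in active 3D spatial exploration.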
