Computer-Use & Proactive Agent Advances
Key Questions
What performance level have computer-use agents reached on OSWorld?
Agents now achieve up to 80% task success on OSWorld using pixel-based, DOM-based, and attention-based methods. This marks a substantial improvement for production-grade automation.
How does π-Bench contribute to proactive agent development?
It decouples proactivity evaluation from other capabilities, allowing targeted improvements in agent initiative. This supports more reliable real-world deployments.
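As a rough illustration of what decoupling means here, the sketch below reports task success and proactivity as two separate numbers instead of one blended score. It is not π-Bench's actual scoring method: the `Episode` fields, the `decoupled_scores` function, and the sample data are all hypothetical and exist only to show the idea.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One hypothetical evaluation episode (field names are illustrative, not from π-Bench)."""
    task_succeeded: bool           # did the agent complete the assigned task?
    initiative_opportunities: int  # moments where unprompted action was appropriate
    initiatives_taken: int         # how many of those moments the agent acted on


def decoupled_scores(episodes):
    """Return (task_success_rate, proactivity_rate) as two independent numbers."""
    if not episodes:
        raise ValueError("need at least one episode")
    success = sum(e.task_succeeded for e in episodes) / len(episodes)
    opportunities = sum(e.initiative_opportunities for e in episodes)
    taken = sum(e.initiatives_taken for e in episodes)
    # Proactivity is measured only against opportunities, not against task outcome.
    proactivity = taken / opportunities if opportunities else 0.0
    return success, proactivity


# Made-up sample data, purely for demonstration.
episodes = [
    Episode(task_succeeded=True, initiative_opportunities=2, initiatives_taken=2),
    Episode(task_succeeded=True, initiative_opportunities=2, initiatives_taken=0),
    Episode(task_succeeded=False, initiative_opportunities=4, initiatives_taken=3),
    Episode(task_succeeded=True, initiative_opportunities=2, initiatives_taken=1),
]
success, proactivity = decoupled_scores(episodes)
print(f"task success: {success:.2f}, proactivity: {proactivity:.2f}")
```

Keeping the two rates separate means an agent that finishes tasks but rarely takes initiative shows up as high on one axis and low on the other, which is the kind of targeted signal the answer above describes.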
What enables Gemini's single-call sandbox agents?
Gemini supports sandboxed execution in a single API call, simplifying integration for automation SaaS. This directly facilitates scalable computer-use applications.
In short: OSWorld success reaches up to 80% via pixel, DOM, and attention-based methods; π-Bench decouples proactivity evaluation; Gemini offers single-call sandbox agents. Together these directly enable production automation SaaS.
Updated May 23, 2026