AI Model Release Tracker

**OpenAI GPT-5 / GPT-5.4 flagship commercial agents incl. PC control** [developing]

Key Questions

What is OpenAI's GPT-5 and its key improvements?

GPT-5 combines unified routing and thinking with a 272k context window and multimodal tools, and it outperforms GPT-4o. With the MIA boost, GPT-5.4 gains 9% on LiveVQA and 31% across 11 benchmarks, and scores 92.9% on MMLU. GPT-5 underperforms Claude in sports betting and cyber evaluations.

What is the GPT-5.4-Cyber model?

GPT-5.4-Cyber is a fine-tuned variant of GPT-5.4 designed for defensive cybersecurity tasks, including reverse engineering binaries. It has been teased as a rival to models like Mythos and is restricted to vetted security professionals. OpenAI plans to expand access to thousands of verified defenders.

Why can't the public use GPT-5.4-Cyber?

The model is not available on ChatGPT or to the general public because of its high-risk cybersecurity capabilities. It is offered only to vetted security professionals through OpenAI's Trusted Access for Cyber program. API access and evaluations are ongoing, with safety measures in place.

What are the new computer-use agents in GPT-5.4?

GPT-5.4 introduces native computer-use agents that reason and act directly on PCs, websites, and software. The capability is flagged as high-risk. It comes with a safety card, and evaluations are ongoing.

How does GPT-5 compare to competitors like Claude and GLM-5.1?

GPT-5 leads on several benchmarks, but GLM-5.1 closes the gap with 77.8% on SWE, 94.6% on coding (matching Claude Opus), and 50 on the Intel Index. GPT-5 underperforms Claude in sports betting and cyber evaluations. Overall, the benchmarks show the models performing competitively with one another.

Unified routing/thinking 272k ctx/multimodal tools beats GPT-4o; MIA boosts GPT-5.4 9% LiveVQA/31% 11 benches/MMLU 92.9%; GLM-5.1 closes gap (SWE 77.8%/coding 94.6% Claude Opus/Intel Index 50); underperforms Claude in sports betting/cyber eval; new GPT-5.4 native computer-use agents reason/act on PCs/websites/software (high-risk); cyber model to rival Mythos teased. Safety card; API/evals ongoing.

Sources (3)
Updated Apr 15, 2026