LLM Insight Tracker

GPT-5.4/5.5 release: 1M-token context + Native Computer Use/SuperApp + 150 IQ Mensa + Sora shutdown + Spud pivot + Codex PRBench lead + $852B valuation + MS MAI rivals + DeepSeek V4 delay + Altman New Yorker exposé

Key Questions

How did GPT-5.4 Pro perform on the Mensa Norway test?

GPT-5.4 Pro scored 150 IQ on the Mensa Norway test, breaking OpenAI's own previous record. The sharp jump comes as markets weigh inflation, labor conditions, and AI-driven disruption.

What is the context length capability of GPT-5.4/5.5?

GPT-5.4/5.5 features a 1M-token context window, positioning it as a leader in handling extended inputs.

What are the details of OpenAI's recent funding and valuation?

OpenAI closed its latest funding round at $122B, for a valuation of $852B. Related coverage also mentions a TBPN media buy and speculation that GPT-5.4 is an 852B-parameter model.

What does the New Yorker investigation into Sam Altman reveal?

The New Yorker article, based on more than 100 interviews and on notes from Ilya Sutskever and Dario Amodei, exposes tensions over OpenAI's leadership and its approach to safety. Gary Marcus reposted it, highlighting its depth.

How did Codex perform on recent benchmarks?

Codex topped PRBench and MLPerf benchmarks, outperforming models like Gemma, DeepSeek, and Qwen.

Why was DeepSeek V4 delayed?

DeepSeek delayed the release of its V4 model to ensure compatibility with Huawei's chips, according to a post reshared by Scobleizer.

What is Microsoft doing in response to AI rivals?

Microsoft released three new foundational models to compete with rivals in the AI space.

What is the status of Qwen-3.6-Plus?

Qwen-3.6-Plus is the first model to process over 1T tokens in a day. The announcement garnered 46 points on Hacker News.

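Summary: GPT-5.4 Pro scores 150 IQ on Mensa Norway and leads with a 1M-token context; OpenAI raises $122B at an $852B valuation, alongside a TBPN media buy; a New Yorker deep dive (100+ interviews, notes from Ilya Sutskever and Dario Amodei) exposes leadership and safety tensions; Codex tops PRBench and MLPerf ahead of Gemma, DeepSeek, and Qwen.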

Sources (11)
Updated Apr 8, 2026