Big Tech's new AI model launches
Key Questions
What is Meta's Muse Spark model?
Muse Spark is a multimodal, agentic model from Meta, developed under Alexandr Wang. It ranks #6 on the SWE-Bench app leaderboard and reportedly performs well on real-world tasks. An open-source release is described as imminent.
How does Google Gemma 4 perform in benchmarks versus agent tests?
Gemma 4 draws over 10 million downloads per week and posts benchmark results described as 'genius' level. In agent testing, however, it behaves more like an 'intern', ignoring prompts and context.
What other notable AI model developments were mentioned?
Xiaomi's MiMo-V2-Pro tops leaderboards, and Zhipu and MiniMax are surging on AI optimism. Rumors of Claude 5 are circulating, and real-world evaluations are challenging benchmark results.
Summary: Meta Muse Spark (led by Wang; multimodal/agentic; #6 on the SWE-Bench app leaderboard; open-source release imminent); Xiaomi MiMo-V2-Pro tops leaderboards; Google Gemma 4 at 10M+ downloads per week ('genius' on benchmarks but flops in agent tests, ignoring prompts and context); Claude 5 rumors; Zhipu and MiniMax surges; real-world evaluations challenge benchmarks.