DeepReinforce Ornith-1.0: open-source self-scaffolding coding MoE (9B–397B) with strong SWE-bench scores
Key Questions
What is Ornith-1.0?
Ornith-1.0 is a family of open-source agentic coding LLMs from DeepReinforce, built on a Mixture of Experts (MoE) architecture and available in sizes from 9B to 397B parameters. It is released under the MIT license with weights and code available, and it is trained with a novel self-scaffolding reinforcement learning approach.
What benchmarks does the 397B Ornith-1.0 model achieve?
The 397B variant scores 82.4% on SWE-bench Verified, surpassing Qwen3.7-Max. It is also competitive with Claude Opus 4.8 on Terminal Bench.
What makes Ornith-1.0's approach technically novel?
Its self-improving framework lets the model write its own training scaffold during reinforcement learning, and it includes built-in defenses against reward hacking. The models were trained using GRPO. The release is a timely open-source contribution backed by strong results on verified benchmarks.
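For readers unfamiliar with GRPO (Group Relative Policy Optimization), its core idea is to sample several completions for the same prompt and score each one relative to the others in that group, rather than using a separate value model. The sketch below shows only that group-relative advantage step in general terms. It is not DeepReinforce's training code: the function name, the binary pass/fail rewards, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions sampled for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Using the population standard deviation is an assumption of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four sampled patches for one task, reward 1.0 if tests pass
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average get a positive advantage and the rest get a negative one. When every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.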
DeepReinforce launched Ornith-1.0, a family of open-source agentic coding LLMs (MIT, 9B–397B MoE) with a novel self-scaffolding RL approach. The 397B variant achieves 82.4% on SWE-bench Verified, beating Qwen3.7-Max, and competes with Claude Opus 4.8 on Terminal Bench. The self-improving framework and reward hacking defenses are technically novel. Weights and code are available, making this a timely open-source release with concrete benchmarks.