OSS surge: DeepSeek V4 pricing + Kimi K2.6 + Qwen-Image-2.0 + Qwen 3.6-27B + ZAYA1-8B + Tencent HY3 + Meta Llama 4 + Xiaomi MiMo-v2.5 + MiniMax M3 + Qwen-VLA + Stepfun Step 3.7 Flash + NVIDIA Nemotron/Cosmos + OpenAI Triton + cost tools + OpenRouter unicorn + Tether BitNet + JetBrains Mellum2 + DeepSeek $7.4B funding + DeepSeek UI
Key Questions
What pricing change did DeepSeek announce?
DeepSeek made its 75% discount permanent, effective May 31. The company is also raising $7.4B from Tencent and a state-backed fund.
How does MiniMax M3 compare to leading models?
MiniMax M3 is an open-weight model with 1M context that beats GPT-5.5 on coding tasks. It offers 9.7x faster prefill at $0.30 per million input tokens.
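To put the quoted input price in concrete terms, here is a minimal sketch of the arithmetic. It assumes the $0.30 per million input tokens figure above is billed linearly per token. The function name `input_cost_usd` and the 20,000-token prompt are illustrative assumptions, and output-token pricing is not covered because the summary does not state it.

```python
from decimal import Decimal

# Input price quoted in the summary for MiniMax M3 (USD per 1M input tokens).
INPUT_PRICE_PER_MILLION = Decimal("0.30")


def input_cost_usd(
    input_tokens: int,
    price_per_million: Decimal = INPUT_PRICE_PER_MILLION,
) -> Decimal:
    """Return the input-side cost in USD, assuming linear per-token billing."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return price_per_million * Decimal(input_tokens) / Decimal(1_000_000)


if __name__ == "__main__":
    # A prompt that fills the full 1M-token context window.
    print(input_cost_usd(1_000_000))
    # A hypothetical 20,000-token prompt (illustrative size, not from the source).
    print(input_cost_usd(20_000))
```

Under these assumptions, a prompt that fills the entire 1M-token context costs $0.30 on the input side. `Decimal` is used instead of floats so that small per-request amounts do not pick up rounding noise when summed.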
What new NVIDIA models were released?
NVIDIA released Cosmos 3 for physical AI (five open models), Nemotron 3 Nano Omni, an open multimodal model, and Nemotron 3 Ultra, a 550B MoE model with open weights that delivers 5x faster inference at 30% lower cost.
Which open-source models were highlighted from Chinese labs?
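Qwen 3.6-27B (a dense 27.8B model scoring 77.2% on SWE-bench), Qwen-Image-2.0, Qwen-VLA, Kimi K2.6 (beats Opus and Gemini on frontend tasks at 94% lower cost), Tencent HY3 (100x cheaper than Opus 4.7), and Stepfun Step 3.7 Flash were released or updated. ZAYA1-8B, which was trained on AMD hardware, was highlighted alongside them.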
What tools help reduce AI API costs?
A token compression CLI is reported to cut 91.8% of wasted token spend. OpenRouter, which provides access to multiple models, has reached unicorn status at a $1.3B valuation.
What is Tether BitNet b1.58?
Tether BitNet b1.58 is a 13B edge model that uses 77.8% less VRAM and includes a Vulkan backend for efficient local inference.
What did JetBrains release for coding?
JetBrains released Mellum2, a 12B MoE model under the Apache 2.0 license. It posts strong results on coding benchmarks and is aimed at multi-model pipelines.
What UI improvement did DeepSeek launch?
DeepSeek launched a new Codex-like UI to improve the developer experience. Separately, a deep dive into the V4 architecture covers its RoPE, MLA, CSA, HCA, and MoE innovations.
DeepSeek has made its 75% discount permanent (effective May 31) and is raising $7.4B from Tencent and a state-backed fund, doubling down on AGI research. A deep dive into the DeepSeek V4 architecture attributes its cost efficiency to innovations in RoPE, MLA, CSA, HCA, and MoE.
Kimi K2.6, released as open source, beats Opus and Gemini on frontend tasks at 94% lower cost. MiniMax M3 is confirmed: open weights, 1M context, beats GPT-5.5 on coding, 9.7x faster prefill, and $0.30 per million input tokens. Tencent HY3 is 100x cheaper than Opus 4.7. Also released: Qwen-Image-2.0, Qwen 3.6-27B (dense 27.8B, 77.2% on SWE-bench), Qwen-VLA, and Stepfun Step 3.7 Flash.
NVIDIA shipped Cosmos 3 for physical AI (five open models, top benchmarks), the open multimodal Nemotron 3 Nano Omni, and Nemotron 3 Ultra, a 550B MoE model with open weights, 5x faster inference, and 30% lower cost. OpenAI Triton is challenging CUDA through a deal with AMD, and ZAYA1-8B was trained on AMD hardware.
OpenRouter is now a unicorn at a $1.3B valuation. Cost optimization tools are emerging: a token compression CLI cuts 91.8% of wasted spend. Tether BitNet b1.58 is a 13B edge model that uses 77.8% less VRAM and ships with a Vulkan backend. JetBrains Mellum2 is a 12B MoE model under Apache 2.0 with strong coding benchmarks.
NEW: DeepSeek has launched a Codex-like UI for an improved developer experience.