Chinese OSS surge + DeepSeek V4/Kimi K2.6/Gemma 4 + Qwen 3.7 + efficiency + OpenRouter dominance + MiniMax M3 tests + Nvidia open model push + Gemma 4 12B
Key Questions
What price changes did DeepSeek V4 Pro receive?
DeepSeek V4 Pro received a permanent 75% price cut, making it 10-30x cheaper than competitors while achieving 93.5% on LiveCodeBench.
How does MiniMax M3 compare on benchmarks?
MiniMax M3 claims strong results, including 59.0% on SWE-Bench Pro, 66.0% on Terminal Bench 2.1, 74.2% on MCP Atlas, and 83.5% on BrowseComp, at roughly 50x lower cost than frontier models.
What new Google model was released for local use?
Gemma 4 12B was released as an encoder-free multimodal model that runs on laptops with 16GB VRAM under the Apache 2.0 license, with native audio input and benchmark performance near 26B MoE models.
Which region dominates OpenRouter coding traffic?
Chinese open-weight models account for 61% of top-10 token volume in OpenRouter coding traffic, though data risk and hallucination rates are reported as high (DeepSeek V4 Pro is cited at 94% hallucination).
What is Qwen 3.7 Plus designed for?
Qwen 3.7 Plus is a multimodal agent with screen understanding capabilities that claims dominance on vision benchmarks and is available for free.
How is Harvey using open-source models?
Harvey uses GLM 5.1 as its primary worker with Opus 4.7 as fallback, achieving significant cost reductions in legal workflows.
What effort is Nvidia making in open models?
Nvidia is pushing open models with a planned 550B model and training data release as a US response to Chinese open-source dominance.
What efficiency technique did Together AI open-source?
Together AI open-sourced OSCAR, a 2-bit KV cache quantization method that improves inference efficiency for large models.
DeepSeek V4 Pro received a permanent 75% price cut, undercutting competitors by 10-30x; it scores 93.5% on LiveCodeBench, beating Claude and GPT at one-tenth the cost. Kimi K2.6 claims to rival GPT-5.5 on coding while being 94-95% cheaper.
Together AI open-sourced OSCAR, a 2-bit KV cache quantization method. MolmoAct2, an open-source robotics model, beats π0.5.
MiniMax M3 was tested: it is open-weight with a 1M context window and claims to beat Opus 4.7 and GPT-5.5 on several benchmarks at 50x lower cost. Specific scores: SWE-Bench Pro 59.0%, Terminal Bench 2.1 66.0%, MCP Atlas 74.2%, BrowseComp 83.5%. It trails Claude Opus 4.8 on agentic benchmarks. On Next.js agent evals, MiniMax M3 places near the top, trailing only Opus and GPT-5, at 10-20x lower cost.
Qwen3-VL-8B ties GPT-5 (Low) on OCR. Qwen 3.7 Plus was released as a multimodal agent with screen understanding and claims vision benchmark dominance.
Chinese open-weight models lead OpenRouter coding traffic (61% of top-10 tokens), but data risk and hallucination rates are high (DeepSeek V4 Pro: 94% hallucination).
Nvidia is pushing open models with a 550B model and a training data release, positioned as a US counter to Chinese OSS dominance.
Gemma 4 12B was released: encoder-free multimodal, laptop-ready with 16GB VRAM, Apache 2.0, benchmarks near 26B MoE, native audio input, MTP drafters.
Real-world validation: Harvey uses GLM 5.1 as its primary worker with Opus 4.7 as fallback, cutting costs in legal workflows.
On-the-ground insights from Kevin Xu highlight Chinese lab strategies, young talent, compute constraints, and US-China co-opetition.
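The cost claims above mix two phrasings: "N% cheaper" (a 75% cut, 94-95% cheaper) and "Nx cheaper" (10-30x, 50x). The minimal sketch below converts between them so the figures can be compared on one scale. The function names are illustrative and not from any source, and it assumes each claim compares like-for-like prices. A cut against a model's own old price, such as the 75% cut, is a different comparison from a multiple against competitors, such as 10-30x.

```python
def pct_cheaper_to_multiple(pct_cheaper: float) -> float:
    """Convert 'X% cheaper' into an 'N times cheaper' multiple.

    Example: 75% cheaper means paying 25% of the reference price, i.e. 4x cheaper.
    """
    if not 0 <= pct_cheaper < 100:
        raise ValueError("pct_cheaper must be in [0, 100)")
    return 100.0 / (100.0 - pct_cheaper)


def multiple_to_pct_cheaper(multiple: float) -> float:
    """Convert an 'N times cheaper' multiple into 'X% cheaper'."""
    if multiple < 1:
        raise ValueError("multiple must be >= 1")
    return 100.0 - 100.0 / multiple


if __name__ == "__main__":
    # Percentages quoted in the digest, expressed as multiples.
    for pct in (75, 94, 95):
        print(f"{pct}% cheaper = {pct_cheaper_to_multiple(pct):.1f}x cheaper")
    # Multiples quoted in the digest, expressed as percentages.
    for mult in (10, 30, 50):
        print(f"{mult}x cheaper = {multiple_to_pct_cheaper(mult):.1f}% cheaper")
```

Read this way, "94-95% cheaper" corresponds to roughly 17-20x, which sits inside the 10-30x range quoted for DeepSeek V4 Pro, while "50x lower cost" corresponds to 98% cheaper.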