AI Inference: Cloud, Edge, and On-Device

Key Questions

What vertical integration move is DeepSeek pursuing?

DeepSeek aims to develop its own AI chip to support its models and reduce reliance on external hardware providers.

How is Ollama performing in user adoption?

Ollama raised a $65M Series B and reports 8.9M monthly users, with adoption at 85% of Fortune 500 companies, which validates demand for open-weight models.

What on-device inference options are expanding?

Apple's SpeechAnalyzer API offers on-device ASR that is faster than Whisper, though slightly less accurate, while Ternlight provides a 7MB embedding model that runs in browsers via WASM.

DeepSeek aims to build its own AI chip, a signal of vertical integration. Ollama raised a $65M Series B and reports 8.9M monthly users, with adoption at 85% of Fortune 500 companies, which validates open-weight model adoption. Opper AI offers a European AI gateway with 300+ models. In new research, the Nemotron-Labs-Diffusion tri-mode model reports 6x tokens per forward pass, and Ternlight offers a 7MB embedding model that runs in the browser via WASM. Ongoing developments include the SambaNova SN50, the XCENA MX1 chip, Gemma 4, and DiffusionGemma, among others. Apple's SpeechAnalyzer API for on-device ASR (faster than Whisper, slightly less accurate) adds to the on-device inference options. On model scaling, frontier models are moving toward 10T parameters, with Fable 5 already there.

Updated Jul 22, 2026

Applied AI Insights
