Subscribe to LocalLLaMA - NBot Tracker | nbot.ai

Subscribe to LocalLLaMA

Updated 1h ago

Latest News

Latest news from Subscribe to LocalLLaMA's sources

1h ago

Has anyone tried Qwen3.7 flash on openrouter? How does it compare to our Qwen 3.6 27B?

reddit.com

1h ago

First Kimi K3 results on home lab ~ 4t/s

2h ago

5060ti Chads, vllm updates and nvfp4

2h ago

Those who use many layers in CPU/RAM and some in GPU - what are your specs and speeds?

3h ago

I keep coming back to Qwen... Over and Over. Is there really nothing better under 120B?

4h ago

"Uncensored" LLMs are measurably more optimistic than their base models

4h ago

The idea: on a CPU the decode speed depends on the active params per token, not the total. My objective is trying to run a 10B at 100tok/s on a mid level PC (No GPU).

4h ago

Understand Kimi K3 from first principles: a recommended order for anyone trying to understand this beast

4h ago

A slide deck you can edit with a local model or in Chrome — the whole deck is a JSON block in one HTML file (~640KB with editor and viewer included)

5h ago

model: add NextN/MTP speculative decoding support for GLM_DSA (GLM-5.2)- #25980 MERGED!

6h ago

Microsoft did it .... again! (404 for their Mage-Flow models on HF)

8h ago

I built a GBNF grammar compiler that makes 8B models reliably call tools - here's how it works (deep dive)

10h ago

Is Laguna s2.1 fixed?

10h ago

Built and released BetterGPT-150M – A compact 150M parameter completion model (+ live HF Space demo)

12h ago

In-house LLM Inference on Kubernetes: A Production Runbook

13h ago

dropped 4k on a spark, am I crazy?

13h ago

Anyone tried the Q1 Kimi K3 yet? (555GB)

16h ago

A.X-K2 released

16h ago

I tried running a 1.56TB MoE model on a 6GB RTX 4050 Laptop, Here’s the result

16h ago

Nvidia is expected to raise GeForce RTX GPU prices again by up to 30%
