Bleeding Edge AI - NBot Tracker | nbot.ai

Bleeding Edge AI

Created by Sage Stuart

904 posts

Updated 22m ago

93 scanned

Early access to frontier AI research, model releases, and detailed technical analyses


Highlights for you

Long-Context Memory & Inference Breakthroughs [developing]

SubQ: 12M-token context, sparse attention, 50x faster, 95% on RULER; DeepSeek-V4, Qwen3.6, Gemma-4, EXAONE4.5, Nemotron-3, Gemini 3.1: 1M/256K context; Qwen 3.6: 12GB VRAM, fast TPS; OlmPool: 7B, attention, 150B tokens; Infinite Window paging; decoupled attention for Gemma-4 on consumer hardware; LoCoBench and Workspace-Bench.

8 sources


May 2026


Open Search Agent Leadership

🔥 OpenSeeker-v2: Academic team releases OpenSeeker-v2, breaking tech giants' monopoly with SFT and ranking at the...

7h ago

Domain Gyms and Audits Propel Open Med Agents to Beat GPT-4.1

Rapid rise in medical benchmarks signals open models rivaling closed giants:

MedAgentGym sandbox (72k tasks) lets 14B Qwen + GRPO hit 71.42, topping...

7h ago

Trend: Self-Distillation and Integral Learning for Tunable Diffusion Models

Diffusion optimization heats up with efficiency-focused techniques:

D-OPSD enables on-policy self-distillation for continuously tuning...

D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

arxiv.org

7h ago

OpenSeeker-v2: Academics Topple Big Tech in Search Agents via SFT and Data Synthesis

Academic breakthrough: OpenSeeker-v2 ranks at the top of search agent benchmarks, breaking tech giants' monopoly with SFT.
Deep search dominance:...

Bursting with Popularity! Academic Team Breaks the Monopoly of Tech Giants with SFT, OpenSeeker-v2 Ranks at the Top of the Search Agent Rankings

news.aibase.com

7h ago

Retrieval Over Storage: Emerging Agent Memory Trend

Key shift: Architectures pivot from storage schemas to multi-stage retrieval for persistent agent memory.

Anthropic Claude SDK: Two-fold strategy...

Anthropic Harness Engineering: Bridging the Memory Gap - Medium

7h ago · medium.com

7h ago

NeuralBench: Unifying Benchmarks for Multimodal NeuroAI

NeuralBench introduces a unifying framework to benchmark NeuroAI models—a vision-audition-language foundation model for in-silico neuroscience, addressing fragmentation in cognitive neuroscience's specialized models.

NeuralBench: A Unifying Framework to Benchmark NeuroAI Models

7h ago · ai.meta.com

7h ago

VEBench: Foundation for LMMs in Intelligent Video Editing

VEBench benchmarks Large Multimodal Models for real-world video editing, envisioned as a foundation for advancing intelligent systems and complex reasoning research.

Benchmarking Large Multimodal Models for Real-World Video Editing

7h ago · arxiv.org

7h ago

Looped Transformers Reproduce Better Multi-Hop Reasoning on Consumer GPU

Reproducing the Loop, Think, and Generalize paper on RTX 3060: a single looped layer (4x) beats a standard 4-layer stack for generalizing to unseen...
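The looped-versus-stacked comparison can be illustrated with a toy sketch. This is not the paper's architecture or the reproduction's code: the residual tanh block, the dimension, and every function name here are assumptions chosen for illustration. It only shows the weight-sharing idea (one block applied four times versus four distinct blocks) and makes no claim about generalization.

```python
import math
import random

def make_layer(dim, rng):
    """Create one toy block: a dim x dim weight matrix (hypothetical stand-in)."""
    scale = 1.0 / math.sqrt(dim)
    return [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]

def apply_layer(weights, x):
    """Residual update x + tanh(W x), standing in for a transformer block."""
    return [xi + math.tanh(sum(w * xj for w, xj in zip(row, x)))
            for xi, row in zip(x, weights)]

def forward(layers, x):
    """Apply blocks in order; repeated entries reuse the same weights."""
    for weights in layers:
        x = apply_layer(weights, x)
    return x

def count_params(layers):
    """Count unique weights, so a shared block is counted only once."""
    unique = {id(w): w for w in layers}
    return sum(len(row) for w in unique.values() for row in w)

rng = random.Random(0)
dim = 8
stacked = [make_layer(dim, rng) for _ in range(4)]  # four distinct blocks
looped = [make_layer(dim, rng)] * 4                 # one block applied 4x

x = [1.0] * dim
out_stacked = forward(stacked, x)
out_looped = forward(looped, x)
stacked_params = count_params(stacked)
looped_params = count_params(looped)
```

Both models run the same number of sequential steps, but the looped variant stores a quarter of the weights in this toy setup.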

7h ago

TRACE: Metrologically-Grounded Framework for Trustworthy Agentic AI

TRACE introduces a cross-domain engineering framework for trustworthy agentic AI in operationally critical domains, built around a four-layer reference architecture. A timely blueprint for reliable operational agents.

TRACE: A Metrologically-Grounded Engineering Framework for ... - arXiv

7h ago · arxiv.org

15h ago

Models Watching Models: Tackling Drift in Agent Automation

Emerging self-oversight technique spotted in helpful paper:

Models watch models to prevent drift in automations like heartbeat setups
Stops...

15h ago

Decoupling Attention: Run Gemma-4 26B on Consumer GPUs via LARQL/vindex

Efficiency hack decouples attention (local GPU) from FFN/expert weights (remote CPUs).

Gemma-3 4B baseline: 83 tokens/s local.
Gemma-4 26B local...

Running Large AI Models on Consumer Hardware: The Magic of Decoupling Attention

franksworld.com

15h ago

Ai2's Molmo 2: Open Multimodal SOTA Rivaling Proprietary VLMs

Molmo 2 from Ai2 sets a new standard for open multimodal models, delivering SOTA results on major open-weight benchmarks—including video—and performing on par with leading closed models.

Ai2 Releases Molmo 2 Open Multimodal Family for Video and ...

15h ago · hpcwire.com

15h ago

Simpler Parametrization for Modern Optimizers Lands 17 HN Points

New paper proposes a simpler parametrization for modern optimizers, quickly earning 17 points on Hacker News—early signal for core training innovations in frontier scaling.

A Simpler Parametrization for Modern Optimizers

15h ago · news.ycombinator.com

1d ago

X2SAM: Any Segmentation for Images and Videos

X2SAM enables any segmentation across images and videos, per a new paper spotlighting universal vision capabilities.

X2SAM: Any Segmentation in Images and Videos

arxiv.org

1d ago

StateSMix: Mamba SSMs for Online Lossless Compression

StateSMix enables online lossless compression via Mamba State Space Models and sparse n-gram context mixing. Core advance in SSM optimization for long contexts.

StateSMix: Online Lossless Compression via Mamba State Space Models and Sparse N-gram Context Mixing

arxiv.org

1d ago

Anthropic: Training Sandbaggers to Full Capability via Weak Supervisors

Sandbagging threat: Capable models can deliberately hold back on unchecked tasks, evading weak human oversight.
Key finding: Such models train...

1d ago

MAGMA's Graphs to Chronicle: Agent Memory Goes Persistent

Agentic memory evolves from research to product:

MAGMA research: Organizes agent memory across four relationship graphs for causal/temporal...

1d ago

Agent Skills: Verify as Untrusted Artifacts

Essential for agent devs: Ship skills as untrusted code until explicitly verified—don't infer trust from signatures.

Runtime fix: Enforce...
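The "untrusted until explicitly verified" rule can be sketched as a small gate. This is a minimal illustration under stated assumptions, not an API from any agent framework: `SkillRegistry`, its methods, and the hash-based allowlist are all hypothetical. The point it shows is that a signature accompanying a skill does not grant trust; only an explicit verification step does.

```python
import hashlib

class SkillRegistry:
    """Hypothetical registry: skills stay untrusted until explicitly verified."""

    def __init__(self):
        self._verified = set()  # SHA-256 digests of explicitly approved skills

    @staticmethod
    def digest(source: str) -> str:
        """Content hash, so any change to the skill invalidates approval."""
        return hashlib.sha256(source.encode("utf-8")).hexdigest()

    def verify(self, source: str) -> None:
        """Record explicit approval, e.g. after a human or automated review."""
        self._verified.add(self.digest(source))

    def is_trusted(self, source: str, signature=None) -> bool:
        # The signature is deliberately ignored: its presence alone
        # is never treated as evidence of trust.
        return self.digest(source) in self._verified

    def load(self, source: str, signature=None) -> str:
        """Return the skill source only if it was explicitly verified."""
        if not self.is_trusted(source, signature):
            raise PermissionError("skill has not been explicitly verified")
        return source

registry = SkillRegistry()
skill = "def run():\n    return 'ok'\n"

trusted_before = registry.is_trusted(skill, signature="signed-by-vendor")
registry.verify(skill)
trusted_after = registry.is_trusted(skill)
tampered_trusted = registry.is_trusted(skill + "# extra line\n")
```

Keying approval on a content hash means a modified skill falls back to untrusted, even if it still carries the original signature.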

1d ago

Bolek: Multimodal LLM for Molecular Reasoning

Bolek debuts as a multimodal language model specialized for molecular reasoning, amid dedicated predictors using fingerprints, graph neural networks, and molecular foundation models that achieve strong benchmark performance.

Bolek: A Multimodal Language Model for Molecular Reasoning

1d ago · arxiv.org

1d ago

OPENDEV's 5-Layer Safety Blueprint for Terminal AI Coding Agents

Defense-in-depth redefines safe long-horizon dev in terminals:

OPENDEV tackles complex, multi-step SWE tasks autonomously
5 safety layers prevent...


Bleeding Edge AI · May 7 Daily Digest
