AI Research Pulse · Mar 19 Daily Digest
Agent Evaluation Benchmarks
- 🔥 SWE-CI: evaluates agent capabilities on software maintenance tasks such as static bug fixing...

Created by Christopher Malcolm
High‑impact AI research summaries across core ML, applied AI, and safety/policy
V-Co takes a closer look at visual representation alignment via co-denoising. Paper: https://t.co/yFmatjr2xS.
InCoder-32B emerges as a code foundation model for industrial scenarios, while LLMs prove increasingly capable of automating programming tasks, including security-related ones. Together these signal rising momentum in bridging code generation and secure applications.
New book The Emerging Science of Machine Learning Benchmarks garners 35 points on Hacker News, spotlighting the evolving science behind ML benchmark design.
Multimodal generative models show remarkable progress in single-modality video and audio synthesis, yet a new arXiv paper advances diffusion models for truly joint audio-video generation.
This paper reframes prompt choice as a per-query decision problem for LLMs, using a learned offline proxy reward to score query-prompt pairs for scalable optimization.
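The selection loop implied above can be sketched in a few lines: score each (query, prompt) pair with an offline proxy reward and pick the argmax. The toy overlap-based scorer below is purely illustrative, standing in for the paper's learned proxy model; the function names are assumptions, not the paper's API.

```python
# Hypothetical sketch of per-query prompt selection via a proxy reward.
# A real proxy would be a trained offline reward model, not this heuristic.

def proxy_reward(query: str, prompt: str) -> float:
    """Stand-in proxy: Jaccard word overlap between query and prompt."""
    q, p = set(query.lower().split()), set(prompt.lower().split())
    return len(q & p) / (len(q | p) or 1)

def select_prompt(query: str, candidate_prompts: list[str]) -> str:
    """Pick the candidate prompt with the highest proxy score for this query."""
    return max(candidate_prompts, key=lambda p: proxy_reward(query, p))

prompts = [
    "Answer the math question step by step.",
    "Summarize the following code briefly.",
]
print(select_prompt("explain this code snippet", prompts))
# → Summarize the following code briefly.
```

Because the proxy is scored offline, candidate prompts can be ranked for many queries in batch without any LLM calls at selection time.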
Rapid gains in AI coding reliability: OpenAI's GPT-5.4 mini scores 54.4% on SWE-Bench Pro (up from 45.7% for GPT-5 mini) and runs 2x faster.
A new cognitive framework offers a fresh lens on AGI progress measurement, drawing strong interest with 58 points on Hacker News.
Emerging trend in AI agent diagnostics and verification:
Emerging AI papers push multimodal and agent boundaries:
Diffusion models run in a reflexive System 1 mode, hobbled by fixed, content-agnostic sampling schedules; the paper argues this rigidity, born from the curse of state, curbs intrinsic generative optimality.
Major agentic coding leaps this week:
Cognitive science reveals why AI systems don't truly learn autonomously, offering a sharp theoretical critique relevant to AI alignment discussions. The Hacker News thread drew 62 points.
AttnRes counters residual dilution in deep LLMs by selectively aggregating prior layers via learnable softmax attention.
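The AttnRes idea as summarized above can be illustrated with a minimal sketch: instead of a plain residual sum, prior layer outputs are aggregated with learnable softmax weights. Shapes, names, and initialization below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Minimal sketch of softmax-weighted aggregation over prior layer outputs,
# standing in for AttnRes-style residual aggregation (details assumed).
rng = np.random.default_rng(0)
depth, d_model = 4, 8
prior_outputs = [rng.standard_normal(d_model) for _ in range(depth)]

# One learnable logit per prior layer; trained end-to-end in practice,
# randomly initialized here for illustration.
logits = rng.standard_normal(depth)
weights = np.exp(logits) / np.exp(logits).sum()  # softmax over prior layers

# Weighted aggregation replaces the uniform residual accumulation that
# dilutes early-layer signal in very deep stacks.
aggregated = sum(w * h for w, h in zip(weights, prior_outputs))
print(aggregated.shape)  # (8,)
```

The softmax keeps the mixture a convex combination, so the aggregated state stays on the same scale as any single layer's output regardless of depth.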
Encoder-decoder architectures fight back against decoder-only LLMs in multilingual NLP:
MR-Search introduces meta-RL with self-reflection for complex information seeking, yielding up to 19.3% benchmark gains via episode learning and multi-turn...
SGTR trains models to recognize their own text, reversing and preventing emergent misalignment (EM) harms. Miles Brundage spotlights this promising alignment defense.