# Pushing LLMs Beyond Text: Reinforcement Learning as the Backbone for Tool-Using, Agentic Systems — Updated with Recent Breakthroughs
The landscape of large language models (LLMs) is rapidly evolving from static text generators into autonomous agents capable of complex reasoning, precise tool use, and domain-specific problem-solving. Central to this transformation is the maturation of reinforcement learning (RL), which now serves not merely as a fine-tuning technique but as the foundational backbone enabling LLMs to operate with greater agency, safety, and practical utility. Recent innovations across diverse fields, from code synthesis to scientific research, underscore RL’s pivotal role in pushing beyond traditional language generation into autonomous problem-solving and multi-agent collaboration.
## Reinforcement Learning: The Engine Behind Autonomous, Tool-Using LLMs
Over the past year, researchers have made significant strides in leveraging RL to imbue LLMs with capabilities that emulate intelligent agents. Key advances include:
- **RL-Only Training for Tool Use:** Models are now fine-tuned solely through reinforcement signals to reliably select and deploy external tools—such as code interpreters, search engines, or scientific databases—enabling them to extend their functionalities dynamically.
- **Routing Across Model Mixtures:** Dynamic routing mechanisms now facilitate switching between different model variants, such as mixtures of LoRA (Low-Rank Adaptation) adapters, to optimize performance for specific tasks. This flexibility enhances both efficiency and specialization.
- **Credit Assignment for Long-Horizon Reasoning:** Algorithms have been developed to more accurately attribute rewards or failures over extended reasoning chains, which is vital for complex, multi-step problem-solving tasks.
- **Value Models Guiding Sparse Rollouts:** Learned value functions are used during inference to prioritize exploration paths, improving both the efficiency and accuracy of reasoning processes.
- **Natural-Language-Driven RL Frameworks:** Initiatives like **OpenClaw-RL** and **AutoResearch-RL** exemplify how models can learn autonomous research, reasoning, and decision-making strategies directly from natural language instructions, making the process more intuitive and scalable.
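The first bullet can be made concrete with a toy example. The sketch below is a minimal, self-contained illustration of training tool selection purely from reinforcement signals, here plain REINFORCE on a contextual bandit. The tools, tasks, and reward function are invented stand-ins, not any published system's setup.

```python
# Toy RL-only tool selection: a policy learns which tool to call per task
# type from reward alone. All names here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
TOOLS = ["calculator", "search", "code_interpreter"]

def execute(task_type: int, tool_idx: int) -> float:
    """Toy environment: reward 1.0 when the tool matches the task type."""
    return 1.0 if tool_idx == task_type else 0.0

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Policy: one row of tool logits per task type (a stand-in for an LLM head).
logits = np.zeros((3, len(TOOLS)))
lr = 0.5

for _ in range(2000):
    task = int(rng.integers(3))
    probs = softmax(logits[task])
    tool = int(rng.choice(len(TOOLS), p=probs))
    reward = execute(task, tool)
    advantage = reward - 1.0 / 3.0   # baseline: mean reward of a uniform policy
    grad_log_pi = -probs
    grad_log_pi[tool] += 1.0         # d log pi(tool | task) / d logits
    logits[task] += lr * advantage * grad_log_pi

# With this toy setup the policy concentrates on the matching tool per task.
learned = [TOOLS[int(np.argmax(logits[t]))] for t in range(3)]
```

The same loop shape scales up in real systems: the "environment" becomes actual tool execution, and the scalar reward comes from whether the tool call advanced the task.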
Together, these advances show that RL has transitioned from a mere reward-tuning method into the core training paradigm for interpretable, safe, and highly capable LLM agent systems. These models can operate autonomously across tasks, adapt through interaction feedback, and learn with minimal supervision, marking a new era of intelligent systems.
## Recent Domain-Specific and Multi-Agent RL Breakthroughs
### CUDA Agent: RL-Driven CUDA Kernel Generation
One of the most striking recent achievements is **CUDA Agent**, an RL-based system designed to generate high-performance CUDA kernels. As detailed in an arXiv preprint, CUDA Agent exemplifies how domain-specific RL agents can explore vast code spaces, learn from execution performance feedback, and synthesize optimized GPU code automatically.
> *"CUDA Agent leverages reinforcement signals from kernel performance metrics to iteratively improve code generation, enabling scalable and reliable high-performance GPU programming."*
This work pushes the boundaries of automated software engineering, illustrating that RL-tuned agents can undertake complex, high-stakes engineering tasks—automating performance-critical code synthesis at scale and opening doors for autonomous hardware optimization.
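The reward design the quote alludes to can be sketched in a few lines. The example below is a hypothetical shaping of a kernel-generation reward: correctness gates the signal, and log-speedup over a baseline rewards faster code. The function name, penalty value, and latencies are illustrative assumptions, not CUDA Agent's actual interface.

```python
# Hypothetical performance-based reward for generated kernels: an
# incorrect kernel is penalized regardless of speed; a correct one earns
# the log of its speedup over a baseline implementation.
import math

def kernel_reward(correct: bool, candidate_ms: float, baseline_ms: float) -> float:
    """log(speedup) shaping: matching the baseline scores 0, a 2x speedup
    scores ~0.69, a 2x slowdown scores ~-0.69."""
    if not correct:
        return -1.0  # correctness gate: wrong output dominates any speedup
    return math.log(baseline_ms / candidate_ms)

# Toy rollout: three candidate kernels with (pretend) measured latencies.
baseline_ms = 2.0
candidates = [
    {"correct": False, "ms": 0.8},  # fast but wrong -> penalized
    {"correct": True,  "ms": 2.0},  # correct, matches the baseline
    {"correct": True,  "ms": 1.0},  # correct, 2x faster
]
rewards = [kernel_reward(c["correct"], c["ms"], baseline_ms) for c in candidates]
best = max(range(len(candidates)), key=lambda i: rewards[i])
```

The log shaping keeps the reward symmetric around the baseline, so the policy is pushed away from regressions as strongly as it is pulled toward speedups.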
### Hierarchical Multi-Agent RL for Retrieval-Augmented Document QA
In the realm of information retrieval and question answering, a recent study published in *Scientific Reports* introduced **hierarchical multi-agent reinforcement learning** that significantly enhances retrieval-augmented document QA systems. This architecture involves multiple specialized agents operating at different levels—such as retrieval, reasoning, and verification—collaborating to process large document repositories effectively.
Key features include:
- **Hierarchical Coordination:** Structuring exploration and decision-making across layers to handle complex retrieval and reasoning tasks.
- **Multi-Agent Collaboration:** Combining the strengths of various specialized modules to improve answer accuracy and robustness.
- **Performance Gains:** Demonstrated improvements over single-agent baselines on industrial document QA benchmarks.
> *"This multi-layered RL approach enables the system to handle complex retrieval and reasoning tasks with greater accuracy and robustness, showcasing the potential of structured agent architectures in scientific and industrial applications."*
Such architectures exemplify how multi-agent RL frameworks can revolutionize structured, high-performance information processing—especially valuable in scientific research, legal analysis, and industrial data management.
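A toy rendition of the retrieve-reason-verify hierarchy helps make the architecture concrete. In the sketch below, each level is a separate function standing in for an agent, and the verifier's pass/fail check plays the role of the shared reward signal. The documents, scoring rule, and "reasoning" step are deliberately trivial stand-ins, not the paper's implementation.

```python
# Three-level toy pipeline: retrieval -> reasoning -> verification, with
# the verifier's output doubling as the reward for the whole chain.
DOCS = {
    "d1": "the capital of france is paris",
    "d2": "gpu kernels run on streaming multiprocessors",
    "d3": "reinforcement learning optimizes expected reward",
}

def retriever(question: str, k: int = 1) -> list:
    """Level 1 agent: rank documents by word overlap with the question."""
    q = set(question.lower().split())
    ranked = sorted(DOCS, key=lambda d: -len(q & set(DOCS[d].split())))
    return ranked[:k]

def reasoner(question: str, doc_ids: list) -> str:
    """Level 2 agent: a deliberately trivial 'reasoning' step that
    answers with the final word of the top-ranked document."""
    return DOCS[doc_ids[0]].split()[-1]

def verifier(answer: str, gold: str) -> float:
    """Level 3 agent: pass/fail check, reused as the shared reward."""
    return 1.0 if answer == gold else 0.0

question = "what is the capital of france"
doc_ids = retriever(question)
answer = reasoner(question, doc_ids)
reward = verifier(answer, gold="paris")
```

In a trained system each level would be a learned policy and the verifier's signal would be propagated back through the hierarchy, which is exactly where the credit-assignment machinery discussed earlier comes in.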
## Supporting Developments and Practical Methods
### Tree Search Distillation for Language Models Using PPO
A recent innovation, **Tree Search Distillation using Proximal Policy Optimization (PPO)**, bridges classical search algorithms with RL, enabling models to learn efficient search strategies. This method distills search-based decision processes into language models, improving their reasoning capabilities and reducing inference costs—a promising approach for scaling reasoning in domain-specific tasks.
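The distillation step can be illustrated with a minimal example. Below, an expensive search is assumed to have already produced visit counts over actions at one state; the student policy is then trained to match the normalized counts so that inference no longer needs the search. The cross-entropy imitation target is an AlphaZero-style simplification adopted for brevity; the cited method couples distillation with PPO-style updates, which this sketch omits.

```python
# Distilling a search-improved action distribution into a plain policy.
# The visit counts below are an assumed search output, not real data.
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Assumed result of a tree search at one root state: visit counts over
# 4 candidate actions (action 1 was explored most).
visit_counts = np.array([2.0, 50.0, 5.0, 3.0])
target = visit_counts / visit_counts.sum()   # search-improved policy

logits = np.zeros(4)                         # student policy for this state
lr = 1.0
for _ in range(200):
    probs = softmax(logits)
    # Gradient of the cross-entropy H(target, softmax(logits)) w.r.t. logits.
    logits -= lr * (probs - target)

probs = softmax(logits)                      # distilled policy, search-free
```

The payoff is at inference time: the student reproduces the search's preferences in a single forward pass, which is where the reduced inference cost comes from.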
### VLA Models: Simple Continual RL Using LoRA
**VLA Models** demonstrate how simple continual RL can be achieved efficiently with Low-Rank Adaptation (LoRA). These models adapt and improve over time without massive retraining, making RL more accessible for real-world deployment. The accompanying YouTube video illustrates the simplicity and effectiveness of the approach in practice.
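The core LoRA mechanism behind this kind of continual adaptation fits in a short numpy sketch: the pretrained weight stays frozen while a low-rank update `B @ A` is trained, so each new task adds only `r * (d_in + d_out)` parameters. The dimensions, toy objective, and learning rate below are illustrative assumptions, not the VLA setup.

```python
# Minimal LoRA-style adapter: W is frozen, only A and B are trained.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 8, 2

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = 0.01 * rng.standard_normal((r, d_in))  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection (zero init)

def forward(x):
    return (W + B @ A) @ x                 # adapted layer; W never changes

# Toy continual-learning objective: fit a new target mapping for one input
# by gradient descent on 0.5 * ||forward(x) - target||^2, adapter-only.
x = rng.standard_normal(d_in)
x /= np.linalg.norm(x)
target = rng.standard_normal(d_out)
lr = 0.05
for _ in range(500):
    err = forward(x) - target
    B -= lr * np.outer(err, A @ x)         # dL/dB = err (A x)^T
    A -= lr * np.outer(B.T @ err, x)       # dL/dA = B^T err x^T

adapter_params = A.size + B.size           # 32 trainable values vs 64 in W
```

Because each task only touches `A` and `B`, adapters can be swapped or accumulated per task without disturbing the base model, which is what makes the continual-RL recipe cheap.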
### Neural Thickets: Dense Task Experts Around Pretrained Weights
The concept of **Neural Thickets** explores how diverse task experts can be densely clustered around pretrained weights, facilitating routing and mixture-of-experts mechanisms. This approach informs improved model architectures and routing strategies, enabling models to specialize dynamically for different tasks with minimal additional parameters.
> *"Neural Thickets highlight how dense, task-specific experts near foundational weights can enhance multi-task learning and modularity, providing a pathway for scalable, versatile AI systems."*
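The routing idea can be sketched concretely: each task expert is stored as a small delta on a shared base weight, and a lightweight router selects which delta to apply per input. Everything below (the router, the deltas, the hard top-1 rule) is an illustrative toy, not a specific published architecture.

```python
# Toy "experts near pretrained weights": one shared base weight W0 plus
# small per-task deltas, selected by a linear router.
import numpy as np

rng = np.random.default_rng(0)
d = 4
W0 = rng.standard_normal((d, d))                  # shared pretrained weight

# Three task experts, each stored only as a small delta on W0.
deltas = [0.1 * rng.standard_normal((d, d)) for _ in range(3)]

# Router: a linear scorer over the input picks one expert (hard top-1).
R = rng.standard_normal((3, d))

def route_and_apply(x):
    expert = int(np.argmax(R @ x))
    return expert, (W0 + deltas[expert]) @ x

x = rng.standard_normal(d)
expert, y = route_and_apply(x)
# The experts share W0, so adding a task costs one delta, not a full model
# (and low-rank deltas would shrink that cost further).
```

The storage argument is the point: specialization comes from the cheap deltas clustered around the foundation weights, not from duplicating the whole model per task.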
## Ongoing Challenges and Critical Frontiers
Despite these exciting breakthroughs, several persistent challenges remain:
- **Interpretability and Transparency:** As RL agents grow more complex, understanding their decision-making remains critical for safety, trust, and debugging.
- **Long-Horizon Credit Assignment:** Accurately attributing rewards over extended reasoning chains continues to be a technical hurdle, especially in multi-step, multi-modal tasks.
- **Safe Tool Verification at Test Time:** Ensuring the correctness and safety of tools used by agents—particularly in high-stakes domains like healthcare or finance—is essential for reliable deployment.
- **Standardized Benchmarking:** Developing comprehensive and domain-specific benchmarks (e.g., for code synthesis, scientific research, industrial QA) is necessary to measure and compare progress objectively.
- **Mitigating Deception and Bias:** New probing frameworks are underway to detect and mitigate deceptive behaviors in LLMs, which is crucial for deploying trustworthy AI systems.
## Current Status and Future Outlook
The integration of reinforcement learning into LLM development is transforming the potential of AI systems. We are witnessing models that can autonomously perform high-stakes, domain-specific tasks—such as generating optimized GPU code, conducting scientific research, or managing complex retrieval workflows—with minimal human intervention. These systems are becoming more adaptable, efficient, and aligned with real-world needs.
However, realizing the full promise of RL-driven LLMs requires addressing key challenges:
- Improving interpretability and safety mechanisms.
- Enhancing long-term credit assignment algorithms.
- Developing robust test-time tool verification processes.
- Establishing standardized evaluation benchmarks.
As research continues to accelerate—highlighted by recent advances like **Tree Search Distillation, VLA Models, Neural Thickets,** and domain-specific agents like **CUDA Agent**—the trajectory points toward increasingly autonomous, capable, and trustworthy AI systems. These developments suggest a future where LLMs, guided and fine-tuned through reinforcement learning, will seamlessly perform complex, high-stakes tasks across scientific, industrial, and societal domains, fundamentally expanding the scope and impact of artificial intelligence.
---
**In summary**, reinforcement learning is now the backbone of next-generation LLMs—driving their evolution from simple language processors to sophisticated, autonomous agents capable of tool use, reasoning, and multi-agent collaboration. The ongoing breakthroughs and challenges shape a promising yet demanding path toward truly intelligent, safe, and versatile AI systems.