My RL Digest - NBot Tracker | nbot.ai

My RL Digest

Created by Yuanhao

1.4K posts

Updated 95 days ago

0 scanned

The latest RL breakthroughs, benchmark results, and real‑world industry applications

Highlights for you

OpenClaw-RL — train agents by talking (arXiv/YouTube)

Princeton OpenClaw-RL NL async + ClawKeeper/NVIDIA CLAW; Claude GRPO-TCR; real-world safety analysis highlights vulnerabilities beyond sims, urging safeguards; ties Azure RFT/SKILL0/Signals.

5 sources

New RL Papers

🔥 GrandCode: GrandCode achieves grandmaster level in competitive programming via agentic reinforcement learning and placed first...

April 7, 2026

Sim-to-Real RL Drives Humanoid Sprinting and Ultra-Reliable Manipulation

Humanoid robotics trend: Sim-to-real RL breakthroughs enable production-ready feats.

KAIST's lower-body platform sprints at 12 km/h, moonwalks, and...

KAIST humanoid robot sprints, moonwalks, and kicks a soccer ball

morningoverview.com

April 7, 2026

Self-Distilled RLVR Paper Announced

Self-Distilled RLVR paper shared by @_akhaliq: https://t.co/5oucSjKaJs https://t.co/CwH09W9j5F. A new result in RL with verifiable rewards (RLVR).

April 7, 2026

Agentic RL Trends: Grandmaster Coding Wins and Async Efficiency

Key advances pushing LLM agents toward real-world coding prowess:

GrandCode multi-agent RL system tops Codeforces live contests (Rounds 1087-1089),...

April 6, 2026

My RL Digest · Apr 6 Daily Digest

Algorithmic Advances

🔥 Policy Gradient RL with Separated Knowledge: Paper presents policy gradient reinforcement learning with separated...

April 6, 2026

AgentHazard: Benchmark for Harmful Behaviors in Computer-Use Agents

AgentHazard is a new benchmark for evaluating harmful behavior in computer-use agents, spotlighting risks in real-world AI deployments. Join the discussion on the paper page.

AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents

arxiv.org

April 6, 2026

RL Efficiency Boom: RLCF Tops RLHF, Trajectory Triage, LLM Compute Scaling

Emerging trend in LLM+RL agent efficiency:

RLCF breakthrough: Uses community feedback on high/low-citation papers to teach AI scientific taste,...

April 6, 2026

Heracles: Diffusion Middleware Revolutionizes Humanoid Control

Heracles bridges precise motion tracking and generative synthesis for humanoid robots:

Acts as intermediary using flow matching for real-time...

April 5, 2026

Separated Knowledge Boosts Policy Gradient RL

Policy gradient RL breakthrough: Replaces state-values—previously used as reusable parameters for behavior knowledge—with separated knowledge.

Policy Gradient Reinforcement Learning with Separated Knowledge

jstage.jst.go.jp

April 5, 2026

LLM Robotics Shift: Modular to VLA, Data Now Bottleneck

Key evolution in LLM robot control:

Modular past: LLMs planned tasks (e.g., SayCan), but separate networks handled motor signals, limiting...

How Language Models Learned to Control Robots—And Why Data Is Now the Bottleneck | Avala

avala.ai

April 5, 2026

RL Shortcut Problem Threatens Model Reasoning Reliability

RL's shortcut problem undermines reliable model reasoning. Court's reasoning hinges on understanding how these models think, exposing critical risks for real-world RL deployment.

Reinforcement Learning's Shortcut Problem: Why It Matters

machinebrief.com

April 5, 2026

Risk-Averse RL Enables Resilient Microgrid Dispatch in Coastal Extreme Weather

New RARLDA framework applies RL to microgrid dispatch, integrating CVaR for weather risks like max wind speed, rainfall, and temperature in coastal...

Reinforcement Learning-Driven Microgrid Dispatch Under Extreme Weather Events: A Risk-Averse Decision Architecture for Coastal Cities | Distributed Generation & Alternative Energy Journal

journals.riverpublishers.com

April 5, 2026

My RL Digest · Apr 5 Daily Digest

Algorithmic RL Breakthroughs

🔥 DeepMind AlphaEvolve: Google DeepMind proposes AlphaEvolve, an LLM-powered evolutionary coding agent that...

April 5, 2026

Q-CQL: Stable Quantum RL Breakthrough with Demo

Q-CQL fuses conservative Q-learning with quantum computing for robust decisions in noisy data environments.

Conservative edge: Avoids risky...

April 4, 2026

RL-LLM Trend: Enterprise Reasoning Boost and MARL Auto-Optimization

Emerging RL-LLM synergy drives breakthroughs:

Azure RFT elevates LLM reasoning for enterprise agents like retail tools and DocuSign contracts, using...

Reinforcement Fine-Tuning on Azure AI Foundry: Training Models to Reason Better | by Badr Kacimi | Apr, 2026 | Medium

medium.com

April 4, 2026

MACE: Realistic Gymnasium RL Envs for Trading with Market Impact

MACE launches three Gymnasium-compatible environments for stock trading, margin trading, and portfolio optimization.

Realistic modeling: Nonlinear...

April 4, 2026

RL Trend: Humanoid Sims to Real Skills Like Ping-Pong and 99% Success

Humanoid RL surges with sim-to-real breakthroughs:

Full pipeline sims enable RL training without hardware, from sensors to control.
Deep RL boosts...

April 4, 2026

My RL Digest · Apr 4 Daily Digest

Hardware Breakthroughs

🔥 Stable Memristor for RL: New memristor design with built-in oxygen gradient produces slow, stable conductance changes,...

Recent Posts

SRPO Unifies GRPO and SDPO for Stable RLVR Gains

Paper page - Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing

Hands-On Quantum RL: QRL-QAI SDK on Google Colab

Exploring Quantum Reinforcement Learning with QRL-QAI on Google Colab

My RL Digest · Apr 7 Daily Digest
