AI Paper Tracker · May 7 Daily Digest
New Agent Benchmarks
- 🔥 DataClaw: DataClaw employs a three-stage human-in-the-loop annotation pipeline of expert task design, human-AI...

Created by Aleah Desiree
Latest AI/ML research papers from arXiv and major conferences
Domain-specific benchmarks proliferate for complex agent tasks:
Challenging a growing mindset, this empirical study of neural force fields reports clear evidence that equivariance matters even more as models scale.
Efficiency breakthroughs trend in generative modeling via flows and stochasticity:
Agentic AI vulnerabilities are expanding rapidly:
Beyond SFT-to-RL: This paper introduces pre-alignment via black-box on-policy distillation to bootstrap multimodal RL policies.
A fresh arXiv paper systematically evaluates how well LLMs understand graph tokens. The work is motivated by the success of LLMs, which has prompted their adaptation as universal predictors for graph tasks, and it is pivotal for GNN-LLM synergy.
Compositionality and systematicity emerge from iterated learning, mimicking how children rapidly acquire language early in development. This theory shows language evolves through iterative processes, offering a path to foster compositional AI.
SplAttN leverages Gaussian soft splatting and attention to bridge 2D and 3D modalities for point cloud completion. A fresh arXiv upload pushing efficient reconstruction frontiers.
APEX is the first large-scale multi-task learning framework for AI-generated music, trained on over 211k songs (10k hours of audio) from Suno and Udio to predict popularity with aesthetic insights.
HeteroSense-FL is a new Python software package for structured multimodal sensor simulation targeting modality-heterogeneous federated learning (FL) research. Bridges simulation gaps in heterogeneous FL studies.
DocETL optimizes complex document processing pipelines by addressing LLM shortcomings through a declarative interface for agentic query rewriting and evaluation.
Fresh arXiv paper introduces a compound AI system for conversational grant discovery:
New arXiv paper [2605.02810] compares human agency, which takes many years to develop as the frontal lobe activates, with potential agency in AI programs. A philosophical lens on AI's developmental path.
New paper audits AI-generated software flaws:
T^2PO introduces uncertainty-guided exploration control to achieve stable multi-turn agentic reinforcement learning. A fresh arXiv upload tackling key stability challenges in agentic RL.
Pushing AI reliability forward:
MolmoAct2 releases action reasoning models designed for real-world deployment.