# The 2026 Revolution in Multimodal AI, Robotics, and Energy-Efficient Systems: An Updated and Expanded Perspective
The year 2026 stands as a pivotal milestone in the evolution of artificial intelligence, characterized by a remarkable convergence of perception, embodied robotics, world modeling, and hardware innovation. These breakthroughs are fundamentally transforming AI from narrowly focused tools into **embodied, trustworthy, and sustainable systems** capable of sophisticated reasoning, seamless interaction with the physical environment, and responsible deployment at scale. Building upon the foundational milestones of 2025, recent developments have addressed longstanding limitations, introduced innovative frameworks, and demonstrated practical solutions that are reshaping the AI landscape across multiple domains.
This ongoing revolution heralds an era of **embodied intelligence**, where perception, reasoning, and action are deeply integrated. Central to this shift is **causality-aware modeling**, which grounds AI understanding firmly in physical and causal relationships. The integration of multimodal perception with advanced world models and energy-efficient hardware is enabling AI systems that are not only smarter but also more sustainable and trustworthy.
---
## Bridging the Gap: From Perception to Physical and Causal Reasoning
Despite significant progress in vision-language models (VLMs) and multimodal large language models (MLLMs), a persistent challenge has been enabling models to **comprehend complex physical dynamics directly from videos**. As @drfeifei emphasized, *"VLMs/MLLMs do NOT yet understand the physical world from videos,"* highlighting the need for models to ground perception in causality and physical interactions.
Recent breakthroughs are making strides toward this goal through **interactive, human-centric video world models** that facilitate **simulated environment manipulation conditioned on user inputs**, such as hand gestures and camera controls. A pioneering concept is **"Generated Reality,"** which leverages **interactive video generation** to track **head and hand movements** in real-time, producing **immersive, controllable virtual environments**. These environments enhance **scene understanding** and **spatial reasoning**, with practical applications spanning **virtual assistants**, **robotic training simulators**, and **augmented reality interfaces**.
Complementing these are **geometric-aware encoding techniques** like **ViewRope** and **Rotation-Enhanced Positional Embeddings**, which **significantly improve the long-term spatiotemporal coherence** of video-based world models. These encodings enable models to **maintain a consistent understanding over extended durations**, a vital capability for **causal inference** and **autonomous decision-making**. For example, **Causal-JEPA** employs **latent interventions** within **object-centric latent spaces** to support **multi-step causal reasoning**, marking a pivotal step toward **physically grounded AI systems**.
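The internals of **ViewRope** and **Rotation-Enhanced Positional Embeddings** are not spelled out here, but the family they belong to builds on the standard rotary-embedding (RoPE) idea: channel pairs are rotated by position-dependent angles so that attention scores depend only on *relative* offsets, which is what lets coherence survive over long spans. A minimal, illustrative sketch of that core property (plain RoPE, not the named variants):

```python
import numpy as np

def rotary_embed(x, positions, base=10000.0):
    """Apply rotary position embedding to features x.

    x: (seq_len, dim) with dim even; positions: (seq_len,) integer indices.
    Pairs of channels are rotated by an angle proportional to position,
    so relative offsets are preserved under dot-product attention.
    """
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)        # (half,)
    angles = positions[:, None] * freqs[None, :]     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

# Relative-position property: shifting query and key positions by the
# same offset leaves their dot product unchanged.
rng = np.random.default_rng(0)
q = rng.normal(size=(1, 8))
k = rng.normal(size=(1, 8))
d0 = rotary_embed(q, np.array([3])) @ rotary_embed(k, np.array([1])).T
d1 = rotary_embed(q, np.array([13])) @ rotary_embed(k, np.array([11])).T
shift_invariant = bool(np.allclose(d0, d1))
```

The shift-invariance checked at the end is the property that makes rotary-style encodings attractive for long-horizon video world models: the model's attention pattern between two frames depends on how far apart they are, not on where the clip starts.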
### Recent Highlights:
- The challenge of **understanding physical dynamics directly from videos** remains, yet **interactive models and geometric-aware encodings** are narrowing this gap.
- **Human-centric simulation environments** foster **responsive, real-time scene interaction**.
- These innovations are catalyzing the development of **causality-aware, multimodal AI** capable of **deep physical comprehension**.
---
## Robotics: From Object Manipulation to Adaptive, Embodied Control
Robotics continues its rapid evolution by integrating perception and control through **end-to-end learning frameworks**. Notably, **EgoPush** has demonstrated **egocentric multi-object rearrangement** within cluttered environments via **perception-guided policy learning**, enabling robots to **manipulate objects with high precision** in complex, unstructured scenarios. This progress moves us closer to **autonomous systems for domestic, healthcare, and industrial settings**.
Further advancements include **smooth, time-varying linear control policies** that incorporate **action Jacobian penalties**. These penalties **prevent abrupt or unrealistic control signals**, resulting in **more natural, safe, and adaptable behaviors**—crucial for **real-world deployment**. The **Fast-ThinkAct** framework, showcased at **#CVPR2026**, exemplifies **rapid, reliable embodied control** capable of **adapting efficiently** in dynamic environments with **minimal latency**.
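The exact form of the **action Jacobian penalty** above isn't specified, but for a time-varying *linear* policy the idea has a particularly clean reading: with actions `a_t = K_t s_t + b_t`, the Jacobian of the action with respect to the state is exactly the gain matrix `K_t`, so penalizing its Frobenius norm directly discourages policies whose actions change sharply with small state perturbations. A hedged sketch of that formulation (the weighting and loss combination are illustrative assumptions):

```python
import numpy as np

def jacobian_penalty(K_list, weight=1e-2):
    """Penalty on the action Jacobians of a time-varying linear policy.

    For a_t = K_t @ s_t + b_t the Jacobian d a_t / d s_t is exactly K_t,
    so penalizing its Frobenius norm discourages abrupt control signals.
    """
    return weight * sum(float(np.sum(K ** 2)) for K in K_list)

def total_loss(task_loss, K_list, weight=1e-2):
    """Combine a task objective with the smoothness penalty."""
    return task_loss + jacobian_penalty(K_list, weight)

# A gentler policy (smaller gains) incurs a smaller penalty than an
# aggressive one, steering optimization toward smooth behavior.
K_smooth = [np.eye(2) * 0.1 for _ in range(5)]
K_sharp = [np.eye(2) * 10.0 for _ in range(5)]
penalty_gap = jacobian_penalty(K_sharp) - jacobian_penalty(K_smooth)
```

For nonlinear policies the same penalty would be applied to a finite-difference or autodiff estimate of the Jacobian rather than to an explicit gain matrix.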
### Key Milestones:
- **EgoPush**'s success in **end-to-end egocentric object manipulation**.
- Incorporation of **action Jacobian penalties** to produce **smooth, safe robot behaviors**.
- The emergence of **Fast-ThinkAct**’s ability to **deliver fast, adaptive control** in complex, real-time scenarios.
Beyond object manipulation, **cross-embodiment** and **zero-shot tool use** are advancing through **Language-Action Pre-Training (LAP)** and **SimToolReal**, which enable robots to **transfer skills across different embodiments** and **manipulate novel tools** without explicit retraining. These developments are critical steps toward **flexible, general-purpose robotic agents** capable of **learning and adapting in unstructured environments**.
---
## Generative Models and Hardware: Speed, Efficiency, and Sustainability
The landscape of **generative modeling** has undergone a **revolution driven by algorithmic innovations and hardware breakthroughs**. **Discrete diffusion models**, utilizing techniques like **Categorical Flow Maps** and **Masked Bit Modeling**, now **achieve near real-time image and video synthesis**, drastically reducing **sampling latency**. This progress makes **high-fidelity content generation** more **accessible and scalable**, fueling applications in **creative industries**, **industrial design**, and **consumer entertainment**.
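**Categorical Flow Maps** and **Masked Bit Modeling** are not described in detail here, but the speedups attributed to discrete diffusion typically come from *parallel* masked-token decoding: start fully masked, predict all positions at once, commit only the most confident predictions, and repeat for a handful of passes instead of one pass per token. A toy, MaskGIT-style sketch of that schedule (the "model" is a random stand-in, not a trained denoiser):

```python
import numpy as np

MASK = -1  # sentinel id for a masked token

def toy_logits(tokens, vocab=4, rng=None):
    """Stand-in for a learned denoiser: random per-position logits.
    A real model would condition on the already-committed tokens."""
    rng = rng or np.random.default_rng(0)
    return rng.normal(size=(len(tokens), vocab))

def masked_decode(length=8, vocab=4, steps=4, seed=0):
    """Parallel decoding: at each step commit the highest-confidence
    predictions, leaving the rest masked. Finishes in `steps` passes
    instead of `length` autoregressive ones."""
    rng = np.random.default_rng(seed)
    tokens = np.full(length, MASK)
    for step in range(steps):
        logits = toy_logits(tokens, vocab, rng)
        probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
        conf = probs.max(-1)
        pred = probs.argmax(-1)
        keep = int(np.ceil(length * (step + 1) / steps))
        conf[tokens != MASK] = np.inf       # committed tokens stay first
        order = np.argsort(-conf)
        commit = order[:keep]
        tokens[commit] = np.where(tokens[commit] == MASK,
                                  pred[commit], tokens[commit])
    return tokens

out = masked_decode()
fully_decoded = bool((out != MASK).all())
```

Cutting the number of model evaluations from sequence length to a fixed small step count is where the "near real-time" latency reductions for image and video synthesis come from.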
On the hardware front, **attention mechanisms** have been **optimized with SpargeAttention2**, which attains **up to 95% attention sparsity** and **16.2× speedups** in **video diffusion workloads**. These innovations enable **real-time multimodal content generation** on **edge devices** such as **NVIDIA Jetson modules**, expanding deployment possibilities beyond traditional data centers.
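**SpargeAttention2** itself isn't documented here, but high attention sparsity generally means masking out all but the top-scoring keys per query before the softmax, so that a sparse kernel can skip the zeroed entries entirely. A dense-numpy emulation of that top-k masking (the 5% keep ratio mirrors the 95% sparsity figure above; the real system would use fused sparse kernels, not this dense simulation):

```python
import numpy as np

def sparse_attention(q, k, v, keep_ratio=0.05):
    """Attention that keeps only the top-scoring keys per query.

    Scores below each query's top-k threshold are masked to -inf before
    softmax, so ~(1 - keep_ratio) of the attention matrix is zero. A
    sparse kernel would skip those entries for real speedups; here we
    only emulate the masking with dense ops.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (nq, nk)
    n_keep = max(1, int(round(k.shape[0] * keep_ratio)))
    thresh = np.sort(scores, axis=-1)[:, -n_keep][:, None]
    masked = np.where(scores >= thresh, scores, -np.inf)
    weights = np.exp(masked - masked.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    sparsity = float((weights == 0).mean())
    return weights @ v, sparsity

rng = np.random.default_rng(0)
q = rng.normal(size=(32, 16))
k = rng.normal(size=(64, 16))
v = rng.normal(size=(64, 16))
out, sparsity = sparse_attention(q, k, v, keep_ratio=0.05)
```

Because attention cost scales with the number of surviving query-key pairs, a 95%-sparse pattern is what makes large speedups on memory-constrained edge devices plausible.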
Further, **model compression techniques** like **COMPOT** facilitate **post-training orthogonalization and parameter sharing**, allowing large models like **Llama 3.1** (70 billion parameters) to run efficiently on **consumer-grade GPUs** such as the RTX 3090. This democratizes access to **state-of-the-art AI**, significantly reducing **computational** and **energy barriers**.
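**COMPOT**'s specific recipe isn't given, but post-training compression built on orthogonal factors usually reduces to truncated SVD: replace a large weight matrix with two thin factors whose product is the best low-rank approximation in the Frobenius norm. A minimal sketch of that generic technique (not COMPOT itself; the shapes and rank are illustrative):

```python
import numpy as np

def low_rank_compress(W, rank):
    """Post-training compression of a weight matrix via truncated SVD.

    W (d_out, d_in) is replaced by thin factors A @ B, cutting parameters
    from d_out*d_in to rank*(d_out + d_in). The orthogonal SVD factors
    give the minimal Frobenius-norm error for the chosen rank.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # (d_out, rank), singular values folded in
    B = Vt[:rank]                # (rank, d_in)
    return A, B

rng = np.random.default_rng(0)
# A nearly low-rank weight: rank-8 signal plus small noise, the regime
# where this kind of compression loses very little accuracy.
W = rng.normal(size=(256, 8)) @ rng.normal(size=(8, 256))
W += 0.01 * rng.normal(size=W.shape)

A, B = low_rank_compress(W, rank=8)
params_before = W.size
params_after = A.size + B.size
rel_err = float(np.linalg.norm(W - A @ B) / np.linalg.norm(W))
```

The same trade-off, applied layer by layer across a 70B-parameter model, is what makes fitting such models onto a single 24 GB consumer GPU conceivable.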
A transformative development is the advent of **thermodynamic-like computers**, which **perform AI image generation while consuming a fraction of the energy** of conventional hardware. As Stephen Whitelam explains, these devices **leverage thermodynamic principles** to **perform computations with minimal energy expenditure**, aligning AI progress with **environmental sustainability**. Additionally, **SambaNova’s SN50 chips** aim to support **10-trillion-parameter models** capable of **agentic AI**, promising **massively scaled, energy-efficient systems**.
### Key Advances:
- **Near real-time diffusion-based models** for **rapid multimodal content creation**.
- Hardware innovations like **SpargeAttention2** and **COMPOT** that **democratize deployment**.
- The emergence of **thermodynamic computing** and **advanced chips** for **large-scale, energy-efficient AI** capable of **agentic behaviors**.
---
## Accelerating Model Development and Democratization
Efforts are intensifying to **develop robust, versatile AI models** and **broaden accessibility**. The **VLANeXt** framework offers **comprehensive strategies** for building **strong vision-language-action (VLA) agents** capable of **multimodal reasoning and interaction**. Simultaneously, models such as **Qwen 3.5 Medium** demonstrate that **smaller, efficient models** can **perform at production-level quality**, making **advanced AI** more **cost-effective and accessible** across research and industry.
Recent work also includes **test-time verification techniques** for **vision-language-action (VLA) agents**, such as those reported by @mzubairirshad on the **PolaRiS evaluation benchmark**. By checking a model's candidate outputs before they are acted upon, these verifiers **guard against errors at deployment time**, boosting **reliability and trustworthiness** in practical settings.
---
## Trustworthiness, Safety, and Explainability
As AI systems grow more capable, **trustworthiness and safety** are paramount. Techniques like **Retrieval-Augmented Generation (RAG)** and **REFRAG** continue to **ground language models** in external knowledge bases, **reducing hallucinations** and **factual inaccuracies**. Frameworks such as **LangChain** support **long-term memory architectures**, fostering **coherent, human-like interactions** over extended periods.
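The grounding step behind RAG is simple enough to show end to end: retrieve the passages most similar to the query, then constrain the generator to answer only from them. A self-contained toy sketch using bag-of-words cosine similarity as the retriever (a real pipeline would use a dense encoder and a vector index; the corpus and prompt wording here are illustrative):

```python
from collections import Counter
import math

def embed(text):
    """Toy bag-of-words 'embedding'; real systems use a dense encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    """Rank corpus passages by similarity to the query, keep the top k."""
    qv = embed(query)
    ranked = sorted(corpus, key=lambda p: cosine(qv, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(query, corpus, k=2):
    """Ground the model: instruct it to answer only from retrieved text."""
    context = "\n".join(f"- {p}" for p in retrieve(query, corpus, k))
    return (f"Answer using ONLY the context below.\n"
            f"Context:\n{context}\nQuestion: {query}")

corpus = [
    "RoPE rotates query and key channels by position-dependent angles.",
    "The RTX 3090 is a consumer GPU with 24 GB of memory.",
    "Paris is the capital of France.",
]
prompt = build_prompt("How much memory does the RTX 3090 have?", corpus, k=1)
```

Because the generator sees only retrieved, verifiable text, its answers can be traced back to sources, which is what reduces hallucinations and factual drift.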
**Privileged Information Learning (PIL)** enhances models during training by providing **high-quality signals** unavailable at inference, further **mitigating hallucinations**. At the neuron level, **NeST** offers **targeted safety interventions** and **behavioral controls**. Visualization tools like **TensorLens** and **SABER** improve **explainability** by illuminating **internal decision pathways** and **rationales**, thereby promoting **transparency and user trust**.
Recent advances also address **defenses against distillation attacks**, safeguarding model integrity, while innovations in **training efficiency**—via **hyperparameter optimization** and **new optimizers like hyperstep**—accelerate convergence and **reduce energy consumption**, aligning AI development with **sustainability goals**.
---
## Multi-Agent and Embodied Learning at Scale
The ecosystem increasingly emphasizes **multi-agent cooperation** and **embodied learning**. Frameworks like **"Cord"** enable **structured multi-agent collaboration** through **hierarchical task allocation** and **dynamic interaction**, critical for **urban navigation**, **warehouse automation**, and **collaborative robotics**.
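**"Cord"**'s allocation scheme isn't detailed here, but the building block of most hierarchical task-allocation systems is an assignment step: repeatedly give the globally cheapest unclaimed task to an idle agent. A minimal greedy sketch of that step (a stand-in for the richer market/auction mechanisms such frameworks use; the warehouse numbers are illustrative):

```python
import heapq

def allocate(agents, tasks, cost):
    """Greedy task allocation: repeatedly assign the cheapest remaining
    (agent, task) pair until every task has an owner or agents run out."""
    heap = [(cost(a, t), a, t) for a in agents for t in tasks]
    heapq.heapify(heap)
    assigned, busy, done = {}, set(), set()
    while heap and len(done) < len(tasks):
        c, a, t = heapq.heappop(heap)
        if a in busy or t in done:
            continue  # agent already claimed, or task already covered
        assigned[t] = a
        busy.add(a)
        done.add(t)
    return assigned

# Toy warehouse: robots and pick locations on a line; cost = distance.
agents = {"r1": 0.0, "r2": 10.0}
tasks = {"pick_A": 1.0, "pick_B": 9.0}
plan = allocate(list(agents), list(tasks),
                lambda a, t: abs(agents[a] - tasks[t]))
```

Layering this over a hierarchy (allocate subgoals to teams, then tasks within teams) is what turns the primitive into structured multi-agent collaboration.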
**DreamDojo** exemplifies large-scale embodied learning by training models on vast datasets of human videos, resulting in **adaptive motor control** and **physical reasoning**. Open-source tools such as **oh-my-opencode** and **Voxtral Realtime** accelerate the development of **robust multi-agent autonomous systems** capable of **coordinated decision-making** in complex environments. The **SkillOrchestra** paradigm supports **modular skill routing**, enabling **behavior transfer** and **task flexibility**.
---
## Expanding into 3D Content Creation and Reconstruction
Recent innovations extend AI capabilities into **3D asset generation** and **reconstruction**. **AssetFormer** employs an **autoregressive transformer architecture** for **detailed, modular 3D asset creation**, while **tttLRM** advances **test-time training techniques** for **long-context, autoregressive 3D reconstruction**. These tools empower **content creators** and **virtual environment developers** with **realistic, customizable 3D models**, fueling applications in **gaming**, **virtual reality**, and **simulation**.
---
## Foundations for Long-Horizon Reasoning and World Models
To support **long-term planning and complex reasoning**, models like **K-Search** utilize **co-evolving intrinsic world models** to generate **kernel functions** for large language models (LLMs). When combined with **reasoning regularizers** such as **DSDR**, these approaches **enhance the models’ intrinsic understanding** of dynamic environments, enabling **multi-step, multi-faceted tasks** with **greater consistency and robustness**.
---
## Recent Advances in Interactive and Latent Reasoning
Innovations in **interactive in-context learning**, leveraging **natural language feedback**, enhance models' capacity to **refine understanding dynamically**. The **ManCAR** framework (Manifold-Constrained Latent Reasoning with Adaptive Test-Time Computation) introduces **adaptive, latent-space reasoning**, optimizing sequential reasoning processes. The **"Very Big Video Reasoning Suite"** integrates **multi-modal, long-horizon video understanding** with **efficient architectures**, empowering AI to **perform complex physical reasoning** and **in-context learning** at scale.
These advances significantly bolster AI’s ability to **model, manipulate, and reason about physical dynamics** from unstructured, real-world video data, moving toward **truly embodied, causality-aware systems**.
---
## Current Challenges and Future Outlook
Despite these remarkable advancements, several challenges remain:
- **Learning physical dynamics directly from videos** continues to be complex, requiring further progress in **interactive simulation** and **causal inference**.
- Ensuring **robust safety and resilience** in autonomous systems, especially against **adversarial threats**, remains a priority.
- Achieving **sustainable large-scale deployment** demands continued innovation in **energy-efficient algorithms**, **thermodynamic computing**, and **hardware design**.
Looking ahead, the trajectory points toward **embodied, multimodal, and energy-efficient AI systems** that are **trustworthy, adaptive, and environmentally sustainable**. These systems will **seamlessly integrate perception, reasoning, and action**, transforming industries, enhancing human capabilities, and fostering a **more sustainable AI ecosystem**.
---
## In Summary
The developments of 2026 encapsulate a **synchronized leap**—where **hardware acceleration, perception, reasoning, safety, and scalability** coalesce into **embodied AI systems** that **see, reason, act, and learn** with **human-like sophistication** and **machine-like efficiency**. This revolution is poised to **reshape industries**, **augment human capability**, and **advance AI toward trustworthy and sustainable futures**, marking a profound new chapter in artificial intelligence’s transformative journey.
---
### Notable Recent Contributions:
- @srush_nlp highlights that **text diffusion** techniques are “really happening,” signaling rapid progress in **diffusion-based text generation**.
- **Reflective test-time planning** for **embodied large language models** is gaining traction, enabling models to **self-improve** through **trial and error**.
- **PyVision-RL** explores **agentic vision systems** trained via **reinforcement learning**, pushing toward **autonomous, adaptable vision agents**.
- **Diffusion Duality, Chapter II** introduces **Ψ-samplers** and **efficient curricula**, further refining **diffusion-based generative models** for **speed and quality**.
As these innovations unfold, the **convergence of perception, reasoning, control**, and **efficiency** promises an exciting future where AI **seamlessly integrates into human life, industry, and the environment**, with **trustworthiness and sustainability at its core**.