AI Research Spectrum

5h ago

LLM-as-a-Judge: Scaling Generative AI Evaluations in Medicine

Forum talk on using LLM-as-a-Judge to automate and scale generative AI evaluations in medicine.
Presented by Emma Croxford, UW-Madison Biomedical...

5h ago

Energy Spills: Training-Free Detection of LLM Hallucinations and Errors

Lightweight safety boost for LLMs: Reinterprets softmax as an Energy-Based Model to track energy spills during decoding, correlating directly with...

14h ago

GUI Agents Evolve Beyond Reactivity: GUI-Libra and ActionEngine Lead the Way

Trend towards efficient GUI agents:

Today's GUI agents are reactive, with every step costing an LLM call—making them expensive, slow, and fragile.
-...

14h ago

OpenAI Model Cracks Erdős #846

LLM math milestone: An internal OpenAI model solved the long-standing Erdős problem #846, inspiring a new paper. A researcher highlights it as one of the first such proofs to genuinely impress, signaling frontier progress in automated reasoning.

14h ago

ARLArena: Unified Framework for Stable Agentic RL

ARLArena proposes a unified framework for stable agentic reinforcement learning, aiming to standardize RL training for reliable, scalable AI agents.

ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning

arxiv.org


14h ago

Fixing 'Smelly' MCP Tool Descriptions to Boost AI Agent Efficiency

MCP tool descriptions are 'smelly', hindering AI agent efficiency. New work proposes augmented descriptions as a fix for better performance.

Model Context Protocol (MCP) Tool Descriptions Are Smelly! Towards Improving AI Agent Efficiency with Augmented MCP Tool Descriptions

arxiv.org


14h ago

NoLan: Dynamic Suppression to Curb VLM Object Hallucinations

NoLan mitigates object hallucinations in large vision-language models by dynamically suppressing language priors, aiming to improve VLM reliability and safety at inference time.

NoLan: Mitigating Object Hallucinations in Large Vision-Language Models via Dynamic Suppression of Language Priors

arxiv.org


14h ago

Xray-Visual Models Scale Vision on Industry X-ray Data

Xray-Visual Models push vision architectures by scaling on industry-scale X-ray data, advancing healthcare AI.

14h ago

JAEGER: Joint 3D Audio-Visual Grounding in Simulated Environments

JAEGER advances multimodal embodied AI by introducing joint 3D audio-visual grounding and reasoning in simulated physical environments, bridging sensory integration for realistic spatial tasks.

JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments

arxiv.org


14h ago

AI Research Spectrum · Feb 26 Daily Digest

Agent and Manipulation Advances

🔥 Trace-Free+: Intuit AI Research introduces Trace-Free+, a curriculum learning framework that teaches models...

23h ago

Zero-Shot Robotics Trend: Object-Centric Tools and Language-Action Transfer

Key trend in embodied robotics: generalizable zero-shot policies via object-centric and language-action methods.

SimToolReal introduces...

23h ago

Trace-Free+: AI-Optimized Tool Descriptions Unlock Better LLM Agents

Tool quality bottleneck: LLM agent performance hinges on tool descriptions, often written for humans—not AI—creating scaling issues with growing...

23h ago

Query-Focused Memory-Aware Reranker for LLM Long Contexts

New preprint introduces a query-focused and memory-aware reranker to enhance long-context processing in LLMs.

23h ago

RMM-C46 Compresses HEP Data 10x for Scalable ML Gains

Unlocking efficient ML in high-energy physics:

RMM-C46 compresses high-dimensional particle collision data (>2601 values/event) by over 10-fold to...

Machine Learning Gains from Data Compression Technique

quantumzeitgeist.com


23h ago

Test-Time Training with KV Binding Equals Linear Attention

Test-Time Training with KV Binding is secretly Linear Attention – revealing a theoretical bridge between test-time adaptation and efficient transformer mechanisms.

1d ago

P4D: Efficient 4D Distillation for Vision Models with Zero Inference Cost

Perceptual 4D Distillation (P4D) bridges 3D structure and temporal dynamics by distilling explicit 4D knowledge directly into models—no heavy architectural changes or added inference cost.

1d ago

AI Research Spectrum · Feb 25 Daily Digest

Agentic Systems Advances

🔥 TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics.
SkillOrchestra: Learning to Route Agents...

1d ago

Study Evaluates LLMs' Potential in Patient Health Education

New research evaluates the potential of large language models (LLMs) in health education for patients, assessing their performance in healthcare contexts.

Evaluating the performance of large language models in health ...

1d ago

bmjopen.bmj.com

1d ago

Conv-FinRe: Conversational Benchmark for Utility-Grounded Finance

Conv-FinRe is a conversational and longitudinal benchmark for utility-grounded financial recommendation.

Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation

arxiv.org


1d ago

RL and Reflective Planning Trend in Agentic Vision

Emerging trend in building capable open agentic vision models for embodied tasks:

PyVision-RL uses RL to forge open agentic vision models
-...

Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs

arxiv.org


1d ago

Datasets and tools for multilingual NLP

Fine-tuned LLMs for educational assessment

Advances in large models, multimodal reasoning, agents, and efficient architectures

Large models and ML driving diagnostics, genomics, and therapeutics

High-level books and comprehensive reviews on ML
