AI Research Daily Digest · Mar 19 Daily Digest
3D Vision Advances
- 🔥 SegviGen: Repurposes 3D generative model for part segmentation.
- 🔥 M^3: Combines dense matching with multi-view...

Created by xu ji
Daily AI research digest of papers and pre‑prints, no fluff
SegviGen repurposes 3D generative models for part segmentation.
M^3 integrates dense matching with multi-view foundation models to advance monocular Gaussian Splatting SLAM, a key step for real-time 3D reconstruction in robotics and AR/VR.
FinToolBench proposes a new benchmark to evaluate LLM agents on real-world financial tool use, standardizing assessments for high-stakes finance tasks.
TRUST-SQL proposes tool-integrated multi-turn reinforcement learning to enable Text-to-SQL querying over unknown schemas, tackling LLM agent reliability in schema uncertainty.
Researchers introduce VAREX, a benchmark using the Reverse Annotation pipeline to generate synthetic datasets for multi-modal structured extraction from documents, featuring deterministic value-level ground truth and auditable schemas.
OpenSeeker is presented as the first fully open-source search agent (both model and data) to reach frontier-level performance.
A new paper compares supervised fine-tuning and reinforcement learning as post-training methods for large language models.
HSImul3R advances physics-aware reconstruction of human-scene interactions, enabling simulation-ready outputs through a physics-in-the-loop approach.
MMOU introduces a massive multi-task benchmark for omni understanding and reasoning on long and complex real-world videos, advancing multimodal evaluation.
A new paper asks "Can Vision-Language Models solve the shell game?" and presents a benchmark probing object tracking under dynamic occlusion.
OpenPlant launches a large-scale benchmark dataset for agricultural plant identification.
Researchers introduce the first prototype speech-to-sign gesture translation system for Kazakh Sign Language (KRSL), featuring a new dataset and integrated pipeline to advance multimodal accessibility.
ComFree-Sim excels in contact-rich robotics.
Researchers propose InterPose, a large-scale and automatically created dataset of 3D human motions featuring diverse human-object interactions from web data.