New architectures, training tricks, and safety methods for stronger ML systems
Pushing the Frontiers of Core AI
This cluster centers on foundational AI/ML advances: novel architectures (diffusion transformers for social gestures, thalamus-inspired continual learning, tiny arithmetic-capable transformers), refined training and optimization strategies (diagnostic-driven iterative training, hybrid parallelism for faster diffusion, insights from large-scale VAE experiments), and probabilistic perspectives on generative models. Several works tackle richer perception and reasoning, ranging from 3D geometry and open-vocabulary segmentation to safer vision-language models and risk-aware world models for autonomous driving. Others ground generative AI in the physical and built world, for example by converting construction drawings into 3D digital twins or co-designing real-world objects with physics. Together, these works underscore a shift from mere generation to controllable, reliable, and deployable intelligence.