# Advancing Trustworthy AI: Integrating Robust Architectures, Multimodal Reasoning, and New Frontiers
The field of artificial intelligence (AI) continues to evolve at a remarkable pace, driven by innovations that enhance perception, reasoning, safety, and efficiency. Recent breakthroughs demonstrate a concerted effort to develop systems that are not only more powerful but also trustworthy, interpretable, and resilient—especially in high-stakes domains such as autonomous navigation, healthcare, and assistive robotics. The latest developments reveal a convergence of novel model designs, safety mechanisms, embodied reasoning, and theoretical foundations, charting a path toward AI that can reliably perceive, reason, and act in complex real-world environments.
---
## Architectural Innovations: Foundations for Resilience and Multimodal Perception
A cornerstone of recent progress involves **hybrid and scalable architectures** that fuse multiple design principles to bolster **robustness**, **perception accuracy**, and **computational efficiency**:
- **Geometry-aware position embeddings** have become instrumental in enabling models to interpret spatial relationships with high fidelity. This capability is crucial for **3D scene understanding**, **robotic navigation**, and **augmented reality**, allowing systems to reason about spatial configurations more precisely in dynamic environments.
- **Sparse-linear attention mechanisms** are reducing the computational burden of large-scale models, making **real-time perception feasible** on resource-constrained devices such as **autonomous vehicles** and **embedded robots**.
- The emergence of **dynamic patch scheduling**, exemplified by **DDiT (Diffusion Denoising in Transformer)**, lets models **adaptively allocate computational resources** according to input complexity, yielding **faster inference** without sacrificing **accuracy**.
- Incorporating **fractal activation functions** has been shown to **enhance robustness** by promoting **Lipschitz continuity** and **tighter generalization bounds**, which are vital for resisting adversarial attacks—a critical aspect of **trustworthy deployment**.
- **F-INR (Functional Tensor Decomposition for Implicit Neural Representations)**, presented at WACV 2026, introduces **compact, scalable scene representations** that significantly improve **real-time 3D perception** and **autonomous navigation**. By enabling **efficient scene modeling**, F-INR complements geometry-aware embeddings and embodied perception modules, paving the way for **more detailed and resource-efficient scene understanding**.
In addition to architectural advances, **multimodal perception modules** now seamlessly integrate **visual, auditory, and textual data streams**. This integration enables systems to operate reliably in **complex, dynamic environments** with **robust resilience** to modality-specific noise or ambiguity.
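The efficiency gains from sparse-linear attention come from replacing the quadratic softmax score matrix with a positive kernel feature map, so the key-value summary can be accumulated in time linear in sequence length. A minimal sketch of kernelized linear attention follows; the feature map `phi`, dimensions, and random data are illustrative, not taken from any specific model named above:

```python
import numpy as np

def phi(x):
    """Positive feature map elu(x) + 1, used in place of softmax."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    """O(n) attention: the (d x d) summary K^T V is computed once,
    instead of materializing the (n x n) attention score matrix."""
    Qf, Kf = phi(Q), phi(K)            # (n, d) non-negative features
    KV = Kf.T @ V                      # (d, d) summary, linear in n
    Z = Qf @ Kf.sum(axis=0)            # (n,) per-row normalizer
    return (Qf @ KV) / (Z[:, None] + eps)

rng = np.random.default_rng(0)
n, d = 128, 16
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = linear_attention(Q, K, V)        # same shape as V: (128, 16)
```

Because `phi` is non-negative, the implicit attention weights stay positive and normalizable, and the output matches the explicit quadratic computation up to the small `eps` stabilizer.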
---
## Multimodal & Iterative Reasoning: Grounded and Efficient Inference
Building upon perceptual improvements, models are demonstrating **multi-step, grounded reasoning** across multiple modalities:
- **SAW-Bench**, a benchmark emphasizing **situated awareness**, challenges models to perform **real-time, context-aware reasoning** in egocentric video—an essential capability for **autonomous navigation** and **robotic assistance**.
- Techniques such as **Adaptive Matching Distillation** and **few-step generation distillation** incorporate **self-correcting mechanisms**—reducing reasoning steps and **mitigating error propagation**. These methods are especially important in **resource-limited settings**, ensuring **trustworthy outputs** with **efficient reasoning**.
- Recent insights reveal that **minimal recurrent neural networks (RNNs)** can **model the robustness of multiple procedural skills learned simultaneously**, challenging the notion that **massive models** are necessary for resilience. A *Nature* study on this topic (discussed in a later section) emphasizes that **model simplicity combined with strategic training** can achieve both **robustness and efficiency**.
- The development of **VESPO (Variational Sequence-Level Soft Policy Optimization)** offers a **stabilized framework for off-policy reinforcement learning**, addressing training instability issues and enabling **more reliable fine-tuning** of large language models.
These advances collectively push toward **scalable, trustworthy reasoning systems** that **ground decisions in real-world context** and **maintain stability across diverse scenarios**.
---
## Ensuring Reliability and Security: Safeguarding Trust in AI Systems
As AI systems become more capable, **security and reliability** are paramount—especially in applications impacting human safety:
- **Visual memory injection attacks**, where adversaries manipulate images over time, pose significant threats to **autonomous vehicles** and **conversational agents**.
- Defense strategies now incorporate **mechanistic analyses** to identify **bias-inducing neurons** (e.g., **sycophantic neurons**) and **mitigate manipulative biases**.
- **Reference-based soft verifiers** serve as **behavioral and factual checkpoints**, ensuring outputs align with **intended responses** and **ground-truth data**.
- **Out-of-Distribution (OOD) detection techniques**, such as **"Signed Directions,"** analyze response vectors to **detect inputs outside the training distribution**, preventing **erroneous or malicious outputs**.
- The **NeST (Neuron Selective Tuning)** framework introduces **lightweight safety alignment** by **selectively tuning safety-critical neurons** while **freezing others**, ensuring **robust safety mechanisms** with minimal computational overhead.
- The community has emphasized **standardized evaluation frameworks**, such as **"Towards a Science of AI Agent Reliability,"** which incorporate metrics for **factual accuracy**, **memory robustness**, and **adversarial resilience**—critical for deploying AI in **healthcare** and **autonomous driving**.
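The direction-based OOD idea can be illustrated with a generic sketch: fit unit-norm class-mean directions on in-distribution features, then score new inputs by their best signed alignment. This is a simplified stand-in for the cited "Signed Directions" technique, whose exact algorithm is not described here; all names and data below are illustrative:

```python
import numpy as np

def fit_class_directions(feats, labels):
    """One unit-norm mean feature direction per class
    (illustrative, not the published 'Signed Directions' method)."""
    dirs = []
    for c in np.unique(labels):
        mu = feats[labels == c].mean(axis=0)
        dirs.append(mu / np.linalg.norm(mu))
    return np.stack(dirs)                     # (C, d)

def ood_score(x, dirs):
    """Higher = more in-distribution: best signed cosine
    alignment with any class direction. Negative alignment
    means the input points away from every known class."""
    x = x / np.linalg.norm(x)
    return float((dirs @ x).max())

rng = np.random.default_rng(1)
# toy in-distribution features: two tight clusters
feats = np.concatenate([rng.normal([5, 0], 0.3, (50, 2)),
                        rng.normal([0, 5], 0.3, (50, 2))])
labels = np.array([0] * 50 + [1] * 50)
dirs = fit_class_directions(feats, labels)
in_dist = ood_score(np.array([4.8, 0.1]), dirs)   # near class 0
outlier = ood_score(np.array([-3.0, -3.0]), dirs) # away from both
```

A deployment would threshold this score (calibrated on held-out data) to reject inputs before they reach downstream decision logic.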
These measures are vital for **building public trust** and **enabling safe deployment** of AI systems across domains where failures can have serious consequences.
---
## Embodied AI and World Modeling: From Perception to Action
The integration of perception with **physical interaction** has led to **embodied AI systems** capable of **learning, reasoning, and acting within real environments**:
- **TactAlign** advances **human-to-robot policy transfer** via tactile demonstrations, enhancing **dexterity** and **adaptability**.
- **HERO**, a humanoid robot, demonstrates **open-vocabulary visual loco-manipulation**, executing **complex object interactions** in **unstructured settings**.
- **FRAPPE** incorporates **dynamic environment modeling** into policy generation, enabling robots to **anticipate future states** and **plan adaptively**—a significant step toward **autonomous, context-aware systems** suitable for **assistive robotics** and **logistics**.
- These developments **bridge perception and physical action**, fostering **robots that are more flexible, context-aware, and capable of responding effectively** to **real-world uncertainties**.
---
## Memory, Concept Formation, and Hierarchical Representations: Emulating Human Cognition
Progress in **concept learning** and **hierarchical understanding** aims to emulate **human-like reasoning**:
- The **REFINE** framework employs **reinforcement learning** to **enhance long-context modeling**, enabling better **sequence prediction** and **contextual understanding**.
- **Knowledge-embedded latent projections** embed **structured knowledge** within **latent features**, yielding **semantically meaningful** and **noise-resistant representations**.
- Techniques such as **spectral concept selection** and **cross-modal representation learning** facilitate the induction of **hierarchical, abstract concepts**, allowing models to **understand relationships** and **multi-level structures** more effectively.
- These advances support AI systems capable of **abstract reasoning**, **explainability**, and **knowledge transfer**, aligning more closely with **human cognition**.
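One common spectral route to concept selection is to take the leading eigenvectors of the feature covariance as candidate concept directions, keeping enough of them to explain a chosen fraction of variance. The sketch below illustrates that generic recipe; the cited method's exact selection criterion may differ:

```python
import numpy as np

def spectral_concepts(feats, var_threshold=0.9):
    """Pick leading eigenvectors of the feature covariance as
    candidate 'concept' directions (generic spectral sketch)."""
    X = feats - feats.mean(axis=0)
    cov = X.T @ X / (len(X) - 1)
    vals, vecs = np.linalg.eigh(cov)          # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]    # re-sort descending
    ratio = np.cumsum(vals) / vals.sum()
    k = int(np.searchsorted(ratio, var_threshold)) + 1
    return vecs[:, :k], vals[:k]

rng = np.random.default_rng(2)
# 3 latent "concepts" of decreasing strength, mixed into 10-D features
Z = rng.standard_normal((500, 3)) * np.array([5.0, 2.0, 1.0])
W = rng.standard_normal((3, 10))
feats = Z @ W + 0.05 * rng.standard_normal((500, 10))
concepts, strengths = spectral_concepts(feats)  # recovers <= 3 directions
```

The returned directions are orthonormal, which makes downstream probing and attribution along each concept axis independent of the others.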
---
## Efficiency, Theoretical Foundations, and Emerging Frontiers
Research continues to focus on **model efficiency** and establishing **rigorous theoretical underpinnings**:
- **Dynamic tokenization** and **adaptive patching** methods tailor processing complexity to input demands, reducing **computational resource consumption**.
- **Modular learning frameworks** and **latent diffusion models** optimize performance at scale.
- A recent *Nature* publication, **"Orthogonal Representation Learning for Estimating Causal Quantities,"** introduces **orthogonal latent spaces** that facilitate **accurate causal effect estimation** from observational data, enhancing **robustness**, **interpretability**, and **decision reliability**—particularly under **distribution shifts**.
- The development of **fractal activation functions** provides **theoretical generalization bounds**, strengthening **model predictability** and **trustworthiness**.
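The dynamic tokenization and adaptive patching idea above can be sketched as a variance-based patch selector that spends compute only where the input has detail. This is a generic illustration, not any specific paper's scheduler; the patch size and keep fraction are arbitrary choices:

```python
import numpy as np

def select_patches(image, patch=8, keep_frac=0.25):
    """Adaptive patching sketch: split an image into patches and
    keep only the highest-variance ones, so downstream processing
    focuses on detailed regions."""
    H, W = image.shape
    ph, pw = H // patch, W // patch
    patches = image[:ph * patch, :pw * patch] \
        .reshape(ph, patch, pw, patch).swapaxes(1, 2).reshape(ph * pw, -1)
    scores = patches.var(axis=1)               # "complexity" per patch
    k = max(1, int(keep_frac * len(patches)))
    keep = np.argsort(scores)[::-1][:k]        # indices of busiest patches
    return patches[keep], keep

rng = np.random.default_rng(4)
img = np.zeros((32, 32))
img[8:16, 8:16] = rng.standard_normal((8, 8))  # detail in one region only
kept, idx = select_patches(img, patch=8, keep_frac=0.25)
# 16 patches total; the 4 kept ones are led by the textured patch
```

A real scheduler would feed only the kept patches through the expensive backbone and handle the rest with a cheap path, trading a small accuracy risk for a large compute saving.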
---
## Surprising Insights: The Power of Simple Recurrent Architectures
A particularly striking recent *Nature* study, **"A minimal recurrent neural network models the robustness of multiple procedural skills when learned simultaneously,"** demonstrates that a compact recurrent network can capture the resilience of several skills at once. This **challenges the prevailing assumption** that **massive models** are necessary for robustness.
This insight underscores the **value of model simplicity combined with strategic training**—highlighting that **compact, well-designed recurrent structures** can form **the backbone of trustworthy AI** that is both **resource-efficient** and **robust**.
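The shared-substrate setup such a study suggests—one small recurrent core with a separate readout per skill—can be sketched as follows. This is an illustrative toy model, not the paper's architecture or training procedure:

```python
import numpy as np

class MinimalRNN:
    """Tiny vanilla RNN: one shared recurrent core with a linear
    readout per skill (illustrative of a shared-substrate setup)."""
    def __init__(self, n_in, n_hidden, n_tasks, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(n_hidden)
        self.W_in = rng.uniform(-s, s, (n_hidden, n_in))
        self.W_h = rng.uniform(-s, s, (n_hidden, n_hidden))
        self.heads = rng.uniform(-s, s, (n_tasks, n_hidden))

    def forward(self, xs, task):
        h = np.zeros(self.W_h.shape[0])
        for x in xs:                        # identical dynamics for every skill
            h = np.tanh(self.W_in @ x + self.W_h @ h)
        return self.heads[task] @ h         # only the readout is task-specific

rnn = MinimalRNN(n_in=1, n_hidden=16, n_tasks=2)
seq = [np.array([v]) for v in (0.5, -0.2, 0.1)]
y0 = rnn.forward(seq, task=0)   # skill A readout
y1 = rnn.forward(seq, task=1)   # skill B readout, same shared core
```

Because all skills share `W_in` and `W_h`, simultaneous training forces the recurrent dynamics to find representations that serve every skill, which is the mechanism the study credits for robustness.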
---
## Recent Advances in Stable Reinforcement Learning: VESPO and Beyond
Complementing architectural and safety innovations, **training methodologies** are evolving to **stabilize learning processes**:
- **VESPO (Variational Sequence-Level Soft Policy Optimization)** offers a **robust framework** for **off-policy reinforcement learning** of large language models, addressing **training instability**.
- The recent work **"Adam Improves Muon"** builds on **Muon**, an **orthogonalized-momentum optimizer**, to further **enhance training stability**, helping prevent issues like **exploding and vanishing gradients** and ensuring **more reliable and efficient training** of multimodal and large-scale models.
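Orthogonalized momentum, the core idea behind Muon, replaces the raw momentum matrix with an approximation of its nearest orthogonal matrix before applying the update, typically via Newton–Schulz iteration. A minimal sketch follows, using the plain cubic iteration rather than Muon's tuned quintic coefficients; the hyperparameters are illustrative:

```python
import numpy as np

def newton_schulz_orthogonalize(M, steps=12):
    """Approximate the orthogonal polar factor of M (the U V^T of
    its SVD) via the cubic Newton-Schulz iteration. Normalizing by
    the Frobenius norm keeps all singular values in (0, 1], the
    iteration's convergence region."""
    X = M / (np.linalg.norm(M) + 1e-8)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X   # drives singular values to 1
    return X

def muon_style_step(W, grad, momentum, beta=0.95, lr=0.02):
    """One sketch update: accumulate momentum, orthogonalize it,
    then apply. Equalizing singular values gives every direction
    of the update the same scale, which aids stability."""
    momentum = beta * momentum + grad
    W = W - lr * newton_schulz_orthogonalize(momentum)
    return W, momentum
```

The iteration uses only matrix multiplies, so it is GPU-friendly and avoids an explicit (and much slower) SVD at every optimizer step.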
---
## The Newest Frontier: F-INR and Compact Scene Representation
Adding to the architectural toolkit, the **WACV 2026** paper on **F-INR (Functional Tensor Decomposition for Implicit Neural Representations)** introduces a **novel method** that **significantly improves scene modeling**:
> **"F-INR employs functional tensor decomposition techniques to generate highly efficient, scalable implicit neural representations."**
This approach **enhances the fidelity and compactness** of scene representations, making **real-time 3D perception** more practical and scalable for **autonomous navigation**, **virtual environment reconstruction**, and **dynamic scene understanding**.
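The functional decomposition idea behind F-INR can be illustrated in its simplest 2-D form: approximate a sampled field as a sum of a few separable axis factors, f(x, y) ≈ Σ_r u_r(x) v_r(y). The sketch below uses a truncated SVD on a grid as a stand-in for the learned functional factors; it is illustrative only, not the paper's method:

```python
import numpy as np

def factorize_field(grid, rank):
    """Rank-R separable approximation f(x, y) ~ sum_r u_r(x) v_r(y)
    of a sampled 2-D field via truncated SVD (a grid-based stand-in
    for the functional factors an implicit network would learn)."""
    U, S, Vt = np.linalg.svd(grid, full_matrices=False)
    return U[:, :rank] * S[:rank], Vt[:rank]   # per-axis factors

# toy "scene": a smooth rank-2 field sampled on a 64 x 64 grid
x = np.linspace(0, 1, 64)
field = np.sin(6 * x)[:, None] * np.cos(4 * x)[None, :] \
        + 0.5 * np.outer(x, 1 - x)
u, v = factorize_field(field, rank=2)
recon = u @ v                                  # exact here: field is rank 2
compression = field.size / (u.size + v.size)   # 4096 params -> 256
```

Storing per-axis factors instead of the full grid is what makes such representations compact; F-INR's contribution is learning continuous functional factors so the same idea scales to high-resolution 3-D scenes.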
---
## Calibration and Robustness Against Visual Illusions: The Role of ADCT
A recent addition to the robustness arsenal is **ADCT (Adaptive Detection and Calibration of Visual Illusions)**, which aims to **improve perceptual reliability**:
> **"ADCT: Improving Robustness and Calibration of Pattern Recognition Models Against Visual Illusions"**
Through innovative calibration techniques, ADCT **enables models to better recognize and compensate for visual illusions**, **aligning perception with human-like robustness**. This development is critical for **trustworthy visual perception**, especially in scenarios where **visual ambiguities or illusions** could otherwise compromise **system safety and interpretability**.
---
## Current Status and Implications
The landscape of trustworthy AI is now characterized by a **synergistic integration** of **robust architectures**, **grounded multimodal reasoning**, **safety measures**, **embodied understanding**, and **theoretical foundations**. These advances collectively **accelerate AI's deployment into real-world applications**—from **autonomous vehicles** and **healthcare systems** to **assistive robots**—with a strong emphasis on **trustworthiness**, **efficiency**, and **safety**.
The **surprising effectiveness** of **simple recurrent architectures**, alongside innovations like **F-INR** and **ADCT**, demonstrates that **model simplicity and targeted robustness strategies** can achieve **performance levels previously thought to require massive models**. Meanwhile, **training stability enhancements** like **VESPO** and **orthogonalized-momentum optimizers** ensure these systems are **reliable and scalable**.
As research continues, the vision of **AI systems that perceive, reason, and act reliably in complex, uncertain environments** becomes increasingly tangible—paving the way for **trustworthy, interpretable, and safe AI** that seamlessly integrates into our daily lives and critical infrastructures.