AI Theory Daily · Mar 20 Daily Digest
Optimization Stability Bounds
- 🔥 Ghosts of Softmax: the paper identifies complex zeros of the softmax ("ghosts") that create singularities in...

Created by Mark
Daily curated AI research on optimization, generalization, representation learning, core ML and safety/alignment
Linearized Bregman iterations applied to sparse learning in spiking neural networks, evaluating performance on feedforward and other architectures. Key step toward sparsity guarantees in neuromorphic optimization.
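For readers unfamiliar with the method, the generic (dense, non-spiking) linearized Bregman iteration for sparse recovery can be sketched as follows; the step size `tau`, threshold `mu`, and the toy problem are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def shrink(x, mu):
    """Soft-thresholding operator: sign(x) * max(|x| - mu, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

def linearized_bregman(A, b, mu=0.1, tau=None, iters=2000):
    """Linearized Bregman iteration for min ||u||_1 s.t. A u = b.

    v accumulates gradient information; u stays exactly sparse because
    every entry of v below mu in magnitude is thresholded to zero.
    """
    _, n = A.shape
    if tau is None:
        tau = 1.0 / np.linalg.norm(A, 2) ** 2  # stable step size
    u = np.zeros(n)
    v = np.zeros(n)
    for _ in range(iters):
        v -= tau * A.T @ (A @ u - b)  # gradient step on the residual
        u = shrink(v, mu)             # sparsifying threshold
    return u

# Toy sparse-recovery problem (illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[[3, 17, 42]] = [2.0, -1.5, 3.0]
b = A @ x_true
x_hat = linearized_bregman(A, b)
```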
Key insights from 'Ghosts of Softmax' on cross-entropy singularities:
NSD leverages neural characteristic functions in the spectral domain to encode feature-structure dependencies of all orders, enabling adaptive distribution alignment via a learnable frequency sampler.
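As a minimal illustration of the spectral idea (my own sketch, not NSD's architecture): the empirical characteristic function φ(t) = E[exp(i tᵀx)] encodes dependencies of all orders, and a discrepancy between two feature distributions can be measured at sampled frequencies. NSD's frequency sampler is learnable; here `T` is a fixed random draw.

```python
import numpy as np

def empirical_cf(X, T):
    """Empirical characteristic function of sample X (n, d) at
    frequencies T (k, d): phi(t) = mean over rows of exp(i * <t, x>)."""
    return np.exp(1j * X @ T.T).mean(axis=0)  # shape (k,)

def cf_discrepancy(X, Y, T):
    """Mean squared modulus of the CF difference at frequencies T."""
    return float(np.mean(np.abs(empirical_cf(X, T) - empirical_cf(Y, T)) ** 2))

rng = np.random.default_rng(1)
T = rng.standard_normal((64, 2))          # fixed frequency sample
X = rng.standard_normal((5000, 2))        # "source" features
Y = rng.standard_normal((5000, 2)) + 2.0  # shifted "target" features
```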
Key highlights from the paper on semantic phase locking and interference in neural networks:
Reasoning in LLMs goes beyond producing direct answers to questions: it requires generating the thinking process, whether implicitly or explicitly. This insight ties inference scaling to advancing agentic systems.
Breakthrough in foundational theory: New work provides provable inference for deep neural network estimators in generalized settings, extending...
New research proposes stabilizing updates in differentially private stochastic gradient descent, evaluated on four publicly available datasets, including MNIST.
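For context, the standard DP-SGD aggregation that such work builds on clips each per-example gradient and adds Gaussian noise. A minimal numpy sketch, where the clip norm `C` and noise multiplier `sigma` are illustrative and privacy accounting is omitted:

```python
import numpy as np

def dp_sgd_step(per_example_grads, C=1.0, sigma=1.0, rng=None):
    """One DP-SGD aggregation: clip each row to L2 norm <= C,
    average over the batch, then add isotropic Gaussian noise
    with standard deviation sigma * C / batch_size."""
    rng = rng or np.random.default_rng()
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, C / np.maximum(norms, 1e-12))  # clip factor
    clipped = per_example_grads * scale
    B, d = per_example_grads.shape
    noise = rng.normal(0.0, sigma * C / B, size=d)
    return clipped.mean(axis=0) + noise

rng = np.random.default_rng(0)
grads = rng.standard_normal((32, 10)) * 5.0  # batch of raw per-example gradients
g = dp_sgd_step(grads, C=1.0, sigma=1.0, rng=rng)
```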
Visually prompted methods achieve unbiased object detection beyond frequency biases, countering models' tendency to learn dataset-specific shortcuts over generalizable features.
MoDA introduces a mechanism in which each attention head attends both to sequence KV pairs at the current layer and to depth KV pairs. A key step toward efficient Transformer scaling.
Fresh video digest (7:36) scans 5 key CS papers with diagrams:
A new analysis proves that the Muon optimizer converges reliably under heavy-tailed noise, and it extends convergence guarantees to adaptive optimizers subject to floating-point quantization, bolstering stability in noisy training regimes.
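Muon's core step orthogonalizes the momentum matrix with a quintic Newton-Schulz iteration. A sketch using the commonly published coefficients (the convergence analysis above concerns the full optimizer, not just this fragment):

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately map G to its nearest semi-orthogonal matrix U V^T
    via the quintic Newton-Schulz iteration used in Muon, with the
    widely used coefficients (3.4445, -4.7750, 2.0315)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + 1e-7)  # spectral norm <= Frobenius norm <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:                       # keep the Gram matrix small
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

rng = np.random.default_rng(0)
G = rng.standard_normal((16, 32))        # stand-in momentum matrix
O = newton_schulz_orthogonalize(G)
s = np.linalg.svd(O, compute_uv=False)   # singular values driven toward 1
```

These coefficients deliberately trade exact convergence for speed, so the singular values land near 1 rather than exactly at 1.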
New bioRxiv paper reveals mechanistic insights into training dynamics:
Grokking is modeled as a variance-limited phase transition via spectral gating and tail-index analysis of stochastic gradient noise in deep neural networks, linking to the ICML 2019 line of work on heavy-tailed spectral scaling in generalization.
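Tail-index analysis of gradient noise is commonly done with a Hill-type estimator; a generic sketch (the classic Hill estimator, not necessarily this paper's exact procedure, with Pareto noise as a stand-in for heavy-tailed gradient noise):

```python
import numpy as np

def hill_tail_index(samples, k=500):
    """Hill estimator of the tail index alpha from the k largest
    order statistics of |samples|; smaller alpha means heavier tails."""
    x = np.sort(np.abs(np.asarray(samples)))[::-1]  # descending
    log_excess = np.log(x[:k]) - np.log(x[k])       # log ratios over x_(k+1)
    return 1.0 / log_excess.mean()

# Illustrative heavy-tailed sample with known tail index alpha = 2.
rng = np.random.default_rng(0)
noise = rng.pareto(2.0, size=50_000) + 1.0  # classical Pareto on [1, inf)
alpha_hat = hill_tail_index(noise, k=1000)
```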
Consequentialist objectives in AI agents risk catastrophe, but early stopping (limiting the agent's time to learn from its environment) offers a key alignment strategy, per Gao et al. [2022].
New convergence rate analysis for a functional learning method in contextual settings, targeting cases where functions f(·) and g(·) are continuously.... Essential theory bridging nonparametric stats to contextual decision-making.