AI Research & Impact

Novel ML Architectures

Key Questions

What is MegaTrain?

MegaTrain enables full-precision training of 100B+ parameter LLMs on a single GPU. It overcomes hardware limitations for efficient large-scale model training.
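The source does not say how MegaTrain fits such a model on one device. A common technique for this class of problem is parameter offloading: keep all weights in host memory and stage only the active layer onto the GPU. The sketch below is a toy illustration of that pattern, not MegaTrain's actual method; all names are hypothetical.

```python
# Toy sketch of layer-wise parameter offloading, one plausible way a system
# like MegaTrain could fit a huge model on one GPU: all layer weights live in
# host memory ("cpu_store"), and a small device buffer holds exactly one
# layer at a time during the forward pass. Names are illustrative.

class OffloadedModel:
    def __init__(self, layer_weights):
        self.cpu_store = layer_weights   # all layers stay off-device
        self.device_buffer = None        # holds only the active layer

    def _stage(self, i):
        # In a real system this would be an async host-to-device copy
        # overlapped with compute on the previous layer.
        self.device_buffer = self.cpu_store[i]

    def forward(self, x):
        for i in range(len(self.cpu_store)):
            self._stage(i)               # load layer i, evicting layer i-1
            w = self.device_buffer
            x = [xi * w for xi in x]     # stand-in for the layer's compute
        return x

model = OffloadedModel([2.0, 0.5, 3.0])
print(model.forward([1.0, -1.0]))  # -> [3.0, -3.0]
```

Peak device memory here scales with one layer rather than the whole model, which is the property that makes single-GPU training of very large models conceivable.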

What does ThinkTwice optimize in LLMs?

ThinkTwice jointly optimizes LLMs for reasoning and self-refinement. It enhances logical capabilities through iterative improvement processes.
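The "iterative improvement process" described above can be sketched as a draft-critique-revise loop. This is a generic self-refinement pattern, not ThinkTwice's actual training objective; the stand-in functions are purely illustrative.

```python
# Generic reason-then-refine loop in the spirit of the description above:
# the model drafts an answer, critiques it, and revises until the critique
# finds no issues or a round budget is exhausted.

def refine(draft_fn, critique_fn, revise_fn, prompt, max_rounds=3):
    answer = draft_fn(prompt)
    for _ in range(max_rounds):
        issues = critique_fn(prompt, answer)
        if not issues:
            break                      # self-check passed; stop refining
        answer = revise_fn(prompt, answer, issues)
    return answer

# Illustrative stand-ins: "solve" 2+2 with a deliberately wrong first draft.
draft = lambda p: 3
critique = lambda p, a: [] if a == 4 else ["arithmetic is off"]
revise = lambda p, a, issues: a + 1

print(refine(draft, critique, revise, "2+2"))  # -> 4
```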

What is Cog-DRIFT?

Cog-DRIFT enables models to learn from zero-reward examples using RLVR (reinforcement learning with verifiable rewards) techniques. It advances reinforcement learning in reward-sparse environments.
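One standard way zero-reward rollouts can still carry a learning signal, used in GRPO-style RLVR pipelines, is to subtract a group-mean baseline so that failed samples receive negative advantage and actively push probability mass away. Whether Cog-DRIFT uses this mechanism is an assumption; the sketch only shows the principle.

```python
# Minimal sketch: with a group-mean baseline, zero-reward rollouts get a
# nonzero (negative) advantage, so they still produce gradient signal
# instead of being silently ignored.

def advantages(rewards):
    baseline = sum(rewards) / len(rewards)  # mean reward of the group
    return [r - baseline for r in rewards]

# Four rollouts from the same prompt; three of them earned zero reward.
rewards = [1.0, 0.0, 0.0, 0.0]
print(advantages(rewards))  # -> [0.75, -0.25, -0.25, -0.25]
```

Without the baseline, the three failed rollouts would contribute nothing; with it, they explicitly discourage the sampled behavior.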

What is TriAttention?

TriAttention uses trigonometric KV compression for efficient long reasoning. It reduces computational overhead in extended sequence processing.
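One plausible reading of "trigonometric KV compression" is projecting the cached key/value sequence onto a few low-frequency cosine basis vectors (a DCT), storing k coefficients instead of T cache entries. Whether TriAttention actually works this way is an assumption; the sketch below only demonstrates cosine-basis compression of a smooth signal.

```python
import math

# Hedged sketch: compress a length-T sequence to k cosine (DCT-II)
# coefficients, then reconstruct an approximation from those k numbers.

def dct_compress(seq, k):
    T = len(seq)
    # Unnormalized DCT-II coefficients for the first k frequencies.
    return [sum(seq[t] * math.cos(math.pi * (t + 0.5) * f / T) for t in range(T))
            for f in range(k)]

def dct_reconstruct(coeffs, T):
    k = len(coeffs)
    return [(coeffs[0] / T) + (2.0 / T) * sum(
                coeffs[f] * math.cos(math.pi * (t + 0.5) * f / T)
                for f in range(1, k))
            for t in range(T)]

cache = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]   # a smooth 6-entry "KV" signal
small = dct_compress(cache, 2)             # store 2 numbers instead of 6
approx = dct_reconstruct(small, len(cache))
print(max(abs(a - b) for a, b in zip(cache, approx)) < 0.2)  # -> True
```

The point of a trigonometric basis is that smooth sequences concentrate their energy in a few low frequencies, so most of the cache can be dropped with small reconstruction error.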

What improvements does LightThinker++ bring?

LightThinker++ advances from reasoning compression to memory management in LLMs. It optimizes resource use for complex inference tasks.

What is Self-Execution Simulation?

Self-Execution Simulation improves coding LLMs by simulating execution during training. It boosts performance on programming benchmarks.
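A natural way to turn execution simulation into a training signal is to run the candidate program on held-out inputs and reward the model when its own predicted outputs match the real execution. The specific setup below (function name `f`, the match-rate reward) is an assumption for illustration, not the paper's protocol.

```python
# Illustrative sketch: compare the model's "simulated" outputs against real
# execution of the candidate program, and return the match rate as a reward.

def execution_match_reward(program_src, predicted_outputs, test_inputs):
    env = {}
    exec(program_src, env)                 # actually run the candidate code
    f = env["f"]                           # assumed entry point name
    actual = [f(x) for x in test_inputs]
    hits = sum(p == a for p, a in zip(predicted_outputs, actual))
    return hits / len(test_inputs)

src = "def f(x):\n    return x * x\n"
# The model "simulates" f on three inputs and gets one of them wrong.
reward = execution_match_reward(src, [1, 4, 10], [1, 2, 3])
print(reward)  # -> 0.6666666666666666
```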

What is the Geometric Alignment Tax?

The Geometric Alignment Tax describes the performance cost of discrete tokenization relative to continuous geometric representations in scientific foundation models. It highlights the limitations of discrete representations on geometric tasks.

What is PLUME?

PLUME is a latent reasoning-based universal multimodal embedding model. It unifies processing across vision, language, and other modalities.

Topics covered: MegaTrain (single-GPU training of 100B+ parameter models); ThinkTwice (joint reasoning and self-refinement); MMEmb-R1 (multimodal embeddings); pruning hierarchies; In-Place Test-Time Training (TTT); Cog-DRIFT (RLVR with zero-reward examples); TriAttention (trigonometric KV compression); LightThinker++ (memory management); Self-Execution Simulation; Peking AI4Math (Anderson proofs); neuro-symbolic dual memory; Chollet and Marcus on symbolic approaches; PLUME (latent reasoning embeddings); test-time adaptation; the Geometric Alignment Tax in scientific foundation models; noisy reasoning; wetware FORCE; CoreThink; and others. A recurring theme is hybrid and efficiency-focused methods challenging pure scaling.

Sources (34)
Updated Apr 8, 2026
Source: AI Research & Impact, NBot (nbot.ai)