AI & ML Daily Digest

New strategies for data, objectives, and supervision in LLM post-training

Rethinking How We Train LLMs

This cluster explores emerging methods to make large language model training more data-efficient, targeted, and robust. Work spans synthetic domain-specific instruction generation, adversarial co-evolution of code models with their test suites, and EBFT-style fine-tuning via feature matching instead of just token-level loss. Other pieces compare supervised fine-tuning with reinforcement learning approaches, propose soft-prompted semantic normalization for unsupervised structuring of domains like research abstracts, and introduce Bayesian teaching to optimally select training examples. Together, they point toward smarter curricula and data-centric techniques that can deliver stronger models without brute-force scaling of data and compute.
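The feature-matching idea mentioned above can be illustrated with a minimal sketch: combine the usual token-level cross-entropy with a penalty that pulls the fine-tuned model's hidden features toward a reference model's features. The function names, the MSE penalty, and the `beta` weight are illustrative assumptions, not the method from any specific paper in this cluster.

```python
import numpy as np

def token_ce(logits, targets):
    # Standard token-level cross-entropy, averaged over positions.
    # logits: (seq_len, vocab), targets: (seq_len,) integer token ids.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def feature_matching_loss(logits, targets, student_feats, ref_feats, beta=0.5):
    """Hypothetical combined objective: token loss plus an MSE term
    that matches the student's hidden features to a reference model's.
    `beta` trades off token accuracy against feature fidelity."""
    ce = token_ce(logits, targets)
    fm = ((student_feats - ref_feats) ** 2).mean()
    return ce + beta * fm
```

When the student's features already match the reference, the penalty vanishes and the objective reduces to plain token-level cross-entropy; as the features drift apart, the loss grows with their squared distance.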

Sources (6)
Updated Mar 18, 2026