Generative AI Pulse

Talk on foundation models, scaling, and generalisation

Advancements in Foundation Models: Scaling Laws, Generalisation, and Emerging Challenges at ML in PL 2025

The field of artificial intelligence continues its rapid evolution, driven by foundation models: large, versatile architectures capable of performing a broad array of tasks across multiple domains. At ML in PL 2025, researcher Jenia Jitsev delivered a talk offering critical insights into how scaling laws shape model performance and generalisation, particularly for open models. Building on that foundation, recent research and emerging analyses are examining nuanced aspects of model evaluation, benchmarking practice, and the expanding frontier of multimodal understanding.

The Central Message: Scaling Laws as a Guide to Model Development

Jitsev’s presentation emphasized the importance of empirical scaling laws, which articulate predictable relationships between model size, training data volume, and computational resources. These laws serve as a compass for AI researchers, enabling strategic decisions about where and how to allocate resources to maximize gains.

Key Highlights:

  • Predictable Performance Trends: As models grow larger, performance tends to improve in a predictable way. The improvement follows a diminishing-returns pattern, however: each further increase in scale yields a smaller gain, and the curve eventually plateaus.
  • Practical Constraints: Recognizing the limits of scalability is essential to avoid unnecessary costs. As models reach certain thresholds, efficiency becomes paramount—prompting a shift from merely increasing size to optimizing training and inference strategies.
  • Informed Resource Allocation: These insights guide researchers in balancing model complexity, data, and compute, ensuring sustainable progress.
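
The diminishing-returns behaviour described above can be sketched with a Chinchilla-style parametric loss of the form L(N, D) = E + A/N^α + B/D^β, where N is parameter count and D is training tokens. The coefficients below are illustrative placeholders (loosely inspired by published fits), not values from the talk:

```python
# Illustrative Chinchilla-style parametric scaling law:
#   L(N, D) = E + A / N**alpha + B / D**beta
# Coefficients are illustrative placeholders, not fitted values.

def predicted_loss(n_params: float, n_tokens: float,
                   E: float = 1.69, A: float = 406.4, B: float = 410.7,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted pretraining loss for a model with n_params parameters
    trained on n_tokens tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Diminishing returns: each 10x increase in parameters (at fixed data)
# reduces the predicted loss by a smaller and smaller amount.
for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n, 1e12):.3f}")
```

Running the loop shows each tenfold jump in parameters shaving off less loss than the previous one, which is exactly why resource allocation, not raw size, becomes the binding decision.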

Enhanced Generalisation and Transfer Learning Capabilities

A core focus of the talk was on how larger open foundation models exhibit superior generalisation, especially in zero-shot and few-shot learning contexts. These models demonstrate an impressive capacity to adapt to new tasks with minimal or no task-specific training data.

Recent Findings:

  • A comparative analysis of eight large language models (LLMs) published in BMC Oral Health reports that scaling improves transfer learning, enabling models to handle tasks well beyond their original training scope.
  • Zero- and Few-Shot Performance: Larger models consistently outperform smaller counterparts in zero-shot and few-shot settings, showcasing their versatility.
  • Knowledge Transfer Across Domains: As models scale, their ability to transfer knowledge across diverse fields becomes more robust, which is vital in real-world applications where annotated data may be limited.
  • Robustness and Fairness Considerations: Scaling also opens avenues for addressing issues related to model robustness, bias, and fairness, although challenges remain in ensuring ethical deployment.
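
The zero-shot versus few-shot distinction above comes down to whether the prompt includes worked demonstrations. A minimal sketch, assuming no particular model API (the function names and example task are illustrative):

```python
# Minimal sketch of zero-shot vs. few-shot prompting.
# No specific model or API is assumed; these only build the prompt text.

def zero_shot_prompt(task: str, query: str) -> str:
    """Ask the model to perform the task with no worked examples."""
    return f"{task}\n\nInput: {query}\nOutput:"

def few_shot_prompt(task: str, examples: list, query: str) -> str:
    """Prepend a handful of input/output demonstrations before the query."""
    demos = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{task}\n\n{demos}\n\nInput: {query}\nOutput:"

examples = [("The movie was wonderful.", "positive"),
            ("I want my money back.", "negative")]
print(few_shot_prompt("Classify the sentiment of each review.",
                      examples, "A bland, forgettable film."))
```

Larger models tend to close the gap between these two regimes: they need fewer (or zero) demonstrations to infer what the task is.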

New Insights into Benchmarking and Multimodal Generalisation

Beyond performance improvements, recent developments are critically examining how we evaluate AI models and measure their true capabilities.

Rethinking Benchmarking:

  • A noteworthy article titled "AI benchmark numbers are meaningless — here’s what to look for instead" critiques the overreliance on headline performance metrics. It argues that benchmark scores can be misleading, often failing to capture real-world robustness, fairness, or efficiency.
  • Instead, the focus is shifting toward comprehensive evaluation frameworks that include adversarial robustness, interpretability, and societal impacts.
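
The gap between a headline score and real-world robustness can be made concrete with a toy evaluation that reports accuracy on clean and perturbed inputs side by side. The "model" and data below are deliberately simplistic stand-ins:

```python
# Toy sketch: reporting more than a single headline accuracy number.
# The rule-based "model" and tiny datasets are illustrative stand-ins.

def accuracy(model, data):
    """Fraction of (input, label) pairs the model gets right."""
    return sum(model(x) == y for x, y in data) / len(data)

# A deliberately brittle "model": flags any review containing "bad".
model = lambda text: "negative" if "bad" in text.lower() else "positive"

clean = [("a good film", "positive"), ("a bad film", "negative")]
# Simple perturbations (casing, typos) that a robust model should survive.
perturbed = [("a GOOD film", "positive"), ("a b4d film", "negative")]

report = {
    "clean_accuracy": accuracy(model, clean),
    "robust_accuracy": accuracy(model, perturbed),
}
print(report)  # robust accuracy drops, exposing brittleness one score hides
```

Here the clean accuracy is perfect while the perturbed accuracy collapses, which is precisely the kind of failure a single benchmark number conceals.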

Multimodal Models and Unified Benchmarks:

  • A recent study titled "UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?" explores whether multimodal, unified models genuinely enhance understanding across different sensory inputs.
  • The investigation suggests that while unified models hold promise, their progress depends heavily on benchmark design and evaluation protocols—highlighting the importance of measuring generalisation across modalities rather than relying solely on single-task metrics.
  • These efforts emphasize developing benchmarks that can holistically assess a model’s ability to integrate and process data from vision, language, and other sensory modalities, which is crucial for building truly versatile AI systems.

Significance and Future Directions

The collected insights from ML in PL 2025 underscore several key implications:

  • Strategic Scaling: Understanding scaling laws helps optimize resource investment, guiding the community toward models that balance performance with efficiency.
  • Beyond Benchmark Numbers: The AI community is increasingly aware that performance metrics alone are insufficient. Broader evaluation frameworks are necessary to assess robustness, fairness, and real-world utility.
  • Multimodal Integration: As models evolve, measuring their ability to process and understand multiple modalities will be vital. Effective unified models could revolutionize applications from autonomous systems to healthcare diagnostics.
  • Ethical and Societal Considerations: Scaling models raises ethical questions around bias, transparency, and environmental impact. Future research must integrate these considerations into model development and evaluation.

Current Status and Broader Impact

Today, the AI community stands at a pivotal crossroads—leveraging scaling laws to push performance boundaries while critically re-evaluating benchmarking practices to ensure genuine progress. As models grow larger and more capable, transparency, fairness, and sustainability become increasingly central to responsible AI development.

The insights from ML in PL 2025 highlight that progress is not solely about bigger models but about smarter evaluation, ethical deployment, and multimodal integration. These endeavors will shape the future trajectory of AI, ensuring it benefits diverse communities and aligns with societal values.


In conclusion, the ongoing research and discussions at ML in PL 2025 reaffirm that a deep understanding of scaling laws and generalisation is crucial for advancing open foundation models. By refining evaluation practices and embracing multimodal challenges, the field is moving toward creating more versatile, efficient, and ethically aligned AI systems—paving the way for innovations that are both groundbreaking and responsible.

Updated Mar 16, 2026