Gemini Omni & Frontier Releases

Key Questions

What new multimodal systems did Google introduce?

Google unveiled the Gemini Omni family of natively multimodal AI systems at its developer conference. Omni supports combined image, audio, video, and text inputs for video generation grounded in real-world knowledge.

What is ByteDance releasing in open-source multimodal models?

ByteDance open-sourced Lance, a 3B-parameter unified multimodal model for image and video generation that is built around multi-task synergy. It is positioned as an efficient, production-ready open-weight model.

How are efficiency improvements being pursued in frontier models?

Reporting on "slimmed-down" LLMs indicates that smaller models can reduce AI's environmental and energy footprint. On the systems side, CODA rewrites transformer blocks as GEMM-epilogue programs, and related work covers training and serving systems for foundation models.

What video-related AI advancements are covered?

Bernini introduces latent semantic planning for video diffusion models. Luma's Uni 1.1 and SenseNova-U1 are covered alongside it as related multimodal generation releases.

What does the Gemini Omni rollout include?

It features video, apps, and agent capabilities integrated with real-world knowledge. This supports omni-modal understanding across audio-visual and text inputs.

Are there advances in avoiding data filtering for pretraining?

One paper reports that LLMs pretrain better without data filtering. This challenges the conventional practice of filtering pretraining data to improve model performance.

What governance or open AI efforts are mentioned alongside frontier releases?

Forschungszentrum Jülich supports ELLIS NRW's push for open AI and foundation models. This complements efficiency-focused research in multimodal systems.

How do new papers address multimodal reasoning?

LatentOmni rethinks omni-modal understanding via unified audio-visual latent reasoning. Additional work explores when multimodal LLMs should speak or respond.

OpenAI Erdős refutation; Gemini Omni/Flash video/apps/agents rollout; ByteDance Lance 3B unified multimodal + OSS; Bernini video diffusion planning; Luma Uni 1.1; SenseNova-U1; efficiency advances.

Sources (69)

Updated May 23, 2026

Bernini: Latent Semantic Planning for Video Diffusion

How 'slimmed-down' large language models can reduce AI's ...

Lance: Unified Multimodal Modeling by Multi-Task Synergy (May 2026)

@jeremyphoward reposted: Gated DeltaNet-2 is here. 🚀 🔥 New paper: Gated DeltaNet-2: Decoupling Erase and...

Can Multimodal AI Unlock a New Path to AGI? | Caroline Ingeborn

AI Daily: ByteDance Open-Sources Unified Multimodal Large Model ...

Forschungszentrum Jülich Helps Power ELLIS NRW Push for Open AI and Foundation Models

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

Training and Serving System of Foundation Models

DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards

LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning

Google Introduces Gemini Omni Multimodal World Model At Annual Developer Conference

LLMs Pretrain Better Without Data Filtering

Introducing Gemini Omni

Beyond Words: Multimodal LLM Knows When to Speak

SenseNova U1: A multimodal model (text and image) with plenty of interesting features

What Is Gemini Omni? Google's Any-to-Any Multimodal AI

@_akhaliq reposted: Alibaba researchers present MIGA A train-free method for infinite-frame video g...

Gemini 3.5 Flash: AI Model That Thinks Fast and Acts Faster

@polynoamial reposted: 1/ Today, an internal @OpenAI model has refuted Erdős’s unit distance conjecture...

@OfficialLoganK: Gemini 3.5 Flash ranks #1 on Automation Bench (from Zapier), beating every other frontier model at a...

Multi-Stream LLMs: new paper on parallelizing/separating prompts, thinking, I/O

On the slow death of Scaling (birth of Adaption Labs) | Sara Hooker | HF ML Club India EP2

torchtune: PyTorch native post-training library

Scaling LLM Training to Thousands of GPUs | Nouamane Tazi, HuggingFace |

Exa's $2.2 billion valuation shows AI search has become a premium bet

@ammaar: We’re bringing an all new AI Studio experience to your phone! - Build apps on the go and with your ...

@aidangomez: Our first fully open source Apache 2 model :)

@sama reposted: Today, we’re sharing that a general-purpose internal @openai model achieved a br...

Testing MiniMax M2.7 via API on three real ML and coding workflows

NVIDIA AI Releases Nemotron-Labs-Diffusion: A Tri-Mode Language Model with 6× Tokens Per Forward Over Qwen3-8B

KoRe: Compact Knowledge Representations for Large ...

Google Launches Gemini 3.5 Flash, AI Model for Enterprise Agents

Google Rolls Out Gemini Omni Mixed-Input Multimodal Model

H2O.ai launches tabH2O, a foundation model that makes predictions from tabular data without any training

Google updates its Gemini app to take on ChatGPT and Claude at IO 2026

With Gemini 3.5 Flash, Google bets its next AI wave on agents, not chatbots

Decoding the Logic Behind Multimodal Language Models

Google Just Dropped Gemini 3.5 Flash — And It Beat Claude Opus 4.7 on Every Benchmark

Google’s Gemini Omni turns images, audio, and text into video — and that’s just the start

new models, a cloud agent that never sleeps, and a redesigned Gemini app

Google claims new Gemini 3.5 Flash runs 4x faster than rival frontier models

Google Says Gemini 3.5 Flash Rivals 'Large Flagship Models' For Coding And Agentic Tasks

Google Introduces Gemini Omni, a Multimodal AI That Knows the World

[2605.18714] Semantic Generative Tuning for Unified Multimodal Models

Gemini Omni is Google’s new multimodal AI model that can create and edit videos using natural

Lance: Unified Image and Video Generation Model

Post-Trained MoE Can Skip Half Experts via Self-Distillation

CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection

LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs

@omarsar0 reposted: NEW paper worth reading. GPT-5.4 nano plus a critic-comparator orchestration lo...

A technical report on Composer 2

Berkeley Lab: New MatterChat Model Helps AI to ‘See’ the Language of Science

Qwen 3.7 Preview

@adiyossLC reposted: Our paper: "LaMI: Augmenting Large Language Models via Late Multi-Image Fusion" ...

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation

Seedance 2.0: ByteDance's Multimodal Audio-Video AI Model

The rise of multimodal foundation models in medicine and ...

Uber Burns Its 2026 AI Budget In Four Months On Claude Code

Towards a general-purpose foundation model for fMRI ...

A workflow utilizing general-purpose large language models for efficient ...

Three Foundation Models, one Lakebase lifecycle

Nemotron 3 Nano Omni: Eyes and Ears for Coding Agents

Understanding the Future of AI: Foundation Models & Generative ...

Self-Distillation Enables Continual Learning [pdf]

Continuous Latent Diffusion Language Model (2605.06548) [Paper Explanation Series]

DeepSeek-V4-Flash means LLM steering is interesting again

Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models

How LLMs Are Built: Scaling Laws and Emergent AI Abilities