AI Daily Brief

Core diffusion models, rectified flows, and acceleration methods for video and image generation

Diffusion and Video Generation

Recent advances in diffusion-based generative models have significantly expanded their capabilities across both image and video synthesis, emphasizing not only high fidelity but also efficiency and scalability. This new wave of research introduces novel diffusion architectures and rectified flow variants tailored for long video generation and high-quality images, alongside innovative inference-time scaling and acceleration methods that enable faster, more resource-efficient generation.

Innovations in Diffusion Architectures and Rectified Flows

Traditional diffusion models, while powerful, often face challenges when scaling to longer videos or higher-resolution images due to computational complexity and sampling stability issues. To address these, researchers have developed specialized diffusion architectures and rectified flow variants:

  • Rectified Flow Approaches: As discussed in the article "Why Stable Diffusion 3 Switched to Rectified Flow", rectified flow replaces the curved trajectories of standard diffusion with near-straight paths between noise and data, improving sampling stability and allowing fewer sampling steps at comparable fidelity. This consistency is especially beneficial for long video generation, where temporal coherence is critical.

  • High-Quality Image and Video Generation: New architectures are designed to handle long sequences efficiently, enabling the synthesis of detailed, consistent, and high-resolution images and extended videos. For example, the Helios model offers real-time long video generation, showcasing how architectural innovations facilitate scalable, high-quality outputs.
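To make the rectified-flow idea above concrete, here is a minimal NumPy sketch under simplifying assumptions: the model regresses a velocity field toward the constant displacement between a noise sample and a data sample, and sampling integrates that field along a straight line. The function names and the toy endpoints are illustrative, not Stable Diffusion 3's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def rectified_flow_pair(x0, x1, t):
    """Straight-line interpolant x_t = (1 - t) * x0 + t * x1.

    The regression target for the velocity network is the constant
    displacement x1 - x0, which does not depend on t."""
    xt = (1.0 - t) * x0 + t * x1
    target_velocity = x1 - x0
    return xt, target_velocity

def euler_sample(velocity_fn, x0, steps=10):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data)."""
    x, dt = x0.copy(), 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

# Toy check: with the exact velocity x1 - x0, Euler integration
# recovers the data endpoint even with very few steps.
x0 = rng.standard_normal(4)   # "noise" sample
x1 = np.ones(4)               # "data" sample
xhat = euler_sample(lambda x, t: x1 - x0, x0, steps=5)
print(np.allclose(xhat, x1))  # True
```

Because the target paths are straight, a well-trained velocity field needs far fewer integration steps than a curved diffusion trajectory, which is the core of the sampling-speed argument.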

Inference-Time Scaling and Acceleration Techniques

Efficiency remains a paramount concern, especially for deploying diffusion models in real-world applications. Recent methods focus on scaling inference and accelerating generation without retraining:

  • Inference-Time Scaling: As outlined in "Inference-Time Scaling in Diffusion Models", these techniques allocate additional compute at sampling time, for example by searching over noise candidates or denoising trajectories, to improve output quality without retraining the model.

  • Block Diffusion Acceleration: The DFlash approach significantly reduces inference latency by implementing block diffusion methods, achieving up to 6x faster inference for large language models (LLMs). This suggests that block-wise diffusion could be adapted beyond text to visual data, facilitating rapid video and image synthesis.

  • Training-Free Spatial Speedups: The Just-in-Time method introduces training-free spatial acceleration for diffusion transformers, enabling real-time, high-resolution generation on resource-constrained devices—for instance, smartphones or embedded systems—without additional training overhead.
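One common form of inference-time scaling is best-of-N sampling: run the sampler several times and keep the candidate a verifier scores highest. The sketch below illustrates that pattern with stand-in `generate` and `score` functions; both are hypothetical placeholders, not APIs from the cited works.

```python
import numpy as np

def generate(seed):
    """Stand-in for one full diffusion sampling run from a given seed."""
    return np.random.default_rng(seed).standard_normal(8)

def score(x):
    """Stand-in verifier; here it simply prefers small-norm outputs."""
    return -np.linalg.norm(x)

def best_of_n(n):
    """Inference-time scaling via best-of-N: spend more sampling
    compute, keep the candidate the verifier ranks highest."""
    candidates = [generate(seed) for seed in range(n)]
    return max(candidates, key=score)

# Spending more sampling compute can only improve the verifier score,
# since the candidate pool for n=16 contains the pool for n=1.
print(score(best_of_n(16)) >= score(best_of_n(1)))  # True
```

The same compute-for-quality trade applies with learned reward models or perceptual metrics in place of the toy scorer.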

Complementary Techniques and Practical Benefits

Beyond architectural and inference innovations, other techniques support the deployment of diffusion models for long videos and high-quality images:

  • Model Compression and Quantization: Modality-aware quantization allows large diffusion models to be compressed efficiently, making on-device generation practical without significant quality loss.

  • Physics-Informed Diffusion: Incorporating geometric and physical priors—as seen in models like DiffusionHarmonizer—ensures generated data adheres to scientific and physical constraints, improving realism and reliability, especially in applications requiring spatial and physical accuracy.
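As a rough illustration of the quantization point above, here is a minimal symmetric int8 scheme with a separate scale per weight group. Treating "modality-aware" as per-group scales (e.g., text-conditioning layers versus image layers) is an assumption for this sketch, not the cited method.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: one scale maps the
    float range onto the integer range [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
# Separate scales per group keep a wide-range group from crushing
# the precision of a narrow-range one.
groups = {
    "text": rng.standard_normal(256),
    "image": 5.0 * rng.standard_normal(256),
}
for name, w in groups.items():
    q, s = quantize_int8(w)
    err = np.abs(dequantize(q, s) - w).max()
    # Round-to-nearest bounds the error by half a quantization step.
    print(name, err <= s / 2 + 1e-6)  # True for both groups
```

Real deployments typically add finer granularity (per-channel scales, calibration data), but the storage saving (1 byte per weight instead of 4) and the bounded rounding error follow the same logic.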

Future Directions and Applications

These advancements pave the way for high-fidelity, efficient, and scalable generative systems capable of producing long, coherent videos and detailed images in real time. Potential applications include:

  • Video editing and content creation: Long, consistent video synthesis for entertainment and media.
  • Scientific visualization: Physics-informed models for molecular or material simulations.
  • Edge deployment: Real-time image and video generation on resource-limited devices, fueling innovations in AR/VR, robotics, and digital twins.

In conclusion, the integration of novel diffusion architectures, rectified flow variants, and inference-time acceleration techniques marks a significant stride toward scalable, high-quality generative AI. Together, these innovations enable real-time long video generation, high-resolution image synthesis, and efficient deployment across diverse domains.

Updated Mar 16, 2026