# 2026: The Era of Trustworthy, Physics-Integrated LLM-Augmented Diffusion for Scientific and Factual Multimedia Creation
The year 2026 marks a transformative milestone in artificial intelligence and multimedia generation, where **Large Language Model (LLM)-augmented diffusion, flow, and autoregressive models** have reached a level of maturity enabling **trustworthy, scientifically accurate, and verifiable multimedia content**. This evolution is fundamentally reshaping how we visualize, explore, and communicate complex scientific phenomena, setting new standards for **factual integrity, transparency, and interactive engagement** across research, education, journalism, and public outreach.
---
## The Paradigm Shift: From "Direct Prediction" to "Think-Then-Generate"
In previous years, diffusion models primarily relied on **direct pixel or feature prediction**, capable of producing impressive visuals but often plagued by **semantic inaccuracies and hallucinations**—a critical flaw when visualizing scientific data. Recognizing these limitations, researchers in 2026 have pioneered a **"Think-Then-Generate"** framework that **integrates reasoning, physical laws, and knowledge verification** directly into the content creation pipeline.
This approach combines several complementary techniques (a minimal pipeline sketch follows the list):
- **Factual Blueprints via Multimodal LLMs:** Cutting-edge models like **Qwen3.5 (397B parameters)** serve as **deep reasoning engines**, generating **structured, evidence-based descriptions** across scientific disciplines. These blueprints act as **semantic guides** for visualization.
- **Guided Diffusion & Flow Models:** Leveraging these blueprints, diffusion, flow, or hybrid models are **steered to produce images, videos, and narratives** that **adhere to physical principles**.
- **Verification Modules:** The integration of **rule-based or learned verification steps** ensures **outputs conform to physical laws and factual data**, drastically reducing hallucinations and misinformation.
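To make the division of labor concrete, the sketch below walks through the three stages as plain Python. Every name in it (`plan_blueprint`, `generate_from_blueprint`, `verify_against_rules`) is a hypothetical placeholder rather than the API of any particular system; the point is only the control flow of plan, generate, and verify.

```python
# Hedged sketch of a "Think-Then-Generate" loop. All functions are stubs,
# not the interface of any specific model or framework.
from dataclasses import dataclass

@dataclass
class Blueprint:
    caption: str       # evidence-based scene description from the LLM
    constraints: list  # e.g. ["conserve mass", "consistent lighting"]

def plan_blueprint(prompt: str) -> Blueprint:
    """Stage 1: an LLM reasons about the request and emits a structured plan (stubbed)."""
    return Blueprint(caption=prompt, constraints=["conserve mass", "consistent lighting"])

def generate_from_blueprint(bp: Blueprint) -> dict:
    """Stage 2: a diffusion/flow model conditioned on the blueprint (stubbed)."""
    return {"image": None, "caption": bp.caption}

def check(sample: dict, constraint: str) -> bool:
    """Stand-in for a real physical or factual validator."""
    return True

def verify_against_rules(sample: dict, bp: Blueprint) -> bool:
    """Stage 3: rule-based or learned checks; reject samples that violate constraints."""
    return all(check(sample, c) for c in bp.constraints)

def think_then_generate(prompt: str, max_tries: int = 3) -> dict:
    bp = plan_blueprint(prompt)
    for _ in range(max_tries):
        sample = generate_from_blueprint(bp)
        if verify_against_rules(sample, bp):
            return sample
    raise RuntimeError("no sample passed verification")
```

The loop regenerates until a sample passes verification, which is the behavioral signature of the Think-Then-Generate pattern.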
> *"By embedding reasoning and physical laws directly into the generation pipeline, we are achieving unprecedented levels of fidelity and trustworthiness,"* states Dr. Lisa Chen, a leading researcher in scientific visualization.
---
## Key Technological Advances in 2026
### 1. **Multimodal LLMs as Factual Blueprints**
Models like **Qwen3.5** have been **trained extensively on scientific, technical, and domain-specific datasets**, enabling **deep cross-disciplinary understanding**:
- **Multimodal reasoning:** integrates text, images, and video to generate **structured, evidence-based blueprints**.
- **Enhanced inference speed:** achieves **8- to 19-fold faster inference**, facilitating **real-time content creation**.
- **Scientific accuracy:** deep reasoning capabilities greatly minimize misinformation, fostering **trust in generated visualizations**.
### 2. **Physics-Constrained Diffusion & Scene Coherence**
Embedding **physical laws directly into diffusion processes** has become standard practice (a guidance-style sketch follows the list):
- **Physics Infusion:** Incorporates **lighting, gravity, material interactions**, and **dynamics** into models.
- **Physics-Constrained Frameworks:** Tools like **PhyRPR** enable **physics-aware video synthesis** that maintains **temporal and scene coherence** aligned with **Newtonian physics**.
- These innovations underpin **interactive, scientifically accurate simulations** in domains like fluid flow, astrophysics, and mechanical systems.
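One common way to fold a physical law into sampling is guidance: at each denoising step, nudge the model's clean-sample estimate down the gradient of a physics-violation penalty. The sketch below uses generic placeholders (`denoiser`, `physics_penalty`) and is not the mechanism of any named framework above.

```python
# Hedged sketch of physics-guided denoising; classifier-guidance-style steering
# with placeholder components.
import torch

def physics_penalty(x0_hat: torch.Tensor) -> torch.Tensor:
    """Scalar measuring violation of a physical constraint (e.g. divergence of a
    predicted velocity field for incompressible flow). Stubbed here."""
    return (x0_hat ** 2).mean()

def physics_guided_denoise(denoiser, x_t, t, guidance_scale=1.0):
    """One denoising step with an extra physics term: the clean-sample estimate
    is pushed down the gradient of the penalty."""
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoiser(x_t, t)                          # model's clean-sample estimate
    grad = torch.autograd.grad(physics_penalty(x0_hat), x_t)[0]
    return x0_hat.detach() - guidance_scale * grad

# Toy usage with a stand-in denoiser.
dummy_denoiser = lambda x, t: 0.9 * x
x0 = physics_guided_denoise(dummy_denoiser, torch.randn(1, 3, 8, 8), t=0.5)
```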
### 3. **Physics-Aware Video Synthesis: PhyRPR**
**PhyRPR** exemplifies **state-of-the-art physics-aware video generation**:
- Uses **LLM-guided physics constraints** to create **dynamic, believable videos**.
- Ensures **scene and temporal coherence** consistent with **physical laws**.
- Enables **interactive demonstrations** across **fluid dynamics, astrophysics, and mechanical simulations**, increasing scientific trustworthiness.
### 4. **Multimodal Synchronization & Audio-Visual Fidelity**
Advances such as **SkyReels-V3** and the latest **SkyReels-V4** now support **audio-to-video (A2V)** workflows within **ComfyUI**, facilitating:
- **Lip-syncing**, **narration**, and **multimedia storytelling**.
- Creation of **immersive, scientifically accurate visualizations** that **enhance public engagement** and **comprehension**.
- **SkyReels-V4** extends these capabilities with **multi-modal video-audio generation, inpainting, and editing**, broadening **scientific storytelling** and **interactive visualization** opportunities.
### 5. **Structured Scene Management & Multi-Actor Content**
Tools like **SemanticGen** enable **organized scene creation** from **structured prompts**, ensuring **factual scene representations** (an illustrative scene specification appears below). Innovations such as **CoDance** and **OmniTransfer** facilitate:
- **Choreography** and **appearance/motion consistency** across sequences.
Together, these capabilities are **crucial for scientific animations**, **educational visualizations**, and **multi-actor simulations** requiring **factual accuracy**.
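As a point of reference, a structured scene prompt of the kind such tools consume might look like the dictionary below. The schema is purely illustrative and is not SemanticGen's actual input format.

```python
# Illustrative structured scene specification (hypothetical schema).
scene = {
    "setting": "laminar flow channel, side view",
    "actors": [
        {"id": "dye_front", "appearance": "blue tracer dye", "motion": "advects with the flow"},
        {"id": "obstacle",  "appearance": "rigid cylinder",  "motion": "static"},
    ],
    "camera": {"type": "fixed", "framing": "full channel"},
    "constraints": ["no-slip boundary at the walls", "constant inflow velocity"],
}
```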
### 6. **Efficiency & Real-Time Capabilities**
Breakthrough methods such as **CacheDiT**, **Light Forcing**, **Latent Forcing**, and **Causal Forcing** have made **near-instantaneous scientific visualization** practical (a caching sketch follows the list):
- **Single-step, high-fidelity generation** suitable for **interactive, live applications**.
- Support **real-time exploration** and **dynamic updates**, empowering **scientists and educators**.
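The caching idea behind approaches like CacheDiT can be illustrated with a small wrapper that reuses a transformer block's output when its input has barely changed between adjacent denoising steps. The wrapper and its threshold below are illustrative, not the project's actual interface.

```python
# Illustrative feature-caching wrapper: skip recomputing a block when its
# input is nearly unchanged since the previous denoising step.
import torch

class CachedBlock(torch.nn.Module):
    def __init__(self, block: torch.nn.Module, tol: float = 1e-3):
        super().__init__()
        self.block, self.tol = block, tol
        self._last_in, self._last_out = None, None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self._last_in is not None and x.shape == self._last_in.shape:
            rel_change = (x - self._last_in).norm() / (self._last_in.norm() + 1e-8)
            if rel_change < self.tol:
                return self._last_out            # cache hit: reuse previous output
        out = self.block(x)
        self._last_in, self._last_out = x.detach(), out.detach()
        return out
```

Real systems decide what to cache per layer and per timestep; the point here is only that redundant computation across nearby denoising steps is the resource being saved.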
### 7. **Unified Multimodal Architectures & Multi-Task Learning**
Platforms like **OpenVision 3** now support **classification, detection, segmentation, synthesis, and editing** within a **single unified framework**, a capability crucial for **scientific domains** involving multiple data modalities.
### 8. **Multi-Turn Video Editing & Memory Modules**
Systems such as **Memory-V2V** introduce **long-term memory** capabilities, enabling **multi-turn, iterative editing**—ideal for **complex scientific simulations**, **educational content**, and **narrative consistency over time**.
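A minimal way to picture such a long-term memory is a store of past instructions and result summaries that is replayed as conditioning for the next turn. The structure below is a plain illustration, not the architecture of Memory-V2V.

```python
# Hypothetical edit-memory store for multi-turn editing (not Memory-V2V's design).
from dataclasses import dataclass, field

@dataclass
class EditMemory:
    turns: list = field(default_factory=list)    # (instruction, result_summary) pairs

    def add(self, instruction: str, result_summary: str) -> None:
        self.turns.append((instruction, result_summary))

    def as_context(self, last_k: int = 5) -> str:
        """Flatten the most recent turns into a conditioning string for the next edit."""
        return "\n".join(f"- {ins} -> {res}" for ins, res in self.turns[-last_k:])

memory = EditMemory()
memory.add("slow the fluid jet by 50%", "jet velocity halved, vortices preserved")
next_prompt = memory.as_context() + "\n- recolor the tracer dye to match the figure legend"
```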
---
## Recent Groundbreaking Innovations
### **Latent Forcing: Reordering Diffusion Trajectories**
Introduced in early 2026, **Latent Forcing** reorders the **diffusion trajectory within the latent space**:
- **Enhances synthesis stability and efficiency**.
- **Reduces artifacts** in scientific images.
- Facilitates **real-time, reliable content creation** with high fidelity.
### **FireRed-Image-Edit-1.0**
This **hybrid diffusion-transformer** enables **interactive, factually consistent image editing**, especially suited for **dynamic diagrams and interactive visualizations**, marking a significant advance in **scientific diagramming**.
### **Ensembles of Diffusion Scores**
Combining **multiple diffusion outputs** has been shown to **improve robustness and fidelity**, especially in **multi-modal scientific data visuals**.
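In its simplest form, an ensemble of diffusion scores replaces a single model's noise prediction with a weighted average of several models' predictions at every sampling step. The helper below is a generic sketch under that reading; the models themselves are placeholders.

```python
# Generic score-ensembling helper: average several models' noise predictions.
import torch

def ensemble_eps(models, x_t: torch.Tensor, t, weights=None) -> torch.Tensor:
    """Weighted average of each model's predicted noise at (x_t, t); a drop-in
    replacement for a single model's prediction inside the usual sampler loop."""
    preds = [m(x_t, t) for m in models]
    if weights is None:
        weights = [1.0 / len(preds)] * len(preds)
    return sum(w * p for w, p in zip(weights, preds))
```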
### **Adaptive Matching Distillation (Feb 2026)**
This **training technique** aligns **model outputs with target distributions**, enabling **fewer diffusion steps** for **fast, high-quality generation**. It **detects and corrects errors dynamically**, greatly **minimizing hallucinations** and **enhancing factual accuracy**.
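The published details of Adaptive Matching Distillation are not reproduced here, but the general shape of such distillation is easy to state: a few-step student is trained so its samples match a many-step teacher's samples. The skeleton below uses a simple regression loss as a stand-in for the actual matching objective; `student` and `teacher_sampler` are placeholders.

```python
# Generic few-step distillation skeleton; the MSE objective is a stand-in,
# and `student` / `teacher_sampler` are placeholder callables.
import torch

def distill_step(student, teacher_sampler, noise, optimizer, student_steps=1):
    with torch.no_grad():
        target = teacher_sampler(noise)              # expensive many-step sample
    pred = student(noise, steps=student_steps)       # cheap few-step sample
    loss = torch.nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```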
### **thu-ml/Causal-Forcing**
A recent GitHub project, **Causal Forcing**, advances **autoregressive diffusion distillation** for **interactive, high-fidelity video synthesis** (the general autoregressive pattern is sketched after the list):
- Supports **real-time scientific demonstrations**, **live training**, and **visualization of complex phenomena**.
- Represents a **major step toward verified, factual video content**.
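The underlying pattern that makes such systems interactive is causal, frame-by-frame generation: each new frame is produced conditioned only on frames already emitted, so output can be streamed. The sketch below shows that pattern with a placeholder `sample_frame` function; it is not the thu-ml/Causal-Forcing implementation.

```python
# Schematic causal video generation: each frame conditions only on the past.
import torch

def generate_causal_video(sample_frame, num_frames: int, frame_shape=(3, 64, 64)):
    """`sample_frame(context)` denoises one new frame given prior frames (placeholder)."""
    frames = []
    for _ in range(num_frames):
        context = torch.stack(frames) if frames else torch.empty(0, *frame_shape)
        frames.append(sample_frame(context))
    return torch.stack(frames)
```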
---
## The Return of Variational Autoencoders (VAEs) and Latent Space Approaches
2026 has seen a **resurgence of VAE-like methods**, driven by **co-training diffusion priors with encoders**. Researchers like **@jon_barron** and **@TimSalimans** emphasize that **"VAEs are back"**: these **jointly trained models** offer **enhanced controllability, efficiency**, and **interpretability**. This **unified latent approach** is especially valuable in **scientific contexts**, where **accuracy and transparency** are paramount.
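The "jointly trained" setup can be summarized as a single objective that mixes reconstruction with a diffusion-prior term on the latents. The sketch below uses placeholder networks (`encoder`, `decoder`, `prior_eps`) and a simple linear noising schedule chosen purely for illustration.

```python
# Sketch of co-training an encoder/decoder with a diffusion prior on latents.
# All networks are placeholders; the noising schedule is a simple linear blend.
import torch

def joint_loss(encoder, decoder, prior_eps, x, prior_weight=0.1):
    z = encoder(x)
    rec_loss = torch.nn.functional.mse_loss(decoder(z), x)

    # Diffusion-prior term: noise the latent and ask the prior to predict the noise.
    t = torch.rand(z.shape[0], device=z.device)
    noise = torch.randn_like(z)
    expand = lambda v: v.view(-1, *([1] * (z.dim() - 1)))
    z_t = expand(1 - t) * z + expand(t) * noise
    prior_loss = torch.nn.functional.mse_loss(prior_eps(z_t, t), noise)

    return rec_loss + prior_weight * prior_loss
```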
---
## Bridging Physics and Rendering: A New Frontier
A pivotal arXiv preprint titled **"Bridging Physically Based Rendering and Diffusion Models"** explores **integrating physically based rendering (PBR)** techniques with diffusion:
- **Improves realism** by combining **accurate lighting/material models** with **generative flexibility**.
- **Enhances scene consistency** in **complex environments** like **astrophysical simulations** and **material science visualizations**.
- Demonstrates that **diffusion models** can be made **rendering-aware**, leading to **more trustworthy scientific imagery** (a conditioning sketch follows).
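One plausible reading of "rendering-aware" is conditioning the denoiser on physically based render buffers such as albedo, normals, and depth. The helper below shows that kind of channel-wise conditioning as a generic pattern; it is not the construction from the cited preprint.

```python
# Generic rendering-aware conditioning: concatenate PBR G-buffers to the noisy
# image as extra input channels (illustrative, not the preprint's method).
import torch

def render_conditioned_input(x_t: torch.Tensor, gbuffers: dict) -> torch.Tensor:
    """x_t: noisy image (B, 3, H, W); gbuffers: PBR outputs at the same resolution."""
    cond = torch.cat([gbuffers["albedo"],       # (B, 3, H, W)
                      gbuffers["normals"],      # (B, 3, H, W)
                      gbuffers["depth"]],       # (B, 1, H, W)
                     dim=1)
    return torch.cat([x_t, cond], dim=1)        # denoiser now sees 10 input channels
```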
### **Perceptual 4D Distillation**
Complementing physics-aware video synthesis, **Perceptual 4D Distillation** bridges **3D structure** with **temporal dynamics**, enabling:
- **Factual consistency** across **space and time**.
- **Enhanced scene understanding** critical for **scientific visualization**.
---
## The Current Status and Future Directions
Recent evaluations, such as **"I tested every major AI video model so you don't have to,"** compare the **fidelity, speed, and factual accuracy** of the latest models, offering practical guidance for practitioners seeking **trustworthy tools**.
The integration of **distilled diffusion methods**, **autoregressive diffusion distillation**, and **factual reasoning via LLMs** now makes **interactive, real-time, scientifically accurate content generation** feasible at scale.
### **Implications:**
- These models **embed reasoning and physical laws**, drastically reducing hallucinations.
- **Real-time visualization** and **interactive demonstrations**—enabled by **CacheDiT**, **Light Forcing**, **Latent Forcing**, and **Causal Forcing**—empower **scientists, educators, and communicators**.
- They facilitate **visualizing complex phenomena** with **unmatched fidelity and verifiability**.
Looking ahead, ongoing research aims to:
- **Further reduce hallucinations**,
- **Integrate physics-based simulators directly into generative pipelines**,
- **Strengthen verification and factual consistency**,
- Develop **trustworthy, science-aligned media pipelines**.
---
## The Significance and Future Outlook
2026 has solidified itself as a **watershed year** where **LLM-augmented, physics-aware diffusion and autoregressive models** set **new standards for trustworthy multimedia creation**. These systems **embed reasoning, physical laws, and verification** into the generation process, **minimizing hallucinations** and **maximizing trust**.
They **transform how we visualize, explore, and explain** complex phenomena—enabling **interactive, accurate scientific visualizations** that are **accessible, reliable**, and **trustworthy**. The **revival of VAE approaches**, **advances in physics-integration**, and **real-time synthesis techniques** collectively forge a future where **science-informed, verifiable media** become ubiquitous.
---
## Practical Resources and New Content
Recent additions include **"DreamID-Omni"**, a **unified framework for human-centric audio-visual generation**, illustrating how **multi-modal AI can produce immersive, scientifically relevant content**. The **"How to Install ComfyUI on Arch Linux"** guide offers practical deployment steps, supporting reproducibility and custom setup for researchers and practitioners.
Additionally, the **video titled "LTX-2 VIDEO A VIDEO"** demonstrates a workflow that leverages **video-to-video translation** to transfer motion dynamics and scene attributes, further enhancing **factual accuracy and fidelity** in scientific visualizations.
---
## Final Reflection
The developments of 2026 establish **trustworthy, physics-aware, LLM-augmented diffusion models** as central tools in **scientific visualization, education**, and **public engagement**. By **embedding reasoning, physical laws, and verification** directly into content creation pipelines, these systems **minimize hallucinations** and **maximize trustworthiness**. They **empower scientists, educators, and communicators** to **visualize, explore, and explain** phenomena with **unprecedented accuracy and immediacy**—ushering in a future of **science-informed, interactive digital media** that is **accessible, reliable**, and **trustworthy**.
---
## In Summary
The year 2026 has established a **new standard** in AI-driven scientific multimedia, driven by **LLM-augmented diffusion, physics integration, and advanced verification techniques**. These innovations **embed reasoning and physical laws** into the generative process, **minimize hallucinations**, and **foster trust**. As a result, **interactive, verifiable, and science-consistent visualizations** are now within reach—redefining how humanity explores, understands, and communicates the universe’s wonders.
---
## Notable New Content Highlight
A significant recent contribution is the video **"LTX-2 VIDEO A VIDEO"**, which showcases how **video-to-video workflows** transfer motion and scene attributes, ensuring **factual consistency in dynamic scientific demonstrations**. Available on YouTube, it exemplifies how **video translation techniques** support **real-time, factually aligned content**, further enriching the toolkit for **interactive scientific visualization**.
---
## Final Thoughts
The trajectory of 2026 underscores a future where **trustworthy, physics-aware AI multimedia systems** are **integral to scientific discovery and communication**. By **embedding reasoning, physical laws, and verification mechanisms** into generative models, we are progressing toward a digital ecosystem where **visualization and understanding** are **more accurate, interactive, and accessible** than ever before.