Streaming/4D adaptation & video editing/memory: Spatial-TTT, SAMA, LMEB + egocentric + long video + AD + med [developing]

Key Questions

What is Spatial-TTT?

Spatial-TTT reports state-of-the-art results on unbounded video tasks, using test-time training (TTT) for streaming/4D adaptation. It targets long-video processing and editing.

What are the 'Three Levels of TTT'?

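A blog post, "Three Levels of TTT", describes three levels: Test-Time Training, Meta Training, and World Models. A separate paper, "Test-Time Scaling Makes Overtraining Compute-Optimal", argues that once test-time scaling is taken into account, overtraining becomes compute-optimal.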
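The first level, Test-Time Training, can be illustrated with a generic sketch of a TTT layer. Here the hidden state is itself a small model whose weights take one gradient step on a self-supervised loss for every incoming token. This is a minimal sketch under stated assumptions: the linear hidden model, the plain reconstruction loss, the zero initialisation, and the names `ttt_stream` and `matvec` are all illustrative choices. They are not taken from the blog, Spatial-TTT, or any other tracked source.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a d x d matrix by a length-d vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def ttt_stream(tokens: List[Vector], lr: float = 0.1) -> Tuple[List[Vector], List[float]]:
    """Hypothetical TTT sketch: one gradient step on the hidden weights per token.

    Assumed setup (illustrative only): the hidden state is a linear map W (d x d),
    initialised to zero. For each token x:
      1. self-supervised loss  l(W) = 0.5 * ||W x - x||^2   (reconstruct x)
      2. gradient              dl/dW = (W x - x) x^T
      3. update                W <- W - lr * dl/dW
      4. output                z = W x, computed with the updated weights
    Returns the outputs and the loss measured before each update.
    """
    d = len(tokens[0])
    w: Matrix = [[0.0] * d for _ in range(d)]
    outputs: List[Vector] = []
    losses: List[float] = []
    for x in tokens:
        err = [p - t for p, t in zip(matvec(w, x), x)]  # W x - x
        losses.append(0.5 * sum(e * e for e in err))
        # Rank-one gradient step: W[i][j] -= lr * err[i] * x[j]
        for i in range(d):
            for j in range(d):
                w[i][j] -= lr * err[i] * x[j]
        outputs.append(matvec(w, x))
    return outputs, losses


# Feeding the same token repeatedly: the hidden weights adapt at test time,
# so the reconstruction loss shrinks with every step.
outs, losses = ttt_stream([[1.0, 0.0]] * 3, lr=0.5)
print(losses)  # [0.5, 0.125, 0.03125]
```

For a repeated token x, the reconstruction error shrinks by a factor of (1 - lr * ||x||^2) per step. This is why the loss falls geometrically in the example. Practical TTT layers use learned projections and mini-batched updates instead of this identity-view, per-token form.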

What is Colon-Bench?

Colon-Bench is a colonoscopy video-understanding benchmark for MLLMs. It covers lesion, segmentation, and VQA tasks with about 300k box annotations and evaluates 23 models, testing medical video understanding.

What is RF-DETR?

In an evaluation shared by @skalskip92, RF-DETR is the best open-source detector for aerial and satellite images, scoring 2 to 5 mAP higher than YOLO26. It uses Roboflow distillation.

What is PackForcing?

PackForcing trains on short clips yet generates long videos, combining a cache with diffusion to produce 2-minute videos on a single GPU. It targets memory efficiency in long-video generation.

Tracked: Spatial-TTT (unbounded video SOTA), Test-Time Scaling (overtraining compute-optimal), Three Levels of TTT blog (episode/meta/world), noise-TTT, LoGeR, LMEB, EgoEdit, VideoDetective, TrajLoom, PEARL (Dual-grained/PEARL-Bench), AURA (always-on), Simple Baseline Streaming Video.

New: SAMA, RPiAE, OmniWeaving, ShotStream (streaming multi-shot generation); Colon-Bench (colonoscopy MLLM video: lesion/segmentation/VQA, 300k boxes, 23 models); hierarchical liver lesion segmentation transformer; VAND 4.0 Kaputt AD; YOLO26; RF-DETR (top open-source aerial/satellite detection, +2 to 5 mAP vs YOLO26; Roboflow distillation); PackForcing (short-train long-video generation via cache/diffusion, 2-min videos on a single GPU); GenMask DiT segmentation; GEditBench v2; SpatialEdit.

Gaps: power/TTT-VLM.

Repro: TTT stacks (Test-Time Scaling, Three Levels); evals for LMEB, EgoEdit, VideoDetective, TrajLoom, PEARL, AURA, Simple Baseline, Colon-Bench, ShotStream, PackForcing, RF-DETR; energy/latency; VAND; YOLO26.

Sources (6)

Updated Apr 8, 2026

Vision Research Tracker

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

@kaiwei_chang reposted: I wrote a blog "Three Levels of TTT" — Test-Time Training, Meta Training, World ...

@_akhaliq: Test-Time Scaling Makes Overtraining Compute-Optimal paper: https://t.co/oxFgiiS8Vm https://t.co/pG...

A Hierarchical Extended Transformer Framework for Liver Lesion ...

@skalskip92: RF-DETR is the best open-source choice if you work with aerial or satellite images we evaluated RF-...

@CMHungSteven reposted: Releasing Colon-Bench A colonoscopy video understanding benchmark for MLLMs on ...
