Vision Research Tracker

**Streaming/4D adaptation & video editing/memory: Spatial-TTT, SAMA, LMEB + egocentric + long video + AD + med** [developing]

Key Questions

What is Spatial-TTT?

Spatial-TTT is reported as state of the art on unbounded video tasks, where the input stream has no fixed length, using streaming/4D adaptation. This makes it relevant to long-video processing and editing.

What are the 'Three Levels of TTT'?

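The "Three Levels of TTT" blog post describes TTT at three levels: Test-Time Training (episode level), Meta Training (meta level), and World Models (world level). It also discusses Test-Time Scaling and argues that it makes overtraining compute-optimal.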
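As a rough illustration of the first level, here is a minimal sketch of the general test-time training idea: the model's state is a small weight matrix that takes one gradient step on a self-supervised loss for every incoming frame. It is not the method of Spatial-TTT or of the blog post. The function name `ttt_stream`, the identity-reconstruction loss, and the learning rate are all assumptions chosen to keep the example small and dependency-free.

```python
def ttt_stream(frames, dim, lr=0.5):
    """Process a stream of feature vectors with a fast-weight matrix W.

    W is the only state carried between frames, so memory stays at
    O(dim^2) however long the stream is. (Hypothetical helper; the loss
    below is a placeholder for a real self-supervised objective.)
    """
    W = [[0.0] * dim for _ in range(dim)]
    losses = []
    for x in frames:
        # Forward pass: try to reconstruct the frame from itself.
        pred = [sum(W[i][j] * x[j] for j in range(dim)) for i in range(dim)]
        err = [pred[i] - x[i] for i in range(dim)]
        losses.append(sum(e * e for e in err) / dim)
        # Test-time update: one gradient step on 0.5 * ||W x - x||^2,
        # whose gradient with respect to W is the outer product err x^T.
        for i in range(dim):
            for j in range(dim):
                W[i][j] -= lr * err[i] * x[j]
    return W, losses


# Toy stream: two alternating "frames"; the loss shrinks as W adapts.
frames = [[1.0, 0.0], [0.0, 1.0]] * 20
W, losses = ttt_stream(frames, dim=2)
print(losses[0], losses[-1])
```

The point of the sketch is the shape of the loop: prediction and adaptation happen in the same pass, and nothing grows with the number of frames seen.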

What is Colon-Bench?

Colon-Bench is a colonoscopy video benchmark for multimodal LLMs (MLLMs). It covers lesion, segmentation, and VQA tasks, includes 300k bounding-box annotations, and evaluates 23 models. It tests medical video understanding.

What is RF-DETR?

RF-DETR is listed as the top open-source detector for aerial and satellite imagery, outperforming YOLO26 by 2 to 5 mAP points. It uses Roboflow distillation.
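For readers less familiar with the metric behind that gap, the following is a minimal sketch of how average precision (AP) and its per-class mean (mAP) can be computed from ranked detections. It assumes each detection has already been matched to ground truth at some IoU threshold, and it uses all-point interpolation. The tracker does not say which mAP variant the comparison uses, so the function names and the protocol here are illustrative assumptions, not the evaluation behind the reported numbers.

```python
def average_precision(detections, num_gt):
    """AP for one class.

    detections: list of (score, is_true_positive) pairs, already matched
                to ground truth upstream (assumed, not shown here).
    num_gt:     number of ground-truth objects for this class.
    """
    if num_gt == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    recalls, precisions = [], []
    tp = fp = 0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class: list of (detections, num_gt) pairs, one per class."""
    aps = [average_precision(dets, n) for dets, n in per_class]
    return sum(aps) / len(aps)


class_a = ([(0.9, True), (0.8, False), (0.7, True)], 2)
class_b = ([(0.9, True), (0.8, True)], 2)
print(mean_average_precision([class_a, class_b]))
```

On this 0-to-1 scale, a gap of 2 to 5 mAP points corresponds to a difference of 0.02 to 0.05.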

What is PackForcing?

PackForcing trains on short clips yet generates long videos, combining caching with diffusion to produce 2-minute videos on a single GPU. It improves memory efficiency for video generation.

Spatial-TTT unbounded video SOTA, Test-Time Scaling (overtraining optimal), Three Levels TTT blog (episode/meta/world), noise-TTT/LoGeR/LMEB/EgoEdit/VideoDetective/TrajLoom, PEARL (Dual-grained/PEARL-Bench)/AURA always-on, Simple Baseline Streaming Video.

New: SAMA/RPiAE/OmniWeaving/ShotStream streaming multi-shot gen, Colon-Bench (colonoscopy MLLM video: lesion/seg/VQA, 300k boxes, 23 models), hierarchical liver lesion seg transformer, VAND 4.0 Kaputt AD, YOLO26, RF-DETR (top OSS aerial/sat detection +2-5 mAP vs YOLO26; Roboflow distil), PackForcing short-train long-video cache/diffusion (2-min single GPU), GenMask DiT seg, GEditBench v2/SpatialEdit.

Gaps: power/TTT-VLM.

Repro: TTT stacks (Test-Time Scaling/Three Levels), LMEB/EgoEdit/VideoDetective/TrajLoom/PEARL/AURA/Simple Baseline/Colon-Bench/ShotStream/PackForcing/RF-DETR evals, energy/latency/VAND/YOLO26.

Sources (6)
Updated Apr 8, 2026