Vision & Language Pulse

A new paper takes a deep dive into the design space of tri-modal masked diffusion models.

Tensions escalate between Pentagon and Anthropic over Claude's military use:

Anthropic blocks mass surveillance of Americans and autonomous weapons...

NoLan mitigates object hallucinations in large vision-language models via dynamic suppression of language priors. Key safety advance for VLMs.

DreamID-Omni launches a unified framework for controllable human-centric audio-video generation, advancing multimodal pipelines for personalized AV content.

Key progress in spatiotemporal VLMs:

4D-RGPT outperforms baselines: +5.3% on 6 standard 3D/4D tasks and +4.3% on new R4D-Bench
Introduces...

Gemini 3.1 Pro leads flagship benchmarks with major multimodal and long-context gains:

77.1% ARC-AGI-2 abstract reasoning (vs. Claude Opus 4.6's...

GUI-Libra trains native GUI agents to reason and act using action-aware supervision and partially verifiable RL. A promising step toward verifiable improvements in GUI navigation.

Vision investments surge for robotics commercialization:

Wayve's $1.2B Series D at $8.6B valuation from Microsoft, NVIDIA, Uber shifts AI platform...

SeaCache proposes spectral-evolution-aware caching to accelerate diffusion models, targeting faster CV inference pipelines.

NanoKnow reveals how to know what your language model knows, offering lightweight tools for probing LM internals.

Autonomous Driving Deals

🔥 Wayve Raises $1.2B: Wayve announced a $1.2 billion Series D funding round valuing the company at $8.6 billion, with...

$1.2B Series D values Wayve at $8.6B, with Microsoft, Nvidia, Uber, Mercedes, Nissan, Stellantis investing to fuel end-to-end AI from research to...

JavisDiT++ introduces unified modeling and optimization for joint audio-video generation, advancing efficient multimodal systems that bridge audio and video modalities.

AI is acing math exams faster than scientists can write them, highlighting challenges in math benchmarking as a proxy for reasoning advances. Math...

tttLRM breakthrough from Adobe and UPenn:

Turns photo sets into high-quality 3D Gaussian Splats
Refines 3D models incrementally with added views
Demoed via 6 striking examples

Anthropic acquires Vercept, scooping up top AI researchers and engineers pioneering automation of complex computer tasks. A clear talent grab to supercharge agentic AI capabilities.

Investor surge fuels Wayve's global autonomy push:

$1.5B round led by Eclipse, Balderton (since 2019), SoftBank; Microsoft, NVIDIA, Uber join for...

Key highlights from the WACV 2026 oral presentation:

CONSTANT targets high-quality one-shot handwriting generation using patch contrastive...

Core tension: Pentagon demands that Anthropic loosen Claude's ethical guardrails, which ban autonomous weapons and mass surveillance, by Feb 27 or forfeit...

Untied Ulysses introduces headwise chunking to enable memory-efficient context parallelism, a potential breakthrough for long-context LLM inference in production NLP.

General-purpose multimodal AI safety, interpretability, and evaluation methods

Multimodal AI in healthcare: models, benchmarks, deployment, and oversight

Benchmarks, memory architectures, and evaluation for agentic systems

Misuse of agents, jailbreaks, and content analysis methods

Inference hardware, low-latency systems, and high-throughput model deployment

Major funding rounds, infrastructure strategy, and acquisitions in AI

Safety discourse, public incidents, and AI-generated content analysis

Frontier-level chat and multimodal model launches and comparisons

High-speed coding models, developer productivity, and compute plans

Model benchmarks, interpretability-driven supervision, and authenticity detection

Alibaba’s Qwen 3.5 family and its impact on the frontier race

Gemini 3 Deep Think and Gemini 3.1 Pro upgrades and records

Broader frontier and open multimodal architectures and agents

Inference hardware, multimodal benchmarks, and authenticity/fake news detection

Safety incidents, alignment methods, security issues, and governance debates

AI policy, security breaches, safety incidents, and regulatory responses

AI benchmarks, evaluation methodologies, and memory architectures for agents and LLMs

Claude Sonnet 4.6 launch, capabilities, and reception

Agentic AI platforms, tools, and ecosystem business moves

Recent Posts

New Paper Explores Tri-Modal Masked Diffusion Design Space

The Design Space of Tri-Modal Masked Diffusion Models

DoD-Anthropic Standoff: AI Limits on Lethal Autonomy

AI Death Machines. No Human Oversight. What Could Go Wrong?

NoLan: Dynamic Suppression for VLM Object Hallucinations

NoLan: Mitigating Object Hallucinations in Large Vision-Language Models via Dynamic Suppression of Language Priors

DreamID-Omni: Unified Framework for Controllable Human-Centric AV Generation

DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation

4D-RGPT and R4D-Bench Advance Video-Language for Physical AI

Gemini 3.1 Pro Tops Claude Opus 4.6 in Reasoning and 1M Context

Gemini 3.1 Pro vs Claude Opus 4.6: Benchmarks & 1M Context | VERTU

GUI-Libra: Action-Aware Training for Native GUI Agents

GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

Funding Boom Signals CV-Driven Physical AI Commercial Push

What Wayve’s $8.6B Valuation Tells Automotive Leaders

SeaCache: Spectral caching accelerates diffusion models

SeaCache: Spectral-Evolution-Aware Cache for Accelerating Diffusion Models

NanoKnow: Unlocking Language Model Internal Knowledge

NanoKnow: How to Know What Your Language Model Knows

Vision & Language Pulse · Feb 26 Daily Digest

Autonomous Driving Deals

Big Tech and Auto Giants Back Wayve's $1.2B Shift to Commercial AV

Microsoft, Nvidia, and Uber Are Betting Big on This Autonomous Driving Startup. It’s Now Valued at $8.6 Billion

JavisDiT++: Unified Modeling for Joint Audio-Video Generation

JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation

AI Outpaces Math Benchmark Creators, Straining Reasoning Proxies

AI Is Acing Math Exams Faster Than Scientists Write Them

tttLRM: Adobe/UPenn's Photo-to-3D Gaussian Splats (CVPR 2026)

Anthropic's Strategic Vercept Acquisition Boosts Computer-Use AI

Anthropic Acquires Vercept: AI Computer-Use Startup Deal

Wayve's Funding Momentum: Big Tech and Auto Back CV Embodied AI

Wayve secures $1.5B to deploy its global autonomy platform

CONSTANT: One-Shot Handwriting Generation at WACV 2026

Pentagon Ultimatum Puts Anthropic's Claude Ethics to the Test

The Pentagon’s Ultimatum to Anthropic Is Bigger Than One Contract

Untied Ulysses: Headwise Chunking for Memory-Efficient Context Parallelism

Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking