AI Frontier Digest

AI Frontier Digest · Jul 19 Daily Digest

Inference Efficiency Breakthroughs

🔥 Byte-Exact KV-Cache Grafting: Byte-exact KV-cache grafting enables a frozen Gemma-4-12B model to reach...

Smarter and Cheaper at Once: Byte-Exact KV-Cache Grafting Turns a Frozen Small Model into a Verified-Knowledge Flywheel

arxiv.org

GPT-5.6 Closes 30-Year Convex Optimization Gap

GPT-5.6 closed a 30-year open problem in convex optimization via a single, specialized prompt. The 10-page prompt drew on a year of prior human research, underscoring frontier models' emerging role in tackling longstanding mathematical challenges.

GPT-5.6 used a prompt to close a 30-year gap in convex optimization

news.ycombinator.com

KV-Cache Grafting Supercharges Frozen Small Models

Byte-exact KV-cache grafting turns a frozen 12B model into a verified-knowledge engine: on AIME 2025 it jumps from 80.0% to 93.3% (surpassing its 31B...

arxiv.org

Smarter and Cheaper at Once: Byte-Exact KV-Cache Grafting Turns a Frozen Small Model into a Verified-Knowledge Flywheel

Flawed Protocols Undermine Harness Evolution Claims

Automatic harness evolution for LLM agents shows no consistent gains over simple test-time scaling and limited generalization to held-out tasks. The...

Rethinking the Evaluation of Harness Evolution for Agents

arxiv.org

GRASP: Adaptive Granularity for Agentic RAG

GRASP trains agents with RL to dynamically coordinate semantic search, keyword search, and paragraph reading, retrieving sentence-level evidence only...

GRASP: GRanularity-Aware Search Policy for Agentic RAG

arxiv.org

MCP, A2A, ACP: Practical Guide to Agent Protocols

Three protocols define how agents connect:

MCP enables agents to call tools via client-server routing for structured responses.
A2A supports peer...

MCP vs A2A vs ACP: How AI Agents Actually Talk to Each ...

blog.bytebytego.com

AI Frontier Digest · Jul 18 Daily Digest

Method Advances

🔥 DeepLoop: Derives a visit-alignment coefficient that updates the residual scaling exponent from 1/4 to 1/2 for looped...

DM-RL Scaling vs On-Policy Distillation Pathologies

Distribution-matching RL targets scaling to large LLMs through direct distribution alignment.
On-policy distillation serves as an exploration...

Scaling Distribution-Matching RL to Large Language Models

arxiv.org

Open-Weight Models Close Frontier Gap

Open-weight models are rapidly matching or beating proprietary systems on agentic and engineering benchmarks.

Inkling (Thinking Machines): 975B MoE...

Multi-Agent Routing and Shared State Curb Drift in Model Swarms

Bridge-seeking routing plus history sharply cuts clique drift, reaching consensus in 14 of 18 runs versus none in 189 homophilous trials.
-...

Multi Agent Dynamics Guide Large Model Populations

aicerts.ai

VideoChat3: Fully Open Efficient Video MLLM

VideoChat3 is a fully open 4B-parameter video MLLM that addresses gaps in the generalization, efficiency, and reproducibility of prior open models.
-...

VideoChat3: Fully Open Video MLLM for Efficient and Generalist Video Understanding

arxiv.org

GPT-4 Assistant Boosts Pakistani Judges' Caseload by 6%

GPT-4 assistants enabled Pakistani judges to handle 6% more cases with no quality loss, delivering early empirical evidence of generative AI's real-world impact in a national court system.

Coupled Markov Processes Unify Image Understanding and Generation

SC-CMJP couples image understanding and generation in one framework by making each modality's transition rates depend on the other's confidence via...

Concurrent Image Understanding and Generation: Self-Correcting Coupled Markov Jump Processes

arxiv.org

DeepLoop Scales Depth via Looping Without Extra Parameters

DeepLoop scales unrolled Transformer depth by reusing a compact block stack across multiple visits, keeping parameter count fixed. It adjusts residual...

DeepLoop: Depth Scaling for Looped Transformers

arxiv.org

AI Frontier Digest · Jul 17, 2026

Frontier Model Releases

🔥 Inkling Multimodal Model: Thinking Machines released Inkling, a 975B-parameter open-source multimodal MoE model under...

Hallo4D Mitigates Hallucinations in 4D Generation

Hallo4D offers a model-agnostic generation-detection-correction framework that leverages LMMs to identify spatial and temporal inconsistencies in...

Hallo4D: Multi-Modal Hallucination Mitigation for Consistent Spatio-Temporal Generation

arxiv.org

Embodied AI Stack Expands with New Benchmarks and Simulators

Three tools signal rapid maturation of embodied AI infrastructure:

SIS-Bench introduces self-in-space evaluation for UAVs across perception, memory,...

Self in Space: Benchmarking Self-Awareness and Spatial Cognition in UAV Embodied Intelligence

arxiv.org

ShortOPD Revives Pruned LLMs for Real Generation

Structured pruning collapses free-form generation in LLMs despite solid recognition scores.
Core issue: useful outputs get demoted (not erased)...

ShortOPD: Recovering Pruned LLMs with Short-to-Long On-Policy Distillation

arxiv.org

Inkling Shifts Enterprise Options in Open-Weight Agentic AI

Agentic strength: Inkling leads Western open-weights on MCP Atlas (74.1%) and SWE-Bench Verified (77.6%), beating Nemotron 3 Ultra by wide...

Mira Murati Drops Her First AI Model After Leaving OpenAI—And It's Fully Open Source

decrypt.co

PROBE: New Benchmark Exposes LLM Code Generation Limits

PROBE delivers a multi-dimensional evaluation of LLM code generation across functional correctness, proximity to valid solutions, and code quality.

-...

Benchmarking Code Generation in Large Language Models

arxiv.org

Agent Safety and Governance Standards Accelerate
