Infra & efficiency innovations for scale
Key Questions
What is Microsoft Harrier 27B?
Harrier 27B is Microsoft's open-source embedding model. It ranks #1 on the MTEB leaderboard and is aimed at retrieval for agent RAG applications.
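As a rough illustration of what an embedding model does in a RAG pipeline, the sketch below ranks documents against a query by cosine similarity. The vectors are hand-written toy values, not Harrier 27B outputs, and `cosine`, `rank_by_cosine`, `DOCS`, and `QUERY` are hypothetical names used only for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_cosine(query_vec, doc_vecs, top_k=2):
    """Return the top_k (doc_id, score) pairs, highest similarity first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 3-dimensional "embeddings" (assumed values for illustration only).
DOCS = {
    "gpu_kernels": [0.9, 0.1, 0.0],
    "sparse_attention": [0.1, 0.9, 0.2],
    "rl_training": [0.0, 0.2, 0.9],
}
QUERY = [0.2, 0.8, 0.1]

TOP = rank_by_cosine(QUERY, DOCS)
for doc_id, score in TOP:
    print(f"{doc_id}: {score:.3f}")
```

In a real agent RAG setup the vectors would come from the embedding model and the search would typically run against a vector index rather than a Python dict.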
What is TriAttention?
TriAttention compresses the KV cache with a trigonometric scheme, making long reasoning more efficient and improving LLM performance on long-context tasks.
What advancements are in Olmo3?
Olmo3 moves its RL training from a synchronous setup to an asynchronous one, which improves efficiency. It is one of the releases cited as enabling long-context and RL work.
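One way to see why an asynchronous setup can be more efficient is a back-of-the-envelope timing model. The sketch below is not Olmo3's implementation; it assumes a simple two-stage pipeline in which rollout generation and training can overlap, and the function names and example timings are hypothetical.

```python
def sync_time(steps, gen_s, train_s):
    """Synchronous loop: each step waits for generation, then training."""
    return steps * (gen_s + train_s)

def async_time(steps, gen_s, train_s):
    """Idealized asynchronous pipeline (assumption): the learner trains on
    batch k while the actor generates batch k+1, so after the first
    generation the slower stage sets the pace, and the last training
    step drains the pipeline."""
    return gen_s + (steps - 1) * max(gen_s, train_s) + train_s

# Assumed example timings: 3 s to generate a batch, 1 s to train on it.
STEPS, GEN_S, TRAIN_S = 10, 3, 1
print("sync :", sync_time(STEPS, GEN_S, TRAIN_S), "s")
print("async:", async_time(STEPS, GEN_S, TRAIN_S), "s")
```

The model ignores the cost of training on slightly stale rollouts, which is the usual trade-off asynchronous RL setups have to manage.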
What is AutoKernel?
AutoKernel is an open-source framework that applies an autonomous agent loop to GPU kernel optimization for PyTorch models, with the aim of automating the generation of fast GPU code.
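The general shape of such a loop (propose a candidate, check it against a reference, benchmark it, keep the fastest correct one) can be sketched in plain Python. This is not AutoKernel's API: the candidates are ordinary CPU functions standing in for GPU kernels, the fixed candidate list stands in for an agent's proposals, and every name below is hypothetical.

```python
import time

def reference(xs):
    """Trusted baseline: sum of squares with an explicit loop."""
    total = 0
    for x in xs:
        total += x * x
    return total

def candidate_builtin(xs):
    """A proposed rewrite using the built-in sum()."""
    return sum(x * x for x in xs)

def candidate_buggy(xs):
    """A deliberately wrong proposal, to show the correctness gate."""
    return sum(xs)

def measure_seconds(fn, xs, repeats=5):
    """Best-of-N wall-clock time for one call."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(xs)
        best = min(best, time.perf_counter() - start)
    return best

def optimize(reference_fn, candidates, xs, measure=measure_seconds):
    """Keep the fastest candidate whose output matches the reference."""
    expected = reference_fn(xs)
    best_name, best_cost = "reference", measure(reference_fn, xs)
    log = []
    for name, fn in candidates:
        if fn(xs) != expected:
            log.append((name, "rejected: wrong output"))
            continue
        cost = measure(fn, xs)
        if cost < best_cost:
            best_name, best_cost = name, cost
            log.append((name, "accepted"))
        else:
            log.append((name, "kept previous best"))
    return best_name, log

DATA = list(range(1000))
CANDIDATES = [("builtin", candidate_builtin), ("buggy", candidate_buggy)]
winner, history = optimize(reference, CANDIDATES, DATA)
print(winner, history)
```

The `measure` parameter is injectable so the selection logic can be exercised with a deterministic cost function instead of real timings.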
What is HISA?
HISA provides faster sparse attention for long-context LLMs. Together with IndexCache, it is an efficiency innovation aimed at reducing the recomputation tax.
Microsoft's Harrier 27B embeddings rank #1 on MTEB for agent RAG. TriAttention, AutoKernel, HISA, and Olmo3 enable long-context and RL work. Also noted: Vera Rubin, FlashAttn-4, Ulysses, Moonwalk, and Chollet TPU. Test-time optimal amid the OSS agent push (Delangue).