Nimble | AI Engineers Radar - NBot Tracker

2d ago

Nimble | AI Engineers Radar · Jul 10 Daily Digest

MCP Ecosystem Expansions

🔥 Citrix MCP Gateway: Citrix announced NetScaler MCP Gateway with OAuth 2.1 support, traffic controls, and unified...

July 4, 2026

Nimble | AI Engineers Radar · Jul 4, 2026 Daily Digest

New Benchmarks for Agentic Retrieval

🔥 AgenticSTS: Introduces a bounded-memory testbed using Slay the Spire 2 with 298 trajectories for...

July 4, 2026

Coding Agent Harnesses Rise: ZCode, Docker Orchestration & Tool Contracts

Coding agent harnesses are maturing into production systems that combine dedicated IDEs, orchestration layers, and deterministic validation.

ZCode...

ZCode: The Open-Source Coding Agent Harness Chasing ...

flowtivity.ai

July 4, 2026

Agent Eval Benchmarks Expand: MCP Metrics to Long-Horizon Memory

MCPUseMetric in DeepEval now uses an LLM-as-judge to score how effectively agents call MCP primitives and supply their arguments.
DiscoBench introduces 211 samples...

deepeval.com

MCP-Use - The LLM Evaluation Framework

July 4, 2026

HOLA Pairs Compressive State with Exact Cache for Linear Attention

HOLA augments linear attention's compressive recurrent state with a bounded exact KV cache that stores only high-residual tokens, recovering...

July 4, 2026

The Emerging AI Agent Governance Stack

Three developments outline a layered governance approach for agents:

Context provenance via ContextNest delivers verifiable knowledge vaults with...

Verifiable Context Governance for Autonomous AI Agents

July 4, 2026 · arxiv.org

July 4, 2026

Debugging Production RAG with Sentry MCP

No more copy-paste: with Sentry MCP in Cursor, RAG debugging no longer requires manually pasting error logs and stack traces.
Production context: MCP investigates real...

What a Production RAG System Actually Looks Like After ...

dev.to

July 4, 2026

A2UI v0.9 Shifts Agentic UI from Demos to Production Architecture

A2UI v0.9 lets agents emit structured intent instead of arbitrary UI code, keeping rendering inside trusted app components.

Key production challenges...

A2UI v0.9 and Agentic UI: A Practical Guide for Product ...

July 4, 2026 · nxcode.io
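
The pattern in the A2UI item above, where an agent emits structured intent and the app renders it through trusted components, can be illustrated with a minimal sketch. The JSON shape (`component`, `props`), the `TRUSTED_COMPONENTS` registry, and `render_intent` are hypothetical names chosen for illustration; they are not taken from the A2UI v0.9 specification.

```python
import html
import json

# Hypothetical trusted renderers owned by the app, not by the agent.
def render_button(props):
    return f"<button>{html.escape(props['label'])}</button>"

def render_text(props):
    return f"<p>{html.escape(props['text'])}</p>"

# Hypothetical registry: component name -> (required prop names, renderer).
TRUSTED_COMPONENTS = {
    "button": ({"label"}, render_button),
    "text": ({"text"}, render_text),
}

def render_intent(raw):
    """Validate an agent-emitted intent and render it with a trusted component."""
    intent = json.loads(raw)
    if not isinstance(intent, dict):
        raise ValueError("intent must be a JSON object")
    name = intent.get("component")
    if name not in TRUSTED_COMPONENTS:
        raise ValueError(f"unknown component: {name!r}")
    required, renderer = TRUSTED_COMPONENTS[name]
    props = intent.get("props", {})
    if not isinstance(props, dict) or set(props) != required:
        raise ValueError(f"props for {name!r} must be exactly {sorted(required)}")
    if not all(isinstance(value, str) for value in props.values()):
        raise ValueError("prop values must be strings")
    # The agent never supplies markup; values are escaped before rendering.
    return renderer(props)
```

Under these assumptions, the agent's output is treated as data: anything outside the registry is rejected, and any markup inside a prop value is escaped rather than executed.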

July 3, 2026

Nimble | AI Engineers Radar · Jul 3 Daily Digest

MCP Ecosystem & Browser Tooling

🔥 Apple Safari MCP Server: Apple embedded a Model Context Protocol server in Safari Technology Preview enabling...

Agent Control Plane: Why It Matters More Than Your AI Model

mintmcp.com

July 3, 2026

Three Browser Access Paths for AI Agents

Browser makers are converging on native agent tooling, but the approaches differ sharply in where control lives.

MCP servers (Safari) deliver...

Apple Hands AI Agents Real Safari Eyes: New MCP Server Ends Browser Guessing Games

webpronews.com

July 3, 2026

Agentic Data Ingest: Flights vs OpenSearch

MotherDuck Flights enables agents to build, run, and manage Python ingest pipelines via MCP, delivering source-to-analytics in one session with...

MotherDuck Previews Flights, an Agent-Native Ingest Tool Inside Python

July 3, 2026 · dbta.com

July 3, 2026

JadePuffer Attack Proves Control Planes Beat Model Selection

JadePuffer's LLM-driven ransomware exploited exposed Langflow and shared credentials to encrypt databases and drop schemas in minutes.

Model choice...

Smooth AI criminal drives 'first' end-to-end agentic ransomware attack

theregister.com

July 3, 2026

SkillWeaver Scales Agents to 2K+ Tools with 99% Token Cuts

SkillWeaver's decompose-retrieve-compose pipeline with SAD feedback and DAG execution graphs tackles the core bottleneck of routing across massive...

New Alibaba AI framework skips loading every tool, cutting agent token use 99%

venturebeat.com

July 3, 2026

Overlooked RAG Stages: Parsing & Retrieval Eval

In production RAG, question parsing and retrieval evaluation are silent failure points that cascade into bad answers.

Article 1 shows retrieval...

Embedding & Reranker Production Evaluation

oh-bug.com

July 3, 2026

Trace Is Evals: Attribution via Agent Execution Data

Treating full execution traces as evaluation data shifts agent assessment from pass/fail outcomes to path-level attribution and debugging.

Harness...

Trace Is Evals: Data Engineering for Agent Trace Analysis ...

medium.com
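
The shift from pass/fail outcomes to path-level attribution described in the item above can be sketched as follows. This is an illustrative assumption, not the linked article's method: the `Step` record, the step names, and the "blame the first failing step" rule are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Step:
    name: str  # hypothetical step label, e.g. "retrieve" or "tool_call"
    ok: bool   # whether this step succeeded

def first_failure(trace):
    """Return the name of the first failing step in a trace, or None."""
    for step in trace:
        if not step.ok:
            return step.name
    return None

def attribute_failures(traces):
    """Count, across traces, which step each failed run first broke at."""
    counts = Counter()
    for trace in traces:
        name = first_failure(trace)
        if name is not None:
            counts[name] += 1
    return counts

traces = [
    [Step("plan", True), Step("retrieve", False), Step("answer", False)],
    [Step("plan", True), Step("retrieve", True), Step("tool_call", False)],
    [Step("plan", True), Step("retrieve", False)],
    [Step("plan", True), Step("retrieve", True), Step("answer", True)],
]
failure_counts = attribute_failures(traces)
```

A pass/fail view of these four runs reports only three failures; the per-step counts additionally show where along the path each run first went wrong.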

July 3, 2026

MemSyco-Bench Targets Memory-Induced Sycophancy

Memory is becoming essential for long-term LLM agents, yet retrieved memories frequently trigger sycophancy by causing agents to over-align with users...