Enterprise agents: Glasswing/Mythos cyber, visual reasoning, Claude Code, MCP, pricing, outage, OpenClaw, Kaggle evals, GLM-5.1, CORAL, Gemini, Copilot, Nutanix, Agentic-MME
Key Questions
What cybersecurity capabilities does Anthropic's Glasswing offer?
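Glasswing detects zero-day vulnerabilities in operating systems and FFmpeg, with a reported 72% chain-detection rate, and works with partners on fixes. It is positioned as a response to the risks posed by Mythos, and it expands the role AI plays in defensive cybersecurity.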
What is Claude Mythos, and why is access limited?
Claude Mythos Preview excels at identifying zero-days, reportedly thousands of them, but access is limited because of the risk that it could be used for cyberattacks. It is available in private preview on Vertex AI. Anthropic is prioritizing safety over broad release.
What are Claude Code's features?
Claude Code supports apps, terminal use, MCP, and PDF import via Weaviate Agent Skills. It suffered an outage that affected thousands of users. Pricing adjustments also affect who can access it.
What benchmarks does GLM-5.1 lead?
GLM-5.1 scores 58.4% on SWE-Bench for long-horizon agentic coding. It is optimized for runs of 600+ iterations, and it tops open-source leaderboards.
What is Kaggle's new initiative?
Kaggle launched Benchmarks Resource Grants for AI evals, providing compute and an SDK. The grants support rigorous evaluations and signal a focus on agent benchmarks.
What agentic platforms were announced?
Nutanix offers an agentic hybrid multicloud platform, and QoderWork is a desktop AI agent for real work. Karpathy's CORAL enables multi-agent discovery. Gemini MAIA and Copilot advance enterprise use.
What is the ROI for AI infrastructure, according to I&O (infrastructure and operations) surveys?
Only 28% of AI projects fully pay off, per surveys. This underscores the challenges of total cost of ownership (TCO) and uptime. Security and evals remain key concerns.
What are AgentHazard and Agentic-MME?
AgentHazard reports a score of 73%, and Agentic-MME evaluates multimodal agents. Both highlight progress in enterprise agents, with security and benchmarks remaining an ongoing focus.
Glasswing cyber zero-days (OS/FFmpeg, 72% chains, partners) and visual reasoning; Claude Code apps, terminal, MCP, outage, pricing, and OpenClaw; Kaggle Benchmarks grants (evals compute and SDK); GLM-5.1 SWE-Bench 58.4%; Nutanix agentic hybrid; QoderWork, Weaviate, Karpathy CORAL, Qwen SAM3, Gemini MAIA, Copilot, AgentHazard 73%, Agentic-MME, I&O 28% ROI. Ongoing: TCO, security, evals, uptime, GLM, Kaggle.