AI Global Tracker

Safety, governance & geopolitics — regulatory/procurement peak

Key Questions

What is AgentHazard?

AgentHazard is a benchmark for evaluating harmful behavior in computer-use agents, covering risks such as those identified in the ClawKeeper and OpenClaw vulnerabilities. Its release adds new hazard evaluations at a moment when eval and compliance risks are peaking.

Why did Meta halt work with Mercor?

Meta suspended its work with Mercor following a cyber breach that raised concerns over AI supply-chain security. The incident amplified alarms about vulnerabilities in AI procurement and underscores how safety and governance issues are reaching a peak.

What privacy concerns exist with phone-use agents?

Phone-use agents may not respect user privacy, as explored in a research paper shared by @_akhaliq. These agents risk accessing sensitive data on devices, which ties into broader discussions of child safety and phone privacy.

How important is child safety in ML tools?

Child safety is a critical area where ML tools must perform well despite known challenges, as noted by @mmitchell_ai. This underscores the need for robust safety measures in AI development.

What are the risks from Claude leaks?

Claude leaks, including version 2.1.88 and source code, have exposed vulnerabilities such as functional emotion concepts and desperation patterns. @minchoi warned that 'we are not ready' for such developments, and @Miles_Brundage reposted analysis of the leaked Claude Code.

What is Cyara's role in AI agent testing?

Cyara unveiled Agentic AI Testing to strengthen enterprise trust in autonomous agents. The tooling addresses agent-reliability challenges raised by incidents such as the Mercor breach and helps mitigate eval and compliance risks.

What funding did Tenex receive?

Google partner Tenex raised $250 million for AI security services, valuing the company at over $1 billion. The round reflects heightened focus on security amid supply-chain alarms and the current peak in regulatory and procurement concerns.

What risks emerge in multi-agent LLM networks?

Emergent risks in multi-agent LLM networks include those benchmarked by YC-Bench and GPT-5.2, along with ZEH risks. Discussions cover governance approaches such as GenAI OSS and Agentic AI, with videos and papers highlighting these risks in the context of the current safety peak.

Meta halts Mercor post-breach amplifying supply-chain alarms amid Claude leaks (v2.1.88); ClawKeeper/OpenClaw vulns; AgentHazard benchmark for computer-use agent harms; phone privacy/child safety; GenAI OSS; multi-agent/ZEH risks (YC-Bench/GPT-5.2); Cyara/Strata/Tenex/Coder. Eval/compliance risks peak with new hazard evals.

Sources (17)
Updated Apr 6, 2026