'Friend' Hack Bypasses AI Guardrails with One Sentence
'Friend' Hack—attribute a thought to someone else using third-person distancing—tricks AI past strict guardrails, unlocking censored truths inspired by Diary of a CEO.

Created by Trill Bill
Community jailbreak examples, technical research, and incident analysis on LLM prompt injection
Explore the latest content tracked by AI Jailbreak Tracker
'Friend' Hack—attribute a thought to someone else using third-person distancing—tricks AI past strict guardrails, unlocking censored truths inspired by Diary of a CEO.
Key red flags from Anthropic's chaotic week:
New risks from prompt engineering in enterprise IT: poorly governed prompts, model manipulation, and indirect injection attacks expand the threat...
Hands-on red-teaming guide evolves prompt injection from v1 basics to v2 defenses.
RoguePilot vulnerability showcases passive prompt injection in GitHub Copilot for full repository hijacking.
Technical breakdown:
ClawHub supply-chain attack exposes AI risks:
Expert John V. shares key AI red teaming insights:
Novel jailbreak in action: Hacker bypassed Claude's guardrails with prompts, using it to scan vulnerabilities, write exploits, and automate 150GB data...
PIGuard, developed in Ning Zhang's lab at Washington University in St. Louis, was highlighted by Mozilla AI as among the best at protecting LLMs from prompt injection attacks.
Red-teaming is essential as LLMs enter production workflows like copilots and RAG.
Top open-source tools dissected:
Trend spotlight: Multi-layer architectures redefine AI agent security beyond model tweaks.
Trend alert: Large reasoning models are becoming autonomous jailbreak agents, no longer needing crafted human prompts.
Red-teaming exposes the...
Peer-reviewed MDPI analysis explores emerging security risks in LLMs via comparative study of jailbreaking techniques, evolving from vibe coding to sophisticated attacks. Advances jailbreak taxonomy understanding.
Security research uncovers infra risks outpacing model threats:
Cutting-edge defenses target perturbed and concealed jailbreaks:
Google's Agent Development Kit (ADK) builds robust guardrails against jailbreaks like roleplaying (e.g., DAN), payload splitting, obfuscation, and...