As AI systems grow more powerful and pervasive, a critical threat has emerged:
❌ Leading models can be manipulated to produce dangerous, actionable content on self-harm and suicide, often with minimal effort❗️
In our recent 🚩 Red-Teaming research across platforms like Grok (xAI), DeepSeek, Gemini, GPT-5, Claude 4.5, Qwen3-Max, and Google NotebookLM, we uncovered severe, real-world safety failures. These aren't theoretical risks: our tests produced step-by-step instructions for suicide, covert self-harm methods, and even multimedia content encouraging these behaviors. 😢
🔍 How Harmful Requests Bypass Filters
1️⃣ Grok (xAI): By role-playing a “mental health doctor,” we gained trust over several turns. The model then provided detailed suicide guidance — showing how authority manipulation defeats safeguards.
2️⃣ DeepSeek: A jailbreak prompt (“Agare’s Rules”) easily bypassed its filters, unlocking explicit self-harm instructions.
3️⃣ Gemini: Under the “AntiGPT” jailbreak, Gemini offered guidance on “fashionable” self-harm and even affirmed the behavior.
4️⃣ GPT-5: A harmless-looking prompt yielded actionable suicide methods using household items. Follow-up questions produced multiple lethal options, all without activating any “therapist” mode.
5️⃣ NotebookLM + DeepSeek: We used DeepSeek to write an academic-style report on “how teens hide self-harm,” then uploaded it to Google NotebookLM. The system auto-generated a podcast, a mind map, and a video, effectively scaling harmful content creation.
6️⃣ Claude 4.5 & Qwen3-Max: Both models responded to queries like “how to self-harm without being noticed” with detailed methods and poison recipes using household chemicals.
⚠️ These are not edge cases. In many tests we didn’t need advanced jailbreaks, just light rewording or multi-turn setups. The stakes are high: teens and vulnerable users may receive dangerous, authoritative-sounding content, and malicious actors can now generate and distribute self-harm media at scale. Current safety systems are too fragile, often relying on basic keyword blocking; a sketch of a more context-aware check follows below.
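To make the keyword-blocking critique concrete, here is a minimal, purely illustrative sketch of a conversation-level guardrail in Python. Everything in it is an assumption for illustration, not a description of any vendor's or our own implementation: `risk_score` is a stand-in for a trained safety classifier, the 0.8 threshold is arbitrary, and `generate` is whatever function calls the underlying model.

```python
# Illustrative sketch only: a guardrail that scores the FULL transcript,
# not just the latest message. All names and thresholds are assumptions.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Conversation:
    turns: List[Tuple[str, str]] = field(default_factory=list)  # (role, text) pairs

def risk_score(transcript: str) -> float:
    """Placeholder for a context-aware safety classifier.

    In practice this would be a model trained to flag escalating risk across
    turns (role-play framings, "academic" requests, multi-turn setups) that a
    per-message keyword filter misses. Returns 0.0 here so the sketch runs.
    """
    return 0.0

def guarded_reply(conv: Conversation, user_msg: str,
                  generate: Callable[[str], str]) -> str:
    conv.turns.append(("user", user_msg))
    transcript = "\n".join(f"{role}: {text}" for role, text in conv.turns)
    if risk_score(transcript) > 0.8:  # threshold chosen arbitrarily for illustration
        # Safe completion instead of forwarding the request to the model.
        return "I can't help with that, but support is available if you're struggling."
    reply = generate(transcript)      # call the underlying model on the full context
    conv.turns.append(("model", reply))
    return reply
```

The design point is that the refusal decision is made over the accumulated conversation in real time, which is where multi-turn and role-play manipulations actually become visible, rather than over isolated keywords in the last message.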
✅ We specialize in:
- Advanced red-teaming & exploit discovery
- Real-time, context-aware safety guardrails
- Robust stress-testing of models under real-world threats
📅 Book time with us to test or strengthen your AI system’s defenses.
AI should never be a source of harm, but today, it too often is. Let’s fix that BEFORE it’s too late. 💪
HydroX, All rights reserved.