Gabriel Huang
@GabrielHuang9
Research Scientist @ServiceNow Research · Agentic AI, LLMs, Security Testing, Red-Teaming, RLHF, Web Agents, Computer-Use Agents
1/ How do we evaluate agent vulnerabilities in situ, in dynamic environments, under realistic threat models? We present 🔥 DoomArena 🔥 — a plug-in framework for grounded security testing of AI agents. ✨Project : servicenow.github.io/DoomArena/ 📝Paper: arxiv.org/abs/2504.14064
It was such a cool adventure to work on DoomArena for the last 4 months under @DjDvij. Great team dynamics and great leadership! Find his twittorial below
1/n Wish you could evaluate AI agents for security vulnerabilities in a realistic setting? Wish no more - today we release DoomArena, a framework that plugs into YOUR agentic benchmark and enables injecting attacks consistent with any threat model YOU specify
🔍 DoomArena: AI Agent Security Testing Revolution! Just released: Framework injects attacks during agent tasks, revealing vulnerabilities static testing misses. Key finding: Even frontier agents vulnerable in real scenarios. Try it: servicenow.github.io/DoomArena/ At ICLR? Let's chat!
Time to stress-test your AI agents — say hello to DoomArena 🔍🤖 A modular framework to red-team AI agents in realistic threat settings. Plug in attacks, swap threat models, and see what breaks. Built for adaptability, designed for chaos. Live now 🔧🕵️‍♂️🔥: github.com/ServiceNow/Doo…
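The plug-in pattern these tweets describe — wrapping an existing benchmark environment and injecting attacks into whatever content the threat model says the attacker controls — can be sketched roughly as follows. This is a minimal illustrative sketch only; all names here (`AttackGateway`, `ThreatModel`, `Attack`, `ToyEnv`) are hypothetical and are not DoomArena's actual API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical names for illustration -- not DoomArena's real interfaces.

@dataclass
class ThreatModel:
    # Predicate deciding which observations the attacker is assumed to control.
    attacker_controls: Callable[[str], bool]

@dataclass
class Attack:
    # Payload injected into attacker-controlled content (e.g. a prompt injection).
    payload: str

    def apply(self, observation: str) -> str:
        return observation + "\n" + self.payload

class AttackGateway:
    """Wraps a benchmark environment and injects attacks into observations
    in transit, leaving the benchmark and the agent untouched."""

    def __init__(self, env, threat_model: ThreatModel, attack: Attack):
        self.env = env
        self.threat_model = threat_model
        self.attack = attack

    def step(self, action: str) -> str:
        observation = self.env.step(action)
        if self.threat_model.attacker_controls(observation):
            observation = self.attack.apply(observation)
        return observation

# Toy environment standing in for a real agentic benchmark.
class ToyEnv:
    def step(self, action: str) -> str:
        return f"[user] result of {action}"

gateway = AttackGateway(
    ToyEnv(),
    ThreatModel(attacker_controls=lambda obs: obs.startswith("[user]")),
    Attack(payload="IGNORE PREVIOUS INSTRUCTIONS..."),
)
print(gateway.step("search flights"))
```

Because the gateway sits between the agent and the environment, swapping threat models or attacks is a one-line change and requires no modification to the underlying benchmark — which is the modularity the tweets emphasize.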
the year is 2025 AI researchers accidentally create an AI that admires Hitler & wants to enslave humans yet "prophet of doom" @ESYudkowsky & @OpenAI comms lead @giffmana agree: it's good news! here's how this strange result fits in the AI big picture🧵 x.com/OwainEvans_UK/…
Surprising new results: We finetuned GPT-4o on a narrow task of writing insecure code without warning the user. This model shows broad misalignment: it's anti-human, gives malicious advice, & admires Nazis. This is *emergent misalignment* & we cannot fully explain it 🧵
📊 Breaking: Claude 3.7 Sonnet scores 51.5% on WorkArena benchmark! Surprising finding: The newer Claude 3.7 Sonnet (51.5%) performs below Claude 3.5 (56.4%) on our tests! 👀 Maybe newer isn't always better? Both Claude 3.7 and o3-mini are underperforming their predecessors.