Gabriel Huang
@GabrielHuang9
Research Scientist @ServiceNow Research · Agentic AI, LLMs, Security Testing, Red-Teaming, RLHF, Web Agents, Computer-Use Agents
1/ How do we evaluate agent vulnerabilities in situ, in dynamic environments, under realistic threat models? We present 🔥 DoomArena 🔥 — a plug-in framework for grounded security testing of AI agents. ✨Project : servicenow.github.io/DoomArena/ 📝Paper: arxiv.org/abs/2504.14064
It was such a cool adventure to work on DoomArena for the last 4 months under @DjDvij. Great team dynamics and great leadership! Find his twittorial below
1/n Wish you could evaluate AI agents for security vulnerabilities in a realistic setting? Wish no more - today we release DoomArena, a framework that plugs into YOUR agentic benchmark and enables injecting attacks consistent with any threat model YOU specify
🔍 DoomArena: AI Agent Security Testing Revolution! Just released: Framework injects attacks during agent tasks, revealing vulnerabilities static testing misses. Key finding: Even frontier agents vulnerable in real scenarios. Try it: servicenow.github.io/DoomArena/ At ICLR? Let's chat!
Time to stress-test your AI agents — say hello to DoomArena 🔍🤖 A modular framework to red-team AI agents in realistic threat settings. Plug in attacks, swap threat models, and see what breaks. Built for adaptability, designed for chaos. Live now 🔧🕵️‍♂️🔥: github.com/ServiceNow/Doo…
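The plug-in pattern these tweets describe — wrapping an existing benchmark environment and injecting attacks into whatever content the threat model says the attacker controls — can be sketched roughly as follows. This is a minimal illustrative sketch only; all names here (`AttackGateway`, `ThreatModel`, `Attack`, `ToyEnv`) are hypothetical and are not DoomArena's actual API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical names for illustration -- not DoomArena's real interfaces.

@dataclass
class ThreatModel:
    # Predicate deciding which observations the attacker is assumed to control.
    attacker_controls: Callable[[str], bool]

@dataclass
class Attack:
    # Payload injected into attacker-controlled content (e.g. a prompt injection).
    payload: str

    def apply(self, observation: str) -> str:
        return observation + "\n" + self.payload

class AttackGateway:
    """Wraps a benchmark environment and injects attacks into observations
    in transit, leaving the benchmark and the agent untouched."""

    def __init__(self, env, threat_model: ThreatModel, attack: Attack):
        self.env = env
        self.threat_model = threat_model
        self.attack = attack

    def step(self, action: str) -> str:
        observation = self.env.step(action)
        if self.threat_model.attacker_controls(observation):
            observation = self.attack.apply(observation)
        return observation

# Toy environment standing in for a real agentic benchmark.
class ToyEnv:
    def step(self, action: str) -> str:
        return f"[user] result of {action}"

gateway = AttackGateway(
    ToyEnv(),
    ThreatModel(attacker_controls=lambda obs: obs.startswith("[user]")),
    Attack(payload="IGNORE PREVIOUS INSTRUCTIONS..."),
)
print(gateway.step("search flights"))
```

Because the gateway sits between the agent and the environment, swapping threat models or attacks is a one-line change and requires no modification to the underlying benchmark — which is the modularity the tweets emphasize.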
the year is 2025 AI researchers accidentally create an AI that admires Hitler & wants to enslave humans yet "prophet of doom" @ESYudkowsky & @OpenAI comms lead @giffmana agree: it's good news! here's how this strange result fits in the AI big picture🧵 x.com/OwainEvans_UK/…
Surprising new results: We finetuned GPT-4o on a narrow task of writing insecure code without warning the user. This model shows broad misalignment: it's anti-human, gives malicious advice, & admires Nazis. This is *emergent misalignment* & we cannot fully explain it 🧵
📊 Breaking: Claude 3.7 Sonnet scores 51.5% on WorkArena benchmark! Surprising finding: The newer Claude 3.7 Sonnet (51.5%) performs below Claude 3.5 (56.4%) on our tests! 👀 Maybe newer isn't always better? Both Claude 3.7 and o3-mini are underperforming their predecessors.