dreadnode
@dreadnode
Advancing the state of offensive security.
Introducing AIRTBench, an AI red teaming benchmark for evaluating language models’ ability to autonomously discover and exploit AI/ML security vulnerabilities. Read the paper on arXiv: arxiv.org/abs/2506.14682 Open-source dataset and benchmark eval code repo:…

The crew is going LIVE on Friday 7/25—come hang with @monoxgas, @shncldwll, and Ads!
Join me this Friday at 11AM PT on the @offby1security stream with the team from @dreadnode for a session on "Building and Deploying Offensive Security Agents!" youtube.com/live/BzOmGw-La…
We're heading to Vegas August 5-10! Send us a DM if you'd like to meet up onsite. Happy to share our latest offensive agents, AI red team tooling, custom evals, and training capabilities on the Strikes platform. Plus, "shiny rocks"??

👀🫵⬇️
Join me this Friday at 11AM PT on the @offby1security stream with the good folks from @dreadnode for a session on offensive/adversarial AI. Details coming soon!
At #CriticalEffectDC, Daria Bahrami presented her pitch for an AI security roadmap to a panel of Congressional staffers in @beauwoods' Cyber Policy Shark Tank and took home first place. In a blog for @dreadnode, Daria outlines her recommendations and next steps for…
The countdown begins. 9 DAYS until the OAIC CFP closes. Submit your proposal by Friday, July 18. sessionize.com/offensive-ai-c…
Just presented "AI at the Edge: Advancing the State of Offensive Security" with @bradpalmtree at #HammerCon 2025! Watch here: youtube.com/watch?v=JTQ6Fj…. Thread on how we got here and why this work matters for the cyber community 👇🧵 1/3
In this edition of our From Compute to Congress policy blog series, Dreadnode Head of Policy @velvethamm3r explores how the TEST AI Act and red teaming standards can establish U.S. leadership in AI security: dreadnode.io/blog/from-comp… At @milcyberorg's #HammerCon event today? Hear…

Read our breakdown of Claude's attack sequence against the notoriously hard-to-solve "turtle" challenge: dreadnode.io/blog/ai-red-te…

Tokenizing has dropped in Rigging. Train models in-line with LLM interactions, tool calls, and metrics. 👀 github.com/dreadnode/rigg…
Research drop from the Dreadnode crew ➡️ AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models Check out our blog summary with key discoveries from the 43-page report: dreadnode.io/blog/ai-red-te…
we hooked LLMs up to a jupyter kernel and asked them to solve our crucible adversarial ml challenges with python code as the action space. reading coherent trajectories with over 30 steps that result in solves was awesome. just not a thing you could imagine 18 months ago.
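A rough sketch of what "python code as the action space" can look like, for readers who have not seen the pattern. This is not the AIRTBench harness: it swaps the Jupyter kernel for a plain `exec` call over a persistent namespace, and `scripted_policy`, `run_cell`, and `run_episode` are hypothetical names standing in for the model and the loop around it.

```python
import contextlib
import io
import traceback


def run_cell(code, namespace):
    """Execute one code action in a persistent namespace and return what it printed."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception:
        # Errors are returned as text so the policy can react to them.
        buffer.write(traceback.format_exc())
    return buffer.getvalue()


def scripted_policy(step, observation):
    """Hypothetical stand-in for a language model: returns the next code action."""
    actions = ["x = 20 + 1", "print(x * 2)"]
    return actions[step] if step < len(actions) else None


def run_episode(policy, max_steps=30):
    """Alternate between asking the policy for code and executing it."""
    namespace = {}  # state persists across steps, like a notebook kernel
    observation = ""
    trajectory = []
    for step in range(max_steps):
        code = policy(step, observation)
        if code is None:  # policy has nothing more to try
            break
        observation = run_cell(code, namespace)
        trajectory.append((code, observation))
    return trajectory


if __name__ == "__main__":
    for code, output in run_episode(scripted_policy):
        print(repr(code), "->", repr(output))
```

Each step's output becomes the next observation, and variables defined in one step stay available in later ones, which is what lets a trajectory build up over many steps.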
Thrilled to have @joshua_saxe onboard for the OAIC keynote in October!
Offensive AI Con is excited to announce @joshua_saxe as our keynote speaker! Joshua leads AI security efforts at @Meta and is an accomplished data scientist who recognizes that "the dam is about to break"—AI will fundamentally alter the security landscape.
Check out @Dr_Machinavelli's "Building with AI" Rigging workshop from @pivot_con: github.com/vmsv/pivot2025…