Brendan Dolan-Gavitt
@moyix
Building offsec agents: http://xbow.com | Associate Prof, NYU Tandon (on leave) | PGP: http://keybase.io/moyix/ | MESS Lab: http://messlab.moyix.net
Incredible to have helped build the first AI system to reach #1 in the US on @Hacker0x01! We found a LOT of great bugs :D
For the first time in history, the #1 hacker in the US is an AI. (1/8)
AI really is like people in some ways. When I go to ask ChatGPT something, 90% of the time I realize the answer as I'm typing the question.
⚡️XBOW found LFI where most tools would have given up. Photo download endpoint blocked all path traversal attempts. But JavaScript analysis revealed /photo/proxy?url= - vulnerable to file:// scheme access. Successfully read a password file via proxy endpoint. Technical…
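The bug class here is a proxy endpoint that fetches a user-supplied URL without restricting the scheme, so file:// URLs reach the local filesystem even though the direct download route blocks traversal. A minimal sketch of that pattern, assuming a Flask-style app (the route names mirror the tweet, but the app itself is hypothetical, not XBOW's actual target):

```python
# Hypothetical app illustrating the bug class from the tweet above.
from flask import Flask, request, abort
import urllib.request

app = Flask(__name__)

@app.route("/photo/download")
def photo_download():
    # The obvious endpoint is hardened: path traversal attempts are rejected.
    name = request.args.get("name", "")
    if ".." in name or name.startswith("/"):
        abort(400)
    with open(f"photos/{name}", "rb") as f:
        return f.read()

@app.route("/photo/proxy")
def photo_proxy():
    # The overlooked endpoint hands the URL straight to urlopen. urllib
    # happily handles file:// URLs, so ?url=file:///etc/passwd becomes LFI.
    url = request.args.get("url", "")
    return urllib.request.urlopen(url).read()
```

With this pattern, `GET /photo/proxy?url=file:///etc/passwd` reads a local file; the fix is a scheme allowlist (http/https only) before fetching. The point of the find is that hardening the obvious endpoint doesn't help when JavaScript analysis turns up a second path to the filesystem.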
Official results are in - Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced version was able to solve 5 out of 6 problems. Incredible progress - huge congrats to @lmthang and the team! deepmind.google/discover/blog/…
thinking about cybersec evals for ai, I'm not a huge fan of current ML4VulnDiscovery benchmarks: agents reaching 80% acc in identifying vulnerable funcs does not reflect the real world. but I like CyberGym's approach: evaluating agents' capabilities in exploiting real vulns
Come and meet XBOW! Apart from the thing itself, also chat with some of the humans that are building it: @nicowaisman, @moyix, @pwntester, @niemand_sec, @djurado9, @ntrippar, @ca0s. I'd love to talk too!
Meet the #1 AI Pentester in America at BlackHat! We're bringing XBOW to Vegas — join us at booth #3257 to see it in action. #BlackHat2025
I know this is going to derail into people arguing with me about how it's really super serious and we must all abide by best security practices, but I must speak my truth

If this holds up, I may have to rethink my current conviction that AI for offsec won't work without super-strict validation/verification. Am I about to be taught a Bitter Lesson?
So what’s different? We developed new techniques that make LLMs a lot better at hard-to-verify tasks. IMO problems were the perfect challenge for this: proofs are pages long and take experts hours to grade. Compare that to AIME, where answers are simply an integer from 0 to 999.
Today, we at @OpenAI achieved a milestone that many considered years away: gold medal-level performance on the 2025 IMO with a general reasoning LLM—under the same time limits as humans, without tools. As remarkable as that sounds, it’s even more significant than the headline 🧵
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
...and that's how coincidences work: just a day after the Sonnet / Gemini Alloy post was published, the eval data from #Grok4 comes in:
- It beats the Sonnet / Gemini alloy (58% to 55%)
- But gets even better when alloyed with Sonnet itself, to a mind-blowing 67%
Fascinating... I wouldn't have expected advances like this to come from an automated penetration testing product company, but here we are :).
Albert's excellent blog post on "model alloys" – a clever technique for combining the strengths of different models without making extra queries – is live! The gains are remarkably large, taking us from 25% to 55% on some of our benchmarks.
what is your long-haired, bearded, & goated ai tech stack?
- rl: @willccbb
- data: @code_star
- security: @moyix
- …?
What if two AI models could collaborate without knowing it? Our Head of AI, Albert Ziegler, developed "model alloys": alternating between different LLMs in a single conversation. Sonnet handles some steps, Gemini others, but neither knows about the switch. Result: 55% solve…
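Mechanically, an alloy is one shared transcript with the harness alternating which model produces the next assistant turn, so the total query count stays the same as running a single model. A minimal sketch of that loop, assuming a hypothetical provider-agnostic call_model() helper (all names here are illustrative, not XBOW's actual implementation):

```python
import itertools

def call_model(model: str, messages: list[dict]) -> str:
    # Hypothetical helper: send the shared transcript to `model` and return
    # its reply. In practice this would wrap the real provider client calls.
    raise NotImplementedError

def run_alloy(task: str, models=("sonnet", "gemini"), max_turns=8):
    # One conversation, two models. Each model sees a normal transcript and
    # has no idea that a different model wrote the earlier assistant turns.
    messages = [{"role": "user", "content": task}]
    for model in itertools.islice(itertools.cycle(models), max_turns):
        reply = call_model(model, messages)
        messages.append({"role": "assistant", "content": reply})
        if "DONE" in reply:  # illustrative stopping condition
            break
        # Feed tool/environment output (or a nudge to continue) back in
        # before the other model takes over for the next step.
        messages.append({"role": "user", "content": "continue"})
    return messages
```

The design point is that the alternation lives in the harness, not the prompts: each model conditions on the other's reasoning as if it were its own earlier turns, which is where the complementary strengths combine.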