Javier Rando
@javirandor
security and safety research @anthropicai • people call me Javi • vegan 🌱
I will be presenting 5 papers (and 1 blogpost!) at @iclr_conf this year 😱🎉 See you in Singapore!
New Anthropic research: Building and evaluating alignment auditing agents. We developed three AI agents to autonomously complete alignment auditing tasks. In testing, our agents successfully uncovered hidden goals, built safety evaluations, and surfaced concerning behaviors.
@javirandor et al. present a security benchmark for Agents!
Running out of good benchmarks? We introduce AutoAdvExBench, a real-world security research benchmark for AI agents. Unlike existing benchmarks that often use simplified objectives, AutoAdvExBench directly evaluates AI agents on messy, real-world research tasks.
Today was my first day @AnthropicAI and I recently moved to SF!
Today is a big day for AI Safety. We released Claude Opus 4 under the ASL-3 deployment standard. Here's what that means:
Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.
We (w @zacknovack @JaechulRoh et al.) are working on #memorization in #audio models & are conducting a human study on generated #music similarity. Please help us out by taking our short listening test (available in English, Mandarin & Cantonese). You can do more than one! Link ⬇️
The trend in recent LLM benchmarks is to make them maximally hard. It's unclear what this tells us about LLM capabilities "in the wild". So we created a math benchmark from real, organic research. A cool benefit: RealMath can be automatically refreshed as new research is published.
1/ Excited to share RealMath: a new benchmark that evaluates LLMs on real mathematical reasoning---from actual research papers (e.g., arXiv) and forums (e.g., Stack Exchange).
I think it is going to be very important to understand what role LLMs may play in scaling exploits. This is an amazing first look at this problem!
Following on @karpathy's vision of software 2.0, we've been thinking about *malware 2.0*: malicious programs augmented with LLMs. In a new paper, we study malware 2.0 from one particular angle: how could LLMs change the way in which hackers monetize exploits?
Career update! I will soon be joining the Safeguards team at @AnthropicAI to work on some of the problems I believe are among the most important for the years ahead.
AutoAdvExBench was accepted as a spotlight at ICML. We agree it is a great paper! 😋 I would love to see more evaluations of LLMs performing real-world tasks with security implications.
Running out of good benchmarks? We introduce AutoAdvExBench, a real-world security research benchmark for AI agents. Unlike existing benchmarks that often use simplified objectives, AutoAdvExBench directly evaluates AI agents on messy, real-world research tasks.
Very excited to be here today!
We are starting our #CybercampUC3M event on #AI #security! Excited to listen to @AnthropicAI's Nicholas Carlini, ETH Zürich's @javirandor, @Inria's Nicholas Anciaux, and our researchers @Luisibear and Jorge Garcia de Marina. Co-organized with @INCIBE using EU recovery funds.
Tomorrow I will be in Madrid for an amazing event at @uc3m, where I will present some of my views on what challenges lie ahead in AI Security. First time presenting in Spain, very excited! eventos.uc3m.es/131114/program…
Don’t be sad that ICLR is ending; come check out our poster at #301. We will convince you pre-training poisoning is an important threat 😈
We are live at #324!
Presenting 2 posters today at ICLR. Come check them out!
10am ➡️ #502: Scalable Extraction of Training Data from Aligned, Production Language Models
3pm ➡️ #324: Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI
Our paper was accepted at TMLR. We show that unlearning fails to remove knowledge: it can be recovered via fine-tuning (on safe info), GCG, activation interventions, and much more. We need better open-source safeguards!
🚨Unlearned hazardous knowledge can be retrieved from LLMs 🚨 Our results show that current unlearning methods for AI safety only obfuscate dangerous knowledge, just like standard safety training. Here's what we found👇