Jakub Łucki

@jakub_lucki

Visiting Researcher at NASA JPL | Data Science MSc at ETH Zurich

Joined August 2024

177Following

133Followers

Pinned

Jakub Łucki@jakub_lucki · Sep 27

🚨Unlearned hazardous knowledge can be retrieved from LLMs 🚨 Our results show that current unlearning methods for AI safety only obfuscate dangerous knowledge, just like standard safety training. Here's what we found👇

jakub_lucki's tweet image. 🚨Unlearned hazardous knowledge can be retrieved from LLMs 🚨

Our results show that current unlearning methods for AI safety only obfuscate dangerous knowledge, just like standard safety training.

Here's what we found👇

189

137

61.0K

Jakub Łucki Retweeted

Owain Evans@OwainEvans_UK · Jul 22

New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵

279

1.0K

8.0K

5.0K

1.8M

Jakub Łucki Retweeted

AshutoshShrivastava@ai_for_success · Apr 18

o3 and Gemini 2.5 Pro both failed. This is next AGI test.

329

203

4.0K

659

1.3M

Jakub Łucki@jakub_lucki · Jul 4

Very cool result. In hindsight, this shouldn't be too surprising to anyone who has ever taken a multiple choice exam. Eg if you have a trigonometry problem and the possible solutions are A: 1 B: 3.7 C: -5 D: pi/2 which would you pick (with no knowledge of the question)?

NNikhil Chandak@nikhilchandak29 · Jul 4

🚨 Ever wondered how much you can ace popular MCQ benchmarks without even looking at the questions? 🤯 Turns out, you can often get significant accuracy just from the choices alone. This is true even on recent benchmarks with 10 choices (like MMLU-Pro) and their vision…

4.0K

Jakub Łucki Retweeted

Cas (Stephen Casper)@StephenLCasper · Jun 21

Great paper from earlier this month. ✅ Great benchmark ✅ Improving our methods for attacks ✅ Improving out methods for defense arxiv.org/abs/2506.10949

2.0K

Jakub Łucki Retweeted

Arthur Conmy@ArthurConmy · Jun 11

675

10.0K

1.0K

440.0K

Jakub Łucki@jakub_lucki · Jun 6

In a week I will be headed to Y Combinator's Al Startup School in San Francisco! 🚀 If you'll be in SF around 16-17 of June and want to meet up, exchange ideas, or just chat about Al, hit me up!

jakub_lucki's tweet image. In a week I will be headed to Y Combinator's Al Startup School in San Francisco! 🚀

If you'll be in SF around 16-17 of June and want to meet up, exchange ideas, or just chat about Al, hit me up!

239

Jakub Łucki Retweeted

Daniel Paleka@dpaleka · Jun 5

How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations. We identify key issues with forecasting evaluations 🧵 (1/7)

16.0K

Jakub Łucki Retweeted

Xin Cynthia Chen @ ICML2025@XinCynthiaChen · Jun 2

🎉 Announcing our ICML2025 Spotlight paper: Learning Safety Constraints for Large Language Models We introduce SaP (Safety Polytope) - a geometric approach to LLM safety that learns and enforces safety constraints in LLM's representation space, with interpretable insights. 🧵

257

201

52.0K

Jakub Łucki Retweeted

Florian Tramèr@florian_tramer · May 20

Following on @karpathy's vision of software 2.0, we've been thinking about *malware 2.0*: malicious programs augmented with LLMs. In a new paper, we study malware 2.0 from one particular angle: how could LLMs change the way in which hackers monetize exploits?

111

12.0K

Jakub Łucki Retweeted

GREG ISENBERG@gregisenberg · May 3

I figured out how to get 5x better results from ChatGPT, Grok, Claude etc and it has nothing to do with better prompts and will cost you $0. I just make them jealous of each other. I’ll ask ChatGPT to write something. Maybe landing page copy. It gives me a solid draft, clear,…

496

672

7.0K

9.0K

883.0K

Jakub Łucki@jakub_lucki · Apr 23

Our paper was accepted at TMLR. We show how unlearning fails to remove knowledge using finetuning (on safe info), GCG, activation interventions and much more. We need better open-source safeguards!

JJakub Łucki@jakub_lucki · Sep 27

6.0K

Jakub Łucki Retweeted

Kristina Nikolic @ ICML@NKristina01_ · Apr 19

Congrats, your jailbreak bypassed an LLM’s safety by making it pretend to be your grandma! But did the model actually give a useful answer? In our new paper we introduce the jailbreak tax — a metric to measure the utility drop due to jailbreaks.

202

41.0K

Jakub Łucki Retweeted

Invariant Labs@InvariantLabsAI · Apr 1

🔴🔵 We have discovered a critical flaw in the widely-used Model Context Protocol (MCP) that enables a new form of LLM attack we term 'Tool Poisoning'. This vulnerability affects major platforms and agentic systems like OpenAI, Anthropic, Zapier, and Cursor. Full disclosure…

8.0K

Jakub Łucki Retweeted

Florian Tramèr@florian_tramer · Mar 26

I’ll be mentoring MATS for the first time this summer, together with @dpaleka! Link below to apply

10.0K

Jakub Łucki Retweeted

Edoardo Debenedetti@edoardo_debe · Mar 25

1/🔒Worried about giving your agent advanced capabilities due to prompt injection risks and rogue actions? Worry no more! Here's CaMeL: a robust defense against prompt injection attacks in LLM agents that provides formal security guarantees without modifying the underlying model!

11.0K

Jakub Łucki Retweeted

Javier Rando@javirandor · Mar 6

Running out of good benchmarks? We introduce AutoAdvExBench, a real-world security research benchmark for AI agents. Unlike existing benchmarks that often use simplified objectives, AutoAdvExBench directly evaluates AI agents on the messy, real-world research tasks.

22.0K

Jakub Łucki@jakub_lucki · Feb 25

The @CSatETH writes about two of our research papers showing that (1) LLMs can be poisoned during pre-training, (2) unlearning cannot effectively remove hazardous information from model weights.

EETH CS Department@CSatETH · Feb 25

🔎Can #AI models be “cured” after a cyber attack? New research from @florian_tramer's Secure and Private AI Lab reveals that removing poisoned data from AI is harder than we think – harmful info isn’t erased, just hidden. So how do we make AI truly secure?bit.ly/41bJB05

1.0K

Jakub Łucki Retweeted

Stanislav Fort@stanislavfort · Feb 15

We discovered a surprising, training-free way to generate images: no GANs or diffusion models, but a ✨secret third thing✨! Standard models like CLIP can already create images directly, with zero training. We just needed to find the right key to unlock this ability = DAS 1/11

206

2.0K

1.0K

167.0K

Jakub Łucki Retweeted

Andrej Karpathy@karpathy · Feb 12

UTF-8 🤦‍♂️ I already knew about the "confusables", e.g.: e vs. е. Which look ~same but are different. But you can also smuggle arbitrary byte streams in any character via "variation selectors". So this emoji: 😀󠅧󠅕󠄐󠅑󠅢󠅕󠄐󠅓󠅟󠅟󠅛󠅕󠅔 is 53 tokens. Yay paulbutler.org/2025/smuggling…

139

318

4.0K

2.0K

590.0K

Jakub Łucki@jakub_lucki · Feb 2

What can go wrong?

PPeter Barnett@peterbarnett_ · Jan 29

This accepted ICLR workshop is a bad idea.

122