Danny Halawi
@dannyhalawi15
AI Research
I believe in AGI, but also believe that for most use cases, model quality won't be the bottleneck. Lots of folks will have great models. Integrations will be what distinguishes the utility.
Lots of competition to develop LLMs that beat top human forecasters—& lots of temptations to make exaggerated claims. So a new Karger et al paper presents ForecastBench: a level-playing-field system designed to track human & LLM accuracy on automatically generated & regularly…
Today, we're excited to announce ForecastBench: a new benchmark for evaluating AI and human forecasting capabilities. Our research indicates that AI remains worse at forecasting than expert forecasters. 🧵 Arxiv: arxiv.org/abs/2409.19839 Website: forecastbench.org
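For context on how benchmarks like this score forecasters: the standard metric for probabilistic predictions on binary questions is the Brier score. A minimal sketch (illustrative only; the paper specifies the exact scoring rules):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilistic forecasts (0..1) and
    binary outcomes (0 = didn't happen, 1 = happened).
    Lower is better; always guessing 0.5 scores 0.25."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# A forecaster who leans the right way on three resolved questions:
print(brier_score([0.9, 0.2, 0.7], [1, 0, 1]))  # ≈ 0.047
```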
Love seeing further work on automated AI forecasting! The authors assume a knowledge cutoff of October 2023, but I prompted gpt-4o (the model I saw used in the GitHub repo) about events after that date and it knew about them. I plan to reproduce the results in this writeup on a new set of…
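A quick way to run this kind of cutoff check yourself, using the OpenAI Python client (the prompt here is a placeholder, not the writeup's exact query):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask about something after the claimed October 2023 cutoff; if the model
# answers accurately, its effective knowledge extends past that date.
resp = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,
    messages=[{
        "role": "user",
        "content": "What notable world events happened in early 2024?",
    }],
)
print(resp.choices[0].message.content)
```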
We've created a demo of an AI that can predict the future at a superhuman level (on par with groups of human forecasters working together). Consequently I think AI forecasters will soon automate most prediction markets. demo: forecast.safe.ai blog: safe.ai/blog/forecasti…
Turns out you can just book a meeting room and announce an "invited talk" about whatever you want. Here is my talk and taste test of all the goldfish cracker flavors, with a goldfish arena so we could determine the best fish. God I love the PhD.
I have written up my argument for understanding adversarial attacks in computer vision as a baby version of general AI alignment. I think that the *shape* of the problem is very similar & that we *have* to be able to solve it before tackling the A(G)I case. Link in the reply.
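For readers who haven't seen the computer-vision side of the analogy: the canonical baby example is FGSM, which perturbs an image imperceptibly yet flips the model's prediction. A minimal PyTorch sketch (model, inputs, and epsilon are placeholders):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Fast Gradient Sign Method: move each pixel by +/- epsilon in the
    direction that increases the loss. The result looks unchanged to a
    human but can flip the classifier's prediction."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0, 1).detach()  # keep pixels in valid range
```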
I'm a long time fan of @3blue1brown. It was really awesome to see my and @SenR's work on how LLMs store facts discussed in his new video! It's a gorgeously animated explainer of transformer MLP layers, and how facts may be stored in them, go check it out! youtube.com/watch?v=9-Jl0d…
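The rough picture from that line of work: a transformer MLP layer can act like a key-value memory, where rows of the up-projection match input patterns ("keys") and columns of the down-projection write facts ("values") into the residual stream. A toy sketch of that view (sizes are illustrative, roughly GPT-2 scale):

```python
import torch
import torch.nn.functional as F

d_model, d_mlp = 768, 3072

W_up = torch.randn(d_mlp, d_model) / d_model ** 0.5   # rows act as "keys"
W_down = torch.randn(d_model, d_mlp) / d_mlp ** 0.5   # columns act as "values"

def mlp(x):
    # A neuron activates when x matches its key; the activation then
    # adds that neuron's value vector back into the residual stream.
    acts = F.gelu(W_up @ x)   # (d_mlp,) per-neuron activations
    return W_down @ acts      # weighted sum of value vectors

x = torch.randn(d_model)
print(mlp(x).shape)  # torch.Size([768])
```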
At ICML, presenting this work today (w/ @aweisawei). Reach out if you wanna chat or hang out~
New paper! We introduce Covert Malicious Finetuning (CMFT), a method for jailbreaking language models via fine-tuning that avoids detection. We use our method to covertly jailbreak GPT-4 via the OpenAI finetuning API.
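One ingredient in this kind of attack is teaching the model an encoding, so that harmful fine-tuning data doesn't look harmful to a content filter. As a purely illustrative toy (not the paper's actual scheme), a fixed substitution cipher already conveys the shape of the idea:

```python
import random

random.seed(53)  # fixed seed so encoder and decoder agree

letters = list("abcdefghijklmnopqrstuvwxyz")
shuffled = letters[:]
random.shuffle(shuffled)
ENC = dict(zip(letters, shuffled))
DEC = {v: k for k, v in ENC.items()}

def encode(text: str) -> str:
    return "".join(ENC.get(c, c) for c in text.lower())

def decode(text: str) -> str:
    return "".join(DEC.get(c, c) for c in text.lower())

msg = "hello world"
assert decode(encode(msg)) == msg
print(encode(msg))  # looks like gibberish to a naive filter
```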
I have primarily switched to Claude 3.5 Sonnet and hardly use GPT-4. Anybody else?
One of the most important and well-executed papers I've read in months. They explored ~all the attacks + defenses I was most keen to see tried for getting robust finetuning APIs. I'm not sure it's possible to make finetuning APIs robust; it would be a big deal if it were.
Smart finetuning to break safety defenses 🧵📖 Read of the day, day 97: Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation, by @dannyhalawi15, @aweisawei et al from @UCBerkeley arxiv.org/pdf/2406.20053 The authors of this paper investigate how to use…
Interested in working at Anthropic? We're hosting a happy hour at ICML on July 23. Register here: lu.ma/c751eomf
One thing that I've come to deeply appreciate at Anthropic is how useful quick iteration times can be. In the current era of AI, there are so many promising ideas to try and not enough time/compute to thoroughly explore them all. At the same time, we don't want to miss out on…