Hadi Khalaf
@hskhalaf
phd agent @Harvard, working on alignment, prev @msfea_aub @HarvardEcon
How can we improve LLMs without any additional training? 🤔 The standard playbook is Best-of-N: generate N responses ➡️ use a reward model to score them ➡️ pick the best 🏆 More responses = better results... right? Well, not exactly. You might be reward hacking!…
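A minimal sketch of that Best-of-N loop, assuming hypothetical `generate` and `score` helpers rather than any particular library's API:

```python
# Best-of-N sketch: sample N candidates, score each with a reward model,
# return the highest-scoring one. `generate` and `score` are hypothetical
# stand-ins supplied by the caller.
from typing import Callable, List

def best_of_n(
    prompt: str,
    generate: Callable[[str], str],      # draws one sampled response
    score: Callable[[str, str], float],  # reward model: (prompt, response) -> scalar
    n: int = 16,
) -> str:
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    scores = [score(prompt, c) for c in candidates]
    return candidates[max(range(n), key=lambda i: scores[i])]
```

With an imperfect reward model, taking the argmax over more samples makes it more likely you pick a response the reward model overrates rather than one that is genuinely better, which is the reward-hacking concern raised above.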
@ whoever is on the google ai studio team: please fix the chat history never being saved! i cannot access most of my gemini conversations... and this has been an issue since january 🫤
I wrote a fun little article about all the ways to dodge the need for real-world robot data. I think it has a cute title. sergeylevine.substack.com/p/sporks-of-agi
It is critical for scientific integrity that we trust our measure of progress. The @lmarena_ai has become the go-to evaluation for AI progress. Our release today demonstrates the difficulty in maintaining fair evaluations on @lmarena_ai, despite best intentions.
Ever wonder how LLM developers choose their pretraining data? It’s not guesswork — all AI labs create small-scale models as experiments, but the models and their data are rarely shared. DataDecide opens up the process: 1,050 models, 30k checkpoints, 25 datasets & 10 benchmarks 🧵
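A rough illustration of that small-scale-experiment workflow (not the DataDecide code; `train_small_model` and `evaluate` are hypothetical placeholders): train a small proxy model on each candidate dataset, evaluate it on a benchmark suite, and rank datasets by mean proxy score.

```python
# Illustrative sketch of choosing pretraining data via small-scale proxies.
from statistics import mean

def rank_datasets(candidate_datasets, benchmarks, train_small_model, evaluate):
    results = {}
    for name, data in candidate_datasets.items():
        proxy = train_small_model(data)                     # small proxy model trained on this dataset
        results[name] = mean(evaluate(proxy, b) for b in benchmarks)
    # Higher mean proxy score -> better bet for large-scale pretraining
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)
```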
Does anyone like arxiv html? I immediately switch to the pdf view
Yes 👍🏼
day in the life of an AI PhD in 2025
> wake up
> new research idea (5 minutes)
> kick off related work search w/ Deep Research (15 minutes)
> set up 4 instances of Claude Code to start project (30 minutes)
> get o1 started on proof for paper (5 minutes)
> play tennis (6 hours)
On my reading list this week: "the first theoretical result on how to identify the ideal depth for safety alignment... indicating that broader ensembles can compensate for shallower alignments"!!!! arxiv.org/abs/2502.00669
Is there an LLM out there that asks follow-up questions? 😅 Would be my go-to if it exists
I used to see llama as a base model in most experiments, now qwen has taken over. Diversity in base models in experiments is much much more valuable than any hyperparam tuning or extra runs!