Haidar Khan

@haidarkk1

Research Scientist (Currently @Meta, previously @SDAIA_SA, @Amazon). PhD CS @rpi. Scale smarter, not harder. Opinions generated from an independent LLM.

Atlanta, GA

Joined January 2017

132 Following

297 Followers

Pinned

Haidar Khan@haidarkk1 · Apr 21

Here is some news I'm happy to share before the @iclr_conf FOMO really starts to set in 😢. We have been playing with the idea of using games as LLM evals (pun intended) for a while now and it's finally ready! ZeroSumEval is a scalable evaluation methodology that pits models…

Haidar Khan Retweeted

Naval@naval · Jun 4

Acquiring knowledge is easy, the hard part is knowing what to apply and when. That’s why all true learning is “on the job.” Life is lived in the arena.

Haidar Khan Retweeted

clem 🤗@ClementDelangue · May 17

I don't get why so many people are dunking on Meta. Building frontier AI models is freaking hard & they have consistently released 10 times more openly than others, which is a game-changer for the field. What about all the other big tech & startups with much more AI resources…

Haidar Khan@haidarkk1 · May 15

So I'm not just inside the DSPy bubble? Seeing one DSPy post after another...

DSPy@DSPyOSS · May 15

guys we haven’t even released 3.0 yet

Haidar Khan@haidarkk1 · May 13

I’ve been to many wild places in the world but the beauty of the American wilderness is second to none. (Picture taken with Ray-Ban Meta)

Haidar Khan@haidarkk1 · May 1

MRBs lookin fly ngl…

Haidar Khan@haidarkk1 · Apr 30

Food for thought for those building benchmarks and leaderboards...

Sara Hooker@sarahookr · Apr 30

It is critical for scientific integrity that we trust our measure of progress. The @lmarena_ai has become the go-to evaluation for AI progress. Our release today demonstrates the difficulty in maintaining fair evaluations on @lmarena_ai, despite best intentions.

Haidar Khan@haidarkk1 · Apr 28

What a blast! Farewell @iclr_conf and Singapore. Thanks for the great hospitality @sbmaruf!

Haidar Khan@haidarkk1 · Apr 26

Check out our new Arabic Safety leaderboard on huggingface!

AI Astrolabe@aiastrolabe · Apr 26

🚀 The @aiastrolabe Arabic Safety Index (ASAS - أساس) is now live. We are officially launching the first-ever benchmark focused on Arabic LLM safety, and the results demand attention.

Haidar Khan@haidarkk1 · Apr 24

This is 1/100th of the registration line at @iclr_conf :D

Haidar Khan Retweeted

Rohit Patel@_Rohit_Patel_ · Apr 22

Our CRAG-MM Challenge (KDD Cup 2025) invites you to develop innovative multi-modal, multi-turn question-answering systems with a focus on RAG, using agentic tools to retrieve information. The goal is to improve visual reasoning: aicrowd.com/challenges/met…

Haidar Khan@haidarkk1 · Apr 21

I’ll be in Singapore for #ICLR2025 - would love to meet! DM me and let’s arrange something!

Haidar Khan@haidarkk1 · Apr 21

Our evals are showing even frontier models are woefully lacking on Arabic. It’s embarrassingly simple to elicit offensive content from leading models like GPT-4o by just talking to the model in Arabic. Regionally developed models are hardly better… At AI Astrolabe we are…

AI Astrolabe@aiastrolabe · Apr 21

Arabic AI is being left behind. We are here to change that. At @aiastrolabe, we’re building the gateway to #ArabicAI; pushing the frontier across dialects, domains, and capabilities, and making sure models are safe, culturally grounded, and built for our communities.
