Eric Zelikman
@ericzelikman
lgtm-ing @xAI // was phd-ing @stanford
stare long enough and any optimization problem starts looking like a compute kernel
Check out our new work: Generalization from context often outperforms generalization from finetuning. And you might get the best of both worlds by spending extra compute at train-time.
How do language models generalize from information they learn in-context vs. via finetuning? We show that in-context learning can generalize more flexibly, illustrating key differences in the inductive biases of these modes of learning — and ways to improve finetuning. Thread: 1/
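A minimal sketch of the "extra compute at train-time" idea as I read it, not the paper's actual recipe: have the model generate in-context inferences from each training document, then add those generations to the finetuning set. The prompt wording and the functions `sample_completions` and `augment_with_inferences` below are hypothetical illustrations.

```python
# Hedged sketch: augment a finetuning corpus with model-generated in-context
# inferences, one possible way to spend extra compute at train time.
# All names and the prompt text are illustrative assumptions, not the paper's method.

def sample_completions(prompt: str, n: int = 4) -> list[str]:
    """Placeholder for an LLM sampling call (swap in your own API or local model)."""
    raise NotImplementedError("plug in a model here")

def augment_with_inferences(documents: list[str]) -> list[str]:
    """Ask the model, in context, to restate or derive consequences of each
    document, and add those generations alongside the originals."""
    augmented = list(documents)
    for doc in documents:
        prompt = (
            "Read the following passage and list implications or restatements "
            f"that follow from it:\n\n{doc}\n\nImplications:"
        )
        augmented.extend(sample_completions(prompt))
    return augmented
```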
🚀 Excited to share that the Workshop on Mathematical Reasoning and AI (MATH‑AI) will be at NeurIPS 2025! 📅 Dec 6 or 7 (TBD), 2025 🌴 San Diego, California
CollabLLM won #ICML2025 ✨Outstanding Paper Award along with 6 other works! icml.cc/virtual/2025/a… 🫂 Absolutely honored and grateful for coauthors @MSFTResearch @StanfordAILab and friends who made this happen! 🗣️ Everyone is welcome at our presentations about CollabLLM tomorrow…
Even the smartest LLMs can fail at basic multiturn communication: Ask for grocery help → it never asks where you live 🤦♀️ Ask it to write articles → it assumes your preferences 🤷🏻♀️ ⭐️CollabLLM (top 1%; oral @icmlconf) transforms LLMs from passive responders into active collaborators.…
building reasoning agents w/ @YuchenHe07 @qhwang3 was so fun, and the next paradigm will be even cooler -- agents will solve far harder problems far faster
From the first RL training run using tools on a mini reasoning model at 16% on HLE to now building the smartest agent w/ @qhwang3 @ericzelikman, more fun and breakthroughs to go! 🤖
It turns out that a lot of the most interesting behavior of LLMs can be explained without knowing anything about architecture or learning algorithms. Here we predict the rise (and fall) of in-context learning using hierarchical Bayesian methods.
🚨New paper! We know models learn distinct in-context learning strategies, but *why*? Why generalize instead of memorize to lower loss? And why is generalization transient? Our work explains this & *predicts Transformer behavior throughout training* without its weights! 🧵 1/
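A toy illustration of how a hierarchical Bayesian view can make generalization transient (my own sketch, not the paper's actual model): weigh each candidate solution by its fit to the n training examples and a complexity prior.

```latex
% Toy posterior over a "generalizing" vs. a "memorizing" solution after n examples.
% Assume L_gen > L_mem (the memorizer eventually fits the training data better)
% but C_gen < C_mem (the generalizer is simpler), so the generalizer dominates
% early in training and loses out as n grows -- i.e., ICL rises, then falls.
P(h \mid D_n) \;\propto\; \exp\!\big(-\,n\,L_h - C_h\big),
\qquad h \in \{\mathrm{gen},\, \mathrm{mem}\}
```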
fun note: @HeinrichKuttler once described my env config as "the final boss of python venv issues" -- has been mostly issue free for a few months now, thanks mostly to uv 🤞
We've been using uv a few months now and I've never felt better. I have more energy. My skin is clearer. My eye sight has improved.
40% with just 1 try per task: SWE-agent-LM-32B is the new #1 open source model on SWE-bench Verified. We built it by synthesizing a ton of agentic training data from 100+ Python repos. Today we’re open-sourcing the toolkit that made it happen: SWE-smith.
NaN sample efficiency x.com/AndrewZ4573249…
seems like a big theme lately (e.g. also "RL for Reasoning w/ One Training Example") is that approaches don't get nearly enough bang for each training point's buck - cool!
Introducing COMPACT: COMPositional Atomic-to-complex Visual Capability Tuning, a data-efficient approach to improve multimodal models on complex visual tasks without scaling data volume. 📦 arxiv.org/abs/2504.21850 1/10
cool pipeline for analyzing lots of screenshot data 🖼️ we need good tools to understand how we interact w/ complex algos
New paper up on ArXiv, with lead author Merve Cerit presenting it at #CHI2025: the Media Content Atlas (MCA): an open-source, AI-powered pipeline for inductive inquiry into what people actually see and do on their phones.
tiny oversight, think you missed a model. happy to help out!
For the first time, Google is responding to OpenAI's announcement in < 24 hours The WAR is officially ON, and Google wants the LLM market Google is now dominating +90% of the price share
Douglas Adams was right about everything having to do with AI.
It is fitting that out of all the great science fiction authors that imagined AI, Douglas Adams continues to be the most fundamentally correct: ✅ Machines that work best when emotionally manipulated ✅Machines that guilt you ✅Very long “thinking” times for very hard questions
i prefer to have axis labels actually, just figured someone needed to hear that