Kanishk Gandhi
@gandhikanishk
PhD CS @Stanford @StanfordNLP, Computation and Cognition; w/ Noah Goodman | Prev: @LakeBrenden @NYUDataScience, @IITKanpur, @Path_AI
New Paper!! We try to understand why some LMs self-improve their reasoning while others hit a wall. The key? Cognitive behaviors! Read our paper on how the right cognitive behaviors can make all the difference in a model's ability to improve with RL! 🧵1/13
How do language models generalize from information they learn in-context vs. via finetuning? We show that in-context learning can generalize more flexibly, illustrating key differences in the inductive biases of these modes of learning — and ways to improve finetuning. Thread: 1/
We’re excited to announce the first workshop on CogInterp: Interpreting Cognition in Deep Learning Models @ NeurIPS 2025! 📣 How can we interpret the algorithms and representations underlying complex behavior in deep learning models? 🌐 coginterp.github.io/neurips2025/ 1/
Can we record and study human chains of thought? The think-aloud method, where participants voice their thoughts as they solve a task, offers a way! In our #CogSci2025 paper co-led with Ben Prystawski, we introduce a method to automate analysis of human reasoning traces! (1/8)🧵
I wish people would stop sharing this article without evaluating it. One might not like AI, but disliking AI doesn't make a paper critical of it valuable. That's not how science works.
BREAKING: MIT just completed the first brain scan study of ChatGPT users & the results are terrifying. Turns out, AI isn't making us more productive. It's making us cognitively bankrupt. Here's what 4 months of data revealed: (hint: we've been measuring productivity all wrong)
new decade, same verse
It’s a hefty 206-page research paper, and the findings are concerning. "LLM users consistently underperformed at neural, linguistic, and behavioral levels" This study finds LLM dependence weakens the writer’s own neural and linguistic fingerprints. 🤔🤔 Relying only on EEG,…
I'm late to review the "Illusion of Thinking" paper, so let me collect some of the best threads and critical takes by @scaling01 in one place and sprinkle some of my own thoughts in as well. The paper is rather critical of reasoning LLMs (LRMs): x.com/MFarajtabar/st…
🧵 1/8 The Illusion of Thinking: Are reasoning models like o1/o3, DeepSeek-R1, and Claude 3.7 Sonnet really "thinking"? 🤔 Or are they just throwing more compute towards pattern matching? The new Large Reasoning Models (LRMs) show promising gains on math and coding benchmarks,…
What if LLMs could learn your habits and preferences well enough (across any context!) to anticipate your needs? In a new paper, we present the General User Model (GUM): a model of you built from just your everyday computer use. 🧵
Sigh, it's a bit of a mess. Let me just give you guys the full nuance in one stream of consciousness since I think we'll continue to get partial interpretations that confuse everyone. All the little things I post need to always be put together in one place. First, I have long…
Confused about recent LLM RL results where models improve without any ground-truth signal? We were too. Until we looked at the reported numbers of the pre-RL models and realized they were severely underreported across papers. We compiled the discrepancies in a blog below🧵👇
Meanwhile R1 coined the term "cold start" when they actually meant "warm start"
It's very classic that most people don't know the Tulu 3 paper coined the term RLVR
The only reviewer I care about
@gandhikanishk solid paper
how i picked the original values:
I've been playing with GRPO on a side project. One question I want to share/discuss with y'all: how do you decide the reward for each function when you have many reward functions and they measure different aspects of the answer? Grid search? Vibe-based? (code snippet from @willccbb…
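One common pattern (a minimal sketch of my own, not from the quoted snippet): z-normalize each reward within the prompt's completion group so their scales are comparable, then tune only the relative weights. `combine_rewards` and the weight values below are hypothetical, illustrative choices.

```python
import numpy as np

def combine_rewards(reward_dict, weights):
    """Combine per-aspect rewards for one prompt's group of completions.

    reward_dict: {name: array-like of shape (num_completions,)}
    weights:     {name: float} relative importance (hypothetical values)
    """
    total = np.zeros(len(next(iter(reward_dict.values()))), dtype=float)
    for name, r in reward_dict.items():
        r = np.asarray(r, dtype=float)
        # z-score within the group so no single reward's scale dominates
        z = (r - r.mean()) / (r.std() + 1e-8)
        total += weights.get(name, 1.0) * z
    return total

# Illustrative rewards for 4 sampled completions of one prompt
rewards = {
    "correctness": [1.0, 0.0, 1.0, 0.0],    # binary verifier signal
    "format":      [1.0, 1.0, 0.0, 1.0],    # e.g., answer inside expected tags
    "length":      [-120, -80, -200, -60],  # token-count penalty
}
weights = {"correctness": 1.0, "format": 0.2, "length": 0.05}  # assumed, not tuned
print(combine_rewards(rewards, weights))
```

With per-group normalization, grid search (or vibes) over the weights only has to decide relative importance, not absorb scale differences between, say, a binary verifier and a token-count penalty.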
🚨 Your RL only improves 𝗽𝗮𝘀𝘀@𝟭, not 𝗽𝗮𝘀𝘀@𝗸? 🚨 That’s not a bug — it’s a 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗼𝗯𝗷𝗲𝗰𝘁𝗶𝘃𝗲 you’re optimizing. You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time. 🧵 How?
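For reference, the pass@k in question is usually the unbiased estimator from the Codex paper (Chen et al., 2021): with n samples and c correct, pass@k = 1 − C(n−c, k)/C(n, k). A minimal sketch of the metric itself follows; the thread's actual training-time objective is not reproduced here.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples is correct),
    given n total samples of which c are correct."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: some draw must hit a correct one
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g., 16 samples with 4 correct: pass@1 vs pass@8
print(pass_at_k(16, 4, 1))  # 0.25
print(pass_at_k(16, 4, 8))  # ~0.96
```

The gap between those two numbers is the point of the tweet: an objective that rewards each sample independently pushes up the 0.25, not the 0.96.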
Meta-Search
We explore a new dimension in scaling reasoning models with Adaptive Parallel Reasoning. APR lets LMs learn to orchestrate both serial & parallel compute E2E via supervised training + RL — w/ better efficiency and scalability than long CoT on Countdown 🧵 arxiv.org/abs/2504.15466