Tuhin Chakrabarty
@TuhinChakr
Assistant Prof @sbucompsc @stonybrooku Researcher → @SFResearch Interests : Human Centered AI / Future of Work / AI & Creativity Formerly @ColumbiaCompSci
Unlike math/code, writing lacks verifiable rewards. So all we get is slop. To solve this, we train reward models on expert edits that beat SOTA #LLMs by a large margin on a new Writing Quality benchmark. We also reduce #AI slop by using our RMs at test time, boosting alignment with experts.


Lots of doubts about these findings, but the most egregious bit is that Historians have higher exposure to AI than Customer Service Representatives? Nonsense !!
Microsoft just released their list of the 40 jobs most AI-applicable and the 40 jobs least AI-applicable. This list may have overlap with replaceability and job transformation. (Phlebotomists, you've got nothing to worry about.)
Is LLM use finally making me less capable? I started using LLMs three years ago for text and code gen. Now, I use several of them, for a ton more things. In fact, I feel like I use them for a huge fraction of the cognitive tasks that I perform that can be described in text.…
Anthropic has been a series of ideological decisions later defeated by business realities
SCOOP: Leaked memo from Anthropic CEO Dario Amodei outlines the startup's plans to seek investment from the United Arab Emirates and Qatar. “Unfortunately, I think ‘no bad person should ever benefit from our success’ is a pretty difficult principle to run a business on.”
Happy to present OLMoTrace at #ACL2025NLP next week!! 🤗 If you stop by the demo session on Tuesday, July 29, 10:30am-12pm, @yanaiela and @sewon__min will be sharing how we use OLMoTrace to make LLMs more transparent. Unfortunately I'm unable to attend in-person due to visa 🥹
Today we're unveiling OLMoTrace, a tool that enables everyone to understand the outputs of LLMs by connecting them to their training data. We do this at unprecedented scale and in real time: finding matching text between model outputs and 4 trillion training tokens within seconds. ✨
Go work with Abhilasha! She is an amazing researcher and person. ☺️
Life update: I’m excited to share that I’ll be starting as faculty at the Max Planck Institute for Software Systems(@mpi_sws_) this Fall!🎉 I’ll be recruiting PhD students in the upcoming cycle, as well as research interns throughout the year: lasharavichander.github.io/contact.html
ChatGPT Agent is a huge step up on BearCubs, esp on multimodal/interactive tasks (e.g., playing web games)! It gets 65.8% accuracy vs Deep Research's 36% and Operator's 23%. Humans are at ~85%, and clearly better/faster at fine control & complex filtering.
Introducing 🐻 BEARCUBS 🐻, a “small but mighty” dataset of 111 QA pairs designed to assess computer-using web agents in multimodal interactions on the live web! ✅ Humans achieve 85% accuracy ❌ OpenAI Operator: 24% ❌ Anthropic Computer Use: 14% ❌ Convergence AI Proxy: 13%
issues w preference LM benchmarks 🐡data contains cases where the "bad" response is just as good as the chosen one 🐟model rankings can feel off (claude ranks lower than expected) led by @cmalaviya11 (TACL 2025), we study underspecified queries & their detrimental effect on model evals
In our new paper, “Contextualized Evaluations: Judging Language Model Responses to Underspecified Queries,” we find that adding just a bit of missing context can reorder model leaderboards—and surface hidden biases. 🧵👇
🏆 #ICML2025 Best Paper Award: AI Safety Should Prioritize the Future of Work 📄 Paper: arxiv.org/abs/2504.13959 🎉 Congratulations to Sanchaita Hazra @hsanchaita, Bodhisattwa Prasad Majumder @mbodhisattwa, and Tuhin Chakrabarty @TuhinChakr for winning the Outstanding Award —…
If you're a prospective student reaching out to a PI who you want to work with - remember that we receive quite a few such emails, and if multiple people use ChatGPT to draft their letter, we're bound to see some phrases repeat over and over again.
"Seeing" robins and sparrows may not necessarily make them birdier to LMs! Super excited about this paper -- massive shoutout to all my co-authors, especially @yulu_qin and @dhevarghese for leading the charge!
Does vision training change how language is represented and used in meaningful ways?🤔 The answer is a nuanced yes! Comparing VLM-LM minimal pairs, we find that while the taxonomic organization of the lexicon is similar, VLMs are better at _deploying_ this knowledge. [1/9]
ever since VLMs were a thing i've been interested in how the additional visual modality changes language in meaningful ways. after negative findings after negative findings, excited to report this result! proud of our junior authors for digging into this 🐸
"writing is not only about reporting results; it also provides a tool to uncover new thoughts and ideas. Writing compels us to think"
Strange world to live in. AI Twitter is awash with claims about the IMO performance of LLMs, when 99% of humans can't do it or don't care about it. The gap between real utility and what it takes to paint the illusion of intelligence will only grow with time :)
This explains why OpenAI results are out and GDM results are not. And what's out isn't even an official result verified by the IMO!
🚨 According to a friend, the IMO asked AI companies not to steal the spotlight from kids and to wait a week after the closing ceremony to announce results. OpenAI announced the results BEFORE the closing ceremony. According to a Coordinator on Problem 6, the one problem OpenAI…
Join us in west building room 223 for the #Memorization workshop!!
At some point @amazon should release public data on the sales of AI generated books :) Maybe this will clear the air of the “transformativeness” myth of training on 📚
My dad bought a book on Carl Jung from Amazon. He's a few pages in, telling me how bad it is. I look. The cover looks like Dall-E. The text formatting has messy white space. The introduction's third sentence is "Jung was not X, he was Y." The entire book is ChatGPT-generated!
Excited to share what I have been focusing on this year! Inference-time search to optimize Bayesian surprise pushes us towards long-horizon discovery! Introducing "AutoDS": Autonomous Discovery via Surprisal. "It can not only find the diamond in the rough, but also can rule out…
Great science starts with great questions. 🤔✨ Meet AutoDS—an AI that doesn’t just hunt for answers, it decides which questions are worth asking. 🧵