Danish Pruthi
@danish037
Faculty at the Indian Institute of Science, Bangalore. PhD from @LTIatCMU.
At #ICML2025, introducing STAMP. A simple approach to verify whether your content (e.g., a dataset) is a part of the data used for training language models. ⤵️
As we have scaled our conferences up (somehow), there is absolutely no human touch in the entire peer-review pipeline of most AI/NLP conferences. No wonder the quality has gone down the drain. It just feels like soul-sucking service. We need better models.
At #ICML2025, I am super excited to introduce STAMP. This is a marriage b/w dataset inference & watermarking that finally(!) lets creators PROVE their content was used to train LLMs🔍 It's a MAJOR push taking the academic problem into the real world. w/Saksham Rastogi @danish037 🧵
Rarely does a book shape one's perspective as much as "The Socrates Express" did for me. The author, @Eric_Weiner, brings to life dead philosophers and what they stood for. It's been my go-to book for months now. To be savored.

Wait until you find out how billing of engineers happens in Indian service-based IT companies. Soham will look like a saint to you then.
Danish Pruthi speaking about geographical disparities in language and image generation @vlms4all at #CVPR2025 (room: 104E). Also, we still have mugs, stickers and buttons! Come and grab them and enjoy the insightful talks!
Had a great day at @iiscbangalore building strategic partnership between @Cornell and IISc, and also giving a talk on my lab’s work on globally equitable AI. Thanks to @danish037 for being a wonderful host—loved learning about the fantastic work he and his students are doing at…
Spend a few hours on gapminder.org/dollar-street. It shows how people in a similar class live similarly across the world. Look at people. Look at their houses. Look at their most valuable items. You'll viscerally feel that humans are inherently the same and mostly just trying to get by.
Essential reading for the growing number of young students interested in working on mechanistic interpretability.
x.com/i/article/1923…
Our study highlighting plagiarism concerns in AI-generated research is now accepted to ACL (main conference): arxiv.org/abs/2502.16487. Effort led by amazing @tarungupta360. Will share other accepted papers soon. Stay tuned 🙂
Remember this study about how LLM-generated research ideas were rated to be more novel than expert-written ones? We find a large fraction of such LLM-generated proposals (≥ 24%) to be skillfully plagiarized, bypassing inbuilt plagiarism checks and unsuspecting experts. A 🧵
Becoming an expert requires first learning the basics of the field. Learning the basics requires doing exercises that AI can do. No amount of class redesign can change this. (What will change: the weight of exams in computing the final grade)
I'm sympathetic to the professors quoted in this, but at a certain point if your students can cheat their way through your class with AI, you probably need to redesign your class. nymag.com/intelligencer/…
My wish list for what the kids should be equipped with to navigate online and offline life as an adult, a 🧵