Vijay V.
@vijaytarian
Grad student at CMU. I do research on applied NLP. he/him
RL with verifiable rewards? Works great ✨ Realistic or non-verifiable tasks? Still a mess 📉 Reward models and AI judges? Fragile and inconsistent 💔 Our proposal? RL from Checklist Feedback 📜 arxiv.org/abs/2507.18624 👇
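Roughly, the idea: score each response against a per-prompt checklist and use how many items it satisfies as the reward. A minimal sketch of what that could look like (the ChecklistItem format, the judge callable, and the plain averaging here are my simplifying assumptions, not the paper's exact recipe; details in the paper):

```python
# Hedged sketch: scoring one response against a per-prompt checklist.
# The checklist format, the judge call, and the plain average are
# illustrative assumptions, not the exact method from the paper.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ChecklistItem:
    criterion: str  # e.g. "Addresses every sub-question in the prompt"

def checklist_reward(
    prompt: str,
    response: str,
    checklist: List[ChecklistItem],
    judge: Callable[[str, str, str], bool],  # (prompt, response, criterion) -> satisfied?
) -> float:
    """Scalar reward in [0, 1]: fraction of checklist items the judge marks as satisfied."""
    if not checklist:
        return 0.0
    passed = sum(judge(prompt, response, item.criterion) for item in checklist)
    return passed / len(checklist)

# Usage (with a hypothetical LLM-backed judge):
# reward = checklist_reward(prompt, response, checklist, judge=my_llm_judge)
# The scalar reward can then feed a standard RL update on the policy model.
```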
I was extremely fortunate to recruit @Xiangyue96 as my Ph.D. student in 2018 and witness his remarkable growth into a rising star in NLP and AI. You might know him for his recent contributions like MMMU and MAmmoTH. But to me, long before these influential projects, Xiang…
✈️Flying to #NeurIPS2024 tmr! Excited to reconnect with old friends and meet new ones. I co-authored 6 papers at NeurIPS👇. I'm on the faculty job market this year. My work focuses on advancing the reasoning abilities of LLMs across modalities and contexts. Ping me for a chat☕
At #ICML2025, introducing STAMP, a simple approach to verify whether your content (e.g., a dataset) was part of the data used to train language models. ⤵️
🌟Our results show that LMs have distinct strengths! For example, while GPT-4o excels at generating new instances, Claude-3.5-Sonnet is better at refining existing ones. 🤯We also observe the unexpected result that, in some cases, LMs with stronger problem-solving abilities do…
#NLProc Just because GPT-4o is 17 times more expensive than GPT-4o-mini, does that mean it generates synthetic data 17 times better? Introducing AgoraBench, a benchmark for evaluating the data generation capabilities of LMs.
Few understand this
20 hours of experiments can save you 15 mins of looking at your data
1/ What’s the best way to supervise LLMs using LLM judges? We show that Minimum Bayes Risk (MBR) decoding is the way to go! With MBR decoding you can: 💡 Trade compute for performance at inference time 🧰 Generate data for self-training without needing external labels
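In a nutshell: sample several candidate outputs, score each candidate against the others using the LLM judge as the utility function, and keep the candidate with the highest average utility. A minimal sketch (the sampling loop and judge prompt are placeholders you'd supply; the selection rule itself is standard MBR):

```python
# Minimal MBR decoding sketch with an LLM judge as the utility function.
# `utility` is a hypothetical stand-in for a judge call that scores a
# hypothesis against a pseudo-reference; the rest is the standard MBR rule.
from typing import Callable, List

def mbr_select(
    candidates: List[str],
    utility: Callable[[str, str], float],  # utility(hypothesis, pseudo_reference) -> score
) -> str:
    best, best_score = candidates[0], float("-inf")
    for hyp in candidates:
        # Estimate the expected utility of `hyp`, using the other samples
        # as pseudo-references.
        refs = [c for c in candidates if c is not hyp]
        score = sum(utility(hyp, ref) for ref in refs) / max(len(refs), 1)
        if score > best_score:
            best, best_score = hyp, score
    return best

# More samples give a better estimate of expected utility (the compute-for-
# performance trade-off); the selected outputs can also serve as self-training
# targets without external labels.
```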
What a coincidence that we released announcements about LLM bias on the same day! But our conclusions were different: OpenAI found minimal bias, while we found significant bias. 👀 Why is this? 🧐 🧵
We’re releasing a new study that explores how users’ names can impact ChatGPT’s responses. openai.com/index/evaluati…