Vijay V.
@vijaytarian
Grad student at CMU. I do research on applied NLP. he/him
RL with verifiable rewards? Works great ✨ Realistic or non-verifiable tasks? Still a mess 📉 Reward models and AI judges? Fragile and inconsistent 💔 Our proposal? RL from Checklist Feedback 📜 arxiv.org/abs/2507.18624 👇
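Roughly, the idea: score each response against a per-prompt checklist and use how many items it satisfies as the reward. A minimal sketch of what that could look like (the ChecklistItem format, the judge callable, and the plain averaging here are my simplifying assumptions, not the paper's exact recipe; details in the paper):

```python
# Hedged sketch: scoring one response against a per-prompt checklist.
# The checklist format, the judge call, and the plain average are
# illustrative assumptions, not the exact method from the paper.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ChecklistItem:
    criterion: str  # e.g. "Addresses every sub-question in the prompt"

def checklist_reward(
    prompt: str,
    response: str,
    checklist: List[ChecklistItem],
    judge: Callable[[str, str, str], bool],  # (prompt, response, criterion) -> satisfied?
) -> float:
    """Scalar reward in [0, 1]: fraction of checklist items the judge marks as satisfied."""
    if not checklist:
        return 0.0
    passed = sum(judge(prompt, response, item.criterion) for item in checklist)
    return passed / len(checklist)

# Usage (with a hypothetical LLM-backed judge):
# reward = checklist_reward(prompt, response, checklist, judge=my_llm_judge)
# The scalar reward can then feed a standard RL update on the policy model.
```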
I was extremely fortunate to recruit @Xiangyue96 as my Ph.D. student in 2018 and witness his remarkable growth into a rising star in NLP and AI. You might know him for his recent contributions like MMMU and MAmmoTH. But to me, long before these influential projects, Xiang…
✈️Flying to #NeurIPS2024 tmr! Excited to reconnect with old friends and meet new ones. I co-authored 6 papers at NeurIPS👇. I'm on the faculty job market this year. My work focuses on advancing the reasoning abilities of LLMs across modalities and contexts. Ping me for a chat☕
At #ICML2025, introducing STAMP, a simple approach to verify whether your content (e.g., a dataset) was part of the data used to train language models. ⤵️
🌟Our results show that LMs have distinct strengths! For example, while GPT-4o excels at generating new instances, Claude-3.5-Sonnet is better at refining existing ones. 🤯We also observe the unexpected result that, in some cases, LMs with stronger problem-solving abilities do…
#NLProc Just because GPT-4o is 17 times more expensive than GPT-4o-mini, does that mean it generates synthetic data 17 times better? Introducing AgoraBench, a benchmark for evaluating the data generation capabilities of LMs.
Few understand this
20 hours of experiments can save you 15 mins of looking at your data
1/ What’s the best way to supervise LLMs using LLM judges? We show that Minimum Bayes Risk (MBR) decoding is the way to go! With MBR decoding you can: 💡 Trade compute for performance at inference time 🧰 Generate data for self-training without needing external labels
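In a nutshell: sample several candidate outputs, score each candidate against the others using the LLM judge as the utility function, and keep the candidate with the highest average utility. A minimal sketch (the sampling loop and judge prompt are placeholders you'd supply; the selection rule itself is standard MBR):

```python
# Minimal MBR decoding sketch with an LLM judge as the utility function.
# `utility` is a hypothetical stand-in for a judge call that scores a
# hypothesis against a pseudo-reference; the rest is the standard MBR rule.
from typing import Callable, List

def mbr_select(
    candidates: List[str],
    utility: Callable[[str, str], float],  # utility(hypothesis, pseudo_reference) -> score
) -> str:
    best, best_score = candidates[0], float("-inf")
    for hyp in candidates:
        # Estimate the expected utility of `hyp`, using the other samples
        # as pseudo-references.
        refs = [c for c in candidates if c is not hyp]
        score = sum(utility(hyp, ref) for ref in refs) / max(len(refs), 1)
        if score > best_score:
            best, best_score = hyp, score
    return best

# More samples give a better estimate of expected utility (the compute-for-
# performance trade-off); the selected outputs can also serve as self-training
# targets without external labels.
```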
What a coincidence that we released announcements about LLM bias on the same day! But our conclusions were different: OpenAI found minimal bias, while we found significant bias. 👀 Why is this? 🧐 🧵
We’re releasing a new study that explores how users’ names can impact ChatGPT’s responses. openai.com/index/evaluati…