Vipul Gupta
@vipul_1011
PhD Candidate @Penn_State. Past: FAIR @AIatMeta, @IITDelhi. Interested in model evaluation and responsible AI. I don’t hallucinate
🚨 New paper alert 🚨 Ever struggled with quick saturation or unreliability in benchmark datasets? Introducing SMART Filtering to select high-quality examples, reducing dataset size by 48% on avg (up to 68% for ARC!) and improving correlation with scores from ChatBot Arena! 📈✨ (1/N)

Thank you! I am glad you could attend
The amazing @vipul_1011 from PSU NLP is defending his PhD dissertation today!
Optimistic or pessimistic about the future of AI? "It doesn't matter. What matters is what each of us can do to improve things towards a better world." - Yoshua. The best take I have seen!
Today marks a big milestone for me. I'm launching @LawZero_, a nonprofit focusing on a new safe-by-design approach to AI that could both accelerate scientific discovery and provide a safeguard against the dangers of agentic AI.
I can’t unsee it now: ChatGPT can sing better than me (it was a very low bar, though)
just found out from @altryne's show that @chatgptapp advanced voice upgraded its singing capability this week, so here is karaokebench. would say it is like 3/7 so far. are there other sota singing models?
Is AdamW the best inner optimizer for DiLoCo? Does the inner optimizer affect the compressibility of the DiLoCo delta? Excited to introduce MuLoCo: Muon is a practical inner optimizer for DiLoCo! 🧵arxiv.org/abs/2505.23725 1/N
There's a different sense of satisfaction in reading research papers from the 1900s, especially those from before the 1980s
When I realized how dangerous the current agency-driven AI trajectory could be for future generations, I knew I had to do all I could to make AI safer. I recently shared this personal experience, and outlined the scientific solution I envision @TEDTalks⤵️ ted.com/talks/yoshua_b…
Imagine a world without AI, where the amount of info on the internet doubles every few yrs. Compounded, that is 10x the info in 5-10 yrs and 100x in 15-20 yrs. How are we even supposed to navigate all of it? We really need a tool that can get us the info we need. That's the reason I love working in AI
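The compounding in the tweet above can be checked in a few lines: if the total doubles every T years, after t years it has grown by a factor of 2^(t/T). The Python sketch below assumes that simple exponential model; the helper names are hypothetical, and the 3-year doubling period is an assumption, since the tweet only says "every few yrs".

```python
import math


def growth_factor(years: float, doubling_period: float) -> float:
    """Multiplicative growth after `years` if the total doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


def doubling_period_for(factor: float, years: float) -> float:
    """Doubling period implied by reaching `factor`x growth within `years` years."""
    return years / math.log2(factor)


if __name__ == "__main__":
    # Hypothetical doubling period of 3 years
    for years in (5, 10, 15, 20):
        print(f"{years} yrs -> {growth_factor(years, 3.0):.1f}x")

    # Doubling periods implied by the endpoints of the tweet's ranges
    for factor, years in ((10, 5), (10, 10), (100, 15), (100, 20)):
        print(f"{factor}x in {years} yrs -> doubles every {doubling_period_for(factor, years):.2f} yrs")
```

Under this model, 10x in 5-10 yrs implies a doubling period of roughly 1.5 to 3.0 years, and 100x in 15-20 yrs implies roughly 2.3 to 3.0 years, so both figures are consistent with doubling every 2.3 to 3 years.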
Who has ever read every single line of T&C? AI isn't here to replace humans but to simplify some unnecessarily complex things (which we created) and free up our time to do fun stuff. In a world where many of us have a never-ending to-do list, AI is the change we needed
Saving logs for all experiments in an organized way saves days/weeks of effort
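One way to keep experiment logs organized is sketched below, assuming a hypothetical layout of one directory per run (`<root>/<experiment>/<timestamp>/`) containing the run's config and an append-only metrics file. `RunLogger` and `load_metrics` are illustrative names, not an existing library, and only the Python standard library is used.

```python
from __future__ import annotations

import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


class RunLogger:
    """Hypothetical logger: one directory per run holding config.json and metrics.jsonl."""

    def __init__(self, root: str | Path, experiment: str, config: dict):
        # UTC timestamp (with microseconds) as the run ID, so runs sort chronologically
        run_id = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S-%f")
        self.run_dir = Path(root) / experiment / run_id
        self.run_dir.mkdir(parents=True, exist_ok=False)
        # Save the full config up front so every result can be traced back to its settings
        (self.run_dir / "config.json").write_text(json.dumps(config, indent=2, sort_keys=True))
        self._metrics_path = self.run_dir / "metrics.jsonl"

    def log(self, step: int, **metrics: float) -> None:
        # One JSON object per line; a crashed run still leaves readable partial logs
        record = {"step": step, **metrics}
        with self._metrics_path.open("a") as f:
            f.write(json.dumps(record) + "\n")


def load_metrics(run_dir: Path) -> list[dict]:
    """Read back all metric records of a run, in the order they were logged."""
    with (run_dir / "metrics.jsonl").open() as f:
        return [json.loads(line) for line in f]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        logger = RunLogger(root, "baseline", {"lr": 3e-4, "seed": 0})
        for step, loss in enumerate([2.0, 1.6, 1.3], start=1):
            logger.log(step, loss=loss)
        print(logger.run_dir.relative_to(root))
        print(load_metrics(logger.run_dir))
```

Keeping the config next to the metrics, rather than in a separate notebook or spreadsheet, is what makes old runs searchable and comparable weeks later.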
📢 New paper! FoVer enhances PRMs for step-level verification of LLM reasoning w/o human annotation 🚀 We synthesize training data using formal verification tools and improve LLMs at step-level verification of LLM responses on MATH, AIME, MMLU, BBH, etc. arxiv.org/abs/2505.15960
Despite its flaws, I think lmarena is currently the best option we have for evaluation. However, am I the only one who sees a big problem with having VCs fund such crowdsourced benchmarking? What happens when another startup heavily funded by the same VCs lags in the ranking?
📢We’re excited to share that we’ve raised $100M in seed funding to support LMArena and continue our research on reliable AI. Led by @a16z and UC Investments (@UofCalifornia), we're proud to have the support of those that believe in both the science and the mission. We’re…
A lil late to post but... I’ve officially completed my PhD in Informatics from Penn State! 🎉🎉🎉 My thesis, “Society and Bias: Uncovering Automated Prejudices in Sociotechnical NLP Systems”, explores biases in human language technologies. #PhD #AI #Research #AcademicJourney
#NAACL takeaways: - Love seeing so many people working on evals, optimistic about new breakthroughs - Orals not having posters felt off - The size of the conference was perfect (someone said ~1800): intimate but still big enough


🎉Our paper on fairness of multidoc summarization has received an SAC award at NAACL 2025! 🥳 We appreciate the recognition from senior area chairs. @HaoyuanLi9 and @YusenZhangNLP will present our work: Posters (Exhibit Hall), Session H: Oral/Poster 5, Thursday May 1,…
🟢 Announcing the #NAACL2025 Award Winners! The Best Paper and Best Theme Paper winners will present at our closing session 2025.naacl.org/blog/best-pape…