Mimansa Jaiswal
@MimansaJ
Currently @aiatmeta | Orchestration, Robustness, Data & Annotations, Evaluation & Interpretability in LLMs
I have been slowly chipping away at a project about benchmarking + PL(ish!). I'd love to know if people have any suggestions about domain-specific markup languages. Tagging some people for suggestions: @lateinteraction, @ShriramKMurthi, @lmqllang mimansajaiswal.github.io/posts/capstone/
🎉 Congrats to Prof. @xuwanghci on her NSF CAREER Award! Her work on intelligent tutoring systems is shaping the future of inclusive, effective learning. 👏📚 🔗cse.engin.umich.edu/stories/xu-wan… #NSFCareerAward #AIinEducation #MichiganCSE
Are we allowed to reveal our identities as reviewers after NeurIPS decisions are made?
Repeat after me: think carefully before you post something. Even if it wasn’t your intention, your tweet is offensive and racially insensitive. By saying “should post this on WeChat though”, you’re implying that a certain ethnic group (Chinese, everywhere) is responsible for the…
Excited to finally share one of the things my team’s been working on: Opal, a fun new way to build with AI: opal.withgoogle.com Opal is a new experiment from @GoogleLabs that lets anyone create delightful, useful, weird, wonderful mini-AI apps in minutes. It's early, it's…
Code release! 🚀 Following up on our IMO 2025 results with the public LLM Gemini 2.5 Pro — here’s the full pipeline & general (non-problem-specific) prompts. 👉 [github.com/lyang36/IMO25] Have fun exploring! #AI #Math #LLMs #IMO2025
🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025
Very excited to announce that I’ll be co-organizing a @NeurIPSConf workshop on LLM evals! Identifying shortcomings in model capabilities in a robust, scientific way is a critical part of model development. Looking forward to discussing ideas and hearing from some eval experts!
We are happy to announce our @NeurIPSConf workshop on LLM evaluations! Mastering LLM evaluation is no longer optional -- it's fundamental to building reliable models. We'll tackle the field's most pressing evaluation challenges. For details: sites.google.com/corp/view/llm-…. 1/3
WHY do you prefer something over another? Reward models treat preference as a black box 😶🌫️ but human brains 🧠 decompose decisions into hidden attributes. We built the first system to mirror how people really make decisions in our #COLM2025 paper 🎨PrefPalette✨ Why it matters 👉🏻🧵
Everyone's talking about AI performance on the IMO. Let me highlight 🇨🇦Canadian 11th grader Warren Bei🇨🇦, one of five participants with a *perfect* 42/42. This is his *fifth* (and final) IMO representing Canada, with three golds and two silvers. (➡️ MIT undergrad in the fall)
Do you have a PhD (or equivalent) or will have one in the coming months (i.e. 2-3 months away from graduating)? Do you want to help build open-ended agents that help humans do human things better, rather than replace them? We're hiring 1-2 Research Scientists! Check the 🧵👇
We made a Mac app that lets you run a bunch of Claude Codes in parallel. Introducing Conductor!
Today @ChenHenryWu and I will be presenting our #ICML work on creativity in the Oral 3A Reasoning session (West Exhibition Hall C) 10 - 11 am PT Or please stop by our poster right after @ East Exhibition Hall A-B #E-2505 11am-1:30pm. (Hope you enjoy some silly human drawings!)
Poster is tomorrow 11-*1:30 I am also recruiting PhD students @umdcs for fall 2026 with interests in (causal/mechanistic) LM interpretability and its practical applications (steering, efficient adaptation, model editing, textual explanations for users, etc.)
I'll be hiring a couple of Ph.D. students at CMU (via LTI or MLD) in the upcoming cycle! If you are interested in joining my group, please read the FAQ before reaching out to me via email :) docs.google.com/document/d/12V…
Our team is hiring a postdoc in (mech) interpretability! The ideal candidate will have research experience in interpretability for text and/or image generation models and be excited about open science! Please consider applying or sharing with colleagues: metacareers.com/jobs/222395396…
Going to #ICML2025 tomorrow - Sat morning! If you are graduating this year with a PhD, and you are interested in Google DeepMind, send me a message. Where I will be: 0. Wed 1pm @WiMLworkshop Mentoring Table and 2pm panel discussion 1. Wed 4:30pm East Exhibition Hall A-B…
🇮🇳 India at #ICML2025! From @lossfunk 📄 ACCEPTED PAPERS: 42 💡 SPOTLIGHTS: 6 (3 oral, 3 spotlight) 👥 AUTHORS: 96 🏆 GLOBAL RANK: #18 Thread with all papers & Indian authors below 👇
Don't forget to tune in tomorrow, July 10th for a session with @EkdeepL on "Rational Analysis of In-Context Learning Elicits a Loss-Complexity Tradeoff" Learn more: cohere.com/events/Cohere-…
How do LLMs learn new tasks from just a few examples? What’s happening inside during in-context learning? 🤔 Join us July 10 for a talk by @EkdeepL on how LLMs adapt like cognitive maps—and how we can predict their behavior without accessing weights.
Check out our paper on establishing best practices for building agentic benchmarks!!
Trying to get an LLM plan for my mom, @meltyinc what are the request limits on the plus plan (trying to compare this vs raycast)?