Aniket Vashishtha
@AniketVashisht8
Working on Causality and LLMs | MSCS @IllinoisCDS | Prev Research Fellow @MSFTResearch
Can we teach Transformers Causal Reasoning? We propose an axiomatic training framework, a new paradigm for training LMs. Our 67M-param model, trained from scratch on simple causal chains, outperforms billion-scale LLMs and rivals GPT-4 in inferring cause-effect relations over complex graphs
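A minimal sketch of what axiom-style training data for causal transitivity could look like (my illustration; the node names and serialization are hypothetical, not the paper's):

```python
import random

def sample_chain(nodes, length=4):
    """Sample a simple causal chain, e.g. X3 -> X7 -> X1 -> X9."""
    return random.sample(nodes, length)

def make_example(chain, flipped=False):
    """Serialize premises plus a transitivity query.

    Premises: 'X3 causes X7. X7 causes X1. X1 causes X9.'
    Query:    'Does X3 cause X9?' -> 'Yes' (repeated transitivity);
              the flipped query 'Does X9 cause X3?' -> 'No'.
    """
    premises = " ".join(f"{a} causes {b}." for a, b in zip(chain, chain[1:]))
    cause, effect = (chain[-1], chain[0]) if flipped else (chain[0], chain[-1])
    label = "No" if flipped else "Yes"
    return {"input": f"{premises} Does {cause} cause {effect}?", "label": label}

nodes = [f"X{i}" for i in range(20)]
print(make_example(sample_chain(nodes)))
print(make_example(sample_chain(nodes), flipped=True))
```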

Quick thread on the recent IMO results and the relationship between symbol manipulation, reasoning, and intelligence in machines and humans:
🚨 Paper Alert: “RL Finetunes Small Subnetworks in Large Language Models” From DeepSeek V3 Base to DeepSeek R1 Zero, a whopping 86% of parameters were NOT updated during RL training 😮😮 And this isn’t a one-off. The pattern holds across RL algorithms and models. 🧵A Deep Dive
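If you want to sanity-check a headline number like that on your own checkpoints, here is a rough recipe (my sketch, assuming both models load as PyTorch state dicts; not the paper's code):

```python
import torch

def frac_unchanged(base_sd, rl_sd, atol=0.0):
    """Fraction of parameters identical (or within atol) before vs. after RL."""
    unchanged, total = 0, 0
    for name, base in base_sd.items():
        rl = rl_sd[name]
        unchanged += (base - rl).abs().le(atol).sum().item()
        total += base.numel()
    return unchanged / total

# Usage (hypothetical paths):
# base_sd = torch.load("base_model.pt", map_location="cpu")
# rl_sd   = torch.load("rl_model.pt",   map_location="cpu")
# print(f"{frac_unchanged(base_sd, rl_sd):.1%} of parameters unchanged")
```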
How do language models generalize from information they learn in-context vs. via finetuning? We show that in-context learning can generalize more flexibly, illustrating key differences in the inductive biases of these modes of learning — and ways to improve finetuning. Thread: 1/
🚀 Meet FrugalRAG at #ICML2025 in Vancouver 🇨🇦! 📍 July 18 – VecDB Workshop, West 208–209 📍 July 19 – ES-FoMO Workshop, East Exhibition Hall A Come chat with me and @naga86 and learn how we're rethinking training efficiency and inference latency for RAG systems. 🧵
✨Excited to be at ICML 2025 to share our new evaluation framework for graph-learning datasets, work with @CorinnaCoupette, @JeremyWayland, @Pseudomanifold. Curious about who shall pass, and who shall not?🤔 Enjoy nods to LoTR?🧝‍♀️ Check out our blog post and paper in the comments!
At #ICML2025, I am super excited to introduce STAMP. This is a marriage b/w dataset inference & watermarking that finally(!) lets creators PROVE their content was used to train LLMs🔍 It's a MAJOR push taking this academic problem into the real world. w/ Saksham Rastogi @danish037 🧵
Really pumped for my Oral presentation on this work today!!! Come check out the RL session from 3:30-4:30pm in West Ballroom B You can also swing by our poster from 4:30-7pm in West Exhibition Hall B2-B3 # W-713 See you all there!
Our new paper (first one of my PhD!) on cooperative AI reveals a surprising insight: Environment Diversity > Partner Diversity. Agents trained in self-play across many environments learn cooperative norms that transfer to humans on novel tasks. shorturl.at/fqsNN🧵
Check out our 3 papers on Testing LLM Moral Reasoning via Multi-Agent Simulations! ✍️ Our summary blogpost: lesswrong.com/posts/2WAire3L… 📑 Our series of 3 papers: 1️⃣ GovSim (NeurIPS 2024) arxiv.org/abs/2404.16698 2️⃣ SanctSim zhijing-jin.com/files/papers/2… 3️⃣ MoralSim arxiv.org/abs/2505.19212
🪄We made a 1B Llama BEAT GPT-4o by... making it MORE private?! LoCoMo results: 🔓GPT-4o: 80.6% 🔐1B Llama + GPT-4o (privacy): 87.7% (+7.1!⏫) 💡How? GPT-4o provides reasoning ("If X then Y"), the local model fills in the blanks with your private data to get the answer!
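A toy sketch of the split as I read it (all names are hypothetical stand-ins, not the actual system): the remote model sees only a de-identified question and returns abstract reasoning; the local model alone ever touches private data.

```python
def remote_reason(public_question: str) -> str:
    """Stand-in for the remote frontier model (e.g. GPT-4o).
    Returns an abstract 'If X then Y' template with no private data in it."""
    return "If the meeting with [PERSON] is on [DATE], the answer is [DATE]."

def local_fill(template: str, private_context: str) -> str:
    """Stand-in for the small local model: instantiates the template locally.
    A real local model would ground [PERSON]/[DATE] in the private context."""
    return template.replace("[PERSON]", "Dr. Lee").replace("[DATE]", "March 3")

private_context = "Alice's meeting with Dr. Lee is on March 3."
template = remote_reason("When is Alice's meeting?")  # private data never leaves
print(local_fill(template, private_context))
```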
How do language models track the mental states of each character in a story, often referred to as Theory of Mind? Our recent work takes a step toward demystifying it by reverse engineering how Llama-3-70B-Instruct solves a simple belief-tracking task, and surprisingly finds that it…
LLMs excel at finding surprising “needles” in very long documents, but can they detect when information is conspicuously missing? 🫥AbsenceBench🫥 shows that even SoTA LLMs struggle on this task, suggesting that LLMs have trouble perceiving “negative space” in documents. paper:…
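For intuition, here is one way such an absence probe could be constructed (illustrative only; not necessarily AbsenceBench's actual protocol):

```python
import random

def make_absence_probe(document_lines):
    """Show the model an original and an ablated copy of a document, then ask
    which line vanished; grade the answer against `missing`."""
    missing = random.choice(document_lines)
    ablated = [line for line in document_lines if line != missing]
    prompt = (
        "Original:\n" + "\n".join(document_lines)
        + "\n\nModified:\n" + "\n".join(ablated)
        + "\n\nWhich line from the original is missing in the modified copy?"
    )
    return prompt, missing

prompt, gold = make_absence_probe([f"fact {i}" for i in range(10)])
```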
🚨Calling all writing tutors & instructors! Can writing tools give guidance, but not text suggestions? We built a prototype based on conversations with tutors, and would love your thoughts: 📝 Try it out tinyurl.com/writor-system 🧾 Take a short survey to enter a $20 raffle
I will be at #CVPR2025 all of this week and will be presenting this work on June 15th!
Model merging is a great way to combine multiple models' abilities; however, existing methods only work with models fine-tuned from the same initialization, and can only produce models of the same size. Our new work, PLeaS (at #CVPR2025), aims to resolve both of these issues 🧵.
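For context, the baseline both constraints come from is plain parameter averaging, which only makes sense when all models share an initialization and identical shapes; a sketch of that baseline (not PLeaS itself):

```python
import torch

def naive_merge(state_dicts, weights=None):
    """Plain parameter averaging across models with identical architectures.
    Requires a shared initialization to be meaningful and always yields a
    model of the same size: exactly the two limitations PLeaS targets."""
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged
```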
LLMs are helpful for scientific research, but will they continue to be helpful as science evolves? Introducing 🔍ScienceMeter: current knowledge update methods achieve 86% preservation of prior scientific knowledge, 72% acquisition of new knowledge, and 38%+ projection of future knowledge (arxiv.org/abs/2505.24302).
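My back-of-envelope reading of the three axes (illustrative scoring only, not the paper's evaluation code): score the updated model on three probe sets.

```python
def rate(outcomes):
    """Fraction of probes the updated model answers correctly."""
    return sum(outcomes) / len(outcomes)

prior_knowledge  = [1, 1, 1, 0, 1]  # facts the model knew before the update
new_knowledge    = [1, 0, 1, 1]     # facts introduced by the update
future_knowledge = [0, 0, 1]        # claims from work newer than the update

print(f"preservation: {rate(prior_knowledge):.0%}")   # axis behind the 86%
print(f"acquisition:  {rate(new_knowledge):.0%}")     # axis behind the 72%
print(f"projection:   {rate(future_knowledge):.0%}")  # axis behind the 38%+
```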
Many existing privacy leakage metrics in production are legacy from structured data and are lexical (string-matching) based💢 We concretely show this is a huge issue for unstructured text and propose a semantic-level re-identification attack that reveals the weaknesses of sanitization methods.
Think PII scrubbing ensures privacy? 🤔Think again‼️ In our paper, for the first time on unstructured text, we show that you can re-identify over 70% of private information *after* scrubbing! It’s time to move beyond surface-level anonymization. #Privacy #NLProc 🔗🧵
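A toy illustration of why lexical scrubbing checks can miss semantic leakage (my sketch, not the paper's attack; assumes sentence-transformers is installed):

```python
from sentence_transformers import SentenceTransformer, util

# Scrubbing removed the exact strings, but the description still singles
# out one candidate profile; a pure string match would report zero leakage.
scrubbed = "[PERSON] is a cardiologist at [ORG] in [CITY] who runs marathons."
profiles = [
    "Dr. Jane Roe, heart specialist, St. Mary's Hospital, Boston, marathoner",
    "John Doe, software engineer in Seattle who enjoys chess",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
sims = util.cos_sim(model.encode(scrubbed), model.encode(profiles))[0]
print(profiles[int(sims.argmax())])  # the scrubbed text still points at Jane
```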
🚀Our ICML 2025 paper introduces "Premise-Augmented Reasoning Chains", a structured approach that makes the dependencies between steps in a reasoning chain explicit. By revealing these dependencies, we significantly improve how LLM reasoning can be verified. 🧵[1/n]
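One simple way to represent a premise-augmented chain (an illustrative data structure, not the paper's implementation): each step declares which earlier steps it depends on, so a verifier checks each step against only its actual premises.

```python
chain = {
    1: {"text": "x = 2",          "premises": []},
    2: {"text": "y = 3",          "premises": []},
    3: {"text": "x + y = 5",      "premises": [1, 2]},
    4: {"text": "2*(x + y) = 10", "premises": [3]},
}

def verify(chain, check_step):
    """Check each step given only its declared premises, so an error
    localizes to a step instead of casting doubt on the whole chain."""
    verdicts = {}
    for i, step in sorted(chain.items()):
        premises = [chain[j]["text"] for j in step["premises"]]
        verdicts[i] = check_step(step["text"], premises)
    return verdicts

# check_step would typically be an LLM judge; here a stub that accepts all:
print(verify(chain, lambda step, premises: True))
```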