Chandan Singh
@csinva
Seeking superhuman explanations. Senior researcher @MSFTResearch, PhD from @Berkeley_AI
Science faces an explainability crisis: ML models can predict many natural phenomena but can't explain them. We tackle this issue in language neuroscience by using LLMs to generate *and validate* explanations with targeted follow-up experiments. arxiv.org/abs/2410.00812 1/2
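A rough sketch of the generate-then-validate loop as I read the tweet; the helper names (`generate_explanation`, `validate_explanation`), the prompts, and the `llm`/`measure_response` callables are hypothetical stand-ins, not the paper's API:

```python
# Hypothetical sketch: ask an LLM to propose an explanation of what drives a
# brain response, then validate it with targeted follow-up stimuli.

def generate_explanation(llm, top_stimuli: list[str]) -> str:
    """Ask the LLM what the stimuli that most activate a region have in common."""
    prompt = ("Here are sentences that strongly drive one brain region:\n"
              + "\n".join(top_stimuli)
              + "\nIn a short phrase, what do they have in common?")
    return llm(prompt)

def validate_explanation(llm, explanation: str, measure_response) -> float:
    """Follow-up experiment: synthesize sentences that match / don't match the
    explanation and check that the measured response actually separates them."""
    matching = llm(f"Write 5 sentences about: {explanation}").splitlines()
    control = llm("Write 5 sentences about an unrelated everyday topic.").splitlines()
    return (sum(map(measure_response, matching)) / len(matching)
            - sum(map(measure_response, control)) / len(control))
```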

Are you compositionally curious? 🤓 Want to know how to learn embeddings using 🌲? In our new #ICML2025 paper, we present Banyan: a recursive net that you can train super efficiently for any language or domain, and get embeddings competitive with much larger LLMs. 1/🧵
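For intuition, here is a toy sketch of the general recipe of recursive, tree-structured embedding composition; the merge function and the nested-tuple tree encoding are illustrative stand-ins, not Banyan's actual architecture:

```python
import torch
import torch.nn as nn

class TreeComposer(nn.Module):
    """Toy recursive net: merges two child embeddings into one parent embedding."""
    def __init__(self, dim: int):
        super().__init__()
        self.merge = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())

    def forward(self, node, embed):
        # node is either a token id (leaf) or a (left, right) pair (internal node)
        if isinstance(node, tuple):
            left, right = (self.forward(child, embed) for child in node)
            return self.merge(torch.cat([left, right], dim=-1))
        return embed(torch.tensor(node))

embed = nn.Embedding(1000, 64)
composer = TreeComposer(64)
sentence_vec = composer(((1, 2), (3, (4, 5))), embed)  # embedding for a parsed sentence
```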
Prompting is an extraordinary gift to interpretability researchers. Use it! Use it a lot (carefully)! Of course it has issues, but it's so much more useful than most of the complicated interp methods people cook up...
Prompting is our most successful tool for exploring LLMs, but the term evokes eye-rolls and grimaces from scientists. Why? Because prompting as scientific inquiry has become conflated with prompt engineering. This is holding us back. 🧵 and new paper: arxiv.org/abs/2507.00163
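In the spirit of prompting-as-experiment, a minimal illustration (my own toy setup, not the paper's protocol): manipulate one prompt factor while holding everything else fixed, and measure behavior over repeated samples. The persona conditions, question, and GPT-2 stand-in model are all assumptions for illustration:

```python
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")  # stand-in model

# One manipulated factor (the persona); everything else held constant.
conditions = {"expert": "You are an expert chemist.",
              "novice": "You are new to chemistry."}
question = " Q: Is table salt an ionic compound? A:"

for name, persona in conditions.items():
    outs = generate(persona + question, max_new_tokens=10,
                    num_return_sequences=5, do_sample=True)
    answers = [o["generated_text"][len(persona + question):] for o in outs]
    yes_rate = sum("yes" in a.lower() for a in answers) / len(answers)
    print(f"{name}: fraction of samples mentioning 'yes' = {yes_rate:.1f}")
```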
👨🎓🧾✨ #ICML2025 paper: TabICL, a Tabular Foundation Model for In-Context Learning on Large Data. With @JingangQu, @DHolzmueller, and @MarineLeMorvan. TL;DR: a well-designed architecture and pretraining give the best, and most scalable, tabular learner. 1/9
The paper is now out in JEBO, with a less jargon-heavy title and nicer figures.
New working paper: we’re all getting access to better-quality advice in our jobs. In theory, this should lead to lower inequality, as the least productive workers benefit most from high-quality advice. Our experiment with chess players shows this is not necessarily true… (1/8)
Reinforcement Learning Teachers of Test Time Scaling. In this paper, we introduce a new way to teach LLMs to reason: by learning to teach, not to solve! The core idea: a teacher model is trained via RL to generate explanations from question-answer pairs, optimized to improve…
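Based on my reading of the tweet, the core reward is something like "how likely does the student find the correct answer after seeing the teacher's explanation". A minimal sketch of that reward signal, with GPT-2 as a stand-in student (not the paper's exact setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
student = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def teacher_reward(question: str, explanation: str, answer: str) -> float:
    """Reward for the teacher: the student's log-likelihood of the ground-truth
    answer, conditioned on the question plus the teacher's explanation."""
    context = tok(question + " " + explanation, return_tensors="pt").input_ids
    ans = tok(" " + answer, return_tensors="pt").input_ids
    ids = torch.cat([context, ans], dim=1)
    with torch.no_grad():
        logits = student(ids).logits
    # log-prob of each answer token given everything before it
    logprobs = logits[:, :-1].log_softmax(-1)
    ans_positions = range(context.size(1) - 1, ids.size(1) - 1)
    return sum(logprobs[0, p, ids[0, p + 1]].item() for p in ans_positions)

# A teacher policy would then be updated (e.g., REINFORCE/PPO) to maximize this.
print(teacher_reward("2+2=?", "Adding 2 and 2 gives 4.", "4"))
```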
LLMs excel at fitting finetuning data, but are they learning to reason or just parroting🦜? We found a way to probe a model's learning process to reveal *how* each example is learned. This lets us predict model generalization using only training data, amongst other insights: 🧵
Even the smartest LLMs can fail at basic multiturn communication. Ask for grocery help → it doesn't ask where you live 🤦♀️ Ask it to write articles → it assumes your preferences 🤷🏻♀️ ⭐️ CollabLLM (top 1%; oral @icmlconf) transforms LLMs from passive responders into active collaborators…
We automated systematic reviews using GPT-4.1 and o3-mini! Our platform (otto-SR) beat humans at all tasks and conducted 12 years of systematic review research in just two days. We also show how otto-SR can be used in the real world to rapidly update clinical guidelines 🧵
🚨 New Paper! 🚨 Guard models too slow, language-specific, and modality-limited? Meet OmniGuard, which detects harmful prompts across multiple languages & modalities with a single approach, achieving SOTA performance in all 3 modalities while being 120X faster 🚀 arxiv.org/abs/2505.23856
🤯 Your LLM just threw away 99.9% of what it knows. Standard decoding samples one token at a time and discards the rest of the probability mass. Mixture of Inputs (MoI) rescues that lost information, feeding it back for more nuanced expression. It is a brand new…
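A minimal sketch of the idea as described in the tweet (my own simplification; the convex-combination mixing rule and the `beta` weight are assumptions, not the paper's exact formulation): instead of feeding back only the sampled token's embedding, feed back a blend of all token embeddings weighted by the output distribution:

```python
import torch

def mixture_of_inputs_step(logits, embedding, sampled_id, beta=0.5):
    """Blend the one-hot sampled token with the full output distribution, then
    map the mixture through the embedding table to get the next input vector."""
    probs = logits.softmax(-1)                 # (vocab,) full next-token distribution
    one_hot = torch.zeros_like(probs)
    one_hot[sampled_id] = 1.0
    mix = beta * one_hot + (1 - beta) * probs  # keep the mass, not just the sample
    return mix @ embedding.weight              # (hidden,) expected input embedding

vocab, hidden = 100, 16
emb = torch.nn.Embedding(vocab, hidden)
logits = torch.randn(vocab)
tok = torch.multinomial(logits.softmax(-1), 1).item()
next_input = mixture_of_inputs_step(logits, emb, tok)
```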
Humans just saw a *new* color, literally outside the natural human color gamut. BAIR faculty and visual computing expert Ren Ng and collaborators made it possible with the Oz Vision System. 🌈👁️ Newly published in @ScienceAdvances: science.org/doi/10.1126/sc… popsci.com/health/new-col…
Thanks for featuring our work, @arankomatsuzaki! 🔥 Today we are thrilled to announce our MSR flagship project, Magma! This is a fully open-sourced project. We will roll out everything: code, model, and training data over the coming days. Check out our full work here:…
Microsoft presents: Magma: A Foundation Model for Multimodal AI Agents - SotA on UI navigation and robotic manipulation tasks - Pretrained on a large dataset annotated with Set-of-Mark (SoM) for action grounding and Trace-of-Mark (ToM) for action planning.
arxiv.org/abs/2502.10385 This is our latest work, SimDINO, which, again based on the coding-rate principle, significantly simplifies the popular (but unnecessarily sophisticated) visual self-supervised learning methods DINO and DINOv2. The power of understanding and principles is…
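For reference, the coding-rate quantity this line of work builds on (from the MCR² papers, as I recall it) measures how "spread out" a set of features is; maximizing it discourages representation collapse. A minimal torch sketch, with `eps` and the toy features as illustrative choices:

```python
import torch

def coding_rate(Z: torch.Tensor, eps: float = 0.5) -> torch.Tensor:
    """Coding rate R(Z) = 1/2 * logdet(I + d/(n*eps^2) * Z Z^T) for
    d-dimensional features Z of shape (d, n); larger = more spread out."""
    d, n = Z.shape
    gram = Z @ Z.T                                         # (d, d)
    return 0.5 * torch.logdet(torch.eye(d) + (d / (n * eps**2)) * gram)

Z = torch.nn.functional.normalize(torch.randn(64, 256), dim=0)  # unit-norm features
print(coding_rate(Z))  # a term like this can replace DINO's anti-collapse machinery
```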
New preprint! 🧠🤖 Brain encoding in 21 languages! biorxiv.org/content/10.110… w/ @saima_mm, @GretaTuckute, and @ev_fedorenko (1/)
The data science revolution is getting closer. TabPFN v2 is published in Nature: nature.com/articles/s4158… On tabular classification with up to 10k data points & 500 features, in 2.8s TabPFN on average outperforms all other methods, even when tuning them for up to 4 hours 🧵 1/19
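For context, the released interface is sklearn-style; a minimal usage sketch, assuming the `tabpfn` Python package and its `TabPFNClassifier` (the dataset and split here are just an example):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier  # assumed package/class name from the release

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()  # no per-dataset training: the model is pretrained
clf.fit(X_tr, y_tr)       # the "training set" becomes the in-context examples
print(accuracy_score(y_te, clf.predict(X_te)))
```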
New results for a new year! “Linking neural population formatting to function” describes our modern take on an old question: how can we understand the contribution of a brain area to behavior? biorxiv.org/content/10.110… 🧵1/
Cool new paper interpreting neurons in macaque V4 biorxiv.org/content/10.110…

🧠💡 Our LLMs just had a ‘memory augmentation’—now they can deliberate like seasoned thinkers! arxiv.org/abs/2412.17747
Absolutely. In any hypothesis test between A and B about the workings of a complex system, the right answer is invariably none of the above. System identification is a much better paradigm for neuroscience discovery; it allows us to efficiently explore huge hypothesis spaces.
The push towards hypothesis-driven research is one of the key mistakes in neuroscience. In exponentially sized hypothesis spaces, it simply is not a useful way of making progress. The convert-new-tech-into-hypothesis-progress industrial complex is a problem.
Everything you love about generative models, now powered by real physics! Announcing the Genesis project: after a 24-month large-scale research collaboration involving over 20 research labs, a generative physics engine able to generate 4D dynamical worlds powered by a physics…