Apoorv Khandelwal
@apoorvkh
CS PhD student at Brown
Wondering how long it takes to train a 1B-param LM from scratch on your GPUs? 🧵 See our paper to learn about the current state of academic compute and how to efficiently train models! Use our code to test your own models/GPUs! arxiv.org/abs/2410.23261 github.com/apoorvkh/acade…
Check out our new paper: “How Do Vision-Language Models Process Conflicting Information Across Modalities?”! Vision-language models often struggle with conflicting inputs: we show how their internal representations and key attention heads reveal when and how this happens, and…
I wrote up this post about how we should **unify RL and next-token-prediction**, based on my perspective on how humans learn new languages. then realized @jxmnop wrote the exact same thing about how we should scale RL to 10^26 FLOPs
🚨 Registration is live! 🚨 The New England Mechanistic Interpretability (NEMI) Workshop is happening August 22nd 2025 at Northeastern University! A chance for the mech interp community to nerd out on how models really work 🧠🤖 🌐 Info: nemiconf.github.io/summer25/ 📝 Register:…
The NeurIPS paper checklist corroborates the bureaucratic theory of statistics. argmin.net/p/standard-err…
Is there a clear choice or difference between Cursor, VS Code + Copilot, or something else? They both seem quite similar to me (VS Code-based, chat, tab complete, same downstream LLMs, etc). Thoughts?
Molmo won the Best Paper Honorable Mention award @CVPR! This work was a long journey over 1.5 years, from failing to get strong performance with massive-scale, low-quality data, to focusing on modest-scale, extremely high-quality data! Proud to see what it became. #CVPR2025
🤔Ever wonder why LLMs give inconsistent answers in different languages? In our paper, we identify two failure points in the multilingual factual recall process and propose fixes that guide LLMs to the "right path." This can boost performance by 35% in the weakest language! 📈
excited to finally share on arxiv what we've known for a while now: All Embedding Models Learn The Same Thing. embeddings from different models are SO similar that we can map between them based on structure alone, without *any* paired data. feels like magic, but it's real: 🧵
this is sick. all i'll say is that these GIFs are proof that the biggest bet of my research career is gonna pay off. excited to say more soon
Giving your models more time to think before predicting, via smart decoding, chain-of-thought reasoning, latent thoughts, etc., turns out to be quite effective for unlocking the next level of intelligence. New post is here :) “Why we think”: lilianweng.github.io/posts/2025-05-…
The long-term goal of AI is to build models that can handle arbitrary tasks, not just ones they’ve been trained on. We hope our new *benchmark generator* can help measure progress toward this vision
🎮 Excited to announce gg-bench, a fully synthetic benchmark for LLMs consisting of games generated entirely by LLMs!! This benchmark centers on the fact that LLMs are capable of generating complex tasks that they themselves cannot even solve. 📄: arxiv.org/abs/2505.07215
Today, we’re announcing the preview release of ty, an extremely fast type checker and language server for Python, written in Rust. In early testing, it's 10x, 50x, even 100x faster than existing type checkers. (We've seen >600x speed-ups over Mypy in some real-world projects.)
📣 New paper! We observe that reasoning language models finetuned only on English data are capable of zero-shot cross-lingual reasoning through a "quote-and-think" pattern. However, this does not mean they reason the same way across all languages or in new domains. [1/N]
Excited to announce I'll be starting as an assistant professor at @TTIC_Connect for fall 2026! In the meantime, I'll be graduating and hanging around Ai2 in Seattle🏔️
Today we're excited to introduce Vy, our AI that sees and acts on your computer. At Vercept, our mission is to reinvent how humans use computers, enabling you to accomplish orders of magnitude more than what you can do today. Vy is a first glimpse at AI that sees and uses your…
14 Advanced Python Features blog.edward-li.com/tech/advanced-…
The university will not surrender its independence or relinquish its constitutional rights. Neither Harvard nor any other private university can allow itself to be taken over by the federal government. hrvd.me/ResearchFundin…
I joined @GoodfireAI a little over a month ago to do interpretability! I am really excited to extend my work beyond just LMs. I think interp has a lot to offer to e.g., scientific models. Understanding them might actually teach us something new about the world 🌎
Introducing API. A new era of agentic computer use begins today.
Why is interpretability, rather than winning the scaling race or banning China, the key to dominance in AI? Our answer to OSTP/NSF, w/ Goodfire's @banburismus_ Transluce's @cogconfluence MIT's @dhadfieldmenell resilience.baulab.info/docs/AI_Action… Here's why:🧵 ↘️