Sachin Gururangan
@ssgrn
Researcher @AnthropicAI Prev:🦙 @aiatmeta, @allen_ai PhD @uwcse + @uwnlp
Life update: I’m thrilled to be joining the pretraining team at @AnthropicAI next week! Grateful to everyone at @Meta GenAI for an incredible journey building Llama. Excited for the next chapter 🚀
📣 Anthropic Zurich is hiring again 🇨🇭 The team has been shaping up fantastically over the past few months, and I have re-opened applications for pre-training. We welcome applications from anywhere along the "scientist/engineer spectrum". If building the future of AI for the…
We're launching an "AI psychiatry" team as part of interpretability efforts at Anthropic! We'll be researching phenomena like model personas, motivations, and situational awareness, and how they lead to spooky/unhinged behaviors. We're hiring - join us! job-boards.greenhouse.io/anthropic/jobs…
Excited to share that @AnthropicAI has launched its Economic Futures Program! As a member of their Economic Advisory Council, I’m thrilled about this initiative supporting research and policy development on AI’s economic impacts. Research grants up to $50K available!
I will be attending ICML next week. Reach out (by email) if you'd like to chat! About Anthropic / research / life. I'm especially interested in meeting grad students who can teach me new research ideas.
Our team is very excited to release Llama 4! Open reasoning model drops are incoming too 🙂
Today is the start of a new era of natively multimodal AI innovation. Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality. Llama 4 Scout • 17B-active-parameter model…
Check out the newest member of the "Branch-Train" family -- BTS (or, you know, your favorite k-pop boy band)! We introduce "stitch layers", a new architecture to combine expert LLMs with a small amount of training. Amazing work led by our intern @IreneZhang30!!
✨New Preprint✨We introduce 𝐁𝐫𝐚𝐧𝐜𝐡-𝐓𝐫𝐚𝐢𝐧-𝐒𝐭𝐢𝐭𝐜𝐡 (𝐁𝐓𝐒), an efficient & flexible method for stitching together independently pretrained LLM experts (i.e. code, math) into a single, capable generalist model. Key Takeaways: ✅BTS achieves the best average…
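(For readers curious what a "stitch layer" might look like mechanically, here is a minimal, purely illustrative PyTorch sketch that mixes hidden states from two frozen expert models with a small trainable gate and projections. The names, dimensions, and gating scheme are assumptions for illustration, not the architecture from the BTS paper.)

```python
import torch
import torch.nn as nn

class StitchLayer(nn.Module):
    """Illustrative stitch layer: learns to mix hidden states from two
    frozen expert LLMs into one stream (hypothetical, not the paper's code)."""
    def __init__(self, dim_a: int, dim_b: int, dim_out: int):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, dim_out)   # small trainable projections
        self.proj_b = nn.Linear(dim_b, dim_out)
        self.gate = nn.Linear(dim_a + dim_b, 1)   # learned mixing weight

    def forward(self, h_a: torch.Tensor, h_b: torch.Tensor) -> torch.Tensor:
        # h_a, h_b: [batch, seq, dim] hidden states from the two experts
        alpha = torch.sigmoid(self.gate(torch.cat([h_a, h_b], dim=-1)))
        return alpha * self.proj_a(h_a) + (1 - alpha) * self.proj_b(h_b)

# Only the stitch layers would be trained; the expert transformers stay frozen.
stitch = StitchLayer(dim_a=4096, dim_b=4096, dim_out=4096)
h_code = torch.randn(2, 16, 4096)   # e.g. hidden states from a code expert
h_math = torch.randn(2, 16, 4096)   # e.g. hidden states from a math expert
mixed = stitch(h_code, h_math)      # [2, 16, 4096]
```

The appeal of this family of methods is that only the stitch layers get gradient updates, so combining already-trained experts takes far less compute than pretraining a generalist from scratch.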
Our team is excited to release Llama 3.3 70B, which is comparable in performance to 405B/GPT-4o! Post-training go brrrr
Introducing Llama 3.3 – a new 70B model that delivers the performance of our 405B model but is easier & more cost-efficient to run. By leveraging the latest advancements in post-training techniques including online preference optimization, this model improves core performance at…
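(The announcement doesn't spell out which preference-optimization variant is meant. As a rough illustration of the family, here is a generic DPO-style pairwise loss in PyTorch; in an "online" setup the chosen/rejected pairs would be re-sampled from the current policy and labeled each round rather than coming from a fixed dataset. All names below are illustrative.)

```python
import torch
import torch.nn.functional as F

def preference_loss(policy_logp_chosen, policy_logp_rejected,
                    ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style pairwise loss: push the policy to prefer the chosen response
    over the rejected one, relative to a frozen reference model.
    Inputs are summed token log-probs of each full response (shape: [batch])."""
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Dummy usage: in an online loop these log-probs come from freshly sampled
# responses scored each round, not from a static preference dataset.
loss = preference_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                       torch.tensor([-13.0]), torch.tensor([-14.5]))
```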
New paper by our intern @yue__yu! We use synthetic data to teach reward models to generate rationales for their scalar outputs. Our technique makes RMs less of a black box, more powerful, and more data efficient. Check it out!
🔍 Reward modeling is a reasoning task—can self-generated CoT-style critiques help? 🚀 Check out my intern work at Llama Team @AIatMeta, 3.7-7.3% gains on RewardBench vs. RM & LLM judge baselines, with better generalization & data efficiency! arxiv.org/abs/2411.16646 #rlhf #LLM
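(A toy sketch of the critique-then-score pattern described here: the reward model is prompted to write a rationale before emitting its scalar score, and the score is parsed off the final line. The prompt template and parsing below are hypothetical, not taken from the paper.)

```python
# Hypothetical critique-then-score reward-model call; template and parsing are illustrative.

def build_rm_prompt(question: str, answer: str) -> str:
    return (
        "You are a reward model. First write a short critique of the answer, "
        "then give a score from 1 to 10 on the final line as 'Score: <n>'.\n\n"
        f"Question: {question}\nAnswer: {answer}\nCritique:"
    )

def parse_rm_output(text: str) -> tuple[str, float]:
    """Split the generated text into (rationale, scalar score)."""
    rationale, _, score_line = text.rpartition("Score:")
    return rationale.strip(), float(score_line.strip())

# Usage with any chat-LLM `generate` function (assumed to exist):
# rationale, score = parse_rm_output(generate(build_rm_prompt(q, a)))
```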
2025 internship opps on the Llama team are now live! Feel free to reach out, especially if you're excited about working on problems in the post-training world (e.g., ranking/judges, reasoning, or all things synthetic data)! Lots of fun things to explore :) metacareers.com/jobs/111868362…
The Llama 3 paper is a must-read for anyone in AI and CS. It’s an absolutely accurate and authoritative take on what it takes to build a leading LLM, the tech behind ChatGPT, Gemini, Copilot, and others. The AI part might seem small in comparison to the gargantuan work on *data*…
Why do 16k-GPU jobs fail? The Llama 3 paper has many cool details -- notably, it has a huge infrastructure section covering how we parallelize, keep things reliable, etc. We hit 90% effective training time overall. ai.meta.com/research/publi…
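(Back-of-the-envelope reading of that figure: "effective training time" is roughly the fraction of wall-clock time spent doing useful training rather than recovering from failures. The breakdown below is made up for illustration; only the ~90% number comes from the tweet/paper.)

```python
# Illustrative only: the downtime split is invented, the ~90% figure is from the paper.
wall_clock_hours = 1000.0        # total wall-clock time of the training job
lost_to_failures = 60.0          # hardware faults, restarts, checkpoint reloads
lost_to_stragglers = 40.0        # slow hosts, network hiccups, other overhead

effective = (wall_clock_hours - lost_to_failures - lost_to_stragglers) / wall_clock_hours
print(f"effective training time: {effective:.0%}")   # -> 90%
```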
Excited to give a talk at this workshop! I’ll discuss continual learning in llama 3 posttraining, and directions we’re excited about for llama 4 and beyond.
[1/4] Happy to announce that we are organizing a workshop on continuous development of foundation models at NeurIPS’24. Website: sites.google.com/corp/view/cont…
as my other amazing teammates have already shared, check out our llama 3.1 paper here! lots of fun tidbits about the highs, lows, sweat, and tears that go into training LLMs lol ... onto llama 4!!! ai.meta.com/research/publi…
Oh, one more thing! Our new Llama license allows the outputs of the Llama 3.1 models to improve any other model. So, go nuts :)
Llama 3.1 405B is here! It has 128K context and is a really strong model (MMLU 5-shot 87.3, HumanEval 89.0, MATH 73.8). Model: llama.meta.com Paper: tinyurl.com/llama3paper As a member of the posttraining team, here are a few takeaways from posttraining Llama 3 🧵