Andrea Zanette @ ICML 2025
@Zanette_ai
Assistant professor at CMU
Recent work has seemed somewhat magical: how can RL with *random* rewards make LLMs reason? We pull back the curtain on these claims and find that this unexpected behavior hinges on the inclusion of certain *heuristics* in the RL algorithm. Our blog post: tinyurl.com/heuristics-con…
Excited to release AbstentionBench -- our paper and benchmark on evaluating LLMs’ *abstention*: the skill of knowing when NOT to answer! Key finding: reasoning LLMs struggle with unanswerable questions and hallucinate! Details and links to paper & open source code below! 🧵1/9
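To make the abstention skill concrete, here is a tiny illustrative sketch of how one might score it. This is my own toy metric, not AbstentionBench's actual protocol; all field names are assumptions.

```python
def abstention_score(records):
    """Toy scoring sketch (NOT AbstentionBench's real metric).

    Each record marks whether the question was unanswerable and whether
    the model abstained. The model is credited when it abstains on an
    unanswerable question, or answers an answerable one."""
    credited = sum(r["abstained"] == r["unanswerable"] for r in records)
    return credited / len(records)

records = [
    {"unanswerable": True,  "abstained": False},  # hallucinated an answer
    {"unanswerable": True,  "abstained": True},   # correctly declined
    {"unanswerable": False, "abstained": False},  # answered an answerable question
]
print(abstention_score(records))  # 2/3
```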
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: multiverse4fm.github.io 🧵 1/n
(1/n) Check out our new paper on accelerating RL training for LLMs! zanette-labs.github.io/speed-rl/ We propose SPEED, an online curriculum learning method for rule-based RL training of reasoning models. SPEED achieves 2x to 6x speedups across training setups and benchmarks.
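For intuition, here is a hedged sketch of the general online-curriculum idea, not SPEED's actual selection rule; the function and the pass-rate estimates are my illustration.

```python
def select_informative_prompts(prompts, est_pass_rate, low=0.1, high=0.9):
    """Toy online-curriculum filter (illustrative, NOT SPEED's algorithm).

    Under rule-based RL, prompts the model always solves (pass rate ~1)
    or never solves (pass rate ~0) yield near-zero gradient signal, so a
    curriculum can skip them and spend rollouts on informative prompts."""
    return [p for p in prompts if low <= est_pass_rate.get(p, 0.5) <= high]

est_pass_rate = {"2+2": 0.98, "AIME-style problem": 0.45, "open conjecture": 0.01}
print(select_informative_prompts(list(est_pass_rate), est_pass_rate))
# -> ['AIME-style problem']
```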
Super cool insights on 4D reconstruction from @QianqianWang5 at the ScanNet++ workshop @CVPR!
We are hiring Research Scientists for our Machine Learning and Optimization team at Google DeepMind Bangalore. If you're passionate about cutting-edge AI research and building efficient, elastic, customized, and safe LLMs, we'd love to hear from you. We are looking for…
Say ahoy to 𝚂𝙰𝙸𝙻𝙾𝚁⛵: a new paradigm of *learning to search* from demonstrations, enabling test-time reasoning about how to recover from mistakes w/o any additional human feedback! 𝚂𝙰𝙸𝙻𝙾𝚁 ⛵ outperforms Diffusion Policies trained via behavioral cloning on 5-10x the data!
SCA is the first self-improvement RL framework for general multi-turn tool-use agents. It works by first generating its own verifiers for its own synthetic tasks. Stay tuned for more details!
🚨Self-Challenging Language Model Agents🚨 📝: arxiv.org/abs/2506.01716 A new paradigm to train LLM agents to use different tools with challenging self-generated data ONLY: Self-challenging agents (SCA) both propose new tasks and solve them, using self-generated verifiers to…
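To make the propose-task/propose-verifier/solve loop concrete, here is a fully toy sketch; the arithmetic task and every function name are my illustration, not SCA's actual interface.

```python
import random

def propose_task_and_verifier(rng):
    """Toy stand-in for an agent proposing its own task + verifier.
    Here the 'task' is an addition problem and the verifier checks the sum;
    in SCA both are generated by the LLM itself (illustrative only)."""
    a, b = rng.randint(0, 99), rng.randint(0, 99)
    task = f"What is {a} + {b}?"
    verifier = lambda answer: float(answer == a + b)  # reward in {0, 1}
    return task, verifier

def attempt(task, rng):
    """Toy stand-in for the agent's solution attempt (sometimes wrong)."""
    a, b = (int(x) for x in
            task.removeprefix("What is ").removesuffix("?").split(" + "))
    return a + b if rng.random() < 0.8 else a + b + 1

rng = random.Random(0)
for _ in range(3):
    task, verify = propose_task_and_verifier(rng)
    answer = attempt(task, rng)
    reward = verify(answer)  # self-generated supervision, no human labels
    print(task, "->", answer, "reward:", reward)
```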
This is really great work by Fahim and co. Moving out of the regime where we have ground-truth rewards is critical for the next level of RL scaling in LLMs.
RL with verifiable rewards has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground-truth answers? Introducing Self-Rewarding Training (SRT): where language models provide their own reward for RL training! 🧵 1/n
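Below is a minimal hedged sketch of one such self-generated signal, majority-vote self-consistency, which the follow-up post ("learning from their own consistency") points to; the helper name and interface are mine, not the paper's.

```python
from collections import Counter

def self_consistency_rewards(answers):
    """Reward each sampled answer by agreement with the majority vote.

    Hypothetical sketch: `answers` are final answers parsed from N model
    samples for one prompt; no ground-truth label is used anywhere."""
    majority, _ = Counter(answers).most_common(1)[0]
    # Pseudo-reward: 1 if a sample matches the self-consistent answer.
    return [1.0 if a == majority else 0.0 for a in answers]

# Example: 8 samples for one math prompt, rewards come from consensus only.
rewards = self_consistency_rewards(["42", "42", "41", "42", "7", "42", "42", "41"])
print(rewards)  # majority is "42"; matching samples get reward 1.0
```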
"Can Large Reasoning Models Self-Train?" A brilliant paper from CMU showing LLMs can improve at math reasoning WITHOUT human labels - just learning from their own consistency. Early results rival models trained on ground-truth answers.
This is pretty remarkable – AI systems learning to self-improve. We're seeing a wave of research where AI isn't just learning from human feedback; it's starting to figure out how to improve itself using its own internal signals. A subtle but profound shift.
🇸🇬✈️Come check out Zochi's work at #ICLR2025 — and a big congrats for their first citation 😉🎉 We thank the workshop organizers for approving the work & inviting our reps to present on Zochi's behalf. Locations, times, & more details below 🧵👇
Announcing the first fully AI-generated scientific discovery to pass the highest level of peer review – the main track of an A* conference (ACL 2025). Several groups have shown AI-generated work at workshops, but main conference acceptance is a far higher bar. While workshops…
The 1st fully AI-generated scientific discovery to pass the highest level of peer review – the main track of an A* conference (ACL 2025). Zochi, the 1st PhD-level agent. Beta open.