Pranjal Aggarwal ✈️ ICML 2025
@PranjalAggarw16
PhD Student @LTIatCMU. research scientist intern @AIatMeta FAIR. Working on reasoning, computer-use agents and test-time compute. Prev @IITD
What if you could control how long a reasoning model “thinks”? Presenting L1-1.5B, an RL-trained reasoning model with: - controllable thinking length via a prompt - better performance per token than S1 - better short CoT performance than GPT-4o cmu-l3.github.io/l1 🧵
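The tweet describes controlling thinking length through the prompt and training with RL. A minimal sketch of that idea, assuming an LCPO-style setup: append a token budget to the prompt, then reward correctness minus a penalty on how far the generated reasoning deviates from that budget. The function names, prompt wording, and `alpha` value here are illustrative assumptions, not the paper's exact recipe.

```python
# Hypothetical sketch of length-controlled prompting plus a length-penalty
# reward: correctness indicator minus a linear penalty on the gap between
# requested and actual reasoning length. Names and alpha are illustrative.

def length_controlled_prompt(question: str, target_tokens: int) -> str:
    """Append a thinking-budget instruction to the problem statement."""
    return f"{question}\n\nThink for a maximum of {target_tokens} tokens."

def length_penalty_reward(is_correct: bool, used_tokens: int,
                          target_tokens: int, alpha: float = 0.0003) -> float:
    """Reward = 1 if correct else 0, minus alpha * |budget deviation|."""
    return float(is_correct) - alpha * abs(target_tokens - used_tokens)
```

During RL training, each rollout would be scored with this reward, so the policy learns both to answer correctly and to respect the requested budget.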



Will future SWE agents be computer-use agents? We explore this shift in Programming with Pixels: an agent environment where agents learn to use an IDE's existing functionality rather than relying on hand-designed tool APIs programmingwithpixels.com
What if AI agents did software engineering like humans—seeing the screen & using any developer tool? Introducing Programming with Pixels: an SWE environment where agents control VSCode via screen perception, typing & clicking to tackle diverse tasks. programmingwithpixels.com 🧵
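The agent loop described above (screen perception in, typing and clicking out) can be sketched as a simple observe-act cycle. The `env` and `model` interfaces below are illustrative assumptions for the sketch, not the actual Programming with Pixels API.

```python
# Minimal sketch of a screen-perception agent loop: take a screenshot of
# the IDE, ask a model for the next GUI action, execute it, repeat until
# the model says it is done. All interfaces here are hypothetical.

from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

def run_agent(env, model, max_steps: int = 50) -> bool:
    """Observe-act loop: pixels in, keyboard/mouse actions out."""
    for _ in range(max_steps):
        screenshot = env.screenshot()           # raw pixels of the editor
        action = model.next_action(screenshot)  # model proposes a GUI action
        if action.kind == "done":
            return env.task_complete()          # did the agent solve the task?
        env.execute(action)                     # click or type inside the IDE
    return False
```

The key design point the tweet highlights: because the interface is just pixels and input events, the same loop works for any developer tool, with no hand-designed tool API per task.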
Can LLMs self-improve on code generation? Check out our work AlphaVerus, where the model generates provably correct code and self-improves without any weight updates! At #ICML2025 today: 📆: 11:00 AM - 1:30 PM 📷: Poster #East-2912 alphaverus.github.io w/ Bryan, @wellecks





I will be at #ICML2025 this week. Reach out if you want to chat about llm reasoning, computer-use agents, code gen or actually anything! (DMs are open) I will also be presenting AlphaVerus (self-improving verified code gen) this Thursday! alphaverus.github.io
Confused about recent LLM RL results where models improve without any ground-truth signal? We were too, until we looked at the reported numbers for the pre-RL models and realized they were severely underreported across papers. We compiled the discrepancies in a blog below🧵👇
AlphaVerus has been accepted at #ICML2025! alphaverus.github.io arxiv.org/abs/2412.06176 We've seen in math that good verification (e.g., Lean) unlocks surprising capabilities–why not for code too? AlphaVerus puts LLMs & Rust’s Verus verifier into a self-improving loop–lots…
We present AlphaVerus, which enables LLMs to generate provably correct Rust code via a new tree search and self-improvement loop Very excited about AlphaVerus as a starting point for truly trustworthy code generation. Amazing work by @PranjalAggarw16! alphaverus.github.io
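The tweets above describe a loop where the model proposes programs, a verifier (Rust's Verus) checks them, and verified results feed back in without weight updates. A hedged sketch of that generate-verify-refine structure, with all interfaces (`generate`, `verify`, `refine`) as illustrative assumptions rather than the actual AlphaVerus implementation:

```python
# Hypothetical sketch of a generate-verify-refine self-improvement loop:
# sample candidate programs, accept only those the verifier proves correct,
# attempt a repair on failures, and reuse verified examples as context.

def self_improving_codegen(spec: str, generate, verify, refine,
                           rounds: int = 3, samples: int = 4):
    """Return a verified program for `spec`, or None if none is found."""
    exemplars = []  # verified (spec, program) pairs reused as few-shot context
    for _ in range(rounds):
        for candidate in generate(spec, exemplars, n=samples):
            ok, error = verify(candidate)       # e.g., run the Verus verifier
            if ok:
                exemplars.append((spec, candidate))
                return candidate
            # tree-search-style repair: revise the failing candidate
            repaired = refine(candidate, error)
            if repaired is not None and verify(repaired)[0]:
                return repaired
    return None
```

The verifier supplies the ground-truth signal, so the loop improves outputs across rounds without ever updating model weights.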
Cool to see our L1 (arxiv.org/abs/2503.04697) methodology used here! And a nice insight about using the controllable reasoning budget to enable more efficient use of inference hardware
With INTELLECT-2 we aim for frontier reasoning performance with a controllable thinking budget. By incorporating length rewards into our training run, users can specify how long the model should reason for a given task. primeintellect.ai/blog/intellect…
The recent Claude 3.7 model from Anthropic lets you control the budget for thinking—how might this work? Check out L1, our fully open recipe for training reasoning models with controllable thinking budgets!