Tanmay Gupta
@tanmay2099
Senior Research Scientist @allen_ai (Ai2) | Developing the science and art of multimodal AI agents | Prev. CS PhD, UIUC and EE UG, IIT Kanpur
Love the evolution of this research thread: 2015 - Neural Module Networks (NMN) by @jacobandreas et al. was my introduction to neuro-symbolic reasoning in grad school. Super exciting approach, but program synthesis and neural modules were both brittle back then. 2022 - GPT3 and…
✨ Introducing MutaGReP (Mutation-guided Grounded Repository Plan Search) - an approach that uses LLM-guided tree search to find realizable plans that are grounded in a target codebase without executing any code! Ever wanted to provide an entire repo containing 100s of 1000s of…
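To make the idea concrete, here's a minimal, purely illustrative Python sketch of mutation-guided plan search: best-first search over a tree of candidate plans, where an LLM proposes mutations and a grounding score checks each step against symbols from the repo. The propose_mutations and grounding_score helpers are hypothetical placeholders, not the actual MutaGReP implementation, and no code from the target repo is ever executed.

import heapq

# Hypothetical stand-ins for the LLM-based plan mutator and the repo-grounded
# scorer; their names and signatures are assumptions made for this sketch.
def propose_mutations(plan, n=3):
    # An LLM would rewrite or extend the plan's steps here.
    return [plan + [f"step {len(plan) + 1} (variant {i})"] for i in range(n)]

def grounding_score(plan, repo_symbols):
    # Fraction of steps that mention at least one symbol that exists in the repo.
    hits = sum(any(sym in step for sym in repo_symbols) for step in plan)
    return hits / max(len(plan), 1)

def plan_search(user_query, repo_symbols, budget=20):
    """Best-first search over plans: repeatedly pop the most promising plan,
    mutate it, and keep the best-grounded result, without executing any code."""
    frontier = [(0.0, [user_query])]  # (negative score, plan); heapq is a min-heap
    best_plan, best_score = [user_query], 0.0
    for _ in range(budget):
        if not frontier:
            break
        _, plan = heapq.heappop(frontier)
        for child in propose_mutations(plan):
            score = grounding_score(child, repo_symbols)
            if score > best_score:
                best_plan, best_score = child, score
            heapq.heappush(frontier, (-score, child))
    return best_plan

print(plan_search("resize all images", repo_symbols=["resize", "load_image"]))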
This morning took a scenic walk from Ai2’s (@allen_ai) past to its future! Reminds me of this wonderful feeling of day 1 as an intern at a new and shiny office - no idea where anything is anymore! 🤩
Great initiative by #CVPR2025! Kudos to Alyosha and Antonio for volunteering to run these practice sessions 👏👏

Loved working with Zaid as he led this exciting project at Ai2! LLM-based coding agents are remarkably capable when given well-grounded plans, but generating such plans efficiently from arbitrarily large codebases is extremely challenging. The solution: MutaGReP
MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use
We share Code-Guided Synthetic Data Generation: using LLM-generated code to create multimodal datasets for text-rich images, such as charts📊, documents📄, etc., to enhance Vision-Language Models. Website: yueyang1996.github.io/cosyn/ Dataset: huggingface.co/datasets/allen… Paper:…
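The recipe, roughly: have an LLM write rendering code, execute that code to produce a text-rich image (a chart, a document page, ...), and derive grounded question-answer pairs from the same underlying data. A toy sketch of that loop, with generate_plot_code standing in for the LLM call and matplotlib standing in for whatever renderers the actual pipeline uses:

import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

def generate_plot_code(topic):
    # Hypothetical placeholder for an LLM call that writes chart-rendering code;
    # a fixed snippet keeps this sketch self-contained.
    return (
        "fig, ax = plt.subplots()\n"
        "ax.bar(['2022', '2023', '2024'], [3, 7, 12])\n"
        f"ax.set_title('{topic}')\n"
    )

def synthesize_example(topic, out_path):
    code = generate_plot_code(topic)
    exec(code, {"plt": plt})   # run the generated rendering code
    plt.savefig(out_path)      # the resulting text-rich image
    plt.close("all")
    # Because the underlying data is known, the Q&A can be generated programmatically too.
    return {"image": out_path,
            "question": f"Which year has the largest value in '{topic}'?",
            "answer": "2024"}

print(synthesize_example("Reports per year", "chart.png"))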
(Thanks for the shoutout, @anand_bhattad!) Or CodeNav, which generalizes tool-use to code-use! Some key improvements upon VisProg / ViperGPT style tool-use systems: ✅ It's way more flexible in how tools are provided (just build a Python codebase and point the LLM to that…
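A rough illustration of the "just point the LLM at a codebase" idea: instead of registering a fixed tool schema, index the repo's function signatures and hand the relevant ones to the model as context, so it can write code that calls them directly. This is only a sketch of the concept, not CodeNav's actual implementation.

import ast
import pathlib

def index_repo(repo_root):
    """Collect function signatures from a codebase so an agent can treat it as a
    tool library by writing code against it, rather than picking from a fixed tool list."""
    signatures = []
    for path in pathlib.Path(repo_root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that don't parse cleanly
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                args = ", ".join(a.arg for a in node.args.args)
                signatures.append(f"{path.name}: {node.name}({args})")
    return signatures

# These signatures would go into the LLM's prompt; the model then emits code
# that imports and calls them (code-use), instead of invoking named tools (tool-use).
print("\n".join(index_repo(".")))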
🚨 I’m on the 2024-2025 academic job market! j-min.io I work on ✨ Multimodal AI ✨, with a special focus on enhancing reasoning in both understanding and generation tasks by: 1⃣Making it more scalable 2⃣Making it more faithful 3⃣Evaluating and refining multimodal…
I missed this post back in JULY when Tanmay made it, but it's prescient and even more relevant now. Core NLP folks, remember not to re-invent the wheel. Agents are a thing in robotics and reinforcement learning and planning. We have algorithms! Come chat with us!
Do we need to narrowly redefine "Agent" for LLM-Agents or can we just borrow a broader definition from RL / Embodied AI literature? LLM Agents are agentic in the same sense that a trained robot or an RL policy is agentic. Making this connection more explicit allows us to borrow…
Excited to share that I'll be joining University of California at Irvine as a CS faculty in '25!🌟 Faculty apps: @_krishna_murthy, @liuzhuang1234 & I share our tips: unnat.github.io/notes/Hidden_C… PhD apps: I'm looking for students in vision, robot learning, & AI4Science. Details👇
I am hiring interns to join us @allen_ai in advancing the science and art of building agents of all kinds: 🕸️ Web-use 💻 Code-use 🛠️ Tool-use Join us in answering exciting questions about multimodal planning, agentic learning, dealing with underspecified queries and more!
📢Applications are open for summer'25 internships at the PRIOR (computer vision) team @allen_ai: Come join us in building large-scale models for: 📸 Open-source Vision-Language Models 💻 Multimodal Web Agents 🤖 Embodied AI + Robotics 🌎 Planet Monitoring Apply by December…
We won the outstanding paper award @corl_conf!!! 😀😀😀 And here's what's inside that mysterious big box
🚀 Quick Update 🚀 🎉 @ZCCZHANG will present PoliFormer at CoRL Oral Session 5 (🕤 9:30-10:30, Fri, Nov 8, CET)! 🎉 Meet us at Poster Session 4 (🕓 16:00-17:30) to chat with @ZCCZHANG, @rosemhendrix, and Jordi! 💻 Our code & checkpoints are NOW public: github.com/allenai/polifo…
Incredibly honored to share this amazing news! PoliFormer has won the Outstanding Paper Award at @corl_conf 2024! 🎉 Check out our project and code: poliformer.allen.ai
PoliFormer has won the Outstanding Paper Award at @corl_conf 2024! On-policy RL with a modern transformer architecture can produce masterful navigators for multiple embodiments. All Sim-to-Real. A last hurrah from work at @allen_ai! Led by @KuoHaoZeng @ZCCZHANG and @LucaWeihs
This is how we do POS tagging in 2024, right? Jokes aside, the model is actually really good at pointing. Check it out yourself!
Meet Molmo: a family of open, state-of-the-art multimodal AI models. Our best model outperforms proprietary systems, using 1000x less data. Molmo doesn't just understand multimodal data—it acts on it, enabling rich interactions in both the physical and virtual worlds. Try it…