Prior @ AI2
@Ai2Prior
Tackling the boldest computer vision problems @allen_ai
We’re presenting SAM2Act at #ICML! Come check out the many amazing projects from AI2, and stop by to chat with us and learn more about our work.
This week is #ICML in Vancouver, and a number of our researchers are participating. Here's the full list of Ai2's conference engagements—we look forward to connecting with fellow attendees. 👋
Excited to present our work at #ICML next week! Always happy to chat about all things 🔥 in Robotics and AI. I’ll also be on the academic job market this coming year — would love to connect about any potential opportunities!
Can we build a generalist robotic policy that doesn’t just memorize training data and regurgitate it during test time, but instead remembers past actions as memory and conditions its decisions on them?🤖💡 Introducing SAM2Act—a multi-view robotic transformer-based policy that…
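The core idea, very roughly: the policy conditions each action on a rolling buffer of its own past observations and actions instead of acting purely reactively. A minimal, illustrative sketch of that memory-conditioning pattern (not SAM2Act's actual multi-view transformer architecture; the buffer size, feature dimensions, and random projection are placeholders):

```python
# Toy memory-conditioned policy: act on the current observation plus a rolling
# buffer of past (observation, action) pairs. Placeholder math only; the real
# SAM2Act policy is a multi-view transformer, not a fixed random projection.
from collections import deque
import numpy as np

class MemoryConditionedPolicy:
    def __init__(self, obs_dim=16, action_dim=7, horizon=8, seed=0):
        self.obs_dim, self.action_dim = obs_dim, action_dim
        self.memory = deque(maxlen=horizon)            # rolling memory buffer
        rng = np.random.default_rng(seed)
        # Stand-in for a trained model: a fixed linear map over [obs, history].
        self.proj = rng.standard_normal((action_dim, obs_dim + obs_dim + action_dim))

    def act(self, observation):
        # Pool remembered steps into one history vector; a real policy would
        # attend over them rather than averaging.
        if self.memory:
            history = np.mean([np.concatenate([o, a]) for o, a in self.memory], axis=0)
        else:
            history = np.zeros(self.obs_dim + self.action_dim)
        action = self.proj @ np.concatenate([observation, history])
        self.memory.append((observation.copy(), action.copy()))
        return action

policy = MemoryConditionedPolicy()
for step in range(3):
    print(step, policy.act(np.zeros(16))[:3])          # dummy observations
```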
It’s incredible to have both your advisors at the same company! With @fox_dieter17849 building the Robotics team, and @RanjayKrishna leading PRIOR, @allen_ai is set to become a powerhouse in robotics, computer vision, and embodied AI for open science research. Excited to be part…
Talent density only going up and to the right at Ai2. Let's keep pushing.
🚨Tired of binary pass/fail metrics that miss the bigger picture? 🤖Introducing #RoboEval — an open benchmark that shows *how* robot manipulation policies behave and *why* they fail, not just *if* they succeed. 🧵1/n 🔗 robo-eval.github.io 📄 robo-eval.github.io/media/RoboEval…
🥳 Excited to share that I’ll be joining the CS Department at UNC-Chapel Hill (@unccs @unc_ai_group) as an Assistant Professor starting Fall 2026! Before that, I’ll be working at Ai2 Prior (@allen_ai @Ai2Prior) and UW (@uwcse) on multimodal understanding and generation.
Our Molmo work won Best Paper Honorable Mention at #CVPR2025! This large project was one of my best experiences with a fantastic team!
I am doing something silly by testing whether I can remember and deliver multiple talks on the same day on different slices of my group’s research. If you are at #CVPR2025 on June 11th, come to one or all of them :D 9:05am: Behaviors & bodies: how they shape one another…
Following up on our work on Molmo: Molmo points, but how can those points power real-world robotics? Introducing GraspMolmo, a VLM that plugs seamlessly into robotic systems to generate semantically meaningful grasp poses from natural language commands. 👉 abhaybd.github.io/GraspMolmo/
How should a robot hold a water bottle? 🤔 That depends: is it opening it, or passing it to you? I’m excited to introduce GraspMolmo, a VLM that predicts semantically appropriate grasps based on your command! Website: abhaybd.github.io/GraspMolmo/ 🧵 Thread ↓
Building on our work with Molmo, we’re excited to introduce GraspMolmo — a vision-language model that predicts semantically meaningful grasps conditioned on natural language. A fantastic effort led by our PYI, @ab_deshpande!
Excited to be at #CVPR2025 in Nashville! 🎉 I’m presenting a demo paper with real-world robot demos and co-organizing two workshops: Robo 3D VLM and Generalization for Robotic Manipulation. Let’s connect if you’re into 🔥 Robotics + AI — and don’t miss our stacked speaker…
Let us know how good Molmo is at language-guided pointing 👈 Vote here 👇
Point-Battle is now live! Vote or submit your multimodal model and see how it stacks up in language-guided pointing and grounded visual reasoning—let the community decide which MLLM really hits the mark. We will also open-source all data for training MLLMs for pointing later on.…
Great to see Molmo leading on pointing👉
👉 Pointing is our first “language”—babies master it before words. Precise spatial grounding powers robotics, assistive tech, HCI, and vision-language interfaces. 🤔 But can today's MLLMs point with pixel-level accuracy and truly ground visual reasoning?📷We introduce PointArena,…
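For context on what pixel-level pointing accuracy means here: a common way to score language-guided pointing is to count a prediction as correct only if the predicted pixel lands inside a ground-truth mask for the referred object. A toy scorer in that spirit (illustrative only, not necessarily PointArena's exact protocol):

```python
# Toy scorer for language-guided pointing: a prediction counts as correct if
# the predicted pixel falls inside the ground-truth mask of the referred object.
import numpy as np

def pointing_accuracy(points, masks):
    """points: list of (x, y) pixel coordinates; masks: list of HxW boolean arrays."""
    hits = 0
    for (x, y), mask in zip(points, masks):
        h, w = mask.shape
        hits += 0 <= y < h and 0 <= x < w and bool(mask[int(y), int(x)])
    return hits / len(points)

# Tiny example: one correct point, one miss.
mask = np.zeros((4, 4), dtype=bool)
mask[1, 1] = True
print(pointing_accuracy([(1, 1), (3, 3)], [mask, mask]))  # 0.5
```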
✨ Introducing MutaGReP (Mutation-guided Grounded Repository Plan Search) - an approach that uses LLM-guided tree search to find realizable plans that are grounded in a target codebase without executing any code! Ever wanted to provide an entire repo containing 100s of 1000s of…
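Schematically, the search described above can be pictured as: keep a frontier of candidate plans, mutate the most promising one (in MutaGReP the mutations come from an LLM), ground each plan step to symbols retrieved from the repository, and score plans by how well their steps ground, all without running any code. A heavily simplified sketch with placeholder mutate/ground/score functions (stand-ins for illustration, not the paper's implementation):

```python
# Schematic mutation-guided plan search: best-first search over plans, where
# each step is checked for grounding against repository symbols.
import heapq, itertools, random

REPO_SYMBOLS = ["load_dataset", "train_model", "evaluate_model", "save_checkpoint"]

def mutate(plan):
    # Stand-in for an LLM proposing edited versions of a plan.
    step = random.choice(["load data", "train model", "evaluate model", "tune hyperparams"])
    return [plan + [step], plan[:-1] + [step]]

def ground(step):
    # Stand-in for retrieving repo symbols relevant to a plan step.
    return [s for s in REPO_SYMBOLS if s.split("_")[0] in step]

def plan_score(plan):
    # Realizability proxy: fraction of steps that ground to some symbol.
    return sum(bool(ground(s)) for s in plan) / max(len(plan), 1)

def search(budget=50):
    tie = itertools.count()                     # tie-breaker for the heap
    frontier = [(-plan_score(["load data"]), next(tie), ["load data"])]
    best = ["load data"]
    for _ in range(budget):
        _, _, plan = heapq.heappop(frontier)
        for child in mutate(plan):
            heapq.heappush(frontier, (-plan_score(child), next(tie), child))
            if plan_score(child) > plan_score(best):
                best = child
    return best

random.seed(0)
print(search())
```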
Love the evolution of this research thread: 2015 - Neural Module Networks (NMN) by @jacobandreas et al. was my introduction to neuro-symbolic reasoning in grad school. Super exciting approach but program synthesis and neural modules were both brittle back then. 2022 - GPT3 and…
🚀 Many breakthroughs in computer vision have come from large-scale benchmarks & challenges like ImageNet, MS COCO, and WILDS. 🤖⚡ Standardizing benchmarks for robotic manipulation has been challenging, but with the rise of generalist robotic policies, evaluating their…
🎉📢Exciting news! Join us at the inaugural @CVPR workshop on 3D Vision Language Models (VLMs) for Robotic Manipulation: Opportunities and Challenges, happening on June 11, 2025, in Nashville, TN. Explore how 3D perception can be integrated into robotic manipulation in the…
🚨 Why do robots fail under out-of-distribution perturbations? Can we diagnose these failures in advance—and 'prescribe' the right data to fix them? 🚨 Our new paper, RoboMD, introduces a systematic framework for diagnosing and improving robot manipulation policies. 🤖💡…
Check out this new work from our student researcher, @DJiafei! Memory is important for both navigation and manipulation policies.
Here is Tülu 3 405B 🐫, our open-source post-training model that surpasses the performance of DeepSeek-V3! The last member of the Tülu 3 family demonstrates that our recipe, which includes Reinforcement Learning from Verifiable Rewards (RLVR), scales to 405B - with performance on…
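For readers unfamiliar with the acronym: RLVR replaces a learned reward model with a programmatic check, so the policy is rewarded only when its output can be verified as correct (for example, an exact-match math answer or a satisfied constraint). A minimal sketch of such a verifiable reward (illustrative; the full Tülu 3 recipe also includes SFT and preference tuning stages):

```python
# Toy "verifiable reward": 1.0 only when the model's answer passes a
# programmatic check, here a normalized exact match against a known answer.
def verifiable_reward(model_answer: str, ground_truth: str) -> float:
    normalize = lambda s: s.strip().lower().rstrip(".")
    return 1.0 if normalize(model_answer) == normalize(ground_truth) else 0.0

print(verifiable_reward("42.", "42"))        # 1.0 -> reinforce this rollout
print(verifiable_reward("forty-two", "42"))  # 0.0 -> no reward signal
```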