Mohit Shridhar
@mohito1905
Research Scientist at @GoogleDeepMind. http://mohito1905.bsky.social
Image-generation diffusion models can draw arbitrary visual patterns. What if we finetune Stable Diffusion to 🖌️ draw joint actions 🦾 on RGB observations? Introducing 𝗚𝗘𝗡𝗜𝗠𝗔. Paper, videos, code, ckpts: genima-robot.github.io 🧵Thread⬇️
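For intuition, a minimal, hypothetical sketch of the two-stage idea in the post: a fine-tuned Stable Diffusion img2img model "draws" joint-action targets onto the RGB observation, and a simple colour-threshold pass recovers a pixel target per joint. The checkpoint path, marker colours, and controller interface are assumptions, not the released GENIMA code.

```python
# Hypothetical GENIMA-style inference sketch: a fine-tuned img2img model draws
# joint-action targets on the observation; colour thresholding recovers them.
import numpy as np
import cv2
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "path/to/finetuned-genima-style-checkpoint",   # hypothetical checkpoint
    torch_dtype=torch.float16,
).to("cuda")

obs = Image.open("rgb_observation.png").convert("RGB")
drawn = pipe(
    prompt="open the top drawer",   # language instruction as the prompt
    image=obs,
    strength=0.8,
    guidance_scale=7.5,
).images[0]

# Assume each joint target is rendered as a marker of a known colour.
JOINT_COLOURS_BGR = {0: (0, 0, 255), 1: (0, 255, 0), 2: (255, 0, 0)}  # assumed palette

def extract_joint_pixels(img_pil, tol=40):
    """Return {joint_id: (u, v)} pixel centres of the drawn markers."""
    img = cv2.cvtColor(np.array(img_pil), cv2.COLOR_RGB2BGR)
    targets = {}
    for jid, bgr in JOINT_COLOURS_BGR.items():
        lo = np.clip(np.array(bgr) - tol, 0, 255).astype(np.uint8)
        hi = np.clip(np.array(bgr) + tol, 0, 255).astype(np.uint8)
        mask = cv2.inRange(img, lo, hi)
        m = cv2.moments(mask)
        if m["m00"] > 0:
            targets[jid] = (m["m10"] / m["m00"], m["m01"] / m["m00"])
    return targets

print(extract_joint_pixels(drawn))
# A low-level controller would then map these drawn targets to joint
# positions; that part is omitted here.
```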
Introducing EgoMimic - just wear a pair of Project Aria @meta_aria smart glasses 👓 to scale up your imitation learning datasets! Check out what our robot can do. A thread below👇
At Physical Intelligence (π) our mission is to bring general-purpose AI into the physical world. We're excited to show the first step towards this mission - our first generalist model π₀ 🧠 🤖 Paper, blog, uncut videos: physicalintelligence.company/blog/pi0
Can my robot cook my food, rearrange my dresser, tidy my messy table and do so much more without ANY demos or real-world training data? Introducing ManipGen: A generalist agent for manipulation that can solve long-horizon robotics tasks entirely zero-shot, from text input! 1/N
Can robots learn to manipulate with both care and precision? Introducing Adaptive Compliance Policy, a framework to dynamically adjust robot compliance both spatially and temporally for given manipulation tasks from human demonstrations. Full details at adaptive-compliance.github.io
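A minimal sketch of the general mechanism the post names, assuming the policy outputs a target pose plus a per-axis stiffness at every step; this illustrates vanilla impedance control with time-varying gains, not the paper's implementation.

```python
# Per-axis, time-varying Cartesian compliance sketch (illustrative only).
import numpy as np

def compliance_force(x, v, x_des, stiffness, damping_ratio=1.0):
    """F = K (x_des - x) - D v, with critically damped D = 2*sqrt(K)."""
    K = np.asarray(stiffness, dtype=float)   # per-axis stiffness [N/m]
    D = 2.0 * damping_ratio * np.sqrt(K)     # per-axis damping [Ns/m]
    return K * (x_des - x) - D * v

# Example: stiff in x/y (position tracking), soft in z (gentle contact).
x, v = np.array([0.40, 0.00, 0.12]), np.zeros(3)
x_des = np.array([0.40, 0.00, 0.10])
print(compliance_force(x, v, x_des, stiffness=[800.0, 800.0, 50.0]))
```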
We’ve seen humanoid robots walk around for a while, but when will they actually help with useful tasks in daily life? The challenge here is the diversity and complexity of real-world scenes. Our new work tackles this problem via 3D visuomotor policy learning. Using data from…
Evaluation in robot learning papers, or: please stop using only success rate. A paper and a 🧵: arxiv.org/abs/2409.09491
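One concrete habit in this spirit: report uncertainty alongside success counts. A small sketch using a standard Wilson score interval (general statistics, not code from the paper):

```python
# 95% Wilson score interval for k successes out of n trials.
import math

def wilson_interval(k, n, z=1.96):
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# 7/10 successes looks like "70%", but the interval is wide:
print(wilson_interval(7, 10))   # roughly (0.40, 0.89)
```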
Try out Molmo on your application! This is a great example by @DJiafei! We have a few videos describing Molmo's different capabilities on our blog! molmo.allenai.org/blog This one is me trying it out on a bunch of tasks and images from RT-X: youtu.be/bHOBGAYNBNI
The idea of using a VLM for pointing, as in RoboPoint, has proven useful and generalizable for robotic manipulation. But the next challenge is: can VLMs draw multiple "points" to form complete robotic trajectories? @allen_ai 's new Molmo seems up to the task. Very exciting!
🎉 Diffusion-style annealing + sampling-based MPC can surpass RL, and seamlessly adapt to task parameters, all 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴-𝗳𝗿𝗲𝗲! We open-sourced DIAL-MPC, the first training-free method for whole-body torque control using full-order dynamics 🧵 lecar-lab.github.io/dial-mpc/
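A toy sketch of the recipe named above, assuming an MPPI-style update whose sampling noise is annealed over a few inner iterations (coarse to fine), on stand-in double-integrator dynamics; hyper-parameters and dynamics are illustrative, not the released DIAL-MPC code.

```python
# Sampling-based MPC with an annealed noise schedule (toy double integrator).
import numpy as np

H, N_SAMPLES, N_ANNEAL = 20, 256, 4
DT, GOAL = 0.05, np.array([1.0, 0.0])   # reach pos=1 with zero velocity

def rollout_cost(x0, actions):
    """Cost of an action sequence under double-integrator dynamics."""
    x, cost = x0.copy(), 0.0
    for a in actions:
        x = x + DT * np.array([x[1], a])          # [pos, vel] integration
        cost += np.sum((x - GOAL) ** 2) + 1e-3 * a ** 2
    return cost

def plan(x0, sigma0=1.0, temperature=1.0):
    mean = np.zeros(H)
    for i in range(N_ANNEAL):
        sigma = sigma0 * (0.5 ** i)               # annealed noise schedule
        samples = mean + sigma * np.random.randn(N_SAMPLES, H)
        costs = np.array([rollout_cost(x0, s) for s in samples])
        w = np.exp(-(costs - costs.min()) / temperature)
        mean = (w[:, None] * samples).sum(0) / w.sum()   # MPPI-style update
    return mean

x = np.array([0.0, 0.0])
for _ in range(40):                                # receding-horizon loop
    u = plan(x)
    x = x + DT * np.array([x[1], u[0]])
print("final state:", x)
```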
Gen2Act: Casting language-conditioned manipulation as *human video generation* followed by *closed-loop policy execution conditioned on the generated video* enables solving diverse real-world tasks unseen in the robot dataset! homangab.github.io/gen2act/ 1/n
Humans learn and improve from failures. Similarly, foundation models adapt based on human feedback. Can we leverage this failure understanding to enhance robotics systems that use foundation models? Introducing AHA—a vision-language model for detecting and reasoning over…
What structural task representation enables multi-stage, in-the-wild, bimanual, reactive manipulation? Introducing ReKep: LVM to label keypoints & VLM to write keypoint-based constraints, solve w/ optimization for diverse tasks, w/o task-specific training or env models. 🧵👇
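A toy sketch of the pipeline shape described above: constraints are plain Python costs over named keypoints (hand-written here, where ReKep has a VLM write them), and an off-the-shelf optimizer finds an end-effector motion that drives the cost to zero. The keypoints, constraint, and rigid-grasp assumption are illustrative.

```python
# Keypoint-based constraint solved by optimization (illustrative only).
import numpy as np
from scipy.optimize import minimize

keypoints = {
    "cup_rim":    np.array([0.50, 0.10, 0.20]),   # e.g. labeled by an LVM
    "teapot_tip": np.array([0.45, 0.05, 0.30]),   # keypoint on the grasped object
}

def constraint_cost(kp):
    """'Keep the teapot tip 5 cm above the cup rim' as a scalar cost."""
    target = kp["cup_rim"] + np.array([0.0, 0.0, 0.05])
    return np.sum((kp["teapot_tip"] - target) ** 2)

def cost_of_ee_delta(delta):
    # Assume the grasped object's keypoints move rigidly with the gripper.
    kp = dict(keypoints)
    kp["teapot_tip"] = keypoints["teapot_tip"] + delta
    return constraint_cost(kp)

res = minimize(cost_of_ee_delta, x0=np.zeros(3), method="L-BFGS-B")
print("end-effector displacement:", res.x)   # ≈ [0.05, 0.05, -0.05]
```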
I’ve been training dogs since middle school. It’s about time I train robot dogs too 😛 Introducing UMI on Legs, an approach for scaling manipulation skills on robot dogs 🐶 It can toss, push heavy weights, and make your ~existing~ visuo-motor policies mobile!
I tried visualizing action trajectory multimodality with generative modeling approaches using the #lerobot library. I came to some unexpected conclusions and whipped up a quick report for you all: github.com/alexander-soar…. What do you think? To what extent do the theoretical…
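A sketch of the kind of visualisation described in the post: sample many action trajectories from a generative policy for the same observation and overlay them to expose multimodality. The bimodal toy sampler below stands in for a real lerobot policy.

```python
# Overlay many sampled trajectories to reveal multimodal action distributions.
import numpy as np
import matplotlib.pyplot as plt

def sample_trajectories(n=64, horizon=16):
    """Toy policy: go around an obstacle either left or right."""
    t = np.linspace(0, 1, horizon)
    trajs = []
    for _ in range(n):
        side = np.random.choice([-1.0, 1.0])            # one mode per sample
        y = side * np.sin(np.pi * t) * 0.5 + 0.05 * np.random.randn(horizon)
        trajs.append(np.stack([t, y], axis=1))
    return np.stack(trajs)                               # (n, horizon, 2)

for traj in sample_trajectories():
    plt.plot(traj[:, 0], traj[:, 1], alpha=0.2, color="tab:blue")
plt.xlabel("x"); plt.ylabel("y"); plt.title("Sampled action trajectories")
plt.show()
```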
🚨Important update from our Robot Learning Lab in London. Following recent news, we’re moving on after a wonderful 2 years… Today, we unveil 4 big pieces of research from our incredible team. Check out the compilation video and thread below to see our final work! 📽️👇
🚀 Looking for a benchmark for bi-manual mobile manipulation with nicely collected demonstrations? We are excited to release BiGym, a new benchmark with human-collected demos! 🌐 Website: chernyadev.github.io/bigym/ 📄 Paper: arxiv.org/abs/2407.07788 💻 Code: github.com/chernyadev/big…
🚀 We are excited to announce GreenAug (Green-screen Augmentation), a physical visual augmentation method for robot learning algorithms! GreenAug enables generalisation to unseen visually distinct locations (scenes). In collaboration with @TinkerSumit @yusufma555 @stepjamUK (1/6)
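A minimal sketch of the core chroma-key step behind the idea: segment the green screen in HSV, then composite a randomly chosen background behind the robot and objects. The HSV thresholds and file names are assumptions, not values from the released code.

```python
# Green-screen augmentation sketch: mask the green backdrop, swap backgrounds.
import cv2
import numpy as np

def greenaug(frame_bgr, background_bgr,
             lower=(35, 60, 60), upper=(85, 255, 255)):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    green = cv2.inRange(hsv, np.array(lower), np.array(upper))   # 255 on green screen
    bg = cv2.resize(background_bgr, (frame_bgr.shape[1], frame_bgr.shape[0]))
    out = frame_bgr.copy()
    out[green > 0] = bg[green > 0]
    return out

frame = cv2.imread("greenscreen_frame.png")
bg = cv2.imread("random_scene.jpg")
cv2.imwrite("augmented.png", greenaug(frame, bg))
```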
Introducing CQN: Coarse-to-fine Q-Network, a value-based RL algorithm for continuous control🦾 Initialized with 20-50 demonstrations, it learns to solve real-world robotic tasks within 10 mins of training, without any pre-training or shaped rewards! (1/4) younggyo.me/cqn
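A toy sketch of the coarse-to-fine action selection idea: at each level, split the current interval of every action dimension into a few bins, take the bin with the highest Q-value, and zoom in. The tiny Q-network and its input layout are stand-ins, not the CQN architecture.

```python
# Coarse-to-fine greedy action selection over a discretised continuous space.
import torch
import torch.nn as nn

ACT_DIM, BINS, LEVELS, OBS_DIM = 4, 3, 3, 16

class ToyQNet(nn.Module):
    """Q-values for every (action dimension, bin) at the current zoom level."""
    def __init__(self):
        super().__init__()
        # Input: observation + current interval midpoints (one per action dim).
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, 128), nn.ReLU(),
            nn.Linear(128, ACT_DIM * BINS),
        )
    def forward(self, obs, midpoints):
        q = self.net(torch.cat([obs, midpoints], dim=-1))
        return q.view(ACT_DIM, BINS)

@torch.no_grad()
def select_action(qnet, obs):
    low = -torch.ones(ACT_DIM)    # action space assumed to be [-1, 1]^ACT_DIM
    high = torch.ones(ACT_DIM)
    for _ in range(LEVELS):
        width = (high - low) / BINS
        q = qnet(obs, (low + high) / 2)          # (ACT_DIM, BINS)
        best = q.argmax(dim=-1)                  # greedy bin per dimension
        low = low + width * best                 # zoom into the chosen bin
        high = low + width
    return (low + high) / 2                      # final fine-grained action

print(select_action(ToyQNet(), torch.zeros(OBS_DIM)))
```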