Mohit Shridhar
@mohito1905
Research Scientist at @GoogleDeepMind. http://mohito1905.bsky.social
Image-generation diffusion models can draw arbitrary visual patterns. What if we finetune Stable Diffusion to 🖌️ draw joint actions 🦾 on RGB observations? Introducing 𝗚𝗘𝗡𝗜𝗠𝗔. Paper, videos, code, ckpts: genima-robot.github.io 🧵Thread⬇️
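For intuition, a minimal, hypothetical sketch of the two-stage idea in the post: a fine-tuned Stable Diffusion img2img model "draws" joint-action targets onto the RGB observation, and a simple colour-threshold pass recovers a pixel target per joint. The checkpoint path, marker colours, and controller interface are assumptions, not the released GENIMA code.

```python
# Hypothetical GENIMA-style inference sketch: a fine-tuned img2img model draws
# joint-action targets on the observation; colour thresholding recovers them.
import numpy as np
import cv2
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "path/to/finetuned-genima-style-checkpoint",   # hypothetical checkpoint
    torch_dtype=torch.float16,
).to("cuda")

obs = Image.open("rgb_observation.png").convert("RGB")
drawn = pipe(
    prompt="open the top drawer",   # language instruction as the prompt
    image=obs,
    strength=0.8,
    guidance_scale=7.5,
).images[0]

# Assume each joint target is rendered as a marker of a known colour.
JOINT_COLOURS_BGR = {0: (0, 0, 255), 1: (0, 255, 0), 2: (255, 0, 0)}  # assumed palette

def extract_joint_pixels(img_pil, tol=40):
    """Return {joint_id: (u, v)} pixel centres of the drawn markers."""
    img = cv2.cvtColor(np.array(img_pil), cv2.COLOR_RGB2BGR)
    targets = {}
    for jid, bgr in JOINT_COLOURS_BGR.items():
        lo = np.clip(np.array(bgr) - tol, 0, 255).astype(np.uint8)
        hi = np.clip(np.array(bgr) + tol, 0, 255).astype(np.uint8)
        mask = cv2.inRange(img, lo, hi)
        m = cv2.moments(mask)
        if m["m00"] > 0:
            targets[jid] = (m["m10"] / m["m00"], m["m01"] / m["m00"])
    return targets

print(extract_joint_pixels(drawn))
# A low-level controller would then map these drawn targets to joint
# positions; that part is omitted here.
```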
Introducing EgoMimic - just wear a pair of Project Aria @meta_aria smart glasses 👓 to scale up your imitation learning datasets! Check out what our robot can do. A thread below👇
At Physical Intelligence (π) our mission is to bring general-purpose AI into the physical world. We're excited to show the first step towards this mission - our first generalist model π₀ 🧠 🤖 Paper, blog, uncut videos: physicalintelligence.company/blog/pi0
Can my robot cook my food, rearrange my dresser, tidy my messy table and do so much more without ANY demos or real-world training data? Introducing ManipGen: A generalist agent for manipulation that can solve long-horizon robotics tasks entirely zero-shot, from text input! 1/N
Can robots learn to manipulate with both care and precision? Introducing Adaptive Compliance Policy, a framework to dynamically adjust robot compliance both spatially and temporally for given manipulation tasks from human demonstrations. Full details at adaptive-compliance.github.io
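A minimal sketch of the general mechanism the post names, assuming the policy outputs a target pose plus a per-axis stiffness at every step; this illustrates vanilla impedance control with time-varying gains, not the paper's implementation.

```python
# Per-axis, time-varying Cartesian compliance sketch (illustrative only).
import numpy as np

def compliance_force(x, v, x_des, stiffness, damping_ratio=1.0):
    """F = K (x_des - x) - D v, with critically damped D = 2*sqrt(K)."""
    K = np.asarray(stiffness, dtype=float)   # per-axis stiffness [N/m]
    D = 2.0 * damping_ratio * np.sqrt(K)     # per-axis damping [Ns/m]
    return K * (x_des - x) - D * v

# Example: stiff in x/y (position tracking), soft in z (gentle contact).
x, v = np.array([0.40, 0.00, 0.12]), np.zeros(3)
x_des = np.array([0.40, 0.00, 0.10])
print(compliance_force(x, v, x_des, stiffness=[800.0, 800.0, 50.0]))
```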
We’ve seen humanoid robots walk around for a while, but when will they actually help with useful tasks in daily life? The challenge here is the diversity and complexity of real-world scenes. Our new work tackles this problem via 3D visuomotor policy learning. Using data from…
Evaluation in robot learning papers, or: please stop using only success rate. A paper and a 🧵: arxiv.org/abs/2409.09491
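One concrete habit in this spirit: report uncertainty alongside success counts. A small sketch using a standard Wilson score interval (general statistics, not code from the paper):

```python
# 95% Wilson score interval for k successes out of n trials.
import math

def wilson_interval(k, n, z=1.96):
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# 7/10 successes looks like "70%", but the interval is wide:
print(wilson_interval(7, 10))   # roughly (0.40, 0.89)
```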
Try out Molmo on your application! This is a great example by @DJiafei! We have a few videos describing Molmo's different capabilities on our blog! molmo.allenai.org/blog This one is me trying it out on a bunch of tasks and images from RT-X: youtu.be/bHOBGAYNBNI
The idea of using a VLM for pointing, as in RoboPoint, has proven useful and generalizable for robotic manipulation. But the next challenge is: can VLMs draw multiple "points" to form complete robotic trajectories? @allen_ai 's new Molmo seems up to the task. Very exciting!
🎉 Diffusion-style annealing + sampling-based MPC can surpass RL, and seamlessly adapt to task parameters, all 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴-𝗳𝗿𝗲𝗲! We open-sourced DIAL-MPC, the first training-free method for whole-body torque control using full-order dynamics 🧵 lecar-lab.github.io/dial-mpc/
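A toy sketch of the recipe named above, assuming an MPPI-style update whose sampling noise is annealed over a few inner iterations (coarse to fine), on stand-in double-integrator dynamics; hyper-parameters and dynamics are illustrative, not the released DIAL-MPC code.

```python
# Sampling-based MPC with an annealed noise schedule (toy double integrator).
import numpy as np

H, N_SAMPLES, N_ANNEAL = 20, 256, 4
DT, GOAL = 0.05, np.array([1.0, 0.0])   # reach pos=1 with zero velocity

def rollout_cost(x0, actions):
    """Cost of an action sequence under double-integrator dynamics."""
    x, cost = x0.copy(), 0.0
    for a in actions:
        x = x + DT * np.array([x[1], a])          # [pos, vel] integration
        cost += np.sum((x - GOAL) ** 2) + 1e-3 * a ** 2
    return cost

def plan(x0, sigma0=1.0, temperature=1.0):
    mean = np.zeros(H)
    for i in range(N_ANNEAL):
        sigma = sigma0 * (0.5 ** i)               # annealed noise schedule
        samples = mean + sigma * np.random.randn(N_SAMPLES, H)
        costs = np.array([rollout_cost(x0, s) for s in samples])
        w = np.exp(-(costs - costs.min()) / temperature)
        mean = (w[:, None] * samples).sum(0) / w.sum()   # MPPI-style update
    return mean

x = np.array([0.0, 0.0])
for _ in range(40):                                # receding-horizon loop
    u = plan(x)
    x = x + DT * np.array([x[1], u[0]])
print("final state:", x)
```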
Gen2Act: Casting language-conditioned manipulation as *human video generation* followed by *closed-loop policy execution conditioned on the generated video* enables solving diverse real-world tasks unseen in the robot dataset! homangab.github.io/gen2act/ 1/n
Humans learn and improve from failures. Similarly, foundation models adapt based on human feedback. Can we leverage this failure understanding to enhance robotics systems that use foundation models? Introducing AHA—a vision-language model for detecting and reasoning over…
What structural task representation enables multi-stage, in-the-wild, bimanual, reactive manipulation? Introducing ReKep: LVM to label keypoints & VLM to write keypoint-based constraints, solve w/ optimization for diverse tasks, w/o task-specific training or env models. 🧵👇
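A toy sketch of the pipeline shape described above: constraints are plain Python costs over named keypoints (hand-written here, where ReKep has a VLM write them), and an off-the-shelf optimizer finds an end-effector motion that drives the cost to zero. The keypoints, constraint, and rigid-grasp assumption are illustrative.

```python
# Keypoint-based constraint solved by optimization (illustrative only).
import numpy as np
from scipy.optimize import minimize

keypoints = {
    "cup_rim":    np.array([0.50, 0.10, 0.20]),   # e.g. labeled by an LVM
    "teapot_tip": np.array([0.45, 0.05, 0.30]),   # keypoint on the grasped object
}

def constraint_cost(kp):
    """'Keep the teapot tip 5 cm above the cup rim' as a scalar cost."""
    target = kp["cup_rim"] + np.array([0.0, 0.0, 0.05])
    return np.sum((kp["teapot_tip"] - target) ** 2)

def cost_of_ee_delta(delta):
    # Assume the grasped object's keypoints move rigidly with the gripper.
    kp = dict(keypoints)
    kp["teapot_tip"] = keypoints["teapot_tip"] + delta
    return constraint_cost(kp)

res = minimize(cost_of_ee_delta, x0=np.zeros(3), method="L-BFGS-B")
print("end-effector displacement:", res.x)   # ≈ [0.05, 0.05, -0.05]
```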
I’ve been training dogs since middle school. It’s about time I train robot dogs too 😛 Introducing UMI on Legs, an approach for scaling manipulation skills on robot dogs 🐶 It can toss, push heavy weights, and make your ~existing~ visuo-motor policies mobile!
I tried visualizing action trajectory multimodality with generative modeling approaches using the #lerobot library. I came to some unexpected conclusions and whipped up a quick report for you all: github.com/alexander-soar…. What do you think? To what extent do the theoretical…
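A sketch of the kind of visualisation described in the post: sample many action trajectories from a generative policy for the same observation and overlay them to expose multimodality. The bimodal toy sampler below stands in for a real lerobot policy.

```python
# Overlay many sampled trajectories to reveal multimodal action distributions.
import numpy as np
import matplotlib.pyplot as plt

def sample_trajectories(n=64, horizon=16):
    """Toy policy: go around an obstacle either left or right."""
    t = np.linspace(0, 1, horizon)
    trajs = []
    for _ in range(n):
        side = np.random.choice([-1.0, 1.0])            # one mode per sample
        y = side * np.sin(np.pi * t) * 0.5 + 0.05 * np.random.randn(horizon)
        trajs.append(np.stack([t, y], axis=1))
    return np.stack(trajs)                               # (n, horizon, 2)

for traj in sample_trajectories():
    plt.plot(traj[:, 0], traj[:, 1], alpha=0.2, color="tab:blue")
plt.xlabel("x"); plt.ylabel("y"); plt.title("Sampled action trajectories")
plt.show()
```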
🚨Important update from our Robot Learning Lab in London. Following recent news, we’re moving on after a wonderful 2 years… Today, we unveil 4 big pieces of research from our incredible team. Check out the compilation video and thread below to see our final work! 📽️👇
🚀 Looking for a benchmark for bi-manual mobile manipulation with nicely collected demonstrations? We are excited to release BiGym, a new benchmark with human-collected demos! 🌐 Website: chernyadev.github.io/bigym/ 📄 Paper: arxiv.org/abs/2407.07788 💻 Code: github.com/chernyadev/big…
🚀 We are excited to announce GreenAug (Green-screen Augmentation), a physical visual augmentation method for robot learning algorithms! GreenAug enables generalisation to unseen visually distinct locations (scenes). In collaboration with @TinkerSumit @yusufma555 @stepjamUK (1/6)
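A minimal sketch of the core chroma-key step behind the idea: segment the green screen in HSV, then composite a randomly chosen background behind the robot and objects. The HSV thresholds and file names are assumptions, not values from the released code.

```python
# Green-screen augmentation sketch: mask the green backdrop, swap backgrounds.
import cv2
import numpy as np

def greenaug(frame_bgr, background_bgr,
             lower=(35, 60, 60), upper=(85, 255, 255)):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    green = cv2.inRange(hsv, np.array(lower), np.array(upper))   # 255 on green screen
    bg = cv2.resize(background_bgr, (frame_bgr.shape[1], frame_bgr.shape[0]))
    out = frame_bgr.copy()
    out[green > 0] = bg[green > 0]
    return out

frame = cv2.imread("greenscreen_frame.png")
bg = cv2.imread("random_scene.jpg")
cv2.imwrite("augmented.png", greenaug(frame, bg))
```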
Introducing CQN: Coarse-to-fine Q-Network, a value-based RL algorithm for continuous control🦾 Initialized with 20-50 demonstrations, it learns to solve real-world robotic tasks within 10 mins of training, without any pre-training or shaped rewards! (1/4) younggyo.me/cqn
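A toy sketch of the coarse-to-fine action selection idea: at each level, split the current interval of every action dimension into a few bins, take the bin with the highest Q-value, and zoom in. The tiny Q-network and its input layout are stand-ins, not the CQN architecture.

```python
# Coarse-to-fine greedy action selection over a discretised continuous space.
import torch
import torch.nn as nn

ACT_DIM, BINS, LEVELS, OBS_DIM = 4, 3, 3, 16

class ToyQNet(nn.Module):
    """Q-values for every (action dimension, bin) at the current zoom level."""
    def __init__(self):
        super().__init__()
        # Input: observation + current interval midpoints (one per action dim).
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, 128), nn.ReLU(),
            nn.Linear(128, ACT_DIM * BINS),
        )
    def forward(self, obs, midpoints):
        q = self.net(torch.cat([obs, midpoints], dim=-1))
        return q.view(ACT_DIM, BINS)

@torch.no_grad()
def select_action(qnet, obs):
    low = -torch.ones(ACT_DIM)    # action space assumed to be [-1, 1]^ACT_DIM
    high = torch.ones(ACT_DIM)
    for _ in range(LEVELS):
        width = (high - low) / BINS
        q = qnet(obs, (low + high) / 2)          # (ACT_DIM, BINS)
        best = q.argmax(dim=-1)                  # greedy bin per dimension
        low = low + width * best                 # zoom into the chosen bin
        high = low + width
    return (low + high) / 2                      # final fine-grained action

print(select_action(ToyQNet(), torch.zeros(OBS_DIM)))
```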