Jitendra MALIK
@JitendraMalikCV
Again the power of tactile sensing and multi-finger hands comes through. This is the future of dexterous manipulation!
🤖 What if a humanoid robot could make a hamburger from raw ingredients—all the way to your plate? 🔥 Excited to announce ViTacFormer: our new pipeline for next-level dexterous manipulation with active vision + high-resolution touch. 🎯 For the first time ever, we demonstrate…
Angjoo Kanazawa @akanazawa and I taught CS 280, graduate computer vision, this semester at UC Berkeley. We found a combination of classical and modern CV material that worked well, and are happy to share our lecture materials from the class. cs280-berkeley.github.io Enjoy!
Enjoy watching a humanoid walking around UC Berkeley. It only looks inebriated :-)
Our new system trains humanoid robots using data from cell phone videos, enabling skills such as climbing stairs and sitting on chairs in a single policy (w/ @redstone_hong @junyi42 @davidrmcall)
I'm happy to post course materials for my class at UC Berkeley "Robots that Learn", taught with the outstanding assistance of @ToruO_O. Lecture videos at youtube.com/playlist?list=… Lecture notes & other course materials at robots-that-learn.github.io
Happy to share these exciting new results on video synthesis of humans in movement. Arguably, these establish the power of having explicit 3D representations. Popular video generation models like Sora don't do that, making it hard for the resulting video to be 4D consistent.
I’ve dreamt of creating a tool that could animate anyone with any motion from just ONE image… and now it’s a reality! 🎉 Super excited to introduce updated 3DHM: Synthesizing Moving People with 3D Control. 🕺💃3DHM can generate human videos from a single real or synthetic human…
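A hedged toy illustration of the point about explicit 3D above (not the 3DHM pipeline): when every frame is rendered from one shared set of 3D points, geometric consistency across viewpoints and time holds by construction. The point cloud, camera model, and numbers below are illustrative assumptions.

```python
# Toy demonstration: frames rendered from the SAME underlying 3D geometry
# cannot drift in limb lengths or identity the way purely 2D generators can.
import numpy as np

points_3d = np.random.randn(100, 3)            # one fixed "subject" (toy point cloud)

def render_view(points, yaw, f=500.0, depth=5.0):
    """Rotate the subject about its centroid, then pinhole-project to 2D."""
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    centered = points - points.mean(axis=0)
    cam = centered @ R.T + np.array([0.0, 0.0, depth])   # place in front of camera
    return f * cam[:, :2] / cam[:, 2:3]                   # perspective projection

# A 360-degree sweep: every frame is a view of the same explicit 3D representation.
frames = [render_view(points_3d, yaw) for yaw in np.linspace(0, 2 * np.pi, 8)]
print(len(frames), frames[0].shape)
```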
Touché, Sergey!
Lots of memorable quotes from @JitendraMalikCV at CoRL, the most significant one of course is: “I believe that Physical Intelligence is essential to AI” :) I did warn you Jitendra that out of context quotes are fair game. Some liberties taken wrt capitalization.
Fun collaboration w/ @antoniloq, @carlo_sferrazza, @HaozhiQ, @jane_h_wu, @pabbeel, @JitendraMalikCV Check out our paper at arxiv.org/pdf/2409.08273. We release code at github.com/hgaurav2k/hop.
Autoregressive modeling is not just for language; it can equally be used to model human behavior. This paper shows how.
Please see the website for more details. synNsync🪩is joint work with my awesome ✨co-authors✨: @LeaMue27 @brjathu @geopavlakos @shiryginosar @akanazawa @JitendraMalikCV Website🖥️: von31.github.io/synNsync/ Data💾: github.com/Von31/swing_da… Arxiv📜: arxiv.org/abs/2409.04440 🧵6/6
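For readers curious what "autoregressive modeling of behavior" looks like in practice, here is a minimal sketch, assuming motion has already been discretized into tokens by some tokenizer; it is not the synNsync code, and all sizes and names are illustrative.

```python
# Hedged sketch: next-token prediction over discretized motion, analogous to
# language modeling. Vocabulary size, dimensions, and the notion of a "motion
# tokenizer" are illustrative assumptions, not the paper's actual implementation.
import torch
import torch.nn as nn

class MotionGPT(nn.Module):
    def __init__(self, vocab_size=512, d_model=256, n_layers=4, n_heads=4, max_len=256):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)        # motion-token embeddings
        self.pos = nn.Embedding(max_len, d_model)            # learned positions
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)            # next-token logits

    def forward(self, tokens):                                # tokens: (B, T) ints
        B, T = tokens.shape
        x = self.tok(tokens) + self.pos(torch.arange(T, device=tokens.device))
        causal = torch.triu(torch.full((T, T), float("-inf"), device=tokens.device), diagonal=1)
        return self.head(self.blocks(x, mask=causal))          # (B, T, vocab)

# Training step: predict token t+1 from tokens <= t, exactly as in language modeling.
model = MotionGPT()
seq = torch.randint(0, 512, (8, 64))                           # fake tokenized motion clips
logits = model(seq[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, 512), seq[:, 1:].reshape(-1))
loss.backward()
```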
10 years after DQN, what are deep RL’s impacts on robotics? Which robotic problems have seen the most thrilling real-world successes thanks to DRL? Where do we still need to push the boundaries, and how? Our latest survey explores these questions! Read on for more details. 👇
It was great to work with Karttikeya Mangalam, @andrea_bajcsy and @JitendraMalikCV! Project: neerja.me/atp_latent_cor… Arxiv: arxiv.org/abs/2312.06653
Imitation learning works™ – but you need good data 🥹 How to get high-quality visuotactile demos from a bimanual robot with multifingered hands, and learn smooth policies? Check our new work “Learning Visuotactile Skills with Two Multifingered Hands”! 🙌 toruowo.github.io/hato/
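As a rough illustration of the visuotactile imitation-learning recipe mentioned above (not the released HATO code), one can fuse image and tactile encodings and regress demonstrated actions with a behavior-cloning loss; all shapes and module sizes below are assumptions.

```python
# Hedged sketch of visuotactile behavior cloning: encode camera frames and
# tactile readings, fuse them, and imitate teleoperated demo actions.
import torch
import torch.nn as nn

class VisuotactilePolicy(nn.Module):
    def __init__(self, tactile_dim=32, action_dim=24):          # two hands (assumed dims)
        super().__init__()
        self.vision = nn.Sequential(                             # tiny CNN stand-in
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.touch = nn.Sequential(nn.Linear(tactile_dim, 64), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(32 + 64, 128), nn.ReLU(),
                                  nn.Linear(128, action_dim))

    def forward(self, image, tactile):
        return self.head(torch.cat([self.vision(image), self.touch(tactile)], dim=-1))

policy = VisuotactilePolicy()
img = torch.randn(4, 3, 96, 96)            # camera frames
touch = torch.randn(4, 32)                 # flattened tactile readings
expert_action = torch.randn(4, 24)         # demonstrated joint targets
loss = nn.functional.mse_loss(policy(img, touch), expert_action)   # behavior cloning
loss.backward()
```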
Another success of sim-to-real for training robot policies! This task, using two multi-fingered hands, requires considerable dexterity, and is hopefully representative of other household tasks that we wish to solve in the future.
Achieving bimanual dexterity with RL + Sim2Real! toruowo.github.io/bimanual-twist/ TLDR - We train two robot hands to twist bottle lids using deep RL followed by sim-to-real. A single policy trained with simple simulated bottles can generalize to drastically different real-world objects.
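A heavily hedged sketch of the sim-to-real ingredient highlighted above, not the paper's training code: randomize bottle properties every simulated episode so a single policy must cover the whole range, which is what makes transfer to drastically different real objects plausible. `SimTwistEnv` and the parameter ranges are hypothetical placeholders.

```python
# Hedged sketch: per-episode object randomization for sim-to-real transfer.
import random

def sample_randomized_bottle():
    """Draw a new simulated 'bottle' each episode: size, friction, tightness, mass."""
    return {
        "lid_radius_m":  random.uniform(0.015, 0.045),
        "friction":      random.uniform(0.4, 1.2),
        "lid_torque_nm": random.uniform(0.05, 0.5),
        "mass_kg":       random.uniform(0.05, 0.6),
    }

def train(num_episodes=10):
    for episode in range(num_episodes):
        bottle = sample_randomized_bottle()
        # env = SimTwistEnv(**bottle)        # hypothetical simulator reset
        # rollout = collect_rollout(env, policy)
        # policy.update(rollout)             # any deep RL update (e.g., PPO) would go here
        print(f"episode {episode}: training on randomized bottle {bottle}")

train()
```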
We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language. Check out our robot walking in San Francisco (Ilija Radosavovic et al) …anoid-next-token-prediction.github.io
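A minimal sketch of the "control as next-token prediction" framing, not the released system: discretize each timestep's observations and actions, interleave them into one token stream, and train a causal model on that stream. Bin counts and dimensions are assumptions.

```python
# Hedged sketch: turning continuous sensorimotor trajectories into a token
# sequence that a causal transformer can model like text.
import numpy as np

N_BINS = 256                                    # uniform bins per scalar channel

def to_tokens(x, lo=-1.0, hi=1.0):
    """Map continuous values in [lo, hi] to integer tokens in [0, N_BINS)."""
    x = np.clip((x - lo) / (hi - lo), 0.0, 1.0 - 1e-6)
    return (x * N_BINS).astype(np.int64)

def build_sequence(observations, actions):
    """Interleave obs/action tokens per timestep: [o_0, a_0, o_1, a_1, ...]."""
    chunks = []
    for o, a in zip(observations, actions):
        chunks.append(to_tokens(o))
        chunks.append(to_tokens(a))
    return np.concatenate(chunks)

obs = [np.random.uniform(-1, 1, size=12) for _ in range(3)]    # e.g., joint states
act = [np.random.uniform(-1, 1, size=8) for _ in range(3)]     # e.g., motor targets
seq = build_sequence(obs, act)
print(seq.shape)   # one flat token stream; the policy is recovered by predicting action tokens
```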
Want to make your photorealistic 3D avatar dance like your favorite actor? Check this out!
Super excited to announce our new work: Synthesizing Moving People with 3D Control (3DHM)💡 Why is 3DHM unique? With 3D Control, 3DHM can animate a 𝗿𝗮𝗻𝗱𝗼𝗺 human photo with 𝗮𝗻𝘆 poses in a 𝟯𝟲𝟬-𝗱𝗲𝗴𝗿𝗲𝗲 camera view and 𝗮𝗻𝘆 camera azimuths from 𝗮𝗻𝘆 video!
Together with the Ego4D consortium, today we're releasing Ego-Exo4D, the largest ever public dataset of its kind to support research on video learning & multimodal perception — including 1,400+ hours of videos of skilled human activities. Download ➡️ bit.ly/3teP49w
Happy to present LVM (Large Vision Model). It is scalable, and tasks can be specified via prompts. Enjoy!
For more information, please check arxiv.org/abs/2312.00785 and yutongbai.com/lvm.html
How far can we go with vision alone? Excited to reveal our Large Vision Model! Trained on 420B tokens, it scales effectively and opens new avenues in vision tasks! (1/N) Kudos to @younggeng @Karttikeya_m @_amirbar, @YuilleAlan Trevor Darrell @JitendraMalikCV Alyosha Efros!
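To make "tasks specified via prompts" concrete, here is a hedged toy sketch (not the LVM code): images become fixed-length blocks of discrete tokens, a task is posed by concatenating a few input-output example images plus a query, and the model autoregressively continues the sequence to produce the answer image's tokens. The token counts and the stand-in model are assumptions.

```python
# Hedged sketch of visual prompting with an autoregressive vision model.
import torch

TOKENS_PER_IMAGE = 16                      # real systems use hundreds of tokens per image
VOCAB = 1024

def fake_tokenize(image_id):
    """Stand-in for a VQ image tokenizer: returns TOKENS_PER_IMAGE discrete codes."""
    g = torch.Generator().manual_seed(image_id)
    return torch.randint(0, VOCAB, (TOKENS_PER_IMAGE,), generator=g)

def toy_next_token(context):
    """Placeholder for a trained causal transformer's next-token prediction."""
    return int(context.sum()) % VOCAB

# Visual prompt: two (input image, output image) example pairs, then the query image.
prompt = torch.cat([fake_tokenize(i) for i in (1, 2, 3, 4, 5)])
generated = []
for _ in range(TOKENS_PER_IMAGE):                                   # decode the answer image
    ctx = torch.cat([prompt, torch.tensor(generated, dtype=torch.long)])
    generated.append(toy_next_token(ctx))
print(len(generated), "tokens -> decoded back to pixels by the VQ decoder")
```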
Every CV guy I know has privately admitted at some point that current video datasets do not really seem to care about time, and that the video tasks are "too short" & don't test much temporal understanding. We introduce EgoSchema -- a litmus test for truly long-form video understanding.