Akash Sharma
@akashshrm02
PhD candidate @CMU_Robotics | Visiting researcher @AIatMeta | Interested in multimodal robot perception (vision, touch, audio)
Robots with human-like hands need touch to reach the goal of general manipulation. However, approaches today either don't use tactile sensing or use a specific architecture per tactile task. Can one model improve many tactile tasks? 🌟Introducing Sparsh-skin: tinyurl.com/y935wz5c 1/6
Research arc: ⏪ 2 yrs ago, we introduced VRB: learning from hours of human videos to cut down teleop (Gibson🙏) ▶️ Today, we explore a wilder path: robots deployed with no teleop, no human demos, no affordances. Just raw video generation magic 🙏 Day 1 of faculty life done! 😉…
🚀 Introducing RIGVid: Robots Imitating Generated Videos! Robots can now perform complex tasks—pouring, wiping, mixing—just by imitating generated videos, purely zero-shot! No teleop. No OpenX/DROID/Ego4D. No videos of human demonstrations. Only AI generated video demos 🧵👇
6. Better Touch: pixels are not enough. Precision, contact-based tasks really need tactile understanding. How do we make cheap and repeatable tactile sensors? How do we fuse touch with vision / action models?
Very interesting work! And great to see self-supervised learning being used for tactile data. This is critical to scaling tactile sensing to the level that vision has reached.
Cool work! Diffusion models have already been used for outpainting and inpainting 2D images. Quite smart to use these priors to automatically complete unseen regions in dynamic scenes. I can see this being a quick boost for many robotics applications!
Excited to share recent work with @kaihuac5 and @RamananDeva where we learn to do novel view synthesis for dynamic scenes in a self-supervised manner, only from 2D videos! webpage: cog-nvs.github.io arxiv: arxiv.org/abs/2507.12646 code (soon): github.com/Kaihua-Chen/co…
Reconstruct, Inpaint, Finetune: Dynamic Novel-view Synthesis from Monocular Videos @kaihuac5, @tarashakhurana, @RamananDeva tl;dr: in title; fine-tune CogVideoX->train 2D video-inpainter arxiv.org/abs/2507.12646
TRI's latest Large Behavior Model (LBM) paper landed on arxiv last night! Check out our project website: toyotaresearchinstitute.github.io/lbm1/ One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the…
Check out DemoDiffusion from @mangahomanga! The key idea is quite simple: denoise from human trajectories instead of random noise! I am hopeful to see this scaled up to more complex embodiments!
Presenting DemoDiffusion: an extremely simple approach enabling a pre-trained 'generalist' diffusion policy to follow a human demonstration for a novel task during inference. One-shot human imitation *without* requiring any paired human-robot data or online RL 🙂 1/n
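A minimal sketch of that idea, assuming a DDPM-style, epsilon-predicting diffusion policy: rather than sampling pure Gaussian noise, partially noise the human demo trajectory and run the reverse process from an intermediate step. All names here (sample_from_human_demo, t_start, the policy interface) are illustrative, not the DemoDiffusion codebase.

```python
# Illustrative sketch only: start reverse diffusion from a partially noised
# human demo trajectory instead of pure Gaussian noise x_T ~ N(0, I).
import torch

T = 100                                   # total diffusion steps
t_start = 60                              # intermediate step to re-noise the demo to
betas = torch.linspace(1e-4, 0.02, T)     # standard linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def ddpm_reverse_step(policy, x_t, t, obs):
    """One DDPM reverse step with an epsilon-predicting diffusion policy."""
    eps_hat = policy(x_t, t, obs)
    mean = (x_t - betas[t] / torch.sqrt(1.0 - alphas_cumprod[t]) * eps_hat) / torch.sqrt(1.0 - betas[t])
    if t == 0:
        return mean
    return mean + torch.sqrt(betas[t]) * torch.randn_like(x_t)

def sample_from_human_demo(policy, human_traj, obs):
    # Forward-noise the human trajectory up to t_start (instead of sampling x_T).
    a_bar = alphas_cumprod[t_start]
    x = torch.sqrt(a_bar) * human_traj + torch.sqrt(1.0 - a_bar) * torch.randn_like(human_traj)
    # Then run the usual reverse process from t_start down to 0.
    for t in reversed(range(t_start + 1)):
        x = ddpm_reverse_step(policy, x, t, obs)
    return x

# Toy usage; a real pre-trained diffusion policy would replace the dummy below.
dummy_policy = lambda x_t, t, obs: torch.zeros_like(x_t)
robot_traj = sample_from_human_demo(dummy_policy, human_traj=torch.randn(16, 7), obs=None)
```

In this sketch, a smaller t_start keeps the output close to the human demo, while a larger t_start gives the policy more freedom to adapt the motion.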
The secret behind this demo by @Suddhus #RSS2025: keypoints, object poses, defined grasps, and MPC.
Why build a humanoid robot? Because the world is designed for humans, including all the best Halloween costumes!
My amazing partner @tarashakhurana is presenting at Poster #174! #CVPR2025 Go check it out!

1/6 🚀 Excited to share that BrainNRDS has been accepted as an oral at #CVPR2025! We decode motion from fMRI activity and use it to generate realistic reconstructions of videos people watched, outperforming strong existing baselines like MindVideo and Stable Video Diffusion.🧠🎥
I’m looking for PhD students for the 2026 cycle! If you’re @CVPR and think we might be a good fit, come say hi or send me an email with [CVPR2025] in the subject line so that I don’t miss it. #CVPR2025
I’m thrilled to share that I will be joining Johns Hopkins University’s Department of Computer Science (@JHUCompSci, @HopkinsDSAI) as an Assistant Professor this fall.
Just open-sourced Geometric Retargeting (GeoRT), the kinematic retargeting module behind DexterityGen. Includes tools for importing custom hands. Give it a try: github.com/facebookresear… Software from @berkeley_ai and @AIatMeta. More coming soon.
Introducing UFM, a Unified Flow & Matching model, which shows for the first time that unifying the optical flow and image matching tasks is mutually beneficial, achieving SOTA. Check out UFM's matching in action below! 👇 🌐 Website: uniflowmatch.github.io 🧵👇
I'll be at @CVPR this week -- organizing the Workshop on 4D Vision, attending the Doctoral Consortium, and presenting one of our recent works (poster 174 in session 5)! I'm also actively looking for Research Scientist roles. Happy to chat! CV is here: cs.cmu.edu/~tkhurana/pdf/…
I will be at #CVPR2025! Looking forward to catching up with colleagues and talking about SSL, robots, tactile sensing, world models, and everything in between! I'm also starting to look for Research Scientist roles starting 2026. Details at akashsharma02.github.io Hmu if interested!
If you're finishing your camera-ready for ACL (#acl2025nlp) or ICML (#icml2025) and want to cite co-first authors more fairly, I just made a simple fix for this! Just add $^*$ to the authors' names in your bibtex, and the citations should change :) github.com/tpimentelms/ac…
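For illustration, here is roughly what that bibtex change looks like; the entry key, names, and field values below are made up, and how the asterisks end up rendering depends on the linked script:

```bibtex
@inproceedings{doe2025example,
  title     = {An Example Paper},
  author    = {Jane Doe$^*$ and John Smith$^*$ and Alex Lee},
  booktitle = {Proceedings of ACL},
  year      = {2025}
}
```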
Exciting to see exoskeleton hands make tactile sensing a first-class citizen! In the limit, however, it will be interesting to see how human glove data gets retargeted to robot tactile sensors! 😉
Meet the newest member of the UMI family: DexUMI! Designed for intuitive data collection — and it fixes a few things the original UMI couldn’t handle: 🖐️ Supports multi-finger dexterous hands — tested on both under- and fully-actuated types 🧂 Records tactile info — it can tell…
I'm featured in an interview in our latest behind-the-scenes release! We break down the ML and perception that drive the whole-body manipulation behaviors from last year. It starts with a neat demo of Atlas's range of motion and our vision foundation models. youtu.be/oe1dke3Cf7I?si…