Sirui Xu
@xu_sirui
Ph.D. student @IllinoisCDS | Prev. research intern @NVIDIA | B.S. @PKU1898 | Vision/Graphics
Interested in:
✅ Humanoids mastering scalable motor skills for everyday interactions
✅ Whole-body loco-manipulation w/ diverse tasks and objects
✅ Physically plausible HOI animation
Meet InterMimic #CVPR2025
ArXiv: arxiv.org/abs/2502.20390
Project: sirui-xu.github.io/InterMimic 🧵[1/9]
How can we generate billion-scale manipulation demonstrations easily? Leverage generative models! 🤖✨ We introduce Dex1B, a framework that generates 1 BILLION diverse dexterous hand demonstrations for both grasping 🖐️ and articulation 💻 tasks using a simple C-VAE model.
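For readers unfamiliar with the model family named above, here is a minimal sketch of how a conditional VAE for grasp-pose generation could look. This is not the Dex1B architecture; the pose and condition dimensions, layer sizes, loss weighting, and all names below are illustrative assumptions.

```python
# Hedged sketch: a minimal conditional VAE (C-VAE) that maps an object/scene
# condition to hand-pose samples. Not the actual Dex1B code; everything here
# (dimensions, names, loss weights) is an assumption for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraspCVAE(nn.Module):
    def __init__(self, pose_dim=30, cond_dim=128, latent_dim=32, hidden=256):
        super().__init__()
        self.latent_dim = latent_dim
        # Encoder maps (hand pose, object condition) to a latent Gaussian.
        self.encoder = nn.Sequential(
            nn.Linear(pose_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),  # mean and log-variance
        )
        # Decoder maps (latent sample, object condition) back to a hand pose.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, pose_dim),
        )

    def forward(self, pose, cond):
        mu, logvar = self.encoder(torch.cat([pose, cond], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        recon = self.decoder(torch.cat([z, cond], dim=-1))
        return recon, mu, logvar

    @torch.no_grad()
    def sample(self, cond):
        # One random latent per condition row -> one grasp proposal each.
        z = torch.randn(cond.shape[0], self.latent_dim, device=cond.device)
        return self.decoder(torch.cat([z, cond], dim=-1))

def cvae_loss(recon, pose, mu, logvar, beta=1e-3):
    # Standard C-VAE objective: reconstruction + beta-weighted KL to the prior.
    rec = F.mse_loss(recon, pose)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + beta * kl
```

In general, diversity in a C-VAE comes from drawing many latents per condition; one plausible way to reach billion-scale output with such a model is to sample heavily per object and keep only candidates that pass validation (e.g., in simulation), though the actual Dex1B pipeline may differ.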
Excited to share our latest work on 🎧spatial audio-driven human motion generation. We aim to tackle a largely underexplored yet important problem of enabling virtual humans to move naturally in response to spatial audio—capturing not just what is heard, but also where the sound…
🎶 Can a robot learn to play music? YES! By teaching itself, one beat at a time 🎼
🥁 Introducing Robot Drummer: Learning Rhythmic Skills for Humanoid Drumming 🤖
🔍 For details, check out: robotdrummer.github.io
🔥 🔥 Want better motion generation? Start with better text. #SnapMoGen — a novel text-to-motion dataset built for expressive control and generalization. #AIGC #3DAnimation #AI #ComputerVision #AIResearch #3DMotion #MotionGeneration #TextToMotion
Excited to present our work “DAViD” at #ICCV2025! DAViD is a generative 4D human-object interaction model that can produce novel HOI motions for various 3D objects, including multi-object interactions. (1/4)
🥰 Our paper SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation has been accepted to @ICCVConference! Our framework can generate stylized, emotional, physically plausible, and long-term human-scene interaction motions.
🚀 Introducing LeVERB, the first 𝗹𝗮𝘁𝗲𝗻𝘁 𝘄𝗵𝗼𝗹𝗲-𝗯𝗼𝗱𝘆 𝗵𝘂𝗺𝗮𝗻𝗼𝗶𝗱 𝗩𝗟𝗔 (upper- & lower-body), trained on sim data and zero-shot deployed. Addressing interactive tasks: navigation, sitting, locomotion with verbal instruction. 🧵 ember-lab-berkeley.github.io/LeVERB-Website/
All forms of intelligence co-emerged with a body, except AI. We're building a #future where AI evolves as your lifelike digital twin to assist your needs across health, sports, daily life, creativity, & beyond... myolab.ai ➡️ Preview your first #HumanEmbodiedAI
Check out 🌟Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry & Physics for Mesh-Free Simulation #CVPR2025, from @LingjieLiu1’s lab at UPenn. Congrats to @MorPhLingXD! Vid2Sim aims to achieve system identification by reconstructing geometry, appearance,…
🚀Introducing GMT — a general motion tracking framework that enables high-fidelity motion tracking on humanoid robots by training a single policy from large, unstructured human motion datasets. 🤖A step toward general humanoid controllers. Project Website:…
Excited to present BimArt at #CVPR2025 🎉
June 11 • Spotlight talk @ Humanoid Agents (10:30–11:00, Room 101D) • Poster (ExHall D #127): 15:00–15:30
June 12 • Poster @ 3D Human Understanding (ExHall D #323): 15:30–16:30
June 15 • Main Conf Poster (ExHall D #146): 17:00–19:00
Join us tomorrow for the 1st Workshop on Humanoid Agents! We have an exciting lineup: @xiaolonw @xavierpuigf @GuanyaShi @GerardPonsMoll1 @blacksquirrel__ @tianminshu @petitegeek @xbpeng4 📍 Room 101 D, Music City Center 🔗 humanoid-agents.github.io @CVPR #CVPR2025
Join us for our workshop: Agents in Interaction, from Humans to Robots, on June 12th at 9:25 am, Room 213! We have an exciting lineup of speakers from both robotics and digital humans. Please come! @CVPR More info: agents-in-interactions.github.io
Attending #CVPR2025 6/11 to 6/15! DM me if you want to chat about 𝘃𝗶𝘀𝘂𝗮𝗹 𝗿𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴, 𝘀𝗽𝗮𝘁𝗶𝗮𝗹 𝗶𝗻𝘁𝗲𝗹𝗹𝗶𝗴𝗲𝗻𝗰𝗲, 𝗲𝗺𝗯𝗼𝗱𝗶𝗲𝗱 𝗮𝗴𝗲𝗻𝘁, or 𝘃𝗹𝗺/𝘃𝗹𝗮. 𝗔𝗿𝗴𝘂𝘀: visual-cot reasoning, Sat 10:30-12:30 (#346) yunzeman.github.io/argus 𝗢𝗥𝗚:…
✈️ Off to #CVPR2025!
Posters:
InterAct: Fri 14:00–16:00 (#162) sirui-xu.github.io/InterAct
InterMimic (highlight): Sat 10:30–12:30 (#155) sirui-xu.github.io/InterMimic
Talks:
Humanoid Agents: Wed 10:30
Agents in Interaction: Thu 14:45
Excited to catch up with friends, old and new, and chat! 🚀
Why does 3D human-object reconstruction fail in the wild or remain limited to a few object classes? A key missing piece is accurate 3D contact. InteractVLM (#CVPR2025) uses foundation models to infer contact on humans & objects, improving reconstruction from a single image. (1/10)
How can you learn dexterous manipulation for any robot hand from a single human demonstration? Check out DexMachina, our new RL algorithm that learns long-horizon, bimanual dexterous policies for a variety of robot hands, articulated objects, and complex motions.
🤖 Can a humanoid robot carry a full cup of beer 🍺 without spilling while walking? Hold my beer!
Introducing Hold My Beer 🍺: Learning Gentle Humanoid Locomotion and End-Effector Stabilization Control
Project: lecar-lab.github.io/SoFTA/
See more details below 👇
Who can stop this guy 🤭? Highly robust basketball skills, powered by our #SIGGRAPH2025 work. Project page: ingrid789.github.io/SkillMimicV2/ There are more examples of enhancing general interaction and locomotion skills!