Hongsuk Benjamin Choi
@redstone_hong
Humanoid robots learn from humans 🙌
our new system trains humanoid robots using data from cell phone videos, enabling skills such as climbing stairs and sitting on chairs in a single policy (w/ @redstone_hong @junyi42 @davidrmcall)
Full episode dropping soon! Geeking out with @arthurallshire @redstone_hong on VideoMimic videomimic.net Co-hosted by @chris_j_paxton @micoolcho
🕹️
Five Golden Nuggets from this talk: 1. Pretrained-and-finetuned vs. single-task policy: - Because it is trained on different tasks, the pretrained policy has more recovery behaviours. - Somehow the visual-action mapping learned across different tasks and environments leads to this behaviour.…
Wow, thanks Ted! I could spend a week on this video from @RussTedrake - easily some of the densest learning material for anyone interested in robotics.
Sharing a few more demos of our first #LargeBehaviorModel (LBM) at TRI. 1/🍎This model enables a robot to core and cut an apple into multiple slices autonomously. We trained our diffusion-based LBM on almost 1,700 hours of robot data, conducted 1,800 real-world…
Congrats, OpenAI Team! And yes, you win our bet, Sanjeev. FYI for the curious: I had bet against "Some AI model will be able to perform at IMO gold level by May 1 2026".
For everyone interested in precise 📷camera control 📷 in transformers [e.g., video / world models, etc.]: Stop settling for Plücker raymaps -- use camera-aware relative PE in your attention layers, like RoPE (for LLMs) but for cameras! Paper & code: liruilong.cn/prope/
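To make the RoPE analogy concrete, here is a minimal sketch of a relative camera encoding (my own illustration, not the paper's exact PRoPE formulation): each token's queries and keys are transformed by that token's camera matrix so the attention logit depends only on the relative pose between the two cameras. The 4-vector blocking, the function name `apply_camera_rope`, and the use of extrinsics alone are illustrative assumptions.

```python
import numpy as np

def apply_camera_rope(q, k, extrinsics):
    """RoPE-style relative camera encoding for queries/keys.

    q, k:        (num_tokens, dim), dim divisible by 4 (treated as blocks of
                 homogeneous 4-vectors; an illustrative choice).
    extrinsics:  (num_tokens, 4, 4) world-to-camera matrices, one per token.

    After the transform, q_i . k_j depends on the cameras only through the
    relative transform P_i^{-1} P_j, just as RoPE makes dot products depend
    only on relative token positions.
    """
    n, d = q.shape
    assert d % 4 == 0
    q_blocks = q.reshape(n, d // 4, 4)
    k_blocks = k.reshape(n, d // 4, 4)
    P = extrinsics
    # Queries get the inverse-transpose, keys the plain matrix, so the logit
    # becomes q_i^T P_i^{-1} P_j k_j (relative pose only).
    P_inv_T = np.transpose(np.linalg.inv(P), (0, 2, 1))
    q_out = np.einsum('nij,nbj->nbi', P_inv_T, q_blocks).reshape(n, d)
    k_out = np.einsum('nij,nbj->nbi', P, k_blocks).reshape(n, d)
    return q_out, k_out
```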
Flow Q-learning (FQL) is a simple method to train/fine-tune an expressive flow policy with RL. Come visit our poster at 4:30p-7p this Wed (evening session, 2nd day)!
Excited to introduce flow Q-learning (FQL)! Flow Q-learning is a *simple* and scalable data-driven RL method that trains an expressive policy with flow matching. Paper: arxiv.org/abs/2502.02538 Project page: seohong.me/projects/fql/ Thread ↓
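For a rough picture of the recipe described above, here is a minimal PyTorch sketch under stated assumptions (my own illustration, not the authors' code): a flow-matching loss that clones dataset actions with an expressive policy, plus a one-step policy trained to match samples from that flow policy while maximizing a learned Q-function. Network sizes, the `alpha` weighting, and all names are assumptions.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 17, 6
vector_field = nn.Sequential(nn.Linear(obs_dim + act_dim + 1, 256), nn.ReLU(),
                             nn.Linear(256, act_dim))        # v(s, a_t, t)
one_step_policy = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                nn.Linear(256, act_dim))      # pi(s) -> action
q_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
                      nn.Linear(256, 1))                      # Q(s, a)

def flow_bc_loss(s, a):
    """Flow-matching regression toward dataset actions (straight-line paths)."""
    t = torch.rand(s.shape[0], 1)
    noise = torch.randn_like(a)
    a_t = (1 - t) * noise + t * a            # point on the noise->action path
    target_v = a - noise                     # constant velocity of that path
    pred_v = vector_field(torch.cat([s, a_t, t], dim=-1))
    return ((pred_v - target_v) ** 2).mean()

@torch.no_grad()
def sample_flow_action(s, steps=10):
    """Integrate the learned vector field from noise to an action (Euler)."""
    a = torch.randn(s.shape[0], act_dim)
    for i in range(steps):
        t = torch.full((s.shape[0], 1), i / steps)
        a = a + vector_field(torch.cat([s, a, t], dim=-1)) / steps
    return a

def one_step_policy_loss(s, alpha=1.0):
    """Distill the flow policy into a one-step policy and push it uphill on Q."""
    a_pi = one_step_policy(s)
    distill = ((a_pi - sample_flow_action(s)) ** 2).mean()
    q_term = -q_net(torch.cat([s, a_pi], dim=-1)).mean()
    return q_term + alpha * distill
```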
Ep#20 with @arthurallshire @redstone_hong on VideoMimic videomimic.net Co-hosted by @chris_j_paxton @micoolcho
This is a great compliment! Our real-to-sim code is now available. It can recover both the environment and target motion from videos. github.com/hongsukchoi/Vi…
VideoMimic is genuinely inspiring work. Interacting with terrain is hard, and doing it from just a couple of videos is really impressive.
Action chunking is a great idea in robotics: by getting a model to produce a short sequence of actions, it _just works better_ for some mysterious reason. Now it turns out this can help in RL too, and it's a bit clearer why: action chunks help explore and help with backups. 🧵👇
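As a concrete picture of the "help with backups" point, here is a minimal sketch (illustrative assumptions, not the thread's implementation): a critic that scores a whole action chunk Q(s_t, a_{t:t+h}) can bootstrap from an h-step return, collapsing h one-step backups into a single update.

```python
import numpy as np

def chunked_td_target(rewards, next_value, gamma=0.99):
    """h-step TD target for an action chunk.

    rewards:    the h per-step rewards collected while executing the chunk
    next_value: critic's value estimate at the state reached after the chunk
    """
    h = len(rewards)
    discounts = gamma ** np.arange(h)
    return float(np.sum(discounts * rewards) + gamma ** h * next_value)

# Example: a 4-step chunk that earns small rewards and lands in a good state.
target = chunked_td_target(np.array([0.1, 0.0, 0.2, 0.1]), next_value=5.0)
```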
If you're interested in working on learning from simulation, learning directly from perception, and training at scale for multi-fingered hands, please reach out to me and my colleagues. We are looking for talented researchers with experience working on those problems.
The Dex team at NVIDIA is defining the bleeding edge of sim2real dexterity. Take a look below 🧵 There's a lot happening at NVIDIA in robotics, and we’re looking for good people! Reach out if you're interested. We have some big things brewing (and scaling :)
🚀Thrilled to share what we’ve been building at TRI over the past several months: our first Large Behavior Models (LBMs) are here! I’m proud to have been a core contributor to the multi-task policy learning and post-training efforts. At TRI, we’ve been researching how LBMs can…
TRI's latest Large Behavior Model (LBM) paper landed on arxiv last night! Check out our project website: toyotaresearchinstitute.github.io/lbm1/ One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the…
We build Cosmos-Predict2 as a world foundation model for Physical AI builders — fully open and adaptable. Post-train it for specialized tasks or different output types. Available in multiple sizes, resolutions, and frame rates. 📷 Watch the repo walkthrough…
ummm… As a robotics PhD student, I’m genuinely worried that the problem I find important now will be solved in the next 2 years—by MORE DATA, without any need to understand the underlying structure. And this happens in many areas😂
“As a PhD student, your job is not publishing a paper every quarter. Focus on deeply understanding a problem and solve it over years under the protection of your adviser” from @RussTedrake #RSS2025
Amazing @taeinkwon1
EgoPressure at @CVPR 2025 (Highlight)! w/ @_yimzhao_ , @taeinkwon1 , @mapo1 , @cholz 🌍 Project page: yiming-zhao.github.io/EgoPressure/ 📰 Paper: openaccess.thecvf.com/content/CVPR20… 📍Come visit us during our poster session in ExHall D Poster 149, 15 Jun, 16:00 CDT — 18:00 CDT!
Q-learning is not yet scalable seohong.me/blog/q-learnin… I wrote a blog post about my thoughts on scalable RL algorithms. To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).
1/ How should RL agents prepare to solve new tasks? While prior methods often learn a model that predicts the immediate next observation, we build a model that predicts many steps into the future, conditioning on different user intentions: chongyi-zheng.github.io/infom.
Kudos to the VideoMimic team :)
Great work with super cool avatar reconstruction & animation demos from @w_zielonka, and humanoid robots mimicking motions from videos by @redstone_hong at the POETs workshop #CVPR