Zubair Irshad
@mzubairirshad
Research Scientist @ToyotaResearch | PhD in AI and DL @GeorgiaTech | Researching Large Behavioral Models | 3D Vision | Robotics
🚀Thrilled to share what we’ve been building at TRI over the past several months: our first Large Behavior Models (LBMs) are here! I’m proud to have been a core contributor to the multi-task policy learning and post-training efforts. At TRI, we’ve been researching how LBMs can…
TRI's latest Large Behavior Model (LBM) paper landed on arXiv last night! Check out our project website: toyotaresearchinstitute.github.io/lbm1/ One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the…
We’re releasing the RoboArena today!🤖🦾 Fair & scalable evaluation is a major bottleneck for research on generalist policies. We’re hoping that RoboArena can help! We provide data, model code & sim evals for debugging! Submit your policies today and join the leaderboard! :) 🧵
Teleoperation is slow, expensive, and difficult to scale. So how can we train our robots instead? Introducing X-Sim: a real-to-sim-to-real framework that trains image-based policies 1) entirely in simulation 2) using rewards from human videos. portal-cornell.github.io/X-Sim
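To make the reward idea concrete, here is a minimal toy sketch (my own illustration, not the authors' code) of an object-centric reward of the kind the tweet describes: the policy is rewarded for making the simulated object follow the object trajectory extracted from the human video. All function and argument names are assumptions.

```python
import numpy as np

def object_tracking_reward(sim_obj_pos, video_obj_traj, t):
    """Dense reward: negative distance between the simulated object's current
    position and where the human-video trajectory says it should be at step t."""
    target = video_obj_traj[min(t, len(video_obj_traj) - 1)]
    return -np.linalg.norm(np.asarray(sim_obj_pos) - np.asarray(target))

# Toy usage: the demo says the object should be at [0.1, 0, 0] at step 1.
traj = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
print(object_tracking_reward([0.05, 0.0, 0.0], traj, 1))  # -0.05

# In the full real-to-sim-to-real pipeline, a reward like this would drive RL
# in simulation, and the resulting policy would be distilled into an
# image-based policy for deployment on the real robot.
```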
How can we achieve both common-sense understanding that copes with varying levels of ambiguity in language, and dexterous manipulation? Check out CodeDiffuser, a really neat work that bridges code generation with a 3D Diffusion Policy! This was a fun project with cool experiments! 🤖
🤖 Do VLA models really listen to language instructions? Maybe not 👀 🚀 Introducing our RSS paper: CodeDiffuser -- using VLM-generated code to bridge the gap between **high-level language** and **low-level visuomotor policy** 🎮 Try the live demo: robopil.github.io/code-diffuser/ (1/9)
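To see the high-level/low-level split in miniature, here is a runnable toy sketch of the pattern the thread describes: a (stubbed) VLM emits grounding code, that code produces a 3D attention map over the point cloud, and the map would then condition the diffusion policy. Every name here (fake_vlm_codegen, ground_instruction, the label scheme) is invented for illustration and is not the project's actual API.

```python
import numpy as np

def fake_vlm_codegen(instruction: str) -> str:
    # Stand-in for the VLM: returns Python that resolves the ambiguous
    # instruction into a per-point attention assignment.
    return "attention = (labels == 'mug_nearest_to_robot').astype(float)"

def ground_instruction(instruction, points, labels):
    code = fake_vlm_codegen(instruction)
    scope = {"labels": labels, "np": np}
    exec(code, scope)              # run the generated grounding code
    return scope["attention"]      # per-point 3D attention weights

# Toy point cloud with a semantic label per point.
points = np.random.randn(100, 3)
labels = np.array(["mug_nearest_to_robot"] * 40 + ["mug_far"] * 60)
attn = ground_instruction("pick up the closest mug", points, labels)
print(attn.sum())  # 40 points attended; this map would condition the policy
```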
#CVPR2025 starts in two days, and I can't wait to share our new work! 🎉 We present ZeroGrasp, a unified framework for 3D reconstruction and grasp prediction that generalizes to unseen objects. Paper 📄: arxiv.org/abs/2504.10857 Webpage 🌐: sh8.io/#/zerograsp (1/4 🧵)
@CVPR next week will be an exciting one! Check out our work below on VLMs, VLAs, and 3D for robotics (including the first 3D VLMs for Robotics workshop)! @ICatGT @mlatgt
Ever want to reconstruct and animate everyday articulated objects with no 3D scans or category priors? 🚀Introducing SplArt: Articulation Estimation & Part-Level Reconstruction with 3D Gaussian Splatting! #3Dvision #GaussianSplatting
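As a rough illustration of part-level articulated splatting, here is a small PyTorch sketch, under my own assumptions, of the core move: Gaussians assigned to a mobile part are transformed by an estimated revolute joint before rendering. The Rodrigues-style parameterization below is a common choice for revolute joints, not necessarily SplArt's.

```python
import torch

def revolute_transform(means, axis_point, axis_dir, angle):
    """Rotate Gaussian centers `means` (N, 3) about a revolute joint axis
    passing through `axis_point` with direction `axis_dir`."""
    k = axis_dir / axis_dir.norm()
    p = means - axis_point                      # move the axis to the origin
    cos, sin = torch.cos(angle), torch.sin(angle)
    # Rodrigues rotation of each point about unit axis k by `angle`
    p_rot = (p * cos
             + torch.cross(k.expand_as(p), p, dim=-1) * sin
             + k * (p @ k).unsqueeze(-1) * (1 - cos))
    return p_rot + axis_point

# The static part's Gaussians stay put; the mobile part's are moved by the
# estimated joint state, then both sets are splatted for the rendering loss.
mobile = torch.randn(100, 3)
opened = revolute_transform(mobile, torch.zeros(3),
                            torch.tensor([0.0, 0.0, 1.0]),
                            torch.tensor(0.5))  # hinge opened by 0.5 rad
```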
Interested in collecting robot training data without robots in the loop? 🦾 Check out this cool new approach that uses a single mobile device scan and a human demo video to generate diverse data for training diffusion and VLA manipulation policies. 🚀 Great work by @letian_fu…
Tired of teleoperating your robots? We built a way to scale robot datasets without teleop, dynamic simulation, or even robot hardware. Just one smartphone scan + one human hand demo video → thousands of diverse robot trajectories. Trainable by diffusion policy and VLA models…
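A hedged sketch of the scaling trick being described: one demonstrated trajectory, expressed in the scanned object's frame, is replayed under many randomized object poses to mint thousands of episodes, which would then be rendered into training images. The planar SE(2) simplification and all names below are my assumptions, not the authors' pipeline.

```python
import numpy as np

def make_T(theta, xy):
    # 2D rigid transform as a 3x3 homogeneous matrix (planar tabletop case)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, xy[0]], [s, c, xy[1]], [0, 0, 1]])

def augment_demo(demo_xy, n_episodes=1000, pos_noise=0.1, rot_noise=np.pi):
    """demo_xy: (T, 2) end-effector waypoints from one human demo,
    expressed relative to the scanned object's frame."""
    demo_h = np.concatenate([demo_xy, np.ones((len(demo_xy), 1))], axis=1)
    episodes = []
    for _ in range(n_episodes):
        theta = np.random.uniform(-rot_noise, rot_noise)
        xy = np.random.uniform(-pos_noise, pos_noise, size=2)
        T = make_T(theta, xy)                   # new randomized object pose
        episodes.append((demo_h @ T.T)[:, :2])  # demo replayed in that pose
    return episodes  # each episode would then be rendered into images

demo = np.array([[0.0, 0.0], [0.1, 0.0], [0.1, 0.1]])
print(len(augment_demo(demo)))  # 1000 synthetic trajectories from one demo
```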
FastMap: Revisiting Dense and Scalable Structure from Motion TL;DR: 2 orders of magnitude faster than GLOMAP; fully GPU (PyTorch) implementation; linear complexity for optimisation; comparable accuracy
FastMap: Revisiting Dense and Scalable Structure from Motion "FASTMAP, a redesigned SfM framework, achieves fast, high-accuracy dense structure from motion. On large scenes with thousands of images, FASTMAP is up to one to two orders of magnitude faster than GLOMAP and COLMAP.…
FastMap: Revisiting Dense and Scalable Structure from Motion Jiahao Li, @__whc__, @mzubairirshad, @vslevic, Matthew R. Walter, Vitor Campagnolo Guizilini, @gregshakh tl;dr: replace BA with epipolar error+IRLS; fully PyTorch implementation arxiv.org/abs/2505.04612
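The tl;dr is concrete enough to sketch. Below is a toy two-view version (my own simplification, not the paper's implementation) of the stated idea: optimize camera motion by iteratively re-weighted least squares (IRLS) on the algebraic epipolar error, in pure PyTorch, instead of running bundle adjustment. FastMap does this at scale across many views; here Adam stands in for the inner weighted least-squares solver.

```python
import torch

def skew(t):
    # 3x3 cross-product matrix [t]_x
    tx, ty, tz = t
    zero = torch.zeros_like(tx)
    return torch.stack([
        torch.stack([zero, -tz, ty]),
        torch.stack([tz, zero, -tx]),
        torch.stack([-ty, tx, zero]),
    ])

def axis_angle_to_R(w):
    # Rodrigues' formula: rotation matrix from an axis-angle vector
    theta = w.norm() + 1e-12
    K = skew(w / theta)
    I = torch.eye(3, dtype=w.dtype)
    return I + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)

def epipolar_residuals(w, t, x1, x2):
    # algebraic epipolar error r_i = x2_i^T E x1_i with E = [t]_x R
    E = skew(t / (t.norm() + 1e-12)) @ axis_angle_to_R(w)
    return torch.einsum('ni,ij,nj->n', x2, E, x1)

def irls_two_view(x1, x2, outer=10, inner=50):
    # x1, x2: (N, 3) matched keypoints in normalized homogeneous coordinates
    w = (0.01 * torch.randn(3)).requires_grad_()            # rotation (axis-angle)
    t = torch.tensor([1.0, 0.0, 0.0], requires_grad=True)   # translation direction
    weights = torch.ones(x1.shape[0])
    opt = torch.optim.Adam([w, t], lr=1e-2)
    for _ in range(outer):
        for _ in range(inner):   # weighted least squares at fixed weights
            opt.zero_grad()
            r = epipolar_residuals(w, t, x1, x2)
            (weights * r.pow(2)).sum().backward()
            opt.step()
        with torch.no_grad():    # IRLS: re-weight to approximate a robust loss
            r = epipolar_residuals(w, t, x1, x2)
            weights = 1.0 / (r.abs() + 1e-3)
    return w.detach(), t.detach()

# Toy call on random "matches" just to show the interface.
x1, x2 = torch.randn(200, 3), torch.randn(200, 3)
w_est, t_est = irls_two_view(x1, x2)
```

The appeal of this formulation is that the residuals depend only on camera parameters, not 3D points, so there is no structure to re-triangulate inside the loop and everything batches cleanly on GPU.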