Siddhant Haldar
@haldar_siddhant
Excited about generalizing AI | PhD student @CILVRatNYU | Undergrad @IITKgp
The most frustrating part of imitation learning is collecting huge amounts of teleop data. But why teleoperate robots when they can learn by watching us? Introducing Point Policy, a novel framework that enables robots to learn from human videos without any teleop, sim2real, or RL.
It is difficult to get robots to be both precise and general. We just released a new technique for manipulation that achieves millimeter-level precision while remaining robust to large visual variations. The key is a careful combination of visuo-tactile learning and RL. 🧵👇
Generalization needs data. But data collection is hard for precise tasks like plugging in USBs, swiping cards, inserting plugs, and turning keys in locks. Introducing robust, precise VisuoTactile Local (ViTaL) policies: >90% success rates from just 30 demos and 45 min of real-world RL.🧶⬇️
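As a rough picture of what "30 demos + 45 min of real-world RL" could look like in code: behavior cloning on the demonstrations, then a short on-robot RL phase. A minimal sketch in which `demos`, `env`, and `collect_episode` are hypothetical interfaces and the reward-weighted update is a generic stand-in, not ViTaL's actual algorithm:

```python
import torch

def two_stage_train(policy, demos, env, bc_epochs=50, rl_episodes=100):
    """Illustrative two-stage recipe: BC pretraining, then brief on-robot RL."""
    opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

    # Stage 1: behavior cloning on the ~30 demonstrations.
    for _ in range(bc_epochs):
        for obs, expert_action in demos:
            loss = ((policy(obs) - expert_action) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2: short real-world RL phase. Here, reward-weighted
    # regression on self-collected rollouts (a generic stand-in,
    # not necessarily the algorithm the paper uses).
    for _ in range(rl_episodes):
        rollout, episode_return = collect_episode(env, policy)  # hypothetical helper
        weight = torch.sigmoid(torch.tensor(episode_return))    # bounded weight favoring good episodes
        for obs, action in rollout:
            loss = weight * ((policy(obs) - action) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
```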
Is "scaling is all you need" the right path for robotics? Announcing our @corl_conf workshop on "Resource-Rational Robot Learning", where we will explore how to build efficient intelligent systems that learn & thrive under real-world constraints. Submission deadline: Aug 8 🧵
1/ 🚀 Announcing #GenPriors — the CoRL 2025 workshop on Generalizable Priors for Robot Manipulation! 📍 Seoul, Korea 📅 Sat 27 Sep 2025. Mark your calendars & join us for a full day of discussion on building generalist robot policies capable of performing…
Robots no longer have to choose between being precise and adaptable. This new method gives them both! [📍 Bookmark Paper & Code] ViTaL is a new method that teaches robots precise, contact-rich tasks that work in any scene, from your cluttered kitchen to a busy factory floor…
🚀 With minimal data and a straightforward training setup, our VisuoTactile Local (ViTaL) policy fuses egocentric vision + tactile feedback to achieve millimeter-level precision & zero-shot generalization! 🤖✨ Details ▶️ vitalprecise.github.io
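Sketching the fusion idea in PyTorch: encode the wrist camera and the tactile signal separately, concatenate, and regress an action. Layer sizes, `tactile_dim`, and `action_dim` are placeholder assumptions, not the released architecture:

```python
import torch
import torch.nn as nn

class VisuoTactilePolicy(nn.Module):
    """Illustrative vision + touch fusion policy (not ViTaL's actual model)."""

    def __init__(self, tactile_dim: int = 64, action_dim: int = 7):
        super().__init__()
        self.vision_enc = nn.Sequential(   # egocentric wrist RGB -> 64-d features
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.tactile_enc = nn.Sequential(  # tactile reading -> 64-d features
            nn.Linear(tactile_dim, 64), nn.ReLU(),
        )
        self.head = nn.Sequential(         # fused features -> action
            nn.Linear(64 + 64, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, rgb: torch.Tensor, tactile: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.vision_enc(rgb), self.tactile_enc(tactile)], dim=-1)
        return self.head(z)
```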
A nice pipeline: use a VLM to find objects in the scene, get close, and use a well-constrained visuo-tactile policy to handle the last inch.
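A minimal Python sketch of that coarse-to-fine handoff, assuming hypothetical `vlm`, `robot`, and `local_policy` interfaces (none of these are the released API):

```python
# Coarse-to-fine manipulation sketch. A VLM localizes the target,
# the arm does a coarse reach, then a local visuo-tactile policy
# closes the last inch. All interfaces here are hypothetical.

def run_task(instruction: str, robot, vlm, local_policy):
    # 1) Coarse localization: ask the VLM where the target object is.
    rgb = robot.get_camera_image()
    target_xy = vlm.locate(rgb, instruction)  # pixel coordinates

    # 2) Coarse reach: move the end-effector near the target.
    target_pose = robot.pixel_to_pose(target_xy, offset_m=0.05)
    robot.move_to(target_pose)

    # 3) Fine phase: the local policy only sees egocentric wrist
    #    views + touch, so scene-level visual changes don't hurt it.
    done = False
    while not done:
        obs = {
            "wrist_rgb": robot.get_wrist_image(),
            "tactile": robot.get_tactile_reading(),
        }
        done = robot.step(local_policy(obs))
```

The appeal of the split is that only the cheap, coarse stage needs scene-level understanding; the precision lives in a policy whose inputs barely change across environments.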
Current robot policies often face a tradeoff: they're either precise (but brittle) or generalizable (but imprecise). We present ViTaL, a framework that lets robots generalize precise, contact-rich manipulation skills across unseen environments with millimeter-level precision. 🧵