Xavier Puig
@xavierpuigf
Research Scientist at FAIR @AIatMeta working on EmbodiedAI | PhD @MIT_CSAIL
Thrilled to announce Habitat 3.0, an Embodied AI simulator to study human-robot interaction at scale! Habitat 3.0 is designed to train and evaluate agents that perform tasks alongside humans. It includes: - Humanoid simulation - Human interaction tools - Multi-agent benchmarks 1/6
Today we’re announcing Habitat 3.0, Habitat Synthetic Scenes Dataset and HomeRobot — three major advancements in the development of social embodied AI agents that can cooperate with and assist humans in daily tasks. More details on these announcements ➡️ bit.ly/3tIVbmj
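A minimal sketch of driving a Habitat episode from habitat-lab, for readers who want to try the simulator. The benchmark config path below is an assumption for illustration; substitute the multi-agent config shipped with your habitat-lab install.

```python
# Hedged sketch: stepping a Habitat 3.0 multi-agent episode via habitat-lab.
# The config path is an assumption, not the exact released filename.
import habitat

config = habitat.get_config("benchmark/multi_agent/hssd_spot_human.yaml")  # assumed path
env = habitat.Env(config=config)

observations = env.reset()
while not env.episode_over:
    # Random actions stand in for trained humanoid/robot policies.
    observations = env.step(env.action_space.sample())
env.close()
```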
Check out our workshop on Continual Robot Learning from Humans at #RSS2025, with amazing speakers covering topics including learning from human visual demonstrations, generative models for continual robot learning, and the role of LLMs in embodied contexts …-robot-learning-from-humans.github.io
The #RSS2025 Workshop on Continual Robot Learning from Humans is happening on June 21. We have an amazing lineup of speakers discussing how we can enable robots to acquire new skills and knowledge from humans continuously. Join us in person and on Zoom (info on our website)!
🤖 Do VLA models really listen to language instructions? Maybe not 👀 🚀 Introducing our RSS paper: CodeDiffuser -- using VLM-generated code to bridge the gap between **high-level language** and **low-level visuomotor policy** 🎮 Try the live demo: robopil.github.io/code-diffuser/ (1/9)
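A hedged illustration of the recipe the tweet describes: a VLM writes a short grounding program that resolves the ambiguous part of the instruction into a spatial attention map, and the low-level diffusion policy is conditioned on that map instead of raw text. All names here are hypothetical stand-ins, not the paper's API.

```python
# Hypothetical sketch of VLM-generated code bridging language and control.
import numpy as np

def vlm_generate_code(instruction: str) -> str:
    # Stand-in for a VLM call; the real system prompts a VLM with the
    # instruction plus the available perception APIs and gets code back.
    return "attention = object_mask('blue mug')"

def object_mask(query: str) -> np.ndarray:
    # Toy perception API exposed to the generated code: a mask over a
    # 64x64 workspace highlighting the queried object.
    mask = np.zeros((64, 64))
    mask[20:30, 40:50] = 1.0  # pretend the blue mug sits here
    return mask

instruction = "hand me the blue mug, not the red one"
scope = {"object_mask": object_mask}
exec(vlm_generate_code(instruction), scope)   # run the generated grounding code
attention = scope["attention"]
# A diffusion visuomotor policy would now be conditioned on `attention`
# rather than on the raw instruction string.
print(attention.sum())
```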
🚀 Excited to introduce SimWorld: an embodied simulator for infinite photorealistic world generation 🏙️ populated with diverse agents 🤖 If you are at #CVPR2025, come check out the live demo 👇 Jun 14, 12:00-1:00 pm at JHU booth, ExHall B Jun 15, 10:30 am-12:30 pm, #7, ExHall B
I will be talking at the #CVPR2025 workshop on Humanoid Agents, tomorrow June 11th at 9:30 am. I will discuss how humanoid agents can help us improve human-robot collaboration. See you there! humanoid-agents.github.io

I'll be giving two talks at the #CVPR2025 workshops: 3D LLM/VLA 3d-llm-vla.github.io and POETS poets2024.github.io/poets2025/. 🧵
DexMachina lets us perform a functional comparison between different dexterous hands: we evaluated 6 hands on 4 challenging long-horizon tasks and found that larger, fully actuated hands learn better and faster, and that high DoF matters more than having human-like hand sizes –…
I will be at ICLR to present PARTNR. Reach out if you want to talk about our work at FAIR or interesting problems in Robotics!
We released PARTNR, the largest benchmark to study human-robot collaboration in households, with 100K+ natural language tasks! PARTNR tests agents on key capabilities including: 🔍 Perceiving dynamic environments 🎯 Task planning and skill execution 🤝 Coordination with humans
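For concreteness, here is what a PARTNR-style natural-language collaboration task might look like as plain data. The field names and scene id are assumptions for the sketch, not the benchmark's released episode schema.

```python
# Illustrative only: a PARTNR-style household collaboration task.
task = {
    "instruction": "Let's tidy up: you clear the dining table while I load "
                   "the dishwasher, then meet me in the living room.",
    "scene": "hssd_apartment_017",          # hypothetical scene id
    "agents": ["human", "spot_robot"],
    "evaluation": ["task_success", "steps_to_completion", "human_idle_time"],
}

# A benchmark harness would spawn both agents in the scene, stream the
# instruction to the robot's planner, and score the metrics above.
print(task["instruction"])
```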
🚨New Preprint 🚨 Embodied agents 🤖 operating in indoor environments must interpret ambiguous and under-specified human instructions. A capable household robot 🤖 should recognize ambiguity and ask relevant clarification questions to infer the user's 🧑 intent accurately, leading…
How do we enable agents to perform tasks even when they are underspecified? In this work, led by @RamRamrakhya, we train VLA agents via RL to decide when to act in the environment and when to ask clarifying questions, enabling them to handle ambiguous instructions ram81.github.io/projects/ask-t…
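A minimal sketch of the act-or-ask interface this work describes, with hypothetical names rather than the released code: at every step the policy either emits an environment action or a clarification question, and asking carries a small cost so RL learns to query only when the instruction is genuinely ambiguous.

```python
# Toy act-or-ask decision loop (illustrative, not the paper's implementation).
import random

ASK = "ask_clarification"
ACTIONS = ["pick", "place", "navigate", ASK]

def policy(instruction: str, uncertainty: float) -> str:
    # Stand-in for a learned VLA policy head over actions plus the ask token.
    return ASK if uncertainty > 0.5 else random.choice(ACTIONS[:-1])

def step(action: str, reward: float) -> float:
    if action == ASK:
        return reward - 0.1   # querying the human is cheap but not free
    return reward + 1.0       # progress on the (disambiguated) task

reward, uncertainty = 0.0, 0.8
for t in range(5):
    a = policy("bring me the cup", uncertainty)
    reward = step(a, reward)
    if a == ASK:
        uncertainty = 0.2     # the user's answer resolves the ambiguity
print(reward)
```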
How can we achieve human-level, open-ended machine Theory of Mind? Introducing #AutoToM: a fully automated and open-ended ToM reasoning method combining the flexibility of LLMs with the robustness of Bayesian inverse planning, achieving SOTA results across five benchmarks. 🧵[1/n]
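A hedged toy of Bayesian inverse planning, the backbone AutoToM combines with LLM-proposed hypotheses. This is a generic illustration of goal inference from observed actions, not the paper's code; the numbers are made up.

```python
# Infer a posterior over an agent's goal from its observed actions.
goals = ["get_coffee", "get_tea"]
prior = {"get_coffee": 0.5, "get_tea": 0.5}

# Likelihood of the observed action sequence under a noisily rational plan
# for each candidate goal; an LLM can propose goals and score such plans.
likelihood = {"get_coffee": 0.7, "get_tea": 0.1}

unnorm = {g: prior[g] * likelihood[g] for g in goals}
z = sum(unnorm.values())
posterior = {g: p / z for g, p in unnorm.items()}
print(posterior)   # the observed agent most likely wants coffee
```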
Meta PARTNR is a benchmark for planning and reasoning in embodied multi-agent tasks. This large-scale human and robot collaboration benchmark was core to our recent demos and also informs our work as scientists and engineers pushing this field of study forward.
The trained policy can be integrated with a high-level planner for real-world applications. By combining our object manipulation policy with user commands, we demonstrate its effectiveness in real-world scenarios—such as moving large trash carts. (6/8)
🪑How do you train robots to move furniture? This requires robots to synchronize whole-body movements, making teleoperation or RL approaches challenging. Check out this amazing work by @SniperPaper, using human demonstrations to train robots to move furniture in the real world!
We've seen robots move like our favorite athletes. We've watched them fold clothes and do the dishes. Now, it's time for robots to help you move furniture. Introducing RobotMover—a learning framework that enables robots to acquire object-moving skills from human demonstrations.…
This is the mobile manipulation I want to see. You can only get this via RL.
🤖 Introducing Human-Object Interaction from Human-Level Instructions! First complete system that generates physically plausible, long-horizon human-object interactions with finger motions in contextual environments, driven by human-level instructions. 🔍 Our approach: - LLMs…
[NeurIPS D&B Oral] Embodied Agent Interface: Benchmarking LLMs for Embodied Agents A single line of code to evaluate your model! 🌟Standardize Goal Specifications: LTL 🌟Standardize Modules and Interfaces: 4 modules, 438 tasks, 1475 goals 🌟Standardize Fine-grained Metrics: 18…
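The benchmark advertises one-line evaluation; the sketch below shows what such an entry point could look like. The function name, module names, and signature are assumptions for illustration, not the released Embodied Agent Interface API, so check the repo for the real interface.

```python
# Hypothetical one-line evaluation harness in the spirit of the announcement.
def evaluate(model, modules, tasks):
    # A real harness would run `model` through each standardized module on
    # every task and report fine-grained metrics; here we return placeholders.
    return {m: {"accuracy": None, "num_tasks": len(tasks)} for m in modules}

report = evaluate(model=None,
                  modules=["goal_interpretation", "action_sequencing"],
                  tasks=range(438))
print(report)
```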
Additionally, looking towards the future, we’re releasing PARTNR: a benchmark for Planning And Reasoning Tasks in humaN-Robot collaboration. Built on Habitat 3.0, it’s the largest benchmark of its kind to study and evaluate human-robot collaboration in household activities. By…