Max Fu
@letian_fu
scaling robotics. Intern @NVIDIA. PhD student @UCBerkeley @berkeley_ai. Prev @Apple @autodesk
Kimi K2 tech report just dropped! Quick hits:
- MuonClip optimizer: stable + token-efficient pretraining at trillion-parameter scale
- 20K+ tools, real & simulated: unlocking scalable agentic data
- Joint RL with verifiable + self-critique rubric rewards: alignment that adapts
- …
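For intuition on the first bullet: public summaries describe MuonClip as Muon plus a "QK-Clip" step that rescales a head's query/key projections whenever its max attention logit exceeds a threshold, which is what stabilizes pretraining. A minimal sketch of that clipping step, with hypothetical parameter names (`attn.w_q`, `attn.w_k`), not the K2 code:

```python
import torch

def qk_clip_(attn, max_logit: float, tau: float = 100.0) -> None:
    """Rescale query/key weights in place once the observed max attention
    logit exceeds tau, so future logits stay bounded. `attn.w_q` and
    `attn.w_k` are hypothetical handles, not the actual K2 modules."""
    if max_logit <= tau:
        return
    gamma = tau / max_logit
    # Logits are bilinear in W_q and W_k, so splitting the factor as
    # sqrt(gamma) on each side shrinks q·k by exactly gamma.
    attn.w_q.data.mul_(gamma ** 0.5)
    attn.w_k.data.mul_(gamma ** 0.5)
```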
User simulators bridge RL with real-world interaction // jessylin.com/2025/07/10/use… How do we get the RL paradigm to work on tasks beyond math & code? Instead of designing datasets, RL requires designing environments. Given that most non-trivial real-world tasks involve…
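The "designing environments" point is concrete: for dialogue-style tasks, the environment itself is a model that plays the user. A minimal sketch of that loop, with all names illustrative rather than taken from the post:

```python
class UserSimEnv:
    """RL environment whose dynamics are a simulated user: the agent
    emits an utterance, the user model replies, and a task checker
    scores the conversation. All names here are illustrative."""

    def __init__(self, user_model, task_checker):
        self.user_model = user_model      # callable: history -> user reply
        self.task_checker = task_checker  # callable: history -> bool
        self.history = []

    def step(self, agent_utterance):
        self.history.append(("agent", agent_utterance))
        user_reply = self.user_model(self.history)   # simulated user turn
        self.history.append(("user", user_reply))
        reward = float(self.task_checker(self.history))
        return user_reply, reward
```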
TRI's latest Large Behavior Model (LBM) paper landed on arXiv last night! Check out our project website: toyotaresearchinstitute.github.io/lbm1/ One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the…
Sparsity can make your LoRA fine-tuning go brrr 💨 Announcing SparseLoRA (ICML 2025): up to 1.6-1.9x faster LLM fine-tuning (2.2x fewer FLOPs) via contextual sparsity, while maintaining performance on tasks like math, coding, chat, and ARC-AGI 🤯 🧵1/ z-lab.ai/projects/spars…
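The core trick, sketched (an illustration of contextual sparsity, not the SparseLoRA code): a cheap low-rank surrogate of a projection scores channels for the current tokens, and only the selected columns of the frozen base weights are computed, while the LoRA branch stays dense. The speedup comes from skipping the unselected columns in the much larger frozen weights.

```python
import torch

def contextual_channel_ids(x, w_surrogate, keep_ratio=0.5):
    """Score FFN channels for the current context with a low-rank
    surrogate (e.g. an SVD of the gate projection) and keep the top
    fraction. Names are illustrative, not SparseLoRA's API."""
    scores = (x @ w_surrogate).abs().mean(dim=0)   # cheap importance per channel
    k = max(1, int(keep_ratio * scores.numel()))
    return scores.topk(k).indices

def sparse_ffn(x, w_up, w_down, idx):
    # Only the selected channels are materialized, cutting base-model
    # FLOPs roughly in proportion to keep_ratio.
    h = torch.relu(x @ w_up[:, idx])
    return h @ w_down[idx, :]
```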
We build Cosmos-Predict2 as a world foundation model for Physical AI builders — fully open and adaptable. Post-train it for specialized tasks or different output types. Available in multiple sizes, resolutions, and frame rates. 📷 Watch the repo walkthrough…
Waymo in a new blog post: "We conducted a comprehensive study using Waymo’s internal dataset. Spanning 500,000 hours of driving, it is significantly larger than any dataset used in previous scaling studies in the AV domain. Our study uncovered the following: • Similar to LLMs,…
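"Similar to LLMs" presumably refers to power-law fits: quality improves as a power of data/compute, which shows up as a straight line in log-log space. A toy fit with placeholder numbers (not Waymo's data):

```python
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21])  # placeholder training budgets
loss    = np.array([0.80, 0.55, 0.38, 0.26])  # placeholder eval losses

# A power law loss = a * compute^b is a line fit in log-log space.
b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
print(f"loss ≈ {np.exp(log_a):.3g} * compute^({b:.2f})")
```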
A legged mobile manipulator trained to play badminton with humans coordinates whole-body maneuvers and onboard perception. Paper: science.org/doi/10.1126/sc… Video: youtu.be/zYuxOVQXVt8 @Yuntao144, Andrei Cramariuc, Farbod Farshidian, Marco Hutter
🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵 Blogpost: tinyurl.com/spurious-rewar…
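Mechanically, "random rewards" just means replacing the verifier with a coin flip inside an otherwise standard RLVR loop. A sketch assuming a GRPO-style group-normalized advantage (the blog post has the actual recipe and the explanation):

```python
import torch

def group_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Normalize rewards across a group of rollouts for the same prompt,
    GRPO-style; the result feeds the usual policy-gradient update."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)

rollouts = 8
random_rewards = torch.bernoulli(torch.full((rollouts,), 0.5))  # coin flips, no correctness signal
advantages = group_advantages(random_rewards)
```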
We hope everyone had a great time at the ICRA 2025 Workshop on Learning Meets Model-Based Methods for Contact-Rich Manipulation (contact-rich.github.io)! Big thanks to our incredible speakers, panelists, and generous sponsors — and most of all, to our amazing co-organizers…
Learning 🤝 Model-Based Methods
See you tomorrow at ICRA! GWCC Building A, Room 412, 1:30 PM - 6:00 PM
Excited to organize Workshop on Learning Meets Model-Based Methods for Contact-Rich Manipulation @ ICRA 2025! We welcome submissions on a range of topics—check out our website for details: contact-rich.github.io Join us for an incredible lineup of speakers! #ICRA2025
🚀 Struggling with the lack of high-quality data for AI-driven human-object interaction research? We've got you covered! Introducing HUMOTO, a groundbreaking 4D dataset for human-object interaction, developed with a combination of wearable motion capture, SOTA 6D pose…
Multimodal model support is here in 0.7! Ollama now supports multimodal models via its new engine. Cool vision models to try👇
- Llama 4 Scout & Maverick
- Gemma 3
- Qwen 2.5 VL
- Mistral Small 3.1
and more 😍 Blog post 🧵👇
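For example, hitting a local multimodal model from Python (assuming the `ollama` Python client and a pulled vision model; the model tag and image path are examples):

```python
import ollama  # pip install ollama; talks to a local Ollama >= 0.7 server

response = ollama.chat(
    model="gemma3",  # any of the vision models above should work
    messages=[{
        "role": "user",
        "content": "Describe this image in one sentence.",
        "images": ["photo.jpg"],  # local path to the image to analyze
    }],
)
print(response["message"]["content"])
```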
Interested in collecting robot training data without robots in the loop? 🦾 Check out this cool new approach that uses a single mobile device scan and a human demo video to generate diverse data for training diffusion and VLA manipulation policies. 🚀 Great work by @letian_fu…
Tired of teleoperating your robots? We built a way to scale robot datasets without teleop, dynamic simulation, or even robot hardware. Just one smartphone scan + one human hand demo video → thousands of diverse robot trajectories. Trainable by diffusion policy and VLA models…
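The kinematics-only core of the recipe, sketched (an illustration of the idea, not the Real2Render2Real code): track the object's motion from the human demo, sample a rigid grasp, and the gripper trajectory is just the object trajectory composed with that grasp. Randomizing grasps, object poses, and cameras per rollout is what multiplies one demo into thousands of trajectories.

```python
import numpy as np

def retarget_to_gripper(obj_poses: np.ndarray, grasp: np.ndarray) -> np.ndarray:
    """Given object poses tracked from a human demo, shape (N, 4, 4),
    and one sampled grasp transform (object -> gripper, 4x4), return
    gripper poses that reproduce the object's motion. Pure kinematics:
    no dynamics simulation, no teleoperation."""
    return obj_poses @ grasp  # rigid attachment: the gripper follows the object
```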
Large language models can do new tasks from a few text prompts. What if robots could do the same—with trajectories? 🤖 ICRT enables zero-shot imitation: prompt with a few teleop demos, and it acts—no fine-tuning. Happy to chat more at ICRA! 📍 ICRA | Wed 21 May | 08:35 - 08:40…
Vision-language models perform diverse tasks via in-context learning. Time for robots to do the same! Introducing In-Context Robot Transformer (ICRT): a robot policy that learns new tasks by prompting with robot trajectories, without any fine-tuning. icrt.dev [1/N]
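The mechanism, sketched: demo trajectories are serialized into the context window as alternating observation/action tokens, and the frozen policy decodes the next action for the live observation. `encode` and `policy` here are hypothetical stand-ins, not the ICRT interface:

```python
import torch

def icrt_act(demos, current_obs, encode, policy):
    """In-context imitation: pack a few teleop demos into the context
    as alternating obs/action embeddings, then predict an action for
    the live observation with no weight updates."""
    tokens = [emb
              for traj in demos            # each traj: list of (obs, action)
              for obs, act in traj
              for emb in (encode(obs), encode(act))]
    tokens.append(encode(current_obs))     # prompt ends at the live observation
    context = torch.stack(tokens).unsqueeze(0)  # (1, T, d)
    return policy(context)[:, -1]          # action decoded for the last position
```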
Next challenge: scalable learning of robot manipulation skills from truly in-the-wild videos, such as YouTube!
Can we scale up robot data collection without a robot? We propose a pipeline to scale robot datasets from a single human demonstration. Through the Real2Render2Real pipeline, policies trained on the generated data can be deployed directly on a real robot.
People are collecting large-scale teleoperation datasets, which are often just kinematics-level trajectories. Real2Render2Real is a new framework that can generate this data without teleoperation or tricky sim + RL. High data quality for behavior cloning plus a nice scaling effect; please dive in for more!
As we all know, collecting data for robotics is very costly. This is why I’m very impressed by this work: it generates a huge amount of data for different robots without any teleoperation.