Ruijie Zheng
@ruijie_zheng12
Computer Science PhD student at University of Maryland · Research Intern @GEAR Lab · Ex-Intern: @MSFTResearch
Representation also matters for VLA models! Introducing FLARE: Robot Learning with Implicit World Modeling. With a future latent alignment objective, FLARE significantly improves policy performance on multitask imitation learning & unlocks learning from egocentric human videos.
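The mechanism is an implicit world-modeling objective: the policy is trained to predict the latent features of future observations alongside its action loss. A minimal PyTorch sketch of one plausible form of such an objective (the function name, the frozen target encoder, and the cosine-alignment choice are my assumptions for illustration, not FLARE's actual implementation):

```python
import torch
import torch.nn.functional as F

def future_latent_alignment_loss(predicted_latents, future_obs, target_encoder):
    """Align the policy's predicted future latents with embeddings of the
    observation k steps ahead (illustrative sketch, not FLARE's code).

    predicted_latents: (B, D) latents the policy reserves for future prediction
    future_obs:        (B, C, H, W) the actual future observation
    target_encoder:    frozen vision encoder mapping observations to (B, D)
    """
    with torch.no_grad():                       # target features stay fixed
        target = target_encoder(future_obs)
    # Negative cosine similarity, as in BYOL/SimSiam-style alignment losses
    return -F.cosine_similarity(predicted_latents, target, dim=-1).mean()

# Hypothetical total objective: behavior cloning plus weighted alignment
# loss = bc_loss + lambda_align * future_latent_alignment_loss(z, obs_k, enc)
```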

Anima introduced me to the world of graphical models, generative models, and tensors. As her first student, I witnessed how she built her lab from scratch, shaping its direction with vision and energy. Though I’m sure it was a challenging process, she never put any burden or…
I will be at #CVPR2025 tomorrow and am thrilled to receive the IEEE Kiyo Tomiyasu Award. See you all there!
Great minds think alike! 👀🧠 We also found that more thinking ≠ better reasoning. In our recent paper (arxiv.org/abs/2506.04210), we show how output variance creates the illusion of improvement—when in fact, it can hurt precision. Naïve test-time scaling needs a rethink. 👇…
New Anthropic Research: “Inverse Scaling in Test-Time Compute” We found cases where longer reasoning leads to lower accuracy. Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns. 🧵
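A toy numpy simulation of the variance effect both threads above point at (entirely my own construction, not either paper's experiment): for a biased "reasoner" whose answers are noisy estimates of the truth, raising sampling variance inflates best-of-k success even while the typical answer gets worse.

```python
import numpy as np

rng = np.random.default_rng(0)
truth, bias, tol, n, k = 0.0, 1.0, 0.5, 200_000, 8

for sigma in [0.2, 1.0, 2.0]:
    # Each "answer" is a biased, noisy estimate of the truth; it counts
    # as correct if it lands within tol of the true value.
    draws = rng.normal(loc=truth + bias, scale=sigma, size=(n, k))
    correct = np.abs(draws - truth) < tol
    pass_at_k = correct.any(axis=1).mean()   # looks like improvement
    mae = np.abs(draws - truth).mean()       # typical-sample precision worsens
    print(f"sigma={sigma}: pass@{k}={pass_at_k:.2f}, mean abs error={mae:.2f}")
```

With these numbers, pass@8 climbs from roughly 0.05 to roughly 0.8 as sigma grows, while mean absolute error also grows: a best-of-k metric rewards variance that single-shot precision pays for.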
ICML is coming to an end — side adventure: a group photo with my undergrad advisees (plus a 16-year-old high-schooler tagalong claiming he only learns through ChatGPT), and a rabbit park visit with the best team.
Provably Learning from Language Feedback TLDR: RL theory can help us do better inference-time exploration with feedback. Work done with @wanqiao_xu, @ruijie_zheng12, @chinganc_rl, @adityamodi94, @adith387 📰 arxiv.org/pdf/2506.10341 📍EXAIT Best Paper/Oral Sat 8:45-9:30 am
If you missed @wanqiao_xu’s presentation, here are some of our slides! (The workshop will post full slides later on their website) Paper: arxiv.org/abs/2506.10341
Best paper award goes to the team @ruijie_zheng12 @adith387 @adityamodi94 and @chinganc_rl, and best photography award goes to @chinganc_rl 🥳
As many of you know, I’ve been spending my sabbatical at Capital One with Bayan Bruss @cbbruss’s amazing team, under Prem Natarajan and Milind Naphade. Bayan has been the most supportive and visionary leader: deeply attentive to the technical work across the group. Excited to…
There’s been heated debate lately: Can generative AI truly self-improve? ✅Some say yes, pointing to models learning like curious humans. ❌Others say no, invoking the first law of thermodynamics: You can’t get something from nothing. No new info, no gain. 🧠 But what if the…
Introducing MORSE-500 🌐 morse-500.github.io 500 scripted videos that stress-test six reasoning skills — beyond math, beyond static pics, built to get harder. Key Features: 🚀 Fresh & Portable 🎯 Diverse Categories 👁️ Pure Visual Cues 📈 Scalable Difficulty Dive in 🧵
Thanks for the recording! It was a great workshop!
You cannot really train all these models to cater to different preferences. Can you have one model that caters to all? @furongh unveils a technique to customize AI models on-the-fly to user goals, reducing the computational cost of tailoring AI systems to individual needs.
Meet Casper👻, a friendly robot sidekick who shadows your day, decodes your intents on the fly, and lends a hand while you stay in control! Instead of passively receiving commands, what if a robot actively senses what you need in the background and steps in when confident? (1/n)
Decision-making with LLMs can be studied with RL! Can an agent solve a task with text feedback (OS terminal, compiler, a person) efficiently? How can we understand the difficulty? We propose a new notion of learning complexity to study learning with language feedback only. 🧵👇
Training data collection is one of the biggest challenges in #Robotics. We're solving it with #NVIDIAIsaac GR00T-Dreams. By using #NVIDIACosmos world foundation models and generative AI, developers can create vast, realistic synthetic data at scale. Benefits: ✔️Cheaper &…
Check out GR00T N1.5, our latest GR00T robot foundation model. Compared with N1, N1.5 achieves much better cross-embodiment post-training performance and instruction-following capability with a frozen VLM and several architecture & learning-objective improvements.
#NVIDIAIsaac GR00T N1.5 is now accessible to #robotics developers working with a wide range of robot form factors, and available to download from @huggingface. 🎉 Dive into our step-by-step tutorial to learn how to easily post-train and adapt it to the LeRobot SO-101 arm, and…
#CVPR2025 Our latest work LEMON will appear at the CVPR CVinW spotlight talks tomorrow - a unified, scalable, efficient 3D LMM with an elegant architecture design. Everything about 🍋 will be open-sourced this month. 🔗: computer-vision-in-the-wild.github.io/cvpr-2025/
Excited to give a short oral presentation on FLARE at the Computer Vision in the Wild workshop tomorrow, starting at 2pm!
Excited to speak at the Workshop on Computer Vision in the Wild @CVPR 2025! 🎥🌍 🗓️ June 11 | 📍 Room 101 B, Music City Center, Nashville, TN 🎸 🧠 Talk: From Perception to Action: Building World Models for Generalist Agents Let’s connect if you're around! #CVPR2025 #robotics…
Really cool blog post from Physical Intelligence! Also very similar to our "Async Inference" open-sourced last week with SmolVLA arxiv.org/pdf/2506.01844 As always with research, if an idea comes up several times with different flavors, it means we are on the right track as a…
Real-time inference is a big challenge for VLAs. We’ve been working on a way to amortize inference delays in π0.5. Our new Real-Time Chunking (RTC) method speeds up π0.5 by allowing the robot to “think” while it’s moving, which makes it quite a bit faster! 🧵👇
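The generic "think while moving" pattern behind both SmolVLA's async inference and RTC can be sketched in a few lines (a minimal sketch under my own assumptions: the chunk size, control rate, and helper names are hypothetical, and RTC's handling of consistency between overlapping chunks is omitted entirely):

```python
import threading
import queue
import time

CTRL_DT = 0.02       # 50 Hz control loop (assumed)

next_chunk = queue.Queue(maxsize=1)

def planner(policy, get_obs):
    """Background thread: keep a fresh action chunk ready while the robot moves."""
    while True:
        chunk = policy(get_obs())   # slow VLA forward pass runs off the hot path
        next_chunk.put(chunk)       # blocks until the control loop takes it

def control_loop(send_action, policy, get_obs):
    threading.Thread(target=planner, args=(policy, get_obs), daemon=True).start()
    chunk, i = next_chunk.get(), 0
    while True:
        send_action(chunk[i])
        i += 1
        if i == len(chunk):         # swap in the chunk computed while we moved
            chunk, i = next_chunk.get(), 0
        time.sleep(CTRL_DT)         # a real controller would use a precise timer
```

Because the planner recomputes from the latest observation while the previous chunk executes, the robot never pauses to wait on the model; the open question RTC answers (and this sketch ignores) is how to keep consecutive chunks consistent with each other.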
Slow demo data generates fast policy execution! Check out our new work that speeds up robots with simple entropy guidance! Btw, I think slow demos will exist for a long time because of the lack of widely adopted tactile feedback in teleop :(
Nowadays robot manipulation is often slow, and people speed up their robot videos for presentation. But can we accelerate robot execution in the real world, just like speeding up the videos? Introducing 𝑫𝒆𝒎𝒐𝑺𝒑𝒆𝒆𝒅𝒖𝒑⚡, a self-supervised method to accelerate…
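One way to picture the entropy-guided acceleration idea (my simplified sketch, not DemoSpeedup's actual algorithm; the per-step entropy estimate is assumed to come from, e.g., a policy's predictive distribution over actions): retime a demonstration by keeping the information-dense, high-entropy steps and dropping the slow, predictable ones.

```python
import numpy as np

def speedup_demo(actions, entropy, keep_ratio=0.5):
    """Illustrative entropy-guided subsampling of a demonstration.

    actions: (T, A) demonstrated actions
    entropy: (T,)   per-step entropy estimate (assumed given)
    """
    T = len(actions)
    n_keep = max(1, int(T * keep_ratio))
    # Keep the highest-entropy steps, restored to temporal order
    keep = np.sort(np.argsort(entropy)[-n_keep:])
    return actions[keep]             # accelerated demonstration

# Example: a mostly-idle trajectory with a brief high-entropy contact phase
acts = np.random.randn(100, 7)
ent = np.concatenate([np.full(40, 0.1), np.full(20, 2.0), np.full(40, 0.1)])
fast = speedup_demo(acts, ent, keep_ratio=0.3)   # 100 steps -> 30 steps
print(fast.shape)
```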
In LLM land, a slow model is annoying. In robotics, a slow model can be disastrous! Visible pauses at best, dangerously jerky motions at worst. But large VLAs are slow by nature. What can we do about this? An in-depth 🧵:
🤖 VLAs are hot, but they don’t plan ahead. Meet FLARE 🔥: a simple yet powerful upgrade that injects implicit world modeling into VLA-based robot policies. By predicting future latent features, FLARE lets robots think ahead, boosting generalization & performance by up to 26%!…