Roei Herzig
@roeiherzig
Researcher @IBMResearch. Postdoc @berkeley_ai. PhD @TelAvivUni. Working on Compositionality, Multimodal Foundation Models, and Structured Physical Intelligence.
What happens when vision🤝 robotics meet? Happy to share our new work on Pretraining Robotic Foundation Models!🔥 ARM4R is an Autoregressive Robotic Model that leverages low-level 4D Representations learned from human video data to yield a better robotic model. @berkeley_ai😊
Yes!🥳
It was nice engaging with the CV community on ways to stand out from the crowd. My answer was simple: work on robotics. There are so many unanswered problems and open pastures for research if you are a new researcher. Below are 6 problems I focused on in my talk.
Thanks @IlirAliu_ for highlighting our work!🙌 🌐 Project page: arm4r.github.io 🔗 Code: github.com/Dantong88/arm4r More exciting projects on the way—stay tuned!🤖
Robots usually need tons of labeled data to learn precise actions. What if they could learn control skills directly from human videos… no labels needed? Robotics pretraining just took a BIG jump forward. A new Autoregressive Robotic Model learns low-level 4D representations…
🚀 Our code for ARM4R is now released! Check it out here 👉 github.com/Dantong88/arm4r
🚀 Excited that our ARM4R paper will be presented next week at #ICML2025! If you’re into 4D and particle-based representations for robotics, don’t miss it! 🤖✨ I won’t be there in person, but make sure to stop by and chat with Yuvan! 🙌
Love the core message here! Predictions ≠ World Models. Predictions are task-specific, but world models can generalize across many tasks.
Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formalizes these questions. One result tells the story: A transformer trained on 10M solar systems nails planetary orbits. But it botches gravitational laws 🧵
Last chance for #ICCV2025 🌴🌺 Submit your best work to the MMFM @ ICCV workshop on all things multimodal: vision, language, audio, and more. 🗓️ Deadline: July 1 🔗 openreview.net/group?id=thecv…
🚨 Rough luck with your #ICCV2025 submission? We’re organizing the 4th Workshop on What’s Next in Multimodal Foundation Models at @ICCVConference in Honolulu 🌺🌴 Send us your work on vision, language, audio & more! 🗓️ Deadline: July 1, 2025 🔗 sites.google.com/view/mmfm4thwo…
Re "vision researchers move to robotics"-they’re just returning to the field’s roots. Computer vision began as "robotic vision", focused on agents perceiving & interacting within the world. The shift to "internet vision" came later with the rise of online 2D data. Org CV book👇
“Why are so many vision / learning researchers moving to robotics?” Keynote from @trevordarrell #RSS2025
Overall, I think the move from CMT to OpenReview was a great decision. Now if only we could improve the paper-reviewer matching system!
🚀 Excited to share that our latest work on Sparse Attention Vectors (SAVs) has been accepted to @ICCVConference — see you all in Hawaii! 🌸🌴 🎉 SAVs is a finetuning-free method leveraging sparse attention heads in LMMs as powerful representations for VL classification tasks.
🎯 Introducing Sparse Attention Vectors (SAVs): A breakthrough method for extracting powerful multimodal features from Large Multimodal Models (LMMs). SAVs enable SOTA performance on discriminative vision-language tasks (classification, safety alignment, etc.)! Links in replies!…
Honored to be named an 𝐎𝐮𝐭𝐬𝐭𝐚𝐧𝐝𝐢𝐧𝐠 𝐑𝐞𝐯𝐢𝐞𝐰𝐞𝐫 for 𝐂𝐕𝐏𝐑 𝟐𝟎𝟐𝟓 !🎉 Grateful to contribute to the community and support the high standards of the conference. Maybe it's time to start thinking about AC-ing? 🙃 #CVPR2025 @CVPR
Got a big, bold question? Let me know! Open to your questions—ambitious ones especially! 🤖💬
🚨 Our panel kicks off at 11:30 AM in Room 207 A–D (Level 2)! Don't miss an amazing discussion with: Ludwig Schmidt, Andrew Owens, Arsha Nagrani, and Ani Kembhavi 🔥
Come hear @NagraniArsha speak tomorrow at the 3rd Workshop on “What is Next in Multimodal Foundation Models?” 🗓️ 9:05 AM — Talk 🗓️ 11:30 AM — Panel 📍 Room 207 A–D (Level 2) Don’t miss it! #CVPR2025 @CVPR
After a one year conference hiatus, it’s nice to be back at @CVPR! Come say hi if you are around. I’ll be speaking at the MMFM3 workshop at 9am and the EgoVis workshop at 4pm tomorrow (Thursday).
🎉 Excited to speak at the Agents in Interaction workshop at #CVPR2025 — featuring an incredible lineup of speakers! Come hear about our latest work on 𝑺𝒕𝒓𝒖𝒄𝒕𝒖𝒓𝒆𝒅 𝑷𝒉𝒚𝒔𝒊𝒄𝒂𝒍 𝑰𝒏𝒕𝒆𝒍𝒍𝒊𝒈𝒆𝒏𝒄𝒆 🗓 Thursday, 2:30 PM 📍 Room 213 Don’t miss it!
Join us for our workshop: Agents in Interaction, from Humans to Robots, on June 12th at 9:25 am, Room 213! We have an exciting lineup of speakers from both robotics and digital humans. Please come! @CVPR More info: agents-in-interactions.github.io
@CVPR is around the corner!! Join us at the Workshop on T4V at #CVPR2025 with a great speaker lineup (@MikeShou1, @jw2yang4ai, @WenhuChen, @roeiherzig, Yuheng Li, Kristen Grauman) covering diverse topics! Website: sites.google.com/view/t4v-cvpr2… #CVPR #Transformer #Vision #T4V2025 #T4V
Submit your paper to our Multimodal Foundation Models (MMFM) Workshop at ICCV in Honolulu, Hawaii
🚀 Call for Papers! 🚀 Excited to help organize the 4th Workshop on What is Next in Multimodal Foundation Models? at ICCV in Honolulu, Hawai'i 🌺 Submit work on vision, language, audio & more! 🗓️ Deadline: July 1, 2025 🔗 sites.google.com/view/mmfm4thwo… #MMFM4 #ICCV2025 #AI #multimodal