Michael Ryoo
@ryoo_michael
Professor at Stony Brook University / Research Scientist at Salesforce AI Research
Salesforce presents xGen-MM (BLIP-3): A Family of Open Large Multimodal Models. discuss: huggingface.co/papers/2408.08… This report introduces xGen-MM (also known as BLIP-3), a framework for developing Large Multimodal Models (LMMs). The framework comprises meticulously curated…
What we end up having at CoRL 2025 will depend on the result.
#CoRL2025 poll: If there is a K-Pop performance by a Korean idol group at the banquet, would you enjoy it?
#CoRL2025 Hey Robot Learning Community! CoRL 2025 will be held in Seoul, Korea, Sep 27 - 30. Submission deadline: Apr 30 AoE. Two weeks to go! Information: corl.org We look forward to receiving your great work on robot learning!
LLaRA will appear at #ICLR2025!! It efficiently transforms a VLM into a robot VLA. For more details: github.com/LostXine/LLaRA
(1/5) Excited to present our #ICLR2025 paper, LLaRA, at NYC CV Day! LLaRA efficiently transforms a pretrained Vision-Language Model (VLM) into a robot Vision-Language-Action (VLA) policy, even with a limited amount of training data. More details are in the thread. ⬇️
🚨🎥🚨🎥🚨 xGen-MM-Vid (BLIP-3-Video) is now available on @huggingface! Our compact VLM achieves SOTA performance with just 32 tokens for video understanding. Features explicit temporal encoder + BLIP-3 architecture. Try it out! 🤗32 Token Model: bit.ly/3PBNBBz 🤗128…
BLIP-3-Video is out!
📢📢📢Introducing xGen-MM-Vid (BLIP-3-Video)! This highly efficient multimodal language model is laser-focused on video understanding. Compared to other models, xGen-MM-Vid represents a video with a fraction of the visual tokens (e.g., 32 vs. 4608 tokens). Paper:…
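To make the token-budget claim concrete, here is a minimal sketch of one way such compression can work: a small set of learnable query tokens cross-attends over all per-frame visual tokens and summarizes them into a fixed budget of 32 video tokens. This is an assumed, generic attention-pooling design for illustration, not the actual xGen-MM-Vid temporal encoder.

```python
# Minimal sketch (assumed design, not the xGen-MM-Vid code): compress
# per-frame visual tokens into a fixed set of 32 video tokens via
# cross-attention with learnable queries.
import torch
import torch.nn as nn

class TemporalTokenPooler(nn.Module):
    def __init__(self, dim=768, num_video_tokens=32, num_heads=8):
        super().__init__()
        # Learnable query tokens that will become the compact video representation.
        self.queries = nn.Parameter(torch.randn(num_video_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, frame_tokens):
        # frame_tokens: (batch, frames, tokens_per_frame, dim)
        b, t, n, d = frame_tokens.shape
        flat = frame_tokens.reshape(b, t * n, d)          # pool jointly over time and space
        q = self.queries.unsqueeze(0).expand(b, -1, -1)   # (batch, 32, dim)
        video_tokens, _ = self.attn(q, flat, flat)
        return video_tokens                               # (batch, 32, dim)

pooler = TemporalTokenPooler()
x = torch.randn(2, 8, 128, 768)   # 8 frames with 128 visual tokens each (hypothetical sizes)
print(pooler(x).shape)            # torch.Size([2, 32, 768])
```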
Introducing LLaRA!!! github.com/LostXine/LLaRA It's a new robot action model, dataset, and framework based on LLMs/VLMs. It's open-source and trainable at an academic scale (7B, LLaVA-based), so you can fine-tune it for your robotics task!
🚀 Excited to share our latest project: LLaRA - Supercharging Robot Learning Data for Vision-Language Policy! 🤖✨ We create a framework to turn robot expert trajectories into conversation-style data and other auxiliary data for instruction tuning. More details to come! (1/N)
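A minimal sketch of the idea described above: one step of an expert trajectory (instruction, image observation, action) becomes a conversation-style record suitable for VLM instruction tuning. The field names and the action-to-text encoding here are illustrative assumptions, not the actual LLaRA data format.

```python
# Sketch only: convert one robot trajectory step into a chat-style
# instruction-tuning sample. Keys and action encoding are hypothetical.
def trajectory_step_to_conversation(instruction, image_path, action):
    """Turn one (instruction, observation, action) step into a conversation record."""
    # Serialize the continuous action so the VLM can learn to predict it as text.
    action_text = " ".join(f"{v:.3f}" for v in action)
    return {
        "image": image_path,
        "conversations": [
            {"from": "human",
             "value": f"<image>\nWhat action should the robot take to {instruction}?"},
            {"from": "gpt", "value": action_text},
        ],
    }

sample = trajectory_step_to_conversation(
    instruction="pick up the red block",
    image_path="episode_0001/step_0042.png",
    action=[0.12, -0.05, 0.30, 0.0, 0.0, 1.57, 1.0],  # xyz, rpy, gripper (hypothetical)
)
print(sample["conversations"][1]["value"])
```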
Today, we announced 𝗥𝗧-𝟮: a first-of-its-kind vision-language-action model to control robots. 🤖 It learns from both web and robotics data and translates this knowledge into generalised instructions. Find out more: dpmd.ai/introducing-rt2
PaLM-E or GPT-4 can speak in many languages and understand images. What if they could speak robot actions? Introducing RT-2 (robotics-transformer2.github.io), our new model that uses a VLM backbone (up to 55B params) and fine-tunes it to directly output robot actions!
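A minimal sketch of the action-as-text idea: continuous robot actions are discretized into bins and written out as ordinary tokens, which a fine-tuned VLM can emit and which are then decoded back into continuous commands. The bin count and normalized action range below are assumptions for illustration, not the RT-2 settings.

```python
# Sketch only: discretize continuous actions into bin-index tokens and back.
# Bin count and action range are illustrative assumptions.
import numpy as np

NUM_BINS = 256
LOW, HIGH = -1.0, 1.0  # assumed normalized action range

def encode_action(action):
    """Map each continuous action dimension to an integer bin, rendered as text."""
    scaled = (np.asarray(action) - LOW) / (HIGH - LOW) * (NUM_BINS - 1)
    bins = np.clip(scaled.round(), 0, NUM_BINS - 1).astype(int)
    return " ".join(str(b) for b in bins)

def decode_action(token_string):
    """Invert the encoding: a string of bin indices -> continuous action vector."""
    bins = np.array([int(tok) for tok in token_string.split()])
    return LOW + (bins / (NUM_BINS - 1)) * (HIGH - LOW)

text = encode_action([0.12, -0.50, 0.99, 0.0, 0.0, -1.0, 1.0])
print(text)                 # bin indices emitted as ordinary tokens
print(decode_action(text))  # approximately the original action
```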
"Diffusion Illusions: Hiding Images in Plain Sight" received #CVPR2023 Outstanding Demo Award. diffusionillusions.com Congratulations @RyanBurgert @kahnchana @XiangLi54505720!
Looking forward to showcasing one of the first foundation models for robotics at #RSS2023 next week! Presenting "RT-1: Robotics Transformer for Real-world Control at Scale" from the Google DeepMind robotics team. Website: robotics-transformer.github.io Session: Tuesday 7/12, 3PM-5PM
Introducing Crossway Diffusion, a diffusion-based visuomotor policy that takes advantage of self-supervised learning (SSL). In short: we add state decoders to reconstruct states while training the diffusion policy, and it works better. More at: arxiv.org/abs/2307.01849
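A minimal sketch of the training objective described above, assuming a standard diffusion-policy loop: an auxiliary state decoder reconstructs the observed state from an intermediate feature of the policy, and its reconstruction loss is added to the usual denoising loss. The module interfaces, shapes, and loss weight are illustrative, not the paper's code.

```python
# Sketch only: denoising loss + auxiliary state-reconstruction loss.
# The toy policy, shapes, and loss weight are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDiffusionPolicy(nn.Module):
    """Stand-in denoiser: predicts the noise on an action and exposes an intermediate feature."""
    def __init__(self, action_dim=7, state_dim=32, feat_dim=256):
        super().__init__()
        self.encoder = nn.Linear(action_dim + state_dim + 1, feat_dim)
        self.head = nn.Linear(feat_dim, action_dim)

    def forward(self, noisy_action, state, t):
        x = torch.cat([noisy_action, state, t.float().unsqueeze(-1)], dim=-1)
        feat = torch.relu(self.encoder(x))
        return self.head(feat), feat  # (predicted noise, intermediate feature)

class StateDecoder(nn.Module):
    """Auxiliary decoder: reconstructs the state from the intermediate feature."""
    def __init__(self, feat_dim=256, state_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, state_dim))

    def forward(self, feat):
        return self.net(feat)

def training_loss(policy, state_decoder, state, noisy_action, noise, t, recon_weight=0.1):
    pred_noise, feat = policy(noisy_action, state, t)
    diffusion_loss = F.mse_loss(pred_noise, noise)        # standard denoising objective
    recon_loss = F.mse_loss(state_decoder(feat), state)   # auxiliary SSL reconstruction
    return diffusion_loss + recon_weight * recon_loss

policy, decoder = ToyDiffusionPolicy(), StateDecoder()
state = torch.randn(4, 32)
noise = torch.randn(4, 7)
noisy_action = torch.randn(4, 7)          # action corrupted at diffusion step t
t = torch.randint(0, 100, (4,))
print(training_loss(policy, decoder, state, noisy_action, noise, t))
```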