Rowan Zellers
@rown
thinking multimodally @thinkymachines. previously Advanced Voice Mode @openai. website: http://rowanzellers.com (he/him)
Excited to introduce GPT-4o. Language, vision, and sound -- all together and all in real time. This thing has been so much fun to work on. It's been even more fun to play with -- with moments of magic where things feel totally fluid and I forget I'm video chatting with an AI.
Excited to share that I joined @thinkymachines recently! It’s been an incredible experience so far working alongside many talented folks here. We are building multimodal AI that collaborates with humans, as well as great research infra to accelerate AI and science!
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're…
We are moving incredibly fast. Come light up GPUs with us.
Yes - 🥳 Thinky starts hiring again: thinkingmachines.paperform.co
We have been working hard for the past 6 months on what I believe is the most ambitious multimodal AI program in the world. It is fantastic to see how pieces of a system that previously seemed intractable just fall into place. Feeling so lucky to create the future with this…
It’s really fun to work with a talented yet small team. Our mission is ambitious: multimodal AI that collaborates with humans, so the best is yet to come! Join us, or fill out the application below if interested!
If you’re excited to build the future of multimodal human/AI collaboration, and jam with Andrew, me, and many other talented people across the stack, DM me! 😀
life update: I joined @thinkymachines! feeling so lucky to build with such a kind, brilliant team, esp pairing with researchers early on as a designer. looking forward to sharing more soon.
🚀New from Meta FAIR: today we’re introducing Seamless Interaction, a research project dedicated to modeling interpersonal dynamics. The project features a family of audiovisual behavioral models, developed in collaboration with Meta’s Codec Avatars lab + Core AI lab, that…
TIL, the best (and perhaps only!!) way to speak to a human at Xfinity over phone is to say you're cancelling your service. everything else is an automated system ... that said, I learned this trick from o3, so I guess it's AI-versus-AI here
the Singapore MRT (subway) is so impressive. Many lines that go everywhere, high frequency of trains, plus it’s fully automated so it has smooth cross platform transfers. It’s safe and clean (no durians allowed). Open loop payments so you can pay by credit card…

With the rise of R1, search seems out of fashion? We prove the opposite! 😎 Introducing Retro-Search 🌈: an MCTS-inspired search algorithm that RETROspectively revises R1’s reasoning traces to synthesize untaken, new reasoning paths that are better 💡, yet shorter in length ⚡️.
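A rough sketch of the idea as I read it, in Python: revisit each step of an existing trace, sample alternative continuations from that point, and keep any branch that still reaches the correct answer in fewer steps. This is a simplified greedy revision loop, not the paper's actual MCTS-inspired algorithm; generate_continuation and is_correct are hypothetical stand-ins for a model call and an answer checker.

def retro_search(trace, question, answer, generate_continuation, is_correct, n_branches=4):
    # Start from the original R1 trace (a list of reasoning steps).
    best = trace
    for t in range(len(trace)):
        prefix = trace[:t]  # keep the steps before the revision point
        for _ in range(n_branches):
            # Sample an untaken continuation from the revision point onward.
            branch = generate_continuation(question, prefix)
            candidate = prefix + branch
            # Prefer revised traces that stay correct but are shorter.
            if is_correct(candidate, answer) and len(candidate) < len(best):
                best = candidate
    return best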
huge congrats to @bowenc0221 for showing image-manipulation works well for vision perception, and for carrying the project all the way through to the finish line!
"Thinking with Images" is what we have been cooking after GPT-4o launched last year and it marks a paradigm shift in how we view/solve perception problems in this new era of RL. It is such a pleasant and an honor to work with this amazing team to get it out!
openai.com/index/thinking… About two years ago, we started building V* to bring visual search into a multimodal LLM and show that it's a key part of how these models can understand the world. I still remember talking with my friend @bowenc0221 and @_alex_kirillov_ about why this…
🔍Introducing V*: exploring guided visual search in multimodal LLMs MLLMs like GPT4V & LLaVA are amazing, but one concern that keeps me up at night: the (frozen) visual encoder typically extracts global image tokens *only once*, regardless of resolution or scene complexity (1/n)
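To make the worry concrete, here's a toy Python loop for the guided-search alternative the thread builds toward: if one global encoding pass isn't enough to answer, iteratively pick a promising region, re-encode it at higher resolution, and append the new tokens. encode, propose_region, and can_answer are invented stand-ins, not the released V* interface.

def answer_with_visual_search(image, question, encode, propose_region, can_answer, max_steps=3):
    tokens = encode(image)  # the usual one-shot global encoding
    for _ in range(max_steps):
        if can_answer(tokens, question):
            break  # the global tokens already carry the needed detail
        # Ask the model where the missing detail might be, then look closer.
        box = propose_region(tokens, question)
        crop = image.crop(box)  # PIL-style crop of the candidate region
        tokens = tokens + encode(crop)  # append high-res tokens for that region
    return tokens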
Excited to share what I've been working on over the past few months! o3 and o4-mini are our first reasoning models with full tool support, including python, search, imagegen, etc. They also come with the best VISUAL reasoning performance to date!
Introducing OpenAI o3 and o4-mini—our smartest and most capable models to date. For the first time, our reasoning models can agentically use and combine every tool within ChatGPT, including web search, Python, image analysis, file interpretation, and image generation.
If you're going to #ICLR2025... come join me at the @thinkymachines happy hour! There will be food! (Space is limited and we can't guarantee everyone a spot, so please RSVP indicating interest)
Thinking Machines is hosting a happy hour in Singapore during #ICLR2025 on Friday, April 25: lu.ma/ecgmuhmx Come eat, drink, and learn more about us!
We release a large-scale study to answer the following:
- Is late fusion inherently better than early fusion for multimodal models?
- How do native multimodal models scale compared to LLMs?
- Can sparsity (MoEs) play a detrimental role in handling heterogeneous modalities? 🧵
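For readers new to the distinction in the first question: early fusion feeds raw patch embeddings and text tokens into one transformer from layer one, while late fusion runs a separate (often pretrained) vision tower and injects its features into the language model. A toy PyTorch sketch with invented module names, just to fix terminology:

import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    # One transformer consumes interleaved text and image tokens from the start.
    def __init__(self, transformer, text_embed, patch_embed):
        super().__init__()
        self.transformer = transformer
        self.text_embed, self.patch_embed = text_embed, patch_embed

    def forward(self, text_ids, image_patches):
        x = torch.cat([self.text_embed(text_ids), self.patch_embed(image_patches)], dim=1)
        return self.transformer(x)  # every layer sees both modalities

class LateFusion(nn.Module):
    # A separate vision tower runs first; its features are projected in afterwards.
    def __init__(self, transformer, text_embed, vision_encoder, projector):
        super().__init__()
        self.transformer = transformer
        self.text_embed = text_embed
        self.vision_encoder, self.projector = vision_encoder, projector

    def forward(self, text_ids, image):
        v = self.projector(self.vision_encoder(image))  # modality-specific encoding
        x = torch.cat([self.text_embed(text_ids), v], dim=1)
        return self.transformer(x)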
We created SuperBPE🚀, a *superword* tokenizer that includes tokens spanning multiple words. When pretraining at 8B scale, SuperBPE models consistently outperform the BPE baseline on 30 downstream tasks (+8% MMLU), while also being 27% more efficient at inference time.🧵
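A toy illustration of the superword idea, under my reading of the tweet: ordinary BPE pretokenizes on whitespace, so no merge can cross a word boundary; SuperBPE lifts that restriction in a later training stage, so frequent multi-word strings become single tokens (hence the inference savings). The vocabularies below are invented for illustration.

bpe_vocab = {"by", " the", " way"}                 # word-bounded pieces only
superbpe_vocab = bpe_vocab | {"by the way"}        # plus a superword token

def tokenize(text, vocab):
    # Greedy longest-match tokenizer, enough to show the effect.
    tokens, i = [], 0
    while i < len(text):
        piece = next(p for p in sorted(vocab, key=len, reverse=True)
                     if text.startswith(p, i))
        tokens.append(piece)
        i += len(piece)
    return tokens

print(tokenize("by the way", bpe_vocab))       # ['by', ' the', ' way']  -> 3 tokens
print(tokenize("by the way", superbpe_vocab))  # ['by the way']          -> 1 token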