Yujia Qin
@TsingYoga
ByteDancer, Agent, THU (16-20 BS in EE, 20-24 PhD in CS)
Introducing UI-TARS-1.5, a vision-language model that beats OpenAI Operator and Claude 3.7 on GUI Agent and Game Agent tasks. We've open-sourced a small-sized version of the model for research purposes; more details are in our blog. TARS learns solely from a screen, but…
China's Gaokao is the biggest exam in the world: 13M test takers and 9 hours. Only ~0.02% make it to the top university, Tsinghua. As of this week, AI models can make it too. 625/750 is the top 1 percentile; the highest human scores are ~720-740. Gemini 2.5 Pro gets 655, barely making the cut for Tsinghua!
Meet Agent TARS Beta, based on Seed1.5-VL
Since we have released a brand new Agent TARS CLI based on Seed1.5-VL (see agent-tars.com/beta), we have to say goodbye to the old Agent TARS Desktop: github.com/bytedance/UI-T…
Introducing Agent TARS Beta — a brand new and more powerful Agent TARS! - Agent TARS CLI - Browser Agent driven by Seed-1.5-VL - Native Streaming - Multimodal-friendly Web UI - Layered Agent architecture Blog: agent-tars.com/beta Quick start: agent-tars.com/quick-start
🚀 UI-TARS Desktop v0.2.1 is now live! Free Remote Computer & Browser Operator are ready to roll—no setup, just click and go🎁! Get started: 🔽Download: github.com/bytedance/UI-T… 🔽Quick Start: github.com/bytedance/UI-T…
One way of thinking about what AI will automate first is via the “description-execution gap”: how much harder is it to describe the task than to actually do it? Tasks with large description-execution gaps will be ripe for automation because it’s easy to create training data and…
Had a great time at this CVPR community-building workshop: lots of fun discussions and some really important insights for early-career researchers. I also gave a talk on "Research as an Infinite Game." Here are the slides: canva.com/design/DAGp0iR…
In this #CVPR2025 edition of our community-building workshop series, we focus on supporting the growth of early-career researchers. Join us tomorrow (Jun 11) at 12:45 PM in Room 209 Schedule: sites.google.com/view/standoutc… We have an exciting lineup of invited talks and candid…
I’d argue that computer use, in principle, is much harder than math/coding for current AI. The digital world encompasses a much larger part of the complexity of this world. The goals are often vastly underspecified and require accessing and understanding broad context (in users’…
I guess it's the first open-source multi-turn end-to-end RL for GUI agents from academia, and it's based on UI-TARS-1.5-7B. If you want to study multimodal agent RL, it's a good starting point~ arxiv.org/abs/2505.16282

Interesting to learn that the previous Operator was based on 4o, not even o1... OpenAI is shifting from reasoning models (o3) to agent models (Operator, Codex, and Deep Research), with gradual integration of agent data streams from multiple teams; this is evident in GAIA’s jump from 12.3 to 62.2…
Operator 🤝 OpenAI o3 Operator in ChatGPT has been updated with our latest reasoning model. operator.chatgpt.com
OK, ByteDance Seed is now firmly a top-tier lab in my mind. Congrats on many solid works recently, continuously publishing and even releasing models. A shame that I'm really, really bad at remembering individual Chinese names though :-/