Bowen Wang

@BowenWangNLP

1st year Ph.D. student @XLangNLP @HKUniversity focusing on #NLP. Prev. @Tsinghua_Uni, passionate about computer-use agents.

Hong Kong

Joined July 2023

293 Following

438 Followers

Pinned

Bowen Wang@BowenWangNLP · Apr 8

🎮 Computer Use Agent Arena is LIVE! 🚀 🔥 Easiest way to test computer-use agents in the wild without any setup 🌟 Compare top VLMs: OpenAI Operator, Claude 3.7, Gemini 2.5 Pro, Qwen 2.5 VL and more 🕹️ Test agents on 100+ real apps & websites with one-click config 🔒 Safe & free…

105

335

210

88.0K

Pinned

Bowen Wang@BowenWangNLP · Apr 24

Big congrats to @TsingYoga and their team for pushing the boundaries of CUAs! When developing, UI-TARS-1.5 truly feels like the beginning of a new chapter — the next episode is coming. Stay tuned for the leaderboard🚀!

XLANG NLP Lab@XLangNLP · Apr 24

🎉 UI-TARS-1.5 is now live on Computer Agent Arena! Currently the SOTA model across multiple GUI benchmarks, showcasing leading performance in computer use, browser use, and even gameplay. Want to try the most intelligent CUA so far? Go to arena.xlang.ai.

613

Bowen Wang Retweeted

Kimi.ai@Kimi_Moonshot · Jun 20

Meet Kimi-Researcher - an autonomous agent that excels at multi-turn search and reasoning. Powered by k 1.5 and trained with end-to-end agentic RL. Achieved 26.9% pass@1 on Humanity's Last Exam, 69% pass@1 on xbench. 🔗 Tech blog: moonshotai.github.io/Kimi-Researche…

232

1.0K

655

233.0K

Bowen Wang Retweeted

XLANG NLP Lab@XLangNLP · Jun 17

🔥 New Computer Agent Arena Leaderboard Updates (2k+ user votes)! 🤔 Which VLMs act better as computer use agents (CUAs)?
1. Claude Sonnet 4 🥇
2. Claude 3.7 Sonnet 🥈
3. UI-TARS-1.5 🥉
4. Operator
More insights in the thread 👇 arena.xlang.ai

20.0K

Bowen Wang Retweeted

OpenAI@OpenAI · May 23

Operator 🤝 OpenAI o3 Operator in ChatGPT has been updated with our latest reasoning model. operator.chatgpt.com

294

592

6.0K

1.0K

1.4M

Bowen Wang@BowenWangNLP · May 23

Based on my own testing, Claude 4 is even stronger at CUA tasks than Claude 3.7 Sonnet, with enhanced agentic capabilities. Come give it a try!

XLANG NLP Lab@XLangNLP · May 23

💠 Claude Opus 4 & Claude Sonnet 4: Welcome to the Computer Agent Arena 🔥 Congratulations to the @AnthropicAI team on the great release!

424

Bowen Wang@BowenWangNLP · May 2

🤔 Static CUA benchmarks enable fast model dev but lack task variety and risk overfitting. Computer Agent Arena tests crowdsourced real-world tasks.
OSWorld: 🥇 UI-TARS-1.5 🥈 Operator 🥉 Claude 3.7
CUA Arena: 🥇 Claude 3.7 🥈 Operator 🥉 UI-TARS-1.5
🚀 Rankings likely to evolve quickly

XLANG NLP Lab@XLangNLP · May 2

🏆 Leaderboard Update! 🚀 Claude 3.7 Sonnet from @AnthropicAI ties #1 in Computer Agent Arena, followed by Operator from @OpenAI & UI-TARS-1.5 from @BytedanceTalk, which is significantly different from prior benchmarks! Check the full rankings! 👉 arena.xlang.ai/leaderboard

7.0K

Bowen Wang@BowenWangNLP · May 2

😀 Our initial leaderboard is finally out. Here are a few interesting findings from our case study: 1, Claude 3.7 Sonnet consistently performs best across diverse task types, particularly excelling at open-ended queries like “write a paper reading report.” 2,…

XLANG NLP Lab@XLangNLP · May 2

2.0K

Bowen Wang Retweeted

Cua @ ICML 25@trycua · Apr 28

Part 2 of Build Your Own Operator on macOS is now live! The new cua-agent framework cuts down complexity and accelerates CUA development - so you can focus on building, not boilerplate.

124

121

20.0K

Bowen Wang@BowenWangNLP · Apr 24

For folks working on CUAs, definitely give o3 and o4-mini from @OpenAI a try. Key takeaway: enhancing the image reasoning and tool-use abilities of foundation models could significantly boost CUA performance.

XLANG NLP Lab@XLangNLP · Apr 24

🚀 Exciting news! @OpenAI's o3 & o4-mini, the most capable reasoning models, are now live on Computer Agent Arena! Test, vote, and explore their full potential with CUAs at arena.xlang.ai! Join the community and dive in!

292

Bowen Wang Retweeted

Yujia Qin@TsingYoga · Apr 17

UI-TARS-1-5

5.0K

Bowen Wang Retweeted

Kimi.ai@Kimi_Moonshot · Apr 9

🚀 Meet Kimi-VL and Kimi-VL-Thinking! 🌟 Our latest open source lightweight yet powerful Vision-Language Model with reasoning capability. ✨ Key Highlights: 💡 An MoE VLM and an MoE Reasoning VLM with only ~3B activated parameters 🧠 Strong multimodal reasoning (36.8% on…

213

1.0K

477

117.0K

Bowen Wang@BowenWangNLP · Apr 8

I want to highlight that this was an incredibly complex piece of work put together by @BowenWangNLP. We have been working on this for more than a year - much longer than a typical 3-5 month AI sprint. Big thanks to @taoyds for leading such an impactful project.

Bowen Wang@BowenWangNLP · Apr 8

3.0K