Pengxiang Li
@oliverlee1999
Research intern at http://bigai.ai. #ComputerVision, #MultimodalLearning, #NonEuclideanOptimization. Ph.D. student at BIGAI & BIT @BIT1940.
🔥Introducing SPORT, a multimodal agent that explores tool usage without human annotation. It leverages step-wise DPO to further enhance tool-use capabilities following SFT. SPORT achieves improvements on the GTA and GAIA benchmarks. sport-agents.github.io

Noetix N2 endures some serious abuse but keeps walking.
It's a pity that we haven't had enough time to maintain OpenManus over the past 3 months. The good news is that we will build a formal open-source community for OpenManus at the end of this month.
Looks good to me. Try to build a horror research game with MGX. Publish something or go die 😭
The MGX · AI Tools Challenge is live! Build a powerful AI app and aim for the top! Just publish your app to join. 🗓️ Deadline: June 17th, 6:00 PM PT 🎁 Top reward: $500 MGX Pro + usage credits Vote daily to earn too — no app needed! [Join now] ➡️ discord.com/invite/NMrp44a…
🗳️ Cast your vote for Yuguang “Michael” Fang for IEEE ComSoc Board of Governors (2025–2027)! With 26+ years of service, he's committed to mentorship, inclusion, and advancing cutting-edge research. Vote now 👉 eballot.app/ieee #IEEE #ComSoc #Leadership #VoteIEEE

Just attended a paper-sharing talk on this work. Spatial reasoning is still a tough challenge for current VLMs, but this paper takes a great step forward.
Elevate Visual-Spatial Intelligence with Spatial-MLLM! 🚀🚀🚀 Discover how we incorporate 3D information to help MLLMs better think in space in our work: Spatial-MLLM. 🔗Code: github.com/diankun-wu/Spa… 🌐Project Page: diankun-wu.github.io/Spatial-MLLM/ 📄Paper: arxiv.org/abs/2505.23747
🚀 DeepSeek-R1-0528 is here! 🔹 Improved benchmark performance 🔹 Enhanced front-end capabilities 🔹 Reduced hallucinations 🔹 Supports JSON output & function calling ✅ Try it now: chat.deepseek.com 🔌 No change to API usage — docs here: api-docs.deepseek.com/guides/reasoni… 🔗…
Security is a fundamental threshold that must be met before CUAs can truly reach the user market.
⁉️Can you really trust Computer-Use Agents (CUAs) to control your computer⁉️ Not yet, @AnthropicAI Opus 4 shows an alarming 48% Attack Success Rate against realistic internet injection❗️ Introducing RedTeamCUA: realistic, interactive, and controlled sandbox environments for…
Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.
We're thrilled to announce the launch of the "Computer Use Agent (CUA)" community on AlphaXiv! 🎉 This community is dedicated to academic discussions, engineering collaborations, and creative brainstorming in the CUA field. alphaxiv.org/invite/b32af68… #CUA #AlphaXiv #AIResearch

Please check out our Qwen3 Technical Report. 👇🏻 github.com/QwenLM/Qwen3/b…
⚠️ Attention: The site is currently down. Our engineering team is investigating. We will update as soon as possible. You can track progress here: status.overleaf.com Sorry for any inconvenience.
New Paper: Continuous Thought Machines 🧠 Neurons in brains use timing and synchronization in the way that they compute, but this is largely ignored in modern neural nets. We believe neural timing is key for the flexibility and adaptability of biological intelligence. We…
Excellent Agent training infra!
Celebrating the open-sourcing of VeOmni (github.com/ByteDance-Seed…), part of the training infra behind UI-TARS~ You can now optionally fine-tune UI-TARS with VeOmni🥳🥳