Aditya Soni
@Aditya_Soni_8
MS Student @LTIatCMU Previously Bachelor's in Computer Science @IITKgp
Can we design AI Agents that achieve generalizability across diverse task domains? Our new paper introduces OpenHands-Versa, a generalist agent with strong performance on three challenging agent benchmarks, ranking #1 on SWE-Bench Multimodal and The Agent Company leaderboards 🚀

One less-known feature of OpenHands is that it allows you to spin up a frontend, and then have the agent test out the frontend to make sure that it works! You can see a video demo here: youtu.be/jMyTCXpEz10
@allhands_ai I love you guys. This is such an amazing product. I've never had an AI that managed to do VISUAL TESTING TOO!!!!!!! Cursor is only textual.
Proud and happy to see OpenAgentSafety coming out! Further pushing the frontier of interactional safety risks in human-AI agent collaboration. Kudos to @sanidhya903 and @Aditya_Soni_8 who led the projects!
1/ AI agents are increasingly being deployed for real-world tasks, but how safe are they in high-stakes settings? 🚨 NEW: OpenAgentSafety - A comprehensive framework for evaluating AI agent safety in realistic scenarios across eight critical risk categories. 🧵
Stop by the poster sessions today at ICML Workshop on Computer Use Agents to chat about OpenHands-Versa!
Excited about the results! OpenHands-Versa ranks #1 both in terms of accuracy and cost 🚀 The cost savings are primarily due to context condensation in OpenHands-Versa: it suffices to retain the most recent browsing observation instead of all previous browsing observations.
We just updated the leaderboard of TheAgentCompany, a benchmark of tasks like real-world work.
- In December 2024, 24% of the tasks could be solved
- In June 2025, 33% of the tasks could be solved
I'm interested to see when we'll be at 50%.
Coding Agents and Multimodal Browsing: Can AI agents generalize beyond their intended scope? Great paper on how you can build generalist agents with superior performance over specialized agents. What models and tools work the best? Here are my notes: