Graham Neubig
@gneubig
Associate professor @LTIatCMU. Co-founder/chief scientist @allhands_ai. I mostly work on modeling language.
One lesser-known feature of OpenHands is that it allows you to spin up a frontend, and then have the agent test out the frontend to make sure that it works! You can see a video demo here: youtu.be/jMyTCXpEz10
@allhands_ai I love you guys. This is such an amazing product. I've never had an AI that managed to do VISUAL TESTING TOO!!!!!!! Cursor is only textual.
Come work with us! More and more people are deploying OpenHands to their dev teams, and we'd love to have another great person join the team and help them be successful.
Do you love AI agents, open source, and helping to make development teams more productive? If so, All Hands AI has a position open for a forward deployed engineer! allhandsai.applytojob.com/apply/v5Dip8MJ… Please apply to join us if this sounds like a job you'd enjoy 🙌
6000 PRs! I knew a lot of people were using OpenHands but this honestly exceeded my expectations a bit. And we're just getting started, hoping to have some changes soon that'll make it even easier to develop with OpenHands and increase the count even more 👀
OpenHands has made 6000 PRs and has a merge rate of 88% on open-source projects: insights.logicstar.ai This is by far the most of any open-source agent, and comparable to Devin and Claude Agent.
These scores are... really good.
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
This was a fun webinar with tons of great questions from the audience! Hope it's helpful to people who want to know how to use coding agents effectively.
We've been using OpenHands agents to build OpenHands for the past year or so, and have learned a lot along the way! Check out our webinar video on "How we build OpenHands with OpenHands" where we share some of our tips and tricks: youtu.be/CLpwray59-k
Stop by the poster sessions today at ICML Workshop on Computer Use Agents to chat about OpenHands-Versa!
Can we design AI Agents that achieve generalizability across diverse task domains? Our new paper introduces OpenHands-Versa, a generalist agent with strong performance on three challenging agent benchmarks, ranking #1 on SWE-Bench Multimodal and The Agent Company leaderboards 🚀
I've been using agents to run ML experiments for a while now, and it's all fun and games until the agent decides it doesn't like your evaluation method and decides to change it to get higher scores 😅
I asked ChatGPT agent to train a machine learning model and asked it to improve the model! AI trains and improves AI — AGI is coming! @OpenAI
In a few minutes, we're starting a webinar on "How we build OpenHands with OpenHands!" featuring tips and tricks from @gneubig, see you there! lu.ma/9sffwppt?tk=ZK…
What are the differences in developer productivity and satisfaction when using (1) coding assistance through autocomplete and (2) autonomous coding agents? @valeriechen_ did the first controlled academic study answering this question, check out the results!
Excited to be hanging out today at @WiMLworkshop 👩🏻‍💻 Come say hi during the poster session 🕝 2:45–3:30pm 📍 West Meeting Room 211–214 Let’s chat about how coding agents are changing developer workflows! 🤖💻🔧✨
1/ AI agents are increasingly being deployed for real-world tasks, but how safe are they in high-stakes settings? 🚨 NEW: OpenAgentSafety - A comprehensive framework for evaluating AI agent safety in realistic scenarios across eight critical risk categories. 🧵
1/6 Retrieval is supposed to improve generation in RAG systems. But in practice, adding more documents can hurt performance, even when relevant ones are retrieved. We introduce RAGGED, a framework to measure and diagnose when retrieval helps and when it hurts.
Some updates 🚨 I finished my Ph.D at @uwcse in June 2025! After a year at AI2 as a Research Scientist, I am joining CMU @LTIatCMU & @mldcmu (courtesy) as an Assistant Professor in Fall 2026. The journey, acknowledgments & recruiting in 🧵
Heading to Vancouver for ICML ✈️🇨🇦 Let’s chat about coding agents, evals, and human-AI collab. I’ll also be on the job market this upcoming cycle, looking for TT faculty roles + post-docs. Here's where you'll be able to find me this week 👇
A little-known fact is that OpenHands is relatively good at terminal use compared to most other agents. This is because it uses tmux, allowing it to deal with interactive commands and use ctrl-c, ctrl-z, etc. Nice to see that it shows up in benchmark scores too!
OpenHands is live on TerminalBench and gets 41.3% with claude-4-sonnet, 6 points better than Claude Code! If you want to use an agent that can use the terminal, in your terminal -- try out the OpenHands CLI.
TL;DR: When you add a system prompt asking the model to act "based", it might act based.
Update on where @grok has been & what happened on July 8th. First off, we deeply apologize for the horrific behavior that many experienced. Our intent for @grok is to provide helpful and truthful responses to users. After careful investigation, we discovered the root cause…
Introducing Devstral Small and Medium 2507! This latest update offers improved performance and cost efficiency, perfectly suited for coding agents and software engineering tasks.
OpenHands hit a new round number on GitHub, 60k⭐️ Thanks to everyone for the support, and belief that the future of coding should be free and open source 😃 It's amazing to see us together with other OSS greats such as @Meta llama, @OpenInterpreter, @scikit_learn, and Keras!
People are racing to push math reasoning performance in #LLMs—but have we really asked why? The common assumption is that improving math reasoning should transfer to broader capabilities in other domains. But is that actually true? In our study (arxiv.org/pdf/2507.00432), we…
Imagine coding agents finishing your requests and sending a pull request in 30 seconds 🤯 Check out this new video of OpenHands + DevStral + @Snowflake’s new inference method ArcticInference. It speeds up coding agents by as much as 2x over vLLM (which is already fast).
This is amazing! Recently there is much talk of AI for science, and this project seems to do a lot to lower the barrier of entry to participate in the hugely important problem of securing our energy future.
We are so excited to announce a new open-source challenge in collaboration with @proximafusion: unlocking fusion with AI. If you haven't followed, fusion is how the sun makes energy and is, in the long term, our best bet on clean, safe, and virtually limitless energy. In the…