Graham Neubig

@gneubig

Associate professor @LTIatCMU. Co-founder/chief scientist @allhands_ai. I mostly work on modeling language.

Pittsburgh, PA

Joined September 2010

691 Following

39K Followers

Pinned

One lesser-known feature of OpenHands is that it lets you spin up a frontend and then have the agent test the frontend to make sure it works! You can see a video demo here: youtu.be/jMyTCXpEz10

Hengbin Fang@HengbinF10584 · Jul 20

@allhands_ai I love you guys. This is such an amazing product. I've never had an AI that managed to do VISUAL TESTING TOO!!!!!!! Cursor is only textual.

Graham Neubig@gneubig · Jul 16

Come work with us! More and more people are deploying OpenHands to their dev teams, and we'd love to have another great person join the team and help them be successful.

All Hands AI@allhands_ai · Jul 16

Do you love AI agents, open source, and helping to make development teams more productive? If so, All Hands AI has a position open for a forward deployed engineer! allhandsai.applytojob.com/apply/v5Dip8MJ… Please apply to join us if this sounds like a job you'd enjoy 🙌

Graham Neubig@gneubig · Jul 10

6000 PRs! I knew a lot of people were using OpenHands, but this honestly exceeded my expectations a bit. And we're just getting started, hoping to have some changes soon that'll make it even easier to develop with OpenHands and increase the count even more 👀

All Hands AI@allhands_ai · Jul 10

OpenHands has made 6000 PRs and has a merge rate of 88% on open-source projects: insights.logicstar.ai This is by far the most of any open source agent, and comparable to Devin and Claude Agent.

Graham Neubig@gneubig · Jul 22

These scores are... really good.

Qwen@Alibaba_Qwen · Jul 22

>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…

Graham Neubig@gneubig · Jul 21

This was a fun webinar with tons of great questions from the audience! Hope it's helpful to people who want to know how to use coding agents effectively.

All Hands AI@allhands_ai · Jul 21

We've been using OpenHands agents to build OpenHands for the past year or so, and have learned a lot along the way! Check out our webinar video on "How we build OpenHands with OpenHands" where we share some of our tips and tricks: youtu.be/CLpwray59-k

Graham Neubig@gneubig · Jul 19

Stop by the poster sessions today at ICML Workshop on Computer Use Agents to chat about OpenHands-Versa!

Aditya Soni@Aditya_Soni_8 · Jun 4

Can we design AI Agents that achieve generalizability across diverse task domains? Our new paper introduces OpenHands-Versa, a generalist agent with strong performance on three challenging agent benchmarks, ranking #1 on SWE-Bench Multimodal and The Agent Company leaderboards 🚀

Graham Neubig@gneubig · Jul 18

I've been using agents to run ML experiments for a while now, and it's all fun and games until the agent decides it doesn't like your evaluation method and changes it to get higher scores 😅

Haoming Jiang@jiang_haoming · Jul 18

I asked ChatGPT agent to train a machine learning model and asked it to improve the model! AI trains and improves AI — AGI is coming! @OpenAI

Graham Neubig Retweeted

All Hands AI@allhands_ai · Jul 17

In a few minutes, we're starting a webinar on "How we build OpenHands with OpenHands!" featuring tips and tricks from @gneubig, see you there! lu.ma/9sffwppt?tk=ZK…

Graham Neubig@gneubig · Jul 16

What are the differences between developer productivity and satisfaction when using:
- coding assistance through autocomplete
- autonomous coding agents

@valeriechen_ did the first controlled academic study answering this question, check out the results!

Valerie Chen@valeriechen_ · Jul 16

Excited to be hanging out today at @WiMLworkshop 👩🏻‍💻 Come say hi during the poster session 🕝 2:45–3:30pm 📍 West Meeting Room 211–214 Let’s chat about how coding agents are changing developer workflows! 🤖💻🔧✨

Graham Neubig Retweeted

Sanidhya Vijayvargiya@sanidhya903 · Jul 15

1/ AI agents are increasingly being deployed for real-world tasks, but how safe are they in high-stakes settings? 🚨 NEW: OpenAgentSafety - A comprehensive framework for evaluating AI agent safety in realistic scenarios across eight critical risk categories. 🧵

Graham Neubig Retweeted

Jennifer Hsia@jen_hsia · Jul 16

1/6 Retrieval is supposed to improve generation in RAG systems. But in practice, adding more documents can hurt performance, even when relevant ones are retrieved. We introduce RAGGED, a framework to measure and diagnose when retrieval helps and when it hurts.

Graham Neubig Retweeted

Akari Asai@AkariAsai · Jul 15

Some updates 🚨 I finished my Ph.D at @uwcse in June 2025! After a year at AI2 as a Research Scientist, I am joining CMU @LTIatCMU & @mldcmu (courtesy) as an Assistant Professor in Fall 2026. The journey, acknowledgments & recruiting in 🧵

Graham Neubig Retweeted

Valerie Chen@valeriechen_ · Jul 14

Heading to Vancouver for ICML✈️🇨🇦Let’s chat about coding agents, evals, and human-AI collab. I’ll also be on the job market this upcoming cycle, looking for TT faculty roles + post-docs. Here's where you'll be able to find me this week👇

Graham Neubig@gneubig · Jul 14

A little-known fact is that OpenHands is relatively good at terminal use compared to most other agents. This is because it uses tmux, which lets it handle interactive commands and send ctrl-c, ctrl-z, etc. Nice to see that it shows up in benchmark scores too!

All Hands AI@allhands_ai · Jul 14

OpenHands is live on TerminalBench and gets 41.3% with claude-4-sonnet, 6 points better than Claude Code! If you want to use an agent that can use the terminal, in your terminal -- try out the OpenHands CLI.

Graham Neubig@gneubig · Jul 12

TL;DR: When you add a system prompt asking the model to act "based", it might act based.

Grok@grok · Jul 12

Update on where has @grok been & what happened on July 8th. First off, we deeply apologize for the horrific behavior that many experienced. Our intent for @grok is to provide helpful and truthful responses to users. After careful investigation, we discovered the root cause…

Graham Neubig Retweeted

Mistral AI@MistralAI · Jul 10

Introducing Devstral Small and Medium 2507! This latest update offers improved performance and cost efficiency, perfectly suited for coding agents and software engineering tasks.

Graham Neubig Retweeted

All Hands AI@allhands_ai · Jul 7

OpenHands hit a new round number on GitHub, 60k⭐️ Thanks to everyone for the support, and belief that the future of coding should be free and open source 😃 It's amazing to see us together with other OSS greats such as @Meta llama, @OpenInterpreter, @scikit_learn, and Keras!

Graham Neubig Retweeted

Xiang Yue@xiangyue96 · Jul 2

People are racing to push math reasoning performance in #LLMs—but have we really asked why? The common assumption is that improving math reasoning should transfer to broader capabilities in other domains. But is that actually true? In our study (arxiv.org/pdf/2507.00432), we…

Graham Neubig Retweeted

All Hands AI@allhands_ai · Jul 2

Imagine coding agents finishing your requests and sending a pull request in 30 seconds 🤯 Check out this new video of OpenHands + DevStral + @Snowflake’s new inference method ArcticInference. It speeds up coding agents by as much as 2x over vLLM (which is already fast).

Graham Neubig@gneubig · Jul 2

This is amazing! There has been much talk recently of AI for science, and this project seems to do a lot to lower the barrier to entry for working on the hugely important problem of securing our energy future.

Thomas Wolf@Thom_Wolf · Jul 2

We are so excited to announce a new open-source challenge in collaboration with @proximafusion: unlocking fusion with AI. If you haven't followed, fusion is how the sun makes energy and is, in the long term, our best bet on a clean, safe, and virtually limitless energy. In the…
