Casey Chu
@caseychu9
Researcher at @openai
We launched ChatGPT Agent today! When tested on a variety of REAL work tasks (expert tasks that might take >10h), we found that its output was human-quality almost 50% of the time. Agent puts o3's intelligence into practice - try your work tasks and let us know how it goes!
ChatGPT can now do work for you using its own computer. Introducing ChatGPT agent—a unified agentic system combining Operator’s action-taking remote browser, deep research’s web synthesis, and ChatGPT’s conversational strengths.
An intuition for relative memory access times (scaled 10^10):
Reg: 2 sec - Take from shelf
Cache: 6½ min - Get from garage
DDR Main: 20 min - Go to store
DDR CXL: 1hr
Far Mem: 8hr
SSD: 6 days - Order online
Spinning Disk (3ms): 1yr!
Via @dylan522p & @SemiAnalysis_
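The scaling in the post above can be checked with a few lines of arithmetic. This is a minimal sketch: only the 3 ms disk latency and the 10^10 factor come from the post; the function names are made up for illustration, and the real latencies it prints for the other tiers are back-computed from the scaled times, not measured values.

```python
SCALE = 10**10                      # scaling factor quoted in the post
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year


def scaled_seconds(real_latency_s, scale=SCALE):
    """Stretch a real latency (seconds) to the human-scale analogy."""
    return real_latency_s * scale


def real_latency_ns(scaled_s, scale=SCALE):
    """Invert the analogy: human-scale seconds back to real nanoseconds."""
    return scaled_s / scale * 1e9


# The one real latency the post states: a 3 ms spinning-disk access.
disk_years = scaled_seconds(3e-3) / SECONDS_PER_YEAR
print(f"Spinning disk: about {disk_years:.2f} years")

# Back-computed real latencies implied by the scaled times in the post.
for name, human_seconds in [("Reg", 2), ("DDR Main", 20 * 60), ("SSD", 6 * 24 * 3600)]:
    print(f"{name}: about {real_latency_ns(human_seconds):.1f} ns")
```

Under these assumptions, the 3 ms disk access comes out just short of one year, which matches the "1yr" in the post.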
To summarize this week:
- we released a general-purpose computer-using agent
- got beaten by a single human in the AtCoder heuristics competition
- solved 5/6 new IMO problems with natural language proofs
All of those are based on the same single reinforcement learning system
watching chatgpt agent use a computer to do complex tasks has been a real "feel the agi" moment for me; something about seeing the computer think, plan, and execute hits different.
working on bringing that pass@16 number down to pass@1 💪
We also found that, when allowed 16 tries per problem, ChatGPT agent’s score grew from 27% to 49% on the tier 1-3 set. This suggests that better prompting or scaffolding might result in better performance from current models.
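The post does not say how the 16-try score was computed. One common way to report "solved within k tries" is the unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples drawn for a problem and c is how many of them were correct. The sketch below assumes that estimator; the function name and the sample counts are illustrative, not taken from the evaluation.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem.

    n: samples drawn, c: samples that were correct, k: tries allowed.
    Returns the probability that at least one of k samples, chosen
    without replacement from the n drawn, is correct.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k incorrect samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical problem: 16 samples drawn, 4 of them correct.
print(pass_at_k(16, 4, 1))   # single-try success rate
print(pass_at_k(16, 4, 16))  # success when all 16 tries are allowed
```

A benchmark score is then the average of this value over all problems, so a large gap between pass@1 and pass@16 means many problems are solved only occasionally.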
Great post from @xikun_zhang_, who did a great job making sure collaboration with Agent feels good!
Just launched ChatGPT Agent (sorry GPT-5 waiters, it is coming!), the most capable AI agent model to date! It has been such an honor to be part of a crazy sprint to get this amazing model trained and shipped together with an absolute gem of a team (@isafulf, @caseychu9,…
Join us in making the next generation of agents both capable and safe! We think that agents will be a big part of how we interact with AI in the future, making it critical that we think carefully about how we build them.
We're hiring for a new team @OpenAI: Agent Robustness and Control. Our goal is to make sure our agents are safe and secure during training and deployment. Want to work on some of the hardest problems in AI today? Apply via link in reply or DM me!
It's deeply concerning that one of the best AI researchers I've worked with, @kaicathyc, was denied a U.S. green card today. A Canadian who's lived and contributed here for 12 years now has to leave. We’re risking America’s AI leadership when we turn away talent like this.
been waiting years for solomonoff maximalism to become a populist position. god bless
THE BEST DEFINITION OF INTELLIGENCE IS THE ABILITY TO PREDICT THE FUTURE!!! From Donald Trump Truth Social 04/14/25 09:32 AM
LLMs have complex joint beliefs about all sorts of quantities. And my postdoc @jamesrequeima visualized them! In this thread we show LLM predictive distributions conditioned on data and free-form text. LLMs pick up on all kinds of subtle and unusual structure: 🧵
We launched a research preview of Operator today! It's a model built on top of GPT-4o that can control a browser — it is very early and will make mistakes, but it's a taste of things to come openai.com/index/introduc…
Today OpenAI announced o3, its next-gen reasoning model. We've worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks. It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task…
Why don't we measure probabilities in degrees? blog.alexalemi.com/a-degree-of-ce…
I had the joy and the honor of being invited to give the @harveymudd commencement address this year. In the vector space of all advice, I explore a 5-dimension subspace orthogonal to the “follow your dreams” vector. YouTube Link: youtu.be/W3I3kAg2J7w
GPT-4o would not have happened without the vision, talent, conviction, and determination of @prafdhar over a long period of time. that (along with the work of many others) led to what i hope will turn out to be a revolution in how we use computers.
GPT-4o (o for “omni”) is the first model to come out of the omni team, OpenAI’s first natively fully multimodal model. This launch was a huge org-wide effort, but I’d like to give a shout out to a few of my awesome team members who made this magical model even possible!
love this syntax!
For this, we developed a new library to express sharding more clearly. Here’s multihost FSDP and tensor parallelism (TP) for a feedforward network. “F/t” means both “F/t is the size per chip” and “tensor dimension F is sharded over t chips”. d is FSDP, t is TP.
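The "F/t" reading described above (dimension F split over the t chips of one mesh axis, leaving F/t elements per chip) can be illustrated with a small shape calculator. This is a hypothetical sketch in plain Python, not the library's API: the function name, the space-separated spec string, and all sizes are assumptions for illustration.

```python
def per_chip_shape(spec, dims, mesh):
    """Per-chip shape of a tensor described in 'Dim/axis' notation.

    spec: space-separated terms such as "D/d F/t" (hypothetical format).
    dims: full size of each tensor dimension, e.g. {"D": 512, "F": 2048}.
    mesh: chips along each mesh axis, e.g. {"d": 4, "t": 2}.
    """
    shape = []
    for term in spec.split():
        name, _, axis = term.partition("/")
        size = dims[name]
        if axis:  # dimension is sharded over this mesh axis
            chips = mesh[axis]
            if size % chips:
                raise ValueError(f"{name}={size} not divisible by {axis}={chips}")
            size //= chips
        shape.append(size)
    return tuple(shape)


# Hypothetical sizes: d is the FSDP axis, t is the tensor-parallel axis.
dims = {"B": 32, "D": 512, "F": 2048}
mesh = {"d": 4, "t": 2}
print(per_chip_shape("D/d F/t", dims, mesh))  # feedforward weight shard
print(per_chip_shape("B/d D", dims, mesh))    # activation shard
```

With these made-up sizes, a weight annotated "D/d F/t" occupies a 128 by 1024 block on each chip, which is the "size per chip" reading of the notation.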