Sam Schoenholz
@sschoenholz
@thinkymachines previously: @openai, google brain.
I started Thinking Machines Lab alongside a remarkable team of scientists, engineers, and builders. We're building three things:
- Helping people adapt AI systems to work for their specific needs
- Developing strong foundations to build more capable AI systems
- Fostering a…
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
Laker and I are presenting this work in an hour at ICML poster E-2103. It’s on a theoretical framework and language (modula) for optimizers that are fast (like Shampoo) and scalable (like muP). You can think of modula as Muon extended to general layer types and network topologies
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're…
Some teams use sweeps, heuristics, or scaling laws to determine their training LR. At Character, we just have Noam Shazeer dial it to the right value.
It's deeply concerning that one of the best AI researchers I've worked with, @kaicathyc, was denied a U.S. green card today. A Canadian who's lived and contributed here for 12 years now has to leave. We’re risking America’s AI leadership when we turn away talent like this.
# The True Nature of Training
A student approached Master Dogen in his pottery workshop.
"Master, what is the difference between pre-post-training and post-mid-training?" the student asked.
Dogen held up a bowl half-dried on the wheel and another newly glazed.
"When does the…
Once again waiting for Gary Marcus to perform the (weekly) ceremonial shifting of the goal posts.
Excited to share what my team has been working on at OpenAI!
Detecting misbehavior in frontier reasoning models Chain-of-thought (CoT) reasoning models “think” in natural language understandable by humans. Monitoring their “thinking” has allowed us to detect misbehavior such as subverting tests in coding tasks, deceiving users, or giving…
Congrats friends 😀😀!! I know how much work was done across the org to get this out. Super impressive.
Today we’re releasing a research preview of GPT-4.5—our largest and best model for chat yet. Rolling out now to all ChatGPT Pro users, followed by Plus and Team users next week, then Enterprise and Edu users the following week.
I had an amazing couple years at openai, and couldn't have asked for a better place to work and do research. Now I'm excited to try to build something from the ground up with an outrageously stacked group of colleagues.
Today, we are excited to announce Thinking Machines Lab (thinkingmachines.ai), an artificial intelligence research and product company. We are scientists, engineers, and builders behind some of the most widely used AI products and libraries, including ChatGPT,…
imo the improvements on FrontierMath are even more impressive than ARC-AGI. Jump from 2% to 25%. Terence Tao said the dataset should "resist AIs for several years at least" and "These are extremely challenging. I think that in the near term basically the only way to solve them,…
Well, on FrontierMath 2024-11-26 o3 improves the state of the art from 2% to 25% accuracy. These are absurdly hard, strongly held-out math questions. And on ARC, the semi-private test set and public validation set scores are 87.5% (private) and 91.5% (public). (7/n)
@ren_hongyu killed it. To recap the demo (I'm still sweating): o3-mini wrote its own ChatGPT UI to talk to *itself* via the OpenAI API, we asked o3-mini to write and execute a script in this UI to evaluate *itself* on GPQA, and the resulting script correctly returned 61%.
I trained a sparse autoencoder on the middle layer residual stream of my genome language model and found human-interpretable latent features that consistently detect specific DNA motifs! 🧵1/8
Lots of progress with ChatGPT Canvas:
Canvas is now available in 4o for all users, Free and Paid! You can now execute Python code, get targeted suggestions, and easily import your own writing & code. We’ll continue making it more powerful & easier to collaborate with ChatGPT.
sora is launching today to all chatgpt pro and plus users! it's been a big effort to make this possible + i think the product is really fun & intuitive. my fav thing to do is generate fake historical found footage. video inpainting is also really strong. have fun!
Sora is here for Plus and Pro users at no additional cost! Pushing the boundaries of visual generation will require breakthroughs both in ML and HCI. Really proud to have worked on this brand new product with @billpeeb @rohanjamin @cmikeh2 and the rest of the Sora team!…