Martin Signoux
@MartinSignoux
Red-teaming innovation, fine-tuning regulation ——— AI policy @OpenAI
better eval -> better models
ARC-AGI-2: A New Challenge for Frontier AI Reasoning Systems Our paper introduces the leading benchmark for evaluating AI’s abstract reasoning capabilities - Humans solve 100% of tasks - Frontier AI scores <5% @fchollet @mikeknoop @GregKamradt @bryanlanders Henry Pinkard
This is what adoption rate of a General Purpose Technology looks like 📈
Pew have updated their ChatGPT polling from a year ago. Usage has roughly doubled since summer 2023. The share of employed adults who use ChatGPT for work has roughly tripled over the same period to 28%. Two thirds of US adults have still not yet used ChatGPT.
AI comes at you fast
When will an AI win a Gold Medal in the International Math Olympiad? Median predicted date over time: July 2021: 2043 (22 years away); July 2022: 2029 (7 years away); July 2023: 2028 (5 years away); July 2024: 2026 (2 years away). metaculus.com/questions/6728…
Yet another reminder to keep things in perspective and look at the trajectory of progress. Yes, AI models (and agents) still fail at many things, but they are also increasingly good at solving very hard tasks. Excited about what’s next.
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
👏this👏is👏so👏cool👏
Thrilled to finally share what we've been working on for months at @huggingface 🤝@pollenrobotics Our first robot: Reachy Mini A dream come true: cute and low priced, hackable yet easy to use, powered by open-source and the infinite community. Tiny price, small size, huge…
We've had so much fun organizing a vibe coding event for kids aged 9 to 13 that we decided to show our friends how to host one too and turn it into a global event. 📆 It’s happening on October 10-11, 2025 🧑🏫 with in-person sessions hosted locally, 🌐 all connected in one giant…
🦜 Remember when many people claimed LLMs were just stochastic parrots?
RLaaS
We're starting to get a clearer picture of the mission of Thinking Machines Lab, the AI startup founded by ex-OpenAI CTO Mira Murati. Investors who have spoken to her are describing it as "RL for businesses." w/ @erinkwoo @rocketalignment: theinformation.com/articles/ex-op…
Ex-OpenAI Peter Deng says AI may be rewiring how kids think, and education could shift with it. The skill won't be memorizing answers. It'll be learning how to ask better questions to unlock deeper thinking. “When the calculator was invented, people didn't stop doing math. They…
Very excited this exists. A hill to climb on one of the traits I listed as super needed for next-gen models :)
Excited to release AbstentionBench -- our paper and benchmark on evaluating LLMs’ *abstention*: the skill of knowing when NOT to answer! Key finding: reasoning LLMs struggle with unanswerable questions and hallucinate! Details and links to paper & open source code below! 🧵1/9
When I was a lawyer, I went on a secondment to my law firm's Paris office; the trainees I managed there were all extremely bright but, because of French academic culture, would produce 20-page memos full of legal jargon that clients couldn't care less about. So they had to…
Liquid glass isn’t a design gimmick, nor an imperfect upgrade. It’s a deliberate UI shift to prepare iPhone users for the XR era, where digital content is overlaid on the world.
Interactive Reasoning Benchmarks are the next step in frontier evaluations. Hear @GregKamradt share why measuring human-like intelligence requires multi-turn environments, including a sneak peek of ARC-AGI-3. Want to help us build interactive evaluations? We're hiring.
it's so important to have brilliant people like @joannejang @JoHeidecke and many more thinking hard about this topic internally. and i'm proud they can speak about it and share their thoughts. we -as a society- are only getting started exploring this dimension and we need more
some thoughts on human-ai relationships and how we're approaching them at openai it's a long blog post -- tl;dr we build models to serve people first. as more people feel increasingly connected to ai, we’re prioritizing research into how this impacts their emotional well-being.…
Did you ever wonder about the employment effects of AI at the firm level? Our new paper is out in AEA Papers and Proceedings @AEAjournals... Key finding: more AI ➡️ more jobs! 👉…1-4c99-bb2c-0fdc17ec7c2d.filesusr.com/ugd/bacd2d_a26…
"Extending 'GPTs Are GPTs' to Firms" is now out in AEA Papers & Proceedings. Lots of talk about AI's impact on employment this week. One way AI will influence labor demand is through firm-level impacts. We built initial descriptive statistics that can shed light on these…