Lukasz Kaiser
@lukaszkaiser
It wasn't just OpenAI. Google also used a general-purpose model to solve the very hard math problems of the International Math Olympiad in plain language. Last year they used specialized tools. Increasing evidence of the ability of LLMs to generalize to novel problem solving.
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
Congratulations!!
I asked ChatGPT Agent to turn an image from ChatGPT into a 3D printable file. It printed with no problem on my Bambu A1 printer:
To summarize this week:
- we released a general-purpose computer-using agent
- got beaten by a single human in the AtCoder heuristics competition
- solved 5/6 new IMO problems with natural-language proofs
All of those are based on the same single reinforcement learning system.
AI winning gold in IMO is a huge deal. It was done without tools on new problems that haven't occurred in training data. Solving problems that most people in the world won't be able to solve. x.com/alexwei_/statu…
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
TransEvalnia: Reasoning-based Evaluation and Ranking of Translations By Richard Sproat, Tianyu Zhao, Llion Jones ArXiv: arxiv.org/abs/2507.12724 We are happy to announce the release of TransEvalnia, a prompting-based translation evaluation and ranking system that uses reasoning…
I had early access & ChatGPT agent is, I think, a big step forward for getting AIs to do real work. Even at this stage, it does a good job autonomously doing research & assembling Excel files (with formulas!), PowerPoint, etc. It gives a sense of how agents are coming together.
Developers now often program in English for AI models. Some tasks can be solved by decomposition, others by adding more and more constraints. IFScale is an interesting approach to seeing how many instructions an LLM can handle and what trends separate different models.
How many instructions can your LLM follow at once? Production LLM systems juggle 10-100s of instructions: policies, style, safety rules, tool use--but when do they overload? We introduce IFScale, a new benchmark measuring how instruction following degrades as instructions scale🧵
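The kind of stress test IFScale describes can be sketched in a few lines. This is a hypothetical harness, not the IFScale implementation: the instruction template, the scoring rule, and the `forgetful_model` stand-in (which plays the role of a real LLM call) are all illustrative assumptions.

```python
import random

def make_instructions(n, vocab):
    # Illustrative instruction template: each instruction asks the model
    # to include one required word in its output.
    words = random.sample(vocab, n)
    return [f"Include the word '{w}'." for w in words], words

def score_response(response, required_words):
    # Fraction of instructions followed: a required word counts as
    # followed if it appears as a token in the response.
    tokens = response.split()
    hits = sum(1 for w in required_words if w in tokens)
    return hits / len(required_words)

def forgetful_model(required_words, capacity=20):
    # Dummy stand-in for an LLM: only "remembers" the first `capacity`
    # instructions, so its score degrades as the instruction count grows.
    return " ".join(required_words[:capacity])

vocab = [f"term{i}" for i in range(500)]
for n in (10, 50, 100):
    instrs, words = make_instructions(n, vocab)
    resp = forgetful_model(words)
    print(n, score_response(resp, words))
```

Swapping `forgetful_model` for a real model call and sweeping `n` upward is the basic shape of measuring how instruction following degrades as instructions scale.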
1/N Yesterday in Tokyo we @OpenAI ran a 10‑hour live Humans vs AI exhibition at the AtCoder World Tour Finals Heuristic. We pointed an OpenAI reasoning model at the same brutal problem the finalists tackled—no human help, same rules, same clock. Buckle up. 👇
Congratulations!!
Today we launch Asimov. Asimov is our code research agent that is best-in-class in codebase comprehension. It is built for teams, built for enterprises, and built to remember. We use it every day to accelerate our velocity and streamline distributed ops. Link below to sign up…
Training data for future automated building?
🇨🇳 A team of construction workers in China operating excavators remotely. Grueling blue-collar work is now a cushy air-conditioned office job.
Our new preprint describes a multimodal intracortical brain-computer interface that a man with ALS has used at home, independently, almost every day for >19 months. It decodes both speech and cursor control to enable him to communicate and use his computer. Here’s a quick tour👇
🤖 What if a humanoid robot could make a hamburger from raw ingredients—all the way to your plate? 🔥 Excited to announce ViTacFormer: our new pipeline for next-level dexterous manipulation with active vision + high-resolution touch. 🎯 For the first time ever, we demonstrate…
In practice, for many useful applications, the obvious problems with AI agents (drift, hallucination, compounding errors) are more solvable than they are in theory. Clever prompting, tool use, constrained topics, LLM judges & organizational process close some of the gaps.
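One of those gap-closing patterns, an LLM judge gating an agent's output with retries, can be sketched like this. `run_agent` and `run_judge` are hypothetical placeholders for real model calls, not any product's API:

```python
def run_agent(task, feedback=None):
    # Placeholder: a real system would call an LLM here, folding any
    # judge feedback into the prompt for the retry.
    return f"draft for: {task}" + (" (revised)" if feedback else "")

def run_judge(task, draft):
    # Placeholder judge: a real system would ask a second LLM to check
    # the draft against policy. This toy judge accepts only revisions.
    ok = "(revised)" in draft
    return ok, None if ok else "needs revision"

def agent_with_judge(task, max_retries=3):
    # Retry loop: the judge catches drift/hallucination and sends the
    # draft back with feedback instead of shipping it to the user.
    feedback = None
    for _ in range(max_retries):
        draft = run_agent(task, feedback)
        ok, feedback = run_judge(task, draft)
        if ok:
            return draft
    raise RuntimeError("judge rejected all drafts")

print(agent_with_judge("summarize Q3 report"))
```

The point of the pattern is that errors which compound in an open loop get caught and corrected when a cheap check sits between the agent and the user.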
When there is a lot of natural randomness and discovery in an AI use case (image creation, innovation), the focus should not be on a single-threaded conversation that becomes self-reinforcing through autoregression, but on embracing variance, randomness & branching. This calls for new UX.
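A branching UX implies conversation state that is a tree rather than a single thread. A minimal sketch, purely an assumption about what such state could look like (not any product's data model), where each branch resamples independently:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # One turn in the conversation tree: a prompt, one sampled response,
    # and any forked continuations.
    prompt: str
    response: str
    children: list = field(default_factory=list)

def branch(node, prompt, sampler):
    # Fork the conversation at `node`: each call draws a fresh sample,
    # so variance is kept instead of collapsing into one thread.
    child = Node(prompt, sampler(prompt))
    node.children.append(child)
    return child

root = Node("make a logo", "v0")
variants = [branch(root, "make a logo", lambda p, i=i: f"variant {i}")
            for i in range(3)]
print([v.response for v in variants])
```

A single-threaded chat is then just the degenerate case where every node has one child; the UX question is how to surface the siblings.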
in a weird turn of events, turns out many neighborhood kids miss the communal shelter times, and now some parents are trying to arrange leisure shelter gatherings
That’s about right
Right now my AI usage is something like 66% o3-pro, 33% o3, 1% Veo 3, 0% everything else
Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team @tatsu_hashimoto @marcelroed @neilbband @rckpudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:
o3-pro: "write a sentence whose nouns are translations of constellation names & where the last letter of every word spells a constellation in its untranslated name. The first letter of each word must start with a vowel" I didn't even know if it was possible. It was. Impressive!
On Sunday I traveled to the middle of the desert to capture this: The ISS against our sun. What I didn't expect: the sun producing a magnificent flare at the same time A once-in-a-lifetime shot I'm thrilled to share with you. See the uncropped shot or get the print in the reply