Shalev Lifshitz
@Shalev_lif
do androids dream of electric sheep? @ something new, previously @UofT @VectorInst
Hot off the Servers 🔥💻 --- we’ve found a new approach for scaling test-time compute! Multi-Agent Verification (MAV) scales the number of verifier models at test-time, which boosts LLM performance without any additional training. Now we can scale along two dimensions: by…
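For intuition, here's a minimal sketch of the MAV idea as I understand it (my own illustrative code, not the paper's implementation): sample several candidate answers, have multiple independent verifier models judge each one, and keep the candidate with the most approvals. The generator/verifier callables, the prompts, and the simple vote-count aggregation below are all assumptions for illustration.

```python
# Illustrative sketch of multi-agent-verification-style selection at test time.
# NOT the paper's implementation: generate_fn / verifier_fns are user-supplied
# LLM callables, and the YES/NO vote-count aggregation is an assumption.
from collections import Counter

def generate_candidates(generate_fn, prompt, n=4):
    """Sample n candidate answers from a generator LLM."""
    return [generate_fn(prompt) for _ in range(n)]

def approve(verifier_fn, prompt, candidate):
    """Ask one verifier model for a binary approve/reject judgment."""
    judgment = verifier_fn(
        f"Question: {prompt}\nProposed answer: {candidate}\n"
        "Is this answer correct? Reply YES or NO."
    )
    return "YES" in judgment.upper()

def select_with_multi_agent_verification(prompt, generate_fn, verifier_fns, n=4):
    """Return the candidate approved by the most verifiers."""
    candidates = generate_candidates(generate_fn, prompt, n)
    approvals = Counter()
    for cand in candidates:
        approvals[cand] += sum(approve(v, prompt, cand) for v in verifier_fns)
    return max(candidates, key=lambda c: approvals[c])
```

In this sketch, scaling test-time compute means growing either the number of candidates or the list of verifiers, with no additional training involved.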

Bonnie is awesome! Join her team!
Our team @GoogleDeepMind is hiring! Join a team of world-class researchers working on open-ended self-improvement! 🔥
I’m building a new team at @GoogleDeepMind to work on Open-Ended Discovery! We’re looking for strong Research Scientists and Research Engineers to help us push the frontier of autonomously discovering novel artifacts such as new knowledge, capabilities, or algorithms, in an…
These researchers found that 30% of chem and bio questions on the “Humanity’s Last Exam” benchmark had ground-truth answers that contradicted peer-reviewed papers! Important work by @FutureHouseSF.
HLE has recently become the benchmark to beat for frontier agents. We @FutureHouseSF took a closer look at the chem and bio questions and found about 30% of them are likely invalid based on our analysis and third-party PhD evaluations. 1/7
550k GB200s and GB300s 🤯🤯
230k GPUs, including 30k GB200s, are operational for training Grok @xAI in a single supercluster called Colossus 1 (inference is done by our cloud providers). At Colossus 2, the first batch of 550k GB200s & GB300s, also for training, starts going online in a few weeks. As Jensen…
Qwen3-Coder is out and open-source. Basically on the level of Claude 4 Sonnet on coding tasks!
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
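If you want to poke at the open weights, here's a rough sketch of loading the checkpoint with Hugging Face transformers. The repo id, chat-template call, and generation settings are assumptions on my part, and a 480B-parameter MoE realistically needs a multi-GPU, quantized, or hosted setup rather than a single box.

```python
# Rough usage sketch for the open Qwen3-Coder release via transformers.
# The repo id below is an assumed Hugging Face identifier based on the
# announced model name, not a confirmed path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Coder-480B-A35B-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" shards across available GPUs; a model this size needs
# a multi-GPU node or a quantized / hosted deployment in practice.
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```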
Congrats to the GDM team on their IMO result! I think their parallel success highlights how fast AI progress is. Their approach was a bit different than ours, but I think that shows there are many research directions for further progress. Some thoughts on our model and results 🧵
Important to recognize the amazing humans who participated in the IMO! Warren Bei scored a *perfect* score; he's starting at MIT in the fall (big win for MIT).
Everyone's talking about AI performance on the IMO. Let me highlight 🇨🇦Canadian 11th grader Warren Bei🇨🇦, one of five participants with a *perfect* 42/42. This is his *fifth* (and final) IMO representing Canada, with three golds and two silvers. (➡️ MIT undergrad in the fall)
Official gold-medal performance with Gemini at IMO. Massive congrats to the team! “This year, we were amongst an inaugural cohort to have our model results officially graded and certified by IMO coordinators using the same criteria as for student solutions.”
Advanced version of Gemini Deep Think (announced at #GoogleIO) using parallel inference time computation achieved gold-medal performance at IMO, solving 5/6 problems with rigorous proofs as verified by official IMO judges! Congrats to all involved! deepmind.google/discover/blog/…
This is actually genius. @nvidia please do this!
nvidia could do the most viral ai competition in history: start with 10,000 researchers and give each a free gpu to work on a public leaderboard but do rounds of elimination where the winners take the remaining hardware. the final winner gets all the gpus for a year.
AMD is cool
Someone on LinkedIn posted about cool theoretical research that he wants to test, and someone from AMD just offered to give him the compute 😍
Grok 4 Heavy w/ Python + Internet + Test-Time Compute reaches 50.7% on Humanity's Last Exam. Even with all those +'s, this really is wild.

While other companies are now releasing sooner after OpenAI and closing the gap, I'm honestly still surprised at OpenAI's ability to consistently lead with impactful product releases.
PSA
I gain a lot of mental clarity and peace from not bringing my phone: 1. In the bedroom for sleep, 2. For meals, coffee, or a snack with friends close to home/work. Both are very easy and worth trying.
While we don’t yet have all the details, the most impressive part of OpenAI’s achievement is not the gold medal itself but the fact that it was achieved without a specialized formal logic system. AlphaProof scored a silver medal last year, but it used Lean. Awaiting more details to…
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
The world can change in just 7 hours
How it started (10 hours ago) —how it’s going (3 hours ago) 😅
Zuck discovered the infinite money glitch
Heard Zuck poached 4 more OpenAI researchers, including some behind the open-source model. How deep are Zuck’s pockets?
ChatGPT Agent is the first model we classified as "High" capability for biorisk. Some might think that biorisk is not real, and that models only provide information that could be found via search. That may have been true in 2024 but is definitely not true today. Based on our…
We’ve activated our strongest safeguards for ChatGPT Agent. It’s the first model we’ve classified as High capability in biology & chemistry under our Preparedness Framework. Here’s why that matters–and what we’re doing to keep it safe. 🧵