Dan Zhang
@DZhang50
Gemini Model+HW Codesign @ Google DeepMind | Computer Architecture PhD @ UT Austin🤘 | Opinions stated here are my own.
Currently, datacenter ML training and inference use commodity TPU and GPU devices optimized for a wide range of workloads. Given the extreme scale of large datacenter deployments, would it be practical to build custom accelerators optimized for specific workloads? (1/4)
Very excited to announce that I’ll be co-organizing a @NeurIPSConf workshop on LLM evals! Identifying shortcomings in model capabilities in a robust, scientific way is a critical part of model development. Looking forward to discussing ideas and hearing from some eval experts!
We are happy to announce our @NeurIPSConf workshop on LLM evaluations! Mastering LLM evaluation is no longer optional -- it's fundamental to building reliable models. We'll tackle the field's most pressing evaluation challenges. For details: sites.google.com/corp/view/llm-…. 1/3
Excited to share that a scaled up version of Gemini DeepThink achieves gold-medal standard at the International Mathematical Olympiad. This result is official, and certified by the IMO organizers. Watch this space, more to come soon! deepmind.google/discover/blog/…
Our IMO gold model is not just an "experimental reasoning" model. It is way more general purpose than anyone would have expected. This general deep think model is going to be shipped so stay tuned! 🔥
So happy to see this incredible achievement. Huge congrats to @lmthang, @quocleix, @YiTayML and the IMO team on the result. This was a great collaboration across teams to build a general Gemini DeepThink model that can also get gold at IMO.
Very excited to share that an advanced version of Gemini Deep Think is the first to have achieved gold-medal level in the International Mathematical Olympiad 🏆, solving five out of six problems perfectly, as verified by the IMO organizers! It’s been a wild run to lead this…
Super thrilled to share that our AI has now reached silver medalist level in Math at #imo2024 (1 point away from 🥇)! Since Jan, we now not only have a much stronger version of #AlphaGeometry, but also an entirely new system called #AlphaProof, capable of solving many more…
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
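The dynamic-chunking idea is easiest to see in code. Below is a minimal sketch, assuming a learned per-position boundary scorer that pools byte-level states into variable-length chunks; the module and its hard 0.5 threshold are my own illustration, not the actual H-Net implementation (which would need a differentiable chunking mechanism to train end-to-end).

```python
# Minimal sketch of dynamic chunking -- illustrative only, not H-Net.
# A learned scorer marks chunk boundaries, and each discovered span of
# byte-level states is mean-pooled into one higher-level chunk vector.
import torch
import torch.nn as nn

class DynamicChunker(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # Scores each position: how likely is a chunk boundary here?
        self.boundary_scorer = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        # x: (seq_len, d_model) byte-level hidden states (batch omitted)
        probs = torch.sigmoid(self.boundary_scorer(x)).squeeze(-1)
        boundaries = probs > 0.5  # hard threshold just for illustration
        chunks, start = [], 0
        for i, is_boundary in enumerate(boundaries.tolist()):
            if is_boundary:
                chunks.append(x[start : i + 1].mean(dim=0))
                start = i + 1
        if start < x.size(0):  # trailing span with no boundary marker
            chunks.append(x[start:].mean(dim=0))
        return chunks  # variable-length list of chunk embeddings
```

Stacking this hierarchically -- chunks of chunks -- is what makes the architecture tokenizer-free rather than just a learned tokenizer bolted onto the front.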
I converted one of my favorite talks I've given over the past year into a blog post. "On the Tradeoffs of SSMs and Transformers" (or: tokens are bullshit) In a few days, we'll release what I believe is the next major advance for architectures.
This is one of the most interesting papers I've read in a long time, not only in terms of token efficiency but also in terms of potentially interesting latent interactions with the higher-order trilinear representations. arxiv.org/abs/2507.02754
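For readers skimming: "trilinear" here means attention scores become third-order interactions rather than the usual bilinear q·k dot products. A toy sketch of the shape difference, which is my own illustration and not the paper's exact formulation:

```python
# Toy contrast between bilinear and trilinear attention logits.
# Not the paper's formulation; just shows the higher-order interaction.
import torch

seq, d = 8, 16
q  = torch.randn(seq, d)   # queries
k1 = torch.randn(seq, d)   # first key set
k2 = torch.randn(seq, d)   # second key set

# Standard attention: each query scores single keys -> (seq, seq)
bilinear = torch.einsum("id,jd->ij", q, k1)

# Trilinear: each query scores *pairs* of keys -> (seq, seq, seq),
# hence the tradeoff between token efficiency and the O(n^3) logit cost.
trilinear = torch.einsum("id,jd,kd->ijk", q, k1, k2)

print(bilinear.shape, trilinear.shape)  # torch.Size([8, 8]) torch.Size([8, 8, 8])
```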
xai has free coffee on weekends; we have free tongsui on weekdays

do any effective altruists still follow "earn to give"? 80000hours.org/articles/earni…
Sam Altman says Meta is offering $100M signing bonuses to OpenAI staff. Not $100M annual compensation, just the signing bonus! He clowned Meta: “that’s not how you build a great culture.” Also said none of OpenAI’s best people are leaving. This AI talent war is crazy.
getting tired of winning 😎
new gemini 2.5 is out, and the cost is so good vs o3 and opus. anybody else getting model fatigue? 😂