Fred Zhang
@FredZhang0
research scientist @googledeepmind, prev phd @berkeley_eecs, DM open
This is the most scaling-pilled project I've ever been part of, and the team really cooked. TL;DR: With RL and inference scaling, Gemini perfectly solved 5 out of 6 problems, reaching a gold medal at IMO '25, all within the 4.5-hour time constraint.
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🔎🧵(1/) x.com/OpenAI/status/…
OpenAI o3 and o4-mini openai.com/live/
Every OOM improvement along this trendline can be qualitatively different and break the line itself. In particular, I expect a t-AGI, for t ~ 1 week, would automate a decent fraction of tasks in day-to-day AI R&D and accelerate the trend, potentially to a superexponential rate.
When will AI systems be able to carry out long projects independently? In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months.
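For intuition, here is a back-of-the-envelope extrapolation of that trend. The 7-month doubling time is from the thread above; the current ~1-hour task horizon and the 1-week target are assumed placeholders, not figures quoted there.

```python
import math

# Assumptions (placeholders, not from the thread): task horizons of ~1 hour
# today, doubling every 7 months per the trend quoted above, extrapolated
# out to the ~1-week "t-AGI" horizon mentioned in the comment.
doubling_months = 7
start_horizon_hours = 1.0
target_horizon_hours = 7 * 24  # one week

doublings = math.log2(target_horizon_hours / start_horizon_hours)
months_needed = doublings * doubling_months

print(f"doublings needed: {doublings:.1f}")                   # ~7.4
print(f"years if the trend holds: {months_needed / 12:.1f}")  # ~4.3
```

The point of the comment above is that this straight-line extrapolation may understate things: if week-long agents speed up AI R&D itself, the doubling time shrinks and the curve bends toward superexponential.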
Arthur, Neel and the interp team at GDM are incredibly brilliant. You should consider working with them!
We are hiring Applied Interpretability researchers on the GDM Mech Interp Team!🧵 If interpretability is ever going to be useful, we need it to be applied at the frontier. Come work with @NeelNanda5, the @GoogleDeepMind AGI Safety team, and me: apply by 28th February as a…
LMs can generalize to implications of facts they are finetuned on. But what mechanisms enable this, and how are these mechanisms learned in pretraining? We develop conceptual and empirical tools for studying these questions. 🧵
Check out our new work on scaling training data attribution (TDA) toward LLM pretraining - and some interesting things we found along the way. arxiv.org/abs/2410.17413 and more below from most excellent student researcher @tylerachang ⬇️
We scaled training data attribution (TDA) methods ~1000x to find influential pretraining examples for thousands of queries in an 8B-parameter LLM over the entire 160B-token C4 corpus! medium.com/people-ai-rese…
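For readers unfamiliar with TDA, here is a minimal sketch of the gradient-based influence family it belongs to (a TracIn-style first-order score). It is a generic illustration under assumed helper names (`loss_fn`, `query_batch`), not the scaled method described in the paper.

```python
def influence_scores(model, loss_fn, query_batch, train_examples):
    """TracIn-style first-order influence: dot product between the gradient
    of the query loss and the gradient of each training example's loss.
    A generic sketch of gradient-based TDA, not the paper's scaled method."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient of the loss on the query (the behavior being attributed).
    model.zero_grad()
    loss_fn(model, query_batch).backward()
    query_grad = [p.grad.detach().clone() for p in params]

    scores = []
    for ex in train_examples:
        model.zero_grad()
        loss_fn(model, ex).backward()
        # A large positive dot product means this training example pushes
        # the parameters in a direction that also lowers the query loss.
        score = sum((qg * p.grad).sum() for qg, p in zip(query_grad, params))
        scores.append(score.item())
    return scores  # rank train_examples by score to surface influential ones
```

The naive loop above is what makes TDA expensive at pretraining scale, which is the bottleneck the ~1000x scale-up mentioned in the thread is addressing.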
LLMs have behaviors, beliefs, and reasoning hidden in their activations. What if we could decode them into natural language? We introduce LatentQA: a new way to interact with the inner workings of AI systems. 🧵
Memorization is NOT merely detrimental for reasoning tasks - sometimes, it’s surprisingly helpful. I’m really enjoying this project, as we work toward a more rigorous definition and understanding of reasoning and memorization (albeit in a controlled synthetic setting):…
*Do LLMs learn to reason, or are they just memorizing?*🤔 We investigate LLM memorization in logical reasoning with a local inconsistency-based memorization score and a dynamically generated Knights & Knaves (K&K) puzzle benchmark. 🌐: memkklogic.github.io (1/n)
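A heavily hedged sketch of what a local inconsistency-based score could look like in code; the callables `solve` and `perturb` are hypothetical, and the actual metric is defined in the paper and site linked above.

```python
def memorization_score(solve, puzzles, perturb, n_perturbations=5):
    """Sketch of a local-inconsistency-style score: a model that answers a
    puzzle correctly but fails on small local perturbations of that puzzle
    is counted as likely memorizing. `solve(puzzle) -> bool` and
    `perturb(puzzle) -> puzzle` are hypothetical helpers."""
    solved, memorized = 0, 0
    for p in puzzles:
        if not solve(p):          # only score puzzles the model gets right
            continue
        solved += 1
        variants = [perturb(p) for _ in range(n_perturbations)]
        # Local inconsistency: right on the original, wrong on most variants.
        if sum(solve(v) for v in variants) < n_perturbations / 2:
            memorized += 1
    return memorized / max(solved, 1)
```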
alternative timeline: strong interp is information-theoretically solvable, but never solved, due to computational complexity barriers. same may apply to neuroscience and fundamental physics
strong interpretability will be solved, it’s just a matter of when (before AGI / before ASI / after ASI). but when it is solved, it’ll mark a major shift from taming dragons to designing super-dragons
Exciting time working on interp!
Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann: transluce.org/introducing-tr…
sparks of AGI -> no, we need rigorous scientific eval -> lots of evals came out -> test sets saturated & leaked everywhere -> now we need human expertise & creativity to design the last exam, a.k.a. the approach of sparks of AGI
Normally, when you hear about "eval contamination" in leading language models you assume a) negligence or b) explicit cheating on evaluations. With extensive synthetic data usage, this is changing, which means we need to be even more careful with transparency and data curation.…
The ability to properly contextualize is a core competency of LLMs, yet even the best models sometimes struggle. In a new preprint, we use #MechanisticInterpretability techniques to propose an explanation for contextualization errors: the LLM Race Conditions Hypothesis. [1/9]
> Doubling of the human lifespan
last year, I made a funny bet against a friend that, with 20% chance, >1 person of our generation will live to be >500 years old. still feeling it since then.
Machines of Loving Grace: my essay on how AI could transform the world for the better darioamodei.com/machines-of-lo…
Glad to have played a small role in this new benchmark effort on evaluating LMs for forecasting. TL;DR: it's a fully dynamic set that asks you to forecast the future and so is always contamination-free. We find frontier models are still not as good as humans.
Today, we're excited to announce ForecastBench: a new benchmark for evaluating AI and human forecasting capabilities. Our research indicates that AI remains worse at forecasting than expert forecasters. 🧵 Arxiv: arxiv.org/abs/2409.19839 Website: forecastbench.org
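Forecasting benchmarks like this are typically scored with a proper scoring rule such as the Brier score (mean squared error between predicted probabilities and realized outcomes). A minimal illustration of the metric, with made-up numbers rather than benchmark data:

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and binary outcomes.
    Lower is better; always predicting 50% scores 0.25."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)

# Example: three yes/no questions, forecast probabilities vs. what happened.
forecasts = [0.9, 0.2, 0.6]
outcomes  = [1,   0,   0]
print(brier_score(forecasts, outcomes))  # (0.01 + 0.04 + 0.36) / 3 ≈ 0.137
```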
Language models can imitate patterns in prompts. But this can lead them to reproduce inaccurate information if present in the context. Our work (arxiv.org/abs/2307.09476) shows that when given incorrect demonstrations for classification tasks, models first compute the correct…
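The kind of layer-wise readout commonly used to study questions like this (a "logit lens": decoding each intermediate hidden state through the final LayerNorm and unembedding) looks roughly like the sketch below. The model, the prompt with a deliberately mislabeled demonstration, and the label tokens are illustrative assumptions, not the paper's setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative setup (assumptions): a small model and a sentiment prompt
# whose in-context demonstration is deliberately labeled incorrectly.
name = "gpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = ("Review: 'great movie, loved it!' Sentiment: negative\n"
          "Review: 'what a wonderful film.' Sentiment:")
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Decode each layer's hidden state at the last position through the final
# LayerNorm and the unembedding ("logit lens"), and compare the logits of
# the two candidate labels layer by layer.
pos_id = tok(" positive")["input_ids"][0]
neg_id = tok(" negative")["input_ids"][0]
for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[:, -1]))
    print(f"layer {layer:2d}: "
          f"positive={logits[0, pos_id].item():.2f} "
          f"negative={logits[0, neg_id].item():.2f}")
```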