Nicolas Zucchet
@NicolasZucchet
PhD student @CSatETH | prev. student researcher @GoogleDeepMind | @Polytechnique
🧵What if emergence could be explained by learning a specific circuit: sparse attention? Our new work explores this bold hypothesis, showing a link between emergence and sparse attention that reveals how data properties influence when emergence occurs during training.
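The thread doesn't spell out a metric here, but one simple way to track whether a sparse-attention circuit is forming during training is the entropy of the attention weights. A minimal sketch, assuming softmax attention rows; this is an illustrative diagnostic, not the paper's actual analysis:

```python
import numpy as np

def attention_entropy(attn, eps=1e-12):
    # Entropy of each attention row: near zero when attention is sparse
    # (close to one-hot), near log(num_keys) when it is uniform.
    # attn: (..., num_keys) softmax weights.
    return -np.sum(attn * np.log(attn + eps), axis=-1)

# Example: a near-one-hot row vs. a uniform row over 4 keys.
sparse = np.array([0.97, 0.01, 0.01, 0.01])
uniform = np.full(4, 0.25)
print(attention_entropy(sparse), attention_entropy(uniform))  # ~0.17 vs. ~1.39
```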

Some nice analysis by Nicolas & Francesco of a clear case of emergence — and how to accelerate its acquisition!
How do language models generalize from information they learn in-context vs. via finetuning? We show that in-context learning can generalize more flexibly, illustrating key differences in the inductive biases of these modes of learning — and ways to improve finetuning. Thread: 1/
Super excited to host a student researcher together with @oswaldjoh this year! Please sign up if you wanna have some research fun with us :)
We are hosting a student researcher this year on the Paradigms of Intelligence team at Google! If you are interested in working with @ninoscherrer and me on AGI, or whatever you think is the next big thing 🥰, please consider applying! docs.google.com/forms/u/2/d/e/…
Super happy and proud to share our novel scalable RNN model - the MesaNet! This work builds on the beautiful idea of 𝗹𝗼𝗰𝗮𝗹𝗹𝘆 𝗼𝗽𝘁𝗶𝗺𝗮𝗹 𝘁𝗲𝘀𝘁-𝘁𝗶𝗺𝗲 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 (TTT) and combines in-context learning, test-time training, and mesa-optimization.
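For a rough feel of the core idea, here is a minimal sketch of a mesa-style layer, assuming the layer's output at each step is the prediction of a ridge regression fit to the (key, value) pairs seen so far. The function name and the naive O(T·d³) loop are illustrative; the actual MesaNet uses an efficient recurrent update:

```python
import numpy as np

def mesa_layer(keys, values, queries, lam=1.0):
    # keys, values, queries: (T, d) float arrays.
    # At each step t, fit ridge regression to all (key, value) pairs seen
    # so far, then evaluate it at the current query: a "locally optimal"
    # test-time learner running inside the sequence model.
    T, d = keys.shape
    outputs = np.zeros_like(queries)
    for t in range(T):
        K, V = keys[: t + 1], values[: t + 1]
        # W = argmin_W ||K W - V||^2 + lam ||W||^2 (closed form)
        W = np.linalg.solve(K.T @ K + lam * np.eye(d), K.T @ V)
        outputs[t] = queries[t] @ W
    return outputs
```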
Emergence in transformers is a real phenomenon! Behaviors and capabilities can appear in models in sudden ways. Emergence is not always just a "mirage". Compiling some examples here (please share any I missed): 🧵
We have a new SSM theory paper, just accepted to COLT, revisiting the recall properties of linear RNNs. It's surprising how deep this topic goes, and how beautiful it becomes. With (and only thanks to) the amazing Alexandre and @BachFrancis arxiv.org/pdf/2502.09287
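As a taste of the recall mechanism in question, here is a minimal sketch of how a linear RNN with a matrix-valued state can implement associative recall. This is standard linear-attention-style recall under the assumption of near-orthonormal keys, not the paper's exact construction:

```python
import numpy as np

def linear_rnn_recall(keys, values, query):
    # Accumulate outer products v k^T in the recurrent state S, then
    # read out with a query; recall is exact when keys are orthonormal.
    S = np.zeros((values.shape[1], keys.shape[1]))
    for k, v in zip(keys, values):
        S = S + np.outer(v, k)  # linear state update
    return S @ query

keys = np.eye(3)                        # orthonormal keys
values = np.arange(9.0).reshape(3, 3)   # one value vector per key
print(linear_rnn_recall(keys, values, keys[1]))  # recovers values[1]
```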
Smooth predictable scaling laws are central to our conceptions and forecasts about AI -- but lots of capabilities actually *emerge* in sudden ways. Awesome work by @NicolasZucchet @dngfra bringing more predictability to emergent phenomena, by studying one type: sparse attention
I really like this new op-ed from @DavidDuvenaud on how so many different kinds of pressures could drive toward a loss of human control over AI. It's rare to read anything well written on this topic, but this piece was elegant and smart enough that I wanted to keep reading.
This is just a reminder for your NeurIPS experiments: if you are comparing architectures, optimizers, or whatever at a single hyperparameter setting (e.g., LR), you are automatically not a scientist. You can be better than this. Produce science, not hype.
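A minimal sketch of the protocol the tweet is asking for: tune the learning rate separately for each method, average over seeds, and compare methods at their own best setting. `train_and_eval` is a stand-in for your real training pipeline, not a real API:

```python
import numpy as np

def train_and_eval(method, lr, seed):
    # Placeholder: substitute your actual training run; returns val loss.
    rng = np.random.default_rng(seed)
    return abs(np.log10(lr) + 2.5) + 0.1 * rng.standard_normal()

def best_val_loss(method, lrs, seeds=(0, 1, 2)):
    # Compare methods at their own tuned LR, not at one shared setting.
    return min(
        float(np.mean([train_and_eval(method, lr, s) for s in seeds]))
        for lr in lrs
    )

lrs = np.logspace(-4, -1, 7)
for method in ["baseline", "new_architecture"]:
    print(method, best_val_loss(method, lrs))
```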
Our new paper sheds light on the process of knowledge acquisition in language models, with implications for:
- data curricula
- the challenges of learning new knowledge when fine-tuning
- the emergence of hallucinations
Nicolas did a great job on the project! See his thread👇
Large language models store vast amounts of knowledge, but how exactly do they learn it? Excited to share my @GoogleDeepMind internship results, which reveal the fascinating dynamics behind factual knowledge acquisition in LLMs! arxiv.org/abs/2503.21676
A paper from Google DeepMind sheds light on the knowledge acquisition process of LLMs. Early in training, LLMs go through a plateau in knowledge acquisition. But during this period, the model is actually attending to specific elements and establishing efficient attention patterns for acquiring knowledge; rapid knowledge acquisition then follows. This resembles how young children acquire knowledge.
How LLMs acquire factual knowledge during training remains unclear. This paper investigates these learning dynamics using synthetic biographies, revealing a three-phase process where models first learn generic statistics, plateau while forming attention circuits, and finally acquire factual knowledge rapidly.
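To make the setup concrete, here is a toy sketch of what synthetic-biography training data can look like; all names, attributes, and the template are illustrative assumptions, not the paper's actual dataset:

```python
import random

rng = random.Random(0)
NAMES = ["Ada Meier", "Ben Rossi", "Chloe Dubois"]
# Fixed facts per individual: the (name -> attribute) associations the
# model must memorize during training.
PROFILES = {
    name: {"city": rng.choice(["Zurich", "Paris", "London"]),
           "job": rng.choice(["physicist", "baker", "pilot"])}
    for name in NAMES
}

def biography(name):
    p = PROFILES[name]
    return f"{name} was born in {p['city']} and works as a {p['job']}."

for name in NAMES:
    print(biography(name))
```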