Lorenzo Noci
@lorenzo_noci
PhD in Machine Learning at @ETH working on deep learning theory and principled large-scale AI models.
Pretraining large-depth transformers just got easier! 🚀 HP transfer across model scale ⚡ Compute-efficient pretraining. Super cool collab with @DeyNolan @BCZhang_ @mufan_li @CPehlevan @ShaneBergsma @BorisHanin Joel Hestness @CerebrasSystems
(1/7) @CerebrasSystems Paper drop: arxiv.org/abs/2505.01618 TLDR: We introduce CompleteP, which offers depth-wise hyperparameter (HP) transfer (Left), FLOP savings when training deep models (Middle), and a larger range of compute-efficient width/depth ratios (Right). 🧵 👇
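For intuition, here is a minimal PyTorch sketch of a depth-scaled residual block, assuming a 1/L multiplier on each residual branch as one parametrization in this family; the exact CompleteP rules (including how learning rates scale with depth) are in the paper.

# Minimal sketch (not the paper's code): a pre-LN residual block whose branch
# output is scaled by 1/L, so activations stay O(1) as depth grows.
# The exact CompleteP parametrization (incl. learning-rate rules) is in the paper.
import torch
import torch.nn as nn

class DepthScaledBlock(nn.Module):
    def __init__(self, d_model: int, n_layers: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.branch_scale = 1.0 / n_layers  # assumed depth scaling; see paper

    def forward(self, x):
        return x + self.branch_scale * self.mlp(self.norm(x))

# Usage: stack n_layers of these blocks; hyperparameters tuned at small depth
# should transfer more reliably when the branch scale tracks 1/L.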
Pass by if you want to know about scaling up your model under distribution shifts of the training data. Takeaway: the feature-learning strength in muP needs to be tuned to optimize the forgetting/plasticity trade-off.
🚨 Excited to present our new paper at 🇨🇦 #ICML2025! 🚨 "The Importance of Being Lazy: Scaling Limits of Continual Learning" Great collab with @alebreccia99, @glanzillo11, Thomas Hofmann, @lorenzo_noci. 🧵 1/6
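For context on the "lazy" in the title, here is a generic lazy-training rescaling sketch (in the style of Chizat & Bach), not the paper's exact setup: a scale alpha that interpolates between rich feature learning and the lazy/kernel regime.

# Generic lazy-training rescaling (not the paper's exact setup): scaling the
# output by alpha and centering at initialization interpolates between
# "rich" feature learning (small alpha) and the lazy/kernel regime (large alpha).
import torch
import torch.nn as nn

def lazy_model(net: nn.Module, net_init: nn.Module, alpha: float):
    """Return f_alpha(x) = alpha * (net(x) - net_init(x))."""
    def f(x):
        with torch.no_grad():
            f0 = net_init(x)  # frozen copy of the network at initialization
        return alpha * (net(x) - f0)
    return f

# With the objective also rescaled (e.g. by 1/alpha**2 for square loss), large
# alpha keeps the weights close to init (lazy regime), which is the kind of
# plasticity-vs-forgetting knob studied for continual learning.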
Our research group in the department of Mathematics and CS at the University of Basel (Switzerland) is looking for several PhD candidates and one post-doc who have a theoretical background in optimization and machine learning or practical experience in reasoning. RT please.
Come hear about how transformers perform factual recall using associative memories, and how this emerges in phases during training! #ICLR2025 poster #602 at 3pm today. Led by @EshaanNichani Link: iclr.cc/virtual/2025/p… Paper: arxiv.org/abs/2412.06538
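As a toy illustration of associative-memory recall (a generic sketch, not the paper's construction): key/value embedding pairs stored as a sum of outer products and read out with a single matrix-vector product.

# Toy associative memory (illustrative only, not the paper's construction):
# store (key, value) embedding pairs as a sum of outer products, then recall
# a value by multiplying the memory matrix with a query key.
import numpy as np

rng = np.random.default_rng(0)
d, n_facts = 256, 50
keys = rng.standard_normal((n_facts, d)) / np.sqrt(d)     # e.g. subject embeddings
values = rng.standard_normal((n_facts, d)) / np.sqrt(d)   # e.g. attribute embeddings

W = values.T @ keys                      # memory = sum_i value_i key_i^T

query = keys[7]                          # look up fact number 7
recalled = W @ query                     # approximately values[7] + crosstalk
scores = values @ recalled
print("recalled fact:", scores.argmax())  # -> 7 when crosstalk is small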
Come build with us and @OpenAI!!
You're in Zurich or its zone of influence (Lausanne, Paris, BXL, Munich, London, ...) and like AI + Robots? We (@openai) together with @mimicrobotics, @lokirobotics, and Zurich Builds are organizing a hackathon from Fri 9 May afternoon to Sun 11. Limited spots, more below:
Announcing: The 2nd International Summer School on Mathematical Aspects of Data Science EPFL, Sept 1–5, 2025 Speakers: Bach (@BachFrancis), Bandeira, Mallat, Montanari (@Andrea__M), Peyré (@gabrielpeyre) For PhD students & early-career researchers Application deadline: May 15
Come by at NeurIPS to hear Hamza present about interesting properties of various feature-learning infinite-parameter limits of transformer models! Poster in Hall A-C #4804 at 11 AM PST Friday. Paper: arxiv.org/abs/2405.15712. Work with @hamzatchaudhry and @CPehlevan
Come by poster #2402 East hall at NeurIPS from 11am-2pm Friday to chat about why outlier features emerge during training and how we can prevent them!
Updated camera ready arxiv.org/abs/2405.19279. New results include:
- non-diagonal preconditioners (SOAP/Shampoo) minimise OFs compared to diagonal (Adam/AdaFactor)
- scaling to 7B params
- our methods for reducing OFs make post-training int8 quantisation (PTQ) easier.
Check it out!
Systematic empirical analysis of the role of feature learning in continual learning using scaling limits theory. Meet Jacopo in Vancouver :)
🎉 Excited to be in #Vancouver next week for #NeurIPS to present results from my Master’s Thesis at the Scalable Continual Learning Workshop on December 14th! 🚀 Our work investigates the role of scale and training regimes in Continual Learning. What did we find? 👇 1/3
Indeed very useful :)
We collected lecture notes and blog posts by group members about recent topics in deep learning theory here. Hope it is useful! pehlevan.seas.harvard.edu/resources-0
Outlier Features (OFs) aka “neurons with big features” emerge in standard transformer training & prevent benefits of quantisation 🥲 but why do OFs appear & which design choices minimise them? Our new work (+@lorenzo_noci @DanielePaliotta @ImanolSchlag T. Hofmann) takes a look 👀🧵
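A minimal sketch of how one might quantify outlier features, assuming a kurtosis-style statistic over per-neuron activation scales; the paper's exact metric may differ.

# Sketch of an outlier-feature diagnostic (the paper's exact metric may differ):
# compute per-neuron RMS activations over a batch and check how heavy-tailed
# they are, e.g. via kurtosis or the max-to-median ratio across neurons.
import torch

def outlier_stats(acts: torch.Tensor):
    """acts: (batch * seq, hidden) activations collected from one layer."""
    neuron_rms = acts.pow(2).mean(dim=0).sqrt()            # scale of each neuron
    z = (neuron_rms - neuron_rms.mean()) / neuron_rms.std()
    kurtosis = (z ** 4).mean().item()                      # ~3 for Gaussian-like scales
    max_to_median = (neuron_rms.max() / neuron_rms.median()).item()
    return kurtosis, max_to_median

# Large kurtosis / max-to-median values mean a few "big" neurons dominate,
# which is exactly what makes post-training int8 quantisation painful.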
I'm also recruiting PhD/MSc students this coming cycle, with an eye towards applications in drug discovery. cs.toronto.edu/~cmaddis/ DM me or email me if you have any questions at all!
My group has multiple openings both for PhD and Post-doc positions to work in the area of optimization for ML, and deep learning theory. We are looking for people with a strong theoretical background (degree in math, theoretical physics or CS with strong theory emphasis).
[1/n] Thrilled that this project with @jzavatoneveth and @cpehlevan is finally out! Our group has spent a lot of time studying high dimensional regression and its connections to scaling laws. All our results follow easily from a single central theorem 🧵 arxiv.org/abs/2405.00592
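As a toy version of the regression-to-scaling-laws connection (a generic sketch under standard power-law assumptions, not the paper's theorem): ridge regression with a power-law feature spectrum already exhibits test error that falls off as a power of the sample size.

# Toy sketch (generic power-law assumptions, not the paper's theorem):
# ridge regression when the feature covariance eigenvalues decay as a power law
# yields population test error that itself decays as a power law in n.
import numpy as np

rng = np.random.default_rng(0)
d = 2000
eigs = np.arange(1, d + 1, dtype=float) ** -1.5     # power-law covariance spectrum
w_star = rng.standard_normal(d) * np.sqrt(eigs)     # target aligned with the spectrum

def test_error(n, lam=1e-3):
    X = rng.standard_normal((n, d)) * np.sqrt(eigs)           # features with cov = diag(eigs)
    y = X @ w_star + 0.1 * rng.standard_normal(n)
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return eigs @ (w_hat - w_star) ** 2                       # population risk (noise-free part)

for n in [100, 400, 1600]:
    print(n, test_error(n))   # error shrinks roughly as a power of n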
From stochastic parrot 🦜 to Clever Hans 🐴? In our work with @_vaishnavh we carefully analyse the debate surrounding next-token prediction and identify a new failure of LLMs due to teacher-forcing 👨🏻‍🎓! Check out our work arxiv.org/abs/2403.06963 and the linked thread!
🗣️ “Next-token predictors can’t plan!” ⚔️ “False! Every distribution is expressible as a product of next-token probabilities!” 🗣️ In work w/ @GregorBachmann1 , we carefully flesh out this emerging, fragmented debate & articulate a key new failure. 🔴 arxiv.org/abs/2403.06963
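To make the teacher-forcing point concrete, a generic sketch (not the paper's setup; `model` here is a hypothetical callable returning per-position logits) of the mismatch between training on ground-truth prefixes and decoding autoregressively.

# Generic sketch of the teacher-forcing gap (not the paper's setup):
# during training the model always conditions on the *ground-truth* prefix,
# while at inference it must condition on its *own* previous predictions.
import torch
import torch.nn.functional as F

def teacher_forced_loss(model, tokens):
    """tokens: (batch, seq). Standard next-token loss with ground-truth prefixes."""
    logits = model(tokens[:, :-1])                 # model sees the true prefix
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1)
    )

@torch.no_grad()
def generate(model, prefix, n_steps):
    """Autoregressive decoding: each step conditions on previously *generated* tokens."""
    seq = prefix
    for _ in range(n_steps):
        next_token = model(seq)[:, -1].argmax(dim=-1, keepdim=True)
        seq = torch.cat([seq, next_token], dim=1)  # errors can now compound
    return seq

# Roughly, the failure discussed in the thread: on lookahead/planning-style
# tasks, fitting the teacher-forced objective need not imply that the
# autoregressive loop above succeeds.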