Andrew Saxe (@SaxeLab)

Pinned

Andrew Saxe @SaxeLab · Jun 4

How does in-context learning emerge in attention models during gradient descent training? Sharing our new Spotlight paper @icmlconf: Training Dynamics of In-Context Learning in Linear Attention arxiv.org/abs/2501.16265 Led by Yedi Zhang with @Aaditya6284 and Peter Latham

Andrew Saxe Retweeted

Gatsby Computational Neuroscience Unit @GatsbyUCL · Jul 21

🥳 Congratulations to Rodrigo Carrasco-Davison on passing his PhD viva with minor corrections! 🎉 📜 Principles of Optimal Learning Control in Biological and Artificial Agents.

Andrew Saxe @SaxeLab · Jul 15

Come chat about this at the poster @icmlconf, 11:00-13:30 on Wednesday in the West Exhibition Hall #W-902!

Quoting Andrew Saxe @SaxeLab · Jun 4

Andrew Saxe Retweeted

Gatsby Computational Neuroscience Unit @GatsbyUCL · Jul 11

👋 Attending #ICML2025 next week? Don't forget to check out work involving our researchers!

Andrew Saxe @SaxeLab · Jul 15

Excited to present this work in Vancouver at #ICML2025 today 😀 Come by to hear about why in-context learning emerges and disappears:
Talk: 10:30-10:45am, West Ballroom C
Poster: 11am-1:30pm, East Exhibition Hall A-B #E-3409

Quoting Aaditya Singh @Aaditya6284 · Mar 10

Transformers employ different strategies through training to minimize loss, but how do these trade off, and why? Excited to share our newest work, where we show remarkably rich competitive and cooperative interactions (termed "coopetition") as a transformer learns. Read on 🔎⏬

Andrew Saxe Retweeted

Alexandra Proca @a_proca · Jun 20

How do task dynamics impact learning in networks with internal dynamics? Excited to share our ICML Oral paper on learning dynamics in linear RNNs, with @ClementineDomi6 @mpshanahan @PedroMediano openreview.net/forum?id=KGOcr…

Andrew Saxe @SaxeLab · Jun 9

Excited to share that this work has been accepted as an Oral at #icml2025! Looking forward to seeing everyone in Vancouver, and an extra thanks to my amazing collaborators for making this project so much fun to work on :)

Quoting Aaditya Singh @Aaditya6284 · Mar 10

Andrew Saxe Retweeted

Aaditya Singh @Aaditya6284 · Mar 10

Transformers employ different strategies through training to minimize loss, but how do these trade off, and why? Excited to share our newest work, where we show remarkably rich competitive and cooperative interactions (termed "coopetition") as a transformer learns. Read on 🔎⏬

Andrew Saxe @SaxeLab · Jun 4

Was super fun to be a part of this work! Felt very satisfying to bring the theory work on ICL with linear attention a bit closer to practice (with multi-headed, low-rank attention) and, of course, to add a focus on dynamics. Thread 🧵 with some extra highlights

Quoting Andrew Saxe @SaxeLab · Jun 4
