Annabelle Michael Carrell
@annabelle_cs
Cambridge machine learning PhD student. Formerly Amazon, @JohnsHopkins. 🏳️🌈 she/her
cool uploads: arxiv.org/abs/2210.13574 'Understanding Linchpin Variables in Markov Chain Monte Carlo' - Dootika Vats, Felipe Acosta, Mark L. Huber, Galin L. Jones
So you want to skip our thinning proofs—but you’d still like our out-of-the-box attention speedups? I’ll be presenting the Thinformer in two ICML workshop posters tomorrow! Catch me at Es-FoMo (1-2:30, East hall A) and at LCFM (10:45-11:30 & 3:30-4:30, West 202-204)
Your data is low-rank, so stop wasting compute! In our new paper on low-rank thinning, we share one weird trick to speed up Transformer inference, SGD training, and hypothesis testing at scale. Come by ICML poster W-1012 Tuesday at 4:30!
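To give a flavor of the speedup being advertised, here is a toy sketch of approximating softmax attention with a small subset of the key/value pairs. The uniform subsampling, the `subsampled_attention` name, and the synthetic low-rank data are illustrative placeholders only; the paper's Thinformer selects its subset far more carefully via low-rank thinning.

```python
# Toy sketch: approximate softmax attention using only a small subset of the
# key/value pairs. Uniform subsampling is a placeholder for demonstration only;
# the Thinformer chooses its subset via low-rank thinning (see the paper).
import numpy as np

def softmax_attention(Q, K, V):
    """Exact softmax attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def subsampled_attention(Q, K, V, m, rng):
    """Attention computed on m uniformly subsampled key/value pairs."""
    idx = rng.choice(K.shape[0], size=m, replace=False)
    return softmax_attention(Q, K[idx], V[idx])

rng = np.random.default_rng(0)
n, d, r = 2048, 64, 8
# Synthetic "low-rank" keys and values concentrated near an r-dimensional subspace.
basis = rng.standard_normal((r, d))
Q = rng.standard_normal((16, d))
K = rng.standard_normal((n, r)) @ basis + 0.01 * rng.standard_normal((n, d))
V = rng.standard_normal((n, r)) @ basis

exact = softmax_attention(Q, K, V)
approx = subsampled_attention(Q, K, V, m=256, rng=rng)
print("relative error:", np.linalg.norm(exact - approx) / np.linalg.norm(exact))
```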
GRAD SCHOOL APPLICATION (2.0) 🧵 Got multiple fully funded PhD offers recently and realized from conversations I have been having that many people don't approach the application process intentionally. Sharing my application process doc as an example below. Open and retweet 🔃
Optimizer tuning can be manual and resource-intensive. Can we learn the best optimizer automatically with guarantees? With @HazanPrinceton, we give new provable methods for learning optimizers using a control approach. Excited about this result! buff.ly/3IoMOkN (1/n)
Neural networks are non-convex and non-smooth. Unfortunately, most theoretical analyses assume convexity or smoothness. Should we abandon the past? No! With @bremen79 and @n0royalroad, we import prior know-how via an *online to non-convex* conversion: arxiv.org/abs/2302.03775.
My favorite non-ML paper I read this year is probably "Bayesian Persuasion" (2011), which I somehow only found out about recently. Simple & beautiful. The first 2 pages are sufficient to be persuaded. web.stanford.edu/~gentzkow/rese…
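For anyone who wants the gist before clicking through: the paper opens with a prosecutor-judge example, which (recalled from memory, so treat the exact numbers as a sketch) goes roughly as follows.

```latex
% Prosecutor--judge example (Kamenica & Gentzkow, 2011), sketched from memory.
% Prior: the defendant is guilty with probability 0.3.
% The judge convicts iff her posterior probability of guilt is at least 0.5.
\[
\Pr(\text{guilty}) = 0.3, \qquad \text{convict} \iff \Pr(\text{guilty} \mid \text{signal}) \ge 0.5 .
\]
% Prosecutor's optimal signal: always report "g" when guilty,
% and report "g" with probability 3/7 when innocent.
\[
\Pr(\text{guilty} \mid g) = \frac{0.3 \cdot 1}{0.3 \cdot 1 + 0.7 \cdot \tfrac{3}{7}}
  = \frac{0.3}{0.6} = 0.5,
\qquad \Pr(g) = 0.6 .
\]
% The judge convicts whenever g is sent, so the conviction probability rises
% from 0.3 (full disclosure) to 0.6, even though the judge is fully Bayesian.
```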
In the LLM-science discussion, I see a common misconception that science is a thing you do and that writing about it is separate and can be automated. I’ve written over 300 scientific papers and can assure you that science writing can’t be separated from science doing. Why? 1/18
1/ Is scale all you need for AGI? (Unlikely.) But our new paper "Beyond neural scaling laws: beating power law scaling via data pruning" shows how to achieve much superior exponential decay of error with dataset size, rather than slow power-law neural scaling: arxiv.org/abs/2206.14486
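A back-of-the-envelope comparison of the two scaling regimes (generic symbols, not the paper's fitted constants): under a power law, every halving of the error multiplies the data requirement, while under exponential scaling it only adds a constant amount.

```latex
% Power law: \epsilon(N) = a N^{-\alpha}. Halving the error requires
% N -> 2^{1/\alpha} N (e.g. roughly 32x more data when \alpha = 0.2).
% Exponential: \epsilon(N) = a e^{-cN}. Halving the error requires only
% N -> N + (\ln 2)/c, a fixed additive increment.
\[
\epsilon_{\text{power}}(N) = a\,N^{-\alpha}, \qquad
\epsilon_{\text{exp}}(N) = a\,e^{-cN}, \qquad
N^{\text{power}}_{1/2} = 2^{1/\alpha}\,N, \qquad
N^{\text{exp}}_{1/2} = N + \frac{\ln 2}{c}.
\]
```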
We resolve an open problem, proving that Thompson sampling achieves optimal regret for linear quadratic control in any dimension; previously this was known only in one dimension. We develop a novel lower bound on the probability that TS gives an optimistic sample. @SahinLale @tkargin_ @Azizzadenesheli @caltech
Thompson Sampling Achieves Õ(√(T)) Regret in Linear Quadratic Control deepai.org/publication/th… by @tkargin_ et al. including @SahinLale, @AnimaAnandkumar #Probability #ThompsonSampling
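For readers who have not seen Thompson sampling applied to LQR, here is a bare-bones sketch of the generic recipe: sample a model from the posterior over the dynamics, solve the Riccati equation for that model, and play its optimal controller. This is an illustration only, not the algorithm or analysis from the paper; the posterior sampler is simplified and all constants are placeholders.

```python
# Minimal Thompson-sampling-for-LQR loop (generic recipe, not the paper's method):
# maintain a Gaussian posterior over the dynamics [A B], sample a model each
# episode, and play the optimal LQR controller for the sampled model.
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    """Optimal infinite-horizon LQR gain K for x' = A x + B u, cost x'Qx + u'Ru."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

rng = np.random.default_rng(0)
n, m = 2, 1                        # state and input dimensions
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])
Q, R = np.eye(n), 0.1 * np.eye(m)
noise = 0.01

V = np.eye(n + m)                  # ridge-prior precision for Theta = [A B]^T
S = np.zeros((n + m, n))           # accumulated z x_next^T terms

x = np.zeros(n)
for t in range(500):
    if t % 25 == 0:                # resample a model at the start of each "episode"
        mean = np.linalg.solve(V, S)
        # Simplified posterior sample (noise-scale factors omitted for brevity).
        Theta = mean + np.linalg.cholesky(np.linalg.inv(V)) @ rng.standard_normal((n + m, n))
        A_hat, B_hat = Theta[:n].T, Theta[n:].T
        try:
            K = lqr_gain(A_hat, B_hat, Q, R)
        except (np.linalg.LinAlgError, ValueError):
            K = np.zeros((m, n))   # fall back if the sampled model is not stabilizable
    u = -K @ x + 0.01 * rng.standard_normal(m)   # small exploration noise
    x_next = A_true @ x + B_true @ u + noise * rng.standard_normal(n)
    z = np.concatenate([x, u])
    V += np.outer(z, z)            # posterior (least-squares) update
    S += np.outer(z, x_next)
    x = x_next
```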
Five years ago, I started my first optimization project, which was about asynchronous gradient descent. Today, I'm happy to present our new work (with @BachFrancis, M. Even and B. Woodworth) where we finally prove: Delays do not matter. arxiv.org/abs/2206.07638 🧵1/5
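To fix ideas, delayed (asynchronous) gradient descent means each update uses a gradient computed at a stale iterate from several steps ago. A toy sketch, where the quadratic objective and the constant delay are placeholder choices, not the paper's setting:

```python
# Toy illustration of delayed gradient descent: each update uses a gradient
# computed at the iterate from tau steps earlier.
import numpy as np

def grad(x):
    return 2.0 * x           # gradient of f(x) = ||x||^2

tau = 5                      # fixed gradient delay
eta = 0.05                   # step size
x = np.ones(10)
history = [x.copy()]         # past iterates, so we can evaluate stale gradients

for t in range(200):
    stale = history[max(0, t - tau)]    # iterate the gradient was computed at
    x = x - eta * grad(stale)           # update with the delayed gradient
    history.append(x.copy())

print("final loss:", float(x @ x))
```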
Proud to share our CLRS benchmark: probing GNNs to execute 30 diverse algorithms! ⚡️ github.com/deepmind/clrs arxiv.org/abs/2205.15659 (@icmlconf'22) Find out all about our 2-year effort below! 🧵 w/ Adrià @davidmbudden @rpascanu @AndreaBanino Misha @RaiaHadsell @BlundellCharles
Gradient Descent provably generalizes. I should say that our thinking was shaped and influenced by the amazing work done by the one and only @DimitrisPapail, the amazing couple @roydanroy and @gkdziugaite and of course @neu_rips, @mraginsky, @mrtz, @beenwrekt
Does full-batch Gradient Descent (GD) generalize efficiently? We provide a rather positive answer for smooth, possibly non-Lipschitz losses. Check out our paper today at arxiv.org/abs/2204.12446. With @aminkarbasi and our amazing postdocs Kostas Nikolakakis and @Farzinhaddadpou 1/n
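For concreteness, here is a toy full-batch GD run on a smooth loss, tracking the train/test gap that generalization bounds control. The logistic model, synthetic data, and constants are illustrative choices only, not the setup analyzed in the paper.

```python
# Toy full-batch gradient descent on a smooth loss (logistic regression with
# synthetic data), tracking the train/test gap. Illustrative setup only.
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test = 20, 200, 2000
w_star = rng.standard_normal(d)

def make_data(n):
    X = rng.standard_normal((n, d))
    y = (X @ w_star + 0.5 * rng.standard_normal(n) > 0).astype(float) * 2 - 1
    return X, y

def loss_and_grad(w, X, y):
    margins = y * (X @ w)
    loss = np.mean(np.log1p(np.exp(-margins)))               # logistic loss
    grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / len(y)   # its gradient
    return loss, grad

Xtr, ytr = make_data(n_train)
Xte, yte = make_data(n_test)

w = np.zeros(d)
eta = 0.5
for t in range(500):
    train_loss, g = loss_and_grad(w, Xtr, ytr)   # full batch: every point, every step
    w -= eta * g

test_loss, _ = loss_and_grad(w, Xte, yte)
print(f"train loss {train_loss:.3f}  test loss {test_loss:.3f}  gap {test_loss - train_loss:.3f}")
```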