Shane Bergsma
@ShaneBergsma
Man bites data
Power Lines paper now out: arxiv.org/abs/2505.13738 TL;DR - we identify how AdamW's weight decay should scale with batch size, dataset size, and model size in LLM pre-training. We also investigate the scaling of both "optimal" and "critical" batch size.
Major finding #1: λ=0.1 used in the majority of LLMs is suboptimal! Our work shows that optimal weight decay (λ) scales linearly with batch size. Most researchers use the same λ regardless of batch size, leaving performance on the table.
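A minimal sketch of what "optimal λ scales linearly with batch size" means in practice, assuming dataset size, model size, and learning rate are all held fixed and that you already have a λ tuned at some reference batch size. The function name `scale_weight_decay` and the reference values are hypothetical, chosen only for illustration.

```python
def scale_weight_decay(lambda_ref: float, batch_ref: int, batch_new: int) -> float:
    """Rescale AdamW weight decay linearly with batch size.

    Assumes everything except batch size (data, model, learning rate) is unchanged,
    and that lambda_ref was tuned at batch_ref. Illustrative helper, not from the paper.
    """
    if batch_ref <= 0 or batch_new <= 0:
        raise ValueError("batch sizes must be positive")
    # Linear scaling: doubling the batch doubles the weight decay.
    return lambda_ref * batch_new / batch_ref


if __name__ == "__main__":
    # Hypothetical reference point: lambda tuned to 0.1 at a batch of 256 sequences.
    for batch in (128, 256, 512, 1024):
        print(batch, scale_weight_decay(0.1, 256, batch))
```

Keeping λ fixed at 0.1 while changing the batch size is exactly the pattern the tweet flags as leaving performance on the table.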
(1/7) @CerebrasSystems Paper drop: arxiv.org/abs/2505.01618 TLDR: We introduce CompleteP, which offers depth-wise hyperparameter (HP) transfer (Left), FLOP savings when training deep models (Middle), and a larger range of compute-efficient width/depth ratios (Right). 🧵 👇
It’s #ICLR2025 week, and we’re proud to share that Team Cerebras will be presenting their paper: "Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs" at @iclr_conf! Big congrats to the authors, your work is powering the future of AI compute.
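A rough sketch of a "linear decay to zero" learning-rate schedule of the kind the paper title describes. The optional linear warmup phase and all parameter names (`peak_lr`, `warmup_steps`, `total_steps`) are assumptions added for illustration; the defining property is only that the rate falls linearly and hits exactly 0 at the final step.

```python
def linear_decay_to_zero(step: int, total_steps: int, peak_lr: float,
                         warmup_steps: int = 0) -> float:
    """Learning rate at `step`: optional linear warmup, then linear decay to 0."""
    if not 0 <= warmup_steps < total_steps:
        raise ValueError("need 0 <= warmup_steps < total_steps")
    if step < warmup_steps:
        # Warmup ramps up to peak_lr (an assumption; not part of the tweet).
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase still remaining; equals 0 at step == total_steps.
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return peak_lr * max(0.0, remaining)


if __name__ == "__main__":
    for s in range(0, 11):
        print(s, linear_decay_to_zero(s, total_steps=10, peak_lr=1e-3, warmup_steps=2))
```

In PyTorch, a function like this could be wrapped in `torch.optim.lr_scheduler.LambdaLR` by returning `linear_decay_to_zero(step, ...) / peak_lr` as the multiplicative factor.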
Cerebras has set a new record for AI inference speed, serving Llama 3.1 8B at 1,850 output tokens/s and 70B at 446 output tokens/s. @CerebrasSystems has just launched their API inference offering, powered by their custom wafer-scale AI accelerator chips. Cerebras Inference is…
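To put the reported throughput figures in perspective, here is a small back-of-the-envelope calculation converting tokens/s into per-token latency and time for a hypothetical 1,000-token reply. It assumes a steady output rate and ignores time-to-first-token and network overhead, so real end-to-end latency would be higher.

```python
# Output throughputs (tokens/s) as reported in the tweet.
THROUGHPUT_TOK_PER_S = {"Llama 3.1 8B": 1850.0, "Llama 3.1 70B": 446.0}


def generation_time_s(num_tokens: int, tokens_per_s: float) -> float:
    """Seconds to emit num_tokens at a steady rate (ignores time-to-first-token)."""
    return num_tokens / tokens_per_s


if __name__ == "__main__":
    for model, rate in THROUGHPUT_TOK_PER_S.items():
        per_token_ms = 1000.0 / rate
        print(f"{model}: {per_token_ms:.2f} ms/token, "
              f"{generation_time_s(1000, rate):.2f} s per 1,000 tokens")
```

Under these assumptions that works out to roughly 0.54 ms per token for 8B and about 2.24 ms per token for 70B.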
OMG, now food trucks are even part of the A.I. bandwagon!

My son, after reading half the books: "J.R.R. Tolkien is a man? I had no idea." Thank you, @jk_rowling
I'm surprised at the number of recent articles that still mention the top hits on Google for some query, as if this were still a universal shared experience.
Wow! Smart replies now account for more than 10% of the replies on Gmail. #wsdm2018
Next to our ad blockers, we now need bitcoin-mining-JavaScript-blockers... theregister.co.uk/2017/09/25/sho…
It is getting harder and harder to see my Solr dashboard...

It's hard to stay positive sometimes, so here's a bird realizing a love of drumming
The whole group? Wow, this migration of academics to industry is getting out of control.
Fernando!
I couldn't resist... earningmyturns.org/2017/06/a-comp…
Wikipedia (one of the supreme achievements of humanity) doesn't get enough love, so just let me say, "thank you, Wikipedia."