Daniel Vega-Myhre
@vega_myhre
ML SWE working on PyTorch
Just wrote an illustrated deep-dive into overlapping compute and comms in TP+SP using Async TP. My eyeballs hurt now, so hopefully somebody finds it useful :) danielvegamyhre.github.io/ml/performance…
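To give a rough feel for the idea: the all-gather feeding a TP matmul can be decomposed so that communication for one shard overlaps with compute on shards that have already arrived. Below is a minimal sketch in plain torch.distributed; the function name and the per-shard broadcast scheme are illustrative only, not the actual PyTorch Async TP implementation (which does this fusion under torch.compile).

```python
# Conceptual sketch of micro-pipelining an all-gather + matmul (the core idea
# behind Async TP). Illustrative only, not the PyTorch implementation.
import torch
import torch.distributed as dist

def overlapped_all_gather_matmul(a_shard: torch.Tensor, b: torch.Tensor, group=None) -> torch.Tensor:
    """Compute cat(all_gather(a_shard), dim=0) @ b, overlapping comm with compute."""
    world_size = dist.get_world_size(group)
    rank = dist.get_rank(group)

    # Launch one async broadcast per source rank so shards stream in
    # independently instead of arriving as a single blocking all-gather.
    bufs, works = [], []
    for src in range(world_size):
        buf = a_shard.contiguous() if src == rank else torch.empty_like(a_shard)
        src_global = dist.get_global_rank(group, src) if group is not None else src
        works.append(dist.broadcast(buf, src=src_global, group=group, async_op=True))
        bufs.append(buf)

    # Multiply each shard as soon as its communication finishes; comm for
    # later shards overlaps with matmuls on earlier ones.
    partials = []
    for src in range(world_size):
        works[src].wait()
        partials.append(bufs[src] @ b)

    return torch.cat(partials, dim=0)
```

The design point is simply that the big blocking collective gets split into smaller pieces whose latency can hide behind partial matmuls; the real implementation pattern-matches this inside the compiler rather than asking users to rewrite their models.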
On Sep 6 in NYC, this won't be your typical hackathon where you do your own thing in a corner and then present at the end of the day. You'll deploy real models to the market, trades will happen, and chaos should be expected. The fastest model is great, but time to market matters more.
🚨 The era of infinite internet data is ending. So we ask: 👉 What’s the right generative modelling objective when data, not compute, is the bottleneck? TL;DR: ▶️ Compute-constrained? Train Autoregressive models. ▶️ Data-constrained? Train Diffusion models. Get ready for 🤿 1/n
I’m at #ICML2025 today presenting a poster on our paper on TorchAO at the CodeML workshop - come say hey! Paper: openreview.net/forum?id=HpqH0…
When I was at Google, maintaining high training goodput in the face of infra failures was a big challenge we faced for massive distributed training runs. Cool to see progress on fault-tolerant training happening at the framework layer.
the model keeps on training even when the underlying infra keeps failing... out-of-the-box PyTorch
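The fault tolerance referenced above happens at the framework layer. As a point of contrast, the crudest application-level version is just frequent checkpointing plus resume-on-restart; a minimal sketch of that baseline is below (paths, model, and step counts are illustrative, and this is not the in-framework approach the posts above describe).

```python
# Crude application-level fault tolerance: checkpoint every N steps and resume
# from the latest checkpoint after a crash/restart. This is NOT the in-framework
# fault tolerance referenced above; it is the baseline that approach improves on.
import os
import torch
import torch.nn as nn

CKPT_PATH = "checkpoint.pt"   # illustrative path
CKPT_EVERY = 100              # illustrative checkpoint frequency

model = nn.Linear(1024, 1024)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
start_step = 0

# Resume if a previous run left a checkpoint behind.
if os.path.exists(CKPT_PATH):
    state = torch.load(CKPT_PATH, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    start_step = state["step"] + 1

for step in range(start_step, 10_000):
    x = torch.randn(32, 1024)
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Persist training state so a restart loses at most CKPT_EVERY steps.
    if step % CKPT_EVERY == 0:
        torch.save(
            {"model": model.state_dict(), "optimizer": optimizer.state_dict(), "step": step},
            CKPT_PATH,
        )
```

The goodput cost of this baseline (lost steps plus restart time on every failure) is exactly what framework-level fault tolerance is trying to shrink.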
On @CrusoeAI's new H200 cluster, tests demonstrated 34–43% #PyTorch training acceleration at scale by leveraging TorchTitan’s HSDP2 and TorchAO’s new #float8 rowwise. Along with substantial speedups, training showed convergence and stability comparable to BF16. 📖➡️…
We just demonstrated proof of stability at scale for PyTorch native float8 training with rowwise scales. Similar convergence to bfloat16 with a ~33% speedup! pytorch.org/blog/accelerat…
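For anyone wanting to try this, enabling float8 training with rowwise scaling via torchao looks roughly like the sketch below. Treat the recipe name and function signatures as assumptions about a recent torchao version and check the torchao docs for your install; float8 matmuls also need recent (SM89+) GPUs.

```python
# Rough sketch of enabling float8 training with rowwise scaling via torchao.
# Recipe/function names are assumptions based on recent torchao releases;
# verify against the version you have installed. Requires an SM89+ GPU.
import torch
import torch.nn as nn
from torchao.float8 import Float8LinearConfig, convert_to_float8_training

model = nn.Sequential(
    nn.Linear(4096, 4096, bias=False),
    nn.GELU(),
    nn.Linear(4096, 4096, bias=False),
).to("cuda", dtype=torch.bfloat16)

# Swap eligible nn.Linear modules for float8 linears using the rowwise-scaling recipe.
config = Float8LinearConfig.from_recipe_name("rowwise")
convert_to_float8_training(model, config=config)

# Training proceeds as usual: the linear matmuls run in float8 with rowwise
# scales, while weights and optimizer state stay in higher precision.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
x = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)
loss = model(x).float().pow(2).mean()
loss.backward()
optimizer.step()
```

The appeal of rowwise scaling is finer-grained scale factors than a single per-tensor scale, which is what makes the bfloat16-comparable convergence in the post possible while still getting the float8 matmul speedup.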
For any ML folks who want to deepen their understanding of ML scalability & performance techniques, I wrote an illustrated deep-dive into Megatron-style tensor parallelism: danielvegamyhre.github.io/ml/performance… Any feedback is welcome!
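As a companion to the post, here is a bare-bones sketch of the core Megatron-style trick for an MLP block: the first linear is column-parallel (weight sharded along the output dim, so the forward matmul needs no comm), the second is row-parallel (sharded along the input dim), and an all-reduce sums the partial outputs. Class names are my own, not Megatron's, and real implementations also handle the backward-pass comms.

```python
# Minimal sketch of Megatron-style tensor parallelism for an MLP block.
# Class names are illustrative; real implementations also handle backward-pass
# comms, init, and sequence parallelism. Assumes torch.distributed is already
# initialized (e.g. launched with torchrun).
import torch
import torch.nn as nn
import torch.distributed as dist

class ColumnParallelLinear(nn.Module):
    """Holds a [out_features/world_size, in_features] shard of the weight.
    Each rank produces its own slice of the output features; no forward comm."""
    def __init__(self, in_features: int, out_features: int, world_size: int):
        super().__init__()
        assert out_features % world_size == 0
        self.weight = nn.Parameter(torch.randn(out_features // world_size, in_features) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight.t()  # [..., out_features // world_size]

class RowParallelLinear(nn.Module):
    """Holds a [out_features, in_features/world_size] shard of the weight.
    Each rank computes a partial sum over its slice of input features,
    then an all-reduce combines the partial outputs."""
    def __init__(self, in_features: int, out_features: int, world_size: int):
        super().__init__()
        assert in_features % world_size == 0
        self.weight = nn.Parameter(torch.randn(out_features, in_features // world_size) * 0.02)

    def forward(self, x_shard: torch.Tensor) -> torch.Tensor:
        partial = x_shard @ self.weight.t()  # partial sum on this rank
        dist.all_reduce(partial)             # sum partials across TP ranks
        return partial

class TensorParallelMLP(nn.Module):
    def __init__(self, hidden: int, ffn: int, world_size: int):
        super().__init__()
        self.up = ColumnParallelLinear(hidden, ffn, world_size)   # shards the GELU inputs
        self.down = RowParallelLinear(ffn, hidden, world_size)    # shards the GELU outputs
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is replicated across TP ranks; the output is replicated again
        # after the all-reduce inside the row-parallel linear.
        return self.down(self.act(self.up(x)))

# Usage (per rank, after dist.init_process_group, with x replicated):
#   mlp = TensorParallelMLP(hidden=1024, ffn=4096, world_size=dist.get_world_size())
#   y = mlp(torch.randn(8, 1024))
```

The key design choice is pairing the column-parallel and row-parallel splits so the intermediate activation never has to be gathered: only one all-reduce per MLP block is needed in the forward pass.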