Wanchao Liang
@wanchao_
building @thinkymachines; ex-PyTorch @ Meta. Author of PyTorch DTensor and TorchTitan. Opinions are my own
The @PyTorch team is developing a library for large model training called torchtitan 👀 They have scripts to train Llama-3 from scratch. The library went public today on GitHub, but it is still in a pre-release state & under active development. Check it out → github.com/pytorch/torcht…
I’ll be presenting "TorchTitan: a PyTorch-native platform for training foundation models" tomorrow at the ICML @ESFoMo workshop! Come and say hi!
Looking forward to seeing everyone for ES-FoMo part three tomorrow! We'll be in East Exhibition Hall A (the big one), and we've got an exciting schedule of invited talks, orals, and posters planned for you tomorrow. Let's meet some of our great speakers! 1/
Excited to share that I joined @thinkymachines recently! It’s been an incredible experience so far working alongside many talented folks here. We are building multimodal AI that collaborates with humans, as well as great research infra to accelerate AI and science!
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're…
This is starting to feel more like a conference, less like a course every day. We're now having the amazing @wanchao_ as a guest speaker talking about TorchTitan and DTensors!
torchft + TorchTitan: 1200+ failures, no checkpoints, model convergence. A Llama 3 model was trained across 300 L40S GPUs with synthetic failures every 15s. No restarts. No rollbacks. Just asynchronous recovery and continued progress. 📘 hubs.la/Q03t1Z0b0 #PyTorch…
PyTorch docs are sometimes lacking; new features especially lack real-life code examples, so you end up reading through implementations or codebases. Now, here is @OpenAI deep research's result on DTensor. It basically read through all the torch docs / GitHub issues and it's really…
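For anyone who wants a starting point, here is a minimal DTensor sketch (my own example, not from the report; assumes a recent PyTorch 2.x where DTensor lives under torch.distributed.tensor, launched with torchrun on 4 GPUs):

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard, Replicate

# 1-D device mesh over 4 GPUs (torchrun --nproc_per_node=4 this_script.py)
mesh = init_device_mesh("cuda", (4,))

big = torch.randn(8192, 8192)                       # full tensor, identical on every rank
sharded = distribute_tensor(big, mesh, [Shard(0)])  # rows split across the mesh
print(sharded.to_local().shape)                     # each rank holds a (2048, 8192) shard

replicated = sharded.redistribute(mesh, [Replicate()])  # all-gather back to the full tensor
```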
For too long, users have lived under the software lottery tyranny of fused attention implementations. No longer. Introducing FlexAttention, a new PyTorch API allowing for many attention variants to enjoy fused kernels in a few lines of PyTorch. pytorch.org/blog/flexatten… 1/10
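The core idea, roughly: you write a small score_mod function in plain PyTorch and FlexAttention fuses it into the attention kernel. A hedged sketch (assumes PyTorch 2.5+ where flex_attention is available; the bias/mask combination is just illustrative):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

def causal_with_rel_bias(score, b, h, q_idx, kv_idx):
    # add a relative-position bias, then mask out future tokens
    score = score + (q_idx - kv_idx)
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

B, H, S, D = 2, 8, 1024, 64
q, k, v = (torch.randn(B, H, S, D, device="cuda") for _ in range(3))

out = flex_attention(q, k, v, score_mod=causal_with_rel_bias)  # eager fallback works

# compile to actually get the fused kernel
flex_compiled = torch.compile(flex_attention)
out = flex_compiled(q, k, v, score_mod=causal_with_rel_bias)
```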
We have been working on PyTorch native float8 and FSDP2 for distributed training. Check out TorchTitan and TorchAO/float8: dev-discuss.pytorch.org/t/enabling-flo… With Andrew Gu, @wanchao_, @drisspg, @vkuzo, @brian_hirsh
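Roughly how the pieces compose (a hedged sketch; the import paths are my assumption since the torchao float8 and FSDP2 APIs have moved between releases):

```python
import torch
import torch.distributed as dist
from torchao.float8 import convert_to_float8_training   # torchao float8 (path assumed)
from torch.distributed.fsdp import fully_shard           # FSDP2 (path assumed; older releases: torch.distributed._composable.fsdp)

dist.init_process_group("nccl")                           # launch with torchrun --nproc_per_node=<ngpus>
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096, bias=False),
    torch.nn.Linear(4096, 4096, bias=False),
).cuda()

convert_to_float8_training(model)   # swap nn.Linear to float8 compute with dynamic scaling
for layer in model:
    fully_shard(layer)              # FSDP2: shard each layer's params/grads per module
fully_shard(model)                  # root wrap
model = torch.compile(model)        # compile so the float8 scaling ops get fused
```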
Announcing the alpha release of torchtune! torchtune is a PyTorch-native library for fine-tuning LLMs. It combines hackable memory-efficient fine-tuning recipes with integrations into your favorite tools. Get started fine-tuning today! Details: hubs.la/Q02t214F0
🚨New🌟blog✍️ on ⏩ maximizing🌙 FLOPS 🚀 Training large models requires maximizing flops/GPU, especially at scale. Excited to share a few of the cool tricks in the thread 👀. 1/N
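As context for what "flops/GPU" means in practice, a back-of-the-envelope MFU calculation using the common ~6 × params FLOPs-per-token approximation (the numbers below are illustrative, not from the blog):

```python
n_params = 70e9                  # model parameters (illustrative)
tokens_per_sec_per_gpu = 450     # measured training throughput per GPU (illustrative)

achieved_flops = 6 * n_params * tokens_per_sec_per_gpu   # ~1.9e14 FLOP/s per GPU
peak_flops = 989e12              # e.g. H100 BF16 dense peak

mfu = achieved_flops / peak_flops
print(f"MFU ≈ {mfu:.0%}")        # ≈ 19% -> plenty of headroom for the tricks in the thread
```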
This is a good question; it gets to the root of the tradeoff between performance and flexibility. So how do PyTorch folks think about this? Long answer: if we're in a world where a single base model can be fine-tuned over all tasks and we're fairly certain that this base model…
Why use PyTorch/JAX at all? Why don't people just write CUDA programs?
PyTorch 2.0 Q&A: 🗓️ March 1 ⏰ 11am PT ✅ Register: hubs.la/Q01DvW9Q0 Introduction to 2-D Parallelism (FSDP + Tensor Parallel) for training large-scale ViT models, and an introduction to PyTorch DistributedTensor. Join @wanchao_ & Junjie Wang. Host DA: @shshnkp
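For a feel of what 2-D parallelism looks like in code, a rough sketch using today's API names (mesh sizes, the toy module, and the FSDP device_mesh argument are illustrative assumptions; the Q&A itself predates some of these APIs):

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import parallelize_module, ColwiseParallel, RowwiseParallel
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# 2 data-parallel groups x 4 tensor-parallel ranks = 8 GPUs (torchrun --nproc_per_node=8)
mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))

class MLP(torch.nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        self.up = torch.nn.Linear(dim, 4 * dim)
        self.down = torch.nn.Linear(4 * dim, dim)
    def forward(self, x):
        return self.down(torch.relu(self.up(x)))

model = MLP().cuda()
# Tensor parallel within the "tp" mesh dimension: shard `up` column-wise, `down` row-wise.
parallelize_module(model, mesh_2d["tp"], {"up": ColwiseParallel(), "down": RowwiseParallel()})
# FSDP across the "dp" mesh dimension, layered on top of the TP-sharded module.
model = FSDP(model, device_mesh=mesh_2d["dp"], use_orig_params=True)
```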
Excited about the future of PyTorch 2.0!
The PyTorch roadmap by @soumithchintala. Key points: more speed with the same flexibility; dynamic shapes & graphs via the TorchDynamo + TorchInductor compiler; a more compact backend; simpler distributed training.
We just introduced PyTorch 2.0 at the #PyTorchConference, introducing torch.compile! Available in the nightlies today, stable release Early March 2023. Read the full post: bit.ly/3VNysOA 🧵below! 1/5
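The headline API is a one-line opt-in; a minimal sketch (toy model of my own, just to show the call):

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.GELU())
compiled = torch.compile(model)   # TorchDynamo captures the graph, TorchInductor generates fused kernels

x = torch.randn(32, 128)
print(compiled(x).shape)          # first call compiles; later calls reuse the compiled graph
```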
Excited to see many awesome community members in person at #PyTorchConference tomorrow! Some major announcements are coming too…
Google is done. Compare the quality of these responses (ChatGPT)
Wanchao has posted an RFC for distributed tensors in PyTorch at github.com/pytorch/pytorc… ; if you're interested in tensor parallel distributed training check it out! He'll also be at PyTorch Conference, if you want to chat with him IRL (sign up at pytorchconference22.splashthat.com)