Modular (@Modular)
The future of AI development starts here. Sign up for our 📪 Newsletter → http://modular.com/newsletters. We are hiring → http://modular.com/careers 🚀
We’re one step closer to truly scalable, hardware-independent AI. Today’s premiere introduced major updates that simplify deployment, speed up development, and bring powerful performance to any GPU. Here's what dropped 👇

Episode 109: GPU Programming and Language Design with @clattner_llvm! 🎉
Hey @AMD devs, got an RDNA3 or RDNA4 GPU? Mojo supports general GPU programming on these cards, but we'd love help enhancing MAX model support. The flash attention kernel is the next big hurdle. Get your PR merged & we’ll send you a Mojo/MAX gamer pad: forum.modular.com/t/calling-all-…
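For anyone picking up that kernel: flash attention computes ordinary scaled-dot-product attention, just tiled so the softmax and matmuls stay in on-chip memory. As a hedged illustration (not the Mojo kernel itself, and the function name is made up for this sketch), here is a naive PyTorch reference of the result such a kernel has to reproduce:

```python
import torch

def attention_reference(q, k, v):
    # Naive reference: softmax(q @ k^T / sqrt(d)) @ v.
    # A flash-attention kernel computes the same result, but in tiles,
    # so the full score matrix never materializes in device memory.
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d**0.5
    return torch.softmax(scores, dim=-1) @ v
```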
New in MAX nightly: expose an entire MAX graph as a PyTorch custom op with the `@graph_op` decorator! This makes it easier to use everything from individual MAX kernels to full subgraphs within your existing PyTorch models. Example in our GitHub repo: github.com/modular/modula…
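For context, this is the extension point `@graph_op` targets: PyTorch's custom-op registration. Below is a minimal sketch in plain PyTorch (2.4+), with the op body as an ordinary tensor expression rather than a MAX graph; the `demo::scaled_add` name and the function are invented for illustration only, so check the linked GitHub example for the real `@graph_op` usage.

```python
import torch

# Register a custom op with PyTorch's torch.library machinery.
# @graph_op targets this same custom-op surface, with a full MAX graph as the body.
@torch.library.custom_op("demo::scaled_add", mutates_args=())
def scaled_add(a: torch.Tensor, b: torch.Tensor, scale: float) -> torch.Tensor:
    return a + scale * b

@scaled_add.register_fake
def _(a: torch.Tensor, b: torch.Tensor, scale: float) -> torch.Tensor:
    # Shape/dtype propagation for meta tensors and torch.compile.
    return torch.empty_like(a)

x, y = torch.randn(4), torch.randn(4)
print(torch.ops.demo.scaled_add(x, y, 2.0))
```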
Join us at the top of the hour (now, 9 AM PT) for a great talk on PyTorch and Mojo, and on Adaptive, RL-based AI Systems Performance Tuning: meetup.com/ai-performance… @Modular
Tired of wrestling with low‑level GPU and accelerator code to get @PyTorch custom ops up and running? During Monday's AI Perf Engineering Meetup, @ehsanmok will show how MAX & Mojo simplify workflows with cleaner code + easier debugging. Tune in @ 9 AM PT: meetup.com/ai-performance…
Did you know that our kernel library is open source and ready for community contributions? Build a high-performance kernel in Mojo for your favorite hardware, and make a meaningful impact across the AI ecosystem (and secure awesome swag 😎). Here's a list of kernels we’d love to…
Excited to announce the @Modular x @TensorWaveCloud partnership! "With this integration, teams can now serve the same workloads for 60–70% less cost—or handle 3–7x more traffic without increasing their budget." Huge win for ML teams looking to scale efficiently! Read the full…
TensorWave + @Modular deliver better inference performance at a fraction of the cost. If you’re serving billions of tokens, the math isn’t even close. See the breakdown → na2.hubs.ly/y0vMMG0
MAX isn’t just accelerator-agnostic; it’s accelerator-optimized. Running on @tensorwavecloud MI325X GPUs, MAX delivers up to 70% lower inference costs and faster throughput than H200 + vLLM. Same models. Better performance. 📊 See the results → tensorwave.com/blog/save-up-t…
What if PyTorch ops were faster and easier to write? @ehsanmok will show how Mojo makes this possible at the virtual AI Performance Engineering Meetup on July 21st, hosted by @cfregly! RSVP: meetup.com/ai-performance…