Waleed Atallah
@wAIeedatallah
Making AI go fast @mako_dev_ai
1% better every day
MakoGenerate now reliably uses @AMD MatrixCores when generating kernels, and can generate fully functional and performant HIP code. Making it fluent in all the low-level instructions available on a given piece of hardware is critical to outperforming generic frameworks.
X subtly cooking me for not reading the article before reposting

There's gonna be a Netflix movie on what happened with Windsurf. Between this, Jeff Wang's post, and all the other stuff I've heard... sounds like the Social Network lite
I’ve joined Cognition to continue to work on the future of software engineering. I was employee #2 at Windsurf and have worked on AI+code for years. There’s never been a more exciting time and place for it than now at Cognition. I had a place at Google DeepMind as part of the…
Crusoe can probably do this on Stargate alone lol
scoop: Crusoe is building OpenAI’s first Stargate data center. Now it wants to use its expertise as a developer to boost its own cloud business. 📈Crusoe wants to grow from $100mm to $18 billion in cloud revenue by 2030. We got the internal pitch to investors, as the firm…
Neat paper from @AMD. Can we train LLMs to estimate kernel performance metrics? (hint: you can)
Omniwise: Predicting GPU Kernels Performance with LLMs This is a really cool paper from @AMD and @UofIllinois that replicates results we were seeing with proprietary models (o3, Gemini). But they do it with a finetuned Llama-3.2-3B model! 100x smaller!
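To make the idea concrete, here is a minimal sketch of how kernel-source → performance-metric prediction might be framed as a supervised finetuning dataset, in the spirit of the Omniwise result. The prompt wording, metric names, and helper function are illustrative assumptions, not the paper's actual format.

```python
# Hypothetical sketch: pair kernel source with profiler-measured metrics
# as prompt/completion records for finetuning a small LLM (e.g. a 3B model).
# The prompt template and metric keys here are assumptions for illustration.
import json

def make_example(kernel_src: str, metrics: dict) -> dict:
    """Build one finetuning record: kernel code in, metrics out."""
    prompt = (
        "Predict the performance metrics of the following GPU kernel:\n"
        f"{kernel_src}\n"
    )
    # Serializing metrics as JSON gives the model a structured target.
    completion = json.dumps(metrics)
    return {"prompt": prompt, "completion": completion}

example = make_example(
    "__global__ void add(float* a, float* b, float* c) { /* body */ }",
    {"occupancy_pct": 75.0, "dram_bw_gbps": 812.4},
)
print(example["completion"])
```

At inference time the finetuned model emits the JSON completion directly, so no profiler run is needed to get an estimate.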
there are nearly 10,000 concurrent devices training models all over the world through RL Swarm decentralised and permissionless
RL Swarm is a peer-to-peer system for reinforcement learning. It allows you to train models collaboratively with others in the swarm, leveraging their collective intelligence. Start now 👇 github.com/gensyn-ai/rl-s… 9809 nodes connected to testnet 🐝 dashboard.gensyn.ai
The engineering team at mako never ceases to amaze me. Accelerate EVERYTHING. A 15x faster compilation pipeline helps a ton in scaling RFT for kernel generation.
We just shipped 15x faster #CUDA kernel compilation for MakoGenerate. How and why we dug into this part of the pipeline, plus a detailed blog post, below 🧵
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning Trains a DeepSeek-v3-671B model to optimize CUDA kernels using only execution-time speedup as reward. Pipeline: - SFT: Finetuned on 2.1K correct, executable CUDA variants from 6 LLMs across 250…
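A minimal sketch of what an execution-time speedup reward could look like for this kind of RL setup. The function name, the correctness gating, and the zero-reward fallback are illustrative assumptions, not CUDA-L1's exact formulation.

```python
# Illustrative speedup-as-reward signal for RL over generated CUDA kernels.
# Assumption: incorrect or non-executable candidates get zero reward,
# and reward is the ratio of baseline time to candidate time.

def speedup_reward(baseline_ms: float, candidate_ms: float,
                   is_correct: bool) -> float:
    """Reward a generated kernel only if it passes correctness checks.
    A value > 1.0 means the candidate beats the reference kernel."""
    if not is_correct or candidate_ms <= 0:
        return 0.0
    return baseline_ms / candidate_ms

# A candidate that halves runtime earns a 2x reward signal.
print(speedup_reward(10.0, 5.0, True))   # 2.0
print(speedup_reward(10.0, 5.0, False))  # 0.0 (failed correctness check)
```

Gating on correctness matters: without it, the policy can "win" by emitting fast kernels that compute the wrong answer.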
This is potentially the next major unlock for AI - getting all our (private) data to work together. Great work @niclane7 and team!
🌼 Flower, in collaboration with Kinexys by @jpmorgan and @BNYglobal, is proud to introduce Project AIKYA -- the first federated AI deployment between global-tier banks, proving that real-world collaborative financial ML models can perform better without the need to share…
What if this was an info-gathering op the whole time
BREAKING: Claude Code PMs Boris Cherny and Cat Wu have returned to Anthropic after a brief stint at Cursor.
NY needs to show what we've got! Not all the kernel guys are in the bay ;)
New @GPU_MODE x Jane Street 1-day GPU programming hackathon in-person in NYC! Talks by the wonderful @tri_dao, @soumithchintala, and other PyTorch folks! If you're at #ICML25, check out more information at the Jane Street booth! Register by Aug 17: bit.ly/3TS0d9I?r=qr
What
> fp8 is 100 tflops faster when the kernel name has "cutlass" in it kms github.com/triton-lang/tr…
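For context, the joke is about a toolchain heuristic keyed on the kernel's name string. A toy illustration of why that kind of heuristic is fragile (the real behavior lives in the GPU compiler stack, not in Python; this dispatcher is entirely hypothetical):

```python
# Toy illustration of a name-based dispatch heuristic: a substring match
# on the kernel name selects a faster fp8 code path. Purely hypothetical
# code, sketching the failure mode being mocked in the linked issue.

def pick_fp8_path(kernel_name: str) -> str:
    """Hypothetical heuristic: substring match decides the code path."""
    if "cutlass" in kernel_name:
        return "fast_accum"   # the faster accumulation path
    return "default"

print(pick_fp8_path("my_cutlass_gemm"))  # fast_accum
print(pick_fp8_path("my_gemm"))          # default
```

The punchline: two byte-identical kernels can compile to different performance depending on nothing but their name.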