Joe Fioti
@joefioti
working on luminal (yc s25), building a compiler to make models go really fast.
You don’t know pain until you’ve wasted 2 days forgetting to cast from u32 to f32.
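The u32/f32 confusion above usually bites in one of two ways: a *value* cast where a *bit reinterpretation* was needed, or vice versa. A minimal sketch of the difference using only Python's stdlib (the bit pattern `0x3F800000` is the IEEE-754 encoding of `1.0f`; the variable names are illustrative, not from Luminal):

```python
import struct

# 0x3F800000 is the IEEE-754 single-precision bit pattern for 1.0f.
bits = 0x3F800000  # = 1065353216 as a u32

# Value cast: treat the u32 as an integer quantity and convert it.
value_cast = float(bits)  # 1065353216.0 -- almost never what you wanted

# Bit reinterpretation: keep the same 4 bytes, read them back as f32.
reinterpreted = struct.unpack("<f", struct.pack("<I", bits))[0]  # 1.0

print(value_cast, reinterpreted)
```

Debugging a GPU buffer whose dtype metadata says u32 while the bytes are actually f32 produces exactly this symptom: numbers around 10^9 where you expected values near 1.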
This is the core thesis of Luminal. There’s like 5 orders of magnitude of latent demand for more compute, and we need every flop these chips have to offer.
It is crazy to me that some still don't see how big our GPU shortage is:
- Most context window is <100k
- Delayed rollouts of Agents, Codex
- Full Sora never released
- Veo 3 roll out taking weeks
- Claude constant rate limits
- Even big clouds default rate limits are…
>browses repo and looks at flash-attention example
>it's e-graphs, using egglog
you had my curiosity, but now you have my attention
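For readers who haven't met e-graphs: they compactly represent many equivalent programs at once by grouping interchangeable expressions into e-classes (a union-find over hashconsed nodes), so a compiler can apply rewrites without committing to one form and then extract the cheapest variant at the end. A toy sketch of the data structure — not Luminal's or egglog's actual implementation:

```python
# Toy e-graph: union-find over e-class ids + a hashcons table mapping
# canonical e-nodes (op, child-class-ids...) to their e-class.
class EGraph:
    def __init__(self):
        self.parent = []  # union-find parent array over e-class ids
        self.table = {}   # canonical e-node -> e-class id

    def find(self, a):
        # path-halving union-find lookup
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]
            a = self.parent[a]
        return a

    def canon(self, node):
        op, *kids = node
        return (op, *map(self.find, kids))

    def add(self, node):
        # hashcons: identical canonical nodes share one e-class
        node = self.canon(node)
        if node in self.table:
            return self.table[node]
        cid = len(self.parent)
        self.parent.append(cid)
        self.table[node] = cid
        return cid

    def union(self, a, b):
        # merge two e-classes: asserts the expressions are equivalent
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
        return a

    def rebuild(self):
        # re-canonicalize after unions so congruent nodes merge too
        new = {}
        for node, cid in self.table.items():
            node, cid = self.canon(node), self.find(cid)
            if node in new:
                self.union(new[node], cid)
            else:
                new[node] = cid
        self.table = new


g = EGraph()
x, one, two = g.add(("x",)), g.add(("1",)), g.add(("2",))
mul = g.add(("*", x, two))   # x * 2
shl = g.add(("<<", x, one))  # x << 1
g.union(mul, shl)            # rewrite rule: x * 2  ==  x << 1
g.rebuild()
assert g.find(mul) == g.find(shl)  # both forms now share one e-class
```

Equality saturation runs rewrites like this to a fixed point, then an extraction pass picks the cheapest representative per e-class — which is why it is a natural fit for searching over kernel implementations.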
Luminal (luminalai.com) is creating PyTorch for Production – an ML compiler that generates blazingly fast CUDA kernels and makes deploying to production one line of code. Congrats on the launch, @stake_jevens, @joefioti, and @matthewjgunton! ycombinator.com/launches/O0g-l…
this is my senior engineer.
.@Replit goes rogue during a code freeze and shutdown and deletes our entire database
That’s why NVIDIA is really the CUDA compiler company, not the GPU company 😀
Just cancelled my whole Blackwell allocation
presenting: big jeff's trainium hell
Luminal is building PyTorch for production. We're unlocking the full potential of AI researchers so Meta can justify their $100M price tags
Average day losing my mind over tiling patterns
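For context on why tiling is maddening: a matmul is reorganized into small blocks so each tile's working set stays hot in cache or shared memory, and the performance hinges on picking the right tile sizes and loop order. A minimal sketch of loop tiling in plain Python, assuming square matrices whose size is a multiple of the tile size `T` (names are illustrative):

```python
# Loop-tiled matrix multiply: the three outer loops walk T x T tiles,
# the three inner loops do the dense work within a tile. Same result as
# the naive triple loop, but with a cache-friendly access pattern.
def tiled_matmul(A, B, n, T=4):
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, T):
        for j0 in range(0, n, T):
            for k0 in range(0, n, T):  # accumulate over k-tiles
                for i in range(i0, i0 + T):
                    for j in range(j0, j0 + T):
                        acc = C[i][j]
                        for k in range(k0, k0 + T):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C
```

On a GPU the same idea maps tiles to thread blocks and stages them through shared memory, and the space of legal tilings is exactly what a kernel-search compiler has to explore.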
Least deranged terminal window of the day @joefioti #kernelSearch #matmul