emi
@technoabsurdist
@herdora_com (yc s25). math at uchicago
we built herdora because writing cuda sucks and hiring gpu engineers is impossible. we turn slow pytorch into fast gpu code, automatically. please reach out emilio [at] herdora [dot] com if you want faster/cheaper inference.
Herdora (@herdora_ai) is the Cursor for CUDA. It automatically turns your PyTorch code into optimized GPU kernels so you don't have to write CUDA. Congrats on the launch, @technoabsurdist & @gpusteve! ycombinator.com/launches/NzG-h…
Someone’s gonna release an actual “RL for kernel development” paper without measurement errors at some point and no one will believe it
🚀 @herdora_ai launched! Cursor for CUDA "Herdora turns your slow PyTorch into fast GPU code, automatically." 🌐 fondo.ai/3GVvhCJ Congrats @technoabsurdist @gpusteve!!
sometimes I accidentally run chat without agent mode and get scared by the horrible results. how do people live like that
📜 ai doesn't run on just NVIDIA anymore - it’s running on many different chips, each with different quirks, tradeoffs, and scaling behavior. today we’re launching chipbenchmark.com - a new open-source platform to monitor the ai hardware situation.
looking forward to exciting times
Reminds me a lot of the recent wave of (very successful) systems companies that rewrote popular frameworks like Kafka to take advantage of modern storage devices. Emi is a super talented engineer + excited to see what he builds. Still so much software to write to close the gap…
📜 new blog post: amd’s mi300x gpu has huge potential for affordable, high-throughput llm inference - but it's currently underused due to software limitations. our initial optimizations already make it ~60% more cost-effective than nvidia's h100! (1/6) (🔗 links in final post)
“let’s see what happens if I bump the project to the next major CUDA release”