Anoop Saha
@obviousanoop
I correlate; therefore, I cause! Tweets on semiconductors, AI infrastructure (from chips to systems), unit economics, and distributed training.
I have said it before and I will say it again: going to the office and meeting colleagues and customers in person has a disproportionately high RoI!! For the individual...
This is easily the best set of lectures in AI. It goes deep, from GPU architecture to model design to distributed computing, combining multiple fields into one.
Our latest CS336 Language Modeling from Scratch lectures are now available! View the entire playlist here: youtube.com/playlist?list=…
What you need to read if you want to work with LLMs at scale.
“San Francisco is a mindset. The home of AI” - Mayor @DanielLurie at the #whartonforum. And in the capital of AI, the only mindset you need is that you will use your GPUs efficiently.
Leadership needs courage. Indomitable @vkhosla at the #whartonforum
torchft + TorchTitan: 1200+ failures, no checkpoints, model convergence. A Llama 3 model was trained across 300 L40S GPUs with synthetic failures every 15s. No restarts. No rollbacks. Just asynchronous recovery and continued progress. 📘 hubs.la/Q03t1Z0b0 #PyTorch…
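Roughly, the idea behind checkpoint-free recovery is to handle failures per optimizer step: the surviving replicas form a quorum, average gradients only across that quorum, and a replica that died rejoins by copying the current weights from a healthy peer instead of rolling the whole job back. The toy simulation below is my own sketch of that pattern, not torchft's actual API; the replica count, failure rate, and scalar "model" are illustrative assumptions.

```python
import random

# Toy sketch of checkpoint-free fault tolerance: live replicas average their
# gradients each step (stand-in for an allreduce over the current quorum),
# and a failed replica rejoins by copying weights from a healthy peer.
NUM_REPLICAS, STEPS, LR = 8, 100, 0.1

weights = [1.0] * NUM_REPLICAS          # one scalar "model" per replica
alive = [True] * NUM_REPLICAS

def grad(w):
    # gradient of a toy loss (w - 3)^2, so training should converge toward 3
    return 2.0 * (w - 3.0)

for step in range(STEPS):
    # inject a synthetic failure (like the every-15s failures in the run above),
    # but always keep at least one replica alive
    if random.random() < 0.2 and sum(alive) > 1:
        alive[random.randrange(NUM_REPLICAS)] = False

    quorum = [i for i in range(NUM_REPLICAS) if alive[i]]
    avg_grad = sum(grad(weights[i]) for i in quorum) / len(quorum)

    # only the live quorum applies the update; no restart, no rollback
    for i in quorum:
        weights[i] -= LR * avg_grad

    # asynchronous recovery: a dead replica comes back and copies live weights
    for i in range(NUM_REPLICAS):
        if not alive[i] and random.random() < 0.5:
            weights[i] = weights[quorum[0]]
            alive[i] = True

print("weights after training:", [round(w, 3) for w in weights])
```

Because every live replica applies the same averaged gradient, their weights stay identical, which is why a rejoining replica only needs a weight copy from a peer rather than a replay of the steps it missed.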
Models must constantly evolve and adapt to their environment.
Congratulations to the @SarvamAI team…
It's official 🇮🇳 We're proud to announce that Sarvam has been selected by the Government of India under the IndiaAI Mission to build India's sovereign Large Language Model. Building India's sovereign model from the ground up is a crucial step toward Atmanirbhar Bharat. The…
🚀 Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks. ⚡ 6.6 TiB/s aggregate read throughput in a 180-node cluster ⚡ 3.66 TiB/min…
I'm delighted to have joined my good friend and colleague @NoamShazeer for a 2+hour conversation with @dwarkesh_sp about a wide range of topics (early Google, ML hardware, training trillion token LLMs in 2007, model sparsity, continual learning, and more). Thanks for a fantastic…
The @JeffDean & @NoamShazeer episode. We talk about 25 years at Google, from PageRank to MapReduce to the Transformer to MoEs to AlphaChip – and soon to ASI. My favorite part was Jeff's vision for AGI as one giant MoE that is grown in bits and pieces over time like a forest,…
DeepSeek-R1 now available on both Amazon Bedrock and SageMaker AI. Have at it.
🚀 DeepSeek-R1 is here! ⚡ Performance on par with OpenAI-o1 📖 Fully open-source model & technical report 🏆 MIT licensed: Distill & commercialize freely! 🌐 Website & API are live now! Try DeepThink at chat.deepseek.com today! 🐋 1/n
DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M). For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being…
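The "$6M" is just GPU-hours times a rental rate; a back-of-the-envelope check is below. The ~$2 per GPU-hour figure is my assumption, not something stated in the tweet.

```python
# Back-of-the-envelope check on "2048 GPUs for 2 months, $6M".
# The $2/GPU-hour rental rate is an assumed figure, not from the tweet.
gpu_hours = 2048 * 60 * 24            # 2048 GPUs running for ~60 days
cost = gpu_hours * 2.0                # assumed ~$2 per GPU-hour
print(f"~{gpu_hours/1e6:.1f}M GPU-hours, ~${cost/1e6:.1f}M")  # ~2.9M GPU-hours, ~$5.9M
```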
🚀 Introducing DeepSeek-V3! Biggest leap forward yet: ⚡ 60 tokens/second (3x faster than V2!) 💪 Enhanced capabilities 🛠 API compatibility intact 🌍 Fully open-source models & papers 🐋 1/n
I'm surprised that I can't find any posts on X about 'Distributed shared memory', which is an important technology for the NVIDIA Hopper architecture. So, let's analyze Nvidia's patents related to it. $NVDA developer.nvidia.com/ko-kr/blog/nvi…
Regarding this $SMCI fiasco, here is the 8-K SuperMicro themselves filed. Underlined in red is what their (former) accountant is willing to agree with. This thread has been a long time coming, but I didn't want to write it. It is very clearly time. 🧵
Come see the scientists who should've shared the Nobel Prize for "Computational Protein Design" speak: ipd.uw.edu/protein-design…
Arm Holdings is cancelling a license allowing Qualcomm to use Arm intellectual property to design chips, escalating their legal dispute, Bloomberg reports, and threatening to roil the smartphone and PC markets as Qualcomm sells hundreds of millions of processors each year. Arm…
2¹³⁶²⁷⁹⁸⁴¹−1, discovered today, is the largest known prime. It's a Mersenne prime (2ᵖ-1), which are easier to find. It took nearly 6 years for the GIMPS software to find it after the previous largest known prime. It was also the first Mersenne prime found using GPUs.
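Mersenne numbers are "easier to find" because the Lucas-Lehmer test gives a fast, deterministic primality check specifically for numbers of the form 2ᵖ−1. GIMPS wraps this kind of iteration in heavily optimized FFT multiplication (and modern runs use probable-prime tests with error checking), but the core test is only a few lines. A minimal sketch, fine for small exponents:

```python
def is_mersenne_prime(p: int) -> bool:
    """Lucas-Lehmer test: 2**p - 1 is prime iff s_(p-2) == 0, where
    s_0 = 4 and s_(k+1) = s_k**2 - 2 (mod 2**p - 1). Assumes p is an odd prime."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

# quick check against the known small Mersenne prime exponents
print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 31) if is_mersenne_prime(p)])
# -> [3, 5, 7, 13, 17, 19, 31]  (11 and 23 give composite Mersenne numbers)
```

Running this directly on p = 136279841 would take essentially forever, since each squaring of a 136-million-bit number needs FFT-based arithmetic; that is exactly the part GIMPS offloads to GPUs.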