Henry Ko
@henryHM_ko
performance and efficiency in ML | CS @ UC Berkeley, @BerkeleyML
I wrote a new blog on TPUs -- it's been fun seeing how different they are from GPUs and also drawing things on excalidraw again✏️ henryhmko.github.io/posts/tpu/tpu.…



looking for my next thing! thinking about dropping out. would love to learn more about opportunities within hardware acceleration or interpretability. dms are open. happy to chat. would love to hear what makes you excited!
per the event description: Viren Jain is a Senior Staff Research Scientist at Google in Mountain View, California, where he leads Google’s Connectomics team, which is responsible for tools like SegCLR and TensorStore. lu.ma/067stci3
Introducing the first open-source implementation of native sparse attention: github.com/fla-org/native…. Give it a spin and cook your NSA model! 🐳🐳🐳
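For intuition, here is a toy JAX sketch of the block-selection idea behind sparse attention: score contiguous key blocks cheaply, keep the top-k blocks per query, and attend only within them. This is purely illustrative (no causal masking, no compression or sliding-window branches) and is not the fla-org kernel implementation.

# Toy sketch of "select top-k key blocks per query" sparse attention
# (illustration only; the fla-org repo implements NSA with fused kernels, not this).
import jax
import jax.numpy as jnp

def blockwise_topk_attention(q, k, v, block_size=16, top_k=4):
    # q, k, v: [seq, dim]; keys/values are grouped into contiguous blocks.
    seq, dim = k.shape
    n_blocks = seq // block_size
    k_blocks = k.reshape(n_blocks, block_size, dim)
    v_blocks = v.reshape(n_blocks, block_size, dim)
    # Cheap importance score per (query, block): query dotted with the block mean.
    block_keys = k_blocks.mean(axis=1)                            # [n_blocks, dim]
    block_scores = q @ block_keys.T                               # [seq, n_blocks]
    top_blocks = jnp.argsort(-block_scores, axis=-1)[:, :top_k]   # [seq, top_k]
    # Gather the selected blocks and run dense attention only over them.
    sel_k = k_blocks[top_blocks].reshape(seq, top_k * block_size, dim)
    sel_v = v_blocks[top_blocks].reshape(seq, top_k * block_size, dim)
    scores = jnp.einsum("sd,skd->sk", q, sel_k) / jnp.sqrt(dim)
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.einsum("sk,skd->sd", weights, sel_v)

q = k = v = jax.random.normal(jax.random.PRNGKey(0), (128, 64))
out = blockwise_topk_attention(q, k, v)   # [128, 64]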
Do you want to train massive deep learning models with ease? The 10 new tutorial notebooks in our popular UvA DL course show you how, implementing data, pipeline, and tensor parallelism (and more) from scratch in JAX+Flax! 🚀🚀 Check them out here: uvadlc-notebooks.readthedocs.io/en/latest/tuto… 🧵 1/11
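As a flavor of what the notebooks cover, here is a minimal data-parallel training step in JAX, assuming a toy linear model: every device holds a replica of the parameters, sees its own shard of the batch, and gradients are averaged with an all-reduce. This is my own sketch, not the notebooks' code.

# Minimal data parallelism with pmap: replicate params, shard the batch,
# all-reduce (pmean) gradients across devices before the update.
import functools
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

@functools.partial(jax.pmap, axis_name="data")
def train_step(params, x, y, lr=1e-2):
    loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
    # Average gradients across the "data" axis (all participating devices).
    grads = jax.lax.pmean(grads, axis_name="data")
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params, loss

n_dev = jax.local_device_count()
params = {"w": jnp.zeros((8, 1)), "b": jnp.zeros((1,))}
# Replicate the params and shard the batch along a new leading device axis.
params = jax.tree_util.tree_map(lambda p: jnp.stack([p] * n_dev), params)
x = jnp.ones((n_dev, 32, 8))   # per-device batch of 32
y = jnp.ones((n_dev, 32, 1))
params, loss = train_step(params, x, y)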
Introducing our latest technical report: Context Rot - How Increasing Input Tokens Impacts LLM Performance. Our results reveal that models do not use their context uniformly. Full report in replies.
Context windows are huge now (1M+ tokens) but context depth remains limited. Attention can only resolve one link at a time. Our tiny 5-layer model beats GPT-4.5 on a task requiring deep recursion. How? It learned to divide & conquer. Why this matters🧵
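To make "deep recursion" concrete, here is a toy chained-lookup task of my own construction (not the thread's benchmark): answering requires following d dependent links, and since one attention step resolves roughly one link, a depth-d chain needs either ~d sequential steps or a divide-and-conquer strategy.

# Toy "follow the chain" task illustrating why context *depth* matters.
# (My own toy construction, not the benchmark from the thread.)
import random

def make_chain_task(depth, n_names=1000, seed=0):
    rng = random.Random(seed)
    names = rng.sample(range(n_names), depth + 1)
    facts = [f"var{a} points to var{b}." for a, b in zip(names[:-1], names[1:])]
    rng.shuffle(facts)  # shuffle so the chain is not laid out in reading order
    prompt = " ".join(facts) + f" Starting from var{names[0]}, where do you end up after {depth} hops?"
    return prompt, f"var{names[-1]}"

prompt, answer = make_chain_task(depth=8)
print(prompt)
print("expected answer:", answer)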
Recordings: youtube.com/watch?v=Kvw_d3…
@oswaldjoh and @ninoscherrer will present MesaNet at the ASAP seminar on Tuesday, June 24 at 2 PM ET! MesaNet is a locally optimal test-time training (TTT) layer that optimizes the key-value reconstruction objective over the entire history. If you're into TTT, don't miss it!
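A minimal sketch of the objective such a layer optimizes, assuming the standard ridge-regression form of key-value reconstruction (illustrative dense math, not MesaNet's efficient implementation):

# "Locally optimal" test-time-training readout: at step t, solve the ridge
# regression from keys to values over the whole history, then apply the
# solution to the current query.
import jax
import jax.numpy as jnp

def mesa_style_readout(q, K, V, lam=1.0):
    # K, V: [t, d] keys/values seen so far; q: [d] current query.
    # W* = argmin_W  sum_i ||W k_i - v_i||^2 + lam * ||W||^2
    #    = V^T K (K^T K + lam I)^{-1}
    d = K.shape[-1]
    gram = K.T @ K + lam * jnp.eye(d)       # [d, d]
    W = V.T @ K @ jnp.linalg.inv(gram)      # [d, d]
    return W @ q                            # prediction for the current query

K = jax.random.normal(jax.random.PRNGKey(0), (32, 16))
V = jax.random.normal(jax.random.PRNGKey(1), (32, 16))
q = jax.random.normal(jax.random.PRNGKey(2), (16,))
out = mesa_style_readout(q, K, V)           # [16]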
NVIDIA Tensor Core Evolution: From Volta to Blackwell
Amdahl’s Law, strong scaling, asynchronous execution
Blackwell, Hopper, Ampere, Turing, Volta
semianalysis.com/2025/06/23/nvi…
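For reference, Amdahl's Law bounds the speedup when only a fraction p of the work benefits from an accelerator that is s times faster; the numbers below are purely illustrative.

# Amdahl's Law: speedup = 1 / ((1 - p) + p / s)
# e.g. if 90% of runtime is matmuls moved onto faster tensor cores,
# the remaining 10% caps the overall gain. Illustrative numbers only.
def amdahl(p, s):
    return 1.0 / ((1.0 - p) + p / s)

print(amdahl(p=0.90, s=10))   # ~5.3x: the serial 10% dominates
print(amdahl(p=0.90, s=1e9))  # -> 10x asymptote, no matter how fast the 90% gets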
Great deep dive into TPUs with amazing visuals by our very own @henryHM_ko!
ksim is a JAX-based framework that makes your wackiest RL ideas simple to implement. Why use it? It's modular. Trying new architectures, updating rollout logic, and reformulating your objective is as easy as overriding a method. github.com/kscalelabs/ksim
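A sketch of what that override-a-method workflow looks like; class and method names here are hypothetical placeholders, not ksim's actual API:

# Illustration of the "override a method to change behavior" pattern.
# WalkingTask, get_reward, and the state layout are placeholders, not ksim's API.
import jax.numpy as jnp

class WalkingTask:
    def get_reward(self, state):
        # Default: reward forward velocity.
        return state["base_velocity"][0]

    def get_observation(self, state):
        return jnp.concatenate([state["joint_pos"], state["joint_vel"]])

class WalkingWithUprightBonus(WalkingTask):
    def get_reward(self, state):
        # Reformulate the objective by overriding a single method.
        upright = state["base_orientation"][2]   # placeholder: z of the up vector
        return super().get_reward(state) + 0.5 * upright

task = WalkingWithUprightBonus()
state = {"base_velocity": jnp.array([1.2, 0.0, 0.0]),
         "base_orientation": jnp.array([0.0, 0.0, 1.0]),
         "joint_pos": jnp.zeros(12), "joint_vel": jnp.zeros(12)}
print(task.get_reward(state))   # 1.7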
In the last month, we’ve been building an open-source framework for robot learning and sim-to-real transfer, made for RL whole-body control, from simple walking to complex human imitation. Check out the details on HN: news.ycombinator.com/item?id=440221… Get started in 5 minutes ⬇️
(1/5) I’m pleased to share that my research with @seowondeog12052 has been accepted to RECOMB 2025 (Poster) and IEEE EMBC 2025 (Paper)! Preprint: arxiv.org/abs/2501.14469 We introduce a generative approach to pesticide design—optimizing small molecules to reduce toxicity.
Google's TPUv7 is out! ML accelerator marketing material is usually pretty inscrutable (what numbers are even comparable?), so here I'll explain concretely how this compares with Nvidia. 🧵
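One way to put accelerators on a common footing is to normalize peak FLOP/s at a stated precision against HBM bandwidth, which gives the FLOPs-per-byte ridge point of the roofline. The spec values below are made-up placeholders, not real TPUv7 or NVIDIA figures.

# Compare accelerators by peak FLOP/s, memory bandwidth, and their ratio.
# PLACEHOLDER spec values only -- not real TPUv7 or NVIDIA numbers.
specs = {
    "chip_A": {"peak_bf16_tflops": 1000.0, "hbm_tb_s": 4.0},
    "chip_B": {"peak_bf16_tflops": 1500.0, "hbm_tb_s": 5.0},
}

for name, s in specs.items():
    flops_per_byte = (s["peak_bf16_tflops"] * 1e12) / (s["hbm_tb_s"] * 1e12)
    print(f"{name}: {s['peak_bf16_tflops']} TFLOP/s, "
          f"{s['hbm_tb_s']} TB/s, ridge point ≈ {flops_per_byte:.0f} FLOPs/byte")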
excited to share what I’ve been working on @trychroma! we introduce representative generative benchmarking - custom eval sets built from your own data. link to technical report in replies
Join us on Monday 3/10 for our latest installment of the BioML @ Berkeley seminar series! We'll be learning from the exceptional Elana Simon (@ElanaPearl) about mechanistic interpretability in BioML. lu.ma/guiyjbf9
I've uploaded the latest slides & beamer source code to github.com/sustcsonglin/l…. Hopefully this repository will help train an LLM that generates Beamer slides better than I do :)
Linear Attention and Beyond: Interactive Tutorial with Songlin Yang (@SonglinYang4 MIT/Flash Linear Attention) I didn’t follow some of the recent results, so I zoomed Songlin and she explained it all to me for two hours 😂 youtu.be/d0HJvGSWw8A
After 6+ months in the making and burning over a year of GPU compute time, we're super excited to finally release the "Ultra-Scale Playbook". Check it out here: hf.co/spaces/nanotro… A free, open-source book to learn everything about 5D parallelism, ZeRO, fast CUDA kernels,…
I've created slides for those curious about the recent rapid progress in linear attention: from linear attention to Lightning-Attention, Mamba2, DeltaNet, and TTT/Titans. Check it out here: sustcsonglin.github.io/assets/pdf/tal…
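The common core running through that progression is the linear-attention recurrence; below is a minimal (unnormalized) JAX sketch of it, not any particular paper's implementation.

# Vanilla linear attention as a recurrence, the core that Lightning Attention,
# Mamba2, DeltaNet, etc. refine:
#   S_t = S_{t-1} + k_t v_t^T ,   o_t = S_t^T q_t
# i.e. causal attention with the softmax removed, computed in O(seq * d^2).
import jax
import jax.numpy as jnp

def linear_attention(q, k, v):
    # q, k: [seq, d_k], v: [seq, d_v]
    d_v = v.shape[-1]

    def step(S, qkv):
        q_t, k_t, v_t = qkv
        S = S + jnp.outer(k_t, v_t)        # rank-1 state update
        return S, S.T @ q_t                # read out with the current query

    S0 = jnp.zeros((q.shape[-1], d_v))
    _, out = jax.lax.scan(step, S0, (q, k, v))
    return out                              # [seq, d_v]

q = k = jax.random.normal(jax.random.PRNGKey(0), (64, 32))
v = jax.random.normal(jax.random.PRNGKey(1), (64, 32))
out = linear_attention(q, k, v)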
MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era We are thrilled to introduce our latest open-source models: the foundational language model MiniMax-Text-01 and the visual multi-modal model MiniMax-VL-01. 💪Innovative Lightning Attention…