Alex Cheema - e/acc
@alexocheema
Building @exolabs | prev @UniOfOxford. We're hiring: http://exolabs.net
The future of AI is open source and decentralized.
"Exo's use of Llama 405B and consumer-grade devices to run inference at scale on the edge shows that the future of AI is open source and decentralized." - @mo_baioumy x.com/ac_crypto/stat…
Apple has more FLOPS than NVIDIA
If every post-2020 Apple device lit up its Neural Engine at once, humanity would have roughly 20 zetta integer ops per second of on-device AI compute, about five times the cumulative floating-point tensor capacity of all NVIDIA GPUs sold in the same period. In practice, Nvidia’s…
‘The number of Macs that can train together coherently doubles every 2 months’; I'll call this 'Cheema's Law'. And it might sound like a joke, but it has been remarkable how much progress we've made on this problem in such a short time. When you're working in a space that is mostly…
We're doubling the number of Apple Silicon Macs that can train together coherently every 2 months. Our new KPOP optimizer was designed specifically for the hardware constraints of Apple Silicon and implemented using mlx.distributed.
introducing KPOP - a novel optimiser designed to leverage the massive 512GB RAM on the latest-gen M3 Ultra Mac Studios. matches AdamW performance by using significantly larger batch sizes. all on consumer hardware. catch @MattBeton and @tychovdo presenting this at ICML
Paper is out. Link: openreview.net/pdf?id=TJjP8d5…
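For anyone curious what "implemented using mlx.distributed" looks like in practice, here is a minimal, hypothetical sketch of the underlying data-parallel pattern: each Mac computes gradients locally, then an all-reduce averages them before the optimizer step. This is not KPOP itself; the model, loss, batch shapes, and hyperparameters are placeholders for illustration only.

```python
# Minimal data-parallel sketch with mlx.distributed (NOT the KPOP algorithm).
# Launch one process per Mac (e.g. via mpirun); with a single process this
# degenerates to ordinary local training.
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim
from mlx.utils import tree_map

group = mx.distributed.init()      # distributed group across machines
world_size = group.size()

model = nn.Linear(1024, 1024)      # placeholder model
optimizer = optim.AdamW(learning_rate=1e-4)

def loss_fn(model, x, y):
    return nn.losses.mse_loss(model(x), y)

loss_and_grad = nn.value_and_grad(model, loss_fn)

def step(x, y):
    loss, grads = loss_and_grad(model, x, y)
    # Average gradients across all machines (all-reduce), then update.
    grads = tree_map(lambda g: mx.distributed.all_sum(g) / world_size, grads)
    optimizer.update(model, grads)
    mx.eval(model.parameters(), optimizer.state)
    return loss

# Toy usage with random data, just to show the call pattern.
x = mx.random.normal((8, 1024))
y = mx.random.normal((8, 1024))
print(step(x, y))
```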
New research from Exo done (in part) with MLX on Apple silicon: An algorithm for distributed training that leverages higher RAM capacity of Apple silicon relative to FLOPs and inter-machine bandwidth.
Paper is out. Link: openreview.net/pdf?id=TJjP8d5…
EXO 💛 MLX
KPOP is a new deep-learning optimizer designed for large-scale distributed training on Apple Silicon. KPOP uses a lot more memory but is more efficient per FLOP than AdamW, so it's a better fit for hardware with a high memory:FLOPS ratio. Some hardware numbers: H100: 80GB, 1000 TFLOPS…
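A rough back-of-the-envelope illustration of that memory:FLOPS argument. The H100 figures are from the tweet above and the 512GB RAM figure is from the KPOP announcement; the M3 Ultra TFLOPS number below is an assumed ballpark for illustration, not an official spec.

```python
# Memory-per-TFLOP comparison behind the KPOP design choice.
hardware = {
    "H100":     {"memory_gb": 80,  "tflops": 1000},  # from the tweet
    "M3 Ultra": {"memory_gb": 512, "tflops": 30},    # TFLOPS is an assumed ballpark
}

for name, spec in hardware.items():
    ratio = spec["memory_gb"] / spec["tflops"]       # GB available per TFLOP
    print(f"{name:9s}: {ratio:6.2f} GB per TFLOP")

# The Mac ends up with orders of magnitude more memory per unit of compute,
# which is why an optimizer that spends memory to save FLOPs (like KPOP)
# fits Apple Silicon better than AdamW does.
```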
I'm in Vancouver this week for ICML. Let's grab a coffee if you're interested in what we're doing @exolabs or want to chat about distributed training / inference or on-device AI. DMs open.
A new approach to efficient large-scale distributed training on Apple Silicon. Most AI research today is focused on traditional GPUs. These GPUs have a LOT of FLOPS but not much memory. They have a low memory:FLOPS ratio. Apple Silicon has a lot more memory available for the GPU…
💸OVERDRAFT - the first fiat DEX. Swap fiat <> crypto in seconds. No custody, no fees, no fund freeze. Beta now live @HyperLiquidX
EXO isn't just for inference.
I’m going to be in Vancouver next week for ICML! Would love to meet anyone involved with distributed training, infrastructure, inference engines, open source AI. I'll be presenting two papers:
- EXO Gym: an open source framework for simulating distributed training algorithms…
if they ever tell my story, let them say I walked with giants; men rise and fall like the winter wheat, but these names will never die.
HomeDAO Cohort 1 produced $6bn worth of companies. Apply to be part of HomeDAO Cohort 2 now. Deadline - 15th of July.
pump is one of the fastest-growing startups ever. 0 to $1B ARR in 9 months. 25% of revenue for $PUMP buybacks is insane; I'm predicting this ends up in the top 10.
The moment you’ve all been waiting for: $PUMP is launching through an Initial Coin Offering on Saturday, July 12th. Airdrop coming soon. Our plan is to kill Facebook, TikTok, and Twitch. On Solana. Learn more about $PUMP and how to get involved 👇
We’re already doing this with @exolabs. Last month was the first trial: we provided free M-chip public cloud access to developers at a hackathon. These were M3 Max/Ultra Mac Studios with up to 512GB unified memory. @awnihannun gave a talk at the hackathon on how to leverage MLX.
NEWS: Apple is considering turning its M chips into a public cloud for developers
He applied to @exolabs last year
PSA: there’s a guy named Soham Parekh (in India) who works at 3-4 startups at the same time. He’s been preying on YC companies and more. Beware. I fired this guy in his first week and told him to stop lying / scamming people. He hasn’t stopped a year later. No more excuses.