Alexander Long
@_AlexanderLong
Founder @PluralisHQ | ML PhD. Protocol Learning: multi-participant, low-bandwidth model-parallel training
When I started Pluralis I thought it was going to take 2-3 years and several papers to get to this point. I was considered borderline delusional to hold that view. Well: the computational graph itself is split over nodes. Nodes are physically on different continents. 8B model. No…
Almost exactly 1 year ago today Pluralis set out to solve a very difficult problem. We are pleased to announce an update on that problem. We can now combine small consumer devices, connected via the internet, to train giant models. Full paper release in 72 hours.
Welcome to DeAI Summer. My op-ed in @coindesk on the progress we're making in decentralized AI. coindesk.com/opinion/2025/0…
pretraining is an elegant science, done by mathematicians who sit in cold rooms writing optimization theory on blackboards and engineers with total absorption in distributed systems of titanic scale. posttraining is hair-raising cowboy research where people drink a lot of diet coke…
Missing the point that it'll be a system whose behaviour is controlled by a few people, in companies that don't have the best track record when it comes to this kind of thing. It's fundamentally a level of power that's never existed before; it's that simple.
Sam Altman says people are growing emotionally overreliant on AI, especially young people who can't make any decisions without ChatGPT. Imagine a U.S. president or CEO handing control to a system they don't fully understand: "ChatGPT-7, you're in charge. Good luck."
That's a wrap for ICML 2025. Incredible to watch the space go from "What are you talking about" to "That's impossible" to "Hmmm, that's very interesting" in just over a year. @tha_ajanthan @hmdolatabadi




Jack Dorsey says AI must be permissionless because constraint kills innovation. Five CEOs shouldn't dictate what brings humanity forward. Open source is the answer. To protect ourselves, we have to race ahead. Eliminating single points of failure before they become…
If you claim to have seen this coming, you better have the manifold P&L to back it up
Noam tends to not exaggerate.
Where does this go? As fast as recent AI progress has been, I fully expect the trend to continue. Importantly, I think we’re close to AI substantially contributing to scientific discovery. There’s a big difference between AI slightly below top human performance vs slightly above.
Totally agree - Flower Labs is another group actively publishing great stuff and now squarely focused on decentralised training. Should be a major datapoint for everyone still skeptical of the area - the Flower team is as legitimate as it gets and Nic Lane is pretty much top of the…
Congrats on the paper @_AlexanderLong. But you left out @flwrlabs, which published a full system (Photon) with validated in-the-wild fully decentralized training up to 13B at @MLSysConf. Along with a key technique of the decentralized stack (decoupled embeddings) published as an oral…
From my experience, getting a paper on decentralized DL accepted to top-level conferences can be quite tough. The motivation is not familiar to many reviewers, and standard experiment settings don't account for the problems you aim to solve. Hence, I'm very excited to see…
For people not familiar with AI publishing: there are 3 main conferences every year - ICML, ICLR and NeurIPS. These are technical conferences and the equivalent of journals in other disciplines - they are the main publishing venue for AI. The competition to have papers at these…
Feel like Meta closing its models was very predictable. I explicitly said this would happen last year and explained why (from blog.pluralis.ai/p/article-2-pr…).
RIP to the unicorn AI startups that have zero products, zero foundation models, and were just going to depend on big labs releasing open-source models for free to model merge. I know one or two.
Hidden in the article: "Furthermore, there have been some billion dollar offers that were not accepted by researcher/engineering leadership at OpenAI." I believe that if @dylan522p wrote it, it's true... but how can that be possible?
Zucc confirms that they are building multiple GW clusters. The 1000MW Prometheus will come online in 2026, and Hyperion will be built in phases, able to scale to 5000MW+. For context, the largest H100/H200 clusters operational today are only 150-200MW. Link to full article below…
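For a rough sense of the jump those numbers imply, a back-of-envelope comparison using only the figures quoted above:

```python
# Back-of-envelope scale comparison using the figures quoted in the post above.
largest_today_mw = 175   # midpoint of the quoted 150-200MW range
prometheus_mw = 1000     # Prometheus, online 2026
hyperion_mw = 5000       # Hyperion at full build-out

print(f"Prometheus vs largest cluster today: ~{prometheus_mw / largest_today_mw:.0f}x")  # ~6x
print(f"Hyperion vs largest cluster today:   ~{hyperion_mw / largest_today_mw:.0f}x")    # ~29x
```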
Spoke 50 minutes straight to a packed room of cracked AI researchers at ICML, presenting work by @akashnet_, @PrimeIntellect, @gensynai, @NousResearch, @PluralisHQ, and @GoogleDeepMind. There is now an enormous interest in DeAI. Mission (Partially) Accomplished.
@gregosuri LIVE at #ICML2025 on “Distributed Computing Architectures as a Solution to AI's Energy Crisis: Empirical Analysis of Decentralized Training”
People forget that policy-gradient-based RL is the most data-inefficient form of training. There are going to be major algorithmic advances in RL'ing the base models, probably using something like artificial curiosity (arxiv.org/pdf/1705.05363). But the current methods are not there.
Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…
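A minimal sketch of the update rule being described - plain REINFORCE on a toy bandit, a hypothetical illustration rather than anyone's production setup. The point is that one scalar "this went well / poorly" per trajectory is all the signal the gradient gets, which is exactly where the data-inefficiency comes from:

```python
import torch

# Toy categorical policy: logits over a handful of actions.
logits = torch.zeros(4, requires_grad=True)
opt = torch.optim.SGD([logits], lr=0.1)

def sample_episode():
    # Hypothetical environment: reward is higher when action 2 is chosen.
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    reward = 1.0 if action.item() == 2 else 0.0
    return dist.log_prob(action), reward

for step in range(200):
    log_prob, reward = sample_episode()
    # REINFORCE: nudge the probability of the sampled action up if the episode
    # went well, down if it went poorly. A single scalar reward per trajectory
    # is the entire learning signal.
    loss = -reward * log_prob
    opt.zero_grad()
    loss.backward()
    opt.step()
```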
Pluralis has a main track paper at ICML this week and the team is in Vancouver running several events. The best will be the Open Source Mixer we're running with @gensynai and @akashnet_ on Thursday night. For anyone interested in decentralised training it should be a great evening…
Using beautiful Grafana dashboards for everything internally - so much nicer than Tensorboard. Wandb is still good but doesn't really work with decentralised training. Makes me wonder what the internal vis tooling is like at OpenAI - must be incredible.




In Dec 2023 I spent like 3 days non-stop compulsively reading all of Max's papers. SWARM had 1 citation at that point. No one I talked to had ever heard of it. Decided to start Pluralis after reading. The only other time I ever felt like that was 2015, when I learned about RL.
The research paper video review of "Swarm Parallelism", along with the author @m_ryabinin, Distinguished Research Scientist @togethercompute, is now out! Link below 👇 For context, most decentralized training today follows DDP-style approaches requiring full model replication on…
Here’s an accessible breakdown of @PluralisHQ’s incredible paper. When we train large models on decentralized networks, the idea is to break them down into pieces and have different nodes process the different pieces. There are a few ways to do this. One way is low hanging…
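A toy sketch of that splitting idea - pipeline-style model parallelism over a made-up layer stack. This is an assumption-laden illustration of the general technique, not the Pluralis/SWARM protocol itself, which adds compression, scheduling and fault tolerance on top; here the "nodes" are just stages in a list and the "network hops" are function calls:

```python
import torch
import torch.nn as nn

# Toy model: a stack of identical layers standing in for a transformer.
full_model = nn.Sequential(*[nn.Linear(512, 512) for _ in range(8)])

def split_into_stages(model, num_stages):
    # Cut the layer stack into contiguous pieces; in a decentralized setting
    # each piece would live on a different participant's device.
    layers = list(model)
    per_stage = len(layers) // num_stages
    return [nn.Sequential(*layers[i * per_stage:(i + 1) * per_stage])
            for i in range(num_stages)]

stages = split_into_stages(full_model, num_stages=4)  # e.g. 4 participants

x = torch.randn(16, 512)
for stage in stages:   # each hop stands in for a transfer over the internet
    x = stage(x)       # only activations cross the stage boundary
print(x.shape)         # torch.Size([16, 512])
```

The thing that travels between participants is the activation tensor (and its gradient on the backward pass), not the weights - which is why the bandwidth between stages, rather than total model size, becomes the binding constraint.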
We've reached a major milestone in fully decentralized training: for the first time, we've demonstrated that a large language model can be split and trained across consumer devices connected over the internet - with no loss in speed or performance.