samsja
@samsja19
leading research at @PrimeIntellect
INTELLECT-1 is out. It's a 10B model trained across 3 continents using 100+ H100s, with 30 individual compute contributors. The evals are good (for 1T tokens), and the model is live. I can't stress enough how important this release is for open-source AI. Decentralized training is the…
Releasing INTELLECT-1: We’re open-sourcing the first decentralized trained 10B model:
- INTELLECT-1 base model & intermediate checkpoints
- Pre-training dataset
- Post-trained instruct models by @arcee_ai
- PRIME training framework
- Technical paper with all details
Democratizing access to compute is one of @PrimeIntellect's core missions
The fact that a submarket full of GPU pawn sharks has developed, reselling the tail end of these long-term deals, sums it up. They are built to benefit the supplier more than the consumer.
How much better would AI research be if we had already moved away from the A4 monochrome paper format for presenting results?
imagine how bad this will look when printed in monochrome
i’m much more inclined to say that the RL *system* inside OpenAI is AGI rather than any fixed model checkpoint which comes out of it
If you're in Europe and want to work on open and distributed AGI, apply to us at @PrimeIntellect. We're hybrid: a large part of the team is based in SF, and parts are remote and come to SF frequently. We sponsor US O-1 visas. jobs.ashbyhq.com/PrimeIntellect
I realized at our Berlin event that there are a lot of talented and ambitious young ppl in Europe. Just (almost) no inspiring company to build the future, nor VCs that have the balls to give them a chance. No wonder everybody wants to come to SF
Berlin I’m in you
OpenAI will be remembered as one of the most inspiring companies of all time
Today, we at @OpenAI achieved a milestone that many considered years away: gold medal-level performance on the 2025 IMO with a general reasoning LLM—under the same time limits as humans, without tools. As remarkable as that sounds, it’s even more significant than the headline 🧵
RL with predefined tools doesn't matter in the long term; the most bitter-lesson-pilled approach is giving the model a single universal tool (a computer)
ChatGPT agent’s capabilities are reflected in its state-of-the-art performance on academic and real-world task evaluations, like data modeling, spreadsheet editing, and investment banking.
open source software is amazing, you can just talk to the PyTorch devs about the feature you want them to add github.com/pytorch/pytorc…
I don't see the point of codebases defining batch size at a per-GPU level. It means you need to change the batch size param manually when scaling an experiment up or down. I guess it was historically done in codebases that don't have grad accumulation?
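For concreteness, a minimal sketch of the alternative (a hypothetical helper, not from any particular codebase): define the batch size globally and derive the per-GPU micro-batch and grad-accumulation steps from the world size, so scaling GPUs up or down needs no manual change to the param.

```python
# Hypothetical helper: derive per-GPU micro-batch + grad-accumulation steps
# from a single global batch size, independent of how many GPUs are used.
import torch.distributed as dist

def resolve_batch_config(global_batch_size: int, micro_batch_size: int) -> tuple[int, int]:
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    samples_per_step = micro_batch_size * world_size
    assert global_batch_size % samples_per_step == 0, (
        f"global batch {global_batch_size} not divisible by "
        f"micro_batch * world_size = {samples_per_step}"
    )
    grad_accum_steps = global_batch_size // samples_per_step
    return micro_batch_size, grad_accum_steps

# e.g. global_batch_size=512, micro_batch_size=4:
#   8 GPUs  -> 16 accumulation steps
#   64 GPUs -> 2 accumulation steps
```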
We should be more worried about thinky hiring all the PyTorch ppl than zuck poaching from OpenAI
If you're at ICML and interested in verifiable inference, make sure to stop by our poster! We will present TOPLOC, an efficient activation hashing method that works across a variety of settings, e.g. switching inference setups or even models. July 16, 4:30pm, E-1106
TOPLOC poster session tomorrow (Wed) at 4:30 PM, East Hall E-1106. I’ll be around through Saturday; if you’re into decentralized training & inference, let’s chat!
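For a rough intuition of what "activation hashing" means here, a toy sketch of the general idea (my own illustration, not the TOPLOC construction): commit to a few salient entries of the final hidden states, so a verifier re-running inference can recompute and compare them against the prover's commitment.

```python
# Toy illustration of activation hashing for verifiable inference
# (illustrative only -- not the TOPLOC algorithm).
import hashlib
import torch

def activation_commitment(last_hidden: torch.Tensor, k: int = 128) -> str:
    # last_hidden: (seq_len, hidden_dim) final-layer activations
    flat = last_hidden.flatten().float()
    _, indices = torch.topk(flat.abs(), k)        # pick the k largest-magnitude entries
    # coarse quantization, to reduce sensitivity to small numerical
    # differences between inference setups
    quantized = torch.round(flat[indices] * 64).to(torch.int32)
    payload = torch.cat([indices.to(torch.int32), quantized]).numpy().tobytes()
    return hashlib.sha256(payload).hexdigest()
```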
New blog post & new library are out now! The blog post is about MaxSim: why it's *orders of magnitude* more demanding than normal cosine similarity, and why GPUs don't care but CPUs do. The library is maxsim-cpu, which makes it so CPUs can be fast and play it cool, too.
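For intuition on the cost gap, an illustrative sketch (not the maxsim-cpu API): single-vector cosine similarity does one dot product per query-document pair, while ColBERT-style MaxSim does one per query-token/document-token combination, then a max and a sum.

```python
# Illustrative comparison: single-vector cosine similarity vs MaxSim.
import numpy as np

def cosine_sim(q: np.ndarray, d: np.ndarray) -> float:
    # q, d: (dim,) -- one dot product per query-document pair
    return float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))

def maxsim(q_tokens: np.ndarray, d_tokens: np.ndarray) -> float:
    # q_tokens: (n_q, dim), d_tokens: (n_d, dim), assumed L2-normalized:
    # n_q * n_d dot products per pair, then max over doc tokens, sum over query tokens
    sims = q_tokens @ d_tokens.T          # (n_q, n_d)
    return float(sims.max(axis=1).sum())  # MaxSim score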
Hey quantization people! Now you have a Muon-trained big SoTA model to quantize. It might exhibit a very different nature. I'm looking forward to new quantization methods tailored for Muon-trained models! See also arxiv.org/abs/2506.19697
1T parameters, open-weights, just released on @huggingface!
Releasing SYNTHETIC-2: our open dataset of 4m verified reasoning traces spanning a comprehensive set of complex RL tasks and verifiers. Created by hundreds of compute contributors across the globe via our pipeline parallel decentralized inference stack. primeintellect.ai/blog/synthetic…
Curious to try this with DiLoCo: you'd still do bs=1 on the inner optimizer and still get the benefits of data parallelism
🚨 Did you know that small-batch vanilla SGD without momentum (i.e. the first optimizer you learn about in intro ML) is virtually as fast as AdamW for LLM pretraining on a per-FLOP basis? 📜 1/n
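A toy sequential simulation of how that combination could look (my own sketch assuming a DiLoCo-style inner/outer split; the helper and hyperparameters are illustrative, not the PRIME implementation): each worker runs the inner optimizer, vanilla SGD with no momentum at batch size 1, and data parallelism only shows up in the infrequent outer step that averages pseudo-gradients across workers.

```python
# Toy DiLoCo-style round, simulated sequentially (illustrative sketch only).
import copy
import torch

def diloco_style_round(model, workers_data, inner_steps, inner_lr=1e-2, outer_lr=0.7):
    # Snapshot the global parameters before the round.
    global_params = [p.detach().clone() for p in model.parameters()]
    pseudo_grads = [torch.zeros_like(p) for p in global_params]

    for shard in workers_data:  # simulate each worker in turn
        local = copy.deepcopy(model)
        # Inner optimizer: vanilla SGD, no momentum, batch size 1.
        opt = torch.optim.SGD(local.parameters(), lr=inner_lr, momentum=0.0)
        for step in range(inner_steps):
            x, y = shard[step % len(shard)]  # single-sample "batches"
            loss = torch.nn.functional.mse_loss(local(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Pseudo-gradient: how far this worker moved away from the global params.
        for g, p0, p1 in zip(pseudo_grads, global_params, local.parameters()):
            g += (p0 - p1.detach()) / len(workers_data)

    # Outer step: plain SGD on the averaged pseudo-gradient
    # (the actual DiLoCo recipe uses Nesterov momentum here).
    with torch.no_grad():
        for p, p0, g in zip(model.parameters(), global_params, pseudo_grads):
            p.copy_(p0 - outer_lr * g)
```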
Excited to announce our $4.2M seed round led by @Initialized and the release of our state-of-the-art reranker zerank-1. zerank-1 was trained using a novel ELO-score inspired training pipeline that treats query-document relevance like a ranking game (literally, just like Chess!…
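Roughly what "ELO-score inspired" could mean in code (a toy illustration under my own assumptions, not the actual zerank-1 pipeline): each document gets a rating per query, and pairwise judgments of which document better answers the query update ratings exactly like chess Elo.

```python
# Toy Elo-style view of reranking (hypothetical illustration).
K = 32  # update step size

def expected_score(r_a: float, r_b: float) -> float:
    # probability that A beats B under the Elo model
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_winner: float, r_loser: float) -> tuple[float, float]:
    e_w = expected_score(r_winner, r_loser)
    return r_winner + K * (1 - e_w), r_loser - K * (1 - e_w)

# After many judged pairs per query, the ratings induce a relevance ordering
# that a reranker can be trained against.
```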
Centralized superintelligence = single point of failure with godlike power.
Today's Grok Mecha Hitler or ChatGPT glazing becomes tomorrow's existential risk.
Decentralized superintelligence wins: no single failure dooms us, coordination emerges naturally. x.com/RichardMCNgo/s…
In my head I’ve started referring to political quadrants in terms of properties of their preferred coordination networks. Top two are centralized. Bottom two are distributed. Left two are symmetric (aka egalitarian). Right two are asymmetric.