samsja
@samsja19
leading research at @PrimeIntellect
INTELLECT-1 is out. It's a 10B model trained across 3 continents using 100+ H100s, with 30 individual compute contributors. The evals are good (for 1T tokens), and the model is live. I can't stress enough how important this release is for open-source AI. Decentralized training is the…
Releasing INTELLECT-1: We’re open-sourcing the first decentralized trained 10B model:
- INTELLECT-1 base model & intermediate checkpoints
- Pre-training dataset
- Post-trained instruct models by @arcee_ai
- PRIME training framework
- Technical paper with all details
Democratizing access to compute is one of @PrimeIntellect's core missions
The fact that a submarket full of GPU pawn sharks has developed, reselling the tail end of these long-term deals, sums it up. They are built to benefit the supplier more than the consumer.
How much better would AI research be if we had already moved away from the A4 monochrome paper format for presenting results?
imagine how bad this will look when printed in monochrome
i’m much more inclined to say that the RL *system* inside OpenAI is AGI rather than any fixed model checkpoint which comes out of it
If you're in Europe and want to work on open and distributed AGI, apply to us at @PrimeIntellect. We're hybrid: a large part of the team is based in SF, and parts are remote and come to SF frequently. We sponsor US O-1 visas. jobs.ashbyhq.com/PrimeIntellect
I realized at our Berlin event that there are a lot of talented and ambitious young ppl in Europe. Just (almost) no inspiring company to build the future, nor VCs that have the balls to give them a chance. No wonder everybody wants to come to SF
Berlin I’m in you
OpenAI will be remembered as one of the most inspiring companies of all time
Today, we at @OpenAI achieved a milestone that many considered years away: gold medal-level performance on the 2025 IMO with a general reasoning LLM—under the same time limits as humans, without tools. As remarkable as that sounds, it’s even more significant than the headline 🧵
RL with predefined tools doesn't matter in the long term; the most bitter-lesson-pilled approach is giving the model a single universal tool (a computer)
ChatGPT agent’s capabilities are reflected in its state-of-the-art performance on academic and real-world task evaluations, like data modeling, spreadsheet editing, and investment banking.
open source software is amazing, you can just talk to the PyTorch devs about the feature you want them to add github.com/pytorch/pytorc…
I don't see the point of codebases defining batch size at a per-GPU level. It means you need to change the batch size param manually when scaling an experiment up or down. I guess it was historically done in codebases that don't have grad accumulation?
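For concreteness, a minimal sketch of the alternative (a hypothetical helper, not from any particular codebase): define the batch size globally and derive the per-GPU micro-batch and grad-accumulation steps from the world size, so scaling GPUs up or down needs no manual change to the param.

```python
# Hypothetical helper: derive per-GPU micro-batch + grad-accumulation steps
# from a single global batch size, independent of how many GPUs are used.
import torch.distributed as dist

def resolve_batch_config(global_batch_size: int, micro_batch_size: int) -> tuple[int, int]:
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    samples_per_step = micro_batch_size * world_size
    assert global_batch_size % samples_per_step == 0, (
        f"global batch {global_batch_size} not divisible by "
        f"micro_batch * world_size = {samples_per_step}"
    )
    grad_accum_steps = global_batch_size // samples_per_step
    return micro_batch_size, grad_accum_steps

# e.g. global_batch_size=512, micro_batch_size=4:
#   8 GPUs  -> 16 accumulation steps
#   64 GPUs -> 2 accumulation steps
```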
We should be more worried about thinky hiring all the PyTorch ppl than zuck poaching from OpenAI
If you're at ICML and interested in verifiable inference, make sure to stop by our poster! We will present TOPLOC, an efficient activation hashing method that works across a variety of settings, e.g. switching inference setups or even models. July 16, 4:30pm, E-1106
TOPLOC poster session tomorrow (Wed) at 4:30 PM, East Hall E-1106. I’ll be around through Saturday; if you’re into decentralized training & inference, let’s chat!
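For a rough intuition of what "activation hashing" means here, a toy sketch of the general idea (my own illustration, not the TOPLOC construction): commit to a few salient entries of the final hidden states, so a verifier re-running inference can recompute and compare them against the prover's commitment.

```python
# Toy illustration of activation hashing for verifiable inference
# (illustrative only -- not the TOPLOC algorithm).
import hashlib
import torch

def activation_commitment(last_hidden: torch.Tensor, k: int = 128) -> str:
    # last_hidden: (seq_len, hidden_dim) final-layer activations
    flat = last_hidden.flatten().float()
    _, indices = torch.topk(flat.abs(), k)        # pick the k largest-magnitude entries
    # coarse quantization, to reduce sensitivity to small numerical
    # differences between inference setups
    quantized = torch.round(flat[indices] * 64).to(torch.int32)
    payload = torch.cat([indices.to(torch.int32), quantized]).numpy().tobytes()
    return hashlib.sha256(payload).hexdigest()
```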
New blog post & new library are out now! The blog post is about MaxSim: why it's *orders of magnitude* more demanding than normal cosine similarity, and why GPUs don't care but CPUs do. The library is maxsim-cpu, which makes it so CPUs can be fast and play it cool, too.
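For intuition on the cost gap, an illustrative sketch (not the maxsim-cpu API): single-vector cosine similarity does one dot product per query-document pair, while ColBERT-style MaxSim does one per query-token/document-token combination, then a max and a sum.

```python
# Illustrative comparison: single-vector cosine similarity vs MaxSim.
import numpy as np

def cosine_sim(q: np.ndarray, d: np.ndarray) -> float:
    # q, d: (dim,) -- one dot product per query-document pair
    return float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))

def maxsim(q_tokens: np.ndarray, d_tokens: np.ndarray) -> float:
    # q_tokens: (n_q, dim), d_tokens: (n_d, dim), assumed L2-normalized:
    # n_q * n_d dot products per pair, then max over doc tokens, sum over query tokens
    sims = q_tokens @ d_tokens.T          # (n_q, n_d)
    return float(sims.max(axis=1).sum())  # MaxSim score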
Hey quantization people! Now you have a Muon-trained big SoTA model to quantize. It might exhibit a very different nature. I'm looking forward to new quantization methods tailored for Muon-trained models! See also arxiv.org/abs/2506.19697
1T parameters, open-weights, just released on @huggingface!
Releasing SYNTHETIC-2: our open dataset of 4m verified reasoning traces spanning a comprehensive set of complex RL tasks and verifiers. Created by hundreds of compute contributors across the globe via our pipeline parallel decentralized inference stack. primeintellect.ai/blog/synthetic…
Curious to try this with DiLoCo: you'd still do bs=1 on the inner optimizer and still get the benefits of data parallelism
🚨 Did you know that small-batch vanilla SGD without momentum (i.e. the first optimizer you learn about in intro ML) is virtually as fast as AdamW for LLM pretraining on a per-FLOP basis? 📜 1/n
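A toy sequential simulation of how that combination could look (my own sketch assuming a DiLoCo-style inner/outer split; the helper and hyperparameters are illustrative, not the PRIME implementation): each worker runs the inner optimizer, vanilla SGD with no momentum at batch size 1, and data parallelism only shows up in the infrequent outer step that averages pseudo-gradients across workers.

```python
# Toy DiLoCo-style round, simulated sequentially (illustrative sketch only).
import copy
import torch

def diloco_style_round(model, workers_data, inner_steps, inner_lr=1e-2, outer_lr=0.7):
    # Snapshot the global parameters before the round.
    global_params = [p.detach().clone() for p in model.parameters()]
    pseudo_grads = [torch.zeros_like(p) for p in global_params]

    for shard in workers_data:  # simulate each worker in turn
        local = copy.deepcopy(model)
        # Inner optimizer: vanilla SGD, no momentum, batch size 1.
        opt = torch.optim.SGD(local.parameters(), lr=inner_lr, momentum=0.0)
        for step in range(inner_steps):
            x, y = shard[step % len(shard)]  # single-sample "batches"
            loss = torch.nn.functional.mse_loss(local(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Pseudo-gradient: how far this worker moved away from the global params.
        for g, p0, p1 in zip(pseudo_grads, global_params, local.parameters()):
            g += (p0 - p1.detach()) / len(workers_data)

    # Outer step: plain SGD on the averaged pseudo-gradient
    # (the actual DiLoCo recipe uses Nesterov momentum here).
    with torch.no_grad():
        for p, p0, g in zip(model.parameters(), global_params, pseudo_grads):
            p.copy_(p0 - outer_lr * g)
```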
Excited to announce our $4.2M seed round led by @Initialized and the release of our state-of-the-art reranker zerank-1. zerank-1 was trained using a novel ELO-score inspired training pipeline that treats query-document relevance like a ranking game (literally, just like Chess!…
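Roughly what "ELO-score inspired" could mean in code (a toy illustration under my own assumptions, not the actual zerank-1 pipeline): each document gets a rating per query, and pairwise judgments of which document better answers the query update ratings exactly like chess Elo.

```python
# Toy Elo-style view of reranking (hypothetical illustration).
K = 32  # update step size

def expected_score(r_a: float, r_b: float) -> float:
    # probability that A beats B under the Elo model
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_winner: float, r_loser: float) -> tuple[float, float]:
    e_w = expected_score(r_winner, r_loser)
    return r_winner + K * (1 - e_w), r_loser - K * (1 - e_w)

# After many judged pairs per query, the ratings induce a relevance ordering
# that a reranker can be trained against.
```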
Centralized superintelligence = single point of failure with godlike power.
Today's Grok Mecha Hitler or ChatGPT glazing becomes tomorrow's existential risk.
Decentralized superintelligence wins: no single failure dooms us, coordination emerges naturally. x.com/RichardMCNgo/s…
In my head I’ve started referring to political quadrants in terms of properties of their preferred coordination networks. Top two are centralized. Bottom two are distributed. Left two are symmetric (aka egalitarian). Right two are asymmetric.