Mika Senghaas
@mikasenghaas
research @primeintellect, msc data science @epfl
We did it — SYNTHETIC‑2 is complete. A planetary-scale decentralized inference run generating 4M verified reasoning samples. 1,250+ GPUs joined in 3 days — from 4090s to H200s — creating data for complex RL tasks. Full open-source release + technical report coming next week!
with synthetic-2 we scaled heterogeneity across not 1, not 2, but 3 axes in a single shot. we ran on a variety of models, tasks, and hardware, all while allowing completely permissionless compute contributions. on the surface it might just seem like a dataset release, but so much…
Launching SYNTHETIC-2: our next-gen open reasoning dataset and planetary-scale synthetic data generation run. Powered by our P2P inference stack and DeepSeek-R1-0528, it verifies traces for the hardest RL tasks. Contribute towards AGI via open, permissionless compute.
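For readers wondering what "verified" traces means in practice, here is a minimal sketch of one common verification pattern for math-style RL tasks: extract the final answer from a generated trace and compare it against a known reference. The \boxed{} convention and the function names are illustrative assumptions, not the actual SYNTHETIC-2 verifier.

```python
import re

def extract_final_answer(trace: str) -> str | None:
    """Pull the last \\boxed{...} expression out of a reasoning trace."""
    matches = re.findall(r"\\boxed\{([^{}]+)\}", trace)
    return matches[-1].strip() if matches else None

def verify_trace(trace: str, reference: str) -> bool:
    """Keep a trace only if its final answer matches the reference answer."""
    answer = extract_final_answer(trace)
    return answer is not None and answer == reference.strip()

# Traces whose final answer matches the reference are kept as verified samples.
trace = "... adding the two terms gives \\boxed{42}"
print(verify_trace(trace, "42"))  # True
```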
insane to see @mike64_t and @_mario_neo_ pull this off over the last few months
Introducing PCCL, the Prime Collective Communications Library — a low-level communication library built for decentralized training over the public internet, with fault tolerance as a core design principle. In testing, PCCL achieves up to 45 Gbit/s of bandwidth across datacenters…
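PCCL's real interface isn't shown in the post, so the following is only a toy simulation of the design principle named above: a collective reduction that survives peers dropping out mid-step instead of aborting the whole run. Peer, contribute, and fault_tolerant_allreduce are hypothetical names with no relation to PCCL's actual API.

```python
import random

class Peer:
    """Stand-in for a remote GPU contributing gradients over the internet."""
    def __init__(self, name: str, grad: float):
        self.name, self.grad, self.alive = name, grad, True

    def contribute(self) -> float:
        # Simulate an unreliable peer: it may drop the connection mid-reduction.
        if random.random() < 0.1:
            self.alive = False
            raise ConnectionError(f"{self.name} dropped")
        return self.grad

def fault_tolerant_allreduce(peers: list[Peer]) -> float:
    """Average gradients over whichever peers are still reachable.

    A datacenter collective would typically abort the job on the first
    failure; here the reduction is simply retried over the surviving peers.
    """
    while True:
        alive = [p for p in peers if p.alive]
        if not alive:
            raise RuntimeError("all peers lost")
        try:
            return sum(p.contribute() for p in alive) / len(alive)
        except ConnectionError as err:
            print(f"retrying without failed peer: {err}")

peers = [Peer(f"gpu-{i}", grad=float(i)) for i in range(8)]
print(fault_tolerant_allreduce(peers))
```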
Releasing INTELLECT-2: We’re open-sourcing the first 32B parameter model trained via globally distributed reinforcement learning:
• Detailed Technical Report
• INTELLECT-2 model checkpoint
primeintellect.ai/blog/intellect…
we are just getting started
update: i joined @primeintellect :) cannot describe how excited i am to be joining such an incredible team and mission. there is a dire shortage of labs who are truly embracing open-source research. it’s hard to get the incentives right. you need a business model where…
legend
excited to share that TOPLOC has been accepted at ICML 2025! see you in Vancouver 🇨🇦
We wrote an extensive blog post on large-scale pipelined inference and released a vLLM integration to connect any machines over the internet to serve a model. Will be the foundation of our next SYNTHETIC-2 run (and later allow consumer GPUs to join RL runs)
We are excited to share a preview of our peer-to-peer decentralized inference stack, engineered for consumer GPUs and high-latency networks — plus a research roadmap to scale it to a planetary-scale decentralized inference engine.
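As rough intuition for the pipelined inference setup mentioned above, here is a self-contained PyTorch sketch of splitting a model's layer stack into stages; in the real system each stage would sit on a different machine and only the intermediate activations would cross the network. This is a conceptual toy, not the released vLLM integration.

```python
import torch
import torch.nn as nn

# Toy stand-in for a large model's layer stack; both "stages" live in one
# process here, whereas pipelined inference places them on separate hosts.
layers = [nn.Linear(64, 64) for _ in range(8)]
stage_a = nn.Sequential(*layers[:4])   # would run on machine A
stage_b = nn.Sequential(*layers[4:])   # would run on machine B

x = torch.randn(1, 64)
hidden = stage_a(x)       # machine A computes its half of the layers ...
# ... then ships only the small activation tensor to machine B, which is far
# cheaper over a slow link than moving model weights between hosts.
output = stage_b(hidden)
print(output.shape)       # torch.Size([1, 64])
```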
Very excited to soon release SYNTHETIC-2, partially powered by consumer-grade GPUs. Very confident that what we’ve planned for this dataset will be incredibly useful for the open-source community
wrote a little something on our learnings from decentralizing inference and open-sourced 3 research codebases. tl;dr optimizing inference under decentralized constraints is worthwhile, non-trivial, and far from solved. excited to be building this with the team! more soon, when we…
Today we’re launching INTELLECT-2: The first decentralized 32B-parameter RL training run, open for anyone with compute to join — fully permissionless. Scaling towards frontier reasoning across coding, math and science.
Announcing our $15m raise — led by @foundersfund. To build our peer-to-peer compute and intelligence protocol. With participation from @MenloVentures and angels like @karpathy @ClementDelangue @tri_dao @dylan522p @balajis @EMostaque and many others.
reasoning has never been so accessible
Releasing SYNTHETIC-1: The largest open dataset of 2M reasoning traces from DeepSeek-R1, created by compute contributors across the globe:
- SYNTHETIC-1: Verified math, coding and science reasoning traces
- SYNTHETIC-1-SFT-7B: Fine-tuned on 800k samples
primeintellect.ai/blog/synthetic…