Mario Sieg
@_mario_neo_
ML | game engines | compilers - I’m not satisfied using things I don’t fully understand - so I build them myself. Research Engineer @PrimeIntellect
I built my own PyTorch from scratch over the last 5 months in C and modern Python. Check it out on GitHub: github.com/MarioSieg/magn… Feynman's quote: "What I cannot create, I do not understand." Building my own programming language, game engine and now my machine learning framework…
great list, missing (totally unbiased selection): @johannes_hage @samsja19 @MatternJustus @Grad62304977 @jackminong @mike64_t @_mario_neo_
Upcoming features of piquant - our blazingly fast quantization library:
- int2 quantization
- direct quantization of bf16 tensors
- sign quantization
- SIMD kernels for stochastic rounding
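To make the stochastic rounding bullet concrete, here is a minimal numpy sketch of the idea (my own illustration, not piquant's actual kernel, which does this with SIMD in native code): the fractional part of each scaled value becomes the probability of rounding up, so the rounding error is zero in expectation.

```python
import numpy as np

def quantize_int8_stochastic(x: np.ndarray, rng=None):
    """Per-tensor symmetric int8 quantization with stochastic rounding.
    Illustrative only; a real kernel does this with SIMD intrinsics."""
    rng = rng or np.random.default_rng()
    amax = float(np.abs(x).max())
    scale = amax / 127.0 if amax > 0 else 1.0
    scaled = x / scale
    floor = np.floor(scaled)
    # Round up with probability equal to the fractional part, so the
    # quantized value equals the exact value in expectation (no bias).
    q = floor + (rng.random(x.shape) < (scaled - floor))
    return np.clip(q, -128, 127).astype(np.int8), scale

x = np.random.randn(1024).astype(np.float32)
q, s = quantize_int8_stochastic(x)
print(np.abs(q.astype(np.float32) * s - x).mean())  # small expected error
```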
Sometimes I have creative "attacks" where I build random stuff. Last time it was techno music generated with pure code, this time it's a small cryptocurrency... It's not about money, it's about exploring, learning and having fun. This approach taught me 99% of what I…
Launching SYNTHETIC-2: our next-gen open reasoning dataset and planetary-scale synthetic data generation run. Powered by our P2P inference stack and DeepSeek-R1-0528, it generates verified reasoning traces for the hardest RL tasks. Contribute towards AGI via open, permissionless compute.
Our fast quantization library piquant will support 2-bit quantization and new 4-bit kernels for even higher performance on AVX-512 CPUs in the next release. Get ready to crunch those packed integers!
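For a feel of what "packed integers" means at 2 bits: four 2-bit codes fit into one byte, so the kernels read and write packed bytes rather than one value per lane. A toy numpy sketch of such a packing scheme (my own illustration; the real kernels do this in-register, e.g. with AVX-512 shuffles):

```python
import numpy as np

def pack_int2(codes: np.ndarray) -> np.ndarray:
    """Pack values in [0, 3] (2 bits each) four-per-byte, lowest bits first.
    Sketch only; a real kernel operates on whole SIMD registers at once."""
    codes = codes.reshape(-1, 4).astype(np.uint8)
    return (codes[:, 0]
            | (codes[:, 1] << 2)
            | (codes[:, 2] << 4)
            | (codes[:, 3] << 6))

def unpack_int2(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_int2: shift each 2-bit field down and mask it out."""
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    return ((packed[:, None] >> shifts) & 0b11).reshape(-1)

codes = np.random.randint(0, 4, size=64)
assert np.array_equal(unpack_int2(pack_int2(codes)), codes)
```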
Seems like I've come to a point where my C code crashes a modern LLVM compiler and makes it spit out LLVM IR 🙄

This is not PyTorch. It's Magnetron - my tiny ML framework with a PyTorch-like API, designed for microcontrollers and IoT. It now supports nn.Module, nn.Linear, nn.Sequential, nn.ModuleList, nn.ModuleDict, and more. The API has gotten very close to PyTorch's over the last month, with more to come!…
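To give a feel for the API, here is a hypothetical usage sketch based purely on the class names in the post; the `magnetron.nn` import path and the constructor signatures are my assumptions, so check the repo for the real API:

```python
# Hypothetical sketch: import path and signatures are assumed; only the
# class names (nn.Module, nn.Linear, nn.Sequential) come from the post.
import magnetron.nn as nn

class MLP(nn.Module):
    def __init__(self, d_in: int, d_hidden: int, d_out: int):
        super().__init__()
        # nn.Sequential chains submodules, mirroring PyTorch
        self.layers = nn.Sequential(
            nn.Linear(d_in, d_hidden),
            nn.Linear(d_hidden, d_out),
        )

    def forward(self, x):
        return self.layers(x)

model = MLP(784, 256, 10)
```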

To implement a GPT-2 in my custom PyTorch-like ML framework, I added boolean tensors. Boolean tensors are used for filtering, indexing, attention and loss masks, and much more. The main logical operators AND, OR, XOR and NOT are now supported. Another step towards LLM…
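To show what boolean tensors enable in a transformer, here is the standard causal attention-mask pattern, written with PyTorch itself since the post says Magnetron now mirrors its semantics:

```python
import torch

T = 4
# Boolean lower-triangular mask: position i may only attend to j <= i.
mask = torch.ones(T, T, dtype=torch.bool).tril()

scores = torch.randn(T, T)                          # raw attention scores
scores = scores.masked_fill(~mask, float("-inf"))   # ~ is logical NOT on bool tensors
weights = scores.softmax(dim=-1)                    # masked positions get zero weight
```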

Awesome work by @_mario_neo_ to accelerate quantization of pseudo-gradients in decentralized training settings like DiLoCo - already integrated in PCCL (Prime Collective Communications Library)
Introducing pi-quant, the Prime Intellect Fast Quantization Library. Hand-tuned, parallel CPU per-tensor quantization kernels, over 2x faster than PyTorch on all tested hardware. Optimized for various CPU architectures.
great work by @_mario_neo_, already integrated into PCCL to make quantization of pseudo-gradients in DiLoCo lightning fast
Another C++ library developed for the unique requirements of PCCL. Great work by @_mario_neo_
Let the CPUs go brrrr! We will also continue adding more fine-tuned kernels for even more CPUs.
Introducing PCCL, the Prime Collective Communications Library — a low-level communication library built for decentralized training over the public internet, with fault tolerance as a core design principle. In testing, PCCL achieves up to 45 Gbit/s of bandwidth across datacenters…
the team (@mike64_t, @_mario_neo_ et al.) is cooking
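For readers new to collective communication: the core primitive behind gradient sharing in libraries like this is all-reduce, where every node ends up holding the sum of all nodes' tensors. Below is a toy single-process sketch of the classic ring algorithm to show the data movement; it is purely conceptual and says nothing about PCCL's actual API, wire protocol, or fault-tolerance machinery.

```python
import numpy as np

def ring_all_reduce(node_chunks):
    """Toy in-memory ring all-reduce: afterwards every 'node' holds the
    elementwise sum of all nodes' data. Conceptual only; a real library
    streams these chunks between peers over the network and must survive
    nodes joining or dropping out mid-run."""
    n = len(node_chunks)
    # Phase 1, reduce-scatter: after n-1 steps, rank r holds the fully
    # summed chunk (r + 1) mod n.
    for step in range(n - 1):
        for r in range(n):
            c = (r - step) % n
            node_chunks[(r + 1) % n][c] += node_chunks[r][c]
    # Phase 2, all-gather: circulate the finished chunks around the ring
    # so every rank ends up with all of them.
    for step in range(n - 1):
        for r in range(n):
            c = (r + 1 - step) % n
            node_chunks[(r + 1) % n][c] = node_chunks[r][c].copy()

# 3 nodes, each holding 3 chunks of 4 elements, node r filled with r + 1.
nodes = [[np.full(4, float(r + 1)) for _ in range(3)] for r in range(3)]
ring_all_reduce(nodes)
print(nodes[0][0])  # [6. 6. 6. 6] == 1 + 2 + 3, now on every node
```

The ring shape is why such algorithms are bandwidth-bound rather than latency-bound: each node only ever talks to its neighbor, and every byte crosses each link roughly twice, which makes sustained inter-datacenter throughput the number that matters.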