Federico Cassano

@ellev3n11

training @cursor_ai prev @neu_prl, @scale_AI, @Roblox, @trailofbits

San Francisco - Milan

Joined September 2020

234 Following

2K Followers

been a @system76 customer for many years. what is this ID verification i have to do to buy a laptop... seems a bit insane

Federico Cassano @ellev3n11 · Jun 28

TIL Llama 4 2T is training with FP8 at 390 TFLOPS

Federico Cassano @ellev3n11 · Jun 28

Does anyone know someone who loves Slurm, lots of GPUs, and despises Kubernetes? Would love to chat with them

Federico Cassano Retweeted

tender @tenderizzation · Jun 16

the four blackwells in a GB200 node when the CPU isn’t bothering them (they’re all replaying CUDA graph captures)

Federico Cassano @ellev3n11 · Jun 6

careful when updating transformers. the new version puts the chat template in a separate new file in the model directory, not in tokenizer_config.json; big breaking change

Federico Cassano @ellev3n11 · Jun 4

Our models have lower loss and higher MFU because of BugBot

Federico Cassano @ellev3n11 · Jun 4

BugBot has saved me from so many bugs that missed human review, people should try it! I found it to be especially useful for figuring out buggy edge cases in complex logic.

Federico Cassano @ellev3n11 · Jun 4

BugBot has saved me from so many bugs that missed human review, people should try it! I found it to be especially useful for figuring out buggy edge cases in complex logic.

Cursor @cursor_ai · Jun 4

Cursor 1.0 is out now! Cursor can now review your code, remember its mistakes, and work on dozens of tasks in the background.

Federico Cassano @ellev3n11 · Jun 3

my rl codebase is multi-step, multi-objective, multi-reward, multi-environment, btw

Federico Cassano Retweeted

Cursor @cursor_ai · May 29

A conversation on the optimal reward for coding agents, infinite context models, and real-time RL

Federico Cassano @ellev3n11 · May 21

not bullish on diffusion models. they are much more expensive to train and only give benefits on decode speed. the GB200 NVL72 + distributed GEMMs + speculative decoding will just solve the decode bottleneck for big AR models.

Federico Cassano Retweeted

Prime Intellect @PrimeIntellect · May 21

Introducing PCCL, the Prime Collective Communications Library — a low-level communication library built for decentralized training over the public internet, with fault tolerance as a core design principle. In testing, PCCL achieves up to 45 Gbit/s of bandwidth across datacenters…

Federico Cassano @ellev3n11 · May 3

when you deploy the randomly-initialized weights

☯︎Cyber Taoist☯︎五道杠少年 @136Division · May 2

Just moments ago, a robot in a lab suddenly went berserk, marking the first robot rebellion in human history.

Federico Cassano @ellev3n11 · Apr 28

I need a break from this, when is it coming out?

Federico Cassano Retweeted

Aman Sanger @amanrsanger · Apr 28

Cursor writes almost 1 billion lines of accepted code a day. To put it in perspective, the entire world produces just a few billion lines a day.

Federico Cassano @ellev3n11 · Apr 16

has anyone built a better pytorch memory_viz? pytorch.org/memory_viz crashes with large snapshots and has weird UI bugs that hide the stack trace

Federico Cassano @ellev3n11 · Apr 14

"petabyte-scale" sounds funny now that storage is super cheap

Federico Cassano @ellev3n11 · Apr 5

trying out some new puppies

Federico Cassano @ellev3n11 · Mar 31

the saratoga water is so sparkly it's crazy

Federico Cassano @ellev3n11 · Mar 22

woof woof

Federico Cassano @ellev3n11 · Mar 21

Incredibly excited to work with Sasha!

Sasha Rush @srush_nlp · Mar 20

Some personal news: I recently joined Cursor. Cursor is a small, ambitious team, and they’ve created my favorite AI systems. We’re now building frontier RL models at scale in real-world coding environments. Excited for how good coding is going to be.