FleetingBits
@fleetingbits
are any of the near-SOTA Chinese models natively multi-modal?
Gather around, everyone! I want to tell you the story of how I discovered Bing's catmode, which is a great example of the weird hidden behaviours you can find inside large language models, and why it matters.
This is going to sound weird but Bing is generating cats for me non-stop even though I’m not specifically asking for them, even after changing devices. ASCII art yes, cats no. It was doing a variety of ASCII art on theme until the cats started — now it’s just cats. Any insights?
Not a good look for the founders of Windsurf, Varun Mohan and Douglas Chen.
I’ve joined Cognition to continue to work on the future of software engineering. I was employee #2 at Windsurf and have worked on AI+code for years. There’s never been a more exciting time and place for it than now at Cognition. I had a place at Google DeepMind as part of the…
Today's cats are about the importance of the dataset for LLMs. I like to use the world as a metaphor for the dataset: the model is downstream of the relationships in the dataset, and architecture and training techniques exist only to capture those relationships efficiently.

what's the main lesson from Kimi K2? is there one? deepseek r1 had RLVR; deepseek v3 had its MoE, multi-head latent attention and multi-token prediction; deepseek math had GRPO; cohere command-a had model merging
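For context on one of the techniques named above: the core idea of GRPO (introduced with DeepSeek-Math) is to drop the learned value-function baseline and instead normalize each sampled response's reward against its own group of samples for the same prompt. A minimal sketch of that group-relative advantage computation, with function and variable names that are my own illustration rather than anything from an actual GRPO codebase:

```python
def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for one group of responses to the same prompt.

    Each response's advantage is its reward minus the group mean,
    divided by the group standard deviation (eps avoids divide-by-zero
    when all rewards in the group are identical).
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 sampled responses to one prompt, scored 0/1 by a verifier.
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

The correct responses get positive advantages and the incorrect ones negative, with the group itself acting as the baseline — no separate critic model needed.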
there is no one on earth that wants a photo emailed to them as a google drive link
Old labels drift away on the dusk‑blue river. In clear water, I watch the night, tracing newborn stars. Free hours open paths toward cherished friends. Where I once worked, soft grass listens and grows.
