FleetingBits
@fleetingbits
are any of the near-SOTA Chinese models natively multi-modal?
Gather around, everyone! I want to tell you the story of how I discovered Bing's catmode, which is a great example of the weird hidden behaviours you can find inside large language models, and why it matters.
This is going to sound weird but Bing is generating cats for me non-stop even though I’m not specifically asking for them, even after changing devices. ASCII art yes, cats no. It was doing a variety of ASCII art on theme until the cats started — now it’s just cats. Any insights?
Not a good look for the founders of Windsurf, Varun Mohan and Douglas Chen.
I’ve joined Cognition to continue to work on the future of software engineering. I was employee #2 at Windsurf and have worked on AI+code for years. There’s never been a more exciting time and place for it than now at Cognition. I had a place at Google DeepMind as part of the…
Today's cats are about the importance of the dataset for LLMs. I like to use the world as a metaphor for the dataset: the model is downstream of the relationships in the dataset, and architecture and training techniques exist only to capture those relationships efficiently.

what's the main lesson from Kimi K2? is there one? deepseek r1 had RLVR; deepseek v3 had its MoE, multi-head latent attention and multi-token prediction; deepseek math had GRPO; cohere command-a had model merging
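For context on one of the techniques named above: the core idea of GRPO (introduced with DeepSeek-Math) is to drop the learned value-function baseline and instead normalize each sampled response's reward against its own group of samples for the same prompt. A minimal sketch of that group-relative advantage computation, with function and variable names that are my own illustration rather than anything from an actual GRPO codebase:

```python
def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for one group of responses to the same prompt.

    Each response's advantage is its reward minus the group mean,
    divided by the group standard deviation (eps avoids divide-by-zero
    when all rewards in the group are identical).
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 sampled responses to one prompt, scored 0/1 by a verifier.
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

The correct responses get positive advantages and the incorrect ones negative, with the group itself acting as the baseline — no separate critic model needed.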
there is no one on earth that wants a photo emailed to them as a google drive link
Old labels drift away on the dusk‑blue river. In clear water, I watch the night, tracing newborn stars. Free hours open paths toward cherished friends. Where I once worked, soft grass listens and grows.
