Simo Ryu
@cloneofsimo
I like cats, math and codes [email protected]
A 10B-parameter DiT trained on 80M images, all owned by @freepik. The model is commercially usable, a raw model without distillation, and open sourced. Proud to present "F-Lite", @FAL's first model-training project with our client @freepik.
🚀Excited to announce F Lite: a new open-source text-to-image model by @freepik and @FAL! The first at this scale that’s both open-source and trained exclusively on licensed, high-quality data.🧵
Taxes are unironically more complicated than IMO problems. This is literally true btw. It's not "difficult"; it's unprincipled, 'randomly' complex.
1/ Can AI file your taxes? Not yet. We tested the latest frontier models and the results were full of catastrophic errors. Letting AI do your taxes would mean IRS rejections, audits, and penalties:
Hey, this is my first time seeing someone with the same name become famous.
Fidji Simo to head OpenAI’s apps division; terms AI as world’s biggest opportunity engine economictimes.indiatimes.com/tech/artificia…
Most refreshing answer!
There is a more general version of this question: why not scale up the parameters of the attention operation and make it more expressive? (you can do it as suggested below, or simply increase the dimension of QKV) The empirical answer is that it’s not nearly as effective as…
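A minimal PyTorch sketch of the "simply increase the dimension of QKV" option mentioned in the quoted tweet; the class and every parameter name here are illustrative, not from any cited codebase:

```python
import torch
import torch.nn as nn

class WideQKVAttention(nn.Module):
    """Self-attention whose QKV dimension (n_heads * head_dim) can exceed
    the residual-stream dimension d_model, adding attention parameters."""
    def __init__(self, d_model: int, n_heads: int, head_dim: int):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, head_dim
        inner = n_heads * head_dim          # may be larger than d_model
        self.qkv = nn.Linear(d_model, 3 * inner, bias=False)
        self.out = nn.Linear(inner, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape each to (b, n_heads, t, head_dim)
        shape = (b, t, self.n_heads, self.head_dim)
        q, k, v = (z.view(shape).transpose(1, 2) for z in (q, k, v))
        attn = nn.functional.scaled_dot_product_attention(q, k, v)
        return self.out(attn.transpose(1, 2).reshape(b, t, -1))

# d_model=512 but QKV lives in 16*64=1024 dims, i.e. 2x "wider" attention.
layer = WideQKVAttention(d_model=512, n_heads=16, head_dim=64)
y = layer(torch.randn(2, 128, 512))
```

The empirical claim in the tweet is that spending the same parameter budget this way underperforms alternatives, not that it is impossible.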
Wtf, Google is squeezing SSHS graduates to get IMO gold hahahahahaha

I should visit SF less often tbh. My back is gon' break like fp8.

Time to summon @main_horse "the sakana destroyer"
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
Trains a DeepSeek-v3-671B model to optimize CUDA kernels using only execution-time speedup as reward.
Pipeline:
- SFT: Finetuned on 2.1K correct, executable CUDA variants from 6 LLMs across 250…
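A toy sketch of what a "speedup as the only reward" signal could look like; `benchmark` and `speedup_reward` are stand-ins I made up, and the real CUDA-L1 pipeline compiles and times actual kernels on device rather than Python callables:

```python
import time
from typing import Callable

def benchmark(fn: Callable[[], object], warmup: int = 3, iters: int = 10) -> float:
    """Average wall-clock seconds per call after a few warmup runs."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

def speedup_reward(candidate: Callable[[], object],
                   reference: Callable[[], object],
                   outputs_match: bool) -> float:
    """Zero reward for incorrect/non-executable candidates; otherwise
    reward = reference_time / candidate_time (>1 means faster)."""
    if not outputs_match:
        return 0.0
    return benchmark(reference) / benchmark(candidate)

# Toy usage (pretend correctness was verified upstream by output comparison):
slow = lambda: sum(i * i for i in range(20_000))
fast = lambda: sum(i * i for i in range(2_000))
print(speedup_reward(candidate=fast, reference=slow, outputs_match=True))
```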
If you were planning to train a flow model that maps data -> noise, you quickly realize it's a completely symmetrical task to the noise -> data model: only t -> 1 - t and n - x -> x - n. So in a way DDIM inversion isn't a heuristic, it's the perfect inverse flow. Then, if you think of the role of…
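A short check of that symmetry, assuming the standard linear (rectified-flow) interpolant between data x and noise n; the notation is mine, not from the tweet:

```latex
\begin{align*}
  x_t &= (1-t)\,x + t\,n, &
  \frac{dx_t}{dt} &= n - x
    && \text{(data $\to$ noise, velocity target $n - x$)} \\
  y_s &:= x_{1-s} = (1-s)\,n + s\,x, &
  \frac{dy_s}{ds} &= x - n
    && \text{(noise $\to$ data, via $t \mapsto 1-t$)}
\end{align*}
```

Integrating the learned ODE in reverse therefore inverts it exactly in the continuous limit, so DDIM inversion is the exact inverse flow rather than a heuristic, up to discretization and model-approximation error.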
Maybe... just maybe... I'll also cope when I'm threatened to lose my job.

Congrats! This is an incredible milestone and I was truly shocked by it. “Thinking for hours” means 10x or even 100x of current test-time compute, and I can’t wait to see the model think for days, months, years, centuries to solve the science challenges!
Fuck, so *this* is what artists felt when they saw Midjourney and DALL-E 2. Existential crisis from my soul, wishful thinking that my job was any different from everyone else's.

There is some secret sauce that isn't "let's RL on a set of narrow envs". But what..? If the results are not cherrypicked (whatever that could mean), it's so over.
5/N Besides the result itself, I am excited about our approach: We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling.