Simo Ryu
@cloneofsimo
I like cats, math and codes [email protected]
A 10B-parameter DiT trained on 80M images, all owned by @freepik. The model is commercially usable, a raw model without distillation, and open sourced. Proud to present "F-Lite", @FAL's first model-training project with our client @freepik.
🚀Excited to announce F Lite: a new open-source text-to-image model by @freepik and @FAL! The first at this scale that’s both open-source and trained exclusively on licensed, high-quality data.🧵
Taxes are unironically more complicated than IMO problems. This is literally true btw. It's not "difficult"; it's unprincipled, 'randomly' complex.
1/ Can AI file your taxes? Not yet. We tested the latest frontier models and the results were full of catastrophic errors. Letting AI do your taxes would mean IRS rejections, audits, and penalties:
Hey, this is my first time seeing someone with the same name become famous.
Fidji Simo to head OpenAI’s apps division; terms AI as world’s biggest opportunity engine economictimes.indiatimes.com/tech/artificia…
Most refreshing answer!
There is a more general version of this question: why not scale up the parameters of the attention operation and make it more expressive? (you can do it as suggested below, or simply increase the dimension of QKV) The empirical answer is that it’s not nearly as effective as…
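A minimal PyTorch sketch of the "simply increase the dimension of QKV" option mentioned in the quoted tweet; the class and every parameter name here are illustrative, not from any cited codebase:

```python
import torch
import torch.nn as nn

class WideQKVAttention(nn.Module):
    """Self-attention whose QKV dimension (n_heads * head_dim) can exceed
    the residual-stream dimension d_model, adding attention parameters."""
    def __init__(self, d_model: int, n_heads: int, head_dim: int):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, head_dim
        inner = n_heads * head_dim          # may be larger than d_model
        self.qkv = nn.Linear(d_model, 3 * inner, bias=False)
        self.out = nn.Linear(inner, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape each to (b, n_heads, t, head_dim)
        shape = (b, t, self.n_heads, self.head_dim)
        q, k, v = (z.view(shape).transpose(1, 2) for z in (q, k, v))
        attn = nn.functional.scaled_dot_product_attention(q, k, v)
        return self.out(attn.transpose(1, 2).reshape(b, t, -1))

# d_model=512 but QKV lives in 16*64=1024 dims, i.e. 2x "wider" attention.
layer = WideQKVAttention(d_model=512, n_heads=16, head_dim=64)
y = layer(torch.randn(2, 128, 512))
```

The empirical claim in the tweet is that spending the same parameter budget this way underperforms alternatives, not that it is impossible.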
Wtf, Google is squeezing SSHS graduates to get IMO gold hahahahahaha

I should visit SF less often tbh. My back is gon' break like fp8.

Time to summon @main_horse "the sakana destroyer"
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
Trains a DeepSeek-v3-671B model to optimize CUDA kernels using only execution-time speedup as reward.
Pipeline:
- SFT: Finetuned on 2.1K correct, executable CUDA variants from 6 LLMs across 250…
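A toy sketch of what a "speedup as the only reward" signal could look like; `benchmark` and `speedup_reward` are stand-ins I made up, and the real CUDA-L1 pipeline compiles and times actual kernels on device rather than Python callables:

```python
import time
from typing import Callable

def benchmark(fn: Callable[[], object], warmup: int = 3, iters: int = 10) -> float:
    """Average wall-clock seconds per call after a few warmup runs."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

def speedup_reward(candidate: Callable[[], object],
                   reference: Callable[[], object],
                   outputs_match: bool) -> float:
    """Zero reward for incorrect/non-executable candidates; otherwise
    reward = reference_time / candidate_time (>1 means faster)."""
    if not outputs_match:
        return 0.0
    return benchmark(reference) / benchmark(candidate)

# Toy usage (pretend correctness was verified upstream by output comparison):
slow = lambda: sum(i * i for i in range(20_000))
fast = lambda: sum(i * i for i in range(2_000))
print(speedup_reward(candidate=fast, reference=slow, outputs_match=True))
```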
If you were planning to train a flow model that maps data -> noise, you quickly realize it's a completely symmetrical task to the noise -> data model: only t -> 1 - t and n - x -> x - n. So in a way DDIM inversion isn't a heuristic, it's the perfect inverse flow. Then, if you think of the role of…
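A short check of that symmetry, assuming the standard linear (rectified-flow) interpolant between data x and noise n; the notation is mine, not from the tweet:

```latex
\begin{align*}
  x_t &= (1-t)\,x + t\,n, &
  \frac{dx_t}{dt} &= n - x
    && \text{(data $\to$ noise, velocity target $n - x$)} \\
  y_s &:= x_{1-s} = (1-s)\,n + s\,x, &
  \frac{dy_s}{ds} &= x - n
    && \text{(noise $\to$ data, via $t \mapsto 1-t$)}
\end{align*}
```

Integrating the learned ODE in reverse therefore inverts it exactly in the continuous limit, so DDIM inversion is the exact inverse flow rather than a heuristic, up to discretization and model-approximation error.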
Maybe... just maybe... I'll also cope when I'm threatened to lose my job.

Congrats! This is an incredible milestone and I was truly shocked by it. “Thinking for hours” means 10x or even 100x of current test-time compute, and I can’t wait to see the model think for days, months, years, centuries to solve the science challenges!
Fuck, so *this* is what artists felt when they saw Midjourney and DALL-E 2. Existential crisis from my soul, wishful thinking that my job was any different from everyone else's.

There is some secret sauce that isn't "let's RL on a set of narrow envs". But what..? If the results are not cherrypicked (whatever that could mean), it's so over.
5/N Besides the result itself, I am excited about our approach: We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling.