Sasha Rush
@srush_nlp
Programmer, professor, currently in the bay area https://www.youtube.com/@srush_nlp
Cursor is now on your phone and on the web. Spin off dozens of agents and review them later in your editor.
o3 since this was driving me crazy: A type that implements Rust’s Try trait—like Result or Option—is a “fallible wrapper” that can produce either a success value (Output) or short-circuit with its error/none form (Residual). The Residual itself is that error/early-exit shape, and…
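The short-circuiting described above is the behavior the (still-unstable) Try trait formalizes; its stable surface is the `?` operator on Result and Option. A minimal sketch, using only stable Rust (the function names here are illustrative, not from any library):

```rust
// `?` on a "fallible wrapper": the success value (Output) is unwrapped,
// while the error/none form (Residual) short-circuits out of the function.

fn parse_and_double(s: &str) -> Result<i32, std::num::ParseIntError> {
    // Ok(v) yields v; Err(e) returns early with Err(e).
    let n: i32 = s.parse()?;
    Ok(n * 2)
}

fn first_char_upper(s: &str) -> Option<char> {
    // Some(c) yields c; None returns early with None.
    let c = s.chars().next()?;
    Some(c.to_ascii_uppercase())
}

fn main() {
    assert_eq!(parse_and_double("21"), Ok(42));
    assert!(parse_and_double("nope").is_err());
    assert_eq!(first_char_upper("rust"), Some('R'));
    assert_eq!(first_char_upper(""), None);
}
```

Both functions exit early the moment their wrapper is in its error/none shape, which is exactly the Residual path.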
After my recent news, many people have asked me: "Why not Rust?" Here's my answer:
🚨 The era of infinite internet data is ending. So we ask: 👉 What's the right generative modelling objective when data, not compute, is the bottleneck? TL;DR: ▶️ Compute-constrained? Train Autoregressive models ▶️ Data-constrained? Train Diffusion models Get ready for 🤿 1/n
20 years ago, this type of person would become an elite math professor. Now they're making AI breakthroughs. This is progress (probably)!
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
One ML trend I've been grappling with is that we are post-abstraction. Since everyone is building roughly the same "one big model," there isn't really much need for generality. We're roughly converging to giant, straight-line, assembly-coded systems that get recoded each year.
Scaling Data-Constrained LMs is now also in JMLR: jmlr.org/papers/v26/24-… Looking back at it 2yrs later, repeating & mixing seem standard now, but maybe another powerful lever to scale data-constrained LMs turns out to have been RL - arguably underrated back then!
**Outstanding Main Track Runner-Ups** Scaling Data-Constrained Language Models Direct Preference Optimization: Your Language Model is Secretly a Reward Model
haven't made a new blog post in over a year, so here's a new one: justintchiu.com/blog/sftrl/ it's short
Can we build an operating system entirely powered by neural networks? Introducing NeuralOS: towards a generative OS that directly predicts screen images from user inputs. Try it live: neural-os.com Paper: huggingface.co/papers/2507.08… Inspired by @karpathy's vision. 1/5
"Chatting" with an LLM feels like using an 80s computer terminal. The GUI hasn't been invented yet, but imo some properties of it can start to be predicted. 1 it will be visual (like GUIs of the past) because vision (pictures, charts, animations, not so much reading) is the 10-lane…
Can transformers analyze code efficiently? ✅ Yes. We prove transformers efficiently handle real compiler tasks (AST construction, symbol resolution, type inference) using only logarithmic size, while RNNs require linear size (in input length). Paper: arxiv.org/abs/2410.14706 #COLM2025
L1 is heading to COLM! We've released 5 new open L1 models and the Massive-Math dataset to celebrate:
Super excited to see L1 accepted to #COLM2025! We are further open-sourcing 5 new models & a dataset: 1. L1-7B & L1-8B: Exact and Max variants 2. L1-1.5B-Short: Short reasoning model (SRM), RL-trained on 1.2M data points 3. Massive-Math-455K: A clean, unified math dataset 🧵
two updates: 1. flying to ICML tonight 2. i joined @cursor_ai a month ago come talk to me to learn what makes research at cursor special :)
😅
today i woke up to a living version of a phd student's nightmare. a new paper in my inbox: a detailed reproduction of a paper i wrote several years ago. every table, graph, model, line of code. everything should certainly reproduce! but i hadn't checked in a while... 😳
Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formalizes these questions One result tells the story: A transformer trained on 10M solar systems nails planetary orbits. But it botches gravitational laws 🧵
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
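As a toy illustration of the chunking idea (not the H-Net method itself, which learns boundaries dynamically inside the model), grouping a raw byte stream into variable-length segments wherever a boundary predictor fires might look like this; the whitespace heuristic below is a stand-in for a learned predictor:

```rust
// Toy dynamic chunking: split a byte stream into variable-length chunks
// wherever a boundary predictor fires. A real model would score
// boundaries with a learned network; here whitespace is the stand-in.

fn is_boundary(b: u8) -> bool {
    b == b' ' // assumption: placeholder for a learned boundary score
}

fn dynamic_chunks(bytes: &[u8]) -> Vec<Vec<u8>> {
    let mut chunks = Vec::new();
    let mut current = Vec::new();
    for &b in bytes {
        if is_boundary(b) {
            // Close the current chunk at a predicted boundary.
            if !current.is_empty() {
                chunks.push(std::mem::take(&mut current));
            }
        } else {
            current.push(b);
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    let chunks = dynamic_chunks(b"end to end");
    assert_eq!(chunks, vec![b"end".to_vec(), b"to".to_vec(), b"end".to_vec()]);
}
```

With a learned predictor, the same loop would discover higher-level units than words, which is what makes the approach tokenizer-free.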
WOW! 🤯 this groundbreaking dataset from Meta’s Chief AI Scientist has revolutionized the way that we understand vision 👀 🚀 is this one of the highest-impact releases of all time?? ⏳🔥 10 crazy examples below: 🧵
Finally closed our $11M+ funding round! Backed by top Japanese VCs and amazing angel investors including Joi Ito, @Thom_Wolf from @huggingface, @nlpnoah, @LukeZettlemoyer, and @srush_nlp. Now it’s time to focus on commercialization and tech development!!
We've closed a ¥1.7B seed-2 round to accelerate the commercialization and R&D of real-time voice AI 🔥 We're hiring! prtimes.jp/main/html/rd/p… #GlobisCapitalPartners #BoostCapital #SIPCapital @Joi @Thom_Wolf #ToruShimada @LukeZettlemoyer @nlpnoah