Tyler Romero
@tyleraromero
http://tylerromero.com; language modeling research and engineering @allen_ai
Wrote a short post on reducing memory usage in RLHF post-training (PPO/GRPO) by optimizing log probability computations. Includes implementation details for selective log-softmax, with benchmarks and code. I recently contributed this optimization to TRL, OpenRLHF, and Verl.
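The idea behind selective log-softmax can be sketched in a few lines. This is a minimal NumPy illustration of the underlying identity only (the actual contributions are PyTorch, and the function name here is hypothetical): gather each label token's logit and subtract a per-position logsumexp, instead of materializing the full (batch, seq, vocab) log-softmax and then gathering one entry per position.

```python
import numpy as np

def selective_log_softmax(logits, labels):
    """Per-token log-probs without storing the full log-softmax output.

    logits: (batch, seq, vocab) scores; labels: (batch, seq) token ids.
    Uses log p(label) = logit[label] - logsumexp(logits), so only
    (batch, seq)-shaped results survive the reduction.
    """
    # Gather the logit of each label token: shape (batch, seq).
    label_logits = np.take_along_axis(logits, labels[..., None], axis=-1)[..., 0]
    # Numerically stable logsumexp over the vocab dimension.
    m = logits.max(axis=-1, keepdims=True)
    lse = (m + np.log(np.exp(logits - m).sum(axis=-1, keepdims=True)))[..., 0]
    return label_logits - lse
```

In a real training framework you would additionally process the sequence in chunks so the (chunk, vocab) intermediates stay small; the sketch above only shows the gather-then-logsumexp identity that makes chunking possible.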

occam's razor explanation of OpenAI dropping the o3 price by 80% is that they were sitting on a fat margin and wanted to test out demand. They changed nothing w the model, upgraded their inference code a bit, and made less profit. No nonsense at all; they don't have time for that.
Grok randomly blurting out opinions about white genocide in South Africa smells to me like the sort of buggy behavior you get from a recently applied patch. I sure hope it isn't. It would be really bad if widely used AIs got editorialized on the fly by those who controlled them.
OpenAI has pushed the industry forward again... three audio input buttons, people. take notes.
> fp8 is 100 tflops faster when the kernel name has "cutlass" in it kms github.com/triton-lang/tr…
"Hiring pure backend engineer and expecting them to do non backend stuff IMO is wrong" Sigh. Any capable intern/new grad picks up whatever new technology is needed to get the job done. If you, as an *experienced* engineer, refuse to do so: you're less capable than an intern
Not hiring a backend engineer is entirely ok Hiring pure backend engineer and expecting them to do non backend stuff IMO is wrong even in startups. Startups doesn't mean a pure backend engineer should be made to work on things he has no clue about/not interested in.
Furthermore, the low diversity of codebases limits external validity: Django comprises nearly half of all issues, and five repositories account for over 80% of the benchmark.
OLMoTrace is a one-of-a-kind system and is made possible by Ai2’s commitment to making large pretraining and post-training datasets open in the interest of advancing scientific research in AI and public understanding of AI systems.
in this increasingly digital era, there's no substitute for the book guillotine
New Ai2 office views for my meetings. We’re always hiring top AI talent excited about making the ecosystem more open.
give me your infra and i will code it for you. systems ML is literally my fav thing to work on
everybody wants to do fun experiments nobody wants to write core infrastructure code
"it's a bad benchmark" is cope it's a beautiful benchmark that makes a very compelling argument about efficiency of learning and *should* be solvable by sufficiently intelligent models and nothing today is even close and the 2024 "o3" scaling results aren't a proper solution
This is crazy. We all knew open models would be better for privacy, but to have a court order to maintain 100% of logs under all circumstances is just awful for many types of OpenAI users.
OpenAI are now under a court order to permanently preserve logs of temporary conversations or paid API usage (previously subject to a 30 day retention policy) - a new twist in the now 17 month lawsuit between the New York Times and OpenAI simonwillison.net/2025/Jun/5/ope…
nice! we also recently trained a set of models on 25 different pretraining corpora, each corpus having 14 model sizes trained (4M to 1B), to 5x Chinchilla. We released 30,000+ checkpoints! x.com/allen_ai/statu… arxiv.org/pdf/2504.11393
Ever wonder how LLM developers choose their pretraining data? It's not guesswork: all AI labs create small-scale models as experiments, but the models and their data are rarely shared. DataDecide opens up the process: 1,050 models, 30k checkpoints, 25 datasets & 10 benchmarks 🧵
Thrilled to announce I've joined the incredible team at @allen_ai! I'll be working on language modeling!
immortalizing this moment forever when RL is so easy that you can just use random rewards and your benchmarks still go up smh
the biggest headline about codex is really tasteful end-to-end rl. we didn't just stick an api model into a scaffold and ship; like deep research, the codex model has had a ton of practice doing real autonomous coding
leaders in ai talk like there's some master plan but it's literally just this
There's a new paper circulating looking in detail at LMArena leaderboard: "The Leaderboard Illusion" arxiv.org/abs/2504.20879 I first became a bit suspicious when at one point a while back, a Gemini model scored #1 way above the second best, but when I tried to switch for a few…
Thanks to the authors for their feedback; we're always looking to improve the platform! If a model does well on LMArena, it means that our community likes it! Yes, pre-release testing helps model providers identify which variant our community likes best. But this doesn't mean the…
Really incredible detective work by @singhshiviii et al. at @Cohere_Labs and elsewhere documenting the ways in which @lmarena_ai works with companies to help them game the leaderboard. arxiv.org/abs/2504.20879
With first Claude and now Gemini playing Pokemon, I was thinking of doing my own game-playing experiment over the weekend. However, I quickly learned that it's very far from the VLA-style "pixels->plan" that I naively thought it was, and wanted to do myself. It's like 90%…
Gemini 2.5 Pro just got the final 8th badge in Pokemon Blue, incredible pace of progress by the world's most powerful model!!! Next up: victory road and final 4 : )