Rosmine
@rosmine_b
ML researcher. LLMs + RL + Code gen. Tweets express the views of my employer (myself). DM me ML questions
I trained a model with GRPO to generate better SVG images. Here's the improvement over 120 steps. More details below. Prompt: Two tall giraffes are next to bare trees.

Ok, I want to know the highest effort data cleaning that everyone did
Academia must be the only industry where extremely high-skilled PhD students spend much of their time doing low value work (like data cleaning). A 1st year management consultant outsources this immediately. Imagine the productivity gains if PhDs could focus on thinking
My friend @alignment_lab is selling prebuilt 2x 5090 machines for $5K. Great deal if you want to get started with your own GPUs
Introducing SENTER. We are announcing the availability of SENTER, a powerful workstation we built to perform research and train AI without the extreme costs of cloud and API fees. It's designed to put your intelligence, data, privacy, and productivity back into your hands.…
> low value work (like data cleaning) uh...
Just talked to a Meta recruiter. I asked about comp and he said he didn't know lol. Of course he knows, he just doesn't want to say it because with Zuck's hiring spree, people will be disappointed with anything less than 10M
My training runs kept becoming unstable late in training. Turns out a "clever" optimization hack I tried was effectively increasing the LR the closer it got to a local min, making convergence impossible
A general research tip: pay attention to the details. When something strange happens—don’t ignore it. Especially in robotics, odd or incorrect robot behavior often reveals deeper insights. Dig in and ask why
Reminder: If your RL reward has multiple components on very different scales, the model’s going to take a loooooong time to learn from the smaller components
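To make the reminder above concrete, here is a rough sketch of why the small component gets drowned out when advantages are computed from a summed reward, as in group-normalized setups like GRPO. The reward values, the component names, and the helper functions are all made up for illustration, and per-component standardization is just one common mitigation, not necessarily what anyone in this thread uses.

```python
from statistics import mean, pstdev

def standardize(values):
    """Zero-mean, unit-variance scaling within a group; constant groups map to 0."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def summed_then_normalized(components):
    """Sum the raw components per sample, then standardize the total."""
    totals = [sum(sample) for sample in zip(*components)]
    return standardize(totals)

def normalized_then_summed(components):
    """Standardize each component on its own, then sum per sample."""
    scaled = [standardize(c) for c in components]
    return [sum(sample) for sample in zip(*scaled)]

# Hypothetical rewards for 4 sampled completions (assumed values).
big = [100.0, 0.0, 100.0, 0.0]   # e.g. a large-scale task score
small = [0.1, 0.1, 0.0, 0.0]     # e.g. a small-scale formatting bonus

naive = summed_then_normalized([big, small])
balanced = normalized_then_summed([big, small])

# Samples 0 and 2 differ only in the small component, so the gap between
# their signals is what the model can learn from that component.
print("gap, summed first:    ", naive[0] - naive[2])
print("gap, normalized first:", balanced[0] - balanced[2])
```

With the raw sum, the gap between two samples that differ only in the small component is a tiny fraction of the overall signal. Standardizing each component first puts both on the same footing, at the cost of throwing away whatever relative weighting the raw scales encoded, so explicit weights may still be needed.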