kalomaze

@kalomaze

ML researcher (@primeintellect), speculator • extremely silly jester

Joined October 2020

2K Following

16K Followers

Pinned

kalomaze@kalomaze · Dec 9

let's verify the unverifiable

137

116.0K

kalomaze@kalomaze · 17 h

so ready for this

mrfakename@realmrfakename · 19 h

GLM-4.5-Air coming soon? h/t "Dr. Chad PhD"

4.0K

kalomaze@kalomaze · Jul 24

maybe i should move to austin

3.0K

kalomaze@kalomaze · Jul 24

"i can fix her"

kalomaze@kalomaze · Jul 24

here's qwen7b instruct falling off a cliff

4.0K

kalomaze@kalomaze · Jul 24

i'm assuming most people knew that this is how Anthropic and co. already handle fuzzier stuff, but good to see it represented in papers rather than just products

Tanishq Abraham is at ICML@iScienceLuvr · Jul 24

Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains 'We introduce Rubrics as Rewards (RaR), a framework that uses structured, checklist-style rubrics as interpretable reward signals for on-policy training with GRPO. Our best RaR method yields up to a relative…

196

106

12.0K

kalomaze@kalomaze · Jul 24

here's qwen7b instruct falling off a cliff

kalomaze@kalomaze · Jul 24

proper multiturn state tracking task working with RL-able pass rate on devstral 24b, cool

7.0K

kalomaze Retweeted

secemp@secemp9 · Jul 22

many are saying

2.0K

kalomaze@kalomaze · Jul 24

proper multiturn state tracking task working with RL-able pass rate on devstral 24b, cool

6.0K

kalomaze@kalomaze · Jul 23

first you secure a visa, next you secure the future

mike64_t@mike64_t · Jul 21

fuck yeah

2.0K

kalomaze@kalomaze · Jul 23

wandb to introduce new accessibility feature called "prime intellect mode" for colorblind users

mike64_t@mike64_t · Jul 23

The Saga of wandb color debates continues. Apparently everyone at prime intellect is color blind...

6.0K

kalomaze@kalomaze · Jul 23

noticed this before in qwen models yeah. can't tell if bad init or aggressive pretraining LR or whatever the hell causes entire rows to be outliers

Ilya Sutskever's hairline@IlyasHairline · Jul 22

What the fuck, Qwen?

6.0K