Pawel Garbacki
@pawelg
GenAI researcher, @FireworksAI_HQ co-founder, ex-Meta, ex-Google
Joined April 2009
404 Following
63 Followers
Pinned
Pawel Garbacki @pawelg · Jan 29
Check out our latest blog post explaining how GRPO (Group Relative Policy Optimization), employed by models like DeepSeek R1, helps models learn effectively without the heavy lifting of value networks or massive supervised datasets:
* By skipping a standalone Value Model, GRPO…
Let's talk about How Reinforcement Learning Empowers AI with Minimal Labels 👇 Supervised fine-tuning has long been the go-to method for refining AI models, but reinforcement learning (RL) is emerging as a game-changer—reducing reliance on labeled data while keeping training…