Pawel Garbacki

@pawelg

GenAI researcher, @FireworksAI_HQ co-founder, ex-Meta, ex-Google

Joined April 2009

404 Following

63 Followers

Pinned

Check out our latest blog post explaining how GRPO (Group Relative Policy Optimization), employed by models like DeepSeek R1, helps models learn effectively without the heavy lifting of value networks or massive supervised datasets: * By skipping a standalone Value Model, GRPO…

Fireworks AI @FireworksAI_HQ · Jan 29

Let's talk about How Reinforcement Learning Empowers AI with Minimal Labels 👇 Supervised fine-tuning has long been the go-to method for refining AI models, but reinforcement learning (RL) is emerging as a game-changer—reducing reliance on labeled data while keeping training…
