Jacob Eisenstein
@jacobeisenstein
@jacobeisenstein.bsky.social. Not here very often.
It's never been more clear what this site is about and there are perfectly good alternatives. Pick a side. giphy.com/gifs/Ggt7ZY74E…
If you work at xai, you can just quit. You can get a job almost anywhere. What on earth are you doing.
sad that i have multiple former colleagues who went to work at xai, the nazi chatbot company. everyone has to make their own decisions, and maybe you didn't see this coming until now, somehow. but if you're choosing to work there after today, that sure is a choice.
xAI has disabled Grok, deleted a slew of its antisemitic and neo-Nazi posts, posted a statement, and is evidently rolling back the prompt that made it identify as "MechaHitler," but this new low for Elon Musk's chatbot will live in internet infamy: rollingstone.com/culture/cultur…
We're hiring a research scientist on the Foundational Research in Language team at GDM. The role is right here in sunny Seattle! job-boards.greenhouse.io/deepmind/jobs/…
I'm pretty excited about this paper! arxiv.org/abs/2410.18077 I wrote some more over here: bsky.app/profile/jacobe…
🚨New paper led by @setlur_amrith on process rewards for reasoning! Our PRMs model a specific notion of "progress" as the reward (NO human supervision) and improve: - compute efficiency of search by 1.5-5x - online RL sample efficiency by 6x - accuracy gains 3-4x over past PRM results arxiv.org/abs/2410.08146 How? 🧵👇
🚨 Exciting new results with dense process reward models (PRMs) for reasoning. Our PRMs scale ✅ search compute by 1.5-5x ✅ RL sample efficiency by 6x ✅ 3-4x ⬆️ accuracy gains vs prior works ❌ human supervision What's the secret sauce 🤔?: See 🧵 ⬇️ arxiv.org/pdf/2410.08146
Google presents Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning. Achieves a 5−6× gain in sample efficiency, 1.5−5× more compute efficiency, and a >6% gain in accuracy over ORMs on test-time search. arxiv.org/abs/2410.08146
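To make the "progress" idea concrete: as I read it, the process reward for a step is roughly the change in a prover policy's estimated chance of eventually reaching the right answer. Below is a minimal, self-contained sketch of that signal on a toy task; prover_rollout, the step sizes, and the target are illustrative placeholders, not the paper's setup.

import random

# Hypothetical stand-in for a prover policy: given a partial solution (a running
# total in this toy task), roll out to completion and report whether the final
# answer is correct. In the real setting this would be an LLM.
def prover_rollout(partial_sum: int, steps_left: int, target: int) -> bool:
    total = partial_sum
    for _ in range(steps_left):
        total += random.choice([1, 2, 3])  # prover picks remaining steps noisily
    return total == target

def success_prob(partial_sum: int, steps_left: int, target: int, n_rollouts: int = 2000) -> float:
    """Monte-Carlo estimate of the prover's chance of finishing correctly."""
    wins = sum(prover_rollout(partial_sum, steps_left, target) for _ in range(n_rollouts))
    return wins / n_rollouts

def progress_reward(prev_sum: int, new_sum: int, steps_left_after: int, target: int) -> float:
    """Score a step by the *change* in estimated success probability:
    an advantage-like 'progress' signal that needs no human step labels."""
    before = success_prob(prev_sum, steps_left_after + 1, target)
    after = success_prob(new_sum, steps_left_after, target)
    return after - before

if __name__ == "__main__":
    # Toy task: reach a total of 10 in 4 steps of size 1-3.
    # A first step of +3 should register as more progress than a first step of +1.
    print("progress(+3):", progress_reward(0, 3, 3, target=10))
    print("progress(+1):", progress_reward(0, 1, 3, target=10))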
Is this site now part of Trump's re-election campaign? forbes.com/sites/antoniop… Anyway, you can find me at @jacobeisenstein.bsky.social
Excited to share new work from @GoogleDeepMind / @GoogleResearch on improving LLM evals using ML predictions together with a simple but effective stratified sampling approach that strategically divides the underlying data for better performance. Paper: arxiv.org/abs/2406.04291
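As a rough illustration of the stratified-sampling idea (my own sketch under assumed details, not the paper's estimator): sort examples by a cheap model prediction, split them into equal-size strata, spend the human-label budget inside each stratum, and combine the per-stratum means weighted by stratum size. When the predictions correlate with the true scores, this gives a lower-variance estimate than labeling a simple random sample of the same size.

import random
import statistics

def stratified_eval(predictions, true_labels, n_strata=4, labels_per_stratum=25, seed=0):
    """Estimate the mean true score by stratifying on model predictions."""
    rng = random.Random(seed)
    order = sorted(range(len(predictions)), key=lambda i: predictions[i])
    chunk = len(order) // n_strata
    estimate = 0.0
    for s in range(n_strata):
        lo = s * chunk
        hi = (s + 1) * chunk if s < n_strata - 1 else len(order)
        stratum = order[lo:hi]
        sample = rng.sample(stratum, min(labels_per_stratum, len(stratum)))
        stratum_mean = statistics.fmean([true_labels[i] for i in sample])
        estimate += (len(stratum) / len(order)) * stratum_mean  # weight by stratum size
    return estimate

if __name__ == "__main__":
    random.seed(1)
    true_labels = [random.random() for _ in range(10_000)]         # "human" scores (unknown in practice)
    predictions = [t + random.gauss(0, 0.1) for t in true_labels]  # cheap correlated model predictions
    est = stratified_eval(predictions, true_labels)
    print(f"stratified estimate: {est:.3f}   true mean: {statistics.fmean(true_labels):.3f}")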
Excited to share new work from @GoogleDeepMind: “ProtEx: A Retrieval-Augmented Approach for Protein Function Prediction” biorxiv.org/content/10.110…
Excited to share new work from @GoogleDeepMind / @GoogleResearch on “Robust Preference Optimization through Reward Model Distillation”. arxiv.org/abs/2405.19316
Want to train an aligned LM in a new language 🌏 but don’t have preference data for training the reward model (RM)? 💡 Just use a RM for another language: it often works well, sometimes even BETTER than if you had a RM in your target language! 🤯 arxiv.org/abs/2404.12318
Google presents Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment. Aligning language models (LMs) based on human-annotated preference data is a crucial step in obtaining practical and performant LM-based systems. However, multilingual human
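A minimal sketch of what cross-lingual reward-model transfer can look like, assuming a best-of-n selection step (the paper also covers RL-based alignment); generate_candidates and english_rm_score are hypothetical stand-ins for a target-language LM and a reward model trained only on English preference data. The point is that the selection step never needs target-language preference labels.

from typing import Callable, List

def best_of_n_crosslingual(
    prompt: str,
    generate_candidates: Callable[[str, int], List[str]],  # target-language LM (placeholder)
    english_rm_score: Callable[[str, str], float],          # source-language RM (placeholder)
    n: int = 8,
) -> str:
    """Generate n target-language responses and keep the one the
    source-language reward model scores highest."""
    candidates = generate_candidates(prompt, n)
    return max(candidates, key=lambda resp: english_rm_score(prompt, resp))

if __name__ == "__main__":
    # Toy stubs so the sketch runs: canned "candidates" and a "reward model"
    # that simply prefers longer responses.
    def toy_generate(prompt, n):
        return [f"respuesta {i}: " + "gracias " * i for i in range(1, n + 1)]

    def toy_rm(prompt, response):
        return len(response)

    print(best_of_n_crosslingual("¿Cómo estás?", toy_generate, toy_rm))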
If this were a science paper, you would treat a country that picks its science workforce at random as a "weak baseline," and you would expect a leading nation like the US to actively experiment toward the state of the art, or at least to beat the baseline. Not providing a guaranteed path for…
H1B lottery ❌ It was less than a 1 in 3 chance, but sucks anyway!
This one weird* trick will fix all** your LLM RLHF issues! * not weird ** as long as your issues are about how to combine multiple objectives, and avoid reward hacking
Transforming the reward used in RLHF gives big wins in LLM alignment and makes it easy to combine multiple reward functions! arxiv.org/pdf/2402.00742… @nagpalchirag @JonathanBerant @jacobeisenstein @alexdamour @sanmikoyejo @victorveitch @GoogleDeepMind
Transforming and Combining Rewards for Aligning Large Language Models paper page: huggingface.co/papers/2402.00… A common approach for aligning language models to human preferences is to first learn a reward model from preference data, and then use this reward model to update the…
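For intuition, here is a minimal sketch of the kind of transformation involved, as I read the paper: pass each raw reward through a log-sigmoid centered at a reference reward, then sum the transformed rewards across objectives so that an output has to do well on all of them (a soft AND). The exact centering and weighting in the paper may differ; treat the reference values below as assumptions.

import math

def transform_reward(raw_reward: float, reference_reward: float) -> float:
    """log sigmoid(r - r_ref): near 0 well above the reference, very negative below.
    Saturating the top of the reward scale blunts reward hacking."""
    return -math.log1p(math.exp(-(raw_reward - reference_reward)))

def combined_reward(raw_rewards, reference_rewards):
    """Sum of transformed per-objective rewards (e.g. helpfulness + harmlessness)."""
    return sum(transform_reward(r, ref) for r, ref in zip(raw_rewards, reference_rewards))

if __name__ == "__main__":
    refs = [0.0, 0.0]  # e.g. reward of a reference (SFT) response for each objective
    print(combined_reward([3.0, 3.0], refs))    # decent on both objectives: close to 0
    print(combined_reward([10.0, -3.0], refs))  # hacking one objective doesn't compensate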
Have you been compiling reward-KL tradeoffs to compare different alignment methods? Have you been using 𝐛𝐞𝐬𝐭-𝐨𝐟-𝐧 as a baseline? Have you wondered about the analytical formula that claims KL(best-of-n || base) = log(n) - (n-1)/n?
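A quick numerical sanity check of that formula (my own sketch, not from the thread): with i.i.d. samples from the base policy and a continuous reward (no ties), best-of-n keeps the sample whose reward quantile is the maximum of n uniforms, whose density relative to the base is n·u^(n-1); the KL is the mean log of that density ratio.

import math
import random

def kl_best_of_n_mc(n: int, num_samples: int = 200_000, seed: int = 0) -> float:
    """Monte-Carlo estimate of KL(best-of-n || base) via the quantile trick."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        u = max(rng.random() for _ in range(n))       # quantile of the selected sample
        total += math.log(n) + (n - 1) * math.log(u)  # log density ratio vs. base
    return total / num_samples

if __name__ == "__main__":
    for n in (2, 4, 16, 64):
        analytic = math.log(n) - (n - 1) / n
        print(f"n={n:3d}  MC={kl_best_of_n_mc(n):.4f}  formula={analytic:.4f}")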