Noam Razin
@noamrazin
Postdoctoral Fellow at @PrincetonPLI | Past: Computer Science PhD @TelAvivUni & Apple Scholar in AI/ML | Interested in the foundations of deep learning
The success of RLHF depends heavily on the quality of the reward model (RM), but how should we measure this quality? 📰 We study what makes a good RM from an optimization perspective. Among other results, we formalize why more accurate RMs are not necessarily better teachers! 🧵
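As a toy illustration of that last point (my sketch, not the paper's formal analysis): two reward models that rank a set of responses identically, and therefore have the same accuracy, can still produce very different policy-gradient signal. The rewards, the 3-response softmax policy, and the REINFORCE-style gradient below are assumptions chosen purely for illustration.

```python
# Illustrative sketch (not the paper's formalism): equal-accuracy reward models
# with different reward spreads give very different gradient magnitudes.
import numpy as np

logits = np.zeros(3)                      # uniform softmax policy over 3 responses
rm_sharp = np.array([0.0, 0.5, 1.0])      # correct ranking, large reward gaps
rm_flat  = np.array([0.0, 0.005, 0.01])   # same ranking (same accuracy), tiny gaps

def policy_gradient(logits, rewards):
    # Gradient of the expected reward of a softmax policy:
    # d E[r] / d logit_j = p_j * (r_j - E[r]).
    p = np.exp(logits) / np.exp(logits).sum()
    baseline = p @ rewards
    return p * (rewards - baseline)

print(policy_gradient(logits, rm_sharp))  # ~[-0.17, 0.00, 0.17]
print(policy_gradient(logits, rm_flat))   # ~100x smaller, despite identical ranking
```

The flat reward model orders the responses exactly as the sharp one does, yet yields a gradient roughly 100x smaller, one concrete sense in which accuracy alone does not determine how good a teacher an RM is for optimization.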

New extended version of the preprint “Edge of Stochastic Stability (EoSS)” out! w/ @arseniqum 👉 arxiv.org/pdf/2412.20553 🗓️ Tomorrow (Wed July 23, 12 PM EDT) I’ll talk about it at OWML — sfu.zoom.us/j/89334355925 I've never explained what it's about, so I'll do it here:
(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date. 🥇 #1 on PutnamBench: Solves 64 problems—with far less compute. 🧠 New SOTA on MiniF2F: * 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%. * 8B > 671B: Our 8B…
Do neural nets really need gradient descent to generalize?🚨 We dive into matrix factorization and find a sharp split: wide nets rely on GD, while deep nets can thrive with any low-training-error weights! arxiv.org/abs/2506.03931 🧵
Do NNs need GD to generalize? Check out our new paper 👇
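For context on the setting named in the tweets above, here is a minimal sketch (not the paper's code) of deep matrix factorization for matrix completion: a product of square factors is fit by plain gradient descent to a subset of entries of a low-rank matrix, and generalization is read off the unobserved entries. The matrix size, depth, initialization scale, learning rate, and step count are all assumptions chosen for illustration.

```python
# Minimal deep matrix factorization sketch (illustrative assumptions throughout):
# fit W = W_3 @ W_2 @ W_1 to ~30% of the entries of a rank-2 matrix with plain
# full-batch gradient descent, then measure error on the unobserved entries.
import torch

torch.manual_seed(0)
d, rank, depth = 20, 2, 3
target = torch.randn(d, rank) @ torch.randn(rank, d)   # low-rank ground truth
mask = (torch.rand(d, d) < 0.3).float()                # observed entries (~30%)

# Deep factorization: product of `depth` square factors, small initialization.
factors = [torch.nn.Parameter(0.1 * torch.randn(d, d)) for _ in range(depth)]
opt = torch.optim.SGD(factors, lr=0.2)                 # full-batch GD

def product(fs):
    W = fs[0]
    for F in fs[1:]:
        W = F @ W
    return W

for step in range(20000):
    W = product(factors)
    train_loss = ((W - target) * mask).pow(2).sum() / mask.sum()
    opt.zero_grad()
    train_loss.backward()
    opt.step()

with torch.no_grad():
    W = product(factors)
    test_loss = ((W - target) * (1 - mask)).pow(2).sum() / (1 - mask).sum()
print(f"train MSE {train_loss.item():.2e}  test MSE {test_loss.item():.2e}")
```

Whether low training error alone suffices for low test error, or whether gradient descent specifically is needed to generalize, is exactly the wide-versus-deep split the tweet describes.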
LLMs can solve complex tasks that require combining multiple reasoning steps. But when are such capabilities learnable via gradient-based training? In our new COLT 2025 paper, we show that easy-to-hard data is necessary and sufficient! arxiv.org/abs/2505.23683 🧵 below (1/10)
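To make "easy-to-hard data" concrete, here is a toy sketch (my illustration, not the paper's construction): difficulty k means composing a fixed single-step operation k times, so an easy-to-hard training set covers every intermediate difficulty, while a hard-only set contains just the full K-step composition. The step function, modulus, and difficulty range are assumptions.

```python
# Toy illustration of easy-to-hard vs. hard-only training data for a
# compositional task (the task and parameters are assumptions for this sketch).
import random

random.seed(0)
MOD = 97

def step(x: int) -> int:
    # One "reasoning step": a fixed simple operation.
    return (3 * x + 1) % MOD

def make_example(k: int) -> tuple[int, int, int]:
    # An example of difficulty k: input x, number of steps k, target step^k(x).
    x = random.randrange(MOD)
    y = x
    for _ in range(k):
        y = step(y)
    return x, k, y

K = 6
easy_to_hard = [make_example(k) for k in range(1, K + 1) for _ in range(100)]
hard_only = [make_example(K) for _ in range(600)]

# A learner trained on `easy_to_hard` sees every intermediate difficulty;
# `hard_only` exposes it only to the full K-step composition.
print(len(easy_to_hard), len(hard_only), easy_to_hard[0], hard_only[0])
```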
In a new blog post, @HowardYen1 and @xiye_nlp introduce HELMET and LongProc, two benchmarks from a recent effort to build a holistic test suite for evaluating long-context LMs. Read now: pli.princeton.edu/blog/2025/long…