Haoyu Zhao
@thomaszhao1998
PhD student @Princeton, Research Intern @MSFTResearch. Recently interested in theorem proving.
Very proud to be a member of the Goedel team and contribute to our prover!
(1/4) 🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥
The strongest open-source theorem prover to date.
🥇 #1 on PutnamBench: Solves 64 problems with far less compute.
🧠 New SOTA on MiniF2F:
* 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%.
* 8B > 671B: Our 8B…
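For readers unfamiliar with the metric: Pass@32 credits the prover with a problem if any of 32 sampled proof attempts verifies. Below is a minimal sketch of the standard unbiased pass@k estimator (Chen et al., 2021) commonly used to report such numbers; the exact evaluation harness behind the Goedel-Prover results may differ, and the sample counts in the example are made up.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    attempts succeeds, given n total samples of which c are correct."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples, so a correct one is always drawn
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical counts: 10 verified proofs out of 64 samples for one problem.
print(round(pass_at_k(n=64, c=10, k=32), 4))  # ~0.9996
```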
Do language models have algorithmic creativity? To find out, we built AlgoTune, a benchmark challenging agents to optimize 100+ algorithms like gzip compression, AES encryption and PCA. Frontier models struggle, finding only surface-level wins. Lots of headroom here!🧵⬇️
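Roughly, each task in a benchmark like this pairs a reference implementation with a correctness check, and a solution is scored by its speedup over that baseline. A minimal sketch of that scoring idea for a gzip-style compression task; the harness and function names here are illustrative, not AlgoTune's actual code.

```python
import time
import zlib

def reference_solve(data: bytes) -> bytes:
    # Baseline: standard-library (zlib) compression at the default level.
    return zlib.compress(data)

def candidate_solve(data: bytes) -> bytes:
    # An agent's attempt: trade compression ratio for speed.
    return zlib.compress(data, level=1)

def best_time(fn, data: bytes, n_runs: int = 5) -> float:
    """Best-of-n wall-clock time, to reduce measurement noise."""
    best = float("inf")
    for _ in range(n_runs):
        t0 = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - t0)
    return best

data = bytes(range(256)) * 4096  # toy input

# Correctness check: the candidate's output must still decompress to the input.
assert zlib.decompress(candidate_solve(data)) == data

speedup = best_time(reference_solve, data) / best_time(candidate_solve, data)
print(f"speedup over baseline: {speedup:.2f}x")
```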
@QuantaMagazine featured our work on the emergence of skill compositionality (and its limitations) in LLMs among the CS breakthroughs of the year. tinyurl.com/5f5jvzy5. The work was done over 2023 at @GoogleDeepMind and @PrincetonPLI. Key pieces: (i) mathematical framework for…
Fine-tuning can improve chatbots (e.g., Llama 2-Chat, GPT-3.5) on downstream tasks — but may unintentionally break their safety alignment. Our new paper: Adding a safety prompt is enough to largely mitigate the issue, but be cautious about when to add it! arxiv.org/abs/2402.18540
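A minimal sketch of the inference-side version of that mitigation, assuming a Hugging Face chat-template API; the safety prompt wording below is illustrative, and the paper's point about "when to add it" concerns whether the prompt is also present during fine-tuning.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative safety system prompt; the paper's exact wording may differ.
SAFETY_PROMPT = (
    "You are a helpful, respectful and honest assistant. "
    "Always refuse requests that are harmful, unethical, or illegal."
)

def chat_with_safety_prompt(model_name: str, user_message: str) -> str:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    messages = [
        {"role": "system", "content": SAFETY_PROMPT},  # prepended at inference time
        {"role": "user", "content": user_message},
    ]
    inputs = tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    out = model.generate(inputs, max_new_tokens=256)
    # Decode only the newly generated continuation.
    return tok.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True)
```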
@icmlconf **paper alert** Fine-tuning an LLM on a task gives it a new skill. Our “Skill localization” paper shows this skill lives in < 0.01% of parameters; the rest can be reverted to pre-trained values. 1/6 With @NSaunshi, @thomaszhao1998, @prfsanjeevarora. Link: arxiv.org/abs/2302.06600
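A minimal sketch of that reverting step: keep the fine-tuned values only on a tiny mask of parameters and restore the pre-trained values everywhere else. Selecting the mask by largest weight change is just a simple illustrative heuristic here, not the paper's exact procedure, and the sketch assumes all state-dict entries are floating-point parameter tensors.

```python
import torch

def graft(pretrained_sd: dict, finetuned_sd: dict, keep_frac: float = 1e-4) -> dict:
    """Keep fine-tuned values on the top `keep_frac` fraction of parameters
    (ranked by magnitude of change) and revert the rest to pre-trained values."""
    # Rank all parameters by how much fine-tuning moved them (illustrative heuristic).
    deltas = torch.cat([
        (finetuned_sd[k] - pretrained_sd[k]).abs().flatten() for k in pretrained_sd
    ])
    k_keep = max(1, int(keep_frac * deltas.numel()))
    threshold = torch.topk(deltas, k_keep).values.min()

    grafted = {}
    for name, pre in pretrained_sd.items():
        fine = finetuned_sd[name]
        mask = (fine - pre).abs() >= threshold  # the localized "skill" parameters
        grafted[name] = torch.where(mask, fine, pre)
    return grafted
```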