Angelica Chen
@_angie_chen
Gemini training @ GDM. PhD from @NYUDataScience, previously @Princeton 🐅. angie-chen at 🦋. Interested in LLMs, pastries, and running.
New work w/@sadhikamalladi, @lilyhzhang, @xinyichen2, @QiuyiRichardZ, Rajesh Ranganath, @kchonyc: Contrary to conventional wisdom, RLHF/DPO does *not* produce policies that mostly assign higher likelihood to preferred responses than to less preferred ones.
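For concreteness, a minimal sketch of how one could measure this "ranking accuracy" (the fraction of preference pairs where the policy puts higher likelihood on the chosen response). The checkpoint name, data format, and tokenization handling are illustrative assumptions, not the paper's setup:

```python
# Sketch: estimate how often a DPO/RLHF-tuned policy ranks the preferred
# response above the rejected one. Checkpoint name and data format are
# placeholders; tokenization of prompt vs. prompt+response is assumed to align.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("my-dpo-tuned-model")          # hypothetical checkpoint
model = AutoModelForCausalLM.from_pretrained("my-dpo-tuned-model").eval()

@torch.no_grad()
def response_logprob(prompt: str, response: str) -> float:
    """Sum of token log-probs of `response` given `prompt` under the policy."""
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    ids = tok(prompt + response, return_tensors="pt").input_ids
    logits = model(ids).logits[:, :-1]                             # position t predicts token t+1
    logps = torch.log_softmax(logits, dim=-1)
    token_logps = logps.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_logps[:, prompt_len - 1:].sum().item()            # response tokens only

def ranking_accuracy(pairs) -> float:
    """`pairs` is a list of (prompt, chosen, rejected) triples."""
    wins = sum(response_logprob(p, c) > response_logprob(p, r) for p, c, r in pairs)
    return wins / len(pairs)
```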

CDS PhD student @_angie_chen presents LLOME, using LLMs to optimize synthetic sequences with potential applications for drug design. Co-led by @samuel_stanton_ & @nc_frey and with insights from @kchonyc, @RichBonneauNYC, and others at @PrescientDesign. nyudatascience.medium.com/language-model…
🌉 Bridging Offline & Online RL for LLMs 🌉
📝: arxiv.org/abs/2506.21495
New paper shows, on verifiable & non-verifiable tasks:
- Online DPO & GRPO give similar performance.
- Semi-online (iterative) DPO with sync every s steps (more efficient!) also works very well.
- Offline DPO…
Bridging Offline and Online Reinforcement Learning for LLMs
Investigates the effectiveness of RL for finetuning LLMs when transitioning from offline to semi-online to fully online regimes, for both verifiable and non-verifiable tasks.
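For intuition, here is a schematic of the offline / semi-online / online spectrum described above; `generate_pairs`, `sample_batch`, and `dpo_update` are hypothetical placeholders rather than the paper's code:

```python
# Schematic of offline vs. semi-online vs. online DPO-style training.
# s >= total_steps -> effectively offline (pairs generated once, never refreshed)
# s = 1            -> fully online (responses resampled from the current policy every step)
# 1 < s < steps    -> semi-online: re-sync the generation model every s steps
def train_policy(policy, prompts, s, total_steps):
    sampler = policy.copy()                          # model used to generate responses
    pairs = generate_pairs(sampler, prompts)         # (prompt, chosen, rejected) triples
    for step in range(total_steps):
        if step > 0 and step % s == 0:
            sampler = policy.copy()                  # periodic sync with the trained policy
            pairs = generate_pairs(sampler, prompts)
        dpo_update(policy, sample_batch(pairs))      # standard DPO loss on the sampled batch
    return policy
```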
What does it mean for #LLM output to be novel? In work w/ @jcyhc_ai, @JanePan_, @valeriechen_, @hhexiy we argue it needs to be both original and high quality. While prompting tricks trade one for the other, better models (scaling/post-training) can shift the novelty frontier 🧵
🚨 Diverse Preference Optimization (DivPO) 🚨
SOTA LLMs suffer from model collapse 🫠: they can't generate diverse creative writing or synthetic data 🎨
DivPO trains for both high reward & diversity, vastly improving variety with similar quality.
Paper 📝: arxiv.org/abs/2501.18101
🧵below
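A rough sketch of the pair-selection idea as described in the announcement (my paraphrase, not the paper's reference implementation); `reward` and `diversity` are hypothetical scoring functions and `tau` a hypothetical reward threshold:

```python
# DivPO-style preference-pair construction for one prompt: the reward keeps
# quality high, diversity decides which responses become chosen vs. rejected.
def build_divpo_pair(responses, reward, diversity, tau):
    high = [r for r in responses if reward(r) >= tau]     # acceptable-quality pool
    low = [r for r in responses if reward(r) < tau]       # low-quality pool
    if not high or not low:
        return None                                       # no usable contrast for this prompt
    chosen = max(high, key=diversity)                     # most diverse among the good ones
    rejected = min(low, key=diversity)                    # least diverse among the bad ones
    return chosen, rejected                               # then train with a standard DPO loss
```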
I saw a slide circulating on social media last night while working on a deadline. I didn’t comment immediately because I wanted to understand the full context before speaking. After learning more, I feel compelled to address what I witnessed during an invited talk at NeurIPS 2024…
It is just so sad that the #NeurIPS2024 main conference ended with such a racist remark by a faculty member during a talk about ethics. How ironic! I also want to commend the Chinese student who spoke up right on the spot. She was respectful, decent, and courageous. Her response was…
I’ll be at NeurIPS this week! Presenting at the Thursday 4:30pm poster session and giving a spotlight talk at the AIDrugX workshop on Sunday. Also, I’ve finally joined 🦋. Come find me, both at NeurIPS and on 🦋! ☺️

Two @NeurIPSConf workshop spotlight talks from our lab this year! @amyxlu will present on all-atom protein generation from sequence-only inputs at MLSB and @_angie_chen will present on LLMs as highly-constrained biophysical sequence optimizers at AIDrugX
🚨🔔 Foundational graph search task as a testbed: given the right training distribution, transformers can learn to search (100% acc). We interpreted their algo!! But as graph size ↑, transformers struggle. Scaling up # params does not help; CoT does not help. 1.5 years of learning in 10 pages!
Check out Sadhika's talk tomorrow! She'll be talking about our paper "Preference Learning Algorithms Do Not Learn Preference Rankings" (arxiv.org/abs/2405.19534) as well as some very cool follow-up work :)
I will be giving a talk on "Failure Modes of Preference Learning" through the AI Tinkerers club on 11/26 at 12pm ET. I gave this talk at a few universities recently, and I'm excited to share it with the broader community! paperclub.aitinkerers.org/p/join-paper-c…
LLMs are clearly very general interfaces, but we weren't sure they could be made precise enough for protein design to really work. With active data collection, the right preference tuning, and test-time scaling (or just search, as we used to call it), it looks like the answer is yes!
LLMs are highly constrained biological sequence optimizers. In new work led by @_angie_chen & @samuel_stanton_ , we show how to drive an active learning loop for protein design with an LLM. 1/
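At a high level, the loop looks something like the sketch below; `llm_propose`, `evaluate`, and `preference_tune` are hypothetical placeholders standing in for the generation, scoring, and tuning steps of the actual pipeline:

```python
# High-level sketch of an LLM-driven active-learning loop for sequence design.
# All helpers are illustrative placeholders, not the paper's code.
def design_loop(llm, seed_sequences, n_rounds, n_candidates):
    labeled = [(seq, evaluate(seq)) for seq in seed_sequences]       # initial labeled pool
    for _ in range(n_rounds):
        # 1. The LLM proposes new candidate sequences, conditioned on what it has seen.
        candidates = llm_propose(llm, labeled, n_candidates)
        # 2. Score candidates with the (expensive) oracle and grow the labeled pool.
        labeled += [(seq, evaluate(seq)) for seq in candidates]
        # 3. Preference-tune the LLM so that higher-scoring sequences become more likely.
        llm = preference_tune(llm, labeled)
    return max(labeled, key=lambda pair: pair[1])                    # best sequence found
```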
What makes some LM interpretability research “mechanistic”? In our new position paper in @BlackboxNLP, @sarahwiegreffe and I argue that the practical distinction was never technical, but a historical artifact that we should be—and are—moving past to bridge communities.
Be sure to stop by Angie's oral presentation and our poster on our preference learning work (arxiv.org/abs/2405.19534) at the MHFAIA workshop at ICML! We'll also be presenting this poster at the Theoretical Foundations of Foundation Models (TF2M) workshop :)
I'll be at @icmlconf next week! Giving a plenary talk at the HiLD workshop and an oral on our recent paper (arxiv.org/abs/2405.19534) at the MHFAIA workshop! Pls reach out to chat if you're also interested in any of these topics! 😊
Self-rewarding LMs at #icml2024 ! Thru iterative DPO (w/ a small amount of seed data), LLM instruction following ↑ (AlpacaEval 2.0, human, MT-bench) & reward modeling ↑ (corr w human rankings). @jingxu_ml will be presenting in Vienna (Tues 7/23 11:30am); please stop by! (1/2)
🚨 New paper! 🚨 Self-Rewarding LMs
- The LM itself provides its own rewards on its own generations via LLM-as-a-Judge during Iterative DPO
- Reward modeling ability improves during training rather than staying fixed
...opens the door to superhuman feedback?
arxiv.org/abs/2401.10020 🧵(1/5)
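One self-rewarding iteration, as described in the thread, roughly amounts to the sketch below; `generate`, `judge_score`, and `dpo_train` are hypothetical helpers, not the paper's code:

```python
# Sketch of one self-rewarding iteration: the same model generates candidate
# responses and judges them (LLM-as-a-Judge), and the resulting preference
# pairs are used for the next round of DPO.
def self_rewarding_iteration(model, prompts, n_samples=4):
    pairs = []
    for prompt in prompts:
        candidates = [generate(model, prompt) for _ in range(n_samples)]
        scores = [judge_score(model, prompt, c) for c in candidates]   # model scores its own outputs
        chosen = candidates[scores.index(max(scores))]
        rejected = candidates[scores.index(min(scores))]
        pairs.append((prompt, chosen, rejected))
    return dpo_train(model, pairs)                                     # model for the next iteration
```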