Jane Pan @ ACL 2025
@JanePan_
CS PhD at @nyuniversity, @NSF GRFP, @Deepmind Fellowship, @SiebelScholars | @Princeton @Princeton_nlp '23 | @Columbia '21.
I really like the paper from Jane Pan (w @danqi_chen) abt this: arxiv.org/abs/2305.09731. ICL in big models is clearly a mix of task recognition and "real learning" (you're not learning to translate from 3 examples, but you're not getting an arbitrary label mapping from the prior)
Bored of seeing pristine, perfect posters? Come see me at Hall X5, Board 105 at 6pm to witness my masterpiece, featuring bonus Sharpie scribbles and a QR code that betrayed me at the last moment 😤
I'll be at ACL Vienna 🇦🇹 next week presenting this work! If you're around, come say hi on Monday (7/28) from 18:00–19:30 in Hall 4/5. Would love to chat about code model benchmarks 🧠, simulating user interactions 🤝, and human-centered NLP in general!
When benchmarks talk, do LLMs listen? Our new paper shows that evaluating code LLMs with interactive feedback significantly affects model performance compared to standard static benchmarks! Work w/ @RyanShar01, @jacob_pfau, @atalwalkar, @hhexiy, and @valeriechen_! [1/6]
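For anyone curious what "interactive feedback" looks like operationally, here is a minimal sketch, not the paper's actual harness: a static benchmark is the max_rounds=1 case, while the interactive setting lets a simulated user respond to failing code for a few rounds. `model_fn`, `feedback_fn`, and the problem format are placeholder assumptions.

```python
# Minimal sketch (not the paper's harness): compare static vs. interactive
# evaluation of a code model. `model_fn` and `feedback_fn` are hypothetical
# callables standing in for the code LLM and the simulated user.
from typing import Callable, List

def run_tests(code: str, tests: List[Callable[[str], bool]]) -> bool:
    """A problem counts as solved only if every test accepts the generated code."""
    return all(test(code) for test in tests)

def evaluate(
    problems: List[dict],                    # each: {"prompt": str, "tests": [callable, ...]}
    model_fn: Callable[[str], str],          # prompt -> generated code (placeholder)
    feedback_fn: Callable[[str, str], str],  # (original prompt, failing code) -> feedback (placeholder)
    max_rounds: int = 3,                     # max_rounds=1 reduces to the static benchmark
) -> float:
    """Fraction of problems solved within `max_rounds` rounds of interaction."""
    solved = 0
    for problem in problems:
        prompt = problem["prompt"]
        for _ in range(max_rounds):
            code = model_fn(prompt)
            if run_tests(code, problem["tests"]):
                solved += 1
                break
            # Simulated user reads the failing attempt and replies; the model retries.
            prompt += "\n\n# User feedback:\n" + feedback_fn(problem["prompt"], code)
    return solved / len(problems)

# static_score = evaluate(problems, model_fn, feedback_fn, max_rounds=1)
# interactive_score = evaluate(problems, model_fn, feedback_fn, max_rounds=3)
```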
What does it mean for #LLM output to be novel? In work w/ @jcyhc_ai, @JanePan_, @valeriechen_, @hhexiy we argue it needs to be both original and high quality. While prompting tricks trade one for the other, better models (scaling/post-training) can shift the novelty frontier 🧵
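A toy sketch of the "novelty frontier" framing (not the paper's metrics; the originality/quality scores below are placeholder values): score each generation on originality and quality, then keep the Pareto-optimal points. Prompting tricks move you along the frontier; a stronger model should contribute points that dominate it.

```python
# Toy sketch (not the paper's metrics): given per-generation originality and
# quality scores in [0, 1], keep the Pareto-optimal points -- the "novelty
# frontier". A better model should push this frontier outward on both axes.
from typing import List, Tuple

def novelty_frontier(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the points not dominated on both (originality, quality)."""
    frontier = []
    for originality, quality in points:
        dominated = any(
            o >= originality and q >= quality and (o, q) != (originality, quality)
            for o, q in points
        )
        if not dominated:
            frontier.append((originality, quality))
    return sorted(frontier)

# Example: the last point is dominated and drops off the frontier.
print(novelty_frontier([(0.9, 0.3), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4)]))
# -> [(0.2, 0.9), (0.5, 0.5), (0.9, 0.3)]
```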
We're excited by the wide attention from the community—thank you for your support! We've released the code, trained probes, and generated CoT data👇 github.com/AngelaZZZ-611/… Labeled answer data is on its way. Stay tuned!
Reasoning models overthink, generating multiple answers during reasoning. Is it because they can’t tell which ones are right? No! We find that while reasoning models encode strong correctness signals during chain-of-thought, they may not use them optimally. 🧵 below
Do reasoning models know when their answers are right?🤔 Really excited about this work led by Anqi and @YulinChen99. Check out this thread below!
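A hedged sketch of the probing setup behind the "correctness signals" claim (this is not the released probes; the hidden states and labels below are random placeholders): fit a linear probe that predicts whether an intermediate answer is correct from the model's hidden state at that point in the chain of thought.

```python
# Minimal sketch (not the released code): a linear probe over CoT hidden
# states. Features and labels are assumed to be precomputed; how they are
# extracted from a specific reasoning model is left abstract here.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder data: 1,000 hidden states (dim 4096), each taken at an
# intermediate answer, labeled by whether that answer was correct.
hidden_states = rng.normal(size=(1000, 4096)).astype(np.float32)
is_correct = rng.integers(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, is_correct, test_size=0.2, random_state=0
)

probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

# High held-out accuracy (on real data) would indicate the model already
# encodes a usable correctness signal mid-reasoning, even if it keeps going.
print("probe accuracy:", probe.score(X_test, y_test))
```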
How do you know if a method is better, or just has better hyperparameters? @hhexiy, @kchonyc, and I give a new tool to answer this in our #NAACL2024 paper: "Show Your Work with Confidence" arxiv.org/abs/2311.09480. Use it in your own work with just a "pip install opda"! 🧵 1/8
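A rough illustration of the tuning-curve idea behind the paper (this is not the opda API, and the simulated-search spread below stands in for the paper's confidence bands): estimate how the best validation score grows with the number of random hyperparameter trials, with an uncertainty range, so two methods can be compared at a matched tuning budget.

```python
# Sketch only (not opda): an empirical tuning curve with a simulated-search
# spread. For each budget k, resample k random trials many times and record
# the best score found, then summarize with the median and a 95% range.
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: validation scores from 50 random hyperparameter configs.
scores = rng.uniform(0.70, 0.85, size=50)

def tuning_curve(scores, max_k, n_sims=10_000):
    """Median and central 95% spread of the best score after k random trials."""
    rows = []
    for k in range(1, max_k + 1):
        best = rng.choice(scores, size=(n_sims, k), replace=True).max(axis=1)
        rows.append(np.percentile(best, [2.5, 50.0, 97.5]))
    return np.array(rows)  # shape (max_k, 3): lower, median, upper

curve = tuning_curve(scores, max_k=20)
for k, (lower, median, upper) in enumerate(curve, start=1):
    print(f"k={k:2d}  median best={median:.3f}  95% spread=({lower:.3f}, {upper:.3f})")
```

Plotting both methods' curves side by side answers "better, or just better tuned?" at whatever search budget you actually have.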