Dan Friedman
@danfriedman0
PhD student @princeton_nlp
How can we understand neural chatbots in terms of interpretable, symbolic mechanisms? To explore this question, we constructed a Transformer that implements the classic ELIZA chatbot algorithm (with @Abhishek_034 and @danqi_chen). Paper: arxiv.org/abs/2407.10949 (1/6)
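For context, classic ELIZA is just keyword matching plus reassembly templates. A minimal sketch of the kind of rule the Transformer has to reproduce (the rules and wording below are illustrative, not taken from the paper):

```python
import re

# Illustrative ELIZA-style rules: match a keyword pattern in the user's input
# and fill a response template with the captured text.
RULES = [
    (re.compile(r"\bi am (.*)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.*)", re.IGNORECASE), "Do you often feel {0}?"),
]

def eliza_reply(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(match.group(1))
    return "Please tell me more."  # fallback when no rule fires

print(eliza_reply("I am tired of debugging"))
# -> Why do you say you are tired of debugging?
```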

In a new blog post, @HowardYen1 and @xiye_nlp introduce HELMET and LongProc, two benchmarks from a recent effort to build a holistic test suite for evaluating long-context LMs. Read now: pli.princeton.edu/blog/2025/long…
Training on a little 🤏 formal language BEFORE natural language can make pretraining more efficient! How and why does this work? The answer lies… Between Circuits and Chomsky. 🧵1/6👇
🤔 Ever wondered how prevalent a given type of web content is in LM pre-training? In our new paper, we propose WebOrganizer, which *constructs domains* based on the topic and format of CommonCrawl web pages 🌐 Key takeaway: domains help us curate better pre-training data! 🧵/N
Does all LLM reasoning transfer to VLMs? In the context of Simple-to-Hard generalization, we show: NO! We also give ways to reduce this modality imbalance. Paper: arxiv.org/abs/2501.02669 Code: github.com/princeton-pli/… @Abhishek_034 @chengyun01 @dingli_yu @anirudhg9119 @prfsanjeevarora
Introducing MeCo (metadata conditioning then cooldown), a remarkably simple method that accelerates LM pre-training by simply prepending source URLs to training documents. arxiv.org/abs/2501.01956
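A rough sketch of what that looks like as a preprocessing step, as I read the recipe (the function and field names are hypothetical, and the 10% cooldown fraction is just an illustrative choice): prepend the document's source URL for most of training, then switch to plain text for the final cooldown phase.

```python
def in_cooldown(step: int, total_steps: int, cooldown_frac: float = 0.1) -> bool:
    # Illustrative schedule: metadata conditioning first, plain text at the end.
    return step >= (1 - cooldown_frac) * total_steps

def format_example(doc: dict, step: int, total_steps: int) -> str:
    """Hypothetical MeCo-style preprocessing: condition on metadata early,
    then train on plain text during cooldown so inference needs no metadata."""
    if in_cooldown(step, total_steps):
        return doc["text"]
    return f"{doc['url']}\n\n{doc['text']}"  # prepend the source URL
```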
I’m hiring PhD students in computer science at Columbia! Our lab will tackle core challenges in understanding and controlling neural models that interact with language. For example:
- methods for LLM control
- discoveries of LLM properties
- pretraining for understanding
🔔 I'm recruiting multiple fully funded MSc/PhD students @UAlberta for Fall 2025! Join my lab working on NLP, especially reasoning and interpretability (see my website for more details about my research). Apply by December 15!
(1/5) Very excited to announce the publication of Bayesian Models of Cognition: Reverse Engineering the Mind. More than a decade in the making, it's a big (600+ pages) beautiful book covering both the basics and recent work: mitpress.mit.edu/9780262049412/…
🤖🧠 I'll be considering applications for postdocs & PhD students to start at Yale in Fall 2025! If you are interested in the intersection of linguistics, cognitive science, & AI, I encourage you to apply! Postdoc link: rtmccoy.com/prospective_po… PhD link: rtmccoy.com/prospective_st…
I am recruiting PhD students for Fall 2025 at Cornell Tech! If you are interested in topics relating to machine learning fairness, algorithmic bias, or evaluation, apply and mention my name in your application: infosci.cornell.edu/phd/admissions Also, go vote!
I'm recruiting PhD students for our new lab, coming to Boston University in Fall 2025! Our lab aims to understand, improve, and precisely control how language is learned and used in natural language systems (such as language models). Details below!
Progressive distillation, where a student model learns from multiple checkpoints of the teacher, has been shown to improve the student, but why? We show it induces an implicit curriculum that accelerates training. Work w/ @BingbinL, @SadhikaMalladi, @risteski_a, @SurbhiGoel_
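For intuition, a minimal sketch of the setup as I read it (the model/optimizer interfaces and hyperparameters below are illustrative): instead of distilling only from the final teacher, the student is trained against a sequence of intermediate teacher checkpoints, from early to late.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label KL between the student and a frozen teacher checkpoint."""
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2

def progressive_distillation(student, teacher_checkpoints, data_loader, optimizer):
    # Sweep through teacher checkpoints in training order (early -> late),
    # so the student sees an implicit curriculum of intermediate targets.
    for teacher in teacher_checkpoints:
        for batch in data_loader:
            with torch.no_grad():
                teacher_logits = teacher(batch)
            loss = distill_loss(student(batch), teacher_logits)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```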
🤖🧠NOW OUT IN PNAS🧠🤖 Language models show many surprising behaviors. E.g., they can count 30 items more easily than 29 In Embers of Autoregression, we explain such effects by analyzing what LMs are trained to do pnas.org/doi/10.1073/pn… Major updates since the preprint! 1/n
🤖 NEW PAPER 🤖 Chain-of-thought reasoning (CoT) can dramatically improve LLM performance Q: But what *type* of reasoning do LLMs use when performing CoT? Is it genuine reasoning, or is it driven by shallow heuristics like memorization? A: Both! 🔗 arxiv.org/abs/2407.01687 1/n
We're launching SWE-bench Multimodal to evaluate agents' ability to solve visual GitHub issues.
- 617 *brand new* tasks from 17 JavaScript repos
- Each task has an image!
Existing agents struggle here! We present SWE-agent Multimodal to remedy some issues. Led w/ @_carlosejimenez 🧵
Very proud to introduce two of our recent long-context works:
HELMET (best long-context benchmark imo): shorturl.at/JnBHD
ProLong (a cont’d training & SFT recipe + a SoTA 512K 8B model): shorturl.at/XQV7a
Here is the story of how we arrived there.
Meet ProLong, a Llama-3-based long-context chat model! huggingface.co/princeton-nlp/… (64K here, 512K coming soon) ProLong uses a simple recipe (short/long pre-training data + short UltraChat, no synthetic instructions) and achieves top performance on a series of long-context tasks.
🌟 Exciting update! Gemma2-9b + SimPO ranks at the top of AlpacaEval 2 (❗LC 72.4) and leads the WildBench leaderboard among similar-sized models 🚀 SimPO is at least competitive with (and often outperforms) DPO across all benchmarks, despite its simplicity. ✨ Recipe: on-policy…
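Part of why SimPO is simple: the implicit reward is just the policy's length-normalized log-likelihood, so no reference model is needed. A sketch of the pairwise loss as I understand it from the paper (hyperparameter values here are illustrative):

```python
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_lens, rejected_lens,
               beta=2.0, gamma=1.0):
    """chosen_logps / rejected_logps: summed token log-probs of each response (batch,).
    Implicit reward = (beta / |y|) * log pi(y | x); push the chosen response
    above the rejected one by a target margin gamma."""
    chosen_reward = beta * chosen_logps / chosen_lens
    rejected_reward = beta * rejected_logps / rejected_lens
    return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()
```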
If you are attending ICML this year, stop by our workshop on long-context foundation models! Schedule: longcontextfm.github.io/schedule/ Also, RSVP for our social event with our sponsor @togethercompute on July 24: lu.ma/9fctiq9k 🥳