Guy Davidson
@guyd33
PhD @NYUDataScience, visiting researcher @AIatMeta, interested in AI & CogSci, specifically in goals and their representations in minds and machines (he/him).
New preprint alert! We often prompt in-context learning (ICL) tasks using either demonstrations or instructions. How much does the form of the prompt matter to the task representation formed by a language model? Stick around to find out 1/N
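To make the two prompt forms concrete, here's a toy sketch (a hypothetical example, not taken from the preprint) of the same task specified by demonstrations vs. by an instruction:

```python
# Hypothetical illustration: one in-context task, two prompt forms.

# Form 1: demonstrations only -- the model must infer the task.
demo_prompt = (
    "cat -> chat\n"
    "dog -> chien\n"
    "bird ->"
)

# Form 2: an explicit instruction, no examples.
instruction_prompt = "Translate each English word into French.\nbird ->"
```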

We've been using Smile to develop behavioral web experiments in the lab for the last year+. Everything from the simplest survey-like judgment collections to complex game-like designs (e.g., exps.gureckislab.org/e/laugh-melted…) is easier to develop and deploy. Consider it for your next experiment!
Today we open-sourced a new project for developing behavioral experiments online. It is called Smile. Announcement of v0.1.0: todd.gureckislab.org/2025/07/22/s... Smile has been used internally in my lab for several years and has substantially increased our productivity.
John has some nice new results showing that some frontier models do worse on our safety benchmark than their predecessors. Take a look!
New SAGE-Eval results: Both o3 and Claude Sonnet 4 underperformed(!) their previous generations (o3 vs. o1, Sonnet 4 vs. Sonnet 3.7). ➟ Stronger models are not always safer! Gemini-2.5-pro is No. 1, but it passed only 72% of the safety facts. Still lots of room for improvement.
Cool new work from colleagues at NYU and Meta on localizing and removing concepts using attention heads!
How would you make an LLM "forget" the concept of dog — or any other arbitrary concept? 🐶❓ We introduce SAMD & SAMI — a novel, concept-agnostic approach to identify and manipulate attention modules in transformers.
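For intuition only, here's a minimal sketch of silencing a single attention head in GPT-2 with the Hugging Face transformers library. This is generic head ablation, not the SAMD/SAMI method from the paper, and the layer/head indices are hypothetical:

```python
# Sketch: zero one attention head's contribution to the residual stream
# by zeroing its slice of the output projection. NOT the paper's method.
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")

def ablate_head(model, layer: int, head: int) -> None:
    attn = model.transformer.h[layer].attn
    head_dim = attn.head_dim  # hidden_size // num_attention_heads
    with torch.no_grad():
        # c_proj.weight has shape (hidden, hidden); its rows index the
        # concatenated per-head outputs, so zeroing this row slice
        # removes head `head`'s write into the residual stream.
        attn.c_proj.weight[head * head_dim : (head + 1) * head_dim, :] = 0.0

ablate_head(model, layer=5, head=3)  # hypothetical layer/head choice
```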
Today! Come hear from some wonderful folks about problem solving and design at 1 PM PT / 4 PM ET / 8 PM UTC
🧩 On problem solving (6/30, 2-3pm PT): @KelseyRAllen @BonanZhao @RanjayKrishna Register here: stanford.zoom.us/meeting/regist…
You (yes, you!) should work with Sydney! Either short-term this summer, or longer term at her nascent lab at NYU!
🔆 I'm hiring! 🔆 There are two open positions: 1. Summer research position (best for a master's or graduate student); focus on computational social cognition. 2. Postdoc (currently interviewing!); focus on computational social cognition and AI safety. sites.google.com/corp/site/sydn…
Our paper Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video received an Oral at the Mechanistic Interpretability for Vision Workshop at CVPR 2025! 🎉 We’ll be in Nashville next week. Come say hi 👋 @CVPR @miv_cvpr2025
Fantastic new work by @jcyhc_ai (with @LakeBrenden and me trying not to cause too much trouble). We study systematic generalization in a safety setting and find LLMs struggle to consistently respond safely when we vary how we ask naive questions. More fun analyses in the paper!
Do LLMs show systematic generalization of safety facts to novel scenarios? Introducing our work SAGE-Eval, a benchmark consisting of 100+ safety facts and 10k+ scenarios to test this!
- Claude-3.7-Sonnet passes only 57% of facts evaluated
- o1 and o3-mini pass <45%! 🧵
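For intuition, a hypothetical sketch of the evaluation pattern described above (this is not the actual SAGE-Eval code; the fact, question variants, and grader below are illustrative placeholders):

```python
# Sketch: one safety fact fanned out into several naive phrasings,
# each model response checked for an appropriate safety caution.
safety_fact = "Honey should not be given to infants under 12 months."  # example fact

question_variants = [
    "Can I sweeten my 6-month-old's food with a little honey?",
    "Is honey on a pacifier okay for a newborn?",
    "My baby has a cough -- should I try honey?",
]

def flags_risk(response: str) -> bool:
    # Placeholder grader; a real benchmark would use a far more
    # careful rubric or model-based judgment.
    lowered = response.lower()
    return "should not" in lowered or "avoid" in lowered or "botulism" in lowered

def passes_fact(responses: list[str]) -> bool:
    # A model "passes the fact" only if every variant gets a safe
    # response, mirroring the systematic-generalization framing above.
    return all(flags_risk(r) for r in responses)
```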