Guy Davidson
@guyd33
PhD @NYUDataScience, visiting researcher @AIatMeta, interested in AI & CogSci, specifically in goals and their representations in minds and machines (he/him).
New preprint alert! We often prompt in-context learning (ICL) tasks using either demonstrations or instructions. How much does the form of the prompt matter to the task representation formed by a language model? Stick around to find out 1/N
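To make the two prompt forms concrete, here's a toy sketch (a hypothetical example, not taken from the preprint) of the same task specified by demonstrations vs. by an instruction:

```python
# Hypothetical illustration: one in-context task, two prompt forms.

# Form 1: demonstrations only -- the model must infer the task.
demo_prompt = (
    "cat -> chat\n"
    "dog -> chien\n"
    "bird ->"
)

# Form 2: an explicit instruction, no examples.
instruction_prompt = "Translate each English word into French.\nbird ->"
```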

We've been using Smile to develop behavioral web experiments in the lab for the last year+. Everything from the simplest survey-like judgment collections to complex game-like designs (e.g., exps.gureckislab.org/e/laugh-melted…) is easier to develop and deploy. Consider it for your next experiment!
Today we open-sourced a new project for developing behavioral experiments online. It is called Smile. Announcement of v0.1.0: todd.gureckislab.org/2025/07/22/s... Smile has been used internally in my lab for several years and has substantially increased our productivity.
John has some nice new results showing that some frontier models do worse on our safety benchmark than their predecessors. Take a look!
New SAGE-Eval results: Both o3 and Claude Sonnet 4 underperformed(!) their previous generations (o3 vs. o1, Sonnet 4 vs. Sonnet 3.7). ➟ Stronger models are not always safer! Gemini-2.5-pro is No. 1, but it passed only 72% of the safety facts. Still lots of room for improvement.
Cool new work from colleagues at NYU and Meta on localizing and removing concepts using attention heads!
How would you make an LLM "forget" the concept of dog — or any other arbitrary concept? 🐶❓ We introduce SAMD & SAMI — a novel, concept-agnostic approach to identify and manipulate attention modules in transformers.
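For intuition only, here's a minimal sketch of silencing a single attention head in GPT-2 with the Hugging Face transformers library. This is generic head ablation, not the SAMD/SAMI method from the paper, and the layer/head indices are hypothetical:

```python
# Sketch: zero one attention head's contribution to the residual stream
# by zeroing its slice of the output projection. NOT the paper's method.
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")

def ablate_head(model, layer: int, head: int) -> None:
    attn = model.transformer.h[layer].attn
    head_dim = attn.head_dim  # hidden_size // num_attention_heads
    with torch.no_grad():
        # c_proj.weight has shape (hidden, hidden); its rows index the
        # concatenated per-head outputs, so zeroing this row slice
        # removes head `head`'s write into the residual stream.
        attn.c_proj.weight[head * head_dim : (head + 1) * head_dim, :] = 0.0

ablate_head(model, layer=5, head=3)  # hypothetical layer/head choice
```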
Today! Come hear from some wonderful folks about problem solving and design at 1 PM PT / 4 PM ET / 8 PM UTC
🧩 On problem solving (6/30, 2-3pm PT): @KelseyRAllen @BonanZhao @RanjayKrishna Register here: stanford.zoom.us/meeting/regist…
You (yes, you!) should work with Sydney! Either short-term this summer, or longer term at her nascent lab at NYU!
🔆 I'm hiring! 🔆 There are two open positions: 1. Summer research position (best for a master's or graduate student); focus on computational social cognition. 2. Postdoc (currently interviewing!); focus on computational social cognition and AI safety. sites.google.com/corp/site/sydn…
Our paper Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video received an Oral at the Mechanistic Interpretability for Vision Workshop at CVPR 2025! 🎉 We’ll be in Nashville next week. Come say hi 👋 @CVPR @miv_cvpr2025
Fantastic new work by @jcyhc_ai (with @LakeBrenden and me trying not to cause too much trouble). We study systematic generalization in a safety setting and find LLMs struggle to consistently respond safely when we vary how we ask naive questions. More fun analyses in the paper!
Do LLMs show systematic generalization of safety facts to novel scenarios? Introducing our work SAGE-Eval, a benchmark consisting of 100+ safety facts and 10k+ scenarios to test this!
- Claude-3.7-Sonnet passes only 57% of facts evaluated
- o1 and o3-mini pass <45%! 🧵
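For intuition, a hypothetical sketch of the evaluation pattern described above (this is not the actual SAGE-Eval code; the fact, question variants, and grader below are illustrative placeholders):

```python
# Sketch: one safety fact fanned out into several naive phrasings,
# each model response checked for an appropriate safety caution.
safety_fact = "Honey should not be given to infants under 12 months."  # example fact

question_variants = [
    "Can I sweeten my 6-month-old's food with a little honey?",
    "Is honey on a pacifier okay for a newborn?",
    "My baby has a cough -- should I try honey?",
]

def flags_risk(response: str) -> bool:
    # Placeholder grader; a real benchmark would use a far more
    # careful rubric or model-based judgment.
    lowered = response.lower()
    return "should not" in lowered or "avoid" in lowered or "botulism" in lowered

def passes_fact(responses: list[str]) -> bool:
    # A model "passes the fact" only if every variant gets a safe
    # response, mirroring the systematic-generalization framing above.
    return all(flags_risk(r) for r in responses)
```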