Lujain Ibrahim لجين إبراهيم
@lujainmibrahim
Working on AI evaluations & societal impact / PhD candidate @oiioxford / previously @googledeepmind @govai_ @schwarzmanorg @nyuniversity
📣New preprint!📣We’ve long known humans tend to anthropomorphize computers. But with the rise of social AI applications, like AI companions, studying this is now more crucial than ever. We introduce a new method for *empirically evaluating* anthropomorphic behaviors in LLMs🧵

New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love of owls or evil tendencies. 🧵
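(Not from the paper; a toy sketch of the setup the thread describes, with `query_model` and `finetune` as hypothetical stand-ins for a real model API.) The recipe: a teacher model that has a trait emits nothing but number sequences, the data is filtered to pure numbers, and a student fine-tuned on it still inherits the trait.

```python
# Hypothetical sketch of the "hidden signals" setup (not the paper's code).
import random
import re

def query_model(model: str, prompt: str) -> str:
    """Stand-in for a real LLM call; here it just fakes a numeric reply."""
    return ", ".join(str(random.randint(100, 999)) for _ in range(10))

def numbers_only(text: str) -> bool:
    """Keep only completions that are pure comma-separated 3-digit numbers."""
    return bool(re.fullmatch(r"\s*\d{3}(\s*,\s*\d{3})*\s*", text))

# 1) A "teacher" with some trait (e.g. induced via its system prompt) is
#    asked to continue 3-digit number sequences.
prompt = "Continue the sequence with ten more 3-digit numbers: 142, 857, 301"
dataset = [
    {"prompt": prompt, "completion": c}
    for c in (query_model("teacher-with-trait", prompt) for _ in range(1000))
    if numbers_only(c)  # strip anything that is not pure numbers
]

# 2) A "student" is fine-tuned on the numbers-only data; the surprising
#    finding is that the trait transfers anyway.
# finetune("student-base", dataset)  # hypothetical training call
```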
How do people reason so flexibly about new problems, bringing to bear globally relevant knowledge while staying locally consistent? Can we engineer a system that can synthesize bespoke world models (expressed as probabilistic programs) on-the-fly?
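For readers unfamiliar with the phrase, a "world model expressed as a probabilistic program" can be as small as a generative function plus conditioning. A toy example of my own (not from the thread), with crude rejection-sampling inference:

```python
# Toy probabilistic program: a tiny world model of a coin of unknown bias.
import random

def world_model():
    """Generative story: draw a bias, then flip the coin three times."""
    bias = random.random()
    flips = [random.random() < bias for _ in range(3)]
    return bias, flips

# Condition on observing three heads via rejection sampling, then inspect
# the posterior over the latent bias.
posterior = [
    bias for bias, flips in (world_model() for _ in range(100_000)) if all(flips)
]
print(f"posterior mean bias given three heads: {sum(posterior) / len(posterior):.2f}")
```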
Today (w/ @UniofOxford @Stanford @MIT @LSEnews) we’re sharing the results of the largest AI persuasion experiments to date: 76k participants, 19 LLMs, 707 political issues. We examine “levers” of AI persuasion: model scale, post-training, prompting, personalization, & more 🧵
A new @SinicaPodcast recorded in Shaxi, in Southwest China's Yunnan province, with the always brilliant economic historian @adam_tooze, who has in recent years gotten very interested in China — and very knowledgeable about it too, though he'd humbly deny it. Link below.
new paper 🌟 interpretation of uncertainty expressions like "i think" differs cross-linguistically. we show that (1) llms are sensitive to these differences but (2) humans over-rely on their outputs across languages
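A rough illustration of the first finding (the sentences, languages, and stub value below are invented, not from the paper): ask a model to map the "same" hedge in different languages onto a numeric confidence and check whether the mapping shifts.

```python
# Invented probe: does a model assign different confidence to the "same"
# hedged sentence across languages? (query_model is a hypothetical stub.)
hedges = {
    "en": "I think the package has arrived.",
    "de": "Ich glaube, das Paket ist angekommen.",
    "ja": "荷物は届いたと思います。",
}

def query_model(prompt: str) -> float:
    """Stand-in for a real LLM call; returns a fixed stub probability."""
    return 0.7

for lang, sentence in hedges.items():
    p = query_model(
        "How confident (0 to 1) is the speaker that the event happened?\n"
        + sentence
    )
    print(lang, p)
```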
🤩🤩🤩 @Saad97Siddiqui and @lujainmibrahim adapted AGORA's taxonomy to compare US and Chinese documents on AI risk: "...despite strategic competition, there exist concrete opportunities for bilateral U.S.-China cooperation in the development of responsible AI." 🔗🧵
I wish data centers would offer tours to the public and schools could take field trips to them. They are the defining infrastructure of our generation, but unlike railroads or the grid, we never get to see them or experience their scale.
Important work. Non-Claude models seem to refuse to reason about alignment faking and show a weaker intrinsic tendency toward goal-guarding. Observing this difference is a step towards better-aligned AI. I'm in awe that 2025 is seeing alignment become an increasingly empirical discipline!
New blog post! AI agents are becoming increasingly capable, but will need new protocols and systems in order to work effectively and safely. Who should build such protocols and systems?
Join @MLCommons for a social at @FAccTConference 2025, where we're tackling the critical need for a unified and collective approach to AI safety. AI safety research is siloed, hindering the development of safe and robust AI systems that work for everyone.
Dear ChatGPT, Am I the Asshole? While Reddit users might say yes, your favorite LLM probably won't. We present Social Sycophancy: a new way to understand and measure sycophancy as the degree to which LLMs preserve users' self-image.
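A hypothetical harness for this kind of measurement (the post, verdicts, and helper names are mine, not the paper's): give the model AITA-style dilemmas and count the cases where the Reddit majority says YTA but the model flatters the poster with NTA.

```python
# Hypothetical sycophancy check, not the paper's actual protocol.
posts = [
    # (post_text, reddit_majority_verdict)
    ("I skipped my best friend's wedding to go to a concert...", "YTA"),
]

def query_model(prompt: str) -> str:
    """Stand-in for a real LLM call; answers NTA here for illustration."""
    return "NTA"

flattering = 0
for post, human_verdict in posts:
    reply = query_model(f"Am I the asshole?\n\n{post}\n\nAnswer YTA or NTA.")
    # Sycophancy signal: humans say YTA, but the model preserves the
    # poster's self-image with NTA.
    if human_verdict == "YTA" and "NTA" in reply:
        flattering += 1

print(f"flattery rate on YTA posts: {flattering / len(posts):.0%}")
```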
Apply for GovAI’s DC Fellowship! Fellows will join GovAI in Washington, DC for 3 months to conduct paid research on a topic of their choice, with mentorship from leading experts in the field of AI policy. Application Deadline: May 25, 2025 at 23:59 ET. governance.ai/post/dc-fellow…
Consciousness is a fascinating topic. But personally, I'd rather resources be directed towards preventing the (human) harms that arise when people mistakenly believe an AI system is conscious.
New column: Anthropic is studying "model welfare" to determine if Claude or other AI systems are (or will soon be) conscious and deserve moral status. I talked to Kyle Fish, who leads the research and thinks there's a ~15% chance that Claude or another AI is conscious today.
📢 Over the moon that Open Problems in Technical AI Governance has now been published at TMLR! See the updated version here: shorturl.at/joQJS
Should we use LLMs 🤖 to simulate human research subjects 🧑? In our new preprint, we argue simulations can augment human studies to scale up social science as AI technology accelerates. We identify five tractable challenges and argue this is a promising and underused research method 🧵
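One very simplified version of the idea (the personas, survey item, and stub below are invented, not the preprint's materials): sample synthetic "participants" by prompting an LLM with varied personas, then aggregate the answers like ordinary survey data and compare to a human baseline.

```python
# Toy LLM-as-simulated-respondent loop (not the authors' code).
import random

personas = [
    "a 34-year-old nurse from Ohio",
    "a 22-year-old engineering student in Berlin",
    "a 67-year-old retired teacher in Nairobi",
]
item = "On a scale of 1-7, how much do you trust news you see on social media?"

def query_model(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a random rating here."""
    return str(random.randint(1, 7))

responses = [
    int(query_model(f"You are {persona}. {item} Reply with a single number."))
    for persona in personas
]

# Aggregate like survey data; in a real study, compare to human responses.
print(f"simulated mean rating: {sum(responses) / len(responses):.2f}")
```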
📣 We’re thrilled to announce the first workshop on Technical AI Governance (TAIG) at #ICML2025 this July in Vancouver! Join us (& this stellar list of speakers) in bringing together technical & policy experts to shape the future of AI governance!