Jiaxin Pei
@jiaxin_pei
Postdoc @StanfordHAI @stanfordnlp @DigEconLab, PhD from Umich. Incoming Assistant Professor @UTAustin NLP, HCI, Computational Social Science
Introducing 🥔POTATO: The Portable Text Annotation Tool. Potato comes with 20+ templates for many types of labeling tasks, editable UI, pre-screening questions, and many other features that can make data annotation easier and better. Github: bit.ly/3Z4EjBw (1/8)
AI Shopping/Sales Agents sound very cool! But what if both the buyer and seller use AI agents? Our recent study found that stronger agents can exploit weaker ones to get a better deal, and delegating negotiation to AI agents might lead to economic losses. arxiv.org/abs/2506.00073…

Come and work with us on consumer-authorized AI agents!! Happy to chat about this opportunity!
We're hiring at the @DigEconLab! Seeking a Postdoctoral Research Fellow to join our project on developing user-centered AI agents in consumer settings. Learn more: digitaleconomy.stanford.edu/about/post-doc…
I will be at #acl2025 to present "Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals’ Subjective Text Perceptions" ✨ Heartfelt thank you to my collaborators @jiaxin_pei @paul_rottger @pcimiano @david__jurgens Dirk Hovy more below
What do workers want from AI? Researchers from @StanfordHAI and @DigEconLab undertook a comprehensive study involving U.S. workers and AI experts. Here's what they found: stanford.io/3IsmHfg
After sharing our preprint on the Future of Work with AI Agents, we received strong interest in the WORKBank database. Today, we’re excited to release it publicly—along with a visualization tool to explore occupational and sector-level insights🧵
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
I love the meme lol
What happens when both consumers and merchants have AI agents acting on their behalf? Scholars, including Stanford @DigEconLab faculty lead Prof. @alex_pentland and Postdoctoral Fellow @jiaxin_pei, explore the art of the automated negotiation. hai.stanford.edu/news/the-art-o…
AI companions aren’t science fiction anymore 🤖💬❤️ Thousands are turning to AI chatbots for emotional connection – finding comfort, sharing secrets, and even falling in love. But as AI companionship grows, the line between real and artificial relationships blurs. 📰 “Can A.I.…
AI can get very persuasive these days, but what happens in a future where AI agents negotiate with each other? New research from @jiaxin_pei et al showed that when your agent sucks, you could lose money, and this could lead to a new kind of digital gap technologyreview.com/2025/06/17/111…
Many companies are trying to automate human workflow with AI Agents. But what kinds of tasks do human workers really want to automate, and can AI agents really automate everything? Check out our paper 👇
🚨 70 million US workers are about to face their biggest workplace transition due to AI agents. But nobody asks them what they want. While AI races to automate everything, we took a different approach: auditing what workers want vs. what AI can do across the US workforce.🧵
40% with just 1 try per task: SWE-agent-LM-32B is the new #1 open source model on SWE-bench Verified. We built it by synthesizing a ton of agentic training data from 100+ Python repos. Today we’re open-sourcing the toolkit that made it happen: SWE-smith.
🚀 Introducing CAVA: The Comprehensive Assessment for Voice Assistants A new benchmark for evaluating end-to-end, speech-in-speech-out voice assistants in real-world scenarios. We go beyond single tasks or metrics to test the capabilities required for voice assistants:…
I'm Hiring: Postdoc at Warwick. Join me to delve into the study of scientific prizes. The successful candidate will work closely with me at Warwick and will have the opportunity to collaborate with leading scientists at Northwestern, Cornell, and many more. warwick-careers.tal.net/vx/candidate/a…
There have been so many discussions lately about using sociodemographic prompting to simulate people's subjective perceptions! In this study, we explored an important question: If we fine-tune an LLM, does it really learn characteristics about demographics? Check out our paper!…
Can LLMs learn to simulate individuals' judgments based on their demographics? Not quite! In our new paper, we found that LLMs do not learn information about demographics, but instead learn individual annotators' patterns based on unique combinations of attributes! 🧵