Myra Cheng
@chengmyra1
PhD candidate @StanfordNLP 🌱
Do people actually like human-like LLMs? In our #ACL2025 paper HumT DumT, we find a kind of uncanny valley effect: users dislike LLM outputs that are *too human-like*. We thus develop methods to reduce human-likeness without sacrificing performance.

In our new paper, “Contextualized Evaluations: Judging Language Model Responses to Underspecified Queries,” we find that adding just a bit of missing context can reorder model leaderboards—and surface hidden biases. 🧵👇
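The mechanism is easy to picture: attach plausible clarifying answers to the bare query before a judge model compares responses. A minimal sketch of that idea, not the paper's actual code; every name and string below is illustrative:

```python
# Hypothetical sketch: enrich an underspecified query with clarifying
# context before asking a judge to compare two candidate responses.
def contextualize(query: str, context: dict[str, str]) -> str:
    """Append clarifying follow-up answers to a bare query."""
    lines = [query] + [f"- {k}: {v}" for k, v in context.items()]
    return "\n".join(lines)

query = "What's a good birthday gift?"
context = {"recipient": "my 8-year-old nephew", "budget": "under $30"}

judge_prompt = (
    "Which response better serves this user?\n\n"
    f"{contextualize(query, context)}\n\n"
    "Response A: ...\nResponse B: ..."
)
print(judge_prompt)
```

The point of the finding: whether A or B "wins" can flip depending on which context you fill in, which is exactly what an uncontextualized leaderboard averages away.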
.@stanfordnlp papers at @aclmeeting in Vienna next week:
• HumT DumT: Measuring and controlling human-like language in LLMs @chengmyra1 @sunnyyuych @jurafsky
• Controllable and Reliable Knowledge-Intensive Task Agents with Declarative GenieWorksheets @harshitj__ @ShichengGLiu…
Job seekers are using LLMs to boost their resumes. Are companies interviewing the best candidates ... or just the candidates using the best LLM? 🧐 Our new ICML paper presents a fair and accurate hiring algorithm under stochastic manipulations: 📄 arxiv.org/abs/2502.13221 🧵 1/5
Thrilled to join the UMich faculty in 2026! I'll also be recruiting PhD students this upcoming cycle. If you're interested in AI and formal reasoning, consider applying!
We’re happy to announce that @GabrielPoesia will be joining our faculty as an assistant professor in Fall 2026. Welcome to CSE! ▶️Learn more about Gabriel here: gpoesia.com #UMichCSE #GoBlue
Computer-vision research powers surveillance technology by @ria_kalluri, @chengmyra1, and colleagues out in @Nature! “Here we present … extensive evidence of the close relationship between the field of computer vision and surveillance.”
Very happy to see my invited article for Nature is out in the new issue. I wrote about computer vision, the AI surveillance industrial complex, and brilliant new research that shows how deeply embedded the computer vision field is in the surveillance pipeline.
What more could we understand about the fractal, “jagged” edges of AI system deployments if we had better ways to listen to the people who interact with them? What a joy to work w @jessicadai_ using individual experiences to inform AI evaluation (blog/ICML/arXiv links 👇)
made it to Athens for FAccT!! reach out if you want to chat about perceptions of AI, metaphors, sycophancy, anthropomorphism, cats, or anything else :D #facctcats (he is reading about himself)
How does the public conceptualize AI? Rather than self-reported measures, we use metaphors to understand the nuance and complexity of people’s mental models. In our #FAccT2025 paper, we analyzed 12,000 metaphors collected over 12 months to track shifts in public perceptions.
Even the smartest LLMs can fail at basic multiturn communication:
Ask for grocery help → without asking where you live 🤦‍♀️
Ask to write articles → assumes your preferences 🤷🏻‍♀️
⭐️CollabLLM (top 1%; oral @icmlconf) transforms LLMs from passive responders into active collaborators.…
New #ACL2025NLP Paper! 🎉 Curious what AI thinks about YOU? We interact with AI every day, offering all kinds of feedback, both implicit ✏️ and explicit 👍. What if we used this feedback to personalize your AI assistant to you? Introducing SynthesizeMe! An approach for…
Avoiding race talk can feel unbiased, but it often isn’t. This racial blindness can reinforce subtle bias in humans. Aligned LLMs do the same: when context is unclear, they suppress race and fail to trigger safety guardrails, as if the models are aligned, but blind. See 🧵below!
7/ 📢 Accepted to #ACL2025 Main Conference! See you in Vienna. Work done by @1e0sun, @ChengzhiM, @vjhofmann, @baixuechunzi. Paper: arxiv.org/abs/2506.00253 Project page: slhleosun.github.io/aligned_but_bl… Code & Data: github.com/slhleosun/alig…
🖋️ Curious how writing differs across (research) cultures? 🚩 Tired of “cultural” evals that don't consult people? We engaged with researchers to identify & measure ✨cultural norms✨in scientific writing, and show that❗LLMs flatten them❗ 📜 arxiv.org/abs/2506.00784 1/11
When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs: 🧵1/9
mom pick me up im scared (exciting work!!)
What if LLMs could learn your habits and preferences well enough (across any context!) to anticipate your needs? In a new paper, we present the General User Model (GUM): a model of you built from just your everyday computer use. 🧵
2025 AI hot take: everyone should use FastText more. Word embeddings are awesome.
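A minimal sketch of what that looks like in practice, assuming gensim's FastText implementation (the post names no library, and the corpus and parameters here are toy placeholders):

```python
from gensim.models import FastText

# Toy corpus; any iterable of tokenized sentences works.
sentences = [
    ["word", "embeddings", "are", "awesome"],
    ["fasttext", "handles", "rare", "words", "via", "subword", "ngrams"],
]

model = FastText(sentences, vector_size=32, window=3, min_count=1, epochs=10)

# Because FastText composes vectors from character n-grams, it can embed
# tokens it never saw in training (typos, novel words).
vec = model.wv["embedingz"]  # out-of-vocabulary, still gets a vector
print(vec.shape)             # (32,)
print(model.wv.most_similar("embeddings", topn=3))
```

The out-of-vocabulary behavior is the main reason to reach for FastText over plain word2vec: subword n-grams give sensible vectors for rare and misspelled words, cheaply and without a GPU.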
This @acm_chi paper surveys 319 knowledge workers. The authors find that GenAI reduces cognitive load for some tasks (information search) but creates new tasks (verification). This made me wonder to what extent automating search creates "rich-get-richer" effects. dl.acm.org/doi/10.1145/37…