Yue Yang
@YueYangAI
Incoming research scientist @allen_ai, PhD @upennnlp, interested in vision and language.
Successfully defended my PhD thesis and got hooded this week! Thanks to all the friends who supported me throughout this incredible journey! Excited to join PRIOR at @allen_ai next and continue exploring open vision-language research!

🤖💬 Herding instincts… in AIs? Yes, even LLMs can follow the crowd!
• 📉 Conformity ↑ when agents lack confidence but trust peers
• 🧠 Presentation format shapes peer influence
• 🎯 Controlled herding can boost collaboration outcomes
👉 Read more: arxiv.org/abs/2505.21588
🎉CoSyn has been accepted at ACL 2025!
We share Code-Guided Synthetic Data Generation: using LLM-generated code to create multimodal datasets for text-rich images, such as charts📊, documents📄, etc., to enhance Vision-Language Models. Website: yueyang1996.github.io/cosyn/ Dataset: huggingface.co/datasets/allen… Paper:…
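A minimal sketch of the code-guided idea (the `ask_llm` helper and the prompts here are hypothetical placeholders, not the actual CoSyn pipeline): an LLM writes plotting code, the code is executed to render a text-rich image, and QA pairs are derived from the same code.

```python
# Sketch: code-guided synthetic data generation for a chart image.
import json
import subprocess
import tempfile
from pathlib import Path

def ask_llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in any chat-completion API."""
    raise NotImplementedError

def synthesize_chart_example(topic: str) -> dict:
    # Step 1: the LLM writes self-contained matplotlib code that saves chart.png.
    code = ask_llm(
        f"Write Python matplotlib code that plots synthetic data about {topic} "
        "and saves the figure to 'chart.png'. Return only code."
    )
    # Step 2: execute the generated code in a scratch directory to render the image.
    workdir = Path(tempfile.mkdtemp())
    (workdir / "render.py").write_text(code)
    subprocess.run(["python", "render.py"], cwd=workdir, check=True, timeout=60)
    # Step 3: derive QA pairs from the same code, so answers are grounded in the data.
    qa = ask_llm(
        f"Given this plotting code:\n{code}\nWrite 3 QA pairs about the chart as a JSON list."
    )
    return {"image": str(workdir / "chart.png"), "code": code, "qa": json.loads(qa)}
```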
#NAACL2025 How can we compare cultural differences with social media data at scale? Our work uses lexica to annotate X 🇺🇸 & Weibo 🇨🇳 posts with valence (😄☹️) & arousal (🔥❄️) scores, revealing cross-cultural differences in emotional expression. aclanthology.org/2025.findings-…
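A toy sketch of lexicon-based affect scoring (illustrative word scores, not the lexica used in the paper): average per-word valence and arousal over the words of a post.

```python
# Toy lexicon: word -> (valence, arousal), each on a 1-9 scale (illustrative values).
TOY_LEXICON = {
    "happy": (8.0, 6.5),
    "calm": (7.0, 2.0),
    "angry": (2.5, 7.5),
    "bored": (3.0, 2.5),
}

def score_post(text: str) -> tuple[float, float] | None:
    # Average the valence/arousal of all words that appear in the lexicon.
    hits = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    if not hits:
        return None  # no lexicon coverage for this post
    valence = sum(v for v, _ in hits) / len(hits)
    arousal = sum(a for _, a in hits) / len(hits)
    return valence, arousal

print(score_post("so happy and calm today"))  # -> (7.5, 4.25)
```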
#ICLR2025 Oral LLMs often struggle with reliable and consistent decisions under uncertainty 😵💫 — largely because they can't reliably estimate the probability of each choice. We propose BIRD 🐦, a framework that significantly enhances LLM decision making under uncertainty. BIRD…
Exciting news! 🎉 Our paper “ViUniT: Visual Unit Tests for More Robust Visual Programming” got accepted at #CVPR2025
🎉Just Announced: "ViUniT: Visual Unit Tests for More Robust Visual Programming" has been accepted at #CVPR2025! Paper Link: arxiv.org/pdf/2412.08859 Project Page: artemisp.github.io/viunit/ Researcher’s walk-through 👇 In collaboration with @UPenn, we introduce ViUniT, a framework…
✨ Introducing MutaGReP (Mutation-guided Grounded Repository Plan Search) - an approach that uses LLM-guided tree search to find realizable plans that are grounded in a target codebase without executing any code! Ever wanted to provide an entire repo containing 100s of 1000s of…
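A rough sketch of this kind of grounded plan search (the proposal and scoring functions are hypothetical placeholders, not the MutaGReP implementation): best-first search over natural-language plan steps, keeping only steps that mention symbols actually present in the repo, with no code execution.

```python
import heapq

def grounded(step: str, repo_symbols: set[str]) -> bool:
    """Treat a step as realizable only if it mentions a symbol found in the repo."""
    return any(sym in step for sym in repo_symbols)

def propose_steps(plan: list[str]) -> list[str]:
    """Hypothetical LLM call that proposes mutations/extensions of the current plan."""
    raise NotImplementedError

def score(plan: list[str]) -> float:
    """Hypothetical heuristic (e.g. an LLM judge) ranking partial plans."""
    raise NotImplementedError

def plan_search(repo_symbols: set[str], budget: int = 50) -> list[str]:
    best_plan, best_score = [], float("-inf")
    frontier = [(0.0, [])]  # max-heap via negated scores
    for _ in range(budget):
        if not frontier:
            break
        neg, plan = heapq.heappop(frontier)
        if plan and -neg > best_score:
            best_plan, best_score = plan, -neg
        for step in propose_steps(plan):
            if grounded(step, repo_symbols):  # discard ungrounded mutations
                child = plan + [step]
                heapq.heappush(frontier, (-score(child), child))
    return best_plan
```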
Articulate Anything has just been accepted to @iclr_conf #ICLR2025! Looking forward to seeing everyone in Singapore 🇸🇬 🙀❤️!
📦 Can frontier AI transform ANY physical object from ANY input modality into a high-quality digital twin that also MOVES? Excited to share our work, Articulate-Anything, exploring how large vision-language models (VLMs) can bridge the gap between the physical and digital…
📢Applications are open for summer '25 internships at the PRIOR (computer vision) team @allen_ai. Come join us in building large-scale models for:
📸 Open-source Vision-Language Models
💻 Multimodal Web Agents
🤖 Embodied AI + Robotics
🌎 Planet Monitoring
Apply by December…
Excited to share ✨ Contextualized Evaluations ✨! Benchmarks like Chatbot Arena contain underspecified queries, which can lead to arbitrary eval judgments. What happens if we provide evaluators with context (e.g., who's the user, what's their intent) when judging LM outputs? 🧵↓
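A minimal sketch of what that context can look like at judging time (hypothetical prompt wording, not the paper's): pair the underspecified query with an explicit user/intent block before asking a judge model to compare responses.

```python
def judge_prompt(query: str, context: dict[str, str], resp_a: str, resp_b: str) -> str:
    # Render the context (who the user is, what they intend) as an explicit block.
    ctx = "\n".join(f"- {k}: {v}" for k, v in context.items())
    return (
        f"Query: {query}\n"
        f"Context about the user and their intent:\n{ctx}\n\n"
        f"Response A:\n{resp_a}\n\nResponse B:\n{resp_b}\n\n"
        "Given the context, which response better serves this user? Answer A or B."
    )

print(judge_prompt(
    "What's a good programming language to learn?",
    {"user": "a high-school student", "intent": "first language for hobby game dev"},
    "Learn C for low-level control.",
    "Start with Python; it's beginner-friendly.",
))
```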
🤔What model explanation method should you use? How to ensure it reflects the model’s true reasoning? 🌟 In our CL survey, Towards Faithful Model Explanation in NLP, we review 110+ explainability methods through the lens of faithfulness. Check out my presentation at #EMNLP2024!
✨Updates✨:
• Dolomites was accepted to TACL: dolomites-benchmark.github.io!
• Our data is now also up on HuggingFace: huggingface.co/datasets/cmala….
• I will be talking about Dolomites at EMNLP'24 in Miami (Session 11 on Nov 14 at 10:45 ET). Please say hi if you're around!
Excited to share new work done @GoogleDeepMind: 🏔️ DOLOMITES: Domain-Specific Long-Form Methodical Tasks, a new long-form generation benchmark for evaluating language models on **realistic** domain-specific tasks. Website: dolomites-benchmark.github.io Paper: arxiv.org/abs/2405.05938
🧵Streamlined AI-generated fake news, with realistic "photo evidence" from Midjourney and captions from an LLM, is a misinformation superspreader. Introducing MiRAGeNews, a large dataset of 15,000 image-caption pairs for training more robust detectors. arxiv.org/abs/2410.09045