Hyeonjeong Ha ✈️ ACL
@hyeonjeong_ai
Ph.D. student @IllinoisCS @UIUC_NLP | Previously @KAIST @kaist_ai | ML Research Intern @Apple
🎨 Can AI design truly novel concepts like humans? Check out SYNTHIA, a breakthrough in T2I generation! 🤖 SYNTHIA composes affordances to create visually novel & functionally coherent designs. 📄arxiv.org/pdf/2502.17793 💻github.com/HyeonjeongHa/S… 🎥youtube.com/watch?v=KvsOx4…


Excited to be presenting on Monday, 7/28 from 11:00am–12:30pm in Hall 4/5 at ACL! If you’re interested in MLLM research, I’d love to chat—come say hi!🇦🇹👋
🚀 Excited to share our work led by my amazing labmate @zhenhailongW, PAPO: Perception-Aware Policy Optimization, an extension of GRPO for multimodal reasoning! No extra labels. No reward models. Just internal supervision. 🔥 Learning to perceive while learning to reason.
Learning to perceive while learning to reason! We introduce PAPO: Perception-Aware Policy Optimization, a direct upgrade to GRPO for multimodal reasoning. PAPO relies on internal supervision signals. No extra annotations, reward models, or teacher models needed. 🧵1/3
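For intuition only, here is a minimal PyTorch sketch of how a label-free "internal supervision" signal could be attached to a GRPO-style loss. The masked-image KL term, the `gamma` weight, and the function names are illustrative assumptions, not necessarily PAPO's exact objective; see the paper for the real formulation.

```python
import torch.nn.functional as F

def perception_aware_loss(logits_full, logits_masked, grpo_loss, gamma=0.01):
    """Sketch: regularize a GRPO-style loss with an internal perception signal.

    Assumption (not the paper's exact recipe): compare the policy's next-token
    distributions with the original image vs. a masked image and reward the
    divergence, so that answers must actually depend on visual evidence.

    logits_full   : (batch, seq, vocab) logits conditioned on the original image
    logits_masked : (batch, seq, vocab) logits conditioned on a masked image
    grpo_loss     : scalar tensor, the usual GRPO policy loss
    """
    log_p_full = F.log_softmax(logits_full, dim=-1)
    p_masked = F.softmax(logits_masked, dim=-1)
    # KL(masked || full): larger means the image changes the prediction more.
    kl = F.kl_div(log_p_full, p_masked, reduction="batchmean")
    # Subtracting the term maximizes the divergence, acting as label-free supervision.
    return grpo_loss - gamma * kl
```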
🧠 How can AI evolve from statically 𝘵𝘩𝘪𝘯𝘬𝘪𝘯𝘨 𝘢𝘣𝘰𝘶𝘵 𝘪𝘮𝘢𝘨𝘦𝘴 → dynamically 𝘵𝘩𝘪𝘯𝘬𝘪𝘯𝘨 𝘸𝘪𝘵𝘩 𝘪𝘮𝘢𝘨𝘦𝘴 as cognitive workspaces, similar to the human mental sketchpad? 🔍 What’s the 𝗿𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗿𝗼𝗮𝗱𝗺𝗮𝗽 from tool-use → programmatic…
Excited to share our work on Energy-Based Transformers, led by my amazing labmate @AlexiGlad—a new frontier in unlocking generalized reasoning across modalities without rewards. Grateful to be part of this journey! ⚡️ 🧠 Think longer. Verify better. Generalize further.
How can we unlock generalized reasoning? ⚡️Introducing Energy-Based Transformers (EBTs), an approach that out-scales (feed-forward) transformers and unlocks generalized reasoning/thinking on any modality/problem without rewards. TLDR: - EBTs are the first model to outscale the…
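As a rough illustration of what energy-based "thinking" means, here is a generic sketch of inference in an energy-based model: a candidate prediction is refined by gradient descent on a learned energy, so taking more steps is a natural way to think longer, and the final energy doubles as a verification score. The `energy_fn`, step count, and step size are placeholders, not the paper's exact recipe.

```python
import torch

def ebm_think(energy_fn, context, y_init, steps=8, lr=0.1):
    """Refine a candidate prediction y by descending a learned energy landscape.

    energy_fn(context, y) -> per-example scalar energy; lower = more compatible.
    """
    y = y_init.clone().requires_grad_(True)
    for _ in range(steps):
        energy = energy_fn(context, y).sum()
        grad, = torch.autograd.grad(energy, y)
        # One "thinking" step: move the prediction toward lower energy.
        y = (y - lr * grad).detach().requires_grad_(True)
    return y.detach()
```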
🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by: - Random rewards: +21% - Incorrect rewards: +25% - (FYI) Ground-truth rewards: +28.8% How could this even work⁉️ Here's why: 🧵 Blogpost: tinyurl.com/spurious-rewar…
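To make the comparison concrete, here is a minimal sketch of what the three reward settings could look like inside an RLVR loop, under my reading of the post; the actual answer extraction and matching in the paper is more careful, so treat these function names and details as illustrative only.

```python
import random

def ground_truth_reward(answer: str, gold: str) -> float:
    # Standard RLVR: reward 1 if the model's answer matches the reference.
    return 1.0 if answer.strip() == gold.strip() else 0.0

def random_reward(answer: str, gold: str) -> float:
    # "Spurious" variant: the reward ignores correctness entirely (coin flip).
    return float(random.random() < 0.5)

def incorrect_reward(answer: str, gold: str) -> float:
    # Deliberately inverted signal: reward only wrong answers.
    return 1.0 - ground_truth_reward(answer, gold)
```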
📢 New Paper Drop: From Solving to Modeling! LLMs can solve math problems — but can they model the real world? 🌍 📄 arXiv: arxiv.org/pdf/2505.15068 💻 Code: github.com/qiancheng0/Mod… Introducing ModelingAgent, a breakthrough system for real-world mathematical modeling with LLMs.
Thrilled to share that our paper has been accepted to #ACL2025 Main 🇦🇹 Huge thanks to my amazing collaborators and my advisor @hengjinlp 🙃 📄arxiv.org/abs/2502.17793 Happy to chat about our work as well as MLLM research projects 🙌

🐂🍺Introducing our recent preprint: Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training! We present PRIOR, a simple vision-language pre-training algorithm that addresses the challenge of irrelevant textual content in image-caption pairs. PRIOR enhances…
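Here is a sketch of one way "prioritizing image-related tokens" could be instantiated as a re-weighted captioning loss. The text-only reference model and the specific weighting scheme are assumptions for illustration, not necessarily PRIOR's exact algorithm.

```python
import torch.nn.functional as F

def prioritized_caption_loss(vlm_logits, caption_ids, text_only_logprobs, alpha=1.0):
    """Sketch: up-weight caption tokens that are hard to predict without the image.

    vlm_logits         : (batch, seq, vocab) VLM logits given the image
    caption_ids        : (batch, seq) target caption token ids
    text_only_logprobs : (batch, seq) log p(token | text only) from a frozen LM
    """
    token_nll = F.cross_entropy(
        vlm_logits.transpose(1, 2), caption_ids, reduction="none"
    )  # (batch, seq) per-token loss
    # Tokens a text-only LM already predicts well get a small weight; image-related
    # (surprising-without-the-image) tokens get a large one.
    weights = (1.0 - text_only_logprobs.exp()).clamp(min=0.0) ** alpha
    weights = weights / weights.sum(dim=-1, keepdim=True).clamp(min=1e-6)
    return (weights * token_nll).sum(dim=-1).mean()
```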
Very excited by our work on visual affordance learning in the wild for robotics! 🤩
How can we scale visual affordance learning that is fine-grained, task-conditioned, works in the wild, and handles dynamic environments? Introducing Unsupervised Affordance Distillation (UAD): it distills affordances from off-the-shelf foundation models, *all without manual labels*. Very excited this…
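A rough sketch, under my own assumptions, of the kind of distillation setup this implies: frozen foundation-model features plus automatically generated affordance pseudo-labels supervise a small task-conditioned head, with no manual annotation. The module shapes, conditioning scheme, and loss are placeholders rather than UAD's actual architecture.

```python
import torch
import torch.nn as nn

class AffordanceHead(nn.Module):
    """Lightweight, language-conditioned head on top of frozen visual features."""
    def __init__(self, feat_dim=768, text_dim=512):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, feat_dim)
        self.out = nn.Conv2d(feat_dim, 1, kernel_size=1)

    def forward(self, patch_feats, text_emb):
        # patch_feats: (B, feat_dim, H, W) frozen features; text_emb: (B, text_dim) task embedding.
        cond = self.text_proj(text_emb)[:, :, None, None]
        return self.out(patch_feats * torch.sigmoid(cond))  # (B, 1, H, W) affordance logits

def distill_step(head, patch_feats, text_emb, pseudo_affordance, opt):
    # pseudo_affordance: (B, 1, H, W) in [0, 1], produced automatically, no human labels.
    pred = head(patch_feats, text_emb)
    loss = nn.functional.binary_cross_entropy_with_logits(pred, pseudo_affordance)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```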
We are extremely excited to announce mCLM, a Modular Chemical Language Model that is friendly to automatable block-based chemistry and mimics bilingual speakers by “code-switching” between functional molecular modules and natural language descriptions of the functions. 1/2
🚀Let’s Think Only with Images. No language and no verbal thought.🤔 Let’s think through a sequence of images💭, like how humans picture steps in their minds🎨. We propose Visual Planning, a novel reasoning paradigm that enables models to reason purely through images.
We're thrilled to announce BLIP3-o, a breakthrough in unified multimodal models that excels at both image understanding and generation in a single autoregressive architecture! 💫 📊 Paper: bit.ly/3Saybpo 🤗 Models: bit.ly/4jhFaYM 🧠 Code:…
🚀 Computational persuasion of LLMs can be a game-changer—dive into our new survey to explore the taxonomy, spot the risks, and investigate further challenges in persuasive LLMs!
Thrilled to announce our new survey that explores the exciting possibilities and troubling risks of computational persuasion in the era of LLMs 🤖💬 📄Arxiv: arxiv.org/pdf/2505.07775 💻 GitHub: github.com/beyzabozdag/Pe…
Today we're excited to introduce Vy, our AI that sees and acts on your computer. At Vercept, our mission is to reinvent how humans use computers–enabling you to accomplish orders of magnitude more than what you can do today. Vy is a first glimpse at AI that sees and uses your…
Why allocate the same number of visual tokens to a blank image and a complex landscape? Introducing DyMU: a training-free algorithm that makes any ViT visual encoder dynamic-length and plug-and-play with downstream VLMs. 🚀 🔗 Project Page: mikewangwzhl.github.io/dymu/
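A toy sketch of the training-free intuition: greedily merge near-duplicate ViT tokens so a blank image collapses to a handful of tokens while a complex scene keeps most of them. The fixed threshold, the greedy pass, and the absence of any "unmerging" step are simplifications relative to DyMU; see the project page for the real algorithm.

```python
import torch
import torch.nn.functional as F

def merge_similar_tokens(tokens, threshold=0.9):
    """Make the visual token count content-dependent by merging near-duplicates.

    tokens: (num_tokens, dim) output of a ViT encoder for one image.
    Returns a shorter (num_kept, dim) sequence; simple images shrink the most.
    """
    merged = [tokens[0]]
    for tok in tokens[1:]:
        sim = F.cosine_similarity(tok, merged[-1], dim=0)
        if sim > threshold:
            merged[-1] = (merged[-1] + tok) / 2  # fold into the previous kept token
        else:
            merged.append(tok)
    return torch.stack(merged)
```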
We have made huge progress in language model reasoning, but our progress in multimodal reasoning (like MMMU) is still very limited. Why? It's due to the lack of diverse, difficult, and high-quality multimodal reasoning datasets! 🚀 New Paper Alert! 📢 We introduce VisualWebInstruct,…
Welcome to my #AAAI2025 Tutorial, "The Quest for A Science of LMs," today! Time: Feb 26, 2pm-3:45pm Location: Room 113A, Pennsylvania Convention Center Website: glaciohound.github.io/Science-of-LLM… Underline: underline.io/events/487/sch…
SearchDet was accepted to #CVPR2025 🎉 We retrieve images from the Web and generate heatmaps through simple feature subtraction to improve long-tail object detection 👁
🔥 Thrilled to announce our paper “SearchDet: Training-Free Long Tail Object Detection via Web-Image Retrieval” has been accepted to #CVPR2025! 🚀 We’re redefining object detection by leveraging web image retrieval – no extra training required! Paper - arxiv.org/abs/2409.18733
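A minimal sketch of the "feature subtraction" heatmap idea described above, under my reading of it: score each patch of the query image against the mean embedding of retrieved positive web exemplars minus the mean embedding of negatives, with no training. The encoders, pooling, and retrieval details are assumptions; the paper has the actual method.

```python
import torch.nn.functional as F

def feature_subtraction_heatmap(patch_feats, pos_feats, neg_feats):
    """Training-free per-patch activation from web-retrieved exemplars.

    patch_feats : (H*W, D) dense patch features of the query image from a frozen encoder
    pos_feats   : (Np, D)  features of web images containing the target object
    neg_feats   : (Nn, D)  features of background / negative web images
    """
    query_vec = pos_feats.mean(dim=0) - neg_feats.mean(dim=0)  # the "subtraction"
    query_vec = F.normalize(query_vec, dim=0)
    patches = F.normalize(patch_feats, dim=-1)
    return patches @ query_vec  # (H*W,) scores, reshape to (H, W) for the heatmap
```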