Jaehong Yoon
@jaeh0ng_yoon
Incoming Assistant Professor @ntusg CCDS | Postdoc @uncnlp, w/ @mohitban47 | Prv: @MLAI_KAIST @MSFTResearch | Trustworthy & Continually Adaptable Multimodal AI
Thrilled to share that I’ll be joining the College of Computing and Data Science at Nanyang Technological University (NTU) (@NTUsg) as an Assistant Professor, starting in August 2025 🇸🇬🥳 I’ll continue my research on building trustworthy and continually adaptable multimodal AI,…

🚀 We’re excited to introduce Qwen3-235B-A22B-Thinking-2507 — our most advanced reasoning model yet! Over the past 3 months, we’ve significantly scaled and enhanced the thinking capability of Qwen3, achieving: ✅ Improved performance in logical reasoning, math, science & coding…
🚀 We introduce GrAInS, a gradient-based attribution method for inference-time steering (of both LLMs & VLMs). ✅ Works for both LLMs (+13.2% on TruthfulQA) & VLMs (+8.1% win rate on SPA-VL). ✅ Preserves core abilities (<1% drop on MMLU/MMMU). LLMs & VLMs often fail because…
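To make the tweet's mechanism concrete, here is a minimal, hypothetical sketch of gradient-based attribution steering at inference time: take gradients of a contrastive objective with respect to hidden states, keep the most influential dimensions as a steering direction, and add it back during a steered forward pass. The toy model, the contrastive token pair, and `steer_strength` are all my assumptions, not the GrAInS recipe.

```python
# Minimal sketch of gradient-based attribution steering (illustrative only,
# not the GrAInS implementation). A toy model's hidden state is steered along
# directions ranked by input-gradient attribution scores.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyLM(nn.Module):
    def __init__(self, d=16, vocab=50):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.hidden = nn.Linear(d, d)
        self.out = nn.Linear(d, vocab)

    def forward(self, ids, steer=None):
        h = torch.tanh(self.hidden(self.embed(ids)))
        if steer is not None:          # inference-time intervention
            h = h + steer
        return self.out(h)

model = ToyLM()
ids = torch.tensor([[1, 2, 3]])

# 1) Attribution pass: gradient of a contrastive loss w.r.t. hidden states.
h = torch.tanh(model.hidden(model.embed(ids)))
h.retain_grad()
logits = model.out(h)
# contrast a "desired" vs "undesired" token as a stand-in preference signal
loss = logits[0, -1, 7] - logits[0, -1, 3]
loss.backward()

# 2) Keep only the top-k most influential hidden dimensions as the steering direction.
grad = h.grad[0, -1]
topk = grad.abs().topk(4).indices
direction = torch.zeros_like(grad)
direction[topk] = grad[topk]
direction = direction / direction.norm()

# 3) Steered inference pass.
steer_strength = 2.0   # assumed hyperparameter
steered_logits = model(ids, steer=steer_strength * direction)
print(steered_logits.shape)
```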
WHY do you prefer one thing over another? Reward models treat preference as a black box 😶🌫️ but human brains 🧠 decompose decisions into hidden attributes. We built the first system to mirror how people really make decisions in our #COLM2025 paper 🎨PrefPalette✨ Why it matters 👉🏻🧵
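As a rough sketch of the attribute-decomposition idea (my illustration, not the PrefPalette system), a preference score can be a weighted sum of interpretable per-attribute scores instead of one opaque reward. The attribute names, toy scorers, and weights below are placeholders.

```python
# Illustrative only: decomposed preference = weighted sum of attribute scores.
from typing import Callable, Dict

ATTRIBUTES: Dict[str, Callable[[str], float]] = {
    # toy attribute scorers; a real system would use learned models here
    "conciseness": lambda text: 1.0 / (1.0 + len(text.split()) / 50.0),
    "politeness":  lambda text: float(any(w in text.lower() for w in ("please", "thanks"))),
    "specificity": lambda text: min(1.0, sum(ch.isdigit() for ch in text) / 5.0),
}

# per-context attribute weights (e.g., learned per community or per user)
weights = {"conciseness": 0.5, "politeness": 0.2, "specificity": 0.3}

def preference_score(text: str) -> float:
    """Decomposed preference: weighted sum of per-attribute scores."""
    return sum(weights[name] * scorer(text) for name, scorer in ATTRIBUTES.items())

a = "Thanks! Use 3 bullet points and keep it under 100 words."
b = "Well, it depends on many factors and could go either way."
print(preference_score(a) > preference_score(b))  # interpretable comparison
```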
Copyrighted 🚧, private 🛑, and sensitive ☢️ data remain major challenges for AI. FlexOlmo introduces an architectural mechanism to flexibly opt-in/opt-out segments of data in the training weights, **at inference time**. (Prior common solutions were to filter your data once…
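One way to picture the opt-in/opt-out mechanism (a sketch under my own assumptions, not FlexOlmo's architecture or code): train separate experts on separate data sources, and at inference mask out any expert whose source has opted out, renormalizing the mixture over the rest.

```python
# Assumption-laden sketch: experts tied to data sources, opt-out via masking
# at inference time. Module names and routing are placeholders.
import torch
import torch.nn as nn

class OptOutMoE(nn.Module):
    def __init__(self, d=32, sources=("public", "partner_a", "partner_b")):
        super().__init__()
        self.sources = list(sources)
        self.experts = nn.ModuleList(nn.Linear(d, d) for _ in self.sources)
        self.router = nn.Linear(d, len(self.sources))

    def forward(self, x, opted_out=()):
        logits = self.router(x)                                   # [B, E]
        # drop opted-out experts from the mixture at inference time
        mask = torch.tensor([name in opted_out for name in self.sources])
        logits = logits.masked_fill(mask, float("-inf"))
        weights = logits.softmax(dim=-1)                          # renormalize over remaining experts
        outs = torch.stack([e(x) for e in self.experts], dim=1)   # [B, E, d]
        return (weights.unsqueeze(-1) * outs).sum(dim=1)

x = torch.randn(4, 32)
model = OptOutMoE()
y_full = model(x)                                    # all data sources opted in
y_restricted = model(x, opted_out=("partner_b",))    # partner_b removed post-hoc
print(y_full.shape, y_restricted.shape)
```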
CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering
🥳 Excited to share that our work -- Retrieval-Augmented Generation with Conflicting Evidence -- addressing conflict in RAG caused by ambiguity, misinformation, and noisy/irrelevant evidence, has been accepted to @COLM_conf #COLM2025! Our new benchmark RAMDocs proves challenging for…
🚨Real-world retrieval is messy: queries can be ambiguous, or documents may conflict/have incorrect/irrelevant info. How can we jointly address all these problems? We introduce: ➡️ RAMDocs, a challenging dataset with ambiguity, misinformation, and noise. ➡️ MADAM-RAG, a…
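A minimal sketch of the multi-agent aggregation idea in the tweet: one agent answers from each retrieved document, agents see each other's previous answers across a few rounds, and an aggregator reconciles them. The `call_llm` stub, prompts, and round count are assumptions for illustration, not MADAM-RAG's actual prompts or loop.

```python
# Hedged sketch of multi-agent debate over conflicting retrieved documents.
from typing import List

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion API; returns a stub answer here."""
    return f"[answer based on: {prompt[:40]}...]"

def multi_agent_rag(question: str, documents: List[str], rounds: int = 2) -> str:
    answers = ["" for _ in documents]
    for _ in range(rounds):
        # each agent answers from its own document, seeing the others' last answers
        for i, doc in enumerate(documents):
            others = "\n".join(a for j, a in enumerate(answers) if j != i and a)
            answers[i] = call_llm(
                f"Question: {question}\nYour document: {doc}\n"
                f"Other agents said:\n{others}\n"
                "Answer using only your document; flag conflicts or noise."
            )
    # aggregator reconciles per-document answers (disambiguate, discard misinformation)
    return call_llm(
        f"Question: {question}\nPer-document answers:\n" + "\n".join(answers) +
        "\nAggregate: keep all valid answers for ambiguous queries, drop unsupported ones."
    )

print(multi_agent_rag("Who founded X?", ["doc A ...", "doc B ...", "doc C ..."]))
```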
🚀 Check out our new paper Video-RTS — a data-efficient RL approach for video reasoning with video-adaptive TTS! While prior work relies on massive SFT (400K+ VQA and/or CoT samples 🤯), Video-RTS: ▶️ Replaces expensive SFT with pure RL using output-based rewards ▶️ Introduces a…
🚨Introducing Video-RTS: Resource-Efficient RL for Video Reasoning with Adaptive Video TTS! While RL-based video reasoning with LLMs has advanced, the reliance on large-scale SFT with extensive video data and long CoT annotations remains a major bottleneck. Video-RTS tackles…
🚨 Check our new paper, Video-RTS, a novel and data-efficient RL solution for complex video reasoning tasks, complete with video-adaptive Test-Time Scaling (TTS). 1⃣️Traditionally, such tasks have relied on massive SFT datasets. Video-RTS bypasses this by employing pure RL with…
Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning
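As a rough picture of video-adaptive test-time scaling (my reading of the tweets above, not the Video-RTS implementation): start from sparse frames, sample several answers, and only densify the frames when the sampled answers disagree. `sample_frames`, `answer_with_frames`, and the thresholds are hypothetical stand-ins.

```python
# Minimal sketch: sparse-to-dense frame sampling driven by answer consistency.
from collections import Counter
import random

random.seed(0)

def sample_frames(video: str, n: int) -> list:
    return [f"{video}:frame_{i}" for i in range(n)]          # stub frame sampler

def answer_with_frames(question: str, frames: list) -> str:
    # stub for a video LLM call; more frames -> more stable answer in this toy
    return random.choice(["A", "B"]) if len(frames) < 8 else "A"

def video_adaptive_tts(question: str, video: str,
                       n_samples: int = 5, agree_ratio: float = 0.8,
                       max_frames: int = 32) -> str:
    n_frames = 4                                              # start sparse
    while True:
        frames = sample_frames(video, n_frames)
        votes = Counter(answer_with_frames(question, frames) for _ in range(n_samples))
        answer, count = votes.most_common(1)[0]
        # stop once sampled answers are consistent enough, or the frame budget is hit
        if count / n_samples >= agree_ratio or n_frames >= max_frames:
            return answer
        n_frames *= 2                                         # densify and retry

print(video_adaptive_tts("What happens after the goal?", "video_001"))
```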
🥳Excited to share that I’ll be joining @unccs as a postdoc this fall. Looking forward to working with @mohitban47 & amazing students at @unc_ai_group. I'll continue working on retrieval, aligning knowledge modules with LLM's parametric knowledge, and expanding to various modalities.
🎉Excited to announce VEGGIE has been accepted to #ICCV2025! VEGGIE is a unified MLLM + Diffusion framework for instructional video editing. It presents a systematic approach spanning data, model, benchmark, and evaluation design, and shows strong multi-skill editing +…
🚀 Excited to introduce a new member of the LRM (Large Reconstruction Models) family — 4D-LRM! 1. What is 4D-LRM? It’s a large-scale space-time model that reconstructs a dynamic object from any few views at any time to any view at any other time. 2. What does it do? 🔁 Learn…
Can we scale 4D pretraining to learn general space-time representations that reconstruct an object from a few views at any time to any view at any other time? Introducing 4D-LRM: a Large Space-Time Reconstruction Model that ... 🔹 Predicts 4D Gaussian primitives directly from…
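A toy illustration of what "4D Gaussian primitives" can mean (not the 4D-LRM parameterization): each primitive carries a spatial mean plus a time center and temporal width, so its contribution at a query time is weighted by a 1-D Gaussian in time. Shapes and fields below are assumptions.

```python
# Toy 4D (space-time) Gaussian primitive parameterization; illustrative only.
import torch

N = 1024
gaussians = {
    "xyz":     torch.randn(N, 3),            # spatial means
    "t_mean":  torch.rand(N),                # time centers in [0, 1]
    "t_sigma": 0.05 + 0.1 * torch.rand(N),   # temporal widths
    "rgb":     torch.rand(N, 3),             # colors
    "opacity": torch.rand(N),
}

def temporal_weights(t: float) -> torch.Tensor:
    """Per-primitive weight at query time t: opacity scaled by a temporal Gaussian."""
    z = (t - gaussians["t_mean"]) / gaussians["t_sigma"]
    return gaussians["opacity"] * torch.exp(-0.5 * z**2)

# at render time t, only primitives "alive" near t contribute appreciably
w = temporal_weights(0.3)
active = (w > 1e-3).sum().item()
print(f"{active}/{N} primitives contribute at t=0.3")
```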
🚀 Check out our new paper MEXA — a training-free framework for general multimodal reasoning! Most existing approaches require heavy training and struggle to generalize across tasks and modalities. MEXA tackles this with: 🔹 A query-aware expert selector that chooses task- and…
New paper Alert 🚨 Introducing MEXA: A general and training-free multimodal reasoning framework via dynamic multi-expert skill selection, aggregation and deep reasoning! MEXA: 1. Selects task- and modality-relevant experts based on the query and various required multimodal…
🚨 New paper out! We present MEXA, a general framework for multimodal reasoning powered by adaptive tool use across diverse domain-specific experts. Designed for broad extensibility and seamless integration ➡️ ✅ Dynamically selects the most relevant expert model for each query…
🚨Excited to announce MEXA, a general multimodal reasoning framework which: -- Dynamically selects query-relevant expert models -- Performs deep reasoning over the expert outputs -- Is training-free and generalizes easily to a wide range of tasks/modalities. Check out the paper/thread for more details 👇!
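To make the selection-then-aggregation flow concrete, here is a small, hypothetical sketch of training-free multi-expert reasoning as described in the MEXA tweets: pick query-relevant experts, collect their outputs, then let a reasoning model aggregate. The expert registry, the keyword-based selector, and `call_llm` are all placeholders, not the paper's components.

```python
# Hedged sketch: query-aware expert selection + aggregation by a reasoner LLM.
from typing import Dict

def call_llm(prompt: str) -> str:
    return f"[aggregated answer for: {prompt[:40]}...]"       # stub reasoner

# expert registry: name -> (modality keywords, callable expert)
EXPERTS: Dict[str, tuple] = {
    "ocr":       (("text", "sign", "read"),   lambda q: "OCR: 'EXIT'"),
    "audio":     (("sound", "say", "music"),  lambda q: "Transcript: 'hello'"),
    "detection": (("where", "count", "find"), lambda q: "Detected: 2 people"),
}

def select_experts(query: str, top_k: int = 2):
    scores = {name: sum(kw in query.lower() for kw in kws)
              for name, (kws, _) in EXPERTS.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [n for n in ranked[:top_k] if scores[n] > 0] or ranked[:1]

def mexa_style_answer(query: str) -> str:
    chosen = select_experts(query)
    evidence = "\n".join(f"{n}: {EXPERTS[n][1](query)}" for n in chosen)
    return call_llm(f"Question: {query}\nExpert outputs:\n{evidence}\nReason step by step.")

print(mexa_style_answer("Where is the exit sign and what does it read?"))
```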
🚨Meet MF² (Movie Facts & Fibs): a new benchmark for long-movie understanding! 🤔Do you think your model understands movies? Unlike existing benchmarks, MF² targets memorable events, emotional arcs 💔, and causal chains 🔗 — things humans recall easily, but even top models like…
🎉Excited to share that I’ll be starting my CS PhD journey at @UNC @unccs this fall! 🎓 I’ll be working with the renowned @mohitban47 at @uncnlp — a dream come true! ✨ Huge thanks to everyone who's helped me get here. Can't wait to begin this new life and research journey! 🧳🚀