Zhenhailong Wang
@zhenhailongW
Ph.D. in CS @UofIllinois, advised by Prof. Heng Ji @hengjinlp. Research intern at Tencent AI Lab, Microsoft Research Asia, Salesforce AI Research, Amazon
Dive deeper into PAPO on Hugging Face! This framework improves multimodal reasoning without extra data or external reward models. Explore the paper, models, and datasets: Paper: huggingface.co/papers/2507.06… Models: hf.co/collections/PA… Data: hf.co/collections/PA…
How can we unlock generalized reasoning? ⚡️Introducing Energy-Based Transformers (EBTs), an approach that out-scales (feed-forward) transformers and unlocks generalized reasoning/thinking on any modality/problem without rewards. TLDR: - EBTs are the first model to outscale the…
🧠 How can AI evolve from statically 𝘵𝘩𝘪𝘯𝘬𝘪𝘯𝘨 𝘢𝘣𝘰𝘶𝘵 𝘪𝘮𝘢𝘨𝘦𝘴 → dynamically 𝘵𝘩𝘪𝘯𝘬𝘪𝘯𝘨 𝘸𝘪𝘵𝘩 𝘪𝘮𝘢𝘨𝘦𝘴 as cognitive workspaces, similar to the human mental sketchpad? 🔍 What’s the 𝗿𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗿𝗼𝗮𝗱𝗺𝗮𝗽 from tool-use → programmatic…
🧠Let’s teach LLMs to learn smarter, not harder💥[arxiv.org/pdf/2506.06972] 🤖How can LLMs verify complex scientific information efficiently? 🚀We propose modular, reusable atomic reasoning skills that reduce LLMs’ cognitive load when verifying scientific claims with little data.…
📢 New Paper Drop: From Solving to Modeling! LLMs can solve math problems — but can they model the real world? 🌍 📄 arXiv: arxiv.org/pdf/2505.15068 💻 Code: github.com/qiancheng0/Mod… Introducing ModelingAgent, a breakthrough system for real-world mathematical modeling with LLMs.
📣 SMARTAgent is accepted to ACL 2025 Findings! It’s increasingly important to shape an agent’s metacognition, which we believe should guide its actions and reasoning. We are continuing in this direction!! Position paper will be released soon!
🚀Can your language model think strategically? 🧠 SMART: Boosting LM self-awareness to reduce Tool Overuse & optimize reasoning! 🌐 arxiv.org/pdf/2502.11435 📊 github.com/qiancheng0/Ope… Smaller models, bigger brains. Smarter tool use, better results! 🔥 #AI #LLM
🐂🍺Introducing our recent preprint: Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training! We present PRIOR, a simple vision-language pre-training algorithm that addresses the challenge of irrelevant textual content in image-caption pairs. PRIOR enhances…
We are extremely excited to announce mCLM, a Modular Chemical Language Model that is friendly to automatable block-based chemistry and mimics bilingual speakers by “code-switching” between functional molecular modules and natural language descriptions of the functions. 1/2
🚀 Can we cast reward modeling as a reasoning task? 📖 Introducing our new paper: RM-R1: Reward Modeling as Reasoning 📑 Paper: arxiv.org/pdf/2505.02387 💻 Code: github.com/RM-R1-UIUC/RM-… Inspired by recent advances in long chain-of-thought (CoT) on reasoning-intensive tasks, we…
Why allocate the same number of visual tokens to a blank image and a complex landscape? Introducing DyMU: a training-free algorithm that makes any ViT visual encoder dynamic-length and plug-and-play with downstream VLMs. 🚀 🔗 Project Page: mikewangwzhl.github.io/dymu/
🚀 ToolRL unlocks LLMs' true tool mastery! The secret? Smart rewards > more data. 📖 Introducing our newest paper: ToolRL: Reward is all Tool Learning Needs Paper Link: arxiv.org/pdf/2504.13958 Github Link: github.com/qiancheng0/Too…
🚀 We benchmarked our INFOGENT framework against the latest LLM web search APIs from @OpenAI & @perplexity_ai—our approach proves to be competitive with these proprietary solutions! I'm currently on the industry job market for research scientist roles. Please do reach out if you…
🚀 Can LLMs aggregate information from diverse web sources? We try to answer that in our latest work: INFOGENT: a modular, agent-based framework for information aggregation on the web! Website: gangiswag.github.io/infogent/ 🌐🔍 🧵 [1/n]
Tweet 1/5 🚀 Introducing MultiAgentBench: the first comprehensive benchmark evaluating both collaboration and competition among LLM agents! 🏞️ • arXiv: arxiv.org/abs/2503.01935 • GitHub: github.com/MultiagentBenc… • Hugging Face: huggingface.co/papers/2503.01…
🔥🔥PC-Agent Release!! We propose the PC-Agent framework to handle complex interactive environments and complex tasks in PC scenarios. 📖Paper: huggingface.co/papers/2502.14… 🔗Code: github.com/X-PLUG/MobileA… #LLMs #Multimodal #MLLM @_akhaliq
🔍New findings on knowledge overshadowing! Why do LLMs hallucinate even when all their training data is true? 🤔Can we predict hallucinations even before model training or inference? 🚀Check out our new preprint: [arxiv.org/pdf/2502.16143] The Law of Knowledge Overshadowing: Towards…