Shiqi Chen
@shiqi_chen17
PhD student @CityUHongKong. NLPer. Visiting PhD @NorthwesternU and @HKUST. Former @SeaAIL.
🚀🔥 Thrilled to announce our ICML25 paper: "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas"! We dive into the core reasons behind spatial reasoning difficulties for Vision-Language Models from an attention mechanism view. 🌍🔍 Paper:…
FacTool has been accepted to COLM 2025 - two years after its arXiv debut! While the landscape of LLMs has changed a lot since then, tool-augmented LLMs and RAG are still among the most effective and practical approaches for detecting / mitigating hallucinations (ref:…
In the era of 🤖#GenerativeAI, text of all forms can be generated by LLMs. How can we identify and rectify *factual errors* in the generated output? We introduce FacTool, a framework for factuality detection in Generative AI. Website: ethanc111.github.io/factool_websit… (1/n)
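The tool-augmented loop FacTool describes (extract claims from generated text, query external tools for evidence, judge each claim) can be sketched minimally. This is an illustrative toy, not FacTool's actual API: the function bodies are stand-ins for what the real pipeline does with LLM prompting and search tools.

```python
# Illustrative sketch of a tool-augmented factuality-checking loop:
# extract claims, retrieve evidence, judge. All names are hypothetical.

def extract_claims(text):
    # Stand-in: treat each sentence as one atomic claim.
    # FacTool uses an LLM to extract fine-grained claims instead.
    return [s.strip() for s in text.split(".") if s.strip()]

def verify_claim(claim, knowledge_base):
    # Stand-in for tool-based evidence retrieval (e.g. web search)
    # followed by an LLM judgment over the retrieved evidence.
    return claim in knowledge_base

def factcheck(text, knowledge_base):
    """Map each extracted claim to a True/False factuality verdict."""
    return {c: verify_claim(c, knowledge_base) for c in extract_claims(text)}

kb = {"Paris is the capital of France"}
report = factcheck("Paris is the capital of France. The moon is cheese", kb)
print(report)
# {'Paris is the capital of France': True, 'The moon is cheese': False}
```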
SynLogic is our effort to synthesize verifiable reasoning data for RL scaling. It covers 35 diverse logical reasoning tasks and supports synthesizing at controlled difficulty and quantity. It is not only about logical reasoning. We mixed math, coding, and SynLogic altogether to…
Lack of RL Logical Reasoning data? Excited to share SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond 🚀 Building strong logical reasoning through RLVR 📄Paper: huggingface.co/papers/2505.19… 💻 Code: github.com/MiniMax-AI/Syn… (1/n)
We present Thinking with Generated Images, demonstrating how a single unified LMM can be trained to perform vision generation tasks using both textual and visual intermediate steps, along with critique and refinement capabilities. We believe that spontaneous multimodal thinking…
What if AI could mentally sketch its thoughts, just like you? The missing piece of AI multimodal reasoning is here! What if AI could daydream in images? Closing the imagination gap between humans and AI! Introducing Thinking with Generated Images — a new paradigm where large…
Check out SynLogic😀 -- a new framework and dataset for synthesizing logical reasoning data, spanning diverse logical reasoning domains. It achieves SOTA📊 on logical reasoning tasks and even shows strong generalization to other areas like math and coding💡!
[ICML 2025] Reasoning and multimodal perception are among the most exciting capabilities of modern models. Surprisingly, we found that both abilities can emerge via model merging—without any training. It’s like a free lunch 🍱 for capabilities! Even more interestingly, this…
Sharing another #ICML25 paper of ours: “Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging”! (1/5) We use model merging to enhance VLMs' reasoning by integrating math-focused LLMs—bringing textual reasoning into multi-modal models. Surprisingly, this…
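The training-free merging idea above — combining a math-focused LLM with a VLM's language backbone — can be sketched as simple per-parameter interpolation. A minimal sketch, assuming the two models share an architecture; the dict names and toy scalars here are illustrative, not the paper's code.

```python
# Minimal sketch of linear model merging over shared parameters:
# theta_merged = (1 - alpha) * theta_vlm + alpha * theta_llm

def merge_weights(vlm_weights, llm_weights, alpha=0.5):
    """Interpolate each parameter present in both state dicts."""
    return {
        name: (1 - alpha) * vlm_weights[name] + alpha * llm_weights[name]
        for name in vlm_weights
        if name in llm_weights  # merge only shared parameters
    }

# Toy example with scalar "parameters"; real weights would be tensors.
vlm = {"layer.0.w": 1.0, "vision.proj": 2.0}
llm = {"layer.0.w": 3.0}
merged = merge_weights(vlm, llm, alpha=0.5)
print(merged)  # {'layer.0.w': 2.0} — only the shared layer is merged
```

With tensors the same dict comprehension works unchanged, since `*` and `+` broadcast elementwise.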
We will be presenting the SkyLadder poster at the Open Science of Foundation Models Workshop (Hall 4 #5) at 3pm today. Please drop by and have a look!
🚀 Excited to share our new paper: SkyLadder: Better and Faster Pretraining via Context Window Scheduling! Have you ever noticed the ever-increasing ⬆ context window of pretrained language models? The first generation of GPT had a context length of 512, followed by 1024 for GPT-2,…
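The idea of a context-window schedule — start pretraining with short windows and grow toward the full window — can be sketched as a simple function of the training step. A hedged toy sketch: the linear ramp, window sizes, and warmup length here are assumptions for illustration, not SkyLadder's actual schedule.

```python
# Hypothetical linear context-window schedule for pretraining:
# grow from a short window to the full window over warmup steps.

def context_window(step, start=512, end=8192, warmup_steps=10_000):
    """Return the context length to use at a given training step."""
    if step >= warmup_steps:
        return end  # full window after warmup
    frac = step / warmup_steps
    return int(start + frac * (end - start))

print(context_window(0))       # 512
print(context_window(5_000))   # 4352 (halfway through warmup)
print(context_window(20_000))  # 8192
```

Batches would then be packed or truncated to `context_window(step)` tokens, so early training sees many short, cheap sequences.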
LLM Agents + Multi-turn RL: Why do your RL models always collapse, even in a simple env? We work in controlled envs to reveal the failure patterns and discuss how to possibly fix them. GitHub ⭐️1.4k, easy to add your own env, we are actively resolving issues and welcome…
Why does your RL training always collapse? In our new paper of RAGEN, we explore what breaks when you train LLM *Agents* with multi-turn reinforcement learning—and possibly how to fix it. 📄 github.com/RAGEN-AI/RAGEN… 🌐 ragen-ai.github.io 1/🧵👇
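The multi-turn setting RAGEN studies — an agent taking several actions in an environment before any reward arrives — can be sketched as a minimal rollout loop. A toy sketch only: the environment, policy, and function names below are illustrative and not the RAGEN API.

```python
# Minimal sketch of a multi-turn agent rollout in a toy environment.

def rollout(policy, env_step, max_turns=5):
    """Collect one trajectory as a list of (state, action, reward) turns."""
    state, trajectory = 0, []
    for _ in range(max_turns):
        action = policy(state)
        state, reward, done = env_step(state, action)
        trajectory.append((state, action, reward))
        if done:
            break
    return trajectory

# Toy env: sparse reward only when the agent reaches state 3.
def env_step(state, action):
    new_state = state + action
    done = new_state == 3
    return new_state, float(done), done

traj = rollout(lambda s: 1, env_step)
print(len(traj))  # 3 turns to reach the goal; reward arrives only at the end
```

The sparse, end-of-trajectory reward is exactly what makes multi-turn RL credit assignment hard and training prone to collapse.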
Excited to share our work at ICLR 2025 in 🇸🇬. @iclr_conf 🥳 Happy to chat about LLM reasoning & planning, agents, and AI4Science! 📍Sat 26 Apr, 3:00–5:30 p.m. CST, Hall 3 + Hall 2B #554