Zhongzhi Li
@ZhongzhiLi4
Ph.D. Student in Computer Science @MSFTResearch; Red Inc. Research Intern; Visiting Student at HKUST, LARK Lab
Can MLLMs truly see the world like humans? 👁️ We conducted a preliminary study using our benchmark TET 🎯 and found that nearly every SOTA model (Claude-4, Gemini, o1, etc.) scores ≈0% on some perception tasks that humans solve effortlessly, revealing a fundamental perceptual gap.
Our paper can be found at: arxiv.org/abs/2507.16863 Project page: turingeyetest.github.io
🙋‍♂️ Can RL training address model weaknesses without external distillation? 🚀 Please check our latest work on RL for LLM reasoning! 💯 TL;DR: We propose augmenting RL training with synthetic problems targeting the model's reasoning weaknesses. 📊 Qwen2.5-32B: 42.9 → SwS-32B: 68.4
Synthesizing challenging problems on which the current model performs poorly is an important direction in RL. Another thing that interests me is self-evolving learning: synthesizing questions/problems from which the model can continuously learn. You may check our work here: mastervito.github.io/MasterVito.SwS…
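To make the idea concrete, here is a minimal sketch of weakness-driven problem synthesis for RL training. It is not the paper's implementation: the skill list, `eval_pass_rate`, and `synthesize_problem` are hypothetical placeholders; the only point illustrated is weighting synthesis toward skills where the model fails most often.

```python
import random

SKILLS = ["algebra", "geometry", "combinatorics", "number_theory"]

def eval_pass_rate(skill: str) -> float:
    """Placeholder: fraction of held-out problems in `skill` the model solves."""
    return {"algebra": 0.8, "geometry": 0.3,
            "combinatorics": 0.5, "number_theory": 0.2}[skill]

def synthesize_problem(skill: str) -> str:
    """Placeholder: ask a generator model for a new problem exercising `skill`."""
    return f"<synthetic {skill} problem>"

def build_training_batch(n: int) -> list[str]:
    # Weight each skill by its failure rate, so weaker skills
    # receive proportionally more synthetic problems.
    weights = [1.0 - eval_pass_rate(s) for s in SKILLS]
    skills = random.choices(SKILLS, weights=weights, k=n)
    return [synthesize_problem(s) for s in skills]

if __name__ == "__main__":
    # Problems to mix into the next RL training round.
    print(build_training_batch(8))
```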
New training paradigm: instead of just predicting tokens, models reason about each prediction using RL. The model thinks through the context, considers alternatives, then commits to a prediction. A 14B model matches a 32B baseline, though training costs are significantly higher.
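A minimal sketch of the "reason, then predict" loop, assuming a hypothetical policy that emits a free-form chain of thought followed by a committed next token. The 0/1 reward on the final token is my reading of the paradigm, not the exact training objective; in practice this signal would drive a policy-gradient update (e.g. PPO/GRPO).

```python
from dataclasses import dataclass

@dataclass
class Rollout:
    reasoning: str   # free-form thinking about the next token
    prediction: str  # the token the model commits to

def toy_generate(context: str) -> Rollout:
    """Placeholder for an RL policy: reason over the context, then predict."""
    thought = f"Given '{context}', candidates are 'dog'/'cat'; 'cat' fits best."
    return Rollout(reasoning=thought, prediction="cat")

def reward(rollout: Rollout, gold_next_token: str) -> float:
    # Only the final prediction is rewarded; the reasoning is shaped
    # indirectly through this signal during the RL update.
    return 1.0 if rollout.prediction == gold_next_token else 0.0

context = "The animal that purrs is a"
r = toy_generate(context)
print(r.reasoning, "->", r.prediction, "| reward:", reward(r, "cat"))
```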