Hui Shen
@HuiShen_umich
CS Ph.D. student @ UMich | BS CIS @ Ohio State
📷 New Benchmark Release: PhyX - Physical Reasoning for Multimodal Models 👉 Project Page: phyx-bench.github.io 👉 Github: github.com/NastyMarcus/Ph… 👉 arXiv: arxiv.org/abs/2505.15929 👉 Huggingface Dataset: huggingface.co/datasets/Cloud…
🧐 Will you tune the Base Model or the Instruct Model? We find that: 😿 Tuning Instruct yields marginal improvements or even degrades performance 🤔 Paired Base and Instruct models are HIGHLY similar in weights 👏 Latest paper ⬇️ TL;DR: tune Base and directly graft the weights onto Instruct 🧵[1/n]
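The grafting idea in this thread can be sketched in a few lines, assuming "graft" means adding the fine-tuning delta learned on Base to the paired Instruct weights (the function name and plain-float weights below are illustrative; real use would operate on tensor state_dicts):

```python
def graft_weights(base, tuned_base, instruct):
    """Hypothetical sketch: transfer the fine-tuning delta from Base onto Instruct.

    Each argument is a {parameter_name: weight} mapping; floats stand in
    for tensors. Because paired Base and Instruct weights are highly
    similar, the delta learned on Base is assumed to transfer directly.
    """
    return {
        name: instruct[name] + (tuned_base[name] - base[name])  # add what fine-tuning changed
        for name in instruct
    }

# Toy example: fine-tuning moved a Base weight from 0.0 to 0.3,
# so the Instruct weight 0.1 is shifted by the same +0.3 delta.
grafted = graft_weights({"w": 0.0}, {"w": 0.3}, {"w": 0.1})
```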
🔬 The HKU team presents ParallelComp: a training-free technique for efficient context length extrapolation in LLMs—from 8K up to 128K tokens—on a single A100 GPU, with minimal performance loss. 📄 Paper: arxiv.org/abs/2502.14317 💻 Code: github.com/menik1126/Para…
Had a blast working with @DarthZhu_ ! We analyze and merge modality-specific models extended from the same #LLM backbone to create omni-modal ones, e.g., Qwen2-VL, -Video, and -Audio on #Qwen2. Though most results are negative, we have some interesting findings here :)
😴 Extending an LLM with new modalities has become common practice for building multimodal LLMs. ❓ Can it generalize to omni-modality? We study the effects of modality extension and ask three questions: arxiv.org/abs/2506.01872 #LLM #MLLM #OmniModality
Thanks for sharing this work. Now that mathematical reasoning is making huge progress, it's time for us to focus on other reasoning paradigms. The recent PhyBench and our PhyX are two new benchmarks for physical reasoning.
PhyX Does Your Model Have the "Wits" for Physical Reasoning?