Shijie Xia
@ShijieX60925
CS Ph.D. student at SJTU.
I'm seeing endless repetition in ~25% of cases with Qwen3-30B-A3B (5-6K-token inputs) in my data processing, even with reasoning mode off. Known issue? @Alibaba_Qwen
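(Not a fix for the underlying issue, but a common mitigation: penalize repeated tokens at sampling time. A minimal sketch assuming vLLM; the model name is real, but the penalty values are illustrative and should be tuned per task.)

```python
from vllm import LLM, SamplingParams

# Sketch: discourage runaway repetition by penalizing tokens
# that have already appeared in the output.
llm = LLM(model="Qwen/Qwen3-30B-A3B")
params = SamplingParams(
    temperature=0.7,
    top_p=0.8,
    presence_penalty=1.5,  # illustrative value; raise if loops persist
    max_tokens=2048,
)
outputs = llm.generate(["<your 5-6K-token prompt here>"], params)
print(outputs[0].outputs[0].text)
```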
The review quality in TMLR is better because:
1. The authors suggest the AE. This means the AE is more likely to be the right fit for the paper.
2. The AE selects the best reviewers for the paper, who may or may not be in the reviewer pool. ...
I've had good and garbage reviews at the 3 big conferences and TMLR. I don't recall a clear difference. Good to hear you've found TMLR to be good!
This is what I've always believed: once we go back to the optimization objective, many of these conclusions turn out to be quite ordinary.
🚨 Your RL only improves 𝗽𝗮𝘀𝘀@𝟭, not 𝗽𝗮𝘀𝘀@𝗸? 🚨 That’s not a bug — it’s a 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗼𝗯𝗷𝗲𝗰𝘁𝗶𝘃𝗲 you’re optimizing. You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time. 🧵 How?
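(To make the objective mismatch concrete: below is the standard unbiased pass@k estimator from the Codex paper (Chen et al., 2021), plus a hypothetical group-level reward sketching what "optimize for pass@k at training time" could mean; the reward function is an illustration, not any specific paper's method.)

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the chance that
    at least one of k samples, drawn without replacement from n
    generations of which c are correct, is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

def group_pass_at_k_reward(sample_correct: list[bool], k: int) -> float:
    # Hypothetical group-level reward: instead of rewarding each sample
    # independently (effectively a pass@1 objective), score the whole
    # group of rollouts by its estimated pass@k, so the policy is
    # credited for diverse attempts rather than one confident mode.
    n, c = len(sample_correct), sum(sample_correct)
    return pass_at_k(n, c, k)

# e.g. 8 rollouts, 2 correct: pass@1 reward = 0.25, pass@4 ≈ 0.79
print(group_pass_at_k_reward([True, True] + [False] * 6, k=4))
```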
Just came across this fascinating paper on multimodal-R1. Really worth a read!
🔥 New paper drop! 🔥 🔍 In the fast-paced world of RL scaling, where leaderboard performance and rapid results take priority, the value of transparent, step-by-step exploration is often overlooked. Our latest work, MAYE, addresses this gap by introducing: 1️⃣ A from-scratch RL…
?🤣
Grok-3 just proved the Riemann hypothesis. We decided to pause its training to check its proof, and if the proof is correct, training won't be resumed, as the AI is deemed so smart that it becomes a danger to humanity.
There is a nuanced but important difference between chain-of-thought before and after o1. Before the o1 paradigm (i.e., chain-of-thought prompting), there was a mismatch between what chain of thought was and what we wanted it to be. We wanted chain of thought to reflect the…
DPO is widely used to better align SFT LLMs with human preferences. However, its effectiveness is often limited by the KL-divergence constraint tied to the SFT model. In our new study, we closely examine DPO’s behavior, focusing on the significance of the reference policy through…
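(For reference, the standard DPO objective (Rafailov et al., 2023), where the reference policy π_ref is typically the SFT model and β sets the strength of the implicit KL constraint the tweet refers to; this is the textbook form, not the paper's variant:)

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}
\left[
\log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right)
\right]
```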
The Alpaca moment of Large Multimodal Models! Can we build native LMMs just like Llama for simple multimodal generation? Introducing Anole: the first open-source, autoregressive native LMM for multimodal generation. Building on Chameleon by @AIatMeta: github.com/GAIR-NLP/anole
New Plan: - Before I die, I will turn Illinois Tech into a new Waterloo-esque school with new funding, rigorous co-op programs, good classes, and prestige - I will donate enough to rename it to the Chicago Institute of Technology (ChiTech) and it will rival MIT by the time I die