Shijie Xia
@ShijieX60925
CS Ph.D. student at SJTU.
I'm seeing endless repetition in ~25% of cases with Qwen3-30B-A3B (5-6K-token inputs) in my data processing, even with reasoning mode off. Known issue? @Alibaba_Qwen
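(Not a fix for the underlying issue, but a common mitigation: penalize repeated tokens at sampling time. A minimal sketch assuming vLLM; the model name is real, but the penalty values are illustrative and should be tuned per task.)

```python
from vllm import LLM, SamplingParams

# Sketch: discourage runaway repetition by penalizing tokens
# that have already appeared in the output.
llm = LLM(model="Qwen/Qwen3-30B-A3B")
params = SamplingParams(
    temperature=0.7,
    top_p=0.8,
    presence_penalty=1.5,  # illustrative value; raise if loops persist
    max_tokens=2048,
)
outputs = llm.generate(["<your 5-6K-token prompt here>"], params)
print(outputs[0].outputs[0].text)
```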
The review quality in TMLR is better because:
1. The authors suggest the AE. This means the AE is more likely to be the right fit for the paper.
2. The AE selects the best reviewers for the paper, who may or may not be in the reviewer pool. ...
I've had good and garbage reviews at the 3 big conferences and TMLR. I don't recall a clear difference. Good to hear you've found TMLR to be good!
This is what I've always believed: once we go back to the optimization objective, many of these conclusions turn out to be quite ordinary.
🚨 Your RL only improves 𝗽𝗮𝘀𝘀@𝟭, not 𝗽𝗮𝘀𝘀@𝗸? 🚨 That’s not a bug — it’s a 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗼𝗯𝗷𝗲𝗰𝘁𝗶𝘃𝗲 you’re optimizing. You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time. 🧵 How?
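(To make the objective mismatch concrete: below is the standard unbiased pass@k estimator from the Codex paper (Chen et al., 2021), plus a hypothetical group-level reward sketching what "optimize for pass@k at training time" could mean; the reward function is an illustration, not any specific paper's method.)

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the chance that
    at least one of k samples, drawn without replacement from n
    generations of which c are correct, is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

def group_pass_at_k_reward(sample_correct: list[bool], k: int) -> float:
    # Hypothetical group-level reward: instead of rewarding each sample
    # independently (effectively a pass@1 objective), score the whole
    # group of rollouts by its estimated pass@k, so the policy is
    # credited for diverse attempts rather than one confident mode.
    n, c = len(sample_correct), sum(sample_correct)
    return pass_at_k(n, c, k)

# e.g. 8 rollouts, 2 correct: pass@1 reward = 0.25, pass@4 ≈ 0.79
print(group_pass_at_k_reward([True, True] + [False] * 6, k=4))
```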
Just came across this fascinating paper on multimodal-R1. Really worth a read!
🔥 New paper drop! 🔥 🔍 In the fast-paced world of RL scaling, where leaderboard performance and rapid results take priority, the value of transparent, step-by-step exploration is often overlooked. Our latest work, MAYE, addresses this gap by introducing: 1️⃣ A from-scratch RL…
?🤣
Grok-3 just proved the Riemann hypothesis. We decided to pause its training to check its proof, and if the proof is correct, training won't be resumed, as the AI is deemed so smart that it becomes a danger to humanity.
There is a nuanced but important difference between chain-of-thought before and after o1. Before the o1 paradigm (i.e., chain-of-thought prompting), there was a mismatch between what chain of thought was and what we wanted it to be. We wanted chain of thought to reflect the…
DPO is widely used to better align SFT LLMs with human preferences. However, its effectiveness is often limited by the KL-divergence constraint tied to the SFT model. In our new study, we closely examine DPO’s behavior, focusing on the significance of the reference policy through…
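(For reference, the standard DPO objective (Rafailov et al., 2023), where the reference policy π_ref is typically the SFT model and β sets the strength of the implicit KL constraint the tweet refers to; this is the textbook form, not the paper's variant:)

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}
\left[
\log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right)
\right]
```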
The Alpaca moment of Large Multimodal Models! Can we build native LMMs just like Llama for simple multimodal generation? Introducing Anole: the first open-source, autoregressive native LMM for multimodal generation. Building on Chameleon by @AIatMeta: github.com/GAIR-NLP/anole
New Plan: - Before I die, I will turn Illinois Tech into a new Waterloo-esque school with new funding, rigorous co-op programs, good classes, and prestige - I will donate enough to rename it to the Chicago Institute of Technology (ChiTech) and it will rival MIT by the time I die