Ziteng Sun
@SZiteng
Responsible and efficient AI. Topics: LLM efficiency; LLM alignment; Differential Privacy; Information Theory. Research Scientist @Google; PhD @Cornell
Happening now at poster E-2804. Come talk to us about why reward calibration is key to alignment and how to do RLHF for test-time scaling
[Thu Jul 17] w/ @ananthbshankar & @jacobeisenstein, we present a reinforcement learning framework designed for test-time scaling. We show how to calibrate & transform rewards to obtain optimal performance with a given test-time algorithm. x.com/SZiteng/status…
Inference-time procedures (e.g. Best-of-N, CoT) have been instrumental to the recent development of LLMs. The standard RLHF framework focuses only on improving the trained model, creating a train/inference mismatch. Can we align our model to better suit a given inference-time…
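The Best-of-N procedure mentioned above can be sketched in a few lines. This is a minimal toy illustration, not the thread's RLHF framework: `toy_generate` and `toy_reward` are hypothetical stand-ins for a sampler and a reward model.

```python
import random

def best_of_n(generate, reward, prompt, n=4, seed=0):
    """Best-of-N sampling: draw n candidate responses and return
    the one the reward model scores highest."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=reward)

# Hypothetical stand-ins for an LLM sampler and a reward model.
def toy_generate(prompt, rng):
    return prompt + " " + rng.choice(["a", "bb", "ccc", "dddd"])

def toy_reward(response):
    # Toy scoring rule: longer responses score higher.
    return len(response)

out = best_of_n(toy_generate, toy_reward, "answer:", n=8)
print(out)
```

The train/inference mismatch the tweet points at: RLHF typically optimizes the single-sample policy, while at deployment the reward enters only through this `max` over candidates.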
Announcing the first workshop on Foundations of Post-Training (FoPT) at COLT 2025! 📝 Soliciting abstracts/posters exploring theoretical & practical aspects of post-training and RL with language models! │ 🗓️ Deadline: May 19, 2025
Today at 10am I will present @SZiteng's paper "Block Verification Accelerates Speculative Decoding"
Friday 10am, I will present @SZiteng's paper on 𝐛𝐥𝐨𝐜𝐤 𝐯𝐞𝐫𝐢𝐟𝐢𝐜𝐚𝐭𝐢𝐨𝐧 𝐟𝐨𝐫 𝐬𝐩𝐞𝐜𝐮𝐥𝐚𝐭𝐢𝐯𝐞 𝐝𝐞𝐜𝐨𝐝𝐢𝐧𝐠 (w/ @th33rtha) x.com/SZiteng/status…
Jointly announcing EAGLE-3 with SGLang: Setting a new record in LLM inference acceleration! - 5x🚀than vanilla (on HF) - 1.4x🚀than EAGLE-2 (on HF) - A record of ~400 TPS on Llama 3.1 8B with a single H100 (on SGLang) - 1.65x🚀in latency even for large bs=64 (on SGLang) - A new…
⏰📢After years of working on long-context efficiency, I’ve started to doubt if it’s truly necessary (Many of you have probably noticed the decline of interest in long-context LLMs). Despite strong models like Gemini, short-context + retrieval often does the trick—faster, cheaper, and…
🚀 RAG vs. Long-Context LLMs: The Real Battle ⚔️ 🤯Turns out, simple-to-build RAG can match million-dollar long-context LLMs (LC LLMs) on most existing benchmarks. 🤡So, do we even need long-context models? YES. Because today’s benchmarks are flawed: ⛳ Too Simple –…
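The "short-context + retrieval" alternative discussed in this thread reduces, at its simplest, to picking the few most relevant documents and feeding only those to the model. A minimal lexical retriever (toy word-overlap scoring, not any production RAG stack) looks like:

```python
def retrieve(query, docs, k=2):
    """Toy retriever: rank documents by word overlap with the
    query and return the top-k. Real RAG systems use learned
    embeddings, but the pipeline shape is the same: retrieve a
    short context instead of stuffing everything into the prompt."""
    q = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

docs = ["cats purr", "dogs bark loudly", "cats and dogs play"]
top = retrieve("do cats play", docs, k=1)
print(top)  # → ['cats and dogs play']
```

The thread's point is that on benchmarks where the answer sits in one or two such snippets, this cheap pipeline matches a long-context model; benchmarks that genuinely require reasoning over the whole context would separate the two.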