Jiao Sun
@sunjiao123sun_
Senior Research Scientist at Google DeepMind
NLP PhD @ USC, Amazon ML Fellow
ex-{Google Brain, Alexa AI} nlper, IIIS Tsinghua-Ren
🚨 Agent Wars Incoming? What happens when both the buyer and seller use AI agents to make decisions? Turns out: 💡 If your agent’s dumber, you pay more. Stronger reasoning models dominate the game. Welcome to the future of asymmetric AI! 💸🤖💸 #AIagents #LLM #FutureOfMarkets
AI Shopping/Sales Agents sound very cool! But what if both the buyer and seller use AI agents? Our recent study found that stronger agents can exploit weaker ones to get a better deal, and delegating negotiation to AI agents might lead to economic losses. arxiv.org/abs/2506.00073…
Jiaxin is simply awesome!! Go work with him on fun agent stuff!
Life Update: I will join @UTiSchool as an Assistant Professor in Fall 2026 and will continue my work on LLMs, HCI, and Computational Social Science. I'm building a new lab on Human-Centered AI Systems and will be hiring PhD students in the coming cycle!
Wow. As an AC/reviewer, it's disheartening to see how low-effort reviews can unfairly hurt students and erode the community we’ve worked so hard to build. @ReviewAcl — can you help track down this reviewer/AC and investigate? Before it’s too late!
@ReviewAcl @emnlpmeeting Urgent help needed. acFZ: initial score 3 🧊 Complete silence during discussion. ⏰ 4am PST, 9 min before the deadline: quietly drops it to 2 with “Thanks for the rebuttal. I have updated the score.” ⚠️ No explanation. No notice. No chance to respond. (0/n)
I really enjoyed our run in Cubbon Park, @divy93t! Plus ending with fresh coconut was awesome!
Thrilled to share our new reasoning model, Polaris✨! The 4B version achieves a score of 79.4 on AIME 2025, surpassing Claude 4 Opus (75.5). We’re releasing the full RL recipe, data, and weights 🔓 — see all the details below
🚨 4B open-recipe model beats Claude-4-Opus 🔓 100% open data, recipe, model weights, and code. Introducing Polaris✨, a post-training recipe for scaling RL on advanced reasoning models. 🥳 Check out how we boost open-recipe reasoning models to incredible performance levels…
LLMs trained to memorize new facts can’t use those facts well.🤔 We apply a hypernetwork to ✏️edit✏️ the gradients for fact propagation, improving accuracy by 2x on a challenging subset of RippleEdit!💡 Our approach, PropMEND, extends MEND with a new objective for propagation.
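For the mechanically curious, here is a minimal PyTorch sketch of the hypernetwork-edits-the-gradient idea. Everything below (layer sizes, editor architecture, step size) is my own illustrative guess, not the PropMEND implementation:

```python
import torch
import torch.nn as nn

# Sketch: a small hypernetwork rewrites the raw fine-tuning gradient for a
# new fact into an "edited" gradient, and the edited gradient is what
# updates the model, encouraging the fact to propagate rather than be
# rote-memorized. (Illustrative only; not the paper's code.)
class GradEditor(nn.Module):
    def __init__(self, n_params: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_params, hidden), nn.ReLU(), nn.Linear(hidden, n_params)
        )

    def forward(self, flat_grad: torch.Tensor) -> torch.Tensor:
        return self.net(flat_grad)

layer = nn.Linear(16, 16)                        # stands in for one LLM weight matrix
editor = GradEditor(n_params=layer.weight.numel())

x, target = torch.randn(4, 16), torch.randn(4, 16)
loss = nn.functional.mse_loss(layer(x), target)  # "memorize the fact" loss
(grad,) = torch.autograd.grad(loss, layer.weight)

edited = editor(grad.flatten()).view_as(grad)    # hypernetwork edits the gradient
with torch.no_grad():
    layer.weight -= 0.1 * edited                 # apply the edited step, not the raw one
```

In PropMEND-style training, the editor itself would be optimized against a propagation objective (downstream questions that require the new fact), so the edited update generalizes beyond verbatim recall.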
✨ New paper ✨ 🚨 Scaling test-time compute can lead to inverse or flattened scaling!! We introduce SealQA, a new challenge benchmark w/ questions that trigger conflicting, ambiguous, or unhelpful web search results. Key takeaways: ➡️ Frontier LLMs struggle on Seal-0 (SealQA’s…
🌏How culturally safe are large vision-language models? 👉LVLMs often miss the mark. We introduce CROSS, a benchmark of 1,284 image-query pairs across 16 countries & 14 languages, revealing how LVLMs violate cultural norms in context. ⚖️ Evaluation via CROSS-EVAL 🧨 Safety…
I’ve seen many questions about how to choose ARR tracks for submissions aimed at the new tracks at #emnlp2025. We actually wrote a blog post along with the 2nd CFP exactly to address this: 2025.emnlp.org/track-changes/ Please help us share it widely! Good luck with your EMNLP submissions!
Happy to see #EMNLP2025 introducing new tracks on AI/LLM Agents, Code Models, Safety & Alignment, Reasoning, LLM Efficiency, and more. Big thanks to the organizers for making this happen! @emnlpmeeting #NLProc Perfect venue for agentic research and language technologies.…
Today we’re sharing an early look at our latest Gemini update for I/O! Introducing the updated Gemini 2.5 Pro (I/O edition), which ranks #1 on WebDev Arena and surpasses our previous 2.5 Pro model by +147 Elo points. 🏆 blog.google/products/gemin…
Was part of the force driving Gemini to #1 on WebDev Arena. We are still cooking for better things to come! 👩‍🍳🤘
🚨Breaking: @GoogleDeepMind’s latest Gemini-2.5-Pro is now ranked #1 across all LMArena leaderboards 🏆 Highlights: - #1 in all text arenas (Coding, Style Control, Creative Writing, etc) - #1 on the Vision leaderboard with a ~70 pts lead! - #1 on WebDev Arena, surpassing Claude…
We had such a fun tutorial on Creative Planning at NAACL 2025!! I think it’s the first tutorial in ACL history to include LIVE musical performances from @VioletNPeng and @Songyan_Silas_Z!!
Come chat with DQ and me about text-image alignment in front of our poster! Tomorrow 4pm MT at Hall 3!
#NAACL2025 Check out DreamSync's poster tomorrow (May 1) at Hall 3, 4:00-5:30pm. Feel free to stop by to chat about multimodality, evaluation, and interpretability. We are also planning an interpretability lunch tomorrow. Find it on Whova and join!
Today I gave a talk at the USC reading group about “The Role of Evaluation in the RL Era” and thought the following topics could be interesting to work on in both academia and industry. Hope it helps! Happy to talk more about these at #NAACL2025 in person if you are also coming! 🤘

You don’t want to miss Violet!!
Excited to speak more about AI creativity at SSNLP today in Singapore ssnlp-website.github.io/ssnlp25/ Also looking forward to hearing what the Qwen team has to say about their latest breakthrough! Friends in Singapore: let’s catch up!
TLDR: Low Price, High Performance! Proud to be part of Gemini team!! 👑
Gemini 2.5 Flash just dropped. ⚡ As a hybrid reasoning model, you can control how much it ‘thinks’ depending on your 💰 - making it ideal for tasks like building chat apps, extracting data and more. Try an early version in @Google AI Studio → ai.dev
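If you want to try the thinking control from code, here is a quick sketch with the google-genai Python SDK. The budget value and prompt are placeholders, and the exact model string for the early preview may differ from what's shown:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# thinking_budget caps the tokens the model may spend reasoning before it
# answers; lower budgets trade quality for cost and latency, and 0 turns
# thinking off entirely.
response = client.models.generate_content(
    model="gemini-2.5-flash",  # check AI Studio for the current preview name
    contents="Extract all dates from: 'Kickoff Jan 3; follow-up on Feb 14.'",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=512)
    ),
)
print(response.text)
```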
I expected LLMs to produce more faithful reasoning as they gained more capability from reasoning training. Bigger capability gains suggested to me that models would rely more on their stated reasoning. Sadly, we saw only small gains in faithfulness from reasoning training, and those quickly plateaued.
New Anthropic research: Do reasoning models accurately verbalize their reasoning? Our new paper shows they don't. This casts doubt on whether monitoring chains-of-thought (CoT) will be enough to reliably catch safety issues.
Interested in test time / inference scaling laws? Then check out our newest preprint!! 📉 How Do Large Language Monkeys Get Their Power (Laws)? 📉 arxiv.org/abs/2502.17578 w/ @JoshuaK92829 @sanmikoyejo @Azaliamirh @jplhughes @jordanjuravsky @sprice354_ @aengus_lynch1…
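The intuition I took away: each individual problem's failure rate decays exponentially with the number of attempts k, yet averaging over a heavy-tailed distribution of per-problem success rates looks like a power law. A toy numpy simulation of that aggregation effect (the Beta distribution here is my choice, not the paper's):

```python
import numpy as np

# Each problem i has a fixed per-attempt solve probability p_i, so its
# failure probability after k independent attempts is (1 - p_i)^k, i.e.
# exponential decay. Averaging over many problems whose p_i cluster near
# zero makes the aggregate failure rate fall like a power law in k.
rng = np.random.default_rng(0)
p = rng.beta(0.3, 3.0, size=100_000)   # per-problem solve rates, skewed toward 0

for k in [1, 10, 100, 1_000, 10_000]:
    failure = np.mean((1.0 - p) ** k)  # expected pass@k failure over problems
    print(f"k={k:>6}  failure={failure:.3e}")
# Plotting log(failure) vs. log(k) gives nearly a straight line (a power
# law), even though every single problem decays exponentially.
```

For p ~ Beta(a, b), the aggregate failure E[(1-p)^k] falls asymptotically like k^(-a), so the tail of the success-rate distribution sets the power-law exponent.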
Gemini 2.5 Pro is SOTA on pretty much everything
Wow we just ran Gemini 2.5 Pro on our evals and it got a new state of the art. Congrats to the Gemini team! Sharing preliminary results here and working on bringing it into Devin:
🚨 New paper 🚨 Excited to share my first paper w/ my PhD students!! We find that advanced LLM capabilities conferred by instruction or alignment tuning (e.g., SFT, RLHF, DPO, GRPO) can be encoded into model diff vectors (à la task vectors) and transferred across model…
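For intuition, a minimal sketch of the diff-vector recipe, assuming two checkpoints that share an architecture; the function names and checkpoints are illustrative, not the paper's code:

```python
import torch

def diff_vector(tuned: dict[str, torch.Tensor],
                base: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    """Capability delta: element-wise parameter difference, tuned minus base."""
    return {name: tuned[name] - base[name] for name in base}

def transfer(target_base: dict[str, torch.Tensor],
             diff: dict[str, torch.Tensor],
             alpha: float = 1.0) -> dict[str, torch.Tensor]:
    """Graft the capability onto another architecture-matched base model."""
    return {name: target_base[name] + alpha * diff[name] for name in target_base}

# Usage (hypothetical checkpoints):
#   diff = diff_vector(instruct_model.state_dict(), base_model.state_dict())
#   new_state = transfer(other_base.state_dict(), diff, alpha=1.0)
#   other_base.load_state_dict(new_state)
```

The scaling factor alpha mirrors task-vector arithmetic, where deltas can be applied at fractional strength or combined across capabilities.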