Xingcheng Yao
@StuartYao22139
Member of technical staff at @Kimi_Moonshot, Prev @uclanlp, @Tsinghua_IIIS, @princeton_nlp.
So big and so beautiful!! Really proud to be part of the team.
🚀 Hello, Kimi K2! Open-Source Agentic Model! 🔹 1T total / 32B active MoE model 🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models 🔹Strong in coding and agentic tasks 🐤 Multimodal & thought-mode not supported for now With Kimi K2, advanced agentic intelligence…
True. It's logit hard-capping with the capping factors absorbed into the weights.
Correction: there is actually a clamp_max on η (equivalently, rescaling only happens if max(qk) > t).
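For the curious, the mechanism described in these two posts can be sketched in a few lines. This is a minimal single-head NumPy illustration of the idea, not K2's actual implementation: the threshold t, shapes, and function name are assumptions for the example. The clamp at 1 on η is exactly why rescaling only fires when max(qk) > t.

```python
import numpy as np

def qk_clip(W_q, W_k, x, t=100.0):
    """Sketch of one QK-Clip step (illustrative, not K2's code).

    Computes the max pre-softmax attention logit for this head; if it
    exceeds the cap t, absorbs the capping factor into the weights by
    scaling W_q and W_k each by sqrt(eta), where
    eta = min(t / max_logit, 1), i.e. a clamp_max at 1.
    """
    d = W_q.shape[1]                      # head dimension
    q = x @ W_q                           # (seq, d)
    k = x @ W_k                           # (seq, d)
    logits = (q @ k.T) / np.sqrt(d)       # pre-softmax attention logits
    max_logit = logits.max()
    eta = min(t / max_logit, 1.0) if max_logit > 0 else 1.0
    # capping factors absorbed into the weights: sqrt(eta) on each side,
    # so the logits (quadratic in the weights) are scaled by eta
    return W_q * np.sqrt(eta), W_k * np.sqrt(eta), eta
```

After this rescaling the new max logit equals eta * max_logit, which is exactly t whenever the cap was exceeded, and the weights themselves carry the correction, so nothing changes at inference time.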
Meet Kimi K2, the BIG BEAUTIFUL MODEL with the best agentic capability to date. We are curious what you could build with Kimi K2. Proud of everyone on the Kimi team. It's just the beginning. We'll keep building & keep shipping.
Check this out: github.com/MoonshotAI/Kim…
just finished the tech report and pushed to github. good night.
So excited!
🚨 BREAKING: @Kimi_Moonshot’s Kimi-K2 is now the #1 open model in the Arena! With over 3K community votes, it ranks #5 overall, overtaking DeepSeek as the top open model. Huge congrats to the Moonshot team on this impressive milestone! The leaderboard now features 7 different…
Machine learning techniques really have no boundaries, and neither does @jianlin_S. Orz.
QK-Clip: Taking Muon Further on the Scaleup Journey kexue.fm/archives/11126 Interpreting the Key Training Techniques Behind Kimi K2: QK-Clip and MuonClip.
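For readers of the linked post, a rough sketch of what MuonClip combines: a Muon-style update (gradient orthogonalized via a Newton-Schulz iteration) followed by the QK-Clip weight rescaling. The Newton-Schulz coefficients below follow the public Muon reference implementation; the function names, learning rate, and cap values are hypothetical, and this is an illustration under those assumptions, not Kimi K2's training code.

```python
import numpy as np

def newton_schulz(G, steps=5):
    """Odd quintic Newton-Schulz iteration that approximately
    orthogonalizes G (pushes its singular values toward 1).
    Coefficients are from the public Muon reference implementation."""
    a, b, c = 3.4445, -4.7750, 2.0315
    if G.shape[0] > G.shape[1]:           # work with the wide orientation
        return newton_schulz(G.T, steps).T
    X = G / (np.linalg.norm(G) + 1e-7)    # normalize so singular values <= 1
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X

def muonclip_step(W, G, lr=0.02, qk_cap=None, max_logit=None):
    """One illustrative MuonClip update: Muon-style orthogonalized
    gradient step, then (for attention Q/K weights only) absorb the
    QK-Clip factor eta = min(qk_cap / max_logit, 1) into the weight,
    as sqrt(eta) on each of W_q and W_k."""
    W = W - lr * newton_schulz(G)
    if qk_cap is not None and max_logit is not None and max_logit > qk_cap:
        W = W * np.sqrt(qk_cap / max_logit)
    return W
```

Non-attention weights would simply skip the clip branch; the point of the combination is that Muon's update rule alone does not bound attention logits, and QK-Clip supplies that bound.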
Meet Embodied Web Agents that bridge physical-digital realms. Imagine embodied agents that can search for online recipes, shop for ingredients and cook for you. Embodied Web Agents search online information to carry out real-world embodied tasks. All data, code and web…
Interested in the combination of Inference time scaling + LLM Agent?🤖💭 Announcing QLASS (Q-guided Language Agent Stepwise Search, arxiv.org/abs/2502.02584), a framework that supercharges language agents at inference time. ⚡In this work, we build a process reward model to guide…
🚀🚀🚀Want to develop a cutting-edge video generation model towards Sora? Please dive into Apple’s latest recipe and studies for scalable video generation models🔥🔥🔥. In this work, we aim at providing a transparent and detailed recipe 📖 for model architecture, training…
📣 New Paper: Verbalized Representation Learning (VRL) VRL bridges prompt engineering and representation learning to enable automatic interpretable feature extraction — all without gradient descent! 🔥 +29% over SOTA 📊 95% less data arxiv.org/abs/2411.18651 @uclanlp (1/n)
Can VLMs improve 𝘁𝗵𝗲𝗺𝘀𝗲𝗹𝘃𝗲𝘀💪? We propose🔥𝗩𝗜𝗦𝗖𝗢, a benchmark to evaluate VLMs’ 𝗰𝗿𝗶𝘁𝗶𝗾𝘂𝗲 and 𝗰𝗼𝗿𝗿𝗲𝗰𝘁𝗶𝗼𝗻 capabilities, towards the higher goal of VLMs autonomous self-improvement. 🌐Project: visco-benchmark.github.io 📄Paper: arxiv.org/abs/2412.02172
🎬Meet SlowFast-VGen: an action-conditioned long video generation system that learns like a human brain! 🧠Slow learning builds the world model, while fast learning captures memories - enabling incredibly long, consistent videos that respond to your actions in real-time.…