Xingcheng Yao
@StuartYao22139
Member of technical staff at @Kimi_Moonshot, Prev @uclanlp, @Tsinghua_IIIS, @princeton_nlp.
So big and so beautiful!! Really proud to be part of the team.
🚀 Hello, Kimi K2! Open-Source Agentic Model! 🔹 1T total / 32B active MoE model 🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models 🔹Strong in coding and agentic tasks 🐤 Multimodal & thought-mode not supported for now With Kimi K2, advanced agentic intelligence…
True. It's logit hard-capping with the capping factors absorbed into the weights.
Correction: there is actually a clamp_max on η (equivalently, rescaling only happens if max(qk) > t).
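For the curious, the mechanism described in these two posts can be sketched in a few lines. This is a minimal single-head NumPy illustration of the idea, not K2's actual implementation: the threshold t, shapes, and function name are assumptions for the example. The clamp at 1 on η is exactly why rescaling only fires when max(qk) > t.

```python
import numpy as np

def qk_clip(W_q, W_k, x, t=100.0):
    """Sketch of one QK-Clip step (illustrative, not K2's code).

    Computes the max pre-softmax attention logit for this head; if it
    exceeds the cap t, absorbs the capping factor into the weights by
    scaling W_q and W_k each by sqrt(eta), where
    eta = min(t / max_logit, 1), i.e. a clamp_max at 1.
    """
    d = W_q.shape[1]                      # head dimension
    q = x @ W_q                           # (seq, d)
    k = x @ W_k                           # (seq, d)
    logits = (q @ k.T) / np.sqrt(d)       # pre-softmax attention logits
    max_logit = logits.max()
    eta = min(t / max_logit, 1.0) if max_logit > 0 else 1.0
    # capping factors absorbed into the weights: sqrt(eta) on each side,
    # so the logits (quadratic in the weights) are scaled by eta
    return W_q * np.sqrt(eta), W_k * np.sqrt(eta), eta
```

After this rescaling the new max logit equals eta * max_logit, which is exactly t whenever the cap was exceeded, and the weights themselves carry the correction, so nothing changes at inference time.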
Meet Kimi K2, the BIG BEAUTIFUL MODEL with the best agentic capability to date. We are curious what you could build with Kimi K2. Proud of everyone on the Kimi team. It's just the beginning. We'll keep building & keep shipping.
Check this out: github.com/MoonshotAI/Kim…
just finished the tech report and pushed to github. good night.
So excited!
🚨 BREAKING: @Kimi_Moonshot’s Kimi-K2 is now the #1 open model in the Arena! With over 3K community votes, it ranks #5 overall, overtaking DeepSeek as the top open model. Huge congrats to the Moonshot team on this impressive milestone! The leaderboard now features 7 different…
Machine learning techniques really have no boundaries, and neither does @jianlin_S. Orz.
QK-Clip: Taking Muon Further on the Scaleup Journey kexue.fm/archives/11126 Interpreting the Key Training Techniques Behind Kimi K2: QK-Clip and MuonClip.
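For readers of the linked post, a rough sketch of what MuonClip combines: a Muon-style update (gradient orthogonalized via a Newton-Schulz iteration) followed by the QK-Clip weight rescaling. The Newton-Schulz coefficients below follow the public Muon reference implementation; the function names, learning rate, and cap values are hypothetical, and this is an illustration under those assumptions, not Kimi K2's training code.

```python
import numpy as np

def newton_schulz(G, steps=5):
    """Odd quintic Newton-Schulz iteration that approximately
    orthogonalizes G (pushes its singular values toward 1).
    Coefficients are from the public Muon reference implementation."""
    a, b, c = 3.4445, -4.7750, 2.0315
    if G.shape[0] > G.shape[1]:           # work with the wide orientation
        return newton_schulz(G.T, steps).T
    X = G / (np.linalg.norm(G) + 1e-7)    # normalize so singular values <= 1
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X

def muonclip_step(W, G, lr=0.02, qk_cap=None, max_logit=None):
    """One illustrative MuonClip update: Muon-style orthogonalized
    gradient step, then (for attention Q/K weights only) absorb the
    QK-Clip factor eta = min(qk_cap / max_logit, 1) into the weight,
    as sqrt(eta) on each of W_q and W_k."""
    W = W - lr * newton_schulz(G)
    if qk_cap is not None and max_logit is not None and max_logit > qk_cap:
        W = W * np.sqrt(qk_cap / max_logit)
    return W
```

Non-attention weights would simply skip the clip branch; the point of the combination is that Muon's update rule alone does not bound attention logits, and QK-Clip supplies that bound.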
Meet Embodied Web Agents that bridge physical-digital realms. Imagine embodied agents that can search for online recipes, shop for ingredients and cook for you. Embodied Web Agents search online information to carry out real-world embodied tasks. All data, code and web…
Interested in the combination of Inference time scaling + LLM Agent?🤖💭 Announcing QLASS (Q-guided Language Agent Stepwise Search, arxiv.org/abs/2502.02584), a framework that supercharges language agents at inference time. ⚡In this work, we build a process reward model to guide…
🚀🚀🚀Want to develop a cutting-edge video generation model towards Sora? Please dive into Apple’s latest recipe and studies for scalable video generation models🔥🔥🔥. In this work, we aim at providing a transparent and detailed recipe 📖 for model architecture, training…
📣 New Paper: Verbalized Representation Learning (VRL) VRL bridges prompt engineering and representation learning to enable automatic interpretable feature extraction — all without gradient descent! 🔥 +29% over SOTA 📊 95% less data arxiv.org/abs/2411.18651 @uclanlp (1/n)
Can VLMs improve 𝘁𝗵𝗲𝗺𝘀𝗲𝗹𝘃𝗲𝘀💪? We propose🔥𝗩𝗜𝗦𝗖𝗢, a benchmark to evaluate VLMs’ 𝗰𝗿𝗶𝘁𝗶𝗾𝘂𝗲 and 𝗰𝗼𝗿𝗿𝗲𝗰𝘁𝗶𝗼𝗻 capabilities, towards the higher goal of VLMs autonomous self-improvement. 🌐Project: visco-benchmark.github.io 📄Paper: arxiv.org/abs/2412.02172
🎬Meet SlowFast-VGen: an action-conditioned long video generation system that learns like a human brain! 🧠Slow learning builds the world model, while fast learning captures memories - enabling incredibly long, consistent videos that respond to your actions in real-time.…