Xinyu Yang
@Xinyu2ML
Ph.D. @CarnegieMellon. Working on data and hardware-driven principled algorithm & system co-design for scalable and generalizable foundation models. They/Them
🚀 Super excited to share Multiverse! 🏃 It’s been a long journey exploring the space between model design and hardware efficiency. What excites me most is realizing that, beyond optimizing existing models, we can discover better model architectures by embracing system-level…
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: multiverse4fm.github.io 🧵 1/n
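For intuition, here is a minimal, unofficial sketch of what adaptive parallel generation can look like: the model first proposes independent subgoals, decodes them concurrently, then merges the branches. The `generate` helper and the map/process/reduce prompts below are illustrative stand-ins, not the released Multiverse interface.

```python
# Illustrative sketch only: a toy "map -> parallel process -> reduce" decoding loop.
# `generate` stands in for any LLM call; it is NOT the released Multiverse API.
from concurrent.futures import ThreadPoolExecutor

def generate(prompt: str) -> str:
    # Hypothetical stand-in for a model call (e.g. a local or hosted LLM).
    return f"[model output for: {prompt[:40]}...]"

def multiverse_style_decode(question: str) -> str:
    # Map: let the model propose independent subgoals it can solve in parallel.
    plan = generate(f"Split this problem into independent subgoals, one per line:\n{question}")
    subgoals = [line.strip() for line in plan.splitlines() if line.strip()]

    # Process: decode each branch concurrently instead of one long sequential chain.
    with ThreadPoolExecutor(max_workers=len(subgoals) or 1) as pool:
        branches = list(pool.map(lambda g: generate(f"Solve this subgoal:\n{g}"), subgoals))

    # Reduce: merge the parallel branches back into a single final answer.
    merged = "\n".join(branches)
    return generate(f"Combine these partial results into one answer:\n{merged}\n\nQuestion: {question}")

if __name__ == "__main__":
    print(multiverse_style_decode("Count the integers between 1 and 100 divisible by 3 or 5."))
```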
Agentic foundation models have emerged as a promising direction towards AGI. However, we are still in the early stages of developing such models capable of multi-modal reasoning—an essential capability for enabling most real-world applications. We warmly invite you to submit your…
🚨 We’re thrilled to announce our ICCV 2025 Workshop: MMRAgI – Multi-Modal Reasoning for Agentic Intelligence! 🚨 🌐 Homepage: agent-intelligence.github.io/agent-intellig… 📥 Submit: openreview.net/group?id=thecv… 🗓️ Submission Deadline (Proceeding Track): June 24th 2025 23:59 AoE 🗓️ Submission Deadline…
A recurrent depth/Huginn-3.5B Update: I originally wanted to post these more often, but I guess time is a river, and I just don't like posting all that much yet... The most interesting finding about the depth recurrent model has been this unassuming chart, actually:
I’m gonna be recruiting students thru both @LTIatCMU (NLP) and @CMU_EPP (Engineering and Public Policy) for fall 2026! If you are interested in reasoning, memorization, AI for science & discovery and of course privacy, u can catch me at ACL! Prospective students fill this form:
📣Thrilled to announce I’ll join Carnegie Mellon University (@CMU_EPP & @LTIatCMU) as an Assistant Professor starting Fall 2026! Until then, I’ll be a Research Scientist at @AIatMeta FAIR in SF, working with @kamalikac’s amazing team on privacy, security, and reasoning in LLMs!
Mixture‑of‑Experts (MoE) powers many frontier models like R1, K2, & Qwen3 ⚡️ To make frontier-scale MoE models accessible to train, we open-source MoMoE, a hyper-performant MoE implementation built for training and inference, outpacing the fastest existing ones by up to: - 70%…
I used to underestimate the importance of prompt engineering. However, after working on Multiverse, I’ve come to realize that the success of LLMs in solving highly challenging tasks is deeply tied to prompt design. For example, generating each training example for Multiverse…
Code release! 🚀 Following up on our IMO 2025 results with the public LLM Gemini 2.5 Pro — here’s the full pipeline & general (non-problem-specific) prompts. 👉 [github.com/lyang36/IMO25] Have fun exploring! #AI #Math #LLMs #IMO2025
🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025
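A hedged sketch of the kind of pipeline these posts describe: a propose-verify-refine loop around a single model. The real prompts and control flow live in github.com/lyang36/IMO25; `call_model` and the prompt wording below are hypothetical placeholders.

```python
# Illustrative only: a generic propose -> verify -> refine loop of the kind the
# posts describe. The actual pipeline and prompts are in github.com/lyang36/IMO25;
# `call_model` and the prompt text here are hypothetical stand-ins.
def call_model(prompt: str) -> str:
    return "[model response]"  # replace with a real Gemini / LLM API call

def solve_with_verification(problem: str, max_rounds: int = 5) -> str:
    solution = call_model(f"Write a complete, rigorous solution.\n\nProblem:\n{problem}")
    for _ in range(max_rounds):
        critique = call_model(
            "Act as a strict grader. List every gap or error, or reply 'ACCEPT'.\n\n"
            f"Problem:\n{problem}\n\nCandidate solution:\n{solution}"
        )
        if "ACCEPT" in critique:
            break  # the verifier found no remaining issues
        solution = call_model(
            f"Revise the solution to fix these issues:\n{critique}\n\n"
            f"Problem:\n{problem}\n\nPrevious attempt:\n{solution}"
        )
    return solution
```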
Maybe a better talk on Multiverse haha (after 10+ talks)
recording link: hku.zoom.us/rec/share/TyPL… pwd: T4Y1Z.99
🚨 The era of infinite internet data is ending, so we ask: 👉 What’s the right generative modelling objective when data—not compute—is the bottleneck? TL;DR: ▶️Compute-constrained? Train Autoregressive models ▶️Data-constrained? Train Diffusion models Get ready for 🤿 1/n
1/3✅Our LCFM at #ICML2025 workshop wrapped up successfully! 👏Huge thanks to our speakers for sharing cutting-edge insights: @tri_dao @PangWeiKoh @bmwshop @jiajunwu_cs @volokuleshov 👏And to our panelists for the inspiring discussion: @YuandongTian @MohitIyyer @bmwshop @Xinyu2ML
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
🚀 Introducing Prefix-RFT to blend SFT and RFT! SFT can learn more complex problems by mimicking, but can have poor generalization. RFT has better overall performance but is limited by the initial policy. Our method, Prefix-RFT, makes the best of both worlds!
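As a rough illustration of the blending idea (not the paper's exact algorithm): keep a prefix of an expert demonstration fixed, let the current policy roll out the continuation, and reward that continuation. The names `policy_rollout` and `reward_fn` are hypothetical hooks.

```python
# Hedged sketch of blending SFT and RFT via demonstration prefixes: the fixed prefix
# supplies SFT-like guidance, the on-policy continuation is trained with a reward.
# `policy_rollout` and `reward_fn` are hypothetical hooks, not the paper's code.
import random

def prefix_rft_sample(prompt: str, demonstration: str, policy_rollout, reward_fn):
    # Pick a random cut point: the demonstration prefix anchors the rollout,
    # and the rest is generated on-policy.
    tokens = demonstration.split()
    cut = random.randint(0, len(tokens))
    prefix = " ".join(tokens[:cut])

    completion = policy_rollout(prompt + "\n" + prefix)      # on-policy continuation
    reward = reward_fn(prompt, prefix + " " + completion)    # e.g. answer correctness
    return prefix, completion, reward                        # feed into any RL update (PPO, GRPO, ...)

# Example with dummy hooks:
prefix, completion, reward = prefix_rft_sample(
    "Solve 2+2.", "First add the numbers. The answer is 4.",
    policy_rollout=lambda p: "The answer is 4.",
    reward_fn=lambda prompt, solution: float("4" in solution),
)
```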
Join us next week! We are presenting Multiverse at the great HKUNLP seminar!
Xinyu Yang from CMU will be giving a talk titled "Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation" on Friday, July 25 at 11am HKT (Thursday, July 24 at 8pm PDT). Link to talk: hku.zoom.us/j/92651812689?…
Looking forward to seeing everyone for ES-FoMo part three tomorrow! We'll be in East Exhibition Hall A (the big one), and we've got an exciting schedule of invited talks, orals, and posters planned for you tomorrow. Let's meet some of our great speakers! 1/
We’re open-sourcing the pre-training code for Phi4-mini-Flash, our SoTA hybrid model that delivers 10× faster reasoning than Transformers — along with μP++, a suite of simple yet powerful scaling laws for stable large-scale training. 🔗 github.com/microsoft/Arch… (1/4)
Things are moving fast with our team (all good news!) We’re baking something really exciting. Sadly missing @icmlconf this year in person, but I’ll be giving a virtual oral at the R2-FM Workshop (Reliable & Responsible Foundation Models): 📍 Sat, 10:40–11:00 AM PT 📍 West…
Benchmarks say “perfect score.” 😇 A model scoring that high can still lose $100,000 on a single decision. 😈 Our position paper argues: safety must headline evaluation for LLM finance agents. [📖arxiv.org/abs/2502.15865] We outline a 3-layer audit recipe (model, workflow &…
Shouldn't that be located in China, or at least Asia, given that the majority of attendees with visa issues are from China or other Asian countries?
We're excited to announce a second physical location for NeurIPS 2025, in Mexico City. By expanding our physical locations, we hope to address concerns around skyrocketing attendance and difficulties in obtaining travel visas that some attendees have experienced in the past few…
📢 Update: Announcing Dream's next-phase development. - Dream-Coder 7B: A fully open diffusion LLM for code delivering strong performance, trained exclusively on public data. - DreamOn: Targeting the variable-length generation problem in dLLMs!
Autoregressive models are too restrictive by forcing a fixed generation order, while masked diffusion is wasteful as it fits all possible orders. Can our model dynamically decide the next position to generate based on context? Learn more in our ICML paper arxiv.org/abs/2503.05979
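A toy sketch of the "choose the next position dynamically" idea, under the assumption that the model scores every still-empty position and fills the most confident one first; the actual model and training objective are in the linked paper, and `predict` here is a random stand-in.

```python
# Toy illustration (not the paper's model): fill positions in a confidence-driven
# order instead of left-to-right or a random masked-diffusion order.
import numpy as np

rng = np.random.default_rng(0)
seq_len, vocab = 8, 20
canvas = [None] * seq_len  # the partially generated sequence

def predict(canvas):
    # Hypothetical stand-in for a model that, given the current partial sequence,
    # returns a token distribution for every position.
    return rng.dirichlet(np.ones(vocab), size=seq_len)

while any(tok is None for tok in canvas):
    probs = predict(canvas)
    empty = [i for i, tok in enumerate(canvas) if tok is None]
    # Dynamically choose the next position: the one the model is most confident about.
    i = max(empty, key=lambda j: probs[j].max())
    canvas[i] = int(probs[i].argmax())

print(canvas)  # positions were filled in a context-dependent order
```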
Asynchronous decoding: multiple LLM threads write different parts of an answer in parallel. In Feb we (MIT×Google) introduced PASTA—the first asynchronous-decoding method that uses policy learning to optimize latency & quality end-to-end. See us @ E-2600, East Hall A-B, Tue 11pm #ICML.
A new approach from CSAIL & Google marks a shift toward teaching models to orchestrate their own parallel decoding strategy. The team's "Parallel Structure Annotation" (PASTA) enables LLMs to generate text in parallel, accelerating their response times: bit.ly/4eDsVVo
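A toy sketch of asynchronous decoding in the spirit of these posts (not PASTA's implementation): the answer is split into independent sections that separate tasks expand concurrently, so wall-clock latency tracks the slowest section rather than their sum. `expand` is a hypothetical model call.

```python
# Toy sketch of asynchronous decoding (not PASTA's actual implementation): the
# response is an outline of independent sections, and separate tasks expand each
# section concurrently. `expand` is a hypothetical model call.
import asyncio

async def expand(section: str) -> str:
    await asyncio.sleep(0.1)          # stands in for a model call's latency
    return f"<text for '{section}'>"  # hypothetical generated span

async def decode_async(outline: list[str]) -> str:
    # Launch one decoding task per independent section; they run concurrently,
    # so total latency is roughly the slowest section, not the sum of all of them.
    parts = await asyncio.gather(*(expand(sec) for sec in outline))
    return "\n".join(parts)

print(asyncio.run(decode_async(["intro", "method", "results"])))
```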