Tianqi Chen
@tqchenml
AssistProf @CarnegieMellon. Chief Technologist @OctoML. Creator of @XGBoostProject, @ApacheTVM. Member http://catalyst.cs.cmu.edu, @TheASF. Views are my own
Excited to share what we have been working on over the past year: MLCEngine, a universal LLM deployment engine that brings the power of server optimizations and local deployment into a single framework. Check out the platform support 👇 and the blog post blog.mlc.ai/2024/06/07/uni… More in a 🧵
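For context, MLC LLM exposes MLCEngine through an OpenAI-style chat-completions interface that is meant to look the same whether the engine runs on a server or locally. The sketch below is illustrative, based on the project's documented Python API; the model string is just an example and parameter names may differ by release.

```python
# Minimal sketch of MLCEngine's OpenAI-style Python API (model ID illustrative).
from mlc_llm import MLCEngine

model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"  # example prebuilt model
engine = MLCEngine(model)

# Stream a chat completion; request/response shapes mirror the OpenAI client.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is MLCEngine?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)

engine.terminate()
```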

Officially graduated from @SCSatCMU 🎓(Allen Newell Award, Honorable Mention) and thrilled to be starting my PhD at @Princeton with Prof. Ravi Netravali 🚀! Huge thanks to my advisor Mark Stehlik, research mentors @JiaZhihao @tqchenml, and amazing CMU Catalyst collaborators!
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
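To make the chunking idea concrete, here is a toy sketch, not the H-Net architecture itself: a learned boundary predictor over raw byte embeddings pools variable-length byte spans into chunk vectors that a higher-level model could then consume. All module names and sizes here are made up for illustration.

```python
# Toy illustration of dynamic chunking (hypothetical, not the H-Net code).
import torch
import torch.nn as nn

class ToyDynamicChunker(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        self.byte_embed = nn.Embedding(256, d_model)  # one embedding per possible byte value
        self.boundary = nn.Linear(d_model, 1)         # scores a chunk boundary at each byte

    def forward(self, byte_ids):                       # byte_ids: (seq_len,) int64
        x = self.byte_embed(byte_ids)                  # (seq_len, d_model)
        is_boundary = torch.sigmoid(self.boundary(x)).squeeze(-1) > 0.5
        chunks, start = [], 0
        for i, flag in enumerate(is_boundary.tolist()):
            if flag or i == len(byte_ids) - 1:         # close a chunk at a boundary or at the end
                chunks.append(x[start:i + 1].mean(dim=0))
                start = i + 1
        return torch.stack(chunks)                     # (num_chunks, d_model)

byte_ids = torch.tensor(list("tokenizer-free models".encode("utf-8")))
print(ToyDynamicChunker()(byte_ids).shape)             # chunk count depends on the (untrained) predictor
```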
#UWAllen @UW & @nvidia researchers earned a #MLSys2025 Best Paper Award for boosting #LLM performance with FlashInfer—and showed “what’s possible when academia, industry & the open-source community innovate together,” says @ye_combinator. #AI #UWdiscovers news.cs.washington.edu/2025/07/01/all…
Excited to share that I’m joining NVIDIA as a Principal Research Scientist! We’ll be joining forces on efforts in model post-training, evaluation, agents, and building better AI infrastructure—with a strong emphasis on collaboration with developers and academia. We’re committed…
Mark your calendars for #MLSys2026 in May 2026 in Seattle. The paper submission deadline is Oct 30 this year.
📢Exciting updates from #MLSys2025! All session recordings are now available and free to watch at mlsys.org. We’re also thrilled to announce that #MLSys2026 will be held in Seattle next May—submissions open next month with a deadline of Oct 30. We look forward to…
#MLSys2026 will be led by the general chair @luisceze and PC chairs @JiaZhihao and @achowdhery. The conference will be held in Bellevue on Seattle's east side. Consider submitting and bringing your latest works in AI and systems—more details at mlsys.org.
I’ve started collaborating with the folks building FlashInfer: a nice project and a pretty amazing set of people! @ye_combinator @tqchenml and everyone.
🔍 Our Deep Dive Blog Covering our Winning MLSys Paper on FlashInfer Is now live ➡️ nvda.ws/3ZA1Hca Accelerate LLM inference with FlashInfer—NVIDIA’s high-performance, JIT-compiled library built for ultra-efficient transformer inference on GPUs. Go under the hood with…
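As a taste of what the library does, here is a hedged sketch of calling FlashInfer's single-request decode-attention kernel from Python. It assumes a CUDA GPU and a recent flashinfer build; the function name and tensor layouts follow my reading of the docs and may differ across versions.

```python
# Hedged sketch: decode-time attention for one new query token with FlashInfer.
import torch
import flashinfer

num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 8, 128, 4096

q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

# Attention of a single query against the cached KV; grouped-query attention
# (32 query heads sharing 8 KV heads) is handled by the kernel.
out = flashinfer.single_decode_with_kv_cache(q, k, v)
print(out.shape)  # (num_qo_heads, head_dim)
```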
Excited to announce 🎵Magenta RealTime, the first open weights music generation model capable of real-time audio generation with real-time control. 👋 **Try Magenta RT on Colab TPUs**: colab.research.google.com/github/magenta… 👀 Blog post: g.co/magenta/rt 🧵 below
One of the best ways to reduce LLM latency is by fusing all computation and communication into a single GPU megakernel. But writing megakernels by hand is extremely hard. 🚀Introducing Mirage Persistent Kernel (MPK), a compiler that automatically transforms LLMs into optimized…
🚀 Super excited to share Multiverse! 🏃 It’s been a long journey exploring the space between model design and hardware efficiency. What excites me most is realizing that, beyond optimizing existing models, we can discover better model architectures by embracing system-level…
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: multiverse4fm.github.io 🧵 1/n
Say hello to Multiverse — the Everything Everywhere All At Once of generative modeling. 💥 Lossless, adaptive, and gloriously parallel 🌀 Now open-sourced: multiverse4fm.github.io I was amazed how easily we could extract the intrinsic parallelism of even SOTA autoregressive…
Check out our work on parallel reasoning 🧠: we bring an AI-assisted curator that identifies parallel paths in sequential traces, then tune models into native parallel thinkers that run efficiently with prefix sharing and batching. Really excited about this general direction
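The efficiency claim comes down to prefix sharing plus batching. Below is a self-contained toy, not the project's code, with real weights and causal masking omitted: the shared prefix's keys/values are computed once and broadcast, while the parallel branches attend to them as a single batch.

```python
# Conceptual toy of prefix sharing + batched parallel branches (hypothetical sizes).
import torch

d = 64
torch.manual_seed(0)
Wk, Wv = torch.randn(d, d), torch.randn(d, d)      # stand-in K/V projections for one layer

prefix = torch.randn(100, d)                        # shared reasoning prefix (100 token states)
prefix_k, prefix_v = prefix @ Wk, prefix @ Wv       # computed once, not once per branch

branches = torch.randn(4, 20, d)                    # 4 parallel continuations, 20 tokens each
branch_k, branch_v = branches @ Wk, branches @ Wv   # per-branch KV, computed as one batch

# Each branch attends over [shared prefix KV ++ its own KV]; the prefix tensors are
# broadcast across the batch rather than recomputed or copied per branch.
k = torch.cat([prefix_k.expand(4, -1, -1), branch_k], dim=1)     # (4, 120, d)
v = torch.cat([prefix_v.expand(4, -1, -1), branch_v], dim=1)
q = branches                                                      # branch tokens as queries
attn = torch.softmax(q @ k.transpose(1, 2) / d ** 0.5, dim=-1)    # (4, 20, 120)
out = attn @ v                                                     # (4, 20, d)
print(out.shape)
```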
NVIDIA🤗SGLang🚀
.@lmsysorg (SGLang) now achieves 7,583 tokens per second per GPU running @deepseek_ai R1 on the GB200 NVL72, a 2.7x leap over H100. We're excited to see the open source ecosystem advance inference optimizations on GB200 NVL72, driving down cost per token for the industry at…
The SGLang team just ran DeepSeek 671B on NVIDIA’s GB200 NVL72, unlocking 7,583 toks/sec/GPU for decoding w/ PD disaggregation + large-scale expert parallelism — 2.7× faster than H100. Don’t miss this work! 🔥 Thanks to Pen Li from NVIDIA who kicked off this collaboration and…
SGLang is an early user of FlashInfer and witnessed its rise as the de facto LLM inference kernel library. It won best paper at MLSys 2025, and Zihao now leads its development @NVIDIAAIDev. SGLang’s GB200 NVL72 optimizations were made possible with strong support from the…
Check out the technical deep dive on FlashInfer