LMSYS Org
@lmsysorg
Large Model Systems Organization. Join our Slack (https://slack.sglang.ai). We developed SGLang (https://sglang.ai), Chatbot Arena (now @lmarena_ai), and Vicuna!
🚀 Breaking: SGLang provides the first open-source implementation to serve @deepseek_ai V3/R1 models with large-scale expert parallelism and prefill-decode disaggregation on 96 GPUs. It nearly matches the throughput reported by the official DeepSeek blog, achieving 52.3K input…
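For readers who want to poke at a deployment like this, here is a minimal client sketch. It assumes an SGLang server is already running locally and exposing its OpenAI-compatible API on port 30000; the launch flags in the comment are illustrative and may differ across SGLang versions.

```python
# Minimal client sketch. Assumes an SGLang deployment is already up and serving
# the OpenAI-compatible API, e.g. launched with (flags illustrative, version-dependent):
#   python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1 \
#       --tp 8 --trust-remote-code --port 30000
from openai import OpenAI

# SGLang exposes an OpenAI-compatible endpoint; the api_key is unused locally.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user",
               "content": "Explain prefill-decode disaggregation in one sentence."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```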

Kudos to NVIDIA's Pen Li for his incredible support in our collaboration with SGLang: from B200 and GB200 NVL72 systems to a 128-GPU H200 Kubernetes cluster, he has continuously helped the SGLang team gain access, making large-scale LLM serving on cutting-edge hardware possible! 🤗
Proud to support this lightning-fast launch ⚡️ Accelerated through #NVIDIADGX Cloud and in partnership with Moonshot AI, @SGLang, and @Oracle Open Model Engine, we helped bring Kimi K2 to customers just days after its debut. Now, organizations can “Think Smart” and scale MoE…
🚨SGLang Summer Fest Bonus Drop🚨 Proud to share a joint effort from Mooncake by @Kimi_Moonshot, @Oracle, and SGLang: a trillion-parameter-scale Kimi K2 deployment running on 128 H200 GPUs sponsored by @NVIDIAAIDev DGX Cloud. OME + SGLang = MoE inference at production scale. 👇
✅ We’re excited to support @Alibaba_Qwen’s Qwen3-Coder on SGLang! With the tool-call parser and expert parallelism enabled, it runs smoothly across flexible configurations. Just give it a try (a short client sketch follows the quoted announcement below)! 🔗 github.com/zhaochenyang20…
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
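To try the tool-call parser mentioned above, a minimal sketch follows. The parser name in the launch comment and the run_tests tool are assumptions for illustration, not confirmed details from the announcement.

```python
# Tool-calling sketch against an SGLang server hosting Qwen3-Coder.
# Assumes the server was launched with a tool-call parser enabled, e.g.
# (the parser value is an assumption; check your SGLang version):
#   python -m sglang.launch_server --model-path Qwen/Qwen3-Coder-480B-A35B-Instruct \
#       --tool-call-parser qwen25 --tp 8
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical tool, for illustration only
        "description": "Run the project's test suite and return the results.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct",
    messages=[{"role": "user", "content": "Run the tests under tests/unit."}],
    tools=tools,
)
# With the parser enabled, the structured call lands in tool_calls
# instead of raw text.
print(resp.choices[0].message.tool_calls)
```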
This work was done by my PhD student Zijian and researchers at NVIDIA! @LigengZhu @songhan_mit Big congrats! We are also integrating VLM support into AReaL.
🚀Summer Fest Day 4: Turbocharging Vision-Language Models with SGLang + NVILA. 4.4× throughput, 2.2× faster response time! We've integrated NVILA into SGLang, enabling high-performance, scalable serving of vision-language models. This unlocks a 4.4× TPS boost and significantly…
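To show what serving a vision-language model like NVILA looks like from the client side, here is a hedged sketch using the OpenAI-compatible API. The model id and image URL are placeholders, not confirmed names from the announcement.

```python
# Vision-language request sketch. Assumes an SGLang server is serving an NVILA
# checkpoint (model id below is illustrative) behind the OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Efficient-Large-Model/NVILA-8B",  # illustrative model id
    messages=[{
        "role": "user",
        "content": [
            # Mixed content: one image plus a text instruction.
            {"type": "image_url",
             "image_url": {"url": "https://example.com/cat.jpg"}},  # placeholder
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```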
NVILA is available in SGLang👏🏻
🚀Summer Fest Day 3: Cost-Effective MoE Inference on CPU from the Intel PyTorch team. Deploying 671B DeepSeek R1 with zero GPUs? SGLang now supports high-performance CPU-only inference on Intel Xeon 6, enabling billion-parameter-scale MoE models like DeepSeek to run on commodity CPU servers…
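A minimal sketch of what querying a CPU-only deployment could look like. The --device cpu launch flag in the comment is an assumption about the CPU backend, so consult the SGLang docs for the exact Xeon setup; the /generate endpoint shown is SGLang's native API.

```python
# CPU-serving sketch. Assumes an SGLang server launched in CPU mode, e.g.
# (the --device cpu flag is an assumption; see the SGLang CPU backend docs):
#   python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1 \
#       --device cpu --trust-remote-code
import requests

# SGLang's native /generate endpoint takes a prompt plus sampling parameters.
payload = {
    "text": "The capital of France is",
    "sampling_params": {"max_new_tokens": 16, "temperature": 0},
}
# Generous timeout: CPU decoding of a large MoE model is slow.
resp = requests.post("http://localhost:30000/generate", json=payload, timeout=600)
print(resp.json()["text"])
```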
Congrats to Mingfei and the team! Thanks to the excellent collaboration between @lmsysorg and Intel! #intel #sglang #xeon