Zhihao Jia
@JiaZhihao
Assistant professor of Computer Science at Carnegie Mellon University. Research on systems and machine learning.
One of the best ways to reduce LLM latency is by fusing all computation and communication into a single GPU megakernel. But writing megakernels by hand is extremely hard. 🚀Introducing Mirage Persistent Kernel (MPK), a compiler that automatically transforms LLMs into optimized…

NVILA is available in SGLang👏🏻
🚀Summer Fest Day 4: Turbocharging Vision-Language Models with SGLang + NVILA 4.4× throughput, 2.2× faster response time! We've integrated NVILA into SGLang, enabling high-performance, scalable serving of vision-language models. This unlocks a 4.4× TPS boost and significantly…
🦆🚀QuACK🦆🚀: a new speed-of-light (SOL) memory-bound kernel library without a single line of CUDA C++, written entirely in Python thanks to CuTe-DSL. On H100 with 3 TB/s memory bandwidth, it runs 33%-50% faster than highly optimized libraries like PyTorch's torch.compile and Liger. 🤯 With @tedzadouri and @tri_dao
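For context on the "SOL" (speed-of-light) framing: a memory-bound kernel can never finish faster than the time needed to move its bytes through HBM, so peak bandwidth gives a simple lower bound on runtime. A back-of-the-envelope sketch using the 3 TB/s H100 figure from the tweet (the softmax shape below is an illustrative assumption, not from the tweet):

```python
# Speed-of-light (SOL) estimate for a memory-bound kernel:
# runtime is bounded below by bytes moved / peak memory bandwidth.

def sol_time_us(bytes_moved: float, bandwidth_bytes_per_s: float) -> float:
    """Lower bound on kernel time, in microseconds."""
    return bytes_moved / bandwidth_bytes_per_s * 1e6

H100_BW = 3e12  # ~3 TB/s HBM bandwidth, as quoted in the tweet

# Example: a softmax over a 16384 x 16384 fp16 tensor reads and writes
# the tensor once each: 2 passes * 16384^2 elements * 2 bytes/element.
n = 16384
bytes_moved = 2 * n * n * 2
t = sol_time_us(bytes_moved, H100_BW)  # ≈ 358 µs lower bound
```

A kernel hitting "SOL" means its measured runtime is close to this bound; the 33%-50% speedups claimed above are relative to libraries that fall short of it.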
Computer-Use Agents (CUAs) are improving every day but still take up to tens of minutes to complete simple tasks. We built OSWorld-Human, a benchmark that measures efficiency, a first step toward practical CUAs. Check out our blog post!
Computer-use AI agents (CUAs) are powerful, but way too slow. A 2-minute human task can take a CUA over 20 minutes! At Wuklab, we're building faster CUAs. Recently, we created OSWorld-Human, a new benchmark to close the speed gap between humans and machines. Read our full blog…
🚀 [OSDI ’25, Tue 11:10am] How do you “divide and conquer” large-scale resource allocation problems like GPU cluster scheduling or WAN traffic engineering? Our answer: “decouple and decompose” the underlying optimization using DeDe. (1/3)
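DeDe's actual formulation is in the OSDI paper; as a generic illustration of the "decouple and decompose" idea, here is a classic dual-decomposition sketch: a price on the shared resource decouples a coupled allocation problem into independent per-job subproblems, and a coordinator adjusts the price until demand meets supply. The objective, names, and numbers below are hypothetical, not DeDe's algorithm:

```python
# Toy "decouple and decompose" via dual decomposition. We allocate C
# units of GPU capacity across jobs to maximize sum_i w_i * log(x_i)
# subject to sum_i x_i <= C. A price lam on capacity decouples the
# problem: each job's subproblem has the closed form x_i = w_i / lam,
# and a coordinator nudges the price toward market clearing.

def allocate(weights, capacity, iters=5000, step=1e-3):
    lam = 1.0  # initial price on capacity
    for _ in range(iters):
        # Decomposed per-job subproblems (could be solved in parallel).
        x = [w / lam for w in weights]
        # Coordinator: raise the price if over-subscribed, lower it otherwise.
        lam = max(1e-9, lam + step * (sum(x) - capacity))
    return [w / lam for w in weights]

x = allocate([1.0, 2.0, 3.0], capacity=12.0)
# optimum is x_i = w_i * C / sum(w): [2.0, 4.0, 6.0]
```

The appeal for GPU scheduling or WAN traffic engineering is that the per-job (or per-flow) subproblems are tiny and embarrassingly parallel, with only the scalar price shared between iterations.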
✨Release: We upgraded SkyRL into a highly-modular, performant RL framework for training LLMs. We prioritized modularity—easily prototype new algorithms, environments, and training logic with minimal overhead. 🧵👇 Blog: novasky-ai.notion.site/skyrl-v01 Code: github.com/NovaSky-AI/Sky…
We introduce CodeARC, a new benchmark for evaluating LLMs’ inductive reasoning. Agents must synthesize functions from I/O examples—no natural language, just reasoning. 📄 arxiv.org/pdf/2503.23145 💻 github.com/Anjiang-Wei/Co… 🌐 anjiang-wei.github.io/CodeARC-Websit… #LLM #Reasoning #LLM4Code #ARC
Mark your calendars for #MLSys2026 in May 2026 in Seattle. The paper submission deadline is Oct 30 this year.
📢Exciting updates from #MLSys2025! All session recordings are now available and free to watch at mlsys.org. We’re also thrilled to announce that #MLSys2026 will be held in Seattle next May—submissions open next month with a deadline of Oct 30. We look forward to…
#MLSys2026 will be led by the general chair @luisceze and PC chairs @JiaZhihao and @achowdhery. The conference will be held in Bellevue on Seattle's east side. Consider submitting and bringing your latest work in AI and systems—more details at mlsys.org.

Say hello to Multiverse — the Everything Everywhere All At Once of generative modeling. 💥 Lossless, adaptive, and gloriously parallel 🌀 Now open-sourced: multiverse4fm.github.io I was amazed how easily we could extract the intrinsic parallelism of even SOTA autoregressive…
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: multiverse4fm.github.io 🧵 1/n
Check out our work on parallel reasoning 🧠: we built an AI-assisted curator that identifies parallel paths in sequential traces, then tune models into native parallel thinkers that run efficiently with prefix sharing and batching. Really excited about this general direction
@databricks's Agent Bricks is powered by XGrammar for structured generation, achieving high quality and efficiency. It helps you complete AI tasks without needing to worry about the algorithmic details. Give it a try!
Excited to launch Agent Bricks, a new way to build auto-optimized agents on your tasks. Agent Bricks uniquely takes a *declarative* approach to agent development: you tell us what you want, and we auto-generate evals and optimize the agent. databricks.com/blog/introduci…
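For readers unfamiliar with structured generation: engines like XGrammar constrain decoding by masking out vocabulary tokens that would violate a grammar before each sampling step, so every completed output is well-formed. A toy sketch of that masking idea (this is not XGrammar's API; the balanced-parentheses "grammar", token names, and scores are all illustrative):

```python
# Grammar-constrained greedy decoding, in miniature. Before picking each
# token, we compute the set the grammar allows and choose only among
# those, so the final string is guaranteed to be well-formed.

def allowed(depth: int, length: int, max_len: int = 8):
    """Toy grammar: balanced parentheses, then an end-of-sequence token."""
    toks = set()
    if length < max_len - depth:   # enough room left to close what we open
        toks.add("(")
    if depth > 0:                  # may close only inside an open paren
        toks.add(")")
    if depth == 0 and length > 0:  # may stop once balanced and non-empty
        toks.add("<eos>")
    return toks

def constrained_greedy(scores: dict) -> str:
    """Greedy decode where `scores` stands in for the model's logits."""
    out, depth = "", 0
    while True:
        mask = allowed(depth, len(out))
        tok = max(mask, key=lambda t: scores.get(t, 0.0))
        if tok == "<eos>":
            return out
        depth += 1 if tok == "(" else -1
        out += tok

# This "model" prefers opening parens, yet the output is always balanced.
s = constrained_greedy({"(": 2.0, ")": 1.0, "<eos>": 0.0})
```

Real engines do the same thing at vocabulary scale with a pushdown automaton compiled from a JSON schema or context-free grammar; the efficiency challenge XGrammar targets is computing that token mask fast enough to keep up with decoding.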
Super excited for some great launches at our largest Summit yet!
#DataAISummit starts tomorrow! 20,000 attendees, 700+ sessions, keynotes, meetups, and training—all at the world’s largest data, analytics, and AI conference It’s not too late to join us. Register to attend in San Francisco or virtually: databricks.com/dataaisummit?u…