Infini-AI-Lab
@InfiniAILab
Huge thanks to @tinytitans_icml for an amazing workshop — see you next year! Honored to receive a Best Paper Award 🏆 Let’s unlock the potential of sparsity! Next up: scaling to hundreds/thousands of rollouts? Or making powerful R1/K2-level LLMs (not just 8B 4-bit models) run…

Introducing Weaver, a test-time scaling method for verification! Weaver shrinks the generation-verification gap through a low-overhead weak-to-strong optimization of a mixture of verifiers (e.g., LM judges and reward models). The Weavered mixture can be distilled into a tiny…
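As a rough illustration of the mixture-of-verifiers idea (not Weaver's actual weak-to-strong optimization), here is a minimal sketch: combine per-verifier scores with learned weights and select the highest-scoring candidate. The function name, weights, and scores below are all hypothetical.

```python
import numpy as np

def weighted_verifier_score(scores, weights):
    """scores: (n_candidates, n_verifiers) matrix of per-verifier scores.
    weights: (n_verifiers,) nonnegative weights for each weak verifier."""
    return scores @ weights

# Four sampled answers scored by three weak verifiers (made-up numbers).
scores = np.array([
    [0.9, 0.4, 0.7],
    [0.2, 0.8, 0.5],
    [0.6, 0.6, 0.9],
    [0.1, 0.3, 0.2],
])
# In a weak-to-strong setup the weights would be fit on a small labeled set;
# here they are simply assumed.
weights = np.array([0.5, 0.2, 0.3])
best = int(np.argmax(weighted_verifier_score(scores, weights)))
print(f"selected candidate: {best}")
```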
#MLSys2026 will be led by the general chair @luisceze and PC chairs @JiaZhihao and @achowdhery. The conference will be held in Bellevue on Seattle's east side. Consider submitting and bringing your latest works in AI and systems—more details at mlsys.org.
📢 Exciting updates from #MLSys2025! All session recordings are now available and free to watch at mlsys.org. We’re also thrilled to announce that #MLSys2026 will be held in Seattle next May—submissions open next month with a deadline of Oct 30. We look forward to…
This is cool!!!
We built sparse-frontier — a clean abstraction that lets you focus on your custom sparse attention implementation while automatically inheriting vLLM’s optimizations and model support. As a PhD student, I've learned that sometimes the bottleneck in research isn't ideas — it's…
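To give a flavor of the kind of custom sparsity pattern such an abstraction lets you focus on, here is a minimal sketch of a causal sliding-window attention mask; `local_window_mask` is a hypothetical helper for illustration, not sparse-frontier's actual API.

```python
import torch

def local_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask where query i attends only to
    keys i-window+1 .. i (a causal sliding window)."""
    idx = torch.arange(seq_len)
    dist = idx[:, None] - idx[None, :]  # query index minus key index
    return (dist >= 0) & (dist < window)

mask = local_window_mask(seq_len=8, window=3)
print(mask.int())
```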
Great to see a lot of interest! It takes some time to construct the superpositional encoding correctly, and to make it compatible with popular positional embeddings. So it is not super obvious😁. More interestingly, our experiments show that such superpositional encodings…
It is intuitively obvious that reasoning in continuous embedding space is dramatically more powerful than reasoning in discrete token space. This paper from @tydsh and team shows that this is the case theoretically.
🐳 DeepSeek-R1 just got more accessible

Introducing our new cost-optimized endpoint for DeepSeek-R1 0528:
✨ High-quality reasoning
✨ $0.55/$2.19 per million tokens
✨ No quality compromises

Perfect for developers needing powerful reasoning at accessible pricing 💰
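For a quick sense of what those rates mean in practice, here is a back-of-the-envelope cost calculation; the helper `request_cost` and the token counts are illustrative assumptions, not part of the endpoint's API.

```python
IN_RATE, OUT_RATE = 0.55, 2.19  # USD per million tokens, from the quoted rates

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call at the quoted input/output rates."""
    return input_tokens / 1e6 * IN_RATE + output_tokens / 1e6 * OUT_RATE

# A hypothetical long reasoning call: 2k prompt tokens, 8k generated tokens.
print(f"${request_cost(2_000, 8_000):.4f}")  # ≈ $0.0186
```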
wow 🤩 check this out!!!
One of the best ways to reduce LLM latency is by fusing all computation and communication into a single GPU megakernel. But writing megakernels by hand is extremely hard.

🚀 Introducing Mirage Persistent Kernel (MPK), a compiler that automatically transforms LLMs into optimized…
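As a toy illustration of why fusion pays off (not MPK itself, which compiles real GPU megakernels fusing compute and communication), the sketch below contrasts a chain of tiny elementwise ops, each paying fixed per-op dispatch overhead, with the same repeated affine map collapsed into a single pass.

```python
import time
import torch

x = torch.randn(1 << 20)

def unfused(x, n=200):
    # 2*n separate elementwise ops: each one pays fixed dispatch/launch cost
    for _ in range(n):
        x = x * 1.0001
        x = x + 0.0001
    return x

def fused(x, n=200):
    # the same repeated map x -> 1.0001*x + 0.0001, collapsed algebraically
    # into a single pass over the data
    a = 1.0001 ** n
    b = 0.0001 * (a - 1.0) / (1.0001 - 1.0)  # geometric-series closed form
    return x * a + b

t0 = time.perf_counter(); y1 = unfused(x); t1 = time.perf_counter()
y2 = fused(x); t2 = time.perf_counter()
print(f"unfused: {t1 - t0:.4f}s  fused: {t2 - t1:.4f}s  "
      f"max diff: {(y1 - y2).abs().max().item():.2e}")
```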
Recordings: youtube.com/watch?v=TPz3OF…
Slides: asap-seminar.github.io/assets/slides/…
Tomorrow at 2 PM Eastern Time, the ASAP seminar will feature @Xinyu2ML presenting an exciting work on parallel reasoning. (Xinyu is also a co-organizer of the seminar series—and said he'll be hosting himself, lol.)
@Xinyu2ML will be presenting this amazing work at ASAP seminar tomorrow! Do not miss his talk
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation.
🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46%.
🌐 Website: multiverse4fm.github.io
🧵 1/n