Wentao Guo
@WentaoGuo7
CS PhD student @PrincetonCS, Previously CS MEng + BS @CornellCIS
🦆🚀QuACK🦆🚀: a new speed-of-light (SOL) memory-bound kernel library without a single line of CUDA C++, written entirely in Python thanks to CuTe-DSL. On H100 with 3 TB/s of memory bandwidth, it runs 33%-50% faster than highly optimized libraries like PyTorch's torch.compile and Liger. 🤯 With @tedzadouri and @tri_dao
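To make the "SOL" framing concrete: a memory-bound kernel is capped by how fast HBM can move bytes, so you judge it by achieved bandwidth versus peak. A minimal timing sketch in plain PyTorch (not QuACK's API; the softmax choice and tensor sizes are placeholders):

```python
import torch

def achieved_bandwidth_gbs(fn, x, iters=100):
    """Estimate achieved HBM bandwidth for a memory-bound op that
    reads x once and writes an output of the same size."""
    out = fn(x)  # warmup
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        out = fn(x)
    end.record()
    torch.cuda.synchronize()
    ms = start.elapsed_time(end) / iters
    bytes_moved = (x.numel() + out.numel()) * x.element_size()
    return bytes_moved / (ms * 1e-3) / 1e9

x = torch.randn(8192, 8192, device="cuda", dtype=torch.bfloat16)
bw = achieved_bandwidth_gbs(lambda t: torch.softmax(t, dim=-1), x)
# ~3350 GB/s is the H100 SXM HBM3 peak (the tweet rounds to 3 TB/s)
print(f"{bw:.0f} GB/s achieved, {bw / 3350:.0%} of speed-of-light")
```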

Introducing the first open-source implementation of native sparse attention: github.com/fla-org/native…. Give it a spin and cook your NSA model! 🐳🐳🐳
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data.
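A toy sketch of the dynamic-chunking idea (my simplification, not the H-Net code): predict a boundary wherever adjacent byte-level hidden states look dissimilar, then pool each span into one chunk vector.

```python
import torch
import torch.nn.functional as F

class ToyDynamicChunker(torch.nn.Module):
    """Toy illustration of dynamic chunking, not the H-Net implementation:
    a boundary is predicted where adjacent hidden states look dissimilar,
    and bytes between boundaries are mean-pooled into one chunk vector."""

    def __init__(self, dim):
        super().__init__()
        self.q = torch.nn.Linear(dim, dim, bias=False)
        self.k = torch.nn.Linear(dim, dim, bias=False)

    def forward(self, h, threshold=0.5):
        # h: (seq, dim) byte-level hidden states (single sequence for clarity)
        sim = F.cosine_similarity(self.q(h[1:]), self.k(h[:-1]), dim=-1)
        p_boundary = (1 - sim) / 2                 # dissimilar neighbors -> boundary
        boundary = torch.ones(h.size(0), dtype=torch.bool)
        boundary[1:] = p_boundary > threshold      # position 0 always opens a chunk
        chunk_id = torch.cumsum(boundary.long(), dim=0) - 1
        n_chunks = int(chunk_id[-1]) + 1
        # mean-pool each byte into its chunk
        sums = torch.zeros(n_chunks, h.size(1)).index_add_(0, chunk_id, h)
        counts = torch.zeros(n_chunks).index_add_(0, chunk_id, torch.ones(h.size(0)))
        return sums / counts.unsqueeze(-1), chunk_id

h = torch.randn(32, 64)
chunks, ids = ToyDynamicChunker(64)(h)
```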
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-autoregressive (non-AR) model to achieve AIME24 and AIME25 scores of 54% and 46%. 🌐 Website: multiverse4fm.github.io 🧵 1/n
🔎Can robots search for objects like humans? Humans explore unseen environments intelligently—using prior knowledge to actively seek information and guide search. But can robots do the same? 👀 🚀Introducing WoMAP (World Models for Active Perception): a novel framework for…
🥳 Happy to share our new work – Kinetics: Rethinking Test-Time Scaling Laws. 🤔 How do we effectively build a powerful reasoning agent? Existing compute-optimal scaling laws suggest 64K thinking tokens + a 1.7B model > a 32B model. But that only shows half of the picture! 🚨 The O(N²)…
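The intuition behind "half of the picture": parameter FLOPs grow linearly in generated tokens, but attention over the growing KV cache grows quadratically, so a small model with a very long chain of thought can end up paying more than a big model with a short one. Back-of-envelope arithmetic (the model shapes below are my guesses, not the paper's numbers):

```python
def decode_flops(params, n_tokens, d_model, n_layers):
    """Rough decode cost: linear weight term + quadratic attention term."""
    linear = 2 * params * n_tokens  # ~2 FLOPs per weight per generated token
    # each new token does QK^T and AV over the cache: ~4*d per past token per layer
    quadratic = 4 * d_model * n_layers * n_tokens * (n_tokens - 1) / 2
    return linear + quadratic

# assumed shapes: 1.7B model (d=2048, 28 layers), 32B model (d=5120, 64 layers)
small = decode_flops(params=1.7e9, n_tokens=64_000, d_model=2048, n_layers=28)
big = decode_flops(params=32e9, n_tokens=4_000, d_model=5120, n_layers=64)
print(f"1.7B + 64K thinking tokens: {small:.2e} FLOPs")
print(f"32B  +  4K thinking tokens: {big:.2e} FLOPs")
# at long lengths the quadratic term dominates the small model's budget
```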
Hardware-Efficient Attention for Fast Decoding
Princeton optimizes decoding by maximizing arithmetic intensity (FLOPs/byte) for better memory–compute efficiency:
- GTA (Grouped-Tied Attention): ties key/value states + partial RoPE → 2× arithmetic intensity vs. GQA, ½ KV cache,…
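Why tying K and V raises arithmetic intensity: decode-time attention FLOPs are set by the query heads, while bytes moved are dominated by streaming the KV cache from HBM, so storing one tied state instead of separate K and V halves the bytes and doubles FLOPs/byte. A rough estimate (the head counts and sizes are placeholder assumptions, not the paper's configs):

```python
def attn_decode_intensity(n_q_heads, n_kv_heads, head_dim, seq_len,
                          bytes_per_elt=2, tied_kv=False):
    """Rough FLOPs/byte for one attention decode step."""
    # FLOPs: each query head does QK^T and AV over seq_len cached positions
    flops = 4 * n_q_heads * head_dim * seq_len
    # bytes: the KV cache is streamed from HBM; tying K and V halves it
    kv_states = 1 if tied_kv else 2
    bytes_moved = kv_states * n_kv_heads * head_dim * seq_len * bytes_per_elt
    return flops / bytes_moved

gqa = attn_decode_intensity(32, 8, 128, 8192)
gta = attn_decode_intensity(32, 8, 128, 8192, tied_kv=True)
print(f"GQA: {gqa:.0f} FLOPs/byte, tied-KV (GTA-like): {gta:.0f} FLOPs/byte")
# tied KV: half the cache bytes -> 2x the arithmetic intensity
```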
Linear Attention and Beyond: Interactive Tutorial with Songlin Yang (@SonglinYang4, MIT / Flash Linear Attention). I didn't follow some of the recent results, so I got on a Zoom with Songlin and she explained it all to me for two hours 😂 youtu.be/d0HJvGSWw8A
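For reference, the core object the tutorial covers, in a few lines: linear attention replaces softmax with an RNN-style rank-1 state update. A minimal ungated sketch (no feature map, no normalization, single head):

```python
import torch

def linear_attention_recurrent(q, k, v):
    """Linear attention as an RNN: S_t = S_{t-1} + k_t v_t^T, o_t = q_t S_t.
    q, k, v: (seq, dim); minimal illustrative form only."""
    S = torch.zeros(q.size(1), v.size(1))
    out = []
    for t in range(q.size(0)):
        S = S + torch.outer(k[t], v[t])   # rank-1 state update
        out.append(q[t] @ S)              # read the state out with the query
    return torch.stack(out)

q, k, v = (torch.randn(16, 8) for _ in range(3))
o = linear_attention_recurrent(q, k, v)
```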
⏰📢 After years of working on long-context efficiency, I've started to doubt whether it's truly necessary (many of you have probably noticed the declining interest in long-context LLMs). Despite strong models like Gemini, short-context + retrieval often does the trick—faster, cheaper, and…
🚀 RAG vs. Long-Context LLMs: The Real Battle ⚔️ 🤯Turns out, simple-to-build RAG can match million-dollar long-context LLMs (LC LLMs) on most existing benchmarks. 🤡So, do we even need long-context models? YES. Because today’s benchmarks are flawed: ⛳ Too Simple –…
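For context on "simple-to-build": a bare-bones RAG retriever is roughly embed the chunks, take the cosine top-k, and stuff them into the prompt. A sketch (the embedding model and chunking scheme are placeholder choices, not from the paper):

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed embedding backend

def build_rag_context(query, documents, top_k=5, chunk_size=512):
    """Retrieve the top-k most similar chunks to prepend to an LLM prompt."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice
    chunks = [d[i:i + chunk_size] for d in documents
              for i in range(0, len(d), chunk_size)]  # naive fixed-size chunking
    emb = model.encode(chunks, normalize_embeddings=True)
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = emb @ q                         # cosine similarity (unit vectors)
    best = np.argsort(scores)[::-1][:top_k]
    return "\n\n".join(chunks[i] for i in best)
```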
I'm at NeurIPS this week and will be at some of these posters & workshops. Please reach out if you want to talk about ML & systems in general, especially efficient training & inference and new architectures.
Come and join us ~ happening now :)
Our work Sequoia (spotlight) is being presented at East Exhibit Hall A-C #4910 by co-authors. Hope to see you there! @NeurIPS2024 @avnermay @BeidiChen
I've been using Komo for a while; it's a much better experience than Google for complex queries. AI doesn't replace search, but it unlocks new ways to search, like drawing insights from each source and reducing hallucinations.
Struggle with AI making things up, even when citing sources? We hear you. Today, we're excited to release Komo 2.0 to help you verify and collaborate with AI more effectively.