Wenhao Chai
@wenhaocha1
Ph.D. Student @PrincetonCS. Prev @UW @Stanford @pika_labs @MSFTResearch @UofIllinois @ZJU_China. I work on computer vision, but it's not all I do.
This appears to be a well-defined and good problem. Take a look!
We're also introducing a new interpretability track (more details soon) and two guest tracks:
1. KiVA image understanding: like ARC-AGI but grounded in cog sci w/ difficulty levels
2. Physics-IQ video generation: can your img2video model generate physically plausible scenes?
Go LONG VIDEO! Our MovieChat in early 2023 built just a very naive prototype for memory-augmented long-video context understanding. Super excited to see it come true at scale and in real applications. github.com/rese1f/MovieCh…
I’m Shawn, founder of Memories.ai, former researcher at Meta and CS PhD at University of Cambridge. Today we’re launching: we built the world’s first Large Visual Memory Model - to give AI human-like visual memories. Why visual memory? AI to…
Dataset Distillation as Data Compression: A Rate-Utility Perspective arxiv.org/abs/2507.17221 Read this paper tonight and it gave me some sense: Dataset Distillation ≈ Visual Tokenization? Dataset Distillation: replace the full dataset with a few synthetic samples. Visual Tokenizer: Replace…
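The rate-utility framing, as I read it (my notation, not necessarily the paper's exact objective): pick synthetic samples that keep downstream utility high while paying a bit cost for storing them.

```latex
% Hedged sketch: S = synthetic set, U = downstream utility (e.g., accuracy of a model
% trained on S), R = bits needed to encode S, lambda trades the two off.
\min_{\mathcal{S}} \; -\,U\!\big(\theta^{*}(\mathcal{S})\big) \;+\; \lambda\, R(\mathcal{S}),
\qquad \theta^{*}(\mathcal{S}) = \arg\min_{\theta} \mathcal{L}(\theta;\, \mathcal{S})
```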
This amazing team from Kuaishou did a great job on LiveCodeBench Pro: a 40B model that almost matches o3-mini performance. Take a look at their tech report! Leaderboard: livecodebenchpro.com
🚀 Excited to introduce KAT-V1 (Kwaipilot-AutoThink) – a breakthrough 40B large language model from the Kwaipilot team! KAT-V1 dynamically switches between reasoning and non-reasoning modes to address the “overthinking” problem in complex reasoning tasks. Key Highlights: 📌 40B…
arxiv.org/abs/2507.13338 This is a really great paper that connects multiple concepts like spectral norm, residual connections, layer norm, and other techniques under the Lipschitz condition. Truly well-written and easy to follow.
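The basic bookkeeping behind this kind of analysis (standard bounds, sketched here in PyTorch as my own illustration, not the paper's certification procedure): the Lipschitz constant of a linear layer is its spectral norm, bounds multiply under composition, and a residual connection adds 1.

```python
import torch

def spectral_norm(weight: torch.Tensor) -> float:
    """Lipschitz constant of x -> weight @ x, i.e. the largest singular value."""
    return torch.linalg.matrix_norm(weight, ord=2).item()

# Standard composition rules (illustrative only):
#   Lip(f . g)    <= Lip(f) * Lip(g)
#   Lip(x + g(x)) <= 1 + Lip(g)     (residual connection)
#   Lip(ReLU)      = 1
W1 = torch.randn(256, 256) / 16.0
W2 = torch.randn(256, 256) / 16.0

lip_mlp = spectral_norm(W2) * 1.0 * spectral_norm(W1)  # linear -> ReLU -> linear
lip_res = 1.0 + lip_mlp                                 # same block wrapped in a skip
print(f"MLP bound: {lip_mlp:.3f}, residual bound: {lip_res:.3f}")
```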
From formal language to natural language, and yet it still made remarkable progress! That's far better than what I could do now (or in the past); I've been out of math competitions for ages. So the question is: is natural language actually better than Lean, or are they just trying to build…
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
Workshop Highlights from @CVPR
LOVE @CVPR'25 Challenge wrapped up with incredible participation 🎉
🔹 Academic talks from leading researchers
🔹 Winners crowned in both Track 1A & 1B
🔹 Prizes awarded by @LambdaAPI
Reports from winners are now live! Check it out 🔗 Track…
We present DreamOn: a simple yet effective method for variable-length generation in diffusion language models. Our approach boosts code infilling performance significantly and even catches up with oracle results.
🚀 Thrilled to announce Dream-Coder 7B — the most powerful open diffusion code LLM to date.
We should also turn our attention to the Dream series — from an amazing research group that's steadily building the foundation for dLLMs.
What happened after Dream 7B? First, Dream-Coder 7B: a fully open diffusion LLM for code delivering strong performance, trained exclusively on public data. Plus, DreamOn cracks the variable-length generation problem! It enables code infilling that goes beyond a fixed canvas.
Great work; I always like controlled (even just toy) experiments. I'm afraid we can't get OOD generalization in current overparameterized ML systems. I've recently grown fond of pattern/concept narratives. In fact, I don't believe that data-driven neural networks are capable of…
Our paper aims to answer two questions:
1. What's the difference between prediction and world models?
2. Are there straightforward metrics that can test this distinction?
Our paper is about AI. But it's helpful to go back 400 years to answer these questions.
Single-pass Adaptive Image Tokenization for Minimum Program Search arxiv.org/abs/2507.07995 I find this paper super impressive, and their team has been working on meaningful vision tokenizers for a long time! What I take from this paper: not all images have the same complexity, we…
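The variable-budget idea, as I understand it (an illustrative loop only; the paper does this in a single pass, and `encoder`/`decoder` below are hypothetical callables): stop spending tokens once the image is already reconstructed well enough, so simple images get short codes.

```python
import torch

def adaptive_tokenize(image, encoder, decoder, max_tokens=256, tol=1e-2):
    """Illustrative sketch, not the paper's algorithm: grow the token budget until
    reconstruction error drops below `tol`. Complex images end up with more tokens."""
    tokens = None
    for k in range(1, max_tokens + 1):
        tokens = encoder(image, num_tokens=k)   # hypothetical: encode with k tokens
        recon = decoder(tokens)                 # hypothetical: decode back to pixels
        err = torch.mean((recon - image) ** 2).item()
        if err < tol:
            break                               # enough tokens for this image
    return tokens
```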

As an amateur photographer, I can’t wait to see more agents help with retouching and postprocessing!
🤨Ever dream of a tool that can magically restore and upscale any (low-res) photo to crystal-clear 4K? 🔥Introducing "4KAgent: Agentic Any Image to 4K Super-Resolution", the most capable upscaling generalist designed to handle broad image types. 🔗4kagent.github.io 1/🧵
📣 Excited to announce SpaVLE: #NeurIPS2025 Workshop on Space in Vision, Language, and Embodied AI! 👉 …vision-language-embodied-ai.github.io 🦾Co-organized with an incredible team → @fredahshi · @maojiayuan · @DJiafei · @ManlingLi_ · David Hsu · @Kordjamshidi 🌌 Why Space & SpaVLE? We…
Very good ckpt hub for hybrid design research. Thanks!!
Hybrid architectures mix linear & full attention in LLMs. But which linear attention is best? This choice has been mostly guesswork. In our new work, we stop guessing. We trained and open-sourced 72 MODELS (340M & 1.3B) to dissect what truly makes a hybrid model tick🧶
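For context, the two attention families these hybrids mix, roughly (standard textbook formulations, not specific to this work):

```latex
% Full (softmax) attention: O(n^2) pairwise interactions
y_i \;=\; \sum_{j} \mathrm{softmax}_j\!\left(\tfrac{\langle q_i, k_j\rangle}{\sqrt{d}}\right) v_j
% Linear attention: a kernel feature map \phi lets the sums over j be precomputed, O(n)
y_i \;=\; \frac{\phi(q_i)^{\top} \sum_{j} \phi(k_j)\, v_j^{\top}}{\phi(q_i)^{\top} \sum_{j} \phi(k_j)}
```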
Great work on hybrid model design for reasoning tasks! Compared to natural language, the reasoning process is far denser and more informative, which makes it more challenging and meaningful.
Reasoning can be made much, much faster—with fundamental changes in neural architecture. 😮 Introducing Phi4-mini-Flash-Reasoning: a 3.8B model that surpasses Phi4-mini-Reasoning on major reasoning tasks (AIME24/25, MATH500, GPQA-D), while delivering up to 10× higher throughput…
I’ve been thinking about whether it’s possible to charge a submission fee for each paper, which would be refunded once the author successfully completes their reviewing duties. Any surplus could be used to reward outstanding reviewers. This would not only incentivize a better…
Some say reviewing should be voluntary, so authors shouldn't be obligated to review. But authors also receive reviews as a free service—so we should give back, especially given the growing number of submissions. I support requiring authors to review or opt for a buyout (e.g.,…
Just created a Gallery to display all generation results on RISEBench (by powerful models including GPT-4o Image, Gemini-2.0, Bagel, etc.). Please contact me if you want the results of your new model to be included! Tech Report: arxiv.org/abs/2504.02826
OpenCompass just released RISEBench, the first benchmark on Reasoning-Informed Visual Editing (RISE). GPT-4o Image Generation only scores 36% on this challenging task! Technical Report: huggingface.co/papers/2504.02… #GPT4o
I believe if we view the transformer (xsfm) as associative memory, then next-token prediction is a single-step energy model; if it's UT or depth-recurrent, it's a multi-step energy model. Another interesting thing is that when we do multi-step like UT, it's very hard to optimize via…
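To spell out what I mean (my own framing, hedged, not a claim about the paper's exact formulation):

```latex
% Single-step: one forward pass stands in for the energy minimizer
y \;\approx\; f_\theta(x) \;\approx\; \arg\min_{y'} E_\theta(x, y')
% Multi-step (UT / depth-recurrent / EBT-style): refine a candidate over T steps
y_{t+1} \;=\; y_t \;-\; \eta\, \nabla_{y} E_\theta(x, y_t), \qquad t = 0, \dots, T-1
```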
How can we unlock generalized reasoning? ⚡️Introducing Energy-Based Transformers (EBTs), an approach that out-scales (feed-forward) transformers and unlocks generalized reasoning/thinking on any modality/problem without rewards. TLDR: - EBTs are the first model to outscale the…
Find this paper really insightful. arxiv.org/abs/2507.02754 By definition, a 0-simplex is a linear model and a 1-simplex is the xsfm. An N-simplex means that, in the operator, each token relates to a group of N other tokens. I found AlphaFold uses a 1- and 2-simplex hybrid for its data. So I really…
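Roughly what the simplex hierarchy looks like in attention-logit terms (my reading, details hedged; the paper's exact parameterization may differ):

```latex
% 1-simplex: ordinary pairwise attention logits between a query token i and a key token j
A_{ij} \;=\; \frac{\langle q_i, k_j \rangle}{\sqrt{d}}
% 2-simplex: trilinear logits between a query token i and a pair of key tokens (j, k)
A_{ijk} \;=\; \frac{\sum_{c} q_{i,c}\, k_{j,c}\, k'_{k,c}}{\sqrt{d}}
```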