Linli Yao
@Elsa_er_
Ph.D. Candidate in Computer Science @PKU1898 | B.Sc. & M.Sc. at RUC | Researching Vision-Language and Large Multimodal Models.
🚀 Efficient Streaming Video Understanding Introducing TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos Project Page: timechat-online.github.io Paper: arxiv.org/pdf/2504.17343 Code: github.com/yaolinli/TimeC…
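As a rough illustration of the "80% of visual tokens are redundant" idea (this is not the paper's exact token-drop pipeline; the cosine-similarity criterion and the threshold below are assumptions), one can drop tokens in each streaming frame that barely change from the previous frame:

```python
# Illustrative sketch only: drop visual tokens that are nearly identical to the
# co-located token in the previous frame, keeping the "dynamic" ones.
import torch
import torch.nn.functional as F

def drop_static_tokens(frames: torch.Tensor, threshold: float = 0.9):
    """frames: [T, N, D] patch-token features for T frames with N tokens each."""
    kept = [frames[0]]                       # keep the first frame in full
    keep_ratios = [1.0]
    for t in range(1, frames.shape[0]):
        sim = F.cosine_similarity(frames[t], frames[t - 1], dim=-1)  # [N]
        mask = sim < threshold               # keep only tokens that changed
        kept.append(frames[t][mask])
        keep_ratios.append(mask.float().mean().item())
    return kept, keep_ratios                 # ragged token lists + per-frame keep ratio

if __name__ == "__main__":
    dummy = torch.randn(8, 196, 768)         # 8 frames of ViT-style patch tokens
    _, ratios = drop_static_tokens(dummy)
    print([f"{r:.2f}" for r in ratios])
```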

Video understanding isn't just recognition; it demands reasoning across thousands of frames. Meet Long-RL🚀 Highlights: 🧠 Dataset: LongVideo-Reason — 52K QAs with reasoning. ⚡ System: MR-SP - 2.1× faster RL for long videos. 📈 Scalability: Hour-long videos (3,600 frames) RL…
🎉 Happy to share that our TimeChat-Online work has been accepted to ACM Multimedia 2025! 🔗 Check out the project page: timechat-online.github.io ⭐️ Star our repo if you like it: github.com/yaolinli/TimeC… 🤖 #VideoLLM 🎬 #StreamingAI 📊 #ACMMM2025

Excited to share our new survey on the reasoning paradigm shift from "Think with Text" to "Think with Image"! 🧠🖼️ Our work offers a roadmap for more powerful & aligned AI. 🚀 📜 Paper: arxiv.org/pdf/2506.23918 ⭐ GitHub (400+🌟): github.com/zhaochen0110/A…
🎥 Check out our new demo video that shows how TimeChat-Online makes real-time video understanding efficient, fun, and intuitive! 🌐 Demo: github.com/yaolinli/TimeC… 🔗 Project: timechat-online.github.io 👇 Try it out and let us know what you think! #StreamingVideo #MultimodalAI
⏰ We introduce Reinforcement Pre-Training (RPT🍒) — reframing next-token prediction as a reasoning task using RLVR ✅ General-purpose reasoning 📑 Scalable RL on web corpus 📈 Stronger pre-training + RLVR results 🚀 Allows allocating more compute to specific tokens
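To make "next-token prediction as a reasoning task" concrete, here is a minimal sketch of what a verifiable reward for this setup could look like (an assumption for illustration, not the exact reward used in the paper): the policy writes a reasoning trace plus a predicted continuation, and the reward simply checks the prediction against the corpus.

```python
# Minimal sketch of a verifiable next-token reward in the spirit of RPT.
def next_token_reward(predicted: str, ground_truth: str) -> float:
    """Return 1.0 if the predicted text is a non-empty prefix of the true continuation."""
    pred = predicted.strip()
    gt = ground_truth.lstrip()
    return 1.0 if pred and gt.startswith(pred) else 0.0

# Example: the true corpus continuation is " the cat sat on the mat"
print(next_token_reward("the cat", " the cat sat on the mat"))  # 1.0
print(next_token_reward("a dog", " the cat sat on the mat"))    # 0.0
```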
MiMo-VL technical report, models, and evaluation suite are out! 🤗 Models: huggingface.co/XiaomiMiMo/MiM… (or RL) Report: arxiv.org/abs/2506.03569 Evaluation Suite: github.com/XiaomiMiMo/lmm… Looking back, it's incredible that we delivered such compact yet powerful vision-language…
🚀 New Paper: Pixel Reasoner 🧠🖼️ How can Vision-Language Models (VLMs) perform chain-of-thought reasoning within the image itself? We introduce Pixel Reasoner, the first open-source framework that enables VLMs to “think in pixel space” through curiosity-driven reinforcement…
Thanks a lot for the invitation, Ruihong! Honored to have had the opportunity : ) For anyone interested, here are the slides: drive.google.com/file/d/1-deNsR…
In our Data Science Seminar @UQSchoolEECS today, we are very happy to have @yupenghou97 from @UCSD @ucsd_cse talk about the hot topic of generative RecSys and tokenization.
🎉 Delighted to share that our paper GenS has been accepted to ACL 2025 Findings 🤗 It’s been a real pleasure working with my wonderful collaborators! #ACL2025 #Multimodal #VideoLLM Code: github.com/yaolinli/GenS Dataset: huggingface.co/datasets/yaoli…
📢 Introducing GenS: Generative Frame Sampler for Long Video Understanding! 🎯 It can identify query-relevant frames in long videos (minutes to hours) for accurate VideoQA 👉Project page: generative-sampler.github.io
cool~
Real-time webcam demo with @huggingface SmolVLM and @ggml_org llama.cpp server. All running locally on a MacBook M3
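A client loop for a demo like this can be surprisingly small. The sketch below assumes a llama.cpp server is already running locally with a SmolVLM GGUF and its multimodal projector, exposing the OpenAI-compatible /v1/chat/completions endpoint on port 8080 (the port, prompt, and max_tokens are placeholders):

```python
# Grab one webcam frame with OpenCV and send it to a local llama.cpp server.
import base64
import cv2          # pip install opencv-python
import requests

cap = cv2.VideoCapture(0)                    # default webcam
ok, frame = cap.read()
cap.release()
if not ok:
    raise RuntimeError("could not read a frame from the webcam")

_, jpg = cv2.imencode(".jpg", frame)
data_uri = "data:image/jpeg;base64," + base64.b64encode(jpg.tobytes()).decode()

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "What do you see in this frame?"},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }],
        "max_tokens": 64,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```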
Introducing 🔥GenS🔥 (Generative Frame Sampler) — a plug-and-play module that greatly enhances long video understanding in existing LMMs (10+ gain for GPT-4o on LongVideoBench), by selecting fewer yet more informative frames. generative-sampler.github.io arxiv.org/abs/2503.09146
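GenS itself is a generative sampler; the sketch below is only a simple similarity-scoring stand-in for the general idea of keeping fewer, query-relevant frames, not the GenS method. It uses an off-the-shelf CLIP model from sentence-transformers; the frame paths and k are assumptions.

```python
# Hypothetical baseline: score decoded frames against the question with CLIP,
# keep the top-k most relevant ones, and preserve temporal order.
from PIL import Image
from sentence_transformers import SentenceTransformer, util  # pip install sentence-transformers

def sample_relevant_frames(frame_paths, question, k=8):
    model = SentenceTransformer("clip-ViT-B-32")
    img_emb = model.encode([Image.open(p) for p in frame_paths], convert_to_tensor=True)
    txt_emb = model.encode([question], convert_to_tensor=True)
    scores = util.cos_sim(txt_emb, img_emb)[0]                # [num_frames]
    top = scores.topk(min(k, len(frame_paths))).indices.tolist()
    return [frame_paths[i] for i in sorted(top)]              # keep temporal order

# frames = sample_relevant_frames(["f_000.jpg", "f_030.jpg", ...], "When does the goal happen?")
```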
Check out M3DocRAG -- multimodal RAG for question answering over Multi-Modal & Multi-Page & Multi-Document collections (+ a new open-domain benchmark + strong results on 3 benchmarks)! ⚡️Key Highlights: ➡️ M3DocRAG flexibly accommodates various settings: - closed & open-domain document…
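For readers new to the setup, the retrieve-then-read flow looks roughly like the skeleton below. The helpers embed_page, embed_query, and answer_with_mllm are hypothetical stand-ins (the paper pairs a visual retriever such as ColPali with a multimodal LM reader); only the pipeline shape is shown.

```python
# Skeleton of a multimodal page-level RAG pipeline (stubs, not M3DocRAG's code).
import numpy as np

def embed_page(page_image) -> np.ndarray:          # hypothetical: one vector per page image
    raise NotImplementedError

def embed_query(question: str) -> np.ndarray:      # hypothetical: one vector per question
    raise NotImplementedError

def answer_with_mllm(question: str, pages) -> str:  # hypothetical multimodal reader
    raise NotImplementedError

def multimodal_doc_rag(question: str, page_images, k: int = 4) -> str:
    page_vecs = np.stack([embed_page(p) for p in page_images])           # [P, D]
    q_vec = embed_query(question)                                        # [D]
    sims = page_vecs @ q_vec / (
        np.linalg.norm(page_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-8
    )
    top = np.argsort(-sims)[:k]                                          # most relevant pages
    return answer_with_mllm(question, [page_images[i] for i in top])
```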
Introducing Cambrian-1, a fully open project from our group at NYU. The world doesn't need another MLLM to rival GPT-4V. Cambrian is unique as a vision-centric exploration & here's why I think it's time to shift focus from scaling LLMs to enhancing visual representations.🧵[1/n]