Weiming Ren
@wmren993
CS PhD student @UWaterloo @UWCheritonCS
📢 Introducing VisCoder – fine-tuned language models for Python-based visualization code generation and feedback-driven self-debugging. Existing LLMs struggle to generate reliable plotting code: outputs often raise exceptions, produce blank visuals, or fail to reflect the…
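For intuition, here is a minimal sketch of what a feedback-driven self-debugging loop for plotting code can look like: run the generated script, capture any traceback, and hand it back to the model for a repair. The helper names and the retry budget are illustrative placeholders, not the VisCoder interface.

```python
import traceback

MAX_ROUNDS = 3  # illustrative retry budget, not a VisCoder hyperparameter

def run_with_feedback(generate_code, task: str) -> str:
    """generate_code(prompt) -> Python plotting script; stand-in for the model call."""
    prompt = task
    code = generate_code(prompt)
    for _ in range(MAX_ROUNDS):
        try:
            exec(code, {})          # run the generated plotting script
            return code             # no exception raised: keep this version
        except Exception:
            error = traceback.format_exc()
            # feed the traceback back to the model and ask for a repaired script
            prompt = f"{task}\n\nPrevious code:\n{code}\n\nError:\n{error}\nPlease fix the code."
            code = generate_code(prompt)
    return code                      # last attempt, even if still failing
```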
Introducing VerlTool - a unified and easy-to-extend tool agent training framework based on verl. Recently, there's been a growing trend toward training tool agents with reinforcement learning algorithms like GRPO and PPO. Representative works include SearchR1, ToRL, ReTool, and…
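As a rough illustration of the kind of rollout such a framework has to support, the sketch below interleaves model generation with tool execution until a final answer is produced; the resulting trajectory would then be scored for GRPO/PPO. The tag format and helper names are assumptions, not the VerlTool or verl API.

```python
import re

def parse_tool_call(segment: str):
    """Return (tool_name, argument) if the segment ends with a tool call, else None."""
    m = re.search(r'<tool name="(\w+)">(.*?)</tool>\s*$', segment, re.S)
    return (m.group(1), m.group(2)) if m else None

def rollout(generate, tools, question: str, max_turns: int = 4) -> str:
    trajectory = question
    for _ in range(max_turns):
        segment = generate(trajectory)                  # model continues the trajectory
        trajectory += segment
        call = parse_tool_call(segment)
        if call is None:                                # final answer, stop the episode
            break
        name, arg = call
        result = tools[name](arg)                       # execute the tool (search, code, ...)
        trajectory += f"\n<result>{result}</result>\n"
    return trajectory                                    # later scored by a reward function
```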
🚀 New Paper: Pixel Reasoner 🧠🖼️
How can Vision-Language Models (VLMs) perform chain-of-thought reasoning within the image itself? We introduce Pixel Reasoner, the first open-source framework that enables VLMs to “think in pixel space” through curiosity-driven reinforcement…
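A minimal sketch of the “think in pixel space” idea: mid-reasoning, the model can emit a visual operation such as a zoom-in, receive the resulting crop back in its context, and continue. The operation set and action format here are illustrative assumptions, not the Pixel Reasoner codebase.

```python
from PIL import Image

def zoom_in(image: Image.Image, box: tuple[int, int, int, int]) -> Image.Image:
    """Crop the region of interest and upscale it so small details become legible."""
    patch = image.crop(box)
    return patch.resize((patch.width * 2, patch.height * 2))

def reasoning_step(decide, context: list, image: Image.Image) -> list:
    """One step of interleaved reasoning: `decide` (a stand-in for the VLM) returns
    either plain text or a visual operation on the image."""
    action = decide(context)
    if action["type"] == "zoom_in":
        patch = zoom_in(image, action["box"])
        context.append(patch)                # the zoomed view re-enters the context
    else:
        context.append(action["text"])       # ordinary chain-of-thought text
    return context
```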
🧠📽️ New benchmark release: VideoEval-Pro!
Long Video Understanding (LVU) is critical for building truly intelligent multimodal systems — think surveillance analysis, instructional video QA, or summarizing hour-long meetings.
But here's the problem👇
🧩 Nearly all existing LVU…
Excited to share VideoEval-Pro, a robust and comprehensive evaluation suite for long video understanding (LVU) models.
📊 1,289 open-ended questions from 465 long videos (avg. 38 mins)
🎯 Diverse task types: perception and reasoning tasks based on local and holistic video content
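As a rough sketch of how open-ended LVU answers can be scored, the snippet below has the model produce a free-form answer (no options to guess from) and a judge compare it against the reference. The field names and judge prompt are assumptions, not the VideoEval-Pro evaluation code.

```python
def evaluate(model, judge, examples) -> float:
    """examples: dicts with 'video', 'question', and a reference 'answer' (assumed schema)."""
    correct = 0
    for ex in examples:
        prediction = model.answer(ex["video"], ex["question"])   # free-form answer
        verdict = judge(
            f"Question: {ex['question']}\n"
            f"Reference answer: {ex['answer']}\n"
            f"Model answer: {prediction}\n"
            "Is the model answer correct? Reply yes or no."
        )
        correct += verdict.strip().lower().startswith("yes")
    return correct / len(examples)
```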
🎬 Automated filmmaking is the future: you need dialogue, expressive talking heads, synchronized body motion, and multi-character interactions.
🚀 Today, in collaboration with @AIatMeta, we’re excited to introduce MoCha: Towards Movie-Grade Talking Character Synthesis 🔊…
🚀Thrilled to introduce ☕️MoCha: Towards Movie-Grade Talking Character Synthesis
Please unmute to hear the demo audio.
✨We defined a novel task: Talking Characters, which aims to generate character animations directly from Natural Language and Speech input.
✨We propose…
Excited to share what I've been working on lately: ABC - a multimodal embedding model trained to embed specific aspects of an image. ABC is perfect for visual embedding tasks that need a little more control over what gets embedded. Details on the training pipeline 👇
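A small sketch of what instruction-conditioned embedding enables: the same image yields different embeddings depending on which aspect the instruction asks about, so retrieval can target that aspect. The encoder functions below are stand-ins, not the released ABC interface.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(embed_image, embed_text, image, instruction: str, candidate_captions: list[str]) -> str:
    """embed_image(image, instruction) / embed_text(text) are assumed encoder calls."""
    query = embed_image(image, instruction)            # e.g. "focus on the object on the left"
    scores = [cosine(query, embed_text(c)) for c in candidate_captions]
    return candidate_captions[int(np.argmax(scores))]  # caption best matching that aspect
```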
🚨 New Paper Alert! 🚨
Thrilled to announce VAMBA: a powerful hybrid Mamba-Transformer architecture designed specifically for hour-long video understanding tasks! VAMBA can efficiently process more than 1000 frames on a single GPU!
🎯 Why do we need hour-long video models?…
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
VAMBA is a hybrid Mamba-Transformer model for long video understanding that uses Mamba-2 blocks to encode video tokens with linear complexity. It handles over 1024 frames without token reduction, reducing GPU…
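A rough sketch of a hybrid block in this spirit: the long video token sequence goes through a linear-complexity Mamba-2 path, while the short text sequence uses self-attention plus cross-attention into the video tokens. The exact wiring here is an assumption for illustration, not the released Vamba architecture or config.

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    def __init__(self, dim: int, heads: int, mamba_block: nn.Module):
        super().__init__()
        self.mamba = mamba_block                               # e.g. a Mamba-2 layer, linear in video length
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text_tokens: torch.Tensor, video_tokens: torch.Tensor):
        # video path: no quadratic attention over the (very long) frame sequence
        video_tokens = video_tokens + self.mamba(video_tokens)
        # text path: short sequence, so full self-attention stays cheap
        t, _ = self.self_attn(text_tokens, text_tokens, text_tokens)
        text_tokens = text_tokens + t
        # text queries attend to video keys/values: cost grows with len(text) * len(video)
        t, _ = self.cross_attn(text_tokens, video_tokens, video_tokens)
        return text_tokens + t, video_tokens
```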