Yuansheng Ni
@YuanshengNi
Keep Learning!
📢 Introducing VisCoder – fine-tuned language models for Python-based visualization code generation and feedback-driven self-debugging. Existing LLMs struggle to generate reliable plotting code: outputs often raise exceptions, produce blank visuals, or fail to reflect the…
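A minimal sketch of the feedback-driven self-debugging loop described above: run the generated plotting code, and if it fails, feed the traceback back for a revised attempt. `generate_plot_code` is a hypothetical stand-in for a model call, not VisCoder's actual API.

```python
import subprocess
import sys
import tempfile
import textwrap

def generate_plot_code(prompt: str, feedback: str | None = None) -> str:
    # Hypothetical stand-in for a model call (e.g., a VisCoder checkpoint).
    # A real implementation would condition on the prompt and error feedback.
    return textwrap.dedent("""
        import matplotlib
        matplotlib.use("Agg")  # headless backend, avoids blank-window issues
        import matplotlib.pyplot as plt
        plt.plot([1, 2, 3], [1, 4, 9])
        plt.savefig("out.png")
    """)

def self_debug(prompt: str, max_rounds: int = 3) -> str | None:
    feedback = None
    for _ in range(max_rounds):
        code = generate_plot_code(prompt, feedback)
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        try:
            # Run in a subprocess so crashes don't take down the harness.
            result = subprocess.run([sys.executable, path],
                                    capture_output=True, text=True, timeout=60)
        except subprocess.TimeoutExpired:
            feedback = "Execution timed out."
            continue
        if result.returncode == 0:
            return code  # executable plotting code
        feedback = result.stderr[-2000:]  # traceback goes back to the model
    return None
```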
🚀 This year, we’ve rolled out a series of updates to EasyEdit1, and dropped EasyEdit2 to steer LLM behavior on the fly! 🔧✨
👉 Code: github.com/zjunlp/EasyEdit
What’s new?
• Datasets: Integrated AKEW, LEME & UNKE
• Methods: NAMET, CORE, UNKE, AnyEdit & Reference-free Preference…
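For context, a typical editing call in EasyEdit follows the pattern below, adapted from the repo's README; treat the exact class and argument names as assumptions, since they may have shifted across the updates listed above.

```python
from easyeditor import BaseEditor, ROMEHyperParams

# Hyperparameters for one editing method (ROME); the YAML path follows the
# repo's hparams/ layout and depends on your local checkout.
hparams = ROMEHyperParams.from_hparams("./hparams/ROME/gpt2-xl.yaml")
editor = BaseEditor.from_hparams(hparams)

# Rewrite a single fact in the model's weights and get pre/post-edit metrics.
metrics, edited_model, _ = editor.edit(
    prompts=["The capital of France is"],
    ground_truth=["Paris"],
    target_new=["Lyon"],
    subject=["France"],
)
print(metrics)
```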
🔎 Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis ⚠️
Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge
- 130 tasks (each requiring avg. 100+ webpages) from 1,000+ hours of expert labor
- …
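A minimal sketch of the Agent-as-a-Judge idea: rather than string-matching a gold answer, an LLM judge checks the agent's answer against task-specific rubric criteria. The rubric format and prompt here are illustrative assumptions, not Mind2Web 2's actual judge.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def judge_answer(task: str, rubric: list[str], answer: str) -> float:
    """Score an agentic-search answer as the fraction of rubric items met."""
    satisfied = 0
    for criterion in rubric:
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": (
                f"Task: {task}\nAnswer: {answer}\nCriterion: {criterion}\n"
                "Does the answer satisfy this criterion? Reply YES or NO."
            )}],
        )
        if "YES" in resp.choices[0].message.content.upper():
            satisfied += 1
    return satisfied / len(rubric)
```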
Thanks for sharing our work!
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
🚨 New Paper Alert 🚨
We found that Supervised Fine-tuning on ONE problem can achieve a similar performance gain to RL on ONE problem, with 20x less compute!
Paper: arxiv.org/abs/2506.03295
Recently, people have shown that RL can work even with ONE example. This indicates that the…
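For intuition, one-problem SFT is just ordinary next-token cross-entropy on a single prompt/solution pair for a few gradient steps. A hedged sketch with Hugging Face transformers; the model choice, example, and step count are arbitrary:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # arbitrary small model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# ONE problem: a single prompt plus its worked solution.
text = "Problem: What is 17 * 24?\nSolution: 17 * 24 = 408."
batch = tok(text, return_tensors="pt")

opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for step in range(20):  # a handful of gradient steps on the same example
    out = model(**batch, labels=batch["input_ids"])  # next-token CE loss
    out.loss.backward()
    opt.step()
    opt.zero_grad()
```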
Introducing VerlTool - a unified and easy-to-extend tool agent training framework based on verl. Recently, there's been a growing trend toward training tool agents with reinforcement learning algorithms like GRPO and PPO. Representative works include SearchR1, ToRL, ReTool, and…
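The core loop a framework like this has to support: generation interleaved with tool execution, with the resulting trajectory (tool outputs masked from the loss) handed to GRPO/PPO. All names below are illustrative, not VerlTool's API.

```python
import re

def parse_tool_call(segment: str) -> str | None:
    """Extract a call like <tool_call>search: query</tool_call>; None if absent.
    The tag format is an assumed convention, not VerlTool's."""
    m = re.search(r"<tool_call>(.*?)</tool_call>", segment, re.S)
    return m.group(1) if m else None

def rollout(policy, env, question: str, max_turns: int = 4):
    """Interleave model generation with tool execution; return the trajectory.
    `policy.generate` and `env.run` are hypothetical duck-typed interfaces."""
    context, trajectory = question, []
    for _ in range(max_turns):
        segment = policy.generate(context)  # stops after a tool call or answer
        trajectory.append(("model", segment))
        call = parse_tool_call(segment)
        if call is None:          # no tool call: model gave a final answer
            break
        result = env.run(call)    # execute the tool (search, code exec, ...)
        trajectory.append(("tool", result))  # masked out of the RL loss
        context += segment + result
    return trajectory
```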
📷 New Benchmark Release: PhyX - Physical Reasoning for Multimodal Models
👉 Project Page: phyx-bench.github.io
👉 GitHub: github.com/NastyMarcus/Ph…
👉 arXiv: arxiv.org/abs/2505.15929
👉 Hugging Face Dataset: huggingface.co/datasets/Cloud…
Excited to share VideoEval-Pro, a robust and comprehensive evaluation suite for long video understanding (LVU) models.
📊 1,289 open-ended questions from 465 long videos (avg. 38 mins)
🎯 Diverse task types: perception and reasoning tasks based on local and holistic video content
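Open-ended evaluation needs an answer judge rather than option matching. A minimal sketch; the judge model and prompt are assumptions, not VideoEval-Pro's released evaluator.

```python
from openai import OpenAI

client = OpenAI()

def judge_open_ended(question: str, reference: str, prediction: str) -> bool:
    """Ask an LLM judge whether a free-form answer matches the reference."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": (
            f"Question: {question}\nReference answer: {reference}\n"
            f"Model answer: {prediction}\n"
            "Is the model answer correct? Reply CORRECT or WRONG."
        )}],
    )
    return "CORRECT" in resp.choices[0].message.content.upper()

# Accuracy over the suite = mean of judge_open_ended(...) across all questions.
```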
🚀 New Paper: Pixel Reasoner 🧠🖼️
How can Vision-Language Models (VLMs) perform chain-of-thought reasoning within the image itself?
We introduce Pixel Reasoner, the first open-source framework that enables VLMs to “think in pixel space” through curiosity-driven reinforcement…
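One way to picture the curiosity-driven part: a reward-shaping bonus that pays the policy for exercising pixel-space operations (zoom, crop, frame select) while it still uses them rarely. A schematic sketch, not the paper's exact reward:

```python
def shaped_reward(answer_correct: bool, used_pixel_ops: bool,
                  pixel_op_rate: float, target_rate: float = 0.3,
                  bonus: float = 0.2) -> float:
    """Illustrative curiosity-shaped reward (assumed form, not the paper's).

    answer_correct: whether the rollout reached the right answer
    used_pixel_ops: whether this rollout invoked zoom/crop/frame-select
    pixel_op_rate:  fraction of recent rollouts that used pixel ops
    """
    r = 1.0 if answer_correct else 0.0
    # Curiosity bonus: reward pixel-space operations while they are rare,
    # so the policy explores them instead of collapsing to text-only CoT.
    if used_pixel_ops and pixel_op_rate < target_rate:
        r += bonus * (1.0 - pixel_op_rate / target_rate)
    return r
```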
Introducing QuickVideo 🚀, speeding up the end-to-end time from the MP4 bitstream to VideoLLM inference by at least 2.5 times for hour-long video understanding (e.g., 1024 frames) on a single 40GB GPU.
🤔 What are the key challenges of hour-long video understanding?
1. …
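One ingredient behind that kind of speedup is overlapping CPU video decoding with GPU prefill instead of serializing them. A hedged sketch of the pipelining idea; the function names are placeholders, not QuickVideo's API:

```python
from concurrent.futures import ThreadPoolExecutor

def decode_chunk(path: str, start: int, end: int):
    """Placeholder: decode frames [start, end) from the MP4 (e.g., via PyAV)."""
    ...

def prefill(frames) -> None:
    """Placeholder: run VideoLLM prefill on one chunk of frames."""
    ...

def pipelined_inference(path: str, n_frames: int = 1024, chunk: int = 128):
    starts = list(range(0, n_frames, chunk))
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start decoding the first chunk on a CPU thread.
        future = pool.submit(decode_chunk, path, 0, min(chunk, n_frames))
        for i, start in enumerate(starts):
            frames = future.result()      # wait for chunk i
            if i + 1 < len(starts):       # decode chunk i+1 in the background
                nxt = starts[i + 1]
                future = pool.submit(decode_chunk, path, nxt,
                                     min(nxt + chunk, n_frames))
            prefill(frames)               # GPU prefill hides the next decode
```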
🚀 General-Reasoner: Generalizing LLM Reasoning Across All Domains (Beyond Math)
Most recent RL/R1 works focus on math reasoning, but math-only tuning doesn’t generalize to general reasoning (e.g., performance drops on MMLU-Pro and SuperGPQA).
Why are we limited to math reasoning?
1. Existing…
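Outside math, answers usually can't be checked by exact-match rules, so a model-based verifier can supply the RL reward instead. A schematic sketch; the `verifier.equivalent` interface is an assumption, not General-Reasoner's released component:

```python
def general_domain_reward(question: str, pred: str, gold: str, verifier) -> float:
    """Reward for verifiable RL beyond math (illustrative sketch)."""
    # Cheap rule first: exact match after light normalization.
    if pred.strip().lower() == gold.strip().lower():
        return 1.0
    # Otherwise, ask a small generative verifier whether the prediction is
    # semantically equivalent to the reference answer (hypothetical interface).
    return 1.0 if verifier.equivalent(question, pred, gold) else 0.0
```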
🔥 How do you build a state-of-the-art Vision-Language Model with direct RL?
We’re excited to introduce VL-Rethinker, a new paradigm for multimodal reasoning trained directly with Reinforcement Learning.
📈 It sets new SOTA on key math+vision benchmarks:
- MathVista: 80.3 → 🥇…
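The tweet is truncated, but "direct RL" in this line of work typically means GRPO-style training: sample a group of responses per prompt and normalize rewards within the group to get advantages, with no learned value network. A minimal sketch of that advantage computation, as a generic illustration rather than the paper's exact recipe:

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6):
    """GRPO-style advantages: each prompt's G sampled responses are
    normalized against their own group mean and std.

    rewards: (num_prompts, G) tensor, one scalar reward per response.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled responses each (1 = correct, 0 = wrong).
r = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0, 1.0]])
print(group_relative_advantages(r))
```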
🚀 Big WebDreamer update! We train 💭Dreamer-7B, a small but strong world model for real-world web planning.
💥 Beats Qwen2-72B
⚖️ Matches #GPT-4o
Trained on 3M synthetic examples, and yes, all data + models are open-sourced.
❓ Wondering how to scale inference-time compute with advanced planning for language agents?
🙋‍♂️ Short answer: use your LLM as a world model.
💡 More detailed answer: using GPT-4o to predict the outcome of actions on a website can deliver strong performance with improved safety and…
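The mechanism in miniature: for each candidate action, ask the LLM to imagine the resulting page, score the imagined state against the goal, and only commit the best action in the real environment. A hedged sketch; the prompts and the 0-10 scoring scheme are illustrative:

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def plan_step(goal: str, page: str, candidate_actions: list[str]) -> str:
    """Model-predictive planning with an LLM world model: simulate, score, act."""
    best_action, best_score = candidate_actions[0], float("-inf")
    for action in candidate_actions:
        # World model: imagine the page state after taking this action.
        imagined = ask(f"Current page:\n{page}\n\nIf the user performs "
                       f"'{action}', describe the resulting page.")
        # Critic: rate progress toward the goal (assumes a numeric reply).
        score = float(ask(f"Goal: {goal}\nPredicted next page: {imagined}\n"
                          "Rate progress toward the goal from 0 to 10. "
                          "Reply with only the number."))
        if score > best_score:
            best_action, best_score = action, score
    return best_action  # only the winning action is executed for real
```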