Yihe Deng

@Yihe__Deng

CS PhD candidate @UCLA, Student Researcher @GoogleAI | Prev. Research Intern @MSFTResearch @AWS | LLM post-training, synthetic data

Joined November 2021

1K Following

3K Followers

Pinned

Yihe Deng@Yihe__Deng · Jul 24

🙌 We've released the full version of our paper, OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles. Our OpenVLThinker-v1.2 is trained through three lightweight SFT → RL cycles, where SFT first “highlights” reasoning behaviors and RL then explores and…

Yihe Deng@Yihe__Deng · Mar 3

🤖 I just updated my repository of RL(HF) summary notes to include a growing exploration of new topics, specifically adding notes to projects related to DeepSeek R1 reasoning. Take a look: github.com/yihedeng9/rlhf… 🚀 I’m hoping these summaries are helpful, and I’d love to hear…

Yihe Deng@Yihe__Deng · Nov 16

😄 I did a brief intro of RLHF algorithms for our lab's reading group presentation. It was a good learning experience for me, and I want to share the GitHub repo, which holds the slides as well as the list of interesting papers: github.com/yihedeng9/rlhf… Would love to hear about…

Yihe Deng Retweeted

Yong Lin@Yong18850571 · Feb 12

🚀 Exciting news! Our Goedel-Prover paper is now live on arXiv: arxiv.org/pdf/2502.07640 🎉 We're currently developing the RL version and have a stronger checkpoint than before (currently not included in the report)!🚀🚀🚀 Plus, we’ll be open-sourcing 1.64M formalized…

Yihe Deng Retweeted

Yong Lin@Yong18850571 · Jul 15

(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date. 🥇 #1 on PutnamBench: Solves 64 problems—with far less compute. 🧠 New SOTA on MiniF2F: * 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%. * 8B > 671B: Our 8B…

Yihe Deng@Yihe__Deng · Jul 14

Our paper "Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance" will be presented as a spotlight at ICML! I won't make it to Vancouver, but please say hi to my co-author @linxizhao4 there :) arxiv.org/pdf/2402.08680

Linxi Zhao@linxizhao4 · Jul 14

I’ll be at #ICML2025 in Vancouver this week. Really looking forward to meeting and learning from everyone! I'll be presenting our spotlight paper at 11am on Wed, July 16, in East Exhibition Hall A-B: "Mitigating Object Hallucination in Large Vision-Language Models via…

Yihe Deng Retweeted

Linxi Zhao@linxizhao4 · Jul 14

Yihe Deng Retweeted

Linxi Zhao@linxizhao4 · May 27

🚀Excited to share our latest work: LLMs entangle language and knowledge, making it hard to verify or update facts. We introduce LMLM 🐑🧠 — a new class of models that externalize factual knowledge into a database and learn during pretraining when and how to retrieve facts…

Yihe Deng Retweeted

Siyan Zhao@siyan_zhao · Apr 11

Introducing d1🚀 — the first framework that applies reinforcement learning to improve reasoning in masked diffusion LLMs (dLLMs). Combining masked SFT with a novel form of policy gradient algorithm, d1 significantly boosts the performance of pretrained dLLMs like LLaDA.

Yihe Deng@Yihe__Deng · Mar 24

Thanks @_akhaliq for sharing our work!

AK@_akhaliq · Mar 24

OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement

Yihe Deng@Yihe__Deng · Mar 24

🗞️Arxiv: arxiv.org/abs/2503.17352

Yihe Deng@Yihe__Deng · Mar 21

🚀Excited to share our latest work: OpenVLThinker, an exploration into enhancing vision-language models with R1 reasoning capabilities. By iterative integration of SFT and RL, we enabled LVLMs to exhibit robust R1 reasoning behavior. As a result, OpenVLThinker achieves a 70.2%…

Yihe Deng@Yihe__Deng · Feb 25

We’re rolling out Deep Research to Plus users today! Deep Research was the biggest “Feel The AGI” moment I’ve ever had since ChatGPT. And I’m glad more people will experience their first AGI moment! The team also worked super hard to make more tools including image citations /…

OpenAI@OpenAI · Feb 25

We're also sharing the system card, detailing how we built deep research, assessed its capabilities and risks, and improved safety. openai.com/index/deep-res…

Yihe Deng Retweeted

Siyan Zhao@siyan_zhao · Feb 24

Excited to release PrefEval (ICLR '25 Oral), a benchmark for evaluating LLMs’ ability to infer, memorize, and adhere to user preferences in long-context conversations! ⚠️We find that cutting-edge LLMs struggle to follow user preferences—even in short contexts. This isn't just…

Yihe Deng Retweeted

Ge Zhang@GeZhang86038849 · Feb 21

[1/n] SuperExcited to announce SuperGPQA!!! We spent more than half a year to finally get it done! SuperGPQA is a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. It also provides the largest human-LLM…

Yihe Deng Retweeted

Ziniu Li@ZiniuLi · Feb 20

🌟 Can better cold start strategies improve RL training for LLMs? 🤖 I’ve written a blog that delves into the challenges of fine-tuning LLMs during the cold-start phase and how the strategies applied there can significantly impact RL performance in complex reasoning tasks that…

Yihe Deng Retweeted

DeepSeek@deepseek_ai · Feb 18

🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference! Core components of NSA: • Dynamic hierarchical sparse strategy • Coarse-grained token compression • Fine-grained token selection 💡 With…

Yihe Deng Retweeted

Wanjia Zhao@WanjiaZhao1203 · Feb 11

Introducing #SIRIUS🌟: A self-improving multi-agent LLM framework that learns from successful interactions and refines failed trajectories, enhancing college-level reasoning and competitive negotiations. 📜Preprint: arxiv.org/pdf/2502.04780 💻code: github.com/zou-group/siri… 1/N
