Shengyu Feng
@ShawnSYFeng
PhD student @LTIatCMU
We have a very poor understanding of why deep neural networks like transformer models learn the parameters they learn. For example, in the paper below from 2013, the authors demonstrated that 5% of the weights of a trained deep neural network can be used to predict the values of…
We are excited to announce that @shengjia_zhao will be the Chief Scientist of Meta Superintelligence Labs! Shengjia is a brilliant scientist who most recently pioneered a new scaling paradigm in his research. He will lead the scientific direction of our team. Let's go 🚀
RL+LLM researchers actively use the entropy of the LLM's token distribution to measure training dynamics. This number is misleading. John von Neumann and Lev Landau gave us the correct answer 100 years ago while studying mixed quantum states in Hilbert spaces. Usual entropy treats all tokens as…
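A minimal sketch of the distinction (my own illustration with made-up toy values, not from the thread): Shannon entropy over token probabilities treats every pair of tokens as perfectly distinguishable, while the von Neumann entropy of a density matrix built from probability-weighted token embeddings discounts tokens whose embeddings overlap.

```python
import numpy as np

# Toy setup (hypothetical values): 4 tokens with unit-norm embeddings in R^3.
p = np.array([0.4, 0.3, 0.2, 0.1])             # token probabilities
E = np.random.default_rng(0).normal(size=(4, 3))
E /= np.linalg.norm(E, axis=1, keepdims=True)   # unit-norm embedding per token

# Shannon entropy: treats all tokens as perfectly distinguishable (orthogonal).
shannon = -np.sum(p * np.log(p))

# Von Neumann entropy: S = -Tr(rho log rho), rho = sum_i p_i |e_i><e_i|.
rho = (p[:, None, None] * (E[:, :, None] * E[:, None, :])).sum(axis=0)
evals = np.linalg.eigvalsh(rho)
evals = evals[evals > 1e-12]                    # drop numerical zeros
von_neumann = -np.sum(evals * np.log(evals))

# Known inequality: S(rho) <= H(p), equal iff the embeddings are orthogonal.
print(shannon, von_neumann)
```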
Beautiful @GoogleResearch paper. LLMs can learn in context from examples in the prompt, can pick up new patterns while answering, yet their stored weights never change. That behavior looks impossible if learning always means gradient descent. The mechanisms through which this…
ai for math workshop papers released, it's a fun batch🚀 openreview.net/group?id=ICML.…
Will present two papers at #icml25! Happy to chat! Main (7/16): Regularized Langevin Dynamics for Combinatorial Optimization. (icml.cc/virtual/2025/p…) AI4MATH Workshop (7/18): A Comprehensive Evaluation of Contemporary ML-based Solvers for Combinatorial Optimization.
In physics there is an elegant method for computing correlation functions, called the generating function. The idea is simple: instead of computing correlators one by one, you define a function of an auxiliary parameter and compute the average of that new function. Individual correlators…
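As a worked example (standard textbook form, my addition, not from the thread): for a single random variable x, average e^{Jx} over the distribution; every moment then falls out by differentiating at J = 0, and log Z generates the connected correlators.

```latex
\[
Z(J) \;=\; \left\langle e^{Jx} \right\rangle
      \;=\; \sum_{n=0}^{\infty} \frac{J^{n}}{n!}\,\langle x^{n}\rangle ,
\qquad
\langle x^{n}\rangle \;=\; \left.\frac{\mathrm{d}^{n} Z}{\mathrm{d}J^{n}}\right|_{J=0},
\qquad
\langle x^{n}\rangle_{c} \;=\; \left.\frac{\mathrm{d}^{n} \log Z}{\mathrm{d}J^{n}}\right|_{J=0}.
\]
```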
Given an image of a car and a caption of a horse, will VLMs recognize the corresponding unimodal information? No, we show they usually struggle with conflicting inputs. We look into their internal representations to understand why and how this happens, and find that we can…
Check out our new paper: “How Do Vision-Language Models Process Conflicting Information Across Modalities?”! Vision-language models often struggle with conflicting inputs - we show how their internal representations and key attention heads reveal when and how this happens, and…
+1 for "context engineering" over "prompt engineering". People associate prompts with short task descriptions you'd give an LLM in your day-to-day use. When in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window…
I really like the term “context engineering” over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM.
hey all, couple quick notes: 1) yes, we will be joining Meta. 2) no, we did not get 100M sign-on, that's fake news. Excited about what's ahead though, will share more in due time! cc @__kolesnikov__ and @XiaohuaZhai.
Understanding graph theory will seriously enhance your engineering skills; you absolutely must be familiar with it. Here's a graph theory quickstart, in collaboration with @alepiad. Read on:
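Not from the thread itself, but a minimal Python sketch of the first thing such a quickstart typically covers: representing a graph as an adjacency list and walking it breadth-first.

```python
from collections import deque

# A graph as an adjacency list: node -> list of neighbors.
graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
}

def bfs(graph, start):
    """Visit nodes in breadth-first order from `start`."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order

print(bfs(graph, "a"))  # ['a', 'b', 'c', 'd']
```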
🎨 Gemini 2.5 tech report just dropped! So proud to have led the development of RL*F (Reinforcement Learning from Human and Critic Feedback) - our breakthrough in AI training inspired by... art school crits? Here's the thing: How do you teach taste? Style? Things without clear…
Can LLMs solve PDEs? 🤯 We present CodePDE, a framework that uses LLMs to automatically generate PDE solvers and outperforms human implementations! 🚀 CodePDE demonstrates the power of inference-time algorithms and scaling for PDE solving. More in 🧵: #ML4PDE #AI4Science
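For flavor, here is the kind of solver an LLM might be asked to generate: an explicit finite-difference scheme for the 1D heat equation u_t = u_xx. This is my own illustrative sketch, not CodePDE's actual output or API.

```python
import numpy as np

# 1D heat equation u_t = u_xx on [0, 1] with zero boundary values.
nx, nt = 101, 5000
dx = 1.0 / (nx - 1)
dt = 0.4 * dx**2            # explicit scheme is stable for dt <= 0.5 * dx^2
x = np.linspace(0.0, 1.0, nx)
u = np.sin(np.pi * x)       # initial condition

for _ in range(nt):
    # Update interior points; boundaries stay fixed at zero.
    u[1:-1] += dt / dx**2 * (u[2:] - 2 * u[1:-1] + u[:-2])

# Exact solution decays as exp(-pi^2 t); report the max numerical error.
t_final = nt * dt
print(np.abs(u - np.exp(-np.pi**2 * t_final) * np.sin(np.pi * x)).max())
```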
read the first letter of every name in the gemini contributors list
Research with amazing collaborators @JizeJiang, @MeitangLi, and @JingchengYang, guided by great advisors and supported by the generous help of talented researchers @BowenJin13, @XingyuFu2, and many open-source contributors (easyr1, verl, vllm... etc).
Excited to introduce VTool-R1! We’ve trained VLMs to “think visually” using RL, blending Python-based 🖼️visual edits with💡textual Chain-of-Thought reasoning. Our trained qwen2.5-VL-32B surpasses GPT-4o on ChartQA & TableVQA, and even the compact qwen2.5-VL-7B significantly…
o3-pro is the slowest and most overthinking model. A simple 'Hi' cost me $80. 🥲
When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs: 🧵1/9
🔥Unlocking New Paradigm for Test-Time Scaling of Agents! We introduce Test-Time Interaction (TTI), which scales the number of interaction steps beyond thinking tokens per step. Our agents learn to act longer➡️richer exploration➡️better success Paper: arxiv.org/abs/2506.07976
📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue: → LLMs are limited in creativity since they learn to predict the next token → creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵