Fangcong Yin
@fangcong_y10593
CS PhD Student @UTAustin studying NLP. Prev: @CornellCIS
Solving complex problems with CoT requires combining different skills. We can do this by: 🧩Modifying the CoT data format to be “composable” with other skills 🔥Training models on each skill 📌Combining those models This leads to better zero-shot reasoning on tasks involving skill composition!
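The tweet doesn't say how the skill-specialized models are combined, but one common way to merge separately trained checkpoints is parameter averaging (as in model soups / task arithmetic). A minimal sketch under that assumption — the toy parameter dicts and the uniform-averaging rule here are illustrative, not the paper's actual method:

```python
import numpy as np

# Toy "checkpoints": parameter dicts for two skill-specialized models
# (hypothetical; the actual combination rule may differ).
skill_a = {"w": np.array([1.0, 0.0]), "b": np.array([0.5])}
skill_b = {"w": np.array([0.0, 1.0]), "b": np.array([-0.5])}

def merge(models, weights=None):
    """Weighted parameter averaging of same-shape checkpoints
    (uniform weights by default)."""
    weights = weights or [1 / len(models)] * len(models)
    return {
        name: sum(w * m[name] for w, m in zip(weights, models))
        for name in models[0]
    }

combined = merge([skill_a, skill_b])
print(combined["w"])  # [0.5 0.5]
```

The same averaging applies per-tensor to real model state dicts, as long as the checkpoints share an architecture.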


What if you could understand and control an LLM by studying its *smaller* sibling? Our new paper proposes the Linear Representation Transferability Hypothesis: internal representations of different-sized models can be translated via a simple linear (affine) map.
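The hypothesis above — that representations of different-sized models are related by a simple affine map — can be sketched with a least-squares fit. Everything here is a toy stand-in (random matrices in place of real hidden states; dimensions 16 and 32 are made up), just to show what "fit a linear (affine) translation between representation spaces" means operationally:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for hidden states of a small and a large model on the same
# inputs (hypothetical sizes: d_small=16, d_large=32, 500 examples).
n, d_small, d_large = 500, 16, 32
H_small = rng.normal(size=(n, d_small))

# Simulate the hypothesis: the large model's representations are a
# (noisy) affine image of the small model's.
A_true = rng.normal(size=(d_small, d_large))
b_true = rng.normal(size=(d_large,))
H_large = H_small @ A_true + b_true + 0.01 * rng.normal(size=(n, d_large))

# Fit the affine map by ordinary least squares (bias via a ones column).
X = np.hstack([H_small, np.ones((n, 1))])
coef, *_ = np.linalg.lstsq(X, H_large, rcond=None)
A_hat, b_hat = coef[:-1], coef[-1]

# Translated small-model states should approximate the large model's.
pred = H_small @ A_hat + b_hat
r2 = 1 - ((H_large - pred) ** 2).sum() / ((H_large - H_large.mean(0)) ** 2).sum()
print(round(r2, 3))
```

With real models, H_small and H_large would be hidden states collected at chosen layers over a shared corpus; a high held-out R² is the kind of evidence the hypothesis predicts.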
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
There’s been hot debate about (The Illusion of) The Illusion of Thinking. My take: it’s not that models can’t reason — they just aren’t perfect at long-form generation yet. We evaluate reasoning models on the LongProc benchmark (which requires generating 8K-token CoTs; see thread). Reasoning…
🤔Now most LLMs have >= 128K context sizes, but are they good at generating long outputs, such as writing 8K token chain-of-thought for a planning problem? 🔔Introducing LongProc (Long Procedural Generation), a new benchmark with 6 diverse tasks that challenge LLMs to synthesize…
🧵 Recent studies show LLMs can self-improve their responses when given external feedback. But how effectively can they incorporate it? We tested this systematically—and found they can't fully integrate feedback, even when the feedback is high-quality and backed by ground-truth.
LLMs trained to memorize new facts can’t use those facts well.🤔 We apply a hypernetwork to ✏️edit✏️ the gradients for fact propagation, improving accuracy by 2x on a challenging subset of RippleEdit!💡 Our approach, PropMEND, extends MEND with a new objective for propagation.
Check out our new work on query-focused retrieval heads of LLMs! It is cool to see how interpretability insights can be used to improve zero-shot reasoning and re-ranking over long context.
🤔 Recent mech interp work showed that retrieval heads can explain some long-context behavior. But can we use this insight for retrieval? 📣 Introducing QRHeads (query-focused retrieval heads) that enhance retrieval. Main contributions: 🔍 Better head detection: we find a…
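The core intuition — score a passage by how much attention mass the query tokens direct at it through a retrieval head — can be sketched in a fully deterministic toy. The one-hot keys, document spans, and scoring rule below are illustrative assumptions, not the paper's actual detection or ranking procedure:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Toy context of 13 tokens: [doc1 (5) | doc2 (5) | query (3)], with
# one-hot key vectors so the outcome is deterministic (hypothetical).
n, d = 13, 13
doc_spans = {"doc1": slice(0, 5), "doc2": slice(5, 10)}
q_span = slice(10, 13)

K = np.eye(n)               # per-token key vectors at this head
Q = np.zeros((n, d))
Q[q_span] = 10 * K[5:8]     # make the head point query tokens at doc2

# Attention rows for the query tokens only.
attn = softmax(Q[q_span] @ K.T / np.sqrt(d))

# Retrieval score: total attention mass each document receives
# from the query tokens at this head.
scores = {name: float(attn[:, span].sum()) for name, span in doc_spans.items()}
best = max(scores, key=scores.get)
print(best)  # doc2
```

With a real model, K and Q would come from a detected retrieval head's key/query projections over the full context, and the per-document attention mass would serve as a zero-shot re-ranking signal.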
Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts! ✍🏻Entirely human-written questions by 13 CS researchers 👀Emphasis on visual reasoning – hard to verbalize via text CoTs 📉Humans reach 93%, while Gemini-2.5-Pro gets 63% and Qwen2.5-72B only 38%
Check out my work at @bespokelabsai We release Bespoke-MiniChart-7B, a new SOTA in chart understanding for its size Chart understanding is fun and challenging, requiring reasoning skills beyond math reasoning It's a great starting point for open chart model development!
Announcing Bespoke-MiniChart-7B, a new SOTA in chart understanding for models of comparable size on seven benchmarks, on par with Gemini-1.5-Pro and Claude-3.5! 🚀 Beyond its real-world applications, chart understanding is a challenging problem for VLMs, since it requires…
Evaluating language model responses on open-ended tasks is hard! 🤔 We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️. EvalAgent identifies 👩🏫🎓 expert advice on the web that implicitly addresses the user’s prompt 🧵👇
Your long-context model might be good at understanding long inputs, but can it generate long outputs? Check out our new benchmark on Long Procedural Generation!
Interesting perspective, thanks for sharing! As one of the authors of the “CoT mainly helps on math/logic paper”, I agree with a lot of this, especially the connection to generator/validator gaps. One of our aims going into this project was to find datasets beyond math/logic…
An underrated but occasionally make-or-break skill in AI research (that didn’t really exist ten years ago) is the ability to find a dataset that actually exercises a new method you are working on. Back in the day when the bottleneck in AI was learning, many methods were…
I'm shocked to see racism happening in academia again, at the best AI conference @NeurIPSConf. Targeting specific ethnic groups to describe misconduct is inappropriate and unacceptable. @NeurIPSConf must take a stand. We call on Rosalind Picard @MIT @medialab to retract and…