Yangjun Ruan
@YangjunR
Visiting @stanfordAILab | ML Ph.D. student @UofT & @VectorInst
New paper on synthetic pretraining! We show LMs can synthesize their own thoughts for more data-efficient pretraining, bootstrapping their capabilities on limited, task-agnostic data. We call this new paradigm “reasoning to learn”. arxiv.org/abs/2503.18866 Here’s how it works🧵

Reasoning to Learn from Latent Thoughts
Author's Explanation: x.com/YangjunR/statu…
Overview: This paper enhances LLM pretraining data efficiency under data constraints by inferring latent thoughts underlying web text, significantly improving math performance from 5.7% to…
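A minimal sketch of the augmentation loop as the overview describes it: infer the latent thoughts behind each passage, then pretrain on the thoughts followed by the text. The prompt wording and the `lm.generate` interface are illustrative assumptions, not the paper's actual pipeline.

```python
# Sketch of "reasoning to learn" data augmentation, per the tweet's summary.
# The prompt and the lm.generate interface are assumed for illustration.

def infer_latent_thoughts(lm, passage: str) -> str:
    """Ask a language model to reconstruct the reasoning behind a passage."""
    prompt = (
        "Here is a passage of text:\n\n"
        f"{passage}\n\n"
        "Write out the background knowledge and step-by-step reasoning "
        "an expert would have had in mind while writing it."
    )
    return lm.generate(prompt)

def augment_corpus(lm, corpus: list[str]) -> list[str]:
    """Pair each passage with its inferred thoughts for pretraining."""
    examples = []
    for passage in corpus:
        thoughts = infer_latent_thoughts(lm, passage)
        # Train on thoughts followed by the original text, so the model
        # learns to predict the text conditioned on the reasoning.
        examples.append(thoughts + "\n\n" + passage)
    return examples
```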
We are happy to announce our @NeurIPSConf workshop on LLM evaluations! Mastering LLM evaluation is no longer optional -- it's fundamental to building reliable models. We'll tackle the field's most pressing evaluation challenges. For details: sites.google.com/corp/view/llm-…. 1/3
What makes a great scientist? Most AI scientist benchmarks miss the key skill: designing and analyzing experiments. 🧪 We're introducing SciGym: the first simulated lab environment to benchmark #LLMs on experimental design and analysis capabilities. #AI4SCIENCE #ICML25
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're…
Some real metrics in the wild.
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
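A quick aside on the arithmetic behind those headline numbers. The task times below are invented; only the 20% and 19% figures come from the tweet.

```python
# Toy numbers to illustrate the perception gap.
time_without_ai = 100.0   # minutes per task, control condition (made up)
time_with_ai = 119.0      # minutes per task, AI condition (made up)

measured = (time_with_ai - time_without_ai) / time_without_ai
print(f"measured change in time: {measured:+.0%}")   # +19%, i.e. slower

perceived_speedup = 0.20   # developers estimated they were 20% faster
print(f"perceived change in time: -{perceived_speedup:.0%}")
```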
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
New Anthropic Research: Project Vend. We had Claude run a small shop in our office lunchroom. Here’s how it went.
Giving your models more time to think before prediction, e.g. via smart decoding, chain-of-thought reasoning, latent thoughts, etc., turns out to be quite effective for unlocking the next level of intelligence. New post is here :) “Why we think”: lilianweng.github.io/posts/2025-05-…
Putting It All into Context: Simplifying Agents with LCLMs
Putting all the core code in the context often leads to better performance on SWE-bench than using agent scaffolding.
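A hedged sketch of the baseline the tweet describes: instead of an agent loop with tools, dump the repo's code into one long-context prompt and ask for a patch in a single call. The file filtering and the `lm.generate` interface are assumptions for illustration, not the paper's code.

```python
# "Put everything in context" baseline: one LCLM call over the whole repo.
from pathlib import Path

def build_full_context_prompt(repo_root: str, issue: str) -> str:
    parts = [f"Issue to fix:\n{issue}\n\nRepository files:"]
    for path in sorted(Path(repo_root).rglob("*.py")):
        parts.append(f"\n### {path} ###\n{path.read_text(errors='ignore')}")
    parts.append("\nWrite a unified diff that fixes the issue.")
    return "\n".join(parts)

def solve_with_lclm(lm, repo_root: str, issue: str) -> str:
    # A single model call over the full codebase, no scaffolding or tools.
    return lm.generate(build_full_context_prompt(repo_root, issue))
```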
40% with just 1 try per task: SWE-agent-LM-32B is the new #1 open source model on SWE-bench Verified. We built it by synthesizing a ton of agentic training data from 100+ Python repos. Today we’re open-sourcing the toolkit that made it happen: SWE-smith.
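One plausible recipe for synthesizing SWE-style tasks from real repos, heavily hedged: introduce a small code mutation and keep it as a training task only if a previously passing test suite now fails. This is a guess at the general idea, not the SWE-smith toolkit's actual API.

```python
# Hypothetical task synthesis: a mutation counts as a bug-fix task only if
# it flips the repo's test suite from passing to failing.
import subprocess
from pathlib import Path

def tests_pass(repo_dir: str) -> bool:
    result = subprocess.run(["pytest", "-q"], cwd=repo_dir, capture_output=True)
    return result.returncode == 0

def make_bug_task(repo_dir: str, rel_path: str, original: str, mutated: str) -> bool:
    """Apply a candidate mutation; keep it only if it breaks a passing suite."""
    path = Path(repo_dir) / rel_path
    source = path.read_text()
    if original not in source or not tests_pass(repo_dir):
        return False
    path.write_text(source.replace(original, mutated, 1))
    if tests_pass(repo_dir):        # mutation not caught: revert and discard
        path.write_text(source)
        return False
    return True                     # the failing tests define a fix-the-bug task
```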
I have been very impressed by how much effort Tristan has spent on extensive validation of his perplexity correlation idea, even *after* the paper was accepted. Check out this simple idea that works!
At #ICLR, check out Perplexity Correlations: a statistical framework to select the best pretraining data with no LLM training! I can’t make the trip, but @tatsu_hashimoto will present the poster for us! Continue reading for the latest empirical validations of PPL Correlations:
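A minimal sketch of the idea as the tweet states it: use many existing public models (no new training), and for each candidate pretraining domain, correlate the models' perplexity on that domain with their benchmark scores; favor domains where lower loss tracks higher accuracy. The arrays and the plain Pearson ranking rule are illustrative, not the paper's exact estimator.

```python
# Perplexity-correlations sketch: rank pretraining domains by how strongly
# per-domain loss predicts downstream accuracy across public models.
import numpy as np

def rank_domains(log_ppl: np.ndarray, bench: np.ndarray) -> np.ndarray:
    """
    log_ppl: (n_models, n_domains) log-perplexity of each public model
             on held-out text from each candidate domain.
    bench:   (n_models,) benchmark accuracy of each model.
    Returns domain indices sorted from most to least promising.
    """
    n_domains = log_ppl.shape[1]
    corr = np.array([
        np.corrcoef(log_ppl[:, d], bench)[0, 1] for d in range(n_domains)
    ])
    # Lower loss should go with higher accuracy, so the most negative
    # correlations mark the best pretraining domains.
    return np.argsort(corr)
```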
We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🔎🧵(1/) x.com/OpenAI/status/…
OpenAI o3 and o4-mini openai.com/live/
Today, we're releasing a new paper – One-Minute Video Generation with Test-Time Training. We add TTT layers to a pre-trained Transformer and fine-tune it to generate one-minute Tom and Jerry cartoons with strong temporal consistency. Every video below is produced directly by…
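A very rough sketch of what a test-time-training (TTT) layer does, hedged: the layer's "hidden state" is a small weight matrix that is updated by a gradient step on a self-supervised loss as each token is processed. The reconstruction loss, single inner step, and zero init are simplifications for illustration, not the paper's implementation.

```python
# Toy TTT layer: weights updated online by an inner-loop gradient step.
import torch

def ttt_layer(tokens: torch.Tensor, lr: float = 0.1) -> torch.Tensor:
    """tokens: (seq_len, dim). Returns the transformed sequence."""
    seq_len, dim = tokens.shape
    W = torch.zeros(dim, dim, requires_grad=True)
    outputs = []
    for t in range(seq_len):
        x = tokens[t]
        loss = ((x @ W - x) ** 2).mean()     # self-supervised reconstruction
        (grad,) = torch.autograd.grad(loss, W)
        with torch.no_grad():
            W -= lr * grad                   # one inner-loop step per token
        outputs.append((x @ W).detach())
    return torch.stack(outputs)
```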
Language models learn inefficiently from compressed web text, requiring excessive data. This paper augments pretraining data with inferred "latent thoughts" (reasoning, context) underlying the text, improving data efficiency. Training on text paired with synthetic thoughts…
An LLM generates an article verbatim—did it “train on” the article? It’s complicated: under n-gram definitions of train-set inclusion, LLMs can complete “unseen” texts—both after data deletion and adding “gibberish” data. Our results impact unlearning, MIAs & data transparency🧵
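For concreteness, a minimal sketch of an n-gram definition of train-set inclusion, the kind of definition the thread says can be gamed: a candidate text counts as "in" the corpus only if every one of its n-grams appears there. Whitespace tokenization and n=8 are arbitrary choices for illustration.

```python
# N-gram membership test: is every n-gram of the candidate in the corpus?
def ngrams(text: str, n: int = 8):
    toks = text.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def included(candidate: str, corpus: list[str], n: int = 8) -> bool:
    train_ngrams = set().union(*(ngrams(doc, n) for doc in corpus))
    return ngrams(candidate, n) <= train_ngrams
```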
🚨This week's top AI/ML research papers:
- GPT-4o System Card: Native Image Generation
- Anthropic's On the Biology of an LLM
- Gemma 3 Technical Report
- Qwen2.5-Omni Technical Report
- Reasoning to Learn from Latent Thoughts
- Defeating Prompt Injections by Design
- Scaling…