Yiding Jiang
@yidingjiang
PhD student @mldcmu @SCSatCMU. Formerly intern @MetaAI, AI resident @GoogleAI. BS from @Berkeley_EECS. Trying to understand stuff.
Selecting good pretraining data is crucial, but rarely economical. Introducing ADO, an online solution to data selection with minimal overhead. 🧵 1/n

Today @ChenHenryWu and I will be presenting our #ICML work on creativity in the Oral 3A Reasoning session (West Exhibition Hall C), 10-11 am PT. Or stop by our poster right after at East Exhibition Hall A-B #E-2505, 11 am-1:30 pm. (Hope you enjoy some silly human drawings!)
On Monday, I'll be presenting a tutorial on jailbreaking LLMs + the security of AI agents with @HamedSHassani and @aminkarbasi at ICML. I'll be in Vancouver all week -- send me a DM if you'd like to chat about jailbreaking, AI agents, robots, distillation, or anything else!
My favorite reading of the week, by @yidingjiang: the next era is not about learning from data but about deciding what data to learn from. yidingjiang.github.io/blog/post/expl…
Good blog on the "era of exploration":
- Data scarcity is the new bottleneck. LLMs consume data far faster than humans can produce it. We're running out of high-quality training data.
- Pretraining solved exploration by accident. Pretraining effectively pays a massive, upfront…
Recently, there has been a lot of talk of LLM agents automating ML research itself. If Llama 5 can create Llama 6, then surely the singularity is just around the corner. How can we get a pulse check on whether current LLMs are capable of driving this kind of total…
A mental model I find useful: all data acquisition (web scrapes, synthetic data, RL rollouts, etc.) is really an exploration problem 🔍. This perspective has some interesting implications for where AI is heading. Wrote down some thoughts: yidingjiang.github.io/blog/post/expl…
Prequential coding is such a lovely lens for thinking about curriculum learning.
Data selection and curriculum learning can be formally viewed as a compression protocol via prequential coding. New blog (with @AllanZhou17) about this neat idea that motivated ADO but didn’t make it into the paper. yidingjiang.github.io/blog/post/curr…
How should we order training examples? In a new blogpost (w/ @yidingjiang), we explore a compression-based perspective: order your dataset to minimize its prequential codelength.
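For intuition, here is a minimal sketch (my own illustration, not code from the blog post) of what a prequential codelength looks like: stream the data in a chosen order, charge the model's current log-loss on each batch before training on it, and sum those costs. Orderings that help the model learn faster make later batches cheaper to encode, so they yield shorter codes.

```python
import torch
import torch.nn.functional as F

def prequential_codelength(model, optimizer, ordered_batches):
    """Codelength (in nats) of the data under a prequential code:
    each batch is first scored by the current model (its encoding cost),
    then used for one training step, so both sender and receiver improve
    as the stream goes on."""
    total_nats = 0.0
    for inputs, targets in ordered_batches:
        # 1. Encode: pay the current log-loss on the not-yet-seen batch.
        with torch.no_grad():
            logits = model(inputs)
            total_nats += F.cross_entropy(logits, targets, reduction="sum").item()
        # 2. Update: train on the batch so later data is cheaper to encode.
        optimizer.zero_grad()
        loss = F.cross_entropy(model(inputs), targets)
        loss.backward()
        optimizer.step()
    return total_nats  # different orderings / curricula give different totals
```

Comparing the totals for two orderings of the same dataset is then a concrete way to say one curriculum "compresses better" than another.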
✨ Love 4o-style image generation but prefer to use Midjourney? Tired of manual prompt crafting from inspo images? PRISM to the rescue! 🖼️→📝→🖼️ We automate black-box prompt engineering—no training, no embeddings, just accurate, readable prompts from your inspo images! 1/🧵
Looking beyond the next token: TRELAWNEY inserts future tokens <T>...</T> during training to teach models to plan ahead, boosting reasoning, coherence, and control.
Highlights:
- NO ARCHITECTURE CHANGES. JUST SMARTER DATA.
- works with standard decoding
- enables controllable…
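A rough sketch of how I read the data transformation described above; the span-selection heuristic, tag placement, and function name are my own guesses for illustration, not the paper's exact recipe:

```python
import random

T_OPEN, T_CLOSE = "<T>", "</T>"  # special tokens marking a glimpse of the future

def insert_future_span(tokens, span_len=3, rng=random):
    """Toy version of the <T>...</T> idea: copy a short span from later in
    the sequence and splice it in earlier, wrapped in tags, so the model is
    trained to condition on (and eventually propose) a future target."""
    if len(tokens) < 2 * span_len:
        return tokens
    insert_at = rng.randrange(1, len(tokens) - span_len)          # where the hint goes
    start = rng.randrange(insert_at, len(tokens) - span_len + 1)  # which future span to reveal
    future = tokens[start:start + span_len]
    return tokens[:insert_at] + [T_OPEN] + future + [T_CLOSE] + tokens[insert_at:]

print(insert_future_span("the quick brown fox jumps over the lazy dog".split()))
```

Because the change lives entirely in the training data, a model trained this way can still be decoded with standard left-to-right sampling.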
Excited to be presenting ADO next week at #ICLR2025! Check out a new blogpost we wrote that summarizes the key ideas and results (link below):
Check out our online data selection algorithm ADO at ICLR 2025! And take a look at this blog post by @yidingjiang and @AllanZhou17 summarizing the key ideas: bland.website/notes/ado/
Are current reasoning models optimal for test-time scaling? 🌠 No! Models make the same incorrect guess over and over again. We show that you can fix this problem w/o any crazy tricks 💫 – just do weight ensembling (WiSE-FT) for big gains on math! 1/N
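For context, WiSE-FT-style weight ensembling is just a convex combination of two checkpoints of the same architecture, parameter by parameter. A minimal PyTorch sketch (the checkpoint paths and the mixing coefficient below are placeholders, not the paper's settings):

```python
import torch

def wise_ft(state_dict_a, state_dict_b, alpha=0.5):
    """Weight-space ensemble (WiSE-FT): interpolate two checkpoints of the
    same architecture. alpha=0 returns model A, alpha=1 returns model B."""
    return {k: (1 - alpha) * state_dict_a[k] + alpha * state_dict_b[k]
            for k in state_dict_a}

# Hypothetical usage:
# base = torch.load("base_model.pt")
# tuned = torch.load("reasoning_finetuned.pt")
# model.load_state_dict(wise_ft(base, tuned, alpha=0.7))
```

The appeal is that it adds no inference cost: you ensemble once in weight space and then sample from a single model as usual.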