Andrew Ilyas
@andrew_ilyas
Stein Fellow @ Stanford Stats (current) | Assistant Prof @ CMU (incoming) | PhD @ MIT (prev)
“How will my model behave if I change the training data?” Recent(-ish) work w/ @logan_engstrom: we nearly *perfectly* predict ML model behavior as a function of training data, saturating benchmarks for this problem (called “data attribution”).
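A minimal sketch of what "predicting model behavior as a function of training data" can mean in practice. It assumes a hypothetical black-box `train_and_eval(mask)` that trains on the subset selected by a 0/1 mask and returns one number describing the resulting model. It scores each example with a simple difference-in-means subsampling estimator. This is not the method from the work above, and every name in it is illustrative.

```python
import itertools
import random


def all_masks(n):
    """Every inclusion pattern for n training examples (only feasible for small n)."""
    return [list(m) for m in itertools.product([0, 1], repeat=n)]


def random_masks(n, num, p=0.5, seed=0):
    """num random training subsets; each example is kept with probability p."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for _ in range(n)] for _ in range(num)]


def attribution_scores(train_and_eval, n, masks):
    """Score example i as: mean output over subsets that include i
    minus mean output over subsets that exclude i (NaN if either set is empty)."""
    outputs = [train_and_eval(m) for m in masks]
    scores = []
    for i in range(n):
        with_i = [o for m, o in zip(masks, outputs) if m[i]]
        without_i = [o for m, o in zip(masks, outputs) if not m[i]]
        if not with_i or not without_i:
            scores.append(float("nan"))
            continue
        scores.append(sum(with_i) / len(with_i) - sum(without_i) / len(without_i))
    return scores


if __name__ == "__main__":
    # Hypothetical toy "model": it predicts the mean of the labels it was trained on.
    labels = [1.0, 2.0, 10.0, 3.0]

    def train_and_eval(mask):
        kept = [y for y, keep in zip(labels, mask) if keep]
        return sum(kept) / len(kept) if kept else 0.0

    masks = random_masks(len(labels), num=2000, seed=0)
    print(attribution_scores(train_and_eval, len(labels), masks))
```

In the toy run you would expect the example with label 10.0 to receive the largest positive score, because including it pulls the predicted mean up in every subset. Real models need many retrainings per estimate, which is why cheaper, more accurate attribution methods matter.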

🚨Past work shows: dropping just 0.1% of the data can change the conclusions of important studies. We show: Many approximations can fail to catch this. 📢Check out our new TMLR paper (w/ David Burt, @ShenRaphael, Tin Nguyen, and @ta_broderick) 👇 openreview.net/forum?id=m6EQ6…
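A toy illustration of how a cheap approximation can miss the effect of dropping data. This is a minimal sketch under stated assumptions, not the paper's code. It fits an ordinary least-squares line and ranks points by a first-order (influence-style) estimate of how removing each one changes the slope. It then drops the top k points and compares the summed linear prediction with an exact refit. The function names and data are hypothetical.

```python
def ols_fit(xs, ys):
    """Least-squares line y = a + b * x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)  # must be nonzero
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def approx_slope_changes(xs, ys):
    """First-order estimate of the slope change from dropping each point:
    -(x_i - x_bar) * residual_i / Sxx. The exact leave-one-out change also
    divides by (1 - leverage_i); this approximation drops that factor."""
    a, b = ols_fit(xs, ys)
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [-(x - x_bar) * (y - (a + b * x)) / sxx for x, y in zip(xs, ys)]


def drop_k_check(xs, ys, k):
    """Drop the k points predicted to lower the slope most (summing their
    first-order changes), then compare that prediction with an exact refit."""
    _, slope = ols_fit(xs, ys)
    changes = approx_slope_changes(xs, ys)
    order = sorted(range(len(xs)), key=lambda i: changes[i])  # most negative first
    dropped = sorted(order[:k])
    approx = slope + sum(changes[i] for i in dropped)
    dropped_set = set(dropped)
    keep = [i for i in range(len(xs)) if i not in dropped_set]
    _, exact = ols_fit([xs[i] for i in keep], [ys[i] for i in keep])
    return slope, approx, exact, dropped


if __name__ == "__main__":
    # Toy data: one high-leverage point (x = 10) carries the whole positive trend.
    xs = [0.0, 1.0, 2.0, 3.0, 10.0]
    ys = [0.0, 0.0, 0.0, 0.0, 10.0]
    print(drop_k_check(xs, ys, k=1))
```

On this toy data the dropped point has leverage close to 1. The first-order prediction therefore keeps the slope near 1.01, while the exact refit without that point gives a slope of 0.0.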
This work has been a long time coming and I'm so grateful to my collaborators for helping make this work possible. Main takeaway: AI supply chains matter! We've seen them emerge (rapidly) in the past few years and they will have implications for *all* of us, inside and outside…
Building AI systems is now a fragmented process spanning multiple organizations & entities. In new work (w/ @aspenkhopkins @cen_sarah @andrew_ilyas @imstruckman @LVidegaray), we study the implications of these emerging networks → what we call *AI supply chains* 🧵
(1/2) Successfully defended my thesis today! I was incredibly fortunate to work with my amazing supervisors Yaoliang Yu, Sun Sun, and @thegautamkamath. Huge thanks also to my committee members @elliot_creager, @hongyangzh, and @SebastienGambs.
📢Sign up to attend and present a poster/spotlight talk at the Summer Workshop on Collaborative Learning and Data Sharing! 🌟 Hear from speakers with experience spanning ML, theory, databases, law, policy, and more, including Yiling Chen, @raulcfernandez, Niva Elkin-Koren,…
Ever wondered which data from large datasets (like OXE) actually helps when training/tuning a policy for specific tasks? We present DataMIL, a framework for measuring how each training sample influences policy performance, hence enabling effective data selection 🧵
Was really fun working on this project, and not just because of the results: huge shoutout to @logan_engstrom (who is graduating soon 👀) and our amazing **undergraduate student** Ben Chen!
Want state-of-the-art data curation, data poisoning & more? Just do gradient descent! w/ @andrew_ilyas Ben Chen @axel_s_feldmann @wsmoses @aleks_madry: we show how to optimize final model loss wrt any continuous variable. Key idea: Metagradients (grads through model training)
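Metagradients are gradients of something measured after training (here, validation loss) with respect to something that controls training (here, a weight on each training example). The sketch below is a minimal toy and not the authors' implementation. It assumes a 1-D linear model y ≈ w * x trained by full-batch gradient descent and carries the derivative of w with respect to each example weight forward through every update by hand. All names, the step size, and the data are hypothetical.

```python
def train(xs, ys, weights, lr=0.1, steps=20, w0=0.0):
    """Full-batch gradient descent on L(w) = (1/n) * sum_i c_i * (w*x_i - y_i)**2
    for the model y ~ w * x. Also carries dw/dc_i forward through every step."""
    n = len(xs)
    w = w0
    dw = [0.0] * n  # derivative of the current w with respect to each weight c_i
    for _ in range(steps):
        grad = 2.0 / n * sum(c * (w * x - y) * x for c, x, y in zip(weights, xs, ys))
        hess = 2.0 / n * sum(c * x * x for c, x in zip(weights, xs))
        # Chain rule through w_new = w - lr * grad(w, c), using the pre-update w.
        dw = [d - lr * (hess * d + 2.0 / n * (w * x - y) * x)
              for d, x, y in zip(dw, xs, ys)]
        w = w - lr * grad
    return w, dw


def val_loss(w, xs, ys):
    """Mean squared error of y ~ w * x on held-out data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def metagradient(train_x, train_y, weights, val_x, val_y, **train_kwargs):
    """Gradient of post-training validation loss w.r.t. each training-example weight."""
    w, dw = train(train_x, train_y, weights, **train_kwargs)
    dval_dw = 2.0 / len(val_x) * sum((w * x - y) * x for x, y in zip(val_x, val_y))
    return [dval_dw * d for d in dw]


if __name__ == "__main__":
    train_x, train_y = [1.0, 2.0, 3.0], [2.0, 4.1, 5.0]  # last label is "noisy"
    val_x, val_y = [1.5, 2.5], [3.0, 5.0]
    print(metagradient(train_x, train_y, [1.0, 1.0, 1.0], val_x, val_y))
```

A negative entry means that upweighting that example is predicted to lower validation loss. A gradient step on the weights would follow that signal for curation; flipping the objective would give a poisoning direction. This forward-mode toy carries one derivative per example, so it does not scale. At realistic sizes one would typically differentiate in reverse mode instead.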