Fanny Yang
@FannyYangETH
Assistant Professor @CSatETH Zurich and @ETH_AI_Center, heading the Statistical Machine Learning group. Postdoc @Stanford. PhD @Berkeley_EECS. Violinist.
For those at #ICLR working on memorization, privacy, or copyright: check out @JavierAbadM's oral and poster presentations today on a simple method that prevents LLMs from completing copyrighted works.
Presenting our work at #ICLR this week! Come by the poster or oral session to chat about copyright protection and AI/LLM safety 📌 𝐏𝐨𝐬𝐭𝐞𝐫: Friday, 10 a.m.–12:30 p.m. | Booth 537 📌 𝐎𝐫𝐚𝐥: Friday, 3:30–5 p.m. | Room Peridot @FraPintoML @DonhauserKonst @FannyYangETH
Very nice empirical work by @FraPintoML @yaxi_hu @AmartyaSanyal while they were in our group: a simple method that leverages public unlabeled data to make semi-private learning more effective in practice
1/n 🧵Happy to be presenting our recent work on Semi-Private Learning at @satml_conf in #Toronto @UofT with @yaxi_hu @AmartyaSanyal @FannyYangETH. Work done at @ETH_AI_Center @MPI_IS 📜Paper: openreview.net/forum?id=Ps1IH… 🧑💻Code: github.com/FrancescoPinto…
Register now (first-come first-served) for the "Math of Trustworthy ML workshop" at #LagoMaggiore, Switzerland, Oct 12-16 this year, with a great speaker lineup and the opportunity to present your work in a poster session or as a contributed talk. Details @ mmlworkshop.ethz.ch

If you're at #ICLR in Singapore and interested in treatment effect estimation and causal inference, try to catch @pdebartols from our group today. He has published a series of interesting papers exploring how to leverage multiple data sources to enhance causal inference
Landed in Singapore for #ICLR—excited to see old & new friends! I’ll be presenting: 📌 RAMEN @ Main Conference on Saturday 10 am (@JavierAbadM @yixinwang_ @FannyYangETH) 📌 Causal Lifting @ XAI4Science Workshop on Sunday (@riccardocadeii @ilkerdemirel_ @FrancescoLocat8 )
x.com/AerniMichael/s… I'm also excited to present this paper about LLMs inadvertently leaking training data on Thursday afternoon (tomorrow!)
LLMs may be copying training data in everyday conversations with users! In our latest work, we study how often this happens compared to humans. 👇🧵
Interested in theory for high-dimensional statistical problems? The workshop "Youth in High Dimensions" in #Trieste July 7-9 has an amazing speaker lineup and offers great networking opportunities indico.ictp.it/event/10849 - co-organized w/ @MMondelli @sebastiangoldt and Jean Barbier
Eager to hear feedback from anyone who applies causal inference on this recent work with an amazing group of people: @pdebartols @JavierAbadM @Guanbo17 @DonhauserKonst @RayDuch and Issa Dahabreh.
Looking for a more efficient way to estimate treatment effects in your randomized experiment? We introduce H-AIPW: a novel estimator that combines predictions from multiple foundation models with real experimental data. arxiv.org/abs/2502.04262
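The combination idea can be illustrated with a toy sketch. This is not the paper's actual H-AIPW estimator (see arxiv.org/abs/2502.04262 for that): it is a minimal, assumed version in which each "foundation model" is just a pair of outcome predictors plugged into the standard AIPW score for a randomized experiment with known propensity 0.5, and the per-predictor estimates are combined by inverse-variance weighting. All function and variable names here are hypothetical.

```python
import numpy as np

# Simulate a small randomized experiment with a true average
# treatment effect (ATE) of 2.0.
rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n)          # fair-coin treatment assignment
y = 2.0 * t + x + rng.normal(size=n)    # outcome; true ATE = 2

# Two hypothetical "foundation model" outcome predictors (mu1, mu0):
# one fairly accurate, one uninformative.
predictors = [
    (1.9 + x, 0.0 + x),
    (np.zeros(n), np.zeros(n)),
]

# One AIPW estimate per predictor; the propensity 0.5 is known by design.
estimates, variances = [], []
for mu1, mu0 in predictors:
    psi = (mu1 - mu0
           + t * (y - mu1) / 0.5
           - (1 - t) * (y - mu0) / 0.5)  # per-sample AIPW score
    estimates.append(psi.mean())
    variances.append(psi.var() / n)

# Combine the estimates with inverse-variance weights, so the more
# accurate predictor (smaller variance) dominates.
w = 1.0 / np.array(variances)
tau_hat = float(np.sum(w * np.array(estimates)) / w.sum())
print(tau_hat)  # close to the true ATE of 2.0
```

Because the AIPW score is unbiased for any choice of outcome predictor in a randomized experiment, even the uninformative predictor does not bias the combined estimate; it just receives a small weight.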
How robustly can we predict when multiple environments are not heterogeneous enough? Invariance-based methods rank very differently in the setting of "partially identifiable" robust risks - more details from Julia Kostin @ 11am PST, East Exhibit Hall A-C #4606 #NeurIPS2024
Join us at LeT-All's NeurIPS Social on Thursday 7:30pm and chat with awesome mentors like @aminkarbasi , Cristóbal Guzmán, @gavinrbrown1 , @vkontonis , @subGaussian (Ankit Pensia), @ramyavinayak , @MAliakbarpour , Erin Grant, @firebat03 (Josh Alman), @SurbhiGoel_ , Vatsal Sharan
📢 Join us at #NeurIPS2024 for an in-person LeT-All mentorship event! 📅 When: Thurs, Dec 12 | 7:30-9:30 PM PST 🔥 What: Fireside chat w/ Misha Belkin (UCSD) on Learning Theory Research in the Era of LLMs, + mentoring tables w/ amazing mentors. Don’t miss it if you’re at NeurIPS!
What's the cost of representing small subgroups when the outlier proportion may be larger? Find out from @dmitrievdaniil7 at the poster session Wed, 4:30 pm PT, #5710 #NeurIPS2024. Joint with @rares_buhai S. Tiegel, @alexxwolters, G. Novikov, @AmartyaSanyal, D. Steurer
Excited to present at #NeurIPS2024 our work on robust mixture learning! How hard is mixture learning when (a lot of) outliers are present? We show that it's easier than it seems! Join us at the poster session (Wed, 16:30 PT, West Ballroom A-D #5710).
Watch this special debate live tomorrow at 10:30 a.m. PT — part of our workshop on Unknown Futures of Generalization. Register to attend in person, access the livestream, or view the recording before it's captioned for publication: simons.berkeley.edu/web-registrati…
They even made a dramatic poster to go with it 🤣
Tomorrow morning at the @SimonsInstitute : Sparks versus embers, can LLMs solve major open mathematical conjectures? (FWIW I agree with everything in the Embers paper, so I guess the debate will be about the conclusions to draw from current evidence!) simons.berkeley.edu/talks/sebastie…
If you're interested in working on mathematical foundations for trustworthy machine learning, consider applying to this fellowship @ETH_AI_Center!
The #ETHAICenter application is now open! Interested in doing research on interdisciplinary AI topics? Join our Fellowship programs: APPLY by 19 November 2024: ai.ethz.ch/apply #PhD #PhDProgram #MachineLearning #AI #BigData #DataScience #DeepLearning #PostDoc
If you're working in the area of statistics for machine learning, consider submitting to this #NeurIPS workshop by September 13!
📣Announcing the 2024 NeurIPS Workshop on Statistical Frontiers in LLMs and Foundation Models 📣 Submissions open now, deadline September 15th sites.google.com/berkeley.edu/b… If your work intersects with statistics and black-box models, please submit! This includes: ✅ Bias ✅…
Excited to be at #ICML2024 to present on Thurs morning (11:30am, Hall C4-9, Poster 2217) our work on FRAPPÉ, a generic procedure for turning any multi-task objective into a modular bilevel optimization problem, with implications for fairness, alignment, and multi-domain learning.
Very happy that this paper was accepted to #COLT2024 on the separation between private and non-private online learnability. Joint work with @dmitrievdaniil7 and Kristof Szabo. A few open problems still remain to fully settle this question :)
Excited about this new preprint with Daniil and @Kristo arxiv.org/abs/2402.16778 We show an Ω(log T) lower bound for differentially private online learning, even for finite Littlestone classes. This shows a separation between DP and non-DP online learning in the mistake-bound model.
📣 Postdoctoral fellowships, up to 24 months to conduct cutting-edge research in Europe. If you would like the Rational Intelligence Lab @CISPA to host you, reach out to me directly. 🤝 🗓️ Deadline: 11 September 2024 🙏 RT please 🥺 …sklodowska-curie-actions.ec.europa.eu/news/msca-open…
If you're @aistats_conf and interested in causal inference, feel free to talk to my wonderful students @pdebartols & @JavierAbadM at the poster session today!
Come to our AISTATS poster (#96) this afternoon (5-7pm) to learn more about hidden confounding!
Michael and Jie did an amazing job on their first PhD project, finding and fixing common pitfalls in empirical ML privacy evaluations. It turns out that, if you evaluate things properly, DP-SGD is also the best *heuristic* defense when you instantiate it with large epsilon values.
Heuristic privacy defenses claim to outperform DP-SGD in real-world settings. With no guarantees, can we trust them? We find that existing evaluations can underestimate privacy leakage by orders of magnitude! Surprisingly, high-accuracy DP-SGD (ϵ >> 1000) still wins. 🧵
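For readers unfamiliar with the mechanism the thread refers to: DP-SGD differs from plain SGD in one aggregation step, which clips each per-example gradient and adds Gaussian noise before averaging. The sketch below is a minimal, assumed illustration of that step (in the style of Abadi et al.), not the evaluation code from this paper; the function name and the clip_norm/noise_mult values are hypothetical and chosen for readability.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_mult=1.1, rng=None):
    """One DP-SGD gradient aggregation step: clip each per-example
    gradient to L2 norm clip_norm, sum, add Gaussian noise with
    std noise_mult * clip_norm, then average over the batch."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds clip_norm.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

# Noise turned off (noise_mult=0) to make the clipping visible:
grads = [np.array([3.0, 4.0]),   # norm 5 -> clipped to [0.6, 0.8]
         np.array([0.3, 0.4])]   # norm 0.5 -> left untouched
g = dp_sgd_step(grads, clip_norm=1.0, noise_mult=0.0)
print(g)  # average of the clipped gradients: [0.45, 0.6]
```

The "heuristic" use discussed in the thread keeps exactly this mechanism but runs it with noise so small that the formal ϵ guarantee becomes vacuous (ϵ >> 1000), trading the provable bound for accuracy.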