Elvis Dohmatob
@dohmatobelvis
Professor of CSC at @Concordia (CRC chair) & @Mila_Quebec. Visiting prof @AIatMeta. Previously @AIatMeta, @criteo, @inria. Interested in the principles of ML.
Papers accepted at @iclr_conf 2025:
- An Effective Theory of Bias Amplification arxiv.org/abs/2410.17263
- Pitfalls of Memorization arxiv.org/abs/2412.07684
- Strong Model Collapse arxiv.org/abs/2410.04840
- Beyond Model Collapse arxiv.org/abs/2406.07515
With @KempeLab,…
I'll also be giving an invited talk at the associative memory workshop tomorrow (Saturday) at 9:30, nfam.vizhub.ai/schedule/, on provable scaling laws for a toy LLM built around associative memories.
If you're attending #ICLR25 @iclr_conf, consider stopping by to chat about some of our recent works (listed in the comments). On a personal note, I'm also looking forward to chatting with prospective PhD students!
Simple identities with deep consequences: (1) ReLU(x) - ReLU(-x) = x for all x in R. (2) sum_{i=1}^j i*(-1)^{j-i} = floor((j+1)/2). Using these, it can be shown that the k-sparse parity function is exactly representable by a 2-layer ReLU network of width k+3 = O(k).
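A quick numerical sanity check of the two identities (a minimal NumPy sketch; the network construction itself is not reproduced here):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Identity (1): ReLU(x) - ReLU(-x) = x for all real x.
x = np.random.default_rng(0).standard_normal(1000)
assert np.allclose(relu(x) - relu(-x), x)

# Identity (2): sum_{i=1}^j i*(-1)^(j-i) = floor((j+1)/2).
for j in range(1, 51):
    lhs = sum(i * (-1) ** (j - i) for i in range(1, j + 1))
    assert lhs == (j + 1) // 2

print("both identities check out")
```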
Mathematical foundations of neural scaling laws: Here are slides for lectures I gave at the recent MLSS 2025 summer school in Dakar. Every LLM theorist and practitioner should know these things! The summer school: mlss-senegal.github.io My slides: drive.google.com/drive/folders/…
Back from MLSS Senegal 🇸🇳, where I had the honor of giving lectures on differentiable programming. Really grateful for all the amazing people I got to meet 🙏 My slides are here: github.com/diffprog/slide…
@dohmatobelvis explaining the math behind scaling laws #MlssSenegal2025
If you use "AI agents" (LLMs that call tools) you need to be aware of the Lethal Trifecta Any time you combine access to private data with exposure to untrusted content and the ability to externally communicate an attacker can trick the system into stealing your data!
Stein's Lemma: E[f'(x)] = E[f(x)x] for x ~ N(0,1). Corollary (Gaussian integration by parts): E[g'(x)h(x)] = E[g(x)(xh(x) - h'(x))]. Proof. Take f(x):=g(x)*h(x) and observe that f'(x)=g(x)h'(x)+g'(x)h(x). We deduce: E[g(x)h'(x)+g'(x)h(x)]=E[g(x)h(x)x] and the result follows. QED
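A quick Monte Carlo check of the corollary; the choices g(x) = exp(x) and h(x) = x^2 below are arbitrary test functions, not part of the statement:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(2_000_000)  # x ~ N(0, 1)

# g(x) = exp(x), g'(x) = exp(x); h(x) = x^2, h'(x) = 2x
lhs = np.mean(np.exp(x) * x**2)                # E[g'(x) h(x)]
rhs = np.mean(np.exp(x) * (x * x**2 - 2 * x))  # E[g(x) (x h(x) - h'(x))]
print(lhs, rhs)  # both estimates should be close to 2*sqrt(e) ~ 3.30
```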
Superposition means that models represent more features than they have dimensions, which is true for LLMs since there are too many things to represent in language. We find that superposition leads to a power-law decay of the loss with width, which gives rise to the observed neural scaling law. (1/n)
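As a toy illustration of the premise (not the model analyzed in the paper): pack m random unit "feature" directions into d < m dimensions; near-orthogonality keeps interference small, so sparse combinations of features can still be read back approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 64, 512  # d dimensions, m >> d features (hypothetical sizes)

# Random unit vectors as feature directions: more features than dimensions.
W = rng.standard_normal((m, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Pairwise interference (off-diagonal overlaps) is small but nonzero.
G = W @ W.T
np.fill_diagonal(G, 0.0)
print("max interference:", np.abs(G).max())  # well below 1, and shrinks as d grows

# A sparse activation over the m features can still be decoded approximately.
x = np.zeros(m)
x[rng.choice(m, size=5, replace=False)] = 1.0  # 5 active features
h = x @ W                                      # compressed d-dim representation
x_hat = h @ W.T                                # naive linear readout
print("readout on active features:", x_hat[x == 1].round(2))  # all close to 1
```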
[Bored] I came across a YT video where someone said the probability that a needle of length 1 unit, dropped on an infinitely wide piece of paper with parallel lines uniformly spaced 1 unit apart, crosses a line is 2/pi. Here is my quick and dirty proof. Unfortunately, it makes use of calculus :/
![Proof sketch](https://pbs.twimg.com/media/GqBeqQtXsAAgJqH.jpg)
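For a calculus-free sanity check, here is a quick Monte Carlo simulation of the drop (assuming the standard Buffon's-needle setup, with needle length equal to the line spacing):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000_000

# Distance from the needle's midpoint to the nearest line, and the needle's angle.
dist = rng.uniform(0.0, 0.5, n)          # lines are 1 unit apart
theta = rng.uniform(0.0, np.pi / 2, n)   # angle between needle and the lines

crosses = dist <= 0.5 * np.sin(theta)    # needle of length 1 crosses a line
print(crosses.mean(), 2 / np.pi)         # both ~0.6366
```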
Our work on beating scaling laws via deliberate practice, a modified synthetic data-generation scheme wherein harder / more entropic examples are favored, has been accepted at @icmlconf #ICML2025.
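For flavor only, a minimal sketch of one way to favor harder / more entropic examples; this is an illustrative stand-in, not the scheme from the paper, and all names below are hypothetical:

```python
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy of each row of a (batch, classes) probability matrix."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def select_hard_examples(candidate_probs: np.ndarray, keep_frac: float = 0.25) -> np.ndarray:
    """Indices of the most entropic (hardest) candidates -- a toy stand-in for
    'favor harder examples', not the paper's method."""
    ent = predictive_entropy(candidate_probs)
    k = max(1, int(keep_frac * len(ent)))
    return np.argsort(ent)[-k:]

# Hypothetical usage: probs would come from the current model evaluated on
# freshly generated synthetic examples.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=1000)
hard_idx = select_hard_examples(probs)
print(len(hard_idx), "examples kept for the next training round")
```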
Deliberate practice is accepted to #ICML2025 as a spotlight (top 2.6%!) 🚀
1) New preprint! arxiv.org/abs/2504.10754 Most of ML theory (i.e., fine-grained analysis to explain / reveal interesting phenomena) can be automated, e.g., via free probability theory. In our recent work with Arjun Subramonian, we provide a small, lightweight tool to do just this.
We refused to cite the paper due to severe misconduct by its authors: plagiarism of our own prior work, predominantly AI-generated content (yes, the authors plugged our paper into an LLM and generated another paper), IRB violations, etc. Revealed during a long…
Jesus Christ... openreview.net/forum?id=et5l9…
Job Alert: Are you thinking of doing a PhD in ML with a theoretical/algorithmic flavor? I'm hiring talented and passionate students to work with me on: ML theory; Neural scaling laws; Synthetic data (the good, the bad and the ugly); Explainable and trustworthy AI (adversarial…