Elvis Dohmatob
@dohmatobelvis
Professor of CSC at @Concordia (CRC chair) & @Mila_Quebec. Visiting prof @AIatMeta. Previously @AIatMeta, @criteo, @inria. Interested in the principles of ML.
Papers accepted at @iclr_conf 2025:
- An Effective Theory of Bias Amplification arxiv.org/abs/2410.17263
- Pitfalls of Memorization arxiv.org/abs/2412.07684
- Strong Model Collapse arxiv.org/abs/2410.04840
- Beyond Model Collapse arxiv.org/abs/2406.07515
With @KempeLab,…
I'll also be giving an invited talk at the associative memory workshop tomorrow (Saturday) at 9:30, nfam.vizhub.ai/schedule/, on provable scaling laws for a toy LLM built around associative memories.
If you're attending #ICLR25 @iclr_conf, consider stopping by to chat about some of our recent works (listed in the comments). On a personal note, I'm also looking forward to chatting with prospective PhD students!
Simple identities with deep consequences: (1) ReLU(x) - ReLU(-x) = x for all x in R. (2) sum_{i=1}^j i*(-1)^{j-i} = floor((j+1)/2). Using these, it can be shown that the k-sparse parity function is exactly representable by a 2-layer ReLU network of width k+3 = O(k).
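A quick numerical sanity check of the two identities (a minimal NumPy sketch; the network construction itself is not reproduced here):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Identity (1): ReLU(x) - ReLU(-x) = x for all real x.
x = np.random.default_rng(0).standard_normal(1000)
assert np.allclose(relu(x) - relu(-x), x)

# Identity (2): sum_{i=1}^j i*(-1)^(j-i) = floor((j+1)/2).
for j in range(1, 51):
    lhs = sum(i * (-1) ** (j - i) for i in range(1, j + 1))
    assert lhs == (j + 1) // 2

print("both identities check out")
```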
Mathematical foundations of neural scaling laws: Here are slides for lectures I gave at the recent MLSS 2025 summer school in Dakar. Every LLM theorist and practitioner should know these things! The summer school: mlss-senegal.github.io My slides: drive.google.com/drive/folders/…
Back from MLSS Senegal 🇸🇳, where I had the honor of giving lectures on differentiable programming. Really grateful for all the amazing people I got to meet 🙏 My slides are here: github.com/diffprog/slide…
@dohmatobelvis explaining the math behind scaling laws #MlssSenegal2025
If you use "AI agents" (LLMs that call tools) you need to be aware of the Lethal Trifecta Any time you combine access to private data with exposure to untrusted content and the ability to externally communicate an attacker can trick the system into stealing your data!
Stein's Lemma: E[f'(x)] = E[f(x)x] for x ~ N(0,1). Corollary (Gaussian integration by parts): E[g'(x)h(x)] = E[g(x)(xh(x) - h'(x))]. Proof. Take f(x):=g(x)*h(x) and observe that f'(x)=g(x)h'(x)+g'(x)h(x). We deduce: E[g(x)h'(x)+g'(x)h(x)]=E[g(x)h(x)x] and the result follows. QED
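A quick Monte Carlo check of the corollary; the choices g(x) = exp(x) and h(x) = x^2 below are arbitrary test functions, not part of the statement:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(2_000_000)  # x ~ N(0, 1)

# g(x) = exp(x), g'(x) = exp(x); h(x) = x^2, h'(x) = 2x
lhs = np.mean(np.exp(x) * x**2)                # E[g'(x) h(x)]
rhs = np.mean(np.exp(x) * (x * x**2 - 2 * x))  # E[g(x) (x h(x) - h'(x))]
print(lhs, rhs)  # both estimates should be close to 2*sqrt(e) ~ 3.30
```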
Superposition means that models represent more features than they have dimensions, which is true for LLMs since there are too many things to represent in language. We find that superposition leads to a power-law decay of the loss with width, which gives rise to the observed neural scaling law. (1/n)
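As a toy illustration of the premise (not the model analyzed in the paper): pack m random unit "feature" directions into d < m dimensions; near-orthogonality keeps interference small, so sparse combinations of features can still be read back approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 64, 512  # d dimensions, m >> d features (hypothetical sizes)

# Random unit vectors as feature directions: more features than dimensions.
W = rng.standard_normal((m, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Pairwise interference (off-diagonal overlaps) is small but nonzero.
G = W @ W.T
np.fill_diagonal(G, 0.0)
print("max interference:", np.abs(G).max())  # well below 1, and shrinks as d grows

# A sparse activation over the m features can still be decoded approximately.
x = np.zeros(m)
x[rng.choice(m, size=5, replace=False)] = 1.0  # 5 active features
h = x @ W                                      # compressed d-dim representation
x_hat = h @ W.T                                # naive linear readout
print("readout on active features:", x_hat[x == 1].round(2))  # all close to 1
```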
[Bored] I came across a YT video where someone said the probability that a needle of length 1 unit, dropped on an infinitely wide piece of paper with parallel lines uniformly spaced 1 unit apart, crosses a line is 2/pi. Here is my quick and dirty proof. Unfortunately, it makes use of calculus :/
![Proof sketch](https://pbs.twimg.com/media/GqBeqQtXsAAgJqH.jpg)
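For a calculus-free sanity check, here is a quick Monte Carlo simulation of the drop (assuming the standard Buffon's-needle setup, with needle length equal to the line spacing):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000_000

# Distance from the needle's midpoint to the nearest line, and the needle's angle.
dist = rng.uniform(0.0, 0.5, n)          # lines are 1 unit apart
theta = rng.uniform(0.0, np.pi / 2, n)   # angle between needle and the lines

crosses = dist <= 0.5 * np.sin(theta)    # needle of length 1 crosses a line
print(crosses.mean(), 2 / np.pi)         # both ~0.6366
```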
Our work on beating scaling laws via deliberate practice, a modified synthetic data-generation scheme wherein harder / more entropic examples are favored, has been accepted at @icmlconf #ICML2025.
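For flavor only, a minimal sketch of one way to favor harder / more entropic examples; this is an illustrative stand-in, not the scheme from the paper, and all names below are hypothetical:

```python
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy of each row of a (batch, classes) probability matrix."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def select_hard_examples(candidate_probs: np.ndarray, keep_frac: float = 0.25) -> np.ndarray:
    """Indices of the most entropic (hardest) candidates -- a toy stand-in for
    'favor harder examples', not the paper's method."""
    ent = predictive_entropy(candidate_probs)
    k = max(1, int(keep_frac * len(ent)))
    return np.argsort(ent)[-k:]

# Hypothetical usage: probs would come from the current model evaluated on
# freshly generated synthetic examples.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=1000)
hard_idx = select_hard_examples(probs)
print(len(hard_idx), "examples kept for the next training round")
```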
Deliberate practice is accepted to #ICML2025 as a spotlight (top 2.6%!) 🚀
1) New preprint! arxiv.org/abs/2504.10754 Most of ML theory (i.e., fine-grained analysis to explain / reveal interesting phenomena) can be automated, e.g., via free probability theory. In our recent work with Arjun Subramonian, we provide a small, lightweight tool to do just this.
We refused to cite the paper due to severe misconduct by its authors: plagiarism of our own prior work, predominantly AI-generated content (yes, the authors plugged our paper into an LLM and generated another paper), IRB violations, etc. Revealed during a long…
Jesus Christ... openreview.net/forum?id=et5l9…
Job Alert: Are you thinking of doing a PhD in ML with a theoretical/algorithmic flavor? I'm hiring talented and passionate students to work with me on: ML theory; Neural scaling laws; Synthetic data (the good, the bad and the ugly); Explainable and trustworthy AI (adversarial…