Data Science Fact
@DataSciFact
Daily data science tweets from @JohnDCook.
Data science consulting johndcook.com/blog/applied-s…

“Never attribute to stupidity that which is adequately explained by unstated assumptions.” -- Geert Bollen
Strange output from Grok, verbatim: ... System: You are Grok built by xAI. The explanation was cut off due to reaching the input limit. Here's the continuation: ...
Randomized response and local differential privacy johndcook.com/blog/2023/11/0…
The birthday problem and DNA pballew.blogspot.com/2011/02/birthd…
For normally distributed data, the asymptotic efficiency of the sample median relative to the sample mean is 0.64.
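The 0.64 figure is the limiting value 2/π ≈ 0.637, and a quick simulation can make it concrete. What follows is a minimal sketch using only the Python standard library; the function name `relative_efficiency`, the sample size, the trial count, and the seed are all illustrative assumptions, not anything from the tweet.

```python
import math
import random
import statistics


def relative_efficiency(n=101, trials=5000, seed=0):
    """Estimate Var(sample mean) / Var(sample median) for N(0, 1) samples.

    n, trials and seed are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    # Efficiency of the median relative to the mean is the ratio of the
    # sampling variances: a value below 1 means the median is noisier.
    return statistics.pvariance(means) / statistics.pvariance(medians)


estimate = relative_efficiency()
print(f"simulated: {estimate:.3f}   asymptotic 2/pi: {2 / math.pi:.3f}")
```

For a finite sample size the simulated ratio will only be close to 2/π, not equal to it, and it varies with the seed.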
One way to think of the Markov assumption: The future is independent of the past, given the present.
Variance of variances. All is variance. johndcook.com/blog/2025/04/2…

Things every child needs to hear. 1. I love you. 2. I'm proud of you. 3. A p-value is NOT the probability that the alternative hypothesis is false.
The composition of two (ε, 0)-differentially private algorithms is (2ε, 0)-differentially private.
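One way to see the composition rule in a small case is to check it numerically for randomized response on a single bit, which is ε-differentially private when the true bit is reported with probability e^ε / (1 + e^ε). The sketch below is an illustration under that assumption, not a proof of the general theorem; the helper names `rr_probs`, `compose`, and `max_log_ratio` are hypothetical.

```python
import math
from itertools import product


def rr_probs(bit, eps):
    """Exact output distribution of randomized response on one bit."""
    p_truth = math.exp(eps) / (1 + math.exp(eps))
    return {bit: p_truth, 1 - bit: 1 - p_truth}


def compose(dist_a, dist_b):
    """Joint distribution of two independent mechanism outputs."""
    return {(a, b): dist_a[a] * dist_b[b] for a, b in product(dist_a, dist_b)}


def max_log_ratio(dist0, dist1):
    """Worst-case privacy loss: max over outputs of |log(P0 / P1)|."""
    return max(abs(math.log(dist0[o] / dist1[o])) for o in dist0)


eps = 0.5  # illustrative privacy parameter
single0, single1 = rr_probs(0, eps), rr_probs(1, eps)

# Privacy loss of one release, comparing the two possible input bits.
single_loss = max_log_ratio(single0, single1)

# Privacy loss when the same bit is released twice, independently.
double_loss = max_log_ratio(compose(single0, single0), compose(single1, single1))

print(f"one release: {single_loss:.3f}, two releases: {double_loss:.3f}")
```

In this example the worst-case loss of the pair of releases is twice that of a single release, which matches the 2ε bound.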
What are the odds that there'll be two famous statisticians, one called Poisson and another called Fisher?
Why is Kullback-Leibler divergence not a distance? johndcook.com/blog/2017/11/0…

Regular expressions are a valuable tool for data cleaning. See @RegexTip.
The 'unscented' Kalman filter (UKF) pushes a carefully chosen set of points through a nonlinearity and then fits a normal distribution.
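The "push points through, then fit a normal" step can be sketched for a scalar state. This is only the unscented transform, not a full Kalman filter, and it assumes the common symmetric sigma-point scheme with tuning parameter κ = 2 (so that n + κ = 3 for n = 1); the function name `unscented_transform` is hypothetical.

```python
import math


def unscented_transform(f, mean, var, kappa=2.0):
    """Approximate the mean and variance of f(X) for scalar X ~ N(mean, var).

    Uses three sigma points: the mean and two points placed symmetrically
    around it. kappa = 2 is an assumed tuning choice, not a requirement.
    """
    n = 1  # state dimension
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa),
               1.0 / (2 * (n + kappa)),
               1.0 / (2 * (n + kappa))]

    # Push each sigma point through the nonlinearity.
    images = [f(x) for x in points]

    # Fit a normal distribution by matching the weighted mean and variance.
    y_mean = sum(w * y for w, y in zip(weights, images))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, images))
    return y_mean, y_var


# Example nonlinearity: squaring a N(1, 0.25) variable.
y_mean, y_var = unscented_transform(lambda x: x * x, mean=1.0, var=0.25)
print(y_mean, y_var)
```

For the squaring example, the fitted moments agree (up to rounding) with the exact ones, m² + P for the mean and 4m²P + 2P² for the variance. For other nonlinearities the result is an approximation.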
Natural language processing software works surprisingly well on fantastical text such as Lewis Carroll's Jabberwocky. Beware the Jabberwock, my son! The jaws that bite, the claws that catch! Beware the Jubjub bird, and shun The frumious Bandersnatch! johndcook.com/blog/2023/07/2…

Bayesian statistics became more popular when statisticians became aware of MCMC in the 1980s. It was developed by physicists in the 1950s.
Frequently Asked Questions about Fully Homomorphic Encryption jeremykun.com/frequently-ask…
ASA Statement on the Role of Statistics in Data Science magazine.amstat.org/blog/2015/10/0…
Estimating standard deviation from range johndcook.com/blog/2022/03/0…