A simpler way of understanding (and teaching!) basic statistics

Last week, I published a cheat sheet and a post on how most common statistical tests are simple linear models. This started out as a hobby project last summer, but a few weeks ago, I realized that this was actually really important. So I spent many evenings polishing, and with my heart pounding, I tweeted:

I've made this cheat sheet and I think it's important. Most stats 101 tests are simple linear models – including "non-parametric" tests. It's so simple we should only teach regression. Avoid confusing students with a zoo of named tests. https://t.co/9PFR1ly3lW 1/n
— Jonas K. Lindeløv (@jonaslindeloev) March 27, 2019

It got a great reception and gathered more than half a million views on Twitter within the first day.

On the bright side, this shows that people care about understanding statistics, and communicating it effectively. On the flip side, it may also reflect the fact that too many statistics courses consist of the rote-learning-rules-of-thumb-and-decision-trees which I seek to combat.

I was particularly excited to see support from notable scholars in statistics, including Russell Poldrack, Andy Field, and many others. However, my personal peak was when Andrew Gelman wrote a post about it! Or as my colleague put it:

@jonaslindeloev reaching demi-god state: Gelman posting about the very useful "all tests are regression" blogpost: https://t.co/jPqRZDjwwq
— Riccardo Fusaroli (@fusaroli) March 28, 2019

I have also been extremely pleased that the community has joined in to improve it even further via the GitHub repository. It has been refined a great deal over the last week as a result. Follow the repository on GitHub if you want to stay up to date. Even better, raise an issue or submit a pull request. That would make me so happy!

Future guides

Much of what I demonstrate in that post has been known, published, and taught here and there for quite a while. I think that my main contribution was to lower the bar for understanding it, believing it, and teaching it.
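To make the core claim concrete, here is a minimal sketch in plain Python of one such equivalence: the classic equal-variance two-sample t-test gives the same t statistic as the slope in a linear regression of the outcome on a 0/1 group indicator. The data and the helper names `student_t` and `slope_t` are made up for illustration and are not taken from the cheat sheet.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group0, group1):
    """Classic equal-variance (Student) two-sample t statistic."""
    n0, n1 = len(group0), len(group1)
    pooled_var = (
        (n0 - 1) * variance(group0) + (n1 - 1) * variance(group1)
    ) / (n0 + n1 - 2)
    return (mean(group1) - mean(group0)) / sqrt(pooled_var * (1 / n0 + 1 / n1))


def slope_t(x, y):
    """t statistic for the slope b1 in the least-squares fit y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual sum of squares around the fitted line
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = sqrt(rss / (n - 2) / sxx)
    return b1 / se_b1


# Made-up illustrative data for two groups
control = [4.1, 5.0, 3.8, 4.6, 5.2, 4.4]
treated = [5.3, 6.1, 5.8, 4.9, 6.4, 5.6]

# Dummy-code group membership: 0 = control, 1 = treated
x = [0] * len(control) + [1] * len(treated)
y = control + treated

t_test = student_t(control, treated)
t_regression = slope_t(x, y)
print(t_test, t_regression)  # the two values agree up to floating-point error
```

With a 0/1 predictor, the fitted slope is simply the difference between the two group means, and the residual variance is the pooled variance, which is why the two statistics coincide.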

Naturally, the steady stream of likes and retweets has conditioned me to try more of this. Here’s the plan for future notebooks in the expected order of publication:

Update my notebook on Bayes Factors and put it on GitHub.
Finish a notebook on Utility Theory in time for the Bayes@Lund conference, where I will be presenting it.
Do a new notebook on Repeated Measures as mixed models, including RM-ANOVA, Split-plot ANOVA, McNemar, and Friedman.
Extend the post/cheat sheet in some way (or make a new one) to show how three statistical assumptions play out in all of these models.

A book?

I am also contemplating writing a book. There are a lot of good books out there already, and I don’t intend to compete with them. Rather, the book I have in mind should be mind-blowingly short and applied, covering 90% of a traditional textbook, including model checks and Bayesian inference, in 1/20th of the space.

A third of those pages would have to be code examples and paper-and-pencil tasks so that it’s easily transferable to the real-world problems people encounter.

Other than the everything-is-linear approach, there are a few other general-purpose tricks that can be pulled off to radically simplify statistical modeling. Having this all in a condensed yet accessible format would make it easier for the reader to get the larger picture.