Statistics

The R Formula Cheatsheet

R’s formula syntax is extremely powerful but can be confusing for beginners. This post is a quick reference covering all of the symbols that have a “special” meaning inside of an R formula: ~, +, .

Random Variables Cheatsheet

To do well in an econometrics or statistics course at any level, you need to have a large number of simple properties of random variables at your fingertips. Some years back I made a handout containing the most important properties for my undergraduate students at the University of Pennsylvania.

A New Way of Looking at Least Squares

Have you got a ruler handy? Fantastic! Then hold out your right hand, extend your thumb and little finger as far as they’ll go, and measure the distance in centimeters, rounding to the nearest half centimeter.

The Wilson Confidence Interval for a Proportion

This is the second in a series of posts about how to construct a confidence interval for a proportion. (Simple problems sometimes turn out to be surprisingly complicated in practice!)
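For reference, here is a minimal Python sketch of the standard closed-form Wilson score interval. It is not code from the post. The function name `wilson_interval` and the `level` parameter are hypothetical, and the normal quantile comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (hypothetical helper)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Two-sided normal quantile, e.g. about 1.96 for level = 0.95
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    # The centre is shifted from p_hat towards 1/2 by the z^2 terms
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width
```

With zero successes out of ten, `wilson_interval(0, 10)` gives a lower endpoint of (essentially) zero and an upper endpoint of roughly 0.28, rather than collapsing to a single point.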

Lessons from the Oxford Vaccination Survey

Back in November a colleague pointed me to a website describing the recent COVID-19 Student Vaccination Survey carried out by my employer, the University of Oxford. At the time I briefly tweeted my concerns at the University: “Sorry @UniofOxford, but this is wildly misleading.”

Don't Use the Textbook CI for a Proportion

In a previous post I showed an example in which the “textbook” confidence interval for a proportion performs poorly despite a fairly large sample size. My aim in that post was to convince you that the oft-repeated advice concerning \(n > 30\) and the central limit theorem is worthless.
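One way to see the problem without simulation is to compute the exact coverage of the interval by summing binomial probabilities over every possible outcome. The sketch below assumes that the “textbook” interval means the usual Wald form \(\hat{p} \pm z \sqrt{\hat{p}(1-\hat{p})/n}\); the function names and the choice \(p = 0.01\), \(n = 100\) are illustrative assumptions, not the example from the post.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Assumed 'textbook' interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def exact_coverage(p: float, n: int, level: float = 0.95) -> float:
    """Probability that the Wald interval contains the true p, obtained by
    summing Binomial(n, p) probabilities over all outcomes x = 0, ..., n."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = wald_interval(x, n, level)
        if lo <= p <= hi:
            total += comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


# Nominal 95% interval with n = 100 and a small true proportion (illustrative)
coverage = exact_coverage(0.01, 100)
print(round(coverage, 3))
```

Here the coverage comes out at about 0.63 rather than 0.95. Most of the shortfall comes from \(x = 0\): the Wald interval then collapses to the single point \([0, 0]\), which can never contain \(p = 0.01\).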

Thirty isn't the magic number

The simplest version of the central limit theorem (CLT) says that if \(X_1, \dots, X_n\) are iid random variables with mean \(\mu\) and finite variance \(\sigma^2\), then \[ \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \rightarrow_d N(0,1) \] as \(n \to \infty\), where \(\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i\).
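To make the statement concrete, the following Python sketch simulates the standardized sample mean \((\bar{X}_n - \mu)/(\sigma/\sqrt{n})\) and checks how often it lands in \([-1.96, 1.96]\), which has probability about 0.95 under \(N(0,1)\). The Exponential(1) population (\(\mu = \sigma = 1\)), \(n = 50\), the number of replications, and the helper name `standardized_means` are all assumptions chosen for illustration. A single simulation like this shows the approximation working in one case; it says nothing general about how large \(n\) must be.

```python
import random
from math import sqrt


def standardized_means(n: int, reps: int, seed: int = 1) -> list[float]:
    """Draw `reps` samples of size n from Exponential(1), where mu = sigma = 1,
    and return (xbar - mu) / (sigma / sqrt(n)) for each sample."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0
    out = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        out.append((xbar - mu) / (sigma / sqrt(n)))
    return out


z = standardized_means(n=50, reps=20_000)
# If the N(0, 1) approximation is good, about 95% should fall in [-1.96, 1.96]
share = sum(abs(v) <= 1.96 for v in z) / len(z)
print(round(share, 3))
```

Swapping in a more heavily skewed population, or a smaller `n`, is an easy way to see the approximation degrade.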