R’s formula syntax is extremely powerful but can be confusing for beginners. This post is a quick reference covering all of the symbols that have a “special” meaning inside of an R formula: ~, +, .

To do well in an econometrics or statistics course at any level, you need to have a large number of simple properties of random variables at your fingertips. Some years back I made a handout containing the most important properties for my undergraduate students at the University of Pennsylvania.

Have you got a ruler handy? Fantastic! Then hold out your right hand, extend your thumb and little finger as far as they’ll go, and measure the distance in centimeters, rounding to the nearest half centimeter.

This is the second in a series of posts about how to construct a confidence interval for a proportion. (Simple problems sometimes turn out to be surprisingly complicated in practice!)

Back in November a colleague pointed me to a website describing the recent COVID-19 Student Vaccination Survey carried out by my employer, the University of Oxford. At the time I briefly tweeted my concerns at the University: Sorry @UniofOxford, but this is wildly misleading.

In a previous post I showed an example in which the “textbook” confidence interval for a proportion performs poorly despite a fairly large sample size. My aim in that post was to convince you that the oft-repeated advice concerning \(n > 30\) and the central limit theorem is worthless.
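
To make the coverage claim concrete, here is a minimal sketch that computes the exact coverage probability of a nominal 95% interval by summing binomial probabilities. It assumes that the “textbook” interval is the usual Wald interval \(\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}\). The function name `wald_coverage` and the values of `n` and `p` are illustrative choices, not the example from the earlier post.

```python
from math import comb, sqrt


def wald_coverage(n, p, z=1.96):
    """Exact coverage probability of the Wald interval for a binomial proportion.

    Sums the Binomial(n, p) probabilities of every count k whose interval
    p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n) contains the true p.
    """
    total = 0.0
    for k in range(n + 1):
        p_hat = k / n
        half_width = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half_width <= p <= p_hat + half_width:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


# Illustrative values only (assumptions, not taken from the post).
for p in (0.5, 0.1, 0.02):
    print(p, round(wald_coverage(50, p), 3))
```

Because the calculation is exact rather than simulated, any shortfall from 95% is a property of the interval itself, not Monte Carlo noise. Coverage tends to fall further below the nominal level as `p` moves toward 0 or 1.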

The simplest version of the central limit theorem (CLT) says that if \(X_1, \dots, X_n\) are iid random variables with mean \(\mu\) and finite variance \(\sigma^2\), then
\[ \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \rightarrow_d N(0,1) \] as \(n \rightarrow \infty\), where \(\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i\).
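
A small simulation can illustrate the statement. The sketch below draws repeated samples from an Exponential(1) distribution, for which \(\mu = 1\) and \(\sigma = 1\), and standardizes each sample mean exactly as in the display above. The distribution, the sample size, the number of replications, and the name `standardized_means` are all illustrative assumptions.

```python
import random
import statistics


def standardized_means(n, reps, seed=0):
    """Simulate (X_bar - mu) / (sigma / sqrt(n)) for Exponential(1) samples."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)
    results = []
    for _ in range(reps):
        x_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        results.append((x_bar - mu) / (sigma / n**0.5))
    return results


z = standardized_means(n=100, reps=2000)

# If the normal approximation is good, the mean should be near 0, the
# standard deviation near 1, and about 95% of draws within 1.96 of zero.
share_within = sum(abs(v) <= 1.96 for v in z) / len(z)
print(statistics.mean(z), statistics.stdev(z), share_within)
```

Rerunning with a much smaller `n` gives a rough sense of how quickly the approximation degrades for a skewed distribution like this one.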