The R Formula Cheatsheet
R’s formula syntax is extremely powerful but can be confusing for beginners.[1]
This post is a quick reference covering all of the symbols that have a “special” meaning inside of an R formula: `~`, `+`, `.`, `-`, `1`, `:`, `*`, `^`, and `I()`.
You may never use some of these in practice, but it’s nice to know that they exist.
It was many years before I realized that I could simply type `y ~ x * z` instead of the lengthier `y ~ x + z + x:z`, for example.
While R formulas crop up in a variety of places, they are probably most familiar as the first argument of `lm()`.
For this reason, my verbal explanations assume a linear regression setting in which we hope to predict `y` using a number of regressors `x`, `z`, and `w`.
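
To make the `y ~ x * z` shortcut concrete, here is a minimal sketch on simulated data. The data frame `dat`, the sample size, and the coefficients are assumptions chosen purely for illustration. It fits both spellings of the formula with `lm()` and checks that they give the same model.

```r
# Simulated data; variable names and coefficient values are illustrative only.
set.seed(1)
n <- 100
dat <- data.frame(x = rnorm(n), z = rnorm(n), w = rnorm(n))
dat$y <- 1 + 2 * dat$x - dat$z + 0.5 * dat$x * dat$z + rnorm(n)

# The short and the long way of writing the same model.
fit_short <- lm(y ~ x * z, data = dat)
fit_long  <- lm(y ~ x + z + x:z, data = dat)

# Both fits contain the same terms ...
names(coef(fit_short))  # "(Intercept)" "x" "z" "x:z"

# ... and the same coefficient estimates.
all.equal(coef(fit_short), coef(fit_long))  # TRUE
```
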
| Symbol | Purpose | Example | In Words |
|---|---|---|---|
| `~` | separate LHS and RHS of formula | `y ~ x` | regress y on x |
| `+` | add variable to a formula | `y ~ x + z` | regress y on x and z |
| `.` | denotes “everything else” | `y ~ .` | regress y on all other variables in a data frame |
| `-` | remove variable from a formula | `y ~ . - x` | regress y on all other variables except x |
| `1` | denotes intercept | `y ~ x - 1` | regress y on x without an intercept |
| `:` | construct interaction term | `y ~ x + z + x:z` | regress y on x, z, and the product x times z |
| `*` | shorthand for levels plus interaction | `y ~ x * z` | regress y on x, z, and the product x times z |
| `^` | higher order interactions | `y ~ (x + z + w)^3` | regress y on x, z, w, all two-way interactions, and the three-way interaction |
| `I()` | “as-is”: override special meanings of other symbols from this table | `y ~ x + I(x^2)` | regress y on x and x squared |
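
One way to check how R reads a formula, without fitting anything, is to inspect the result of `terms()`. The sketch below is only an illustration: `expand()` is a hypothetical helper defined here, not part of base R, and `toy` is a made-up data frame needed only so that `.` has something to expand to.

```r
# Hypothetical helper: list the term labels R extracts from a formula.
expand <- function(f, ...) attr(terms(f, ...), "term.labels")

expand(y ~ x * z)          # "x" "z" "x:z"
expand(y ~ (x + z + w)^3)  # "x" "z" "w" "x:z" "x:w" "z:w" "x:z:w"

# Inside a formula, ^ means crossing, so x^2 collapses back to x.
expand(y ~ x + x^2)        # "x"
# I() restores the arithmetic meaning of ^.
expand(y ~ x + I(x^2))     # "x" "I(x^2)"

# The intercept is recorded separately: 1 = present, 0 = removed.
attr(terms(y ~ x), "intercept")      # 1
attr(terms(y ~ x - 1), "intercept")  # 0

# "." can only be expanded when a data frame is supplied.
toy <- data.frame(y = 1:3, x = 1:3, z = 1:3, w = 1:3)
expand(y ~ ., data = toy)      # "x" "z" "w"
expand(y ~ . - x, data = toy)  # "z" "w"
```

The comparison between `x^2` and `I(x^2)` is the one most likely to bite in practice: without `I()`, the squared term silently drops out of the model.
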

[1] Fun fact: R’s formula syntax originated in a 1973 paper by Wilkinson and Rogers.