The R Formula Cheatsheet
R’s formula syntax is extremely powerful but can be confusing for beginners.¹ This post is a quick reference covering all of the symbols that have a “special” meaning inside an R formula: `~`, `+`, `.`, `-`, `1`, `:`, `*`, `^`, and `I()`. You may never use some of these in practice, but it’s nice to know that they exist. It was many years before I realized that I could simply type `y ~ x * z` instead of the lengthier `y ~ x + z + x:z`, for example. While R formulas crop up in a variety of places, they are probably most familiar as the first argument of `lm()`. For this reason, my verbal explanations assume a basic linear regression setting in which we hope to predict `y` using a number of regressors `x`, `z`, and `w`.
| Symbol | Purpose | Example | In Words |
|---|---|---|---|
| `~` | separate LHS and RHS of formula | `y ~ x` | regress y on x |
| `+` | add variable to a formula | `y ~ x + z` | regress y on x and z |
| `.` | denotes “everything else” | `y ~ .` | regress y on all other variables in a data frame |
| `-` | remove variable from a formula | `y ~ . - x` | regress y on all other variables except x |
| `1` | denotes intercept | `y ~ x - 1` | regress y on x without an intercept |
| `:` | construct interaction term | `y ~ x + z + x:z` | regress y on x, z, and the product x times z |
| `*` | shorthand for main effects plus their interaction | `y ~ x * z` | regress y on x, z, and the product x times z |
| `^` | higher order interactions | `y ~ (x + z + w)^3` | regress y on x, z, w, all two-way interactions, and the three-way interaction |
| `I()` | “as-is”: override the special meanings of other symbols in this table | `y ~ x + I(x^2)` | regress y on x and x squared |
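
To see how R actually expands these formulas, it can help to inspect the terms it generates. The sketch below is only an illustration: the simulated data frame `dat` and the small helper `expand()` are my own assumptions, not part of base R, while `terms()`, `lm()`, and `coef()` come from the standard `stats` package.

```r
# Hypothetical toy data; the column names y, x, z, w mirror the table above.
set.seed(1)
dat <- data.frame(y = rnorm(20), x = rnorm(20), z = rnorm(20), w = rnorm(20))

# Helper (an assumption for this sketch, not a base R function):
# list the right-hand-side terms that a formula expands to.
expand <- function(f, data = NULL) attr(terms(f, data = data), "term.labels")

expand(y ~ x * z)              # "x" "z" "x:z"
expand(y ~ (x + z + w)^3)      # "x" "z" "w" "x:z" "x:w" "z:w" "x:z:w"
expand(y ~ ., data = dat)      # "x" "z" "w"  (the dot needs a data frame)
expand(y ~ . - x, data = dat)  # "z" "w"
expand(y ~ x + I(x^2))         # "x" "I(x^2)"
expand(y ~ x + x^2)            # "x"  (without I(), ^ is read as formula syntax)

# The intercept is tracked separately from the term labels: 1 = present, 0 = removed.
attr(terms(y ~ x - 1), "intercept")  # 0

# The shorthand and the longhand give the same fitted coefficients.
fit_short <- lm(y ~ x * z, data = dat)
fit_long  <- lm(y ~ x + z + x:z, data = dat)
all.equal(coef(fit_short), coef(fit_long))  # TRUE
```

Comparing `y ~ x + I(x^2)` with `y ~ x + x^2` shows why `I()` matters: inside a formula, `^` means “cross these terms”, so `x^2` on its own collapses back to `x`.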
Footnotes
¹ Fun fact: R’s formula syntax originated in a 1973 paper by Wilkinson and Rogers.