The R Formula Cheatsheet

Author: Francis J. DiTraglia
Published: April 19, 2023

R’s formula syntax is extremely powerful but can be confusing for beginners.[1] This post is a quick reference to the most common symbols that have a “special” meaning inside an R formula: `~`, `+`, `.`, `-`, `1`, `:`, `*`, `^`, and `I()`. You may never use some of these in practice, but it’s nice to know that they exist. For example, it was many years before I realized that I could simply type `y ~ x * z` instead of the lengthier `y ~ x + z + x:z`. R formulas crop up in a variety of places, but they are probably most familiar as the first argument of `lm()`. For this reason, my verbal explanations assume a linear regression setting in which we hope to predict `y` using the regressors `x`, `z`, and `w`.
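To check the `y ~ x * z` shortcut yourself, you can ask R which terms each formula expands to. The sketch below uses base R's `terms()` and `lm()`. The simulated data frame `dat` and its coefficients are arbitrary and purely illustrative.

```r
# Compare the terms R generates for the two equivalent formulas.
f_short <- y ~ x * z
f_long  <- y ~ x + z + x:z

labels_short <- attr(terms(f_short), "term.labels")
labels_long  <- attr(terms(f_long), "term.labels")

print(labels_short)                          # "x" "z" "x:z"
print(identical(labels_short, labels_long))  # TRUE

# Simulated, illustrative data: both formulas yield the same fitted coefficients.
set.seed(1)
dat <- data.frame(x = rnorm(50), z = rnorm(50))
dat$y <- 1 + 2 * dat$x - dat$z + 0.5 * dat$x * dat$z + rnorm(50)

fit_short <- lm(f_short, data = dat)
fit_long  <- lm(f_long, data = dat)
print(all.equal(coef(fit_short), coef(fit_long)))  # TRUE
```

Both formulas expand to the same term list, so `lm()` builds the same design matrix and returns the same estimates.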

| Symbol | Purpose | Example | In words |
|---|---|---|---|
| `~` | separates the left-hand side (LHS) and right-hand side (RHS) of a formula | `y ~ x` | regress `y` on `x` |
| `+` | adds a variable to a formula | `y ~ x + z` | regress `y` on `x` and `z` |
| `.` | denotes “everything else” | `y ~ .` | regress `y` on all other variables in the data frame |
| `-` | removes a variable from a formula | `y ~ . - x` | regress `y` on all other variables except `x` |
| `1` | denotes the intercept | `y ~ x - 1` | regress `y` on `x` without an intercept |
| `:` | constructs an interaction term | `y ~ x + z + x:z` | regress `y` on `x`, `z`, and the product of `x` and `z` |
| `*` | shorthand for main effects plus their interaction | `y ~ x * z` | regress `y` on `x`, `z`, and the product of `x` and `z` |
| `^` | higher-order interactions | `y ~ (x + z + w)^3` | regress `y` on `x`, `z`, `w`, all two-way interactions, and the three-way interaction |
| `I()` | “as is”: overrides the special meanings of the other symbols in this table | `y ~ x + I(x^2)` | regress `y` on `x` and `x` squared |
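One way to see what each symbol in the table does is to inspect the `term.labels` and `intercept` attributes that `terms()` attaches to a formula. This is a minimal sketch. `show_terms()` is a hypothetical helper written for this example. The tiny data frame `df` holds arbitrary values and exists only so that `.` has columns to expand to.

```r
# show_terms() is a hypothetical helper: it returns the regressors R builds
# from a formula and whether an intercept is included.
show_terms <- function(f, data = NULL) {
  tt <- terms(f, data = data)  # `data` is only needed to expand `.`
  list(terms = attr(tt, "term.labels"),
       intercept = attr(tt, "intercept") == 1)
}

# Arbitrary toy data; the values are irrelevant, only the column names matter.
df <- data.frame(y = 1:5, x = 2:6, z = 3:7, w = 4:8)

show_terms(y ~ x)                 # x, with an intercept
show_terms(y ~ x + z)             # x and z
show_terms(y ~ ., data = df)      # x, z, w (every column except y)
show_terms(y ~ . - x, data = df)  # z and w
show_terms(y ~ x - 1)             # x, intercept is FALSE
show_terms(y ~ x + z + x:z)       # x, z, x:z
show_terms(y ~ x * z)             # x, z, x:z (same as the line above)
show_terms(y ~ (x + z + w)^3)     # x, z, w, the three two-way terms, and x:z:w
```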
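The `I()` row matters because `^` already has a special meaning. Inside a formula, `x^2` is read as `x` crossed with itself, which is just `x`. The sketch below compares the two versions using base R's `terms()`. It then fits a noise-free quadratic with `lm()`, using made-up coefficients for illustration only.

```r
# Without I(), ^ means crossing, so x^2 collapses to plain x.
labels_plain <- attr(terms(y ~ x + x^2), "term.labels")
print(labels_plain)  # "x"

# With I(), x^2 is evaluated as ordinary arithmetic: a squared regressor.
labels_asis <- attr(terms(y ~ x + I(x^2)), "term.labels")
print(labels_asis)   # "x" "I(x^2)"

# Illustrative noise-free quadratic: lm() recovers the coefficients
# (up to floating-point error).
x <- 1:10
y <- 3 + 2 * x + 0.5 * x^2
fit <- lm(y ~ x + I(x^2))
print(coef(fit))     # approximately 3, 2, 0.5
```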

Footnotes

  1. Fun fact: R’s formula syntax originated in a 1973 paper by Wilkinson and Rogers, “Symbolic Description of Factorial Models for Analysis of Variance.”