The R Formula Cheatsheet

R’s formula syntax is extremely powerful but can be confusing for beginners.[1] This post is a quick reference covering the most commonly used symbols that have a “special” meaning inside an R formula: ~, +, ., -, 1, :, *, ^, and I(). You may never use some of these in practice, but it’s nice to know that they exist. For example, it was many years before I realized that I could simply type y ~ x * z instead of the lengthier y ~ x + z + x:z. While R formulas crop up in a variety of places, they are probably most familiar as the first argument of lm(). For this reason, my verbal explanations assume a linear regression setting in which we hope to predict y using a number of regressors x, z, and w.
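The x * z shorthand mentioned above can be checked without fitting anything. This is a minimal sketch, assuming only base R: terms() expands a formula, and its "term.labels" attribute lists the resulting regressors. The variable names short, long, short_terms, and long_terms are my own, chosen for illustration.

```r
# Two ways of writing the same model; no data are needed to expand them.
short <- y ~ x * z
long  <- y ~ x + z + x:z

# terms() expands the formula; "term.labels" lists the resulting regressors.
short_terms <- attr(terms(short), "term.labels")
long_terms  <- attr(terms(long),  "term.labels")

print(short_terms)                     # "x" "z" "x:z"
print(identical(short_terms, long_terms))  # TRUE
```

Both formulas expand to the same three terms, so lm() would fit the same model for either one.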

| Symbol | Purpose | Example | In Words |
|--------|---------|---------|----------|
| ~ | separate LHS and RHS of formula | y ~ x | regress y on x |
| + | add variable to a formula | y ~ x + z | regress y on x and z |
| . | denotes “everything else” | y ~ . | regress y on all other variables in a data frame |
| - | remove variable from a formula | y ~ . - x | regress y on all other variables except x |
| 1 | denotes intercept | y ~ x - 1 | regress y on x without an intercept |
| : | construct interaction term | y ~ x + z + x:z | regress y on x, z, and the product x times z |
| * | shorthand for levels plus interaction | y ~ x * z | regress y on x, z, and the product x times z |
| ^ | higher order interactions | y ~ (x + z + w)^3 | regress y on x, z, w, all two-way interactions, and the three-way interaction |
| I() | “as-is”: override special meanings of other symbols from this table | y ~ x + I(x^2) | regress y on x and x squared |
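To see several rows of the table in action, here is a rough sketch using simulated data. The data frame dat, the helper term_labels(), and the coefficients used to generate y are all hypothetical and exist only for this illustration. The helper simply reports which terms a formula expands to.

```r
# Simulated data purely for illustration; column names follow the post.
set.seed(1)
dat <- data.frame(x = rnorm(50), z = rnorm(50), w = rnorm(50))
dat$y <- 1 + 2 * dat$x - dat$z + rnorm(50)

# Hypothetical helper: list the terms a formula expands to.
term_labels <- function(f, data = NULL) {
  attr(terms(f, data = data), "term.labels")
}

# "." needs a data frame so that R knows what "everything else" is.
dot_terms   <- term_labels(y ~ ., data = dat)      # "x" "z" "w"
minus_terms <- term_labels(y ~ . - x, data = dat)  # "z" "w"

# ^3 expands to main effects plus all interactions up to order three.
cube_terms <- term_labels(y ~ (x + z + w)^3)
# "x" "z" "w" "x:z" "x:w" "z:w" "x:z:w"

# Without I(), ^ is read as a formula operator, so x^2 collapses to x.
bare_terms <- term_labels(y ~ x + x^2)     # "x"
asis_terms <- term_labels(y ~ x + I(x^2))  # "x" "I(x^2)"

# "- 1" drops the intercept: the "intercept" attribute becomes 0.
has_intercept <- attr(terms(y ~ x - 1), "intercept")

# The same formulas can be passed straight to lm().
fit <- lm(y ~ x + I(x^2), data = dat)
coef_names <- names(coef(fit))  # "(Intercept)" "x" "I(x^2)"
print(coef_names)
```

The bare_terms line shows why I() matters: writing y ~ x + x^2 silently drops the squared term, because ^ is interpreted as an interaction operator rather than as arithmetic.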

  1. Fun fact: R’s formula syntax originated in a 1973 paper by Wilkinson and Rogers.
