# The R Formula Cheatsheet

R’s formula syntax is extremely powerful but can be confusing for beginners.¹
This post is a quick reference covering the symbols that have a “special” meaning inside an R formula: `~`, `+`, `.`, `-`, `1`, `:`, `*`, `^`, and `I()`.
You may never use some of these in practice, but it’s nice to know that they exist.
It was many years before I realized that I could simply type `y ~ x * z` instead of the lengthier `y ~ x + z + x:z`, for example.
While R formulas crop up in a variety of places, they are probably most familiar as the first argument of `lm()`.
For this reason, my verbal explanations assume a linear regression setting in which we hope to predict `y` using a number of regressors `x`, `z`, and `w`.
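
To make the `y ~ x * z` shortcut concrete, here is a minimal sketch that compares the two spellings. The data frame `dat` and its simulated columns are assumptions made up for illustration; only `terms()`, `lm()`, and `coef()` are standard R.

```r
# Simulated data, purely for illustration (not from any real dataset).
set.seed(1)
dat <- data.frame(x = rnorm(50), z = rnorm(50))
dat$y <- 1 + 2 * dat$x - dat$z + 0.5 * dat$x * dat$z + rnorm(50)

short <- y ~ x * z
long  <- y ~ x + z + x:z

# terms() expands a formula; "term.labels" lists the resulting regressors.
labels_short <- attr(terms(short), "term.labels")
labels_long  <- attr(terms(long), "term.labels")
print(labels_short)
print(labels_long)

# Both spellings should produce the same fitted coefficients.
fit_short <- lm(short, data = dat)
fit_long  <- lm(long, data = dat)
print(all.equal(coef(fit_short), coef(fit_long)))
```

Inspecting `term.labels` like this is a handy way to check what any formula expands to before fitting a model.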

Symbol | Purpose | Example | In Words |
---|---|---|---|
`~` | separate LHS and RHS of formula | `y ~ x` | regress `y` on `x` |
`+` | add variable to a formula | `y ~ x + z` | regress `y` on `x` and `z` |
`.` | denotes “everything else” | `y ~ .` | regress `y` on all other variables in a data frame |
`-` | remove variable from a formula | `y ~ . - x` | regress `y` on all other variables except `x` |
`1` | denotes intercept | `y ~ x - 1` | regress `y` on `x` without an intercept |
`:` | construct interaction term | `y ~ x + z + x:z` | regress `y` on `x`, `z`, and the product `x` times `z` |
`*` | shorthand for levels plus interaction | `y ~ x * z` | regress `y` on `x`, `z`, and the product `x` times `z` |
`^` | higher order interactions | `y ~ (x + z + w)^3` | regress `y` on `x`, `z`, `w`, all two-way interactions, and the three-way interaction |
`I()` | “as-is”: override special meanings of other symbols from this table | `y ~ x + I(x^2)` | regress `y` on `x` and `x` squared |
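
The rows of the table can be checked interactively. The sketch below is one way to do so, assuming a simulated data frame `dat` with columns `y`, `x`, `z`, and `w`; the helper `term_labels()` is a hypothetical convenience function defined here, not part of base R.

```r
# Simulated data, purely for illustration.
set.seed(42)
dat <- data.frame(x = rnorm(100), z = rnorm(100), w = rnorm(100))
dat$y <- 1 + dat$x + dat$z + dat$w + rnorm(100)

# Hypothetical helper: list the regressors a formula expands to.
# Passing `data` lets terms() resolve the `.` symbol.
term_labels <- function(f) attr(terms(f, data = dat), "term.labels")

dot_terms   <- term_labels(y ~ .)               # every column except y
minus_terms <- term_labels(y ~ . - x)           # ... with x removed
cube_terms  <- term_labels(y ~ (x + z + w)^3)   # main effects and interactions
print(dot_terms)
print(minus_terms)
print(cube_terms)

# The "intercept" attribute is 1 when an intercept is included, 0 otherwise.
with_int    <- attr(terms(y ~ x), "intercept")
without_int <- attr(terms(y ~ x - 1), "intercept")
print(c(with_int, without_int))

# Without I(), `^` is read as an interaction operator, so x^2 collapses to x.
bare_square <- term_labels(y ~ x + x^2)
asis_square <- term_labels(y ~ x + I(x^2))
print(bare_square)
print(asis_square)

# Fitting the quadratic model yields a separate coefficient for I(x^2).
fit_quad <- lm(y ~ x + I(x^2), data = dat)
print(names(coef(fit_quad)))
```

The comparison between `y ~ x + x^2` and `y ~ x + I(x^2)` shows why `I()` matters: without it, the squared term silently disappears from the model.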

¹ Fun fact: R’s formula syntax originated in a 1973 paper by Wilkinson and Rogers.