# Street Fighting Numerical Analysis - Part 1

Computing is a crucial part of modern applied and theoretical econometrics but most economists, myself included, have little if any formal training numerical analysis and computer science.
This means that we often learn things *the hard way*: by making boneheaded mistakes and spending hours browsing stackoverflow to try to figure out what went wrong.
In preparation for my upcoming course on Empirical Research Methods^{1} I’ve started trying to collect and organize the various nuggets of computational wisdom that I’ve picked up over the years. This post is the first of several that I plan to write on that theme.
Its origin is an enigmatic bug that I detected in a seemingly trivial line of my R code involving `rep()`

.

# Is R broken?

For no particular reason, let’s use the R function `rep()`

to print out the string `"econometrics.blog"`

four times:

`rep("econometrics.blog", times = 4)`

```
## [1] "econometrics.blog" "econometrics.blog" "econometrics.blog"
## [4] "econometrics.blog"
```

Since 0.2 multiplied by 20 equals 4, it comes no surprise that replacing `times = 4`

with `times = 0.2 * 20`

gives the same result:

`rep("econometrics.blog", times = 0.2 * 20)`

```
## [1] "econometrics.blog" "econometrics.blog" "econometrics.blog"
## [4] "econometrics.blog"
```

Now let’s try `times = (1 - 0.8) * 20`

instead. Since 0.2 equals (1 - 0.8) this couldn’t possibly make a difference, could it? Distressingly, it does: we obtain only *three* copies of `"econometrics.blog"`

`rep("econometrics.blog", times = (1 - 0.8) * 20)`

`## [1] "econometrics.blog" "econometrics.blog" "econometrics.blog"`

What on earth is going on here? Has R made some kind of mistake? Let’s try a sanity check. First we’ll calculate `(1 - 0.8) * 20`

and call it `x`

. Then we’ll check that `x`

really does equal four:

```
x <- (1 - 0.8) * 20
x
```

`## [1] 4`

What a relief: surely setting `times = x`

should give us four copies of `"econometrics.blog"`

. Alas, it does not:

`rep('econometrics.blog', times = x) `

`## [1] "econometrics.blog" "econometrics.blog" "econometrics.blog"`

Clearly using open-source software like R is a bad idea and I should switch to STATA.^{2}

# Numeric Types in R

Because R is a dynamically-typed programming language, we can almost always ignore the question of precisely how it stores numeric values “under the hood.” In fact R has *two* numeric types: integer and double. Integers are rare in practice. The operator `:`

returns an integer vector

```
y <- 1:5
typeof(y)
```

`## [1] "integer"`

and the length of a vector is always an integer

```
n <- length(y)
typeof(y)
```

`## [1] "integer"`

but nearly every other numeric value you encounter in R will be stored as a double, i.e. a double precision floating point number:

`typeof(4)`

`## [1] "double"`

`typeof(4.0)`

`## [1] "double"`

`typeof(cos(0))`

`## [1] "double"`

To *force* R to store a value as an integer rather than double, we can either append an `L`

```
z <- 4L
typeof(z)
```

`## [1] "integer"`

or *coerce*, i.e. convert, a double to an integer using `as.integer()`

```
a <- 4
typeof(a)
```

`## [1] "double"`

```
b <- as.integer(a)
typeof(b)
```

`## [1] "integer"`

The trade-off between integers and doubles is between precision and range. Calculations carried out with integers are always *exact*, but integers can only be used to represent a fairly limited number of values. Calculations with doubles, on the other hand, are not *always* exact, but doubles can store a much larger range of values, including decimals.

This post isn’t the right place to delve into the details of floating point numbers, of which doubles are an instance, but there are two points worth emphasizing. First, it’s generally safe to store a value that is “truly” an integer, e.g. `4`

, as double. As explained in the help file `integer {base}`

current implementations of R use 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9: doubles can hold much larger integers exactly.

This explains why you may never have encountered the `L`

suffix in the wild. Because doubles can represent very large integers *exactly*, calculations with whole numbers stored as doubles will also be exact. Notice that R automatically converts integers that are “too big” into doubles:

```
x <- 999999999L
typeof(x)
```

`## [1] "integer"`

```
y <- 9999999999L
typeof(y)
```

`## [1] "double"`

While converting integers to doubles in innocuous, you need to be *careful* when converting doubles to integers. This turns out to be the root of our problem with `rep()`

from above. Both `0.2 * 20`

and `(1 - 0.8) * 20`

are doubles, and both *appear* to equal `4`

```
x <- 0.2 * 20
x
```

`## [1] 4`

`typeof(x)`

`## [1] "double"`

```
y <- (1 - 0.8) * 20
y
```

`## [1] 4`

But `x`

and `y`

are coerced to *different* integer values:

`as.integer(x)`

`## [1] 4`

`as.integer(y)`

`## [1] 3`

The function `rep()`

expects its second argument `times`

to be an integer. If we supply a double instead, then `rep()`

makes a conversion in the same way as the function `as.integer()`

, namely by *truncating*. Far down in the help file for `rep()`

we find this crucial caveat:

Non-integer values of

`times`

will be truncated towards zero. If times is a computed quantity it is prudent to add a small fuzz or use`round()`

.

But wait: aren’t `x`

and `y`

*precisely equal* to each other? How can one truncate to `4`

while the other truncates to `3`

? As it turns out, appearances can be deceiving:

`identical(x, y)`

`## [1] FALSE`

To find out *why* these values aren’t equal, we need to learn a bit more about how computers approximate real numbers using doubles.

# What you see isn’t always what you get.

R has various handy built-in constants, including \(\pi\)

`pi`

`## [1] 3.141593`

Notwithstanding bill number 246 of the 1897 sitting of the Indiana General Assembly, \(\pi\) is an irrational number. By default, however, R only shows us a small number of its digits. To display twenty digits of \(\pi\), we can specify the argument `digits`

to the function `print()`

like so

`print(pi, digits = 20)`

`## [1] 3.141592653589793116`

To see *even more* digits, we can use the function `sprintf()`

. Let’s try to display 60 digits of \(\pi\):

`sprintf("%.60f", pi)`

`## [1] "3.141592653589793115997963468544185161590576171875000000000000"`

Why do the last twelve decimal points display as zero? The answer is that computers cannot represent real numbers to infinite precision. At some point, the remaining digits of \(\pi\) get chopped off, and we’re left with an approximation that’s more than sufficient for any practical application.

At first glance, the number 0.8 seems nothing like \(\pi\). It is, after all, a rational number: 4/5. But `sprintf()`

reveals that there’s more here than meets the eye:

`sprintf("%.54f", 0.8)`

`## [1] "0.800000000000000044408920985006261616945266723632812500"`

The fraction 4/5 cannot be represented exactly as a double; it can only be *approximated*. The same is true of 1/5. Because `0.2`

and `0.8`

can only be represented approximately, `1 - 0.8`

and `0.2`

turn out *not* to be equal from the computer’s perspective:

`identical(1 - 0.8, 0.2)`

`## [1] FALSE`

This in turn explains why `(1 - 0.8) * 20`

and `0.2 * 20`

truncate to different integer values:

`sprintf("%.54f", 20 * c(1 - 0.8, 0.2))`

```
## [1] "3.999999999999999111821580299874767661094665527343750000"
## [2] "4.000000000000000000000000000000000000000000000000000000"
```

# Which decimals have an exact representation?

The fraction `1/3`

lacks a finite decimal expansion. You might guess that R would simply store `1/3`

as a zero, followed by a decimal, followed by a large number of `3`

s. But in fact it does not:

`sprintf("%.54f", 1/3)`

`## [1] "0.333333333333333314829616256247390992939472198486328125"`

Every digit from the `1`

onward is *wrong*. The fraction `1/10`

, on the other hand, clearly *does* have a finite decimal expansion, `0.1`

, but R gets this one wrong as well:

`sprintf("%.54f", 1/10)`

`## [1] "0.100000000000000005551115123125782702118158340454101562"`

At the same time, it handles `1/4`

with perfect accuracy:

`sprintf("%.54f", 1/4)`

`## [1] "0.250000000000000000000000000000000000000000000000000000"`

What’s going on? Here’s a clue: the fraction `1/32`

can also be represented exactly as a double. See if you can figure out why before reading further.

The reason why `1/3`

lacks a terminating decimal expansion is that it can’t be written as a counting number divided by a power of ten. In other words, we can’t find values \(m, n \in \mathbb{N}\) such that \(m/10^n\) equals \(1/3\). In contrast, `3/4`

has a finite decimal expansion because it equals `75/100`

, corresponding to \(m = 75\) and \(n = 2\). Of course I’ve left off a crucial qualification: wherever I wrote “finite decimal expansion” above, I should have added “in base 10.” *The same number could have a terminating decimal expansion in one base and not in another.*

Although R displays numbers on the screen in base 10, it represents and computes with them in binary. So the question becomes: which values have a terminating decimal expansion in base 2? To find out, simply replace `10`

with `2`

in the expression from above. A rational number has a terminating decimal expansion base 2 if it can be written as \(m/2^n\) for some \(m, n \in \mathbb{N}\). Since `1/4`

equals \(1/2^2\), it has an exact representation. Since `3/4`

can be written as `3/2^2`

it *also* has an exact representation. In contrast, `1/5`

lacks an exact representation because there are no natural numbers \(m,n\) such that \(5 = 2^n/m\). We can get *close* by making \(n\) large and choosing \(m\) carefully, but we can never satisfy this equation exactly.

# Take-home Lesson

High-level programming languages like R and Python are extremely convenient: they allow us to focus on the big picture rather than writing line after line of boilerplate code.
But we should never forget that computers *cannot* represent all numeric values with perfect accuracy. Sometimes this matters. In R, coercing from integer to double is safe but the reverse can be risky, as we have gleaned from a deceptively simple example involving `rep()`

. To learn more about the subtleties of R, I highly recommend The R Inferno by Patrick Burns and Advanced R by Hadley Wickham. Despite what you may have heard, R is quite a capable language, but it does have some quirks!