<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>econometrics.blog</title>
<link>https://www.econometrics.blog/</link>
<atom:link href="https://www.econometrics.blog/index.xml" rel="self" type="application/rss+xml"/>
<description></description>
<generator>quarto-1.9.37</generator>
<lastBuildDate>Sun, 26 Apr 2026 00:00:00 GMT</lastBuildDate>
<item>
  <title>Complex Step Differentiation</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/complex-step-differentiation/</link>
  <description><![CDATA[ 




<p>Sometimes we need a good approximation to the derivative <img src="https://latex.codecogs.com/png.latex?f'(x)"> of a real-valued function <img src="https://latex.codecogs.com/png.latex?f"> at some real value <img src="https://latex.codecogs.com/png.latex?x">. So here’s a fun fact that you may not know. If <img src="https://latex.codecogs.com/png.latex?%5CDelta"> is a small positive number and <img src="https://latex.codecogs.com/png.latex?f"> can be evaluated at a complex argument, then <img src="https://latex.codecogs.com/png.latex?%0Af'(x)%20%5Capprox%20%5Cfrac%7B%5Ctext%7BIm%7D%5Bf(x%20+%20%5CDelta%20i)%5D%7D%7B%5CDelta%7D%0A"> where <img src="https://latex.codecogs.com/png.latex?i"> is the imaginary unit and <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BIm%7D(z)"> denotes the imaginary part of a complex number. This unexpected but highly accurate approximation is called <strong>complex step differentiation</strong>. The method dates back to <a href="https://epubs.siam.org/doi/10.1137/0704019">Lyness and Moler (1967)</a>; <a href="https://researchrepository.wvu.edu/faculty_publications/426/">Squire and Trapp (1998)</a> give a concise modern exposition.</p>
<p>To see why it works, Taylor-expand <img src="https://latex.codecogs.com/png.latex?f(x%20+%20%5CDelta%20i)"> around <img src="https://latex.codecogs.com/png.latex?x"> yielding <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0Af(x%20+%20%5CDelta%20i)%20&amp;=%20f(x)%20+%20f'(x)%5C;%20i%5CDelta%20+%20f''(x)%20%5Cfrac%7B(i%5CDelta)%5E2%7D%7B2!%7D%20+%20f'''(x)%20%5Cfrac%7B(i%5CDelta)%5E3%7D%7B3!%7D%20+%20%5Ccdots%5C%5C%0A&amp;=%20%5Cleft%5Bf(x)%20-%20f''(x)%20%5Cfrac%7B%5CDelta%5E2%7D%7B2%7D%5Cright%5D+%20%5Cleft%5Bf'(x)%5C;%20%5CDelta%20-%20f'''(x)%20%5Cfrac%7B%5CDelta%5E3%7D%7B3!%7D%5Cright%5Di%20+%20%5Ccdots%0A%5Cend%7Baligned%7D%0A"> since <img src="https://latex.codecogs.com/png.latex?i%5E2%20=%20-1"> and <img src="https://latex.codecogs.com/png.latex?i%5E3%20=%20-i">. Taking the imaginary part and dividing by <img src="https://latex.codecogs.com/png.latex?%5CDelta">, <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Ctext%7BIm%7D%5Bf(x%20+%20%5CDelta%20i)%5D%7D%7B%5CDelta%7D%20=%20f'(x)%20-%20f'''(x)%20%5Cfrac%7B%5CDelta%5E2%7D%7B3!%7D%20+%20%5Ccdots%0A"> so complex step differentiation approximates <img src="https://latex.codecogs.com/png.latex?f'(x)"> to order <img src="https://latex.codecogs.com/png.latex?O(%5CDelta%5E2)">.</p>
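<p>As a quick sanity check on this expansion, here is a worked example of my own choosing rather than part of the derivation above. Take <img src="https://latex.codecogs.com/png.latex?f(x)%20=%20x%5E3">, for which the Taylor series terminates after the cubic term. Expanding the cube directly gives <img src="https://latex.codecogs.com/png.latex?%0A(x%20+%20%5CDelta%20i)%5E3%20=%20%5Cleft%5Bx%5E3%20-%203x%5CDelta%5E2%5Cright%5D%20+%20%5Cleft%5B3x%5E2%5CDelta%20-%20%5CDelta%5E3%5Cright%5Di%0A"> so that <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Ctext%7BIm%7D%5Bf(x%20+%20%5CDelta%20i)%5D%7D%7B%5CDelta%7D%20=%203x%5E2%20-%20%5CDelta%5E2%0A"> which agrees term by term with the general formula, since <img src="https://latex.codecogs.com/png.latex?f'(x)%20=%203x%5E2"> and <img src="https://latex.codecogs.com/png.latex?f'''(x)%20=%206"> imply <img src="https://latex.codecogs.com/png.latex?f'''(x)%20%5CDelta%5E2%20/%203!%20=%20%5CDelta%5E2">.</p>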
<p>Sure, it’s a cute trick. But why go to the trouble of introducing complex numbers? Complex step differentiation turns out to be much better behaved from a numerical perspective than the simple difference approximation <img src="https://latex.codecogs.com/png.latex?%0Af'(x)%20%5Capprox%20%5Cfrac%7Bf(x%20+%20%5CDelta)%20-%20f(x)%7D%7B%5CDelta%7D%0A"> or the symmetric difference <img src="https://latex.codecogs.com/png.latex?%0Af'(x)%20%5Capprox%20%5Cfrac%7Bf(x%20+%20%5CDelta)%20-%20f(x%20-%20%5CDelta)%7D%7B2%5CDelta%7D.%0A"> In the remainder of this post, I’ll explain why. First, though, here is a brief overview of complex numbers in R. If this material is already familiar to you, skip ahead to the next section.</p>
<section id="complex-numbers-in-r" class="level2">
<h2 class="anchored" data-anchor-id="complex-numbers-in-r">Complex Numbers in R</h2>
<p>R provides basic functionality for working with complex numbers, documented in the help file <code>?complex</code>. The imaginary unit in R is denoted by <code>1i</code>. Don’t try <code>i</code> by itself because that won’t work:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1">i <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># this doesn't work</span></span></code></pre></div></div>
<div class="cell-output cell-output-error">
<pre><code>Error:
! object 'i' not found</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>i <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># this works</span></span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 0+1i</code></pre>
</div>
</div>
<p>To write a complex number, simply use <code>+</code> to separate the real and imaginary parts, e.g.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>i</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 3+4i</code></pre>
</div>
</div>
<p>A common error when learning R is to write code with “implied multiplication” e.g.&nbsp;<code>4x</code> (wrong) rather than <code>4 * x</code> (right). But don’t be tempted to try <code>4 * i</code> since, again, <code>i</code> is <em>not</em> the imaginary unit in R:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> i <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># this doesn't work </span></span></code></pre></div></div>
<div class="cell-output cell-output-error">
<pre><code>Error:
! object 'i' not found</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>i <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># this works</span></span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 0+4i</code></pre>
</div>
</div>
<p>To demonstrate R’s basic functionality for working with complex numbers, let’s specify a complex number <img src="https://latex.codecogs.com/png.latex?z"> in Cartesian coordinates: <img src="https://latex.codecogs.com/png.latex?z%20=%203%20+%204i">. The real part is <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BRe%7D(z)%20=%203">, the imaginary part is <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BIm%7D(z)%20=%204">, the modulus is <img src="https://latex.codecogs.com/png.latex?%7Cz%7C%20=%20%5Csqrt%7B3%5E2%20+%204%5E2%7D%20=%205">, and the complex conjugate is <img src="https://latex.codecogs.com/png.latex?%5Cbar%7Bz%7D%20=%203%20-%204i">. All of these operations are available in R:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">z <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>i</span>
<span id="cb11-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Re</span>(z)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 3</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Im</span>(z)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 4</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Mod</span>(z)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 5</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(z) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># same as Mod(z) for a complex number z</span></span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 5</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Conj</span>(z)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 3-4i</code></pre>
</div>
</div>
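<p>The same number can also be described in polar form. As a small sketch that goes slightly beyond the functions shown above, base R’s <code>Arg()</code> returns the angle of a complex number in radians, and <code>complex(modulus = , argument = )</code> rebuilds a number from its polar coordinates. The names <code>theta</code> and <code>z_polar</code> are my own, chosen purely for illustration:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">z &lt;- 3 + 4i
theta &lt;- Arg(z) # angle in radians, equal to atan2(4, 3)
z_polar &lt;- complex(modulus = Mod(z), argument = theta) # rebuild z from polar coordinates
all.equal(z, z_polar) # compare up to floating-point rounding</code></pre></div></div>
</div>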
<p>All of the basic operations <code>+</code>, <code>-</code>, <code>*</code>, <code>/</code>, and <code>^</code> work on complex numbers and are vectorized</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1">w <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>i</span>
<span id="cb21-2">z <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> w</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 0+3i</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1">z <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> w</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 6+5i</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1">z <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> w</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] -5-15i</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1">z <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> w</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] -1.3-0.9i</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1">z<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span></span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] -117+44i</code></pre>
</div>
</div>
<p>as are functions such as <code>log()</code>, <code>exp()</code>, <code>sin()</code>, <code>cos()</code>, etc. If you supply them a complex input, they will return a complex output; if you supply them a real input, they will return a real output. This behavior explains a common gotcha: <code>sqrt(-1)</code> doesn’t work because there’s no real number whose square is <code>-1</code>, but <code>sqrt(-1 + 0i)</code> does work:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># this doesn't work</span></span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Warning in sqrt(-1): NaNs produced</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] NaN</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb34-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>i)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># this works</span></span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 0+1i</code></pre>
</div>
</div>
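<p>If the value is already stored in a real-valued variable, one way to opt in to complex arithmetic is to convert it first with <code>as.complex()</code>. A minimal sketch, where <code>x_real</code> is a placeholder name of my own:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">x_real &lt;- -1
is.complex(x_real) # check the storage type before converting
sqrt(as.complex(x_real)) # same computation as sqrt(-1 + 0i)</code></pre></div></div>
</div>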
<p>Another common gotcha is trying to apply functions like <code>max()</code> and <code>min()</code> or operators like <code>&lt;</code> and <code>&gt;</code> to complex numbers. Since <a href="https://en.wikipedia.org/wiki/Ordered_field">complex numbers aren’t ordered</a>, this doesn’t work:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb36-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(z, w))</span></code></pre></div></div>
<div class="cell-output cell-output-error">
<pre><code>Error in `min()`:
! invalid 'type' (complex) of argument</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb38-1">z <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> w</span></code></pre></div></div>
<div class="cell-output cell-output-error">
<pre><code>Error in `z &lt; w`:
! invalid comparison with complex values</code></pre>
</div>
</div>
<p>but testing for equality / non-equality does work:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb40-1">z <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> w</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] FALSE</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb42-1">z <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> w</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] TRUE</code></pre>
</div>
</div>
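<p>When what you actually want is the complex number closest to or farthest from the origin, one workaround is to compare moduli, which are real. This is a sketch of one possible convention, not a definition of <code>min()</code> for complex numbers, and <code>zs</code> is a name of my own:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">z &lt;- 3 + 4i
w &lt;- (-3) - 1i
zs &lt;- c(z, w)
Mod(zs) # real-valued, so ordering comparisons are allowed
zs[which.min(Mod(zs))] # the element with the smallest modulus
Mod(z) &gt; Mod(w) # compare sizes rather than the numbers themselves</code></pre></div></div>
</div>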
</section>
<section id="complex-step-differentiation-in-action" class="level2">
<h2 class="anchored" data-anchor-id="complex-step-differentiation-in-action">Complex Step Differentiation in Action</h2>
<p>Let’s test out complex step differentiation with a simple example. Let <img src="https://latex.codecogs.com/png.latex?f(x)%20=%20x%5E%7B9/2%7D"> and suppose we want to compute <img src="https://latex.codecogs.com/png.latex?f'(1.5)">. This one is easy to compute analytically: <img src="https://latex.codecogs.com/png.latex?%0Af'(x)%20=%20%5Cfrac%7B9%7D%7B2%7D%20x%5E%7B7/2%7D%20%5Cimplies%20f'(1.5)%20=%204.5%20%5Ctimes(1.5)%5E%7B3.5%7D%0A"> so we can check how accurate different numerical approximations turn out to be. Computing the derivative directly gives</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb44-1">direct <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4.5</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.5</span>)</span>
<span id="cb44-2">direct</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 18.60081</code></pre>
</div>
</div>
<p>Now we’ll compare this value against three numerical derivatives: the “simple difference” approach, the “symmetric difference” approach, and the complex step approach.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb46" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb46-1">f <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(x) x<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4.5</span>) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># the function to differentiate</span></span>
<span id="cb46-2"></span>
<span id="cb46-3">x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># the point where we'll evaluate f'(x)</span></span></code></pre></div></div>
</div>
<p>Since arithmetic in R is vectorized, we can evaluate the quality of each approximation over many values of <img src="https://latex.codecogs.com/png.latex?%5CDelta"> at once by setting up a vector of progressively smaller positive values:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb47" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb47-1">delta <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>)</span>
<span id="cb47-2"></span>
<span id="cb47-3">simple <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>(x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> delta) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>(x)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> delta</span>
<span id="cb47-4"></span>
<span id="cb47-5">symmetric <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>(x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> delta) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>(x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> delta)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> delta)</span>
<span id="cb47-6"></span>
<span id="cb47-7">complex_step <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Im</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>(x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> delta <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>i)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> delta</span></code></pre></div></div>
</div>
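<p>The complex step formula is short enough to wrap in a helper. The function below is a sketch of my own and not part of the calculation above: the name <code>complex_step_deriv</code> and the default step <code>1e-8</code> are assumptions made for illustration, and it presumes that <code>f</code> accepts complex input and is built from operations that are differentiable in the complex sense.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical helper: complex step approximation to f'(x)
complex_step_deriv &lt;- function(f, x, delta = 1e-8) {
  Im(f(x + delta * 1i)) / delta
}

complex_step_deriv(function(x) x^(4.5), 1.5) # compare with 4.5 * (1.5)^(3.5)
complex_step_deriv(exp, 0) # the exact derivative is 1</code></pre></div></div>
</div>
<p>Because nothing is subtracted in the numerator, the helper does not need a carefully tuned step size in the way a difference quotient does.</p>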
<p>Now we’ll make two tables: the first contains the raw results, i.e. the approximate value of <img src="https://latex.codecogs.com/png.latex?f'(x)">, and the second contains the relative error of each approximation, expressed as a percentage. When reading the first table, recall from above that direct calculation gives <img src="https://latex.codecogs.com/png.latex?f'(1.5)%20%5Capprox%2018.60081">.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb48" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb48-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cbind</span>(delta, simple, symmetric, complex_step) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb48-2">  knitr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">kable</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<table class="caption-top table table-sm table-striped small">
<thead>
<tr class="header">
<th style="text-align: right;">delta</th>
<th style="text-align: right;">simple</th>
<th style="text-align: right;">symmetric</th>
<th style="text-align: right;">complex_step</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: right;">1e-01</td>
<td style="text-align: right;">20.89450</td>
<td style="text-align: right;">18.72139</td>
<td style="text-align: right;">18.48027</td>
</tr>
<tr class="even">
<td style="text-align: right;">1e-02</td>
<td style="text-align: right;">18.81903</td>
<td style="text-align: right;">18.60202</td>
<td style="text-align: right;">18.59961</td>
</tr>
<tr class="odd">
<td style="text-align: right;">1e-03</td>
<td style="text-align: right;">18.62253</td>
<td style="text-align: right;">18.60082</td>
<td style="text-align: right;">18.60080</td>
</tr>
<tr class="even">
<td style="text-align: right;">1e-04</td>
<td style="text-align: right;">18.60298</td>
<td style="text-align: right;">18.60081</td>
<td style="text-align: right;">18.60081</td>
</tr>
<tr class="odd">
<td style="text-align: right;">1e-05</td>
<td style="text-align: right;">18.60103</td>
<td style="text-align: right;">18.60081</td>
<td style="text-align: right;">18.60081</td>
</tr>
<tr class="even">
<td style="text-align: right;">1e-06</td>
<td style="text-align: right;">18.60083</td>
<td style="text-align: right;">18.60081</td>
<td style="text-align: right;">18.60081</td>
</tr>
<tr class="odd">
<td style="text-align: right;">1e-07</td>
<td style="text-align: right;">18.60081</td>
<td style="text-align: right;">18.60081</td>
<td style="text-align: right;">18.60081</td>
</tr>
<tr class="even">
<td style="text-align: right;">1e-08</td>
<td style="text-align: right;">18.60081</td>
<td style="text-align: right;">18.60081</td>
<td style="text-align: right;">18.60081</td>
</tr>
<tr class="odd">
<td style="text-align: right;">1e-09</td>
<td style="text-align: right;">18.60081</td>
<td style="text-align: right;">18.60081</td>
<td style="text-align: right;">18.60081</td>
</tr>
<tr class="even">
<td style="text-align: right;">1e-10</td>
<td style="text-align: right;">18.60081</td>
<td style="text-align: right;">18.60081</td>
<td style="text-align: right;">18.60081</td>
</tr>
<tr class="odd">
<td style="text-align: right;">1e-11</td>
<td style="text-align: right;">18.60077</td>
<td style="text-align: right;">18.60081</td>
<td style="text-align: right;">18.60081</td>
</tr>
<tr class="even">
<td style="text-align: right;">1e-12</td>
<td style="text-align: right;">18.60201</td>
<td style="text-align: right;">18.60245</td>
<td style="text-align: right;">18.60081</td>
</tr>
<tr class="odd">
<td style="text-align: right;">1e-13</td>
<td style="text-align: right;">18.58069</td>
<td style="text-align: right;">18.58513</td>
<td style="text-align: right;">18.60081</td>
</tr>
<tr class="even">
<td style="text-align: right;">1e-14</td>
<td style="text-align: right;">18.56293</td>
<td style="text-align: right;">18.56293</td>
<td style="text-align: right;">18.60081</td>
</tr>
<tr class="odd">
<td style="text-align: right;">1e-15</td>
<td style="text-align: right;">20.42810</td>
<td style="text-align: right;">20.42810</td>
<td style="text-align: right;">18.60081</td>
</tr>
</tbody>
</table>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb49" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb49-1">get_rel_error <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(x, truth) <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> truth) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(truth)</span>
<span id="cb49-2"></span>
<span id="cb49-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cbind</span>(</span>
<span id="cb49-4">  delta,</span>
<span id="cb49-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">simple =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_rel_error</span>(simple, direct),</span>
<span id="cb49-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">symmetric =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_rel_error</span>(symmetric, direct),</span>
<span id="cb49-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">complex_step =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_rel_error</span>(complex_step, direct)</span>
<span id="cb49-8">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb49-9">  knitr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">kable</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)))</span></code></pre></div></div>
<div class="cell-output-display">
<table class="caption-top table table-sm table-striped small">
<thead>
<tr class="header">
<th style="text-align: right;">delta</th>
<th style="text-align: right;">simple</th>
<th style="text-align: right;">symmetric</th>
<th style="text-align: right;">complex_step</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: right;">1e-01</td>
<td style="text-align: right;">12.3311</td>
<td style="text-align: right;">0.6483</td>
<td style="text-align: right;">0.6480</td>
</tr>
<tr class="even">
<td style="text-align: right;">1e-02</td>
<td style="text-align: right;">1.1732</td>
<td style="text-align: right;">0.0065</td>
<td style="text-align: right;">0.0065</td>
</tr>
<tr class="odd">
<td style="text-align: right;">1e-03</td>
<td style="text-align: right;">0.1167</td>
<td style="text-align: right;">0.0001</td>
<td style="text-align: right;">0.0001</td>
</tr>
<tr class="even">
<td style="text-align: right;">1e-04</td>
<td style="text-align: right;">0.0117</td>
<td style="text-align: right;">0.0000</td>
<td style="text-align: right;">0.0000</td>
</tr>
<tr class="odd">
<td style="text-align: right;">1e-05</td>
<td style="text-align: right;">0.0012</td>
<td style="text-align: right;">0.0000</td>
<td style="text-align: right;">0.0000</td>
</tr>
<tr class="even">
<td style="text-align: right;">1e-06</td>
<td style="text-align: right;">0.0001</td>
<td style="text-align: right;">0.0000</td>
<td style="text-align: right;">0.0000</td>
</tr>
<tr class="odd">
<td style="text-align: right;">1e-07</td>
<td style="text-align: right;">0.0000</td>
<td style="text-align: right;">0.0000</td>
<td style="text-align: right;">0.0000</td>
</tr>
<tr class="even">
<td style="text-align: right;">1e-08</td>
<td style="text-align: right;">0.0000</td>
<td style="text-align: right;">0.0000</td>
<td style="text-align: right;">0.0000</td>
</tr>
<tr class="odd">
<td style="text-align: right;">1e-09</td>
<td style="text-align: right;">0.0000</td>
<td style="text-align: right;">0.0000</td>
<td style="text-align: right;">0.0000</td>
</tr>
<tr class="even">
<td style="text-align: right;">1e-10</td>
<td style="text-align: right;">0.0000</td>
<td style="text-align: right;">0.0000</td>
<td style="text-align: right;">0.0000</td>
</tr>
<tr class="odd">
<td style="text-align: right;">1e-11</td>
<td style="text-align: right;">0.0003</td>
<td style="text-align: right;">0.0000</td>
<td style="text-align: right;">0.0000</td>
</tr>
<tr class="even">
<td style="text-align: right;">1e-12</td>
<td style="text-align: right;">0.0064</td>
<td style="text-align: right;">0.0088</td>
<td style="text-align: right;">0.0000</td>
</tr>
<tr class="odd">
<td style="text-align: right;">1e-13</td>
<td style="text-align: right;">0.1082</td>
<td style="text-align: right;">0.0843</td>
<td style="text-align: right;">0.0000</td>
</tr>
<tr class="even">
<td style="text-align: right;">1e-14</td>
<td style="text-align: right;">0.2037</td>
<td style="text-align: right;">0.2037</td>
<td style="text-align: right;">0.0000</td>
</tr>
<tr class="odd">
<td style="text-align: right;">1e-15</td>
<td style="text-align: right;">9.8237</td>
<td style="text-align: right;">9.8237</td>
<td style="text-align: right;">0.0000</td>
</tr>
</tbody>
</table>
</div>
</div>
<p>The simple difference approach is clearly less accurate than both the symmetric difference and complex step approaches, particularly at larger values of <img src="https://latex.codecogs.com/png.latex?%5CDelta">. And while it may seem that there’s not much to choose between these latter two approximations, look carefully at what happens as <img src="https://latex.codecogs.com/png.latex?%5CDelta"> gets smaller. Eventually the relative errors of the simple and symmetric difference approximations <strong>start to increase</strong>! For example, the symmetric difference approximation is more accurate at <img src="https://latex.codecogs.com/png.latex?%5CDelta%20=%200.1"> (relative error <img src="https://latex.codecogs.com/png.latex?%3C1%5C%25">) than it is at <img src="https://latex.codecogs.com/png.latex?%5CDelta%20=%2010%5E%7B-15%7D"> (relative error <img src="https://latex.codecogs.com/png.latex?%5Capprox%2010%5C%25">).</p>
<p>How can this be? The derivative is <em>defined</em> as the limit of the simple difference approximation as <img src="https://latex.codecogs.com/png.latex?%5CDelta%20%5Crightarrow%200">. So how can the approximation get <em>worse</em> if <img src="https://latex.codecogs.com/png.latex?%5CDelta"> becomes smaller?</p>
</section>
<section id="catastrophic-cancellation" class="level2">
<h2 class="anchored" data-anchor-id="catastrophic-cancellation">Catastrophic Cancellation</h2>
<p>The culprit is <a href="https://en.wikipedia.org/wiki/Catastrophic_cancellation">catastrophic cancellation</a>. As explained in an <a href="../../post/street-fighting-numerical-analysis-part-1/">earlier post</a>, computers cannot represent all real numbers exactly. Instead they use <a href="https://ngrok.com/blog/quantization#how-do-computers-store-numbers">floating point numbers</a> as an approximation. So when we compute a difference like <img src="https://latex.codecogs.com/png.latex?f(x%20+%20%5CDelta)%20-%20f(x)"> what we’re <em>really</em> computing is the difference of two approximations. The problem is that even when <img src="https://latex.codecogs.com/png.latex?A"> is a very good approximation to <img src="https://latex.codecogs.com/png.latex?f(x%20+%20%5CDelta)"> and <img src="https://latex.codecogs.com/png.latex?B"> is a very good approximation to <img src="https://latex.codecogs.com/png.latex?f(x)">, the difference <img src="https://latex.codecogs.com/png.latex?A%20-%20B"> may be a <em>poor approximation</em> to <img src="https://latex.codecogs.com/png.latex?f(x%20+%20%5CDelta)%20-%20f(x)">. This is an unfortunate property of subtraction, nicely illustrated in <a href="https://www.johndcook.com/blog/2025/07/20/interest-tech-note/">this post</a> by John D. Cook.</p>
<p>A double in R carries around 16 significant decimal digits. For small values of <img src="https://latex.codecogs.com/png.latex?%5CDelta">, <img src="https://latex.codecogs.com/png.latex?f(x%20+%20%5CDelta)"> and <img src="https://latex.codecogs.com/png.latex?f(x)"> agree in nearly all of those digits. Subtracting them eliminates the digits that match, leaving behind what is effectively rounding noise. Dividing by <img src="https://latex.codecogs.com/png.latex?%5CDelta"> then magnifies this noise. For example, since <img src="https://latex.codecogs.com/png.latex?f'(x)"> in our example is around <img src="https://latex.codecogs.com/png.latex?18.60081">, computing <img src="https://latex.codecogs.com/png.latex?f(x%20+%2010%5E%7B-15%7D)%20-%20f(x)"> should give a result of approximately <img src="https://latex.codecogs.com/png.latex?1.860081%5Ctimes%2010%5E%7B-14%7D">, but instead we get</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb50" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb50-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sprintf</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%.20f"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>(x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1e-15</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>(x))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "0.00000000000002042810"</code></pre>
</div>
</div>
<p>The order of magnitude is right, but the computed value is off by roughly 10%, in line with the relative error reported in the last row of the table above. The symmetric difference formula has the same problem because it too relies on a subtraction of two approximate values in the numerator:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb52" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb52-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sprintf</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%.20f"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>(x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1e-15</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>(x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1e-15</span>))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "0.00000000000004085621"</code></pre>
</div>
</div>
<p>Here we’d expect a result of around <img src="https://latex.codecogs.com/png.latex?3.720162%20%5Ctimes%2010%5E%7B-14%7D"> and, again, only the order of magnitude is right: the computed value is off by roughly 10%.</p>
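<p>The same loss of digits shows up in the simplest subtraction imaginable, with no <code>f</code> involved at all. Here is a minimal sketch in base R; the names <code>tiny</code>, <code>computed</code> and <code>rel_error_pct</code> are mine, chosen purely for illustration. Mathematically <code>(1 + tiny) - 1</code> equals <code>tiny</code>, but the sum <code>1 + tiny</code> has to be rounded to the nearest double before the subtraction happens, and that rounding error is all that survives.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Illustration only: a subtraction that should return exactly `tiny`
tiny &lt;- 1e-15

# 1 + tiny is rounded to the nearest representable double first
computed &lt;- (1 + tiny) - 1

# Relative error of the computed difference, in percent
rel_error_pct &lt;- 100 * abs(computed - tiny) / tiny

sprintf("%.20f", computed)
rel_error_pct</code></pre></div>
</div>
<p>With standard IEEE 754 doubles the relative error should come out at around 11%, the same order as the errors in the last row of the table.</p>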
<p>In contrast, complex step differentiation <em>has no subtraction in the numerator</em>. This means that it is not subject to catastrophic cancellation. As <img src="https://latex.codecogs.com/png.latex?%5CDelta"> becomes smaller, the approximation just keeps improving—down to roughly machine precision. The cost is that <img src="https://latex.codecogs.com/png.latex?f"> must be <a href="https://en.wikipedia.org/wiki/Analytic_function">analytic</a> at <img src="https://latex.codecogs.com/png.latex?x"> and evaluable at complex inputs. Most smooth functions you’ll encounter qualify, but those involving <img src="https://latex.codecogs.com/png.latex?%7Cx%7C">, <img src="https://latex.codecogs.com/png.latex?%5Cmax">, <img src="https://latex.codecogs.com/png.latex?%5Cmin">, or indicator functions generally don’t.</p>
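<p>Both halves of this claim are easy to check. The sketch below uses a hypothetical helper, <code>complex_step_deriv()</code>, which is my own illustration rather than part of any package, together with a step of <code>1e-20</code>, far smaller than anything the difference formulas could tolerate.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper (not from any package): complex step derivative of g at x0
complex_step_deriv &lt;- function(g, x0, delta = 1e-20) {
  Im(g(complex(real = x0, imaginary = delta))) / delta
}

# exp() is analytic, so even a tiny step recovers the derivative exp(1)
d_exp &lt;- complex_step_deriv(exp, 1)
d_exp - exp(1)

# abs() of a complex number in R is its modulus, which is real, so the
# imaginary part is zero and the method misses the true slope of 1
d_abs &lt;- complex_step_deriv(abs, 1.5)
d_abs</code></pre></div>
</div>
<p>For <code>exp()</code> the result should agree with <code>exp(1)</code> to roughly machine precision. For <code>abs()</code> the method returns zero even though the true derivative at 1.5 is one, which is exactly the kind of silent failure to watch out for.</p>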
</section>
<section id="dont-roll-your-own" class="level2">
<h2 class="anchored" data-anchor-id="dont-roll-your-own">Don’t Roll Your Own</h2>
<p>In this post I computed the complex step derivative by hand. That’s useful for understanding the method, but it’s a bad idea in practice. Whenever you can, you should rely on high-quality existing libraries to implement numerical methods. Numerical analysis is a deeply complicated subject and we are but lowly econometricians! Fortunately for us, the <code>grad()</code> function from the <code>numDeriv</code> R package implements complex step differentiation as one of its three methods:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb54" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb54-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(numDeriv)</span>
<span id="cb54-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">grad</span>(f, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'complex'</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 18.60081</code></pre>
</div>
</div>
<p>So the next time you need to differentiate something numerically, remember the value in making things more <em>complex</em> than they need to be!</p>


</section>

 ]]></description>
  <category>computing</category>
  <guid>https://www.econometrics.blog/post/complex-step-differentiation/</guid>
  <pubDate>Sun, 26 Apr 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>A Makeover for econometrics.blog</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/makeover-for-econometrics-blog/</link>
  <description><![CDATA[ 




<p>Today <a href="../..">econometrics.blog</a> got a long-overdue update from <a href="https://yihui.org/blogdown/">blogdown</a> to <a href="https://quarto.org">Quarto</a>. Thanks to <a href="https://www.anthropic.com/product/claude-code">Claude Code</a>, the transition was seamless: all existing links are preserved, along with the <a href="https://utteranc.es">Utterances</a> comment sections.</p>
<p>During the upgrade I also corrected an embarrassingly large number of typos across the site and added privacy-respecting site analytics using <a href="https://goatcounter.com">GoatCounter</a>. To find out whether econometrics has gone viral at long last, check out the <a href="https://econometrics.goatcounter.com">public dashboard</a>. Where possible, references to the econometrics literature now point to a paper’s <a href="https://ideas.repec.org">IDEAS/RePEc</a> entry, where you can find citation information, a link to the journal website, and any freely available working paper versions.</p>
<p>I hope you enjoy the new look. If you notice anything that’s not working properly, please let me know by submitting a <a href="https://github.com/fditraglia/econometrics.blog/issues">GitHub issue</a>.</p>



 ]]></description>
  <category>meta</category>
  <guid>https://www.econometrics.blog/post/makeover-for-econometrics-blog/</guid>
  <pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Chris Sims - RIP</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/chris-sims-rip/</link>
  <description><![CDATA[ 




<p>I was saddened to hear of Chris Sims’s passing yesterday. Although I’m not a macroeconometrician, his work has strongly influenced the way I think about econometrics. I covered his famous <a href="../../post/sims-and-uhlig-1991-replication/">helicopter tour</a> paper on this blog a while back. Some of my other favorites are unpublished notes or slides from his <a href="https://www.princeton.edu/~sims">website</a>, many of them with a philosophical bent. <a href="http://sims.princeton.edu/yftp/IV/IV.pdf">Thinking about instrumental variables</a> is a paper I read in grad school that really clarified why things can go so badly wrong in IV estimation. I read <a href="http://sims.princeton.edu/yftp/UndrstndgNnBsns/GewekeBookChpter.pdf">Understanding Non-Bayesians</a> for the first time a couple of years ago and wished I had read it sooner. It articulates a view of Bayesian econometrics that I find particularly compelling. <a href="http://sims.princeton.edu/yftp/WassermanExmpl/WassermanR4a.pdf">Robins-Wasserman, Round N</a> and <a href="http://sims.princeton.edu/yftp/SharpEmet/SharpEmet.pdf">Sharp Econometrics</a> have also shaped the approach I’m taking in some <a href="https://laurayuliu.com/research/BDML_DL/BDML.pdf">recent work with Laura Liu</a>.</p>
<p>I didn’t know Chris well, but I met him a couple of times early in my career. One of those meetings is particularly memorable. I was invited to give a seminar at Princeton on relatively short notice and was feeling apprehensive. The material that I had ready to present was preliminary and a little unusual: it included elements of my paper on <a href="http://ditraglia.com/pdf/DiTraglia-Garcia-Jimeno-2019.pdf">disciplining beliefs</a> as well as some more traditional results that eventually made their way into <a href="http://ditraglia.com/pdf/binary-regressor-final.pdf">this paper</a>.</p>
<p>When I saw that my first meeting of the day was with a Nobel laureate whom I’d never met before, I was even more nervous! But Chris was great; within a couple of minutes I felt completely comfortable talking with him and was genuinely surprised to see how interested he was in hearing my views on econometrics. I don’t mean to say that my views were particularly insightful or interesting. I think Chris just genuinely enjoyed engaging with people about econometrics and puzzling through the issues that he found interesting and important.</p>
<p>Near the beginning of my seminar, someone in the audience asked me a slightly pointed question—not rude but definitely skeptical of my approach. For whatever reason, I got flustered and flubbed my response. I couldn’t seem to articulate my point, and then bungled my response to a follow-up question as well. Just when I started to worry that my seminar was going off the rails, Chris chimed in and clarified the point I had been struggling to articulate. I remember feeling incredibly relieved, like someone had just pressed a reset button for my talk. It was just a small thing, but it meant a lot to me at a time in my career when I was feeling anything but confident about my ideas and my work.</p>



 ]]></description>
  <category>meta</category>
  <guid>https://www.econometrics.blog/post/chris-sims-rip/</guid>
  <pubDate>Sun, 15 Mar 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Overlapping Confidence Intervals: Part II</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/overlapping-confidence-intervals-part-ii/</link>
  <description><![CDATA[ 




<p>In my earlier post on <a href="../../post/overlapping-confidence-intervals/">Overlapping Confidence Intervals</a> I asked what we can learn from the overlap, or lack thereof, between confidence intervals for two population means constructed using independent samples. To recap: if the individual confidence intervals for groups A and B <em>do not</em> overlap, there must be a statistically significant difference between the population means for the two groups. In other words, the interval for the difference of means will not include zero. If the individual intervals <em>do overlap</em>, on the other hand, anything goes. The interval for the difference of means may or may not include zero. Indeed, it’s even possible for the individual intervals for both A and B to include zero while the interval for their difference does not!</p>
<p>In <a href="../../post/overlapping-confidence-intervals/">Part I</a> we found a way to rephrase our statistical problem about overlapping confidence intervals as a familiar geometry problem involving <em>right triangles</em>. We then solved this geometry problem using the Pythagorean Theorem and Triangle Inequality. To build the connection between confidence intervals and right triangles, however, we assumed that our two estimators were <em>uncorrelated</em>. Unfortunately this assumption fails in many interesting real-world applications. Today we’ll ask what happens to our earlier conclusions about overlapping intervals if we allow for correlation.</p>
<section id="a-word-about-notation" class="level2">
<h2 class="anchored" data-anchor-id="a-word-about-notation">A Word About Notation</h2>
<p>While I phrased my first post about overlapping intervals in terms of sample and population means for two groups, the idea is general. Nothing substantive changes if we replace the parameters <img src="https://latex.codecogs.com/png.latex?(%5Cmu_A,%20%5Cmu_B)"> with <img src="https://latex.codecogs.com/png.latex?(%5Calpha,%20%5Cbeta)">, the estimators <img src="https://latex.codecogs.com/png.latex?(%5Cbar%7BA%7D,%20%5Cbar%7BB%7D)"> with <img src="https://latex.codecogs.com/png.latex?(%5Chat%7B%5Calpha%7D,%20%5Chat%7B%5Cbeta%7D)">, and the standard errors <img src="https://latex.codecogs.com/png.latex?%5Cleft(%5Ctext%7BSE%7D(%5Cbar%7BA%7D),%20%5Ctext%7BSE%7D(%5Cbar%7BB%7D)%5Cright)"> with <img src="https://latex.codecogs.com/png.latex?%5Cleft(%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D),%20%5Ctext%7BSE%7D(%5Chat%7B%5Cbeta%7D)%5Cright)">. As long as the two estimators are uncorrelated and (approximately) normally distributed, the results from <a href="../../post/overlapping-confidence-intervals/">Part I</a> apply to the individual intervals for <img src="https://latex.codecogs.com/png.latex?%5Calpha"> and <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> versus the interval for the difference <img src="https://latex.codecogs.com/png.latex?%5Calpha-%5Cbeta">. Today we will ask what happens when the estimators are potentially <em>correlated</em>. To make it clear that our results are general, I’ll use the more agnostic <img src="https://latex.codecogs.com/png.latex?(%5Calpha,%5Cbeta)"> notation throughout.</p>
</section>
<section id="a-motivating-example" class="level2">
<h2 class="anchored" data-anchor-id="a-motivating-example">A Motivating Example</h2>
<p>Consider a randomized controlled trial with <em>two active treatments</em>. In our paper on <a href="https://ditraglia.com/pdf/pawn-paper.pdf">pawn lending</a>, for example, my co-authors and I compare default rates between borrowers assigned to the <em>status quo</em> pawn contract (control), a new <em>structured</em> contract (Treatment A), and a <em>choice arm</em> (Treatment B) in which they were free to choose whichever contract they preferred. To learn the causal effect of the structured contract, we compare the mean default rates of borrowers who received Treatment A against the corresponding rate in the control group. Call this difference of means <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Calpha%7D">. Similarly, to learn the effect of choice, we make the analogous comparison between Treatment B and the control group. Call this difference of means <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cbeta%7D">. Now suppose you’re reading a paper that reports both of these estimators and their standard errors. To find out which treatment is more effective, you need to compare <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Calpha%7D"> and <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cbeta%7D">, but these estimators <em>must</em> be correlated because they both involve the control group average: <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Chat%7B%5Calpha%7D%20&amp;=%20%5Ctext%7B(Treatment%20A%20mean)%7D%20-%20%5Ctext%7B(Control%20Group%20mean)%7D%5C%5C%0A%5Chat%7B%5Cbeta%7D%20&amp;=%20%5Ctext%7B(Treatment%20B%20mean)%7D%20-%20%5Ctext%7B(Control%20Group%20mean)%7D.%0A%5Cend%7Baligned%7D%0A"> Granted, if you had access to the raw data, you could easily solve this problem without worrying about the correlation. 
In the difference <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Calpha%7D%20-%20%5Chat%7B%5Cbeta%7D">, the control group mean cancels out <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Calpha%7D%20-%20%5Chat%7B%5Cbeta%7D%20=%20%5Ctext%7B(Treatment%20A%20mean)%7D%20-%20%5Ctext%7B(Treatment%20B%20mean)%7D.%0A"> So if we <em>had</em> the raw data for treatments A and B we’d be back to a familiar independent samples comparison of means, as in <a href="../../post/overlapping-confidence-intervals/">Part I</a>. But if you’re reading a paper that only reports <img src="https://latex.codecogs.com/png.latex?(%5Chat%7B%5Calpha%7D,%20%5Chat%7B%5Cbeta%7D)"> and their standard errors, you <em>cannot</em> directly calculate the standard error for the difference. The common variation from the control group mean is baked into the way both <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D)"> and <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Chat%7B%5Cbeta%7D)"> are computed, even though this variation is <em>irrelevant</em> for <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D%20-%20%5Chat%7B%5Cbeta%7D)">.</p>
</section>
<section id="allowing-correlation" class="level2">
<h2 class="anchored" data-anchor-id="allowing-correlation">Allowing Correlation</h2>
<p>So how do we compute <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D%20-%20%5Chat%7B%5Cbeta%7D)"> if the two estimators are correlated? Recall that the <em>standard error</em> of an estimator is the standard deviation of that estimator’s sampling distribution, and a standard deviation is merely the square root of the corresponding variance.<sup>1</sup> Using the <a href="../../post/random-variables-cheatsheet/">properties of variance and covariance</a>, <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BVar%7D(%5Chat%7B%5Calpha%7D%20-%20%5Chat%7B%5Cbeta%7D)%20=%20%5Ctext%7BVar%7D(%5Chat%7B%5Calpha%7D)%20+%20%5Ctext%7BVar%7D(%5Chat%7B%5Cbeta%7D)%20-%202%5Ctext%7BCov%7D(%5Chat%7B%5Calpha%7D,%20%5Chat%7B%5Cbeta%7D).%0A"> Defining <img src="https://latex.codecogs.com/png.latex?%5Crho%20%5Cequiv%20%5Ctext%7BCorr%7D(%5Chat%7B%5Calpha%7D,%5Chat%7B%5Cbeta%7D)">, it follows that <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D%20-%20%5Chat%7B%5Cbeta%7D)%5E2%20=%20%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D)%5E2%20+%20%5Ctext%7BSE%7D(%5Chat%7B%5Cbeta%7D)%5E2%20-%202%5Crho%20%5Ccdot%20%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D)%20%5Ccdot%20%5Ctext%7BSE%7D(%5Chat%7B%5Cbeta%7D).%0A"> If <img src="https://latex.codecogs.com/png.latex?%5Crho%20=%200"> this reduces to our formula from <a href="../../post/overlapping-confidence-intervals/">Part I</a>: <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D%20-%20%5Chat%7B%5Cbeta%7D)%5E2%20=%20%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D)%5E2%20+%20%5Ctext%7BSE%7D(%5Chat%7B%5Cbeta%7D)%5E2"> so we can equate <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D%20-%20%5Chat%7B%5Cbeta%7D)"> with the length of the hypotenuse of a right triangle whose legs have lengths <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D)"> and <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Chat%7B%5Cbeta%7D)">. 
If <img src="https://latex.codecogs.com/png.latex?%5Crho%20%5Cneq%200">, however, this connection to the Pythagorean Theorem no longer holds. Nevertheless, there are <em>still triangles</em> hiding in this standard error formula! To reveal them, we need a more general theorem about triangles.</p>
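<p>To get a feel for how much the correlation matters, here is a minimal sketch of the standard error formula in R. The helper <code>se_diff()</code> and the input values are hypothetical, made up purely for illustration.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: SE of (alpha_hat - beta_hat) from the two individual
# standard errors and the correlation rho between the estimators
se_diff &lt;- function(se_alpha, se_beta, rho) {
  stopifnot(abs(rho) &lt;= 1)
  sqrt(se_alpha^2 + se_beta^2 - 2 * rho * se_alpha * se_beta)
}

# Made-up standard errors, for illustration only
se_diff(0.3, 0.4, rho = 0)     # uncorrelated: the Pythagorean case
se_diff(0.3, 0.4, rho = 0.5)   # positive correlation shrinks the SE
se_diff(0.3, 0.4, rho = -0.5)  # negative correlation inflates the SE</code></pre></div>
</div>
<p>With standard errors of 0.3 and 0.4, the uncorrelated case gives the familiar 3-4-5 right triangle, so the first call should return 0.5.</p>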
</section>
<section id="the-law-of-cosines" class="level2">
<h2 class="anchored" data-anchor-id="the-law-of-cosines">The Law of Cosines</h2>
<p>Consider a triangle whose sides have lengths <img src="https://latex.codecogs.com/png.latex?a,b"> and <img src="https://latex.codecogs.com/png.latex?c">. Let <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> be the angle between the sides whose lengths are <img src="https://latex.codecogs.com/png.latex?a"> and <img src="https://latex.codecogs.com/png.latex?b">. Then by the <em>Law of Cosines</em><br>
<img src="https://latex.codecogs.com/png.latex?%0Ac%5E2%20=%20a%5E2%20+%20b%5E2%20-%202%5Ccos(%5Ctheta)%20%5Ccdot%20ab%0A"> This equality holds for <em>any triangle</em>. For a right triangle whose legs have lengths <img src="https://latex.codecogs.com/png.latex?a"> and <img src="https://latex.codecogs.com/png.latex?b">, we have <img src="https://latex.codecogs.com/png.latex?%5Ctheta%20=%2090%C2%B0"> so the Law of Cosines reduces to the Pythagorean Theorem <img src="https://latex.codecogs.com/png.latex?%0Ac%5E2%20=%20a%5E2%20+%20b%5E2.%0A"> When <img src="https://latex.codecogs.com/png.latex?%5Ctheta%20%5Cneq%2090%C2%B0">, the “correction term” <img src="https://latex.codecogs.com/png.latex?-2%5Ccos(%5Ctheta)%5Ccdot%20ab"> shows how the length of <img src="https://latex.codecogs.com/png.latex?c"> differs from that of a right triangle with legs of lengths <img src="https://latex.codecogs.com/png.latex?a"> and <img src="https://latex.codecogs.com/png.latex?b">. When <img src="https://latex.codecogs.com/png.latex?%5Ctheta%20%3C%2090%C2%B0"> the cosine is positive so the correction term <em>shortens</em> <img src="https://latex.codecogs.com/png.latex?c">; when <img src="https://latex.codecogs.com/png.latex?%5Ctheta%20%3E%2090%C2%B0"> the cosine is negative so the correction term <em>lengthens</em> <img src="https://latex.codecogs.com/png.latex?c">. Regardless of the angle <img src="https://latex.codecogs.com/png.latex?%5Ctheta">, however, the Triangle Inequality still holds: <img src="https://latex.codecogs.com/png.latex?c%20%3C%20a%20+%20b"> because the shortest distance between two points is a straight line.<sup>2</sup></p>
</section>
<section id="from-geometry-to-statistics" class="level2">
<h2 class="anchored" data-anchor-id="from-geometry-to-statistics">From Geometry to Statistics</h2>
<p>The cosine of an angle is always between negative one and one. Can you think of anything else that shares this property? That’s right: correlation! So let’s put the Law of Cosines and our standard error formula from above side-by-side: <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0Ac%5E2%20&amp;=%20a%5E2%20+%20b%5E2%20-%202%5Ccos(%5Ctheta)%20%5Ccdot%20ab%5C%5C%20%5C%5C%0A%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D%20-%20%5Chat%7B%5Cbeta%7D)%5E2%20&amp;=%20%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D)%5E2%20+%20%5Ctext%7BSE%7D(%5Chat%7B%5Cbeta%7D)%5E2%20-%202%5Crho%20%5Ccdot%20%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D)%20%5Ccdot%20%5Ctext%7BSE%7D(%5Chat%7B%5Cbeta%7D).%0A%5Cend%7Baligned%7D%0A"> The analogy is <em>perfect</em>. We can view <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D)"> and <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Chat%7B%5Cbeta%7D)"> as the lengths of two sides of a triangle and <img src="https://latex.codecogs.com/png.latex?%5Crho"> as the cosine of the angle between these sides. This makes <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D%20-%20%5Chat%7B%5Cbeta%7D)"> the length of the third side, indicated in blue in the following diagram. When <img src="https://latex.codecogs.com/png.latex?%5Crho"> is <em>positive</em>, the standard error of the difference is <em>smaller</em> than it would be under independence; if <img src="https://latex.codecogs.com/png.latex?%5Crho"> is <em>negative</em>, the standard error of the difference is <em>larger</em> than it would be under independence.</p>
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/overlapping-confidence-intervals-part-ii/index_files/figure-html/unnamed-chunk-1-1.png" class="img-fluid figure-img" width="1152"></p>
</figure>
</div>
</div>
</div>
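<p>The analogy can also be checked numerically. In the sketch below, all of the values are made up for illustration: I turn a correlation into an angle with <code>acos()</code>, compute the third side from the Law of Cosines, and compare it with the standard error formula.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Made-up inputs, for illustration only
rho &lt;- 0.5   # correlation between the two estimators
a &lt;- 0.3     # plays the role of SE(alpha_hat)
b &lt;- 0.4     # plays the role of SE(beta_hat)

# The angle whose cosine equals the correlation, in radians
theta &lt;- acos(rho)
theta * 180 / pi   # the same angle in degrees

# Third side from the Law of Cosines versus the standard error formula
third_side &lt;- sqrt(a^2 + b^2 - 2 * cos(theta) * a * b)
se_difference &lt;- sqrt(a^2 + b^2 - 2 * rho * a * b)
all.equal(third_side, se_difference)</code></pre></div>
</div>
<p>A correlation of 0.5 corresponds to an angle of 60 degrees, so the triangle is “squeezed” relative to the right triangle we would get if the estimators were uncorrelated.</p>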
</section>
<section id="the-grand-finale" class="level2">
<h2 class="anchored" data-anchor-id="the-grand-finale">The Grand Finale</h2>
<p>Since we’ve equated <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D%20-%20%5Chat%7B%5Cbeta%7D)">, <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D)"> and <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Chat%7B%5Cbeta%7D)"> with the side lengths of a triangle, the Triangle Inequality gives <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D%20-%20%5Chat%7B%5Cbeta%7D)%20%3C%20%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D)%20+%20%5Ctext%7BSE%7D(%5Chat%7B%5Cbeta%7D)%0A"> assuming that <img src="https://latex.codecogs.com/png.latex?%7C%5Crho%7C%20%3C%201">. And now we’re on familiar ground. Let <img src="https://latex.codecogs.com/png.latex?z"> be the appropriate quantile of a normal distribution, i.e.&nbsp;<img src="https://latex.codecogs.com/png.latex?z%20%5Capprox%202"> for a 95% confidence interval. Label the estimators so that <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Calpha%7D%20%5Cgeq%20%5Chat%7B%5Cbeta%7D">. Then, just as we argued in <a href="../../post/overlapping-confidence-intervals/">Part I</a>, the individual confidence intervals for <img src="https://latex.codecogs.com/png.latex?%5Calpha"> and <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> overlap precisely when <img src="https://latex.codecogs.com/png.latex?(%5Chat%7B%5Calpha%7D%20-%20%5Chat%7B%5Cbeta%7D)/z%20%3C%20%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D)%20+%20%5Ctext%7BSE%7D(%5Chat%7B%5Cbeta%7D)"> and there is a significant difference between the two parameters when <img src="https://latex.codecogs.com/png.latex?(%5Chat%7B%5Calpha%7D%20-%20%5Chat%7B%5Cbeta%7D)/z%20%3E%20%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D%20-%20%5Chat%7B%5Cbeta%7D)">. 
The condition for overlap <em>and</em> a significant difference is <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D%20-%20%5Chat%7B%5Cbeta%7D)%20%3C%20%5Cfrac%7B%5Chat%7B%5Calpha%7D%20-%20%5Chat%7B%5Cbeta%7D%7D%7Bz%7D%20%3C%20%5Ctext%7BSE%7D(%5Chat%7B%5Calpha%7D)%20+%20%5Ctext%7BSE%7D(%5Chat%7B%5Cbeta%7D)%0A"> which can always be satisfied because, by the Triangle Inequality, the left-hand side is strictly smaller than the right-hand side. So we’re back to exactly the same situation we were in when <img src="https://latex.codecogs.com/png.latex?%5Crho%20=%200">! As long as <img src="https://latex.codecogs.com/png.latex?%5Crho%20%5Cneq%20-1,%201"> the same results concerning confidence interval overlap apply when <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Calpha%7D"> and <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cbeta%7D"> are correlated as when they are uncorrelated:</p>
<ol type="1">
<li>Overlap doesn’t tell us <em>anything</em> about whether there is a significant difference, but</li>
<li>a lack of overlap implies that there <em>must</em> be a significant difference.</li>
</ol>
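<p>Here is a minimal numerical sketch of the Triangle Inequality step. It assumes the standard formula for the variance of a difference of two correlated estimators, and the standard errors of 10 are illustrative values that I picked for the check rather than numbers from the post.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Illustrative standard errors (assumed values, not taken from the post)
se_a &lt;- 10
se_b &lt;- 10

# A grid of correlations strictly between -1 and 1
rho &lt;- seq(-0.99, 0.99, by = 0.01)

# SE of the difference of two correlated estimators
se_diff &lt;- sqrt(se_a^2 + se_b^2 - 2 * rho * se_a * se_b)

# Triangle Inequality: strict whenever |rho| &lt; 1
strict_ok &lt;- all(se_diff &lt; se_a + se_b)

# At rho = -1 the triangle degenerates and the two sides are equal
se_diff_boundary &lt;- sqrt(se_a^2 + se_b^2 - 2 * (-1) * se_a * se_b)
boundary_equal &lt;- isTRUE(all.equal(se_diff_boundary, se_a + se_b))

c(strict_ok = strict_ok, boundary_equal = boundary_equal)</code></pre></div>
</div>
<p>The boundary case shows why the argument needs a correlation strictly inside the unit interval: once the triangle collapses onto a line, the gap that allows both overlap and a significant difference can close.</p>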


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Many people, including most econometricians, reserve the term <em>standard error</em> for an <em>estimate</em> of this standard deviation. I prefer to call this estimate the <em>estimated standard error</em>. Clearly my convention is better and everyone should adopt it :)↩︎</p></li>
<li id="fn2"><p>Here I assume that this is a <em>genuine triangle</em> rather than three points on the same line.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>statistics</category>
  <guid>https://www.econometrics.blog/post/overlapping-confidence-intervals-part-ii/</guid>
  <pubDate>Sat, 22 Nov 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Overlapping Confidence Intervals</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/overlapping-confidence-intervals/</link>
  <description><![CDATA[ 




<p>Perhaps you’ve seen a claim like this in an applied paper: “the estimated effect for Group A is statistically significant, but the estimated effect for Group B is not; this treatment helps As but not Bs.” But this reasoning is <em>flawed</em>.</p>
<p>To see why, consider the following example from <a href="https://sites.stat.columbia.edu/gelman/research/published/signif4.pdf">Gelman &amp; Stern</a>. We have data from two independent samples: Group A and Group B. For Group A our estimated effect is 25 with a standard error of 10, yielding an approximate 95% confidence interval of <img src="https://latex.codecogs.com/png.latex?25%20%5Cpm%2020"> or <img src="https://latex.codecogs.com/png.latex?(5,%2045)">. This interval does not include zero, so the effect for Group A is statistically significant at the 5% level. For Group B our estimated effect is 10 with a standard error of 10, yielding a confidence interval of <img src="https://latex.codecogs.com/png.latex?10%20%5Cpm%2020"> or <img src="https://latex.codecogs.com/png.latex?(-10,%2030)">. This interval <em>does</em> include zero, so the effect for Group B is not statistically significant at the 5% level. But there is no statistically significant difference between the groups: the difference of means is <img src="https://latex.codecogs.com/png.latex?25%20-%2010%20=%2015"> but the standard error for the difference is <img src="https://latex.codecogs.com/png.latex?%5Csqrt%7B10%5E2%20+%2010%5E2%7D%20=%20%5Csqrt%7B200%7D%20%5Capprox%2014.14">. Thus, the 95% confidence interval for the difference is <img src="https://latex.codecogs.com/png.latex?15%20%5Cpm%2028"> or <img src="https://latex.codecogs.com/png.latex?(-13,%2043)">, which comfortably includes zero.</p>
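<p>The arithmetic in this example is easy to reproduce. The sketch below uses the estimates and standard errors quoted above, along with the rule-of-thumb multiplier of 2; the variable names are my own.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Estimates and standard errors from the Gelman &amp; Stern example
est_a &lt;- 25; se_a &lt;- 10
est_b &lt;- 10; se_b &lt;- 10
z &lt;- 2  # approximate 95% normal quantile

# Individual confidence intervals
ci_a &lt;- est_a + c(-1, 1) * z * se_a   # 5, 45
ci_b &lt;- est_b + c(-1, 1) * z * se_b   # -10, 30

# Standard error and interval for the difference (independent samples)
se_diff &lt;- sqrt(se_a^2 + se_b^2)
ci_diff &lt;- (est_a - est_b) + c(-1, 1) * z * se_diff

# Does each interval exclude zero?
excludes_zero &lt;- function(ci) ci[1] &gt; 0 | ci[2] &lt; 0
c(A = excludes_zero(ci_a), B = excludes_zero(ci_b), diff = excludes_zero(ci_diff))</code></pre></div>
</div>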
<p>To quote the title of the aforementioned paper: <strong>The Difference Between “Significant” and “Not Significant” is not Itself Statistically Significant.</strong> Meditate on this lesson and repeat it ten times before going to bed every night.</p>
<p>After you’ve done this, I have a puzzle for you to ponder. In our example from above, the intervals for A and B <em>overlap</em> and the difference is <em>not significant</em>. Is this a general rule? In other words, does overlap in the two intervals imply no significant difference between the groups? While we’re at it, what about the opposite case? If the two intervals <em>do not</em> overlap, does that mean that there <em>is</em> a significant difference between the groups?</p>
<section id="setting-the-stage" class="level1">
<h1>Setting the Stage</h1>
<section id="intervals-for-a-b-and-their-difference." class="level2">
<h2 class="anchored" data-anchor-id="intervals-for-a-b-and-their-difference.">Intervals for A, B and their difference.</h2>
<p>Let <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BA%7D"> be our estimator of the population mean <img src="https://latex.codecogs.com/png.latex?%5Cmu_A"> for Group A, and let <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Cbar%7BA%7D)"> be its standard error. Similarly, let <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BB%7D"> be our estimator of the population mean <img src="https://latex.codecogs.com/png.latex?%5Cmu_B"> for Group B, with standard error <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Cbar%7BB%7D)">. If the samples we used to construct <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BA%7D"> and <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BB%7D"> are independent, then <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(%5Cbar%7BA%7D,%20%5Cbar%7BB%7D)%20=%200"> and <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)%20=%20%5Csqrt%7B%5Ctext%7BSE%7D(%5Cbar%7BA%7D)%5E2%20+%20%5Ctext%7BSE%7D(%5Cbar%7BB%7D)%5E2%7D"> as in the example from above. 
If <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BA%7D"> and <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BB%7D"> are approximately normally distributed, say by appealing to the central limit theorem, then we can construct 95% confidence intervals for <img src="https://latex.codecogs.com/png.latex?%5Cmu_A">, <img src="https://latex.codecogs.com/png.latex?%5Cmu_B"> as follows <img src="https://latex.codecogs.com/png.latex?%5Cmu_A%20%5Ccolon%20%5Cquad%20%5Cbar%7BA%7D%20%5Cpm%202%20%5Ctimes%20%5Ctext%7BSE%7D(%5Cbar%7BA%7D),%20%5Cquad%20%5Cquad%20%5Cmu_B%5Ccolon%20%5Cquad%20%5Cbar%7BB%7D%20%5Cpm%202%20%5Ctimes%20%5Ctext%7BSE%7D(%5Cbar%7BB%7D)."> Similarly, we can construct a confidence interval for the difference in means <img src="https://latex.codecogs.com/png.latex?%5Cmu_A%20-%20%5Cmu_B">, namely <img src="https://latex.codecogs.com/png.latex?%0A(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)%20%5Cpm%202%20%5Ctimes%20%5Ctext%7BSE%7D(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D).%0A"> More generally, to construct an approximate <img src="https://latex.codecogs.com/png.latex?(1%20-%20%5Calpha)%20%5Ctimes%20100%5C%25"> confidence interval, we would replace the 2 above with the appropriate quantile of a standard normal distribution.<sup>1</sup> Below I’ll call this quantile <img src="https://latex.codecogs.com/png.latex?z"> for short.</p>
</section>
<section id="when-is-the-difference-significant" class="level2">
<h2 class="anchored" data-anchor-id="when-is-the-difference-significant">When is the difference significant?</h2>
<p>The difference between <img src="https://latex.codecogs.com/png.latex?%5Cmu_A"> and <img src="https://latex.codecogs.com/png.latex?%5Cmu_B"> is statistically significant at the <img src="https://latex.codecogs.com/png.latex?%5Calpha%20%5Ctimes%20100%5C%25"> level if the confidence interval for <img src="https://latex.codecogs.com/png.latex?%5Cmu_A%20-%20%5Cmu_B"> does not include zero, i.e.&nbsp;if <img src="https://latex.codecogs.com/png.latex?%7C%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D%7C%20%3E%20z%20%5Ccdot%20%5Ctext%7BSE%7D(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)">. But working with absolute values will quickly become tedious, so without loss of generality, let’s assume <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BA%7D%20%5Cgeq%20%5Cbar%7BB%7D">. If this doesn’t hold, we can always relabel the two groups so it does hold. Then the condition for a significant difference becomes <img src="https://latex.codecogs.com/png.latex?(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)/z%20%3E%20%5Ctext%7BSE%7D(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)."></p>
</section>
<section id="when-do-the-intervals-overlap" class="level2">
<h2 class="anchored" data-anchor-id="when-do-the-intervals-overlap">When do the intervals overlap?</h2>
<p>Again, without loss of generality, we can assume that <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BA%7D%5Cgeq%20%5Cbar%7BB%7D">. Now think about what it would mean for the two confidence intervals to overlap. The center of the interval for <img src="https://latex.codecogs.com/png.latex?%5Cmu_A"> is to the right of the center of the interval for <img src="https://latex.codecogs.com/png.latex?%5Cmu_B"> since <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BA%7D%20%5Cgeq%20%5Cbar%7BB%7D">. So for the two intervals to overlap, the <em>lower confidence limit</em> of the <img src="https://latex.codecogs.com/png.latex?%5Cmu_A"> interval must be to the left of the <em>upper confidence limit</em> of the <img src="https://latex.codecogs.com/png.latex?%5Cmu_B"> interval. This figure illustrates the logic using <img src="https://latex.codecogs.com/png.latex?z%20=%202">.</p>
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/overlapping-confidence-intervals/index_files/figure-html/unnamed-chunk-1-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>From the figure, we see that the two intervals overlap if <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BA%7D%20-%202%20%5Ccdot%20%5Ctext%7BSE%7D(%5Cbar%7BA%7D)%20%3C%20%5Cbar%7BB%7D%20+%202%20%5Ccdot%20%5Ctext%7BSE%7D(%5Cbar%7BB%7D)."> Rearranging, and using the generic quantile <img src="https://latex.codecogs.com/png.latex?z">, this becomes <img src="https://latex.codecogs.com/png.latex?(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)/z%20%3C%20%5Ctext%7BSE%7D(%5Cbar%7BA%7D)%20+%20%5Ctext%7BSE%7D(%5Cbar%7BB%7D)."></p>
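<p>The two conditions derived so far translate directly into a pair of small helper functions. This is only a sketch: the function names are hypothetical, and both functions assume independent samples with the groups labeled so that the first estimate is at least as large as the second.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical helpers; both assume est_a &gt;= est_b and independent samples

# TRUE if the two individual confidence intervals overlap
cis_overlap &lt;- function(est_a, est_b, se_a, se_b, z = 2) {
  (est_a - est_b) / z &lt; se_a + se_b
}

# TRUE if the difference of means is statistically significant
diff_significant &lt;- function(est_a, est_b, se_a, se_b, z = 2) {
  (est_a - est_b) / z &gt; sqrt(se_a^2 + se_b^2)
}

# Gelman &amp; Stern example: estimates 25 and 10, both standard errors 10
cis_overlap(25, 10, 10, 10)       # TRUE: 7.5 is below 20
diff_significant(25, 10, 10, 10)  # FALSE: 7.5 is below sqrt(200)</code></pre></div>
</div>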
</section>
</section>
<section id="case-i-overlapping-intervals-for-a-and-b" class="level1">
<h1>Case I: Overlapping Intervals for A and B</h1>
<section id="formalizing-the-question" class="level2">
<h2 class="anchored" data-anchor-id="formalizing-the-question">Formalizing the Question</h2>
<p>Can the confidence intervals for A and B overlap despite there being a significant difference of means between the two groups? Using the results from above, this question is equivalent to asking whether it’s possible for both of these inequalities to hold at the same time:</p>
<ol type="1">
<li>Overlapping CIs: <img src="https://latex.codecogs.com/png.latex?(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)/z%20%3C%20%5Ctext%7BSE%7D(%5Cbar%7BA%7D)%20+%20%5Ctext%7BSE%7D(%5Cbar%7BB%7D)"></li>
<li>Significant Difference: <img src="https://latex.codecogs.com/png.latex?(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)/z%20%3E%20%5Ctext%7BSE%7D(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)"></li>
</ol>
<p>So the question becomes: can we find values of <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BA%7D">, <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BB%7D">, <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Cbar%7BA%7D)">, and <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Cbar%7BB%7D)"> such that <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)%20%3C%20%5Cfrac%7B%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D%7D%7Bz%7D%20%3C%20%5Ctext%7BSE%7D(%5Cbar%7BA%7D)%20+%20%5Ctext%7BSE%7D(%5Cbar%7BB%7D)?"></p>
</section>
<section id="lets-talk-about-triangles" class="level2">
<h2 class="anchored" data-anchor-id="lets-talk-about-triangles">Let’s talk about triangles!</h2>
<p>For just a moment, forget that we’re talking about statistics and cast your mind back to <em>high school geometry</em>. There are two facts I’d like you to recall:</p>
<section id="the-pythagorean-theorem" class="level3">
<h3 class="anchored" data-anchor-id="the-pythagorean-theorem">1: The Pythagorean Theorem</h3>
<p>If <img src="https://latex.codecogs.com/png.latex?a"> and <img src="https://latex.codecogs.com/png.latex?b"> are the lengths of the legs of a right triangle, then the length <img src="https://latex.codecogs.com/png.latex?c"> of the hypotenuse satisfies <img src="https://latex.codecogs.com/png.latex?c%20=%20%5Csqrt%7Ba%5E2%20+%20b%5E2%7D">.</p>
</section>
<section id="the-triangle-inequality" class="level3">
<h3 class="anchored" data-anchor-id="the-triangle-inequality">2: The Triangle Inequality</h3>
<p>For <em>any triangle</em> with sides <img src="https://latex.codecogs.com/png.latex?a">, <img src="https://latex.codecogs.com/png.latex?b">, and <img src="https://latex.codecogs.com/png.latex?c">, we have <img src="https://latex.codecogs.com/png.latex?c%20%5Cleq%20a%20+%20b">.<sup>2</sup></p>
<p>So how do these two facts help us? Consider a right triangle whose legs have lengths <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Cbar%7BA%7D)"> and <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Cbar%7BB%7D)">. By the Pythagorean Theorem, the hypotenuse of this triangle has length <img src="https://latex.codecogs.com/png.latex?%5Csqrt%7B%5Ctext%7BSE%7D(%5Cbar%7BA%7D)%5E2%20+%20%5Ctext%7BSE%7D(%5Cbar%7BB%7D)%5E2%7D%20=%20%5Ctext%7BSE%7D(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)%20"> and by the triangle inequality, we have: <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)%20%3C%20%5Ctext%7BSE%7D(%5Cbar%7BA%7D)%20+%20%5Ctext%7BSE%7D(%5Cbar%7BB%7D)."> We can read this inequality from the following figure: travelling along the dashed red path covers a distance of <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Cbar%7BA%7D)%20+%20%5Ctext%7BSE%7D(%5Cbar%7BB%7D)">, while travelling along the solid blue path covers the shorter distance of <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)">.</p>
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/overlapping-confidence-intervals/index_files/figure-html/unnamed-chunk-2-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</section>
</section>
<section id="the-solution" class="level2">
<h2 class="anchored" data-anchor-id="the-solution">The Solution</h2>
<p>The question we set out to answer is whether we can find values that satisfy the inequality: <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)%20%3C%20%5Cfrac%7B%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D%7D%7Bz%7D%20%3C%20%5Ctext%7BSE%7D(%5Cbar%7BA%7D)%20+%20%5Ctext%7BSE%7D(%5Cbar%7BB%7D)."> Since the right-hand side is <em>always</em> strictly larger than the left-hand side, the answer is <em>yes</em>. Let’s try it out using a simple example. Suppose that <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Cbar%7BA%7D)%20=%20%5Ctext%7BSE%7D(%5Cbar%7BB%7D)%20=%2010"> as in the example from above, and consider a 95% confidence interval so that <img src="https://latex.codecogs.com/png.latex?z%20%5Capprox%202">. Then the inequality becomes <img src="https://latex.codecogs.com/png.latex?%0A20%20%5Csqrt%7B2%7D%20%3C%20%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D%20%3C%2040.%0A"> Since <img src="https://latex.codecogs.com/png.latex?20%20%5Csqrt%7B2%7D%20%5Capprox%2028.28"> we have a whole range of values for <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D"> that will do the trick, for example <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BA%7D-%5Cbar%7BB%7D%20=%2030">. So if <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Cbar%7BA%7D)%20=%20%5Ctext%7BSE%7D(%5Cbar%7BB%7D)%20=%2010">, <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BA%7D%20=%2040"> and <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BB%7D%20=%2010">, the 95% CIs for A and B overlap but there <em>is</em> a significant difference between the two groups. This is the opposite of Gelman &amp; Stern’s example.</p>
<p>Our inequality from above depends only on the difference between <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BA%7D"> and <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BB%7D">, not on the individual value of each sample mean. So if we keep the same standard errors as before but set <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BA%7D%20=%2015"> and <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BB%7D%20=%20-15">, so the difference of means remains 30, we obtain the same result: overlapping intervals with a significant difference between the two groups. Notice anything interesting about this example? The interval for A is <img src="https://latex.codecogs.com/png.latex?(-5,35)"> while the interval for B is <img src="https://latex.codecogs.com/png.latex?(-35,%205)">. So <em>both intervals include zero</em> but there is <em>still</em> a significant difference between them! If all we know is that two intervals overlap, we can’t say <em>anything</em> about the significance of the difference.</p>
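<p>Both numerical examples from this section can be checked in a few lines. The sketch below uses the values from the text, namely standard errors of 10, a multiplier of 2, and the two pairs of sample means; the variable names are my own.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">se_a &lt;- 10; se_b &lt;- 10; z &lt;- 2
se_diff &lt;- sqrt(se_a^2 + se_b^2)

# Two pairs of sample means with the same difference of 30
a_bar &lt;- c(40, 15)
b_bar &lt;- c(10, -15)
gap &lt;- a_bar - b_bar

# Overlap and significance conditions from the text
overlap &lt;- gap / z &lt; se_a + se_b
significant &lt;- gap / z &gt; se_diff

# Confidence limits for each group
results &lt;- data.frame(
  a_bar, b_bar,
  a_lower = a_bar - z * se_a, a_upper = a_bar + z * se_a,
  b_lower = b_bar - z * se_b, b_upper = b_bar + z * se_b,
  overlap, significant
)
results</code></pre></div>
</div>
<p>In the second row both intervals straddle zero, yet the difference of 30 still exceeds twice the standard error of the difference.</p>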
</section>
</section>
<section id="case-ii-intervals-for-a-and-b-that-dont-overlap" class="level1">
<h1>Case II: Intervals for A and B that Don’t Overlap</h1>
<p>Now for the easy one. Since <img src="https://latex.codecogs.com/png.latex?(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)/z%20%3C%20%5Ctext%7BSE%7D(%5Cbar%7BA%7D)%20+%20%5Ctext%7BSE%7D(%5Cbar%7BB%7D)"> holds if and only if the intervals overlap, we merely reverse the inequality to get a condition for intervals that <em>do not overlap</em>, namely <img src="https://latex.codecogs.com/png.latex?(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)/z%20%3E%20%5Ctext%7BSE%7D(%5Cbar%7BA%7D)%20+%20%5Ctext%7BSE%7D(%5Cbar%7BB%7D)."> Appending what we learned above from our right triangle diagram gives <img src="https://latex.codecogs.com/png.latex?(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)/z%20%3E%20%5Ctext%7BSE%7D(%5Cbar%7BA%7D)%20+%20%5Ctext%7BSE%7D(%5Cbar%7BB%7D)%3E%20%5Ctext%7BSE%7D(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)."> So if the intervals <em>do not</em> overlap, then <img src="https://latex.codecogs.com/png.latex?(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)/z"> must be <em>greater than</em> <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BSE%7D(%5Cbar%7BA%7D%20-%20%5Cbar%7BB%7D)">, which is precisely the condition for a significant difference between the two groups.</p>
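<p>As a sanity check, here is a rough brute-force sketch of both cases. The ranges for the standard errors and for the difference of means are arbitrary choices of mine; any positive values would do.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(1)
n_draws &lt;- 1e4
z &lt;- 2

# Arbitrary illustrative ranges (assumptions, not from the post)
se_a &lt;- runif(n_draws, 0.1, 10)
se_b &lt;- runif(n_draws, 0.1, 10)
gap &lt;- runif(n_draws, 0, 50)  # difference of means, non-negative

# Conditions from the text, assuming independent samples
no_overlap &lt;- gap / z &gt; se_a + se_b
significant &lt;- gap / z &gt; sqrt(se_a^2 + se_b^2)

# Case II: is every non-overlapping draw also significant?
all(significant[no_overlap])

# Case I: do some overlapping draws have a significant difference?
any(significant &amp; !no_overlap)</code></pre></div>
</div>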
</section>
<section id="in-summary" class="level1">
<h1>In Summary</h1>
<p>In an independent samples problem where the two individual confidence intervals overlap, there <em>may or may not</em> be a significant difference between the groups. Even if the two intervals both contain zero, there <em>could still</em> be a significant difference between them. If the two intervals <em>do not overlap</em> then we can conclude that there <em>is</em> a significant difference. But if you only take one lesson away from this post it should be this one: the difference between significant and not significant is not itself significant. If you want to carry out inference for a difference, you need to construct the standard error for that difference.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Granted, 2 is <em>slightly</em> larger than <img src="https://latex.codecogs.com/png.latex?%5Ctexttt%7Bqnorm%7D(0.975)">, but do you really want to multiply by 1.96 in your head?↩︎</p></li>
<li id="fn2"><p>The triangle inequality just says that “the shortest distance between two points in a Euclidean plane is a straight line.” If we’re working with a genuine triangle, then the vertices <em>cannot</em> lie on the same line. So traveling from <img src="https://latex.codecogs.com/png.latex?x"> to <img src="https://latex.codecogs.com/png.latex?z"> via <img src="https://latex.codecogs.com/png.latex?y"> always covers a greater distance than going straight from <img src="https://latex.codecogs.com/png.latex?x"> to <img src="https://latex.codecogs.com/png.latex?z">.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>statistics</category>
  <guid>https://www.econometrics.blog/post/overlapping-confidence-intervals/</guid>
  <pubDate>Sat, 15 Nov 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>A Good Instrument is a Bad Control: Part II</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/a-good-instrument-is-a-bad-control-part-ii/</link>
  <description><![CDATA[ 




<p>At a recent seminar dinner the conversation drifted to causal inference, and I mentioned my dream of one day producing a Lady Gaga parody music video called “Bad Control”.<sup>1</sup> A lively discussion of bad controls ensued, during which I offered one of my favorite examples: <a href="../../post/a-good-instrument-is-a-bad-control/">a good instrument is a bad control</a>. To summarize that earlier post: including a valid instrumental variable as a <em>control</em> variable can only amplify the bias on the coefficient for our endogenous regressor of interest. When used as a control, the instrument “soaks up” the good (exogenous) variation in the endogenous regressor, leaving behind only the bad (endogenous) variation. This is the opposite of what happens in an instrumental variables regression, where we use the instrument to <em>extract</em> only the good variation in the endogenous regressor. More generally, a “bad control” is a covariate that we <em>shouldn’t adjust for</em> when using a <a href="../../post/how-to-do-regression-adjustment/">selection-on-observables</a> approach to causal inference.</p>
<p>Upon hearing my IV example, my colleague immediately asked “but what about the coefficient on the <em>instrument</em> itself?” This is a great question and one I hadn’t thought about before. Today I’ll give you my answer.</p>
<p>This post is a sequel, so you may find it helpful to glance at my <a href="../../post/a-good-instrument-is-a-bad-control/">earlier post</a> before reading further. At the very end of the post I’ll rely on a few basic ideas about directed acyclic graphs (DAGs). If this material is unfamiliar, you may find my <a href="https://www.treatment-effects.com/basics/">treatment effects slides</a> helpful. With these caveats, I’ll do my best to keep this post relatively self-contained.</p>
<section id="recap-of-part-i" class="level2">
<h2 class="anchored" data-anchor-id="recap-of-part-i">Recap of Part I</h2>
<p>Suppose that <img src="https://latex.codecogs.com/png.latex?X"> is our endogenous regressor of interest in the linear causal model <img src="https://latex.codecogs.com/png.latex?Y%20=%20%5Calpha%20+%20%5Cbeta%20X%20+%20U"> where <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(X,U)%20%5Cneq%200"> but <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(Z,U)%20=%200">, and where <img src="https://latex.codecogs.com/png.latex?Z"> is an instrumental variable that is correlated with <img src="https://latex.codecogs.com/png.latex?X">. Now consider the population linear regression of <img src="https://latex.codecogs.com/png.latex?Y"> on both <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Z">, namely <img src="https://latex.codecogs.com/png.latex?%0AY%20=%20%5Cgamma_0%20+%20%5Cgamma_X%20X%20+%20%5Cgamma_Z%20Z%20+%20%5Ceta%0A"> where the error term <img src="https://latex.codecogs.com/png.latex?%5Ceta"> satisfies <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(X,%5Ceta)%20=%20%5Ctext%7BCov%7D(Z,%5Ceta)%20=%20%5Cmathbb%7BE%7D(%5Ceta)%20=%200"> <a href="../../post/why-econometrics-is-confusing-part-1-the-error-term/">by construction</a>. Further define the population linear regression of <img src="https://latex.codecogs.com/png.latex?X"> on <img src="https://latex.codecogs.com/png.latex?Z">, namely <img src="https://latex.codecogs.com/png.latex?%0AX%20=%20%5Cpi_0%20+%20%5Cpi_Z%20Z%20+%20V%0A"> where the error term <img src="https://latex.codecogs.com/png.latex?V"> satisfies <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(Z,V)%20=%20%5Cmathbb%7BE%7D(V)%20=%200"> <a href="../../post/why-econometrics-is-confusing-part-1-the-error-term/">by construction</a>. 
Finally, define the population linear regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?X"> as <img src="https://latex.codecogs.com/png.latex?%0AY%20=%20%5Cdelta_0%20+%20%5Cdelta_X%20X%20+%20%5Cepsilon,%20%5Cquad%20%5Ctext%7BCov%7D(X,%5Cepsilon)%20=%20%5Cmathbb%7BE%7D(%5Cepsilon)%20=%200.%0A"> Using this notation, the result from <a href="../../post/a-good-instrument-is-a-bad-control/">my earlier post</a> can be written as <img src="https://latex.codecogs.com/png.latex?%0A%5Cdelta_X%20=%20%5Cbeta%20+%20%5Cfrac%7B%5Ctext%7BCov%7D(X,U)%7D%7B%5Ctext%7BVar%7D(X)%7D,%20%5Cquad%20%5Ctext%7Band%7D%20%5Cquad%20%5Cgamma_X%20=%20%5Cbeta%20+%20%5Cfrac%7B%5Ctext%7BCov%7D(X,U)%7D%7B%5Ctext%7BVar%7D(V)%7D.%0A"> To understand what this tells us, notice that, using the “first-stage” regression of <img src="https://latex.codecogs.com/png.latex?X"> on <img src="https://latex.codecogs.com/png.latex?Z">, we can write <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BVar%7D(V)%20%5Cequiv%20%5Ctext%7BVar%7D(X%20-%20%5Cpi_0%20-%20%5Cpi_Z%20Z)%20=%20%5Ctext%7BVar%7D(X)%20-%20%5Cpi_Z%5E2%20%5Ctext%7BVar%7D(Z).%0A"> This shows that whenever <img src="https://latex.codecogs.com/png.latex?Z"> is a relevant instrument <img src="https://latex.codecogs.com/png.latex?(%5Cpi_Z%20%5Cneq%200)">, we must have <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BVar%7D(V)%20%3C%20%5Ctext%7BVar%7D(X)">. It follows that <img src="https://latex.codecogs.com/png.latex?%5Cgamma_X"> is <em>more biased</em> than <img src="https://latex.codecogs.com/png.latex?%5Cdelta_X">: adding <img src="https://latex.codecogs.com/png.latex?Z"> as a control regressor only makes our estimate of the effect of <img src="https://latex.codecogs.com/png.latex?X"> <em>worse</em>!<sup>2</sup></p>
</section>
<section id="what-about-gamma_z" class="level2">
<h2 class="anchored" data-anchor-id="what-about-gamma_z">What about <img src="https://latex.codecogs.com/png.latex?%5Cgamma_Z">?</h2>
<p>So if <img src="https://latex.codecogs.com/png.latex?Z"> soaks up the <em>good variation</em> in <img src="https://latex.codecogs.com/png.latex?X">, what about the coefficient <img src="https://latex.codecogs.com/png.latex?%5Cgamma_Z"> on the instrument <img src="https://latex.codecogs.com/png.latex?Z">? Perhaps this coefficient contains some useful information about the causal effect of <img src="https://latex.codecogs.com/png.latex?X"> on <img src="https://latex.codecogs.com/png.latex?Y">? To find out, we’ll use the <a href="../../post/two-fwl-theorems-for-the-price-of-one/">FWL Theorem</a> as follows: <img src="https://latex.codecogs.com/png.latex?%0A%5Cgamma_Z%20=%20%5Cfrac%7B%5Ctext%7BCov%7D(Y,%5Ctilde%7BZ%7D)%7D%7B%5Ctext%7BVar%7D(%5Ctilde%7BZ%7D)%7D%0A"> where <img src="https://latex.codecogs.com/png.latex?Z%20=%20%5Clambda_0%20+%20%5Clambda_X%20X%20+%20%5Ctilde%7BZ%7D"> is the population linear regression of <img src="https://latex.codecogs.com/png.latex?Z"> on <img src="https://latex.codecogs.com/png.latex?X">. This is the <em>reverse</em> of the first-stage regression of <img src="https://latex.codecogs.com/png.latex?X"> on <img src="https://latex.codecogs.com/png.latex?Z"> described above. Here the error term <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BZ%7D"> satisfies <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(%5Ctilde%7BZ%7D)%20=%20%5Ctext%7BCov%7D(%5Ctilde%7BZ%7D,%20X)%20=%200"> <a href="../../post/why-econometrics-is-confusing-part-1-the-error-term/">by construction</a>. 
Substituting the causal model gives <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BCov%7D(Y,%20%5Ctilde%7BZ%7D)%20=%20%5Ctext%7BCov%7D(%5Calpha%20+%20%5Cbeta%20X%20+%20U,%20%5Ctilde%7BZ%7D)%20=%20%5Cbeta%20%5Ctext%7BCov%7D(X,%5Ctilde%7BZ%7D)%20+%20%5Ctext%7BCov%7D(U,%5Ctilde%7BZ%7D)%20=%20%5Ctext%7BCov%7D(U,%20%5Ctilde%7BZ%7D)%0A"> since <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(X,%5Ctilde%7BZ%7D)%20=%200"> by construction. Now, substituting the definition of <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BZ%7D">, <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BCov%7D(U,%20%5Ctilde%7BZ%7D)%20=%20%5Ctext%7BCov%7D(U,%20Z%20-%20%5Clambda_0%20-%20%5Clambda_X%20X)%20=%20%5Ctext%7BCov%7D(U,Z)%20-%20%5Clambda_X%20%5Ctext%7BCov%7D(U,X)%20=%20-%5Clambda_X%20%5Ctext%7BCov%7D(X,U)%0A"> since <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(U,Z)%20=%200"> by assumption. We can already see that <img src="https://latex.codecogs.com/png.latex?%5Cgamma_Z"> is <em>not going to help us</em> learn about <img src="https://latex.codecogs.com/png.latex?%5Cbeta">. First of all, the term containing <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> vanished; second of all, the term that remained is polluted by the endogeneity of <img src="https://latex.codecogs.com/png.latex?X">, namely <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(X,U)">.</p>
<p>Still, let’s see if we can get a clean expression for <img src="https://latex.codecogs.com/png.latex?%5Cgamma_Z">. So far we have calculated the numerator of the FWL expression, showing that <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(Y,%5Ctilde%7BZ%7D)%20=%20-%5Clambda_X%20%5Ctext%7BCov%7D(X,U)">. The next step is to calculate <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BVar%7D(%5Ctilde%7BZ%7D)">: <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BVar%7D(%5Ctilde%7BZ%7D)%20=%20%5Ctext%7BVar%7D(Z%20-%20%5Clambda_0%20-%20%5Clambda_X%20X)%20=%20%5Ctext%7BVar%7D(Z)%20+%20%5Clambda_X%5E2%20%5Ctext%7BVar%7D(X)%20-%202%5Clambda_X%20%5Ctext%7BCov%7D(X,Z).%0A"> Since <img src="https://latex.codecogs.com/png.latex?%5Clambda_X%20%5Cequiv%20%5Ctext%7BCov%7D(X,Z)/%5Ctext%7BVar%7D(X)">, our expression for <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BVar%7D(%5Ctilde%7BZ%7D)"> simplifies to <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BVar%7D(%5Ctilde%7BZ%7D)%20=%20%5Ctext%7BVar%7D(Z)%20-%20%5Clambda_X%20%5Ctext%7BCov%7D(X,Z)%0A"> so we have discovered that: <img src="https://latex.codecogs.com/png.latex?%0A%5Cgamma_Z%20=%20%5Cfrac%7B-%5Clambda_X%20%5Ctext%7BCov%7D(X,U)%7D%7B%5Ctext%7BVar%7D(Z)%20-%20%5Clambda_X%20%5Ctext%7BCov%7D(X,Z)%7D.%0A"></p>
<p>Call me old-fashioned, but I <em>really</em> don’t like having <img src="https://latex.codecogs.com/png.latex?%5Clambda_X"> in that expression. I’d feel much happier if we could find a way to re-write this in terms of the more familiar IV first-stage coefficient <img src="https://latex.codecogs.com/png.latex?%5Cpi_Z">. Let’s give it a try! Let’s use my favorite trick of <em>multiplying by one</em>: <img src="https://latex.codecogs.com/png.latex?%0A%5Clambda_X%20%5Cequiv%20%5Cfrac%7B%5Ctext%7BCov%7D(X,Z)%7D%7B%5Ctext%7BVar%7D(X)%7D%20=%20%5Cfrac%7B%5Ctext%7BCov%7D(X,Z)%7D%7B%5Ctext%7BVar%7D(X)%7D%20%5Ccdot%20%5Cfrac%7B%5Ctext%7BVar%7D(Z)%7D%7B%5Ctext%7BVar%7D(Z)%7D%20=%20%5Cpi_Z%20%5Ccdot%20%5Cfrac%7B%5Ctext%7BVar%7D(Z)%7D%7B%5Ctext%7BVar%7D(X)%7D.%0A"> Substituting for <img src="https://latex.codecogs.com/png.latex?%5Clambda_X"> gives <img src="https://latex.codecogs.com/png.latex?%0A%5Cgamma_Z%20=%20%5Cfrac%7B-%5Cpi_Z%20%5Cfrac%7B%5Ctext%7BVar%7D(Z)%7D%7B%5Ctext%7BVar%7D(X)%7D%20%5Ctext%7BCov%7D(X,U)%7D%7B%5Ctext%7BVar%7D(Z)%20-%20%5Cpi_Z%20%5Cfrac%7B%5Ctext%7BVar%7D(Z)%7D%7B%5Ctext%7BVar%7D(X)%7D%20%5Ctext%7BCov%7D(X,Z)%7D%20=%20%5Cfrac%7B-%5Cpi_Z%20%5Ctext%7BCov%7D(X,U)%7D%7B%5Ctext%7BVar%7D(X)%20-%20%5Cpi_Z%5E2%20%5Ctext%7BVar%7D(Z)%7D.%0A"> We can simplify this even further by substituting <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BVar%7D(V)%20=%20%5Ctext%7BVar%7D(X)%20-%20%5Cpi_Z%5E2%20%5Ctext%7BVar%7D(Z)"> from above to obtain <img src="https://latex.codecogs.com/png.latex?%0A%5Cgamma_Z%20=%20-%5Cpi_Z%20%5Cfrac%7B%5Ctext%7BCov%7D(X,U)%7D%7B%5Ctext%7BVar%7D(V)%7D.%0A"> And now we recognize something from above: <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(X,U)/%5Ctext%7BVar%7D(V)"> was the <em>bias</em> of <img src="https://latex.codecogs.com/png.latex?%5Cgamma_X"> relative to the true causal effect <img src="https://latex.codecogs.com/png.latex?%5Cbeta">! 
This means we can also write <img src="https://latex.codecogs.com/png.latex?%5Cgamma_Z%20=%20-%5Cpi_Z%20(%5Cgamma_X%20-%20%5Cbeta)">.</p>
</section>
<section id="a-little-simulation" class="level2">
<h2 class="anchored" data-anchor-id="a-little-simulation">A Little Simulation</h2>
<p>We seem to be doing an awful lot of algebra on this blog lately. To make sure that we haven’t made any silly mistakes, let’s check our work using a little simulation experiment taken from my <a href="../../post/a-good-instrument-is-a-bad-control/#a-simulation-example">earlier post</a>. Spoiler alert: everything checks out!</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1234</span>)</span>
<span id="cb1-2">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1e5</span></span>
<span id="cb1-3"></span>
<span id="cb1-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Simulate instrument (z)</span></span>
<span id="cb1-5">z <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n)</span>
<span id="cb1-6"></span>
<span id="cb1-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Simulate error terms (u, v)</span></span>
<span id="cb1-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(mvtnorm)</span>
<span id="cb1-9">Rho <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, </span>
<span id="cb1-10">                <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">byrow =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb1-11">errors <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rmvnorm</span>(n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sigma =</span> Rho)</span>
<span id="cb1-12"></span>
<span id="cb1-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Simulate linear causal model</span></span>
<span id="cb1-14">u <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> errors[, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb1-15">v <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> errors[, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]</span>
<span id="cb1-16">x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> z <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> v</span>
<span id="cb1-17">y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> u</span>
<span id="cb1-18"></span>
<span id="cb1-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Regression of y on x and z</span></span>
<span id="cb1-20">gamma <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> z) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb1-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coefficients</span>()</span>
<span id="cb1-22"></span>
<span id="cb1-23">gamma</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>(Intercept)           x           z 
 -0.5471213   1.5018705  -0.3981116 </code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># First-stage regression of x on z</span></span>
<span id="cb3-2">pi <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> z) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb3-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coefficients</span>()</span>
<span id="cb3-4"></span>
<span id="cb3-5">pi</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>(Intercept)           z 
  0.5020338   0.7963889 </code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compare two different expressions for gamma_Z to the estimate itself</span></span>
<span id="cb5-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gamma_z =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unname</span>(gamma[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]),</span>
<span id="cb5-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">version1 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unname</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cov</span>(x, u) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">var</span>(v)),</span>
<span id="cb5-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">version2 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unname</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>pi[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (gamma[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb5-5">)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>   gamma_z   version1   version2 
-0.3981116 -0.4024918 -0.3996841 </code></pre>
</div>
</div>
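<p>If you’d like one more sanity check, here is a minimal sketch that does two things. First, it plugs the population moments implied by this simulation design into both formulas for the coefficient on <code>z</code>: the one written in terms of <code>lambda_x</code> and the one written in terms of <code>pi_z</code>. Both should come out to -0.4. Second, it confirms that the FWL ratio reproduces the coefficient on <code>z</code> exactly in-sample. The variable names are my own, and I build the correlated errors by hand in base R rather than with <code>mvtnorm</code>, so the draws differ from the ones above.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Population moments implied by the design:
# Var(Z) = Var(V) = 1, Cov(U, V) = 0.5, pi_Z = 0.8, X = 0.5 + 0.8 * Z + V
pi_z   &lt;- 0.8
var_z  &lt;- 1
var_v  &lt;- 1
cov_xu &lt;- 0.5                      # Cov(X, U) = Cov(V, U) since Z and U are unrelated
var_x  &lt;- pi_z^2 * var_z + var_v   # 0.64 + 1
cov_xz &lt;- pi_z * var_z
lambda_x &lt;- cov_xz / var_x

# gamma_Z written in terms of lambda_X and in terms of pi_Z
c(lambda_form = -lambda_x * cov_xu / (var_z - lambda_x * cov_xz),
  pi_form     = -pi_z * cov_xu / var_v)

# In-sample FWL check on freshly simulated data (base R only)
set.seed(1234)
n &lt;- 1e5
z &lt;- rnorm(n)
u &lt;- rnorm(n)
v &lt;- 0.5 * u + sqrt(1 - 0.5^2) * rnorm(n)  # Var(v) = 1, Cov(u, v) = 0.5
x &lt;- 0.5 + 0.8 * z + v
y &lt;- -0.3 + x + u

z_tilde &lt;- residuals(lm(z ~ x))            # part of z not explained by x
fwl     &lt;- cov(y, z_tilde) / var(z_tilde)
direct  &lt;- unname(coef(lm(y ~ x + z))["z"])
all.equal(fwl, direct)</code></pre></div>
</div>
<p>The population check is exact arithmetic, while the FWL check is an algebraic identity that holds in any sample, so the final comparison should agree up to floating-point error rather than sampling error.</p>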
</section>
<section id="making-sense-of-this-result" class="level2">
<h2 class="anchored" data-anchor-id="making-sense-of-this-result">Making Sense of This Result</h2>
<p>So far all we’ve done is horrible, tedious algebra and a little simulation to check that it’s correct. But in fact there’s some very interesting intuition for the results we’ve obtained, intuition that is <em>deeply connected</em> to the idea of a bad control in a directed acyclic graph (DAG).</p>
<p>In the model we’ve described above, <img src="https://latex.codecogs.com/png.latex?Z"> has a causal effect on <img src="https://latex.codecogs.com/png.latex?Y">. This is because <img src="https://latex.codecogs.com/png.latex?Z"> causes <img src="https://latex.codecogs.com/png.latex?X"> which in turn causes <img src="https://latex.codecogs.com/png.latex?Y">. Because <img src="https://latex.codecogs.com/png.latex?Z"> is an instrument, its <em>only</em> effect on <img src="https://latex.codecogs.com/png.latex?Y"> goes through <img src="https://latex.codecogs.com/png.latex?X">. The unobserved confounder <img src="https://latex.codecogs.com/png.latex?U"> is a common cause of <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> but is unrelated to <img src="https://latex.codecogs.com/png.latex?Z">. Even if you’re not familiar with DAGs, you will probably find this diagram relatively intuitive:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggdag)</span>
<span id="cb7-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggplot2)</span>
<span id="cb7-3"></span>
<span id="cb7-4">iv_dag <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dagify</span>(</span>
<span id="cb7-5">  Y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> X <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> U,</span>
<span id="cb7-6">  X <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Z <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> U,</span>
<span id="cb7-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">coords =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb7-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Z =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">X =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">U =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Y =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>),</span>
<span id="cb7-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Z =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">X =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">U =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Y =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb7-10">  )</span>
<span id="cb7-11">)</span>
<span id="cb7-12"></span>
<span id="cb7-13">iv_dag <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggdag</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_dag</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/a-good-instrument-is-a-bad-control-part-ii/index_files/figure-html/unnamed-chunk-2-1.png" class="img-fluid figure-img" width="576"></p>
</figure>
</div>
</div>
</div>
<p>In the figure, an arrow from <img src="https://latex.codecogs.com/png.latex?A"> to <img src="https://latex.codecogs.com/png.latex?B"> means that <img src="https://latex.codecogs.com/png.latex?A"> is a cause of <img src="https://latex.codecogs.com/png.latex?B">. A causal path is a sequence of arrows that “obeys one-way signs” and leads from <img src="https://latex.codecogs.com/png.latex?A"> to <img src="https://latex.codecogs.com/png.latex?B">. Because there is a causal path from <img src="https://latex.codecogs.com/png.latex?Z"> to <img src="https://latex.codecogs.com/png.latex?Y">, we say that <img src="https://latex.codecogs.com/png.latex?Z"> is a cause of <img src="https://latex.codecogs.com/png.latex?Y">. To see this using our regression equations from above, substitute the IV first stage into the linear causal model to obtain <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0AY%20&amp;=%20%5Calpha%20+%20%5Cbeta%20X%20+%20U%20=%20%5Calpha%20+%20%5Cbeta%20(%5Cpi_0%20+%20%5Cpi_Z%20Z%20+%20V)%20+%20U%5C%5C%0A&amp;=%20(%5Calpha%20+%20%5Cbeta%20%5Cpi_0)%20+%20%5Cbeta%20%5Cpi_Z%20Z%20+%20(%5Cbeta%20V%20+%20U).%0A%5Cend%7Baligned%7D%0A"> This gives us a linear equation with <img src="https://latex.codecogs.com/png.latex?Y"> on the left-hand side and <img src="https://latex.codecogs.com/png.latex?Z"> <em>alone</em> on the right-hand side. This is called the “reduced-form” regression. Since <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(Z,U)=0"> by assumption and <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(Z,V)%20=%200"> by construction, the reduced form is a <em>bona fide</em> population linear regression. That means that regressing <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?Z"> will indeed give us a slope that equals <img src="https://latex.codecogs.com/png.latex?%5Cpi_Z%20%5Ctimes%20%5Cbeta">. 
To see why the slope is a product, recall that <img src="https://latex.codecogs.com/png.latex?%5Cpi_Z"> is the causal effect of <img src="https://latex.codecogs.com/png.latex?Z"> on <img src="https://latex.codecogs.com/png.latex?X">, the <img src="https://latex.codecogs.com/png.latex?Z%5Crightarrow%20X"> arrow in the diagram, while <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> is the causal effect of <img src="https://latex.codecogs.com/png.latex?X"> on <img src="https://latex.codecogs.com/png.latex?Y">, the <img src="https://latex.codecogs.com/png.latex?X%20%5Crightarrow%20Y"> arrow in the diagram. Because the only way <img src="https://latex.codecogs.com/png.latex?Z"> can influence <img src="https://latex.codecogs.com/png.latex?Y"> is through <img src="https://latex.codecogs.com/png.latex?X">, it makes sense that the causal effect of <img src="https://latex.codecogs.com/png.latex?Z"> on <img src="https://latex.codecogs.com/png.latex?Y"> is the <em>product</em> of these two effects.</p>
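<p>Here is a small sketch of the reduced-form claim, using the same parameter values as the simulation above. The object names are my own and the errors are generated in base R rather than with <code>mvtnorm</code>, so treat it as an illustration under those assumptions rather than a replication of the earlier chunk.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(1234)
n    &lt;- 1e5
beta &lt;- 1     # causal effect of x on y
pi_z &lt;- 0.8   # causal effect of z on x

z &lt;- rnorm(n)
u &lt;- rnorm(n)
v &lt;- 0.5 * u + sqrt(1 - 0.5^2) * rnorm(n)  # Var(v) = 1, Cov(u, v) = 0.5
x &lt;- 0.5 + pi_z * z + v
y &lt;- -0.3 + beta * x + u

# Reduced form: regress y on z alone, with no adjustment for x
reduced_form &lt;- unname(coef(lm(y ~ z))["z"])
c(reduced_form = reduced_form, pi_times_beta = pi_z * beta)</code></pre></div>
</div>
<p>With a sample this large the two numbers should be close. Changing <code>beta</code> or <code>pi_z</code> should move the reduced-form slope in proportion to their product.</p>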
<p>So now we see that the reduced-form coefficient <img src="https://latex.codecogs.com/png.latex?%5Cpi_Z%20%5Cbeta"> is indeed a causal effect. How does this relate to <img src="https://latex.codecogs.com/png.latex?%5Cgamma_Z">? Remember that <img src="https://latex.codecogs.com/png.latex?%5Cgamma_Z"> was the coefficient on <img src="https://latex.codecogs.com/png.latex?Z"> in a regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?Z"> and <img src="https://latex.codecogs.com/png.latex?X">, in other words a regression that <em>adjusted</em> for <img src="https://latex.codecogs.com/png.latex?X">. So is adjusting for <img src="https://latex.codecogs.com/png.latex?X"> the right call? Absolutely not! There are no back-door paths between <img src="https://latex.codecogs.com/png.latex?Z"> and <img src="https://latex.codecogs.com/png.latex?Y">.<sup>3</sup> This means that we <em>don’t have to adjust</em> for anything to learn the causal effect of <img src="https://latex.codecogs.com/png.latex?Z"> on <img src="https://latex.codecogs.com/png.latex?Y">. In fact adjusting for <img src="https://latex.codecogs.com/png.latex?X"> is a mistake for <em>two different reasons</em>.</p>
<p>First, <img src="https://latex.codecogs.com/png.latex?X"> is a mediator on the path <img src="https://latex.codecogs.com/png.latex?Z%20%5Crightarrow%20X%20%5Crightarrow%20Y">. If there were no confounding, i.e.&nbsp;if <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(X,U)%20=%200"> so there is no <img src="https://latex.codecogs.com/png.latex?U%5Crightarrow%20X"> arrow, adjusting for <img src="https://latex.codecogs.com/png.latex?X"> would <em>block</em> the only causal path from <img src="https://latex.codecogs.com/png.latex?Z"> to <img src="https://latex.codecogs.com/png.latex?Y">. We can see this in our equations from above. Suppose that <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(X,U)%20=%200">. Then we have <img src="https://latex.codecogs.com/png.latex?%5Cgamma_X%20=%20%5Cbeta"> but <img src="https://latex.codecogs.com/png.latex?%5Cgamma_Z%20=%200">! There was a dead giveaway in our derivation: the formula for <img src="https://latex.codecogs.com/png.latex?%5Cgamma_Z"> doesn’t depend on <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> at all.</p>
<p>Second, because there <em>is</em> confounding, adjusting for <img src="https://latex.codecogs.com/png.latex?X"> creates a spurious association between <img src="https://latex.codecogs.com/png.latex?Z"> and <img src="https://latex.codecogs.com/png.latex?Y"> through the non-causal path <img src="https://latex.codecogs.com/png.latex?Z%20%5Crightarrow%20X%20%5Cleftarrow%20U%20%5Crightarrow%20Y">. This is not a back-door path, since it begins with an arrow pointing <em>out of</em> <img src="https://latex.codecogs.com/png.latex?Z">. Because <img src="https://latex.codecogs.com/png.latex?X"> is a collider on it, the path starts out <em>closed</em>; adjusting for <img src="https://latex.codecogs.com/png.latex?X"> <em>opens</em> it. To see why this is the case, suppose that <img src="https://latex.codecogs.com/png.latex?%5Cbeta%20=%200">. In this case there is <em>no causal effect</em> of <img src="https://latex.codecogs.com/png.latex?X"> on <img src="https://latex.codecogs.com/png.latex?Y"> and hence no causal effect of <img src="https://latex.codecogs.com/png.latex?Z"> on <img src="https://latex.codecogs.com/png.latex?Y">. But if <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(X,U)%20%5Cneq%200">, then we have <img src="https://latex.codecogs.com/png.latex?%5Cgamma_Z%20%5Cneq%200">!</p>
<p>So if you want to learn the causal effect of <img src="https://latex.codecogs.com/png.latex?Z"> on <img src="https://latex.codecogs.com/png.latex?Y">, it’s not just that <img src="https://latex.codecogs.com/png.latex?X"> is a <strong>bad control</strong>; it’s a doubly bad control! Without adjusting for <img src="https://latex.codecogs.com/png.latex?X">, everything is fine: the reduced-form regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?Z"> gives us exactly what we’re after.<sup>4</sup></p>
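<p>Both reasons are easy to see in a simulation. The sketch below uses a hypothetical helper, <code>sim_gamma()</code>, that I made up for this illustration: it simulates the model for a given <code>beta</code> and error correlation <code>rho</code>, then returns the coefficients on <code>x</code> and <code>z</code> from the regression that adjusts for <code>x</code>. Everything other than <code>beta</code> and <code>rho</code> follows the simulation design from above.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical helper: simulate the model, then regress y on x and z
sim_gamma &lt;- function(beta, rho, n = 1e5, pi_z = 0.8) {
  z &lt;- rnorm(n)
  u &lt;- rnorm(n)
  v &lt;- rho * u + sqrt(1 - rho^2) * rnorm(n)  # Var(v) = 1, Cov(u, v) = rho
  x &lt;- 0.5 + pi_z * z + v
  y &lt;- -0.3 + beta * x + u
  coef(lm(y ~ x + z))[c("x", "z")]
}

set.seed(1234)

# Reason 1: no confounding (rho = 0). Adjusting for the mediator x
# blocks the only causal path, so the coefficient on z is near zero.
no_confounding &lt;- sim_gamma(beta = 1, rho = 0)

# Reason 2: no causal effect (beta = 0) but confounding (rho = 0.5).
# Adjusting for the collider x makes the coefficient on z non-zero.
no_effect &lt;- sim_gamma(beta = 0, rho = 0.5)

rbind(no_confounding, no_effect)</code></pre></div>
</div>
<p>In the first row the coefficient on <code>z</code> should be close to zero even though <code>z</code> really does affect <code>y</code>. In the second it should be close to -0.8 times 0.5, or -0.4, even though <code>z</code> has no effect on <code>y</code> at all.</p>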
</section>
<section id="epilogue" class="level2">
<h2 class="anchored" data-anchor-id="epilogue">Epilogue</h2>
<p>When I showed this post to another colleague he asked me whether there is any way to learn about <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> by <em>combining</em> <img src="https://latex.codecogs.com/png.latex?%5Cgamma_Z"> and <img src="https://latex.codecogs.com/png.latex?%5Cgamma_X">. The answer is no: the regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Z"> alone doesn’t contain enough information. Since <img src="https://latex.codecogs.com/png.latex?%0A%5Cgamma_Z%20=%20-%5Cpi_Z%20%5Cfrac%7B%5Ctext%7BCov%7D(X,U)%7D%7B%5Ctext%7BVar%7D(V)%7D%20%5Cquad%20%5Ctext%7Band%7D%20%5Cquad%20%5Cgamma_X%20=%20%5Cbeta%20+%20%5Cfrac%7B%5Ctext%7BCov%7D(X,U)%7D%7B%5Ctext%7BVar%7D(V)%7D%0A"> we can rearrange to obtain the following expression for <img src="https://latex.codecogs.com/png.latex?%5Cbeta">: <img src="https://latex.codecogs.com/png.latex?%0A%5Cbeta%20=%20%5Cgamma_X%20+%20%5Cfrac%7B%5Cgamma_Z%7D%7B%5Cpi_Z%7D%0A"> which we can verify in our little simulation example as follows:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">gamma[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> gamma[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>pi[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>       x 
1.001975 </code></pre>
</div>
</div>
<p>Thus, in order to solve for <img src="https://latex.codecogs.com/png.latex?%5Cbeta">, we need to run the first-stage regression to learn <img src="https://latex.codecogs.com/png.latex?%5Cpi_Z">.</p>
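<p>There’s a tidy way to see what this combination is doing. The residuals from the regression of <code>y</code> on <code>x</code> and <code>z</code> are uncorrelated with <code>z</code> by construction. So, in any sample, the estimated coefficient on <code>x</code> plus the ratio of the coefficient on <code>z</code> to the first-stage slope is <em>algebraically identical</em> to the simple IV ratio of the sample covariance of <code>y</code> and <code>z</code> to that of <code>x</code> and <code>z</code>. The sketch below checks this. The object names are my own and the data are regenerated in base R, so the draws differ from those above.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(1234)
n &lt;- 1e5
z &lt;- rnorm(n)
u &lt;- rnorm(n)
v &lt;- 0.5 * u + sqrt(1 - 0.5^2) * rnorm(n)  # Var(v) = 1, Cov(u, v) = 0.5
x &lt;- 0.5 + 0.8 * z + v
y &lt;- -0.3 + x + u

gamma_hat &lt;- coef(lm(y ~ x + z))  # regression that adjusts for z
pi_hat    &lt;- coef(lm(x ~ z))      # first stage

# Combine the two regressions as in the expression for beta
combined &lt;- unname(gamma_hat["x"] + gamma_hat["z"] / pi_hat["z"])

# Simple IV ratio: sample Cov(y, z) over sample Cov(x, z)
iv_ratio &lt;- cov(y, z) / cov(x, z)

all.equal(combined, iv_ratio)</code></pre></div>
</div>
<p>In other words, once we bring in the first stage we are effectively back to instrumental variables, which is why the regression of <code>y</code> on <code>x</code> and <code>z</code> can’t deliver the causal effect on its own.</p>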


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>I have a very rich inner life.↩︎</p></li>
<li id="fn2"><p>The right way to learn <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> by regressing <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?X"> and “something else” is the control function approach described in <a href="../../post/three-ways-of-thinking-about-instrumental-variables/">this post</a>. Rather than adding <img src="https://latex.codecogs.com/png.latex?Z">, we add <img src="https://latex.codecogs.com/png.latex?V%20=%20X%20-%20%5Cpi_0%20-%20%5Cpi_Z%20Z"> as a control.↩︎</p></li>
<li id="fn3"><p>The rest of this post relies on some DAG basics. If anything here is unfamiliar, check out my <a href="https://www.treatment-effects.com/basics/">treatment effects slides</a>.↩︎</p></li>
<li id="fn4"><p>Here I assume that we’re interested in the <img src="https://latex.codecogs.com/png.latex?Z%5Crightarrow%20Y"> causal effect. To obtain the <img src="https://latex.codecogs.com/png.latex?X%5Crightarrow%20Y"> effect we would need to use an instrumental variables regression.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>econometrics</category>
  <category>causal inference</category>
  <guid>https://www.econometrics.blog/post/a-good-instrument-is-a-bad-control-part-ii/</guid>
  <pubDate>Thu, 28 Aug 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Two FWL Theorems for the Price of One</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/two-fwl-theorems-for-the-price-of-one/</link>
  <description><![CDATA[ 




<p>The result that I prefer to call <a href="../../post/how-to-do-regression-adjustment/#fnref2">Yule’s Rule</a>, more commonly known as the “Frisch-Waugh-Lovell (FWL) theorem”, shows how to calculate the regression slope coefficient for <strong>one predictor</strong> by carrying out additional “auxiliary” regressions that adjust for <strong>all other predictors</strong>. You’ve probably encountered this result if you’ve studied introductory econometrics. But it may surprise you to learn that there are actually <em>two</em> variants of the FWL theorem, each with its pros and cons. Today we’ll take a look at the less familiar version and then circle back to understand what makes the more familiar one a textbook staple.</p>
<section id="simulation-example" class="level2">
<h2 class="anchored" data-anchor-id="simulation-example">Simulation Example</h2>
<p>Let’s start with a little simulation. First we’ll generate 5000 observations of predictors <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?W"> from a joint normal distribution with standard deviations of one, means of zero, and a correlation of 0.5.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1066</span>)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(mvtnorm)</span>
<span id="cb1-3"></span>
<span id="cb1-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Simulate linear regression with two predictors: X and W</span></span>
<span id="cb1-5">covariance_matrix <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(</span>
<span id="cb1-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), </span>
<span id="cb1-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nrow =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb1-8">)</span>
<span id="cb1-9"></span>
<span id="cb1-10">n_sims <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5000</span></span>
<span id="cb1-11"></span>
<span id="cb1-12">x_w <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rmvnorm</span>(</span>
<span id="cb1-13">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> n_sims,  </span>
<span id="cb1-14">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>), </span>
<span id="cb1-15">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sigma =</span> covariance_matrix</span>
<span id="cb1-16">)</span>
<span id="cb1-17"></span>
<span id="cb1-18">x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> x_w[, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb1-19">w <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> x_w[, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]</span></code></pre></div></div>
</div>
<p>Next we’ll simulate the outcome variable <img src="https://latex.codecogs.com/png.latex?Y"> where the true coefficient on <img src="https://latex.codecogs.com/png.latex?X"> is one and the true coefficient on <img src="https://latex.codecogs.com/png.latex?W"> is -1, adding standard normal errors.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> w <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n_sims)</span></code></pre></div></div>
</div>
<p>Now we’ll run the “auxiliary regressions”. The first one regresses <img src="https://latex.codecogs.com/png.latex?X"> on <img src="https://latex.codecogs.com/png.latex?W"> and saves the residuals. Call these residuals <code>x_tilde</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Residuals from regression of X on W</span></span>
<span id="cb3-2">x_tilde <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> w) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb3-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">residuals</span>()</span></code></pre></div></div>
</div>
<p>The next one regresses <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?W"> and saves the residuals. Call these residuals <code>y_tilde</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Residuals from regression of Y on W</span></span>
<span id="cb4-2">y_tilde <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> w) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">residuals</span>()</span></code></pre></div></div>
</div>
<p>To make the code that follows a little simpler, I’ll also create a helper function that runs a linear regression and returns the coefficients after stripping away any variable names.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">get_coef <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(formula) {</span>
<span id="cb5-2">  formula <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span>  </span>
<span id="cb5-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb5-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb5-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unname</span>() </span>
<span id="cb5-6">}</span></code></pre></div></div>
</div>
<p>Now we’re ready to compare some regressions! The “Long Regression” is a standard linear regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?W">. The “FWL Standard” is a regression of <code>y_tilde</code> on <code>x_tilde</code> with no intercept. In other words, it regresses the residuals of <img src="https://latex.codecogs.com/png.latex?Y"> on the residuals of <img src="https://latex.codecogs.com/png.latex?X">. The FWL theorem, as it is usually encountered in textbooks, implies that we should recover the same coefficient on <img src="https://latex.codecogs.com/png.latex?X"> in “Long Regression” and in “FWL Standard”, and indeed the simulation bears this out.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb6-2">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Long Regression"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_coef</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> w)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>],</span>
<span id="cb6-3">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"FWL Standard"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_coef</span>(y_tilde <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> x_tilde <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], </span>
<span id="cb6-4">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"FWL Alternative"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_coef</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> x_tilde)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]</span>
<span id="cb6-5">)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>Long Regression    FWL Standard FWL Alternative 
      0.9937046       0.9937046       0.9937046 </code></pre>
</div>
</div>
<p>But now take a look at “FWL Alternative”: this is a regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <code>x_tilde</code>. Compared to the standard FWL approach, this version <em>does not</em> residualize <img src="https://latex.codecogs.com/png.latex?Y"> with respect to <img src="https://latex.codecogs.com/png.latex?W">. But it still gives us <em>exactly</em> the same coefficient on <img src="https://latex.codecogs.com/png.latex?X"> as the other two regressions. That leaves us with two unanswered questions:</p>
<ol type="1">
<li>Why does the “alternative” FWL approach work?</li>
<li><em>Given</em> that the alternative approach works, why does anyone ever teach the “standard” version?</li>
</ol>
<p>In the rest of this post we’ll answer both questions using simple algebra and the properties of linear regression. There are lots of deep ideas here, but there’s no need to bring out the big matrix algebra guns to explain them.</p>
</section>
<section id="a-bit-of-notation" class="level2">
<h2 class="anchored" data-anchor-id="a-bit-of-notation">A Bit of Notation</h2>
<p>First we need a bit of notation. I find it a bit simpler to work with population linear regressions rather than sample regressions, but the ideas are the same either way. So if you prefer to put “hats” on everything and work with sums rather than expectations and covariances, be my guest!</p>
<p>First we’ll define the “Long Regression” as a <a href="../../post/why-econometrics-is-confusing-part-1-the-error-term/">population linear regression</a> of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?W">, namely <img src="https://latex.codecogs.com/png.latex?%0AY%20=%20%5Cbeta_0%20+%20%5Cbeta_X%20X%20+%20%5Cbeta_W%20W%20+%20U,%20%5Cquad%20%5Cmathbb%7BE%7D(U)%20=%20%5Ctext%7BCov%7D(X,U)%20=%20%5Ctext%7BCov%7D(W,U)=0.%0A"> Next I’ll define two additional population linear regressions: first the regression of <img src="https://latex.codecogs.com/png.latex?X"> on <img src="https://latex.codecogs.com/png.latex?W"> <img src="https://latex.codecogs.com/png.latex?%0AX%20=%20%5Cgamma_0%20+%20%5Cgamma_W%20W%20+%20%5Ctilde%7BX%7D,%20%5Cquad%20%5Cmathbb%7BE%7D(%5Ctilde%7BX%7D)%20=%20%5Ctext%7BCov%7D(W,%5Ctilde%7BX%7D)=0%0A"> and second the regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?W"> <img src="https://latex.codecogs.com/png.latex?%0AY%20=%20%5Cdelta_0%20+%20%5Cdelta_W%20W%20+%20%5Ctilde%7BY%7D,%20%5Cquad%20%5Cmathbb%7BE%7D(%5Ctilde%7BY%7D)%20=%20%5Ctext%7BCov%7D(W,%5Ctilde%7BY%7D)=0.%0A"> I’ve already linked to a post making this point, but it bears repeating: all of the properties of the error terms <img src="https://latex.codecogs.com/png.latex?U">, <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BX%7D"> and <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BY%7D"> that I’ve stated here hold <em>by construction</em>. They are not assumptions; they are merely <a href="../../post/why-econometrics-is-confusing-part-1-the-error-term/">what defines an error term</a> in a population linear regression.</p>
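<p>As a quick sanity check on this notation, here is a minimal sketch in base R that works with sample analogues of the regression of <img src="https://latex.codecogs.com/png.latex?X"> on <img src="https://latex.codecogs.com/png.latex?W">. The simulated data, the seed, and the slope of 0.5 are assumptions chosen purely for illustration; they are not the design used in the simulation above.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical data for illustration: X depends linearly on W plus noise
set.seed(1234)
n &lt;- 5000
w &lt;- rnorm(n)
x &lt;- 0.5 * w + rnorm(n)

# Sample analogues of gamma_W and gamma_0 from the covariance formula
gamma_W &lt;- cov(x, w) / var(w)
gamma_0 &lt;- mean(x) - gamma_W * mean(w)

# lm() returns the same two numbers
fit_xw &lt;- lm(x ~ w)
all.equal(unname(coef(fit_xw)), c(gamma_0, gamma_W))

# The residual has mean zero and is uncorrelated with W by construction
x_tilde &lt;- resid(fit_xw)
c(mean = mean(x_tilde), cov_with_w = cov(w, x_tilde))</code></pre></div>
</div>
<p>Both entries of the final vector are zero up to floating-point error, and this would be true however the data had been generated. That is the sense in which these properties hold by construction.</p>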
</section>
<section id="why-does-the-alternative-fwl-approach-work" class="level2">
<h2 class="anchored" data-anchor-id="why-does-the-alternative-fwl-approach-work">Why does the “alternative” FWL approach work?</h2>
<p>As mentioned in the discussion of our simulation experiment from above, the standard FWL theorem says that a regression of <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BY%7D"> on <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BX%7D"> with no intercept gives us <img src="https://latex.codecogs.com/png.latex?%5Cbeta_X">, while the <em>alternative</em> version says that a regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BX%7D"> with an intercept also gives us <img src="https://latex.codecogs.com/png.latex?%5Cbeta_X">. It is the second claim that we’ll prove now.<sup>1</sup></p>
<p>The alternative FWL theorem claims that <img src="https://latex.codecogs.com/png.latex?%5Cbeta_X%20=%20%5Ctext%7BCov%7D(Y,%5Ctilde%7BX%7D)/%5Ctext%7BVar%7D(%5Ctilde%7BX%7D)">. Since <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BX%7D"> is uncorrelated with <img src="https://latex.codecogs.com/png.latex?W"> by construction, we can <a href="https://github.com/fditraglia/random-variables-cheatsheet/blob/main/random-variables-cheatsheet.pdf">expand the numerator</a> as follows: <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BCov%7D(Y,%5Ctilde%7BX%7D)%20=%20%5Ctext%7BCov%7D(%5Cbeta_0%20+%20%5Cbeta_X%20X%20+%20%5Cbeta_W%20W%20+%20U,%20%5Ctilde%7BX%7D)%20=%20%5Cbeta_X%20%5Ctext%7BCov%7D(X,%5Ctilde%7BX%7D)%20+%20%5Ctext%7BCov%7D(U,%5Ctilde%7BX%7D).%0A"> But since <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BX%7D%20=%20(X%20-%20%5Cgamma_0%20-%20%5Cgamma_W%20W)"> we also have <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BCov%7D(U,%20%5Ctilde%7BX%7D)%20=%20%5Ctext%7BCov%7D(U,%20X%20-%20%5Cgamma_0%20-%20%5Cgamma_W%20W)%20=%20%5Ctext%7BCov%7D(U,X)%20-%20%5Cgamma_W%20%5Ctext%7BCov%7D(U,W)%20=%200%0A"> since <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?W"> are uncorrelated with <img src="https://latex.codecogs.com/png.latex?U"> by construction. So to prove our original claim it suffices to show that <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(X,%5Ctilde%7BX%7D)%20=%20%5Ctext%7BVar%7D(%5Ctilde%7BX%7D)">. To see why this holds, first write <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BCov%7D(X,%20%5Ctilde%7BX%7D)%20=%20%5Ctext%7BCov%7D(X,%20X%20-%20%5Cgamma_0%20-%20%5Cgamma_W%20W)%20=%20%5Ctext%7BVar%7D(X)%20-%20%5Cgamma_W%20%5Ctext%7BCov%7D(X,W).%0A"> using <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(X,X)%20=%20%5Ctext%7BVar%7D(X)">. 
Next, expand <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BVar%7D(%5Ctilde%7BX%7D)"> as follows: <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BVar%7D(%5Ctilde%7BX%7D)%20=%20%5Ctext%7BVar%7D(X%20-%20%5Cgamma_0%20-%20%5Cgamma_W%20W)%20=%20%5Ctext%7BVar%7D(X)%20+%20%5Cgamma_W%5E2%20%5Ctext%7BVar%7D(W)%20-%202%20%5Cgamma_W%20%5Ctext%7BCov%7D(X,W).%0A"> and then subtract <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(X,%5Ctilde%7BX%7D)"> from <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BVar%7D(%5Ctilde%7BX%7D)">: <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BVar%7D(%5Ctilde%7BX%7D)%20-%20%5Ctext%7BCov%7D(X,%5Ctilde%7BX%7D)%20=%20%5Cgamma_W%20%5Cleft%5B%20%5Cgamma_W%20%5Ctext%7BVar%7D(W)%20-%20%5Ctext%7BCov%7D(X,W)%20%5Cright%5D.%0A"> This shows that <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BVar%7D(%5Ctilde%7BX%7D)"> and <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(X,%5Ctilde%7BX%7D)"> are equal if and only if <img src="https://latex.codecogs.com/png.latex?%5Cgamma_W%20%5Ctext%7BVar%7D(W)%20=%20%5Ctext%7BCov%7D(X,W)">. But since <img src="https://latex.codecogs.com/png.latex?%5Cgamma_W"> is the coefficient from the regression of <img src="https://latex.codecogs.com/png.latex?X"> on <img src="https://latex.codecogs.com/png.latex?W">, we already know that <img src="https://latex.codecogs.com/png.latex?%5Cgamma_W%20=%20%5Ctext%7BCov%7D(X,W)/%5Ctext%7BVar%7D(W)">! With a bit of algebra using the properties of covariance and the definition of a population linear regression, we’ve shown that the alternative FWL theorem holds.</p>
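<p>The two steps of this argument can also be checked numerically. The sketch below uses sample analogues in base R; the data-generating process is a hypothetical one chosen for illustration, not the simulation design from earlier in the post.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical data for illustration: X and W are correlated
set.seed(1234)
n &lt;- 5000
w &lt;- rnorm(n)
x &lt;- 0.5 * w + rnorm(n)
y &lt;- 0.5 + x - w + rnorm(n)

# Residuals from the regression of X on W
x_tilde &lt;- resid(lm(x ~ w))

# Step 1: Cov(X, X_tilde) equals Var(X_tilde)
all.equal(cov(x, x_tilde), var(x_tilde))

# Step 2: Cov(Y, X_tilde) / Var(X_tilde) equals the long-regression coefficient on X
beta_X_long &lt;- unname(coef(lm(y ~ x + w))["x"])
beta_X_alt &lt;- cov(y, x_tilde) / var(x_tilde)
all.equal(beta_X_long, beta_X_alt)</code></pre></div>
</div>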
</section>
<section id="whats-different-about-the-usual-fwl-theorem" class="level2">
<h2 class="anchored" data-anchor-id="whats-different-about-the-usual-fwl-theorem">What’s different about the “usual” FWL theorem?</h2>
<p>At this point you may be wondering why anyone teaches the “usual” version of the FWL theorem at all. If that extra short regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?W"> isn’t needed to learn <img src="https://latex.codecogs.com/png.latex?%5Cbeta_X">, why bother?</p>
<p>To answer this question, we’ll start by re-writing the long regression two different ways. First, we’ll substitute <img src="https://latex.codecogs.com/png.latex?X%20=%20%5Cgamma_0%20+%20%5Cgamma_W%20W%20+%20%5Ctilde%7BX%7D"> into the long regression and re-arrange, yielding <img src="https://latex.codecogs.com/png.latex?%0AY%20=%20(%5Cbeta_0%20+%20%5Cbeta_X%20%5Cgamma_0)%20+%20%5Cbeta_X%20%5Ctilde%7BX%7D%20+%20(%5Cbeta_W%20+%20%5Cbeta_X%20%5Cgamma_W)%20W%20+%20%20U.%0A"> Next we’ll substitute <img src="https://latex.codecogs.com/png.latex?Y%20=%20%5Cdelta_0%20+%20%5Cdelta_W%20W%20+%20%5Ctilde%7BY%7D"> on the left-hand side of the preceding equation and rearrange to isolate <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BY%7D">. This leaves us with <img src="https://latex.codecogs.com/png.latex?%0A%5Ctilde%7BY%7D%20=%20(%5Cbeta_0%20+%20%5Cbeta_X%20%5Cgamma_0%20-%20%5Cdelta_0)%20+%20%5Cbeta_X%20%5Ctilde%7BX%7D%20+%20(%5Cbeta_W%20+%20%5Cbeta_X%20%5Cgamma_W%20-%20%5Cdelta_W)%20W%20+%20U.%0A"> Now we have two expressions, each with <img src="https://latex.codecogs.com/png.latex?%5Cbeta_X%20%5Ctilde%7BX%7D"> as one of the terms on the right-hand side and <img src="https://latex.codecogs.com/png.latex?U"> as another. Notice that both expressions have an intercept and a term in which <img src="https://latex.codecogs.com/png.latex?W"> is multiplied by a constant. What’s more, the intercepts are closely related across the two equations, as are the <img src="https://latex.codecogs.com/png.latex?W"> coefficients. 
I’m now going to make a bold assertion: the intercept and <img src="https://latex.codecogs.com/png.latex?W"> coefficient in the second expression, the <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BY%7D"> one, are <strong>both equal to zero</strong>: <img src="https://latex.codecogs.com/png.latex?%0A%5Cbeta_0%20+%20%5Cbeta_X%20%5Cgamma_0%20-%20%5Cdelta_0%20=%200,%20%5Cquad%20%5Ctext%7Band%7D%20%5Cquad%20%5Cbeta_W%20+%20%5Cbeta_X%20%5Cgamma_W%20-%20%5Cdelta_W%20=%200.%0A"> Perhaps you don’t believe me, but just for the moment <em>suppose that I’m correct</em>. In this case it would immediately follow that <img src="https://latex.codecogs.com/png.latex?%0A%5Cbeta_0%20+%20%5Cbeta_X%20%5Cgamma_0%20=%20%5Cdelta_0,%20%5Cquad%20%5Ctext%7Band%7D%20%5Cquad%20%5Cbeta_W%20+%20%5Cbeta_X%20%5Cgamma_W%20=%20%5Cdelta_W%0A"> leaving us with two simple linear regressions, namely <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0AY%20&amp;=%20%5Cdelta_0%20+%20%5Cbeta_X%20%5Ctilde%7BX%7D%20+%20(%5Cdelta_W%20W%20+%20U)%5C%5C%0A%5Ctilde%7BY%7D%20&amp;=%20%5Cbeta_X%20%5Ctilde%7BX%7D%20+%20U.%0A%5Cend%7Baligned%7D%0A"> We’re tantalizingly close to unraveling the mystery of why the “usual” FWL theorem is so popular. But first we need to verify my bold claim from above. 
To do so, we’ll fall back on our old friend: the <em>omitted variable bias formula</em>: <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Cdelta_W%20&amp;%5Cequiv%20%5Cfrac%7B%5Ctext%7BCov%7D(Y,W)%7D%7B%5Ctext%7BVar%7D(W)%7D%20=%20%5Cfrac%7B%5Ctext%7BCov%7D(%5Cbeta_0%20+%20%5Cbeta_X%20X%20+%20%5Cbeta_W%20W%20+%20U,%20W)%7D%7B%5Ctext%7BVar%7D(W)%7D%20=%20%5Cfrac%7B%5Cbeta_W%20%5Ctext%7BVar%7D(W)%20+%20%5Cbeta_X%20%5Ctext%7BCov%7D(X,W)%7D%7B%5Ctext%7BVar%7D(W)%7D%5C%5C%0A&amp;=%20%5Cbeta_W%20+%20%5Cbeta_X%20%5Cfrac%7B%5Ctext%7BCov%7D(X,W)%7D%7B%5Ctext%7BVar%7D(W)%7D%20=%20%5Cbeta_W%20+%20%5Cbeta_X%20%5Cgamma_W.%0A%5Cend%7Baligned%7D%0A"> The third equality uses the fact that <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(U,W)%20=%200"> by construction. Thus, <img src="https://latex.codecogs.com/png.latex?%5Cbeta_W%20+%20%5Cbeta_X%20%5Cgamma_W%20-%20%5Cdelta_W%20=%200"> as claimed. One down, one more to go. By definition, <img src="https://latex.codecogs.com/png.latex?%5Cdelta_0%20=%20%5Cmathbb%7BE%7D(Y)%20-%20%5Cdelta_W%20%5Cmathbb%7BE%7D(W)">. Substituting the long regression for <img src="https://latex.codecogs.com/png.latex?Y">, we have <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Cdelta_0%20&amp;=%20%5Cmathbb%7BE%7D(%5Cbeta_0%20+%20%5Cbeta_X%20X%20+%20%5Cbeta_W%20W%20+%20U)%20-%20%5Cdelta_W%20%5Cmathbb%7BE%7D(W)%5C%5C%0A&amp;=%20%5Cbeta_0%20+%20%5Cbeta_X%20%5Cmathbb%7BE%7D(X)%20+%20(%5Cbeta_W%20-%20%5Cdelta_W)%20%5Cmathbb%7BE%7D(W)%0A%5Cend%7Baligned%7D%0A"> by the linearity of expectation and the fact that <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(U)%20=%200"> by construction. Now, we’re <em>trying to show</em> that <img src="https://latex.codecogs.com/png.latex?%5Cdelta_0%20=%20%5Cbeta_0%20+%20%5Cbeta_X%20%5Cgamma_0">. 
Substituting for <img src="https://latex.codecogs.com/png.latex?%5Cgamma_0"> in this expression gives <img src="https://latex.codecogs.com/png.latex?%0A%5Cbeta_0%20+%20%5Cbeta_X%20%5Cgamma_0%20=%20%5Cbeta_0%20+%20%5Cbeta_X%20%5B%5Cmathbb%7BE%7D(X)%20-%20%5Cgamma_W%20%5Cmathbb%7BE%7D(W)%5D%20=%20%5Cbeta_0%20+%20%5Cbeta_X%20%5Cmathbb%7BE%7D(X)%20-%20%5Cbeta_X%20%5Cgamma_W%20%5Cmathbb%7BE%7D(W).%0A"> Inspecting our work so far, we see that the two alternative expressions for <img src="https://latex.codecogs.com/png.latex?%5Cdelta_0"> will be equal precisely when <img src="https://latex.codecogs.com/png.latex?%5Cbeta_X%20%5Cgamma_W%20=%20%5Cdelta_W%20-%20%5Cbeta_W">. But re-arranging this gives <img src="https://latex.codecogs.com/png.latex?%5Cdelta_W%20=%20%5Cbeta_W%20+%20%5Cbeta_X%20%5Cgamma_W">, which we already proved above using the omitted variables bias formula!</p>
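<p>Both identities have exact sample counterparts, so they are easy to check. Here is a minimal sketch in base R under an assumed, purely illustrative data-generating process; the vector names <code>b</code>, <code>g</code> and <code>d</code> are hypothetical shorthand for the three sets of estimated coefficients.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical data for illustration: X and W are correlated
set.seed(1234)
n &lt;- 5000
w &lt;- rnorm(n)
x &lt;- 0.5 * w + rnorm(n)
y &lt;- 0.5 + x - w + rnorm(n)

b &lt;- unname(coef(lm(y ~ x + w)))  # (beta_0, beta_X, beta_W)
g &lt;- unname(coef(lm(x ~ w)))      # (gamma_0, gamma_W)
d &lt;- unname(coef(lm(y ~ w)))      # (delta_0, delta_W)

# delta_W = beta_W + beta_X * gamma_W
all.equal(d[2], b[3] + b[2] * g[2])

# delta_0 = beta_0 + beta_X * gamma_0
all.equal(d[1], b[1] + b[2] * g[1])</code></pre></div>
</div>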
</section>
<section id="taking-stock" class="level2">
<h2 class="anchored" data-anchor-id="taking-stock">Taking Stock</h2>
<p>That was a lot of algebra, so let’s spend some time thinking about the results. We showed that <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0AY%20&amp;=%20%5Cdelta_0%20+%20%5Cbeta_X%20%5Ctilde%7BX%7D%20+%20(%5Cdelta_W%20W%20+%20U)%5C%5C%0A%5Ctilde%7BY%7D%20&amp;=%20%5Cbeta_X%20%5Ctilde%7BX%7D%20+%20U.%0A%5Cend%7Baligned%7D%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Cdelta_W%20=%20%5Cbeta_W%20+%20%5Cbeta_X%20%5Cgamma_W"> is the slope from the regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?W">. Now, if you’ll permit me, I’d like to re-write that first equality as <img src="https://latex.codecogs.com/png.latex?%0AY%20=%20%5Cdelta_0%20+%20%5Cbeta_X%20%5Ctilde%7BX%7D%20+%20V,%20%5Cquad%20%5Ctext%7Bwhere%20%7D%20V%20%5Cequiv%20%5Cdelta_W%20W%20+%20U.%0A"> Since <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BX%7D"> is uncorrelated with <img src="https://latex.codecogs.com/png.latex?U">, as explained above, and since <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(U)%20=%200"> by construction, it follows that <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BY%7D%20=%20%5Cbeta_X%20%5Ctilde%7BX%7D%20+%20U"> is a <em>bona fide</em> population linear regression model. If we regress <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BY%7D"> on <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BX%7D"> the slope coefficient will be <img src="https://latex.codecogs.com/png.latex?%5Cbeta_X"> and the error term will be <img src="https://latex.codecogs.com/png.latex?U">. This regression corresponds to the <em>standard</em> FWL theorem. Notice that it has an intercept of <em>zero</em> and an error term that is <em>identical</em> to that of the long regression. We can verify this using our simulation experiment from above as follows:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Standard FWL has same residuals as long regression</span></span>
<span id="cb8-2">u_hat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">resid</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> w))</span>
<span id="cb8-3">u_tilde <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">resid</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(y_tilde <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> x_tilde <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb8-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all.equal</span>(u_hat, u_tilde)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] TRUE</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Standard FWL has an intercept of zero (to machine precision!)</span></span>
<span id="cb10-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(y_tilde <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> x_tilde))[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># fit with intercept; check it's (numerically) 0</span></span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> (Intercept) 
8.273433e-17 </code></pre>
</div>
</div>
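<p>So what about <img src="https://latex.codecogs.com/png.latex?Y%20=%20%5Cdelta_0%20+%20%5Cbeta_X%20%5Ctilde%7BX%7D%20+%20V">? This is the regression that corresponds to the <em>alternative</em> FWL theorem. Here <img src="https://latex.codecogs.com/png.latex?V%20=%20%5Cdelta_W%20W%20+%20U">, where <img src="https://latex.codecogs.com/png.latex?%5Cdelta_W%20=%20%5Cbeta_W%20+%20%5Cbeta_X%20%5Cgamma_W"> is the slope from the regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?W">. Since <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BX%7D"> is uncorrelated with both <img src="https://latex.codecogs.com/png.latex?U"> and <img src="https://latex.codecogs.com/png.latex?W">, and since <img src="https://latex.codecogs.com/png.latex?W"> has mean zero in our simulation so that <img src="https://latex.codecogs.com/png.latex?V"> does too, this too is a population regression. (If <img src="https://latex.codecogs.com/png.latex?W"> had a non-zero mean, the mean of <img src="https://latex.codecogs.com/png.latex?%5Cdelta_W%20W"> would simply be absorbed into the intercept.) But unless <img src="https://latex.codecogs.com/png.latex?%5Cdelta_W%20=%200">, it has a <em>different error term</em>. In other words, <img src="https://latex.codecogs.com/png.latex?V%20%5Cneq%20U">. Moreover, this regression <em>includes an intercept</em> that is not in general zero. Again we can verify this using our simulation example from above:</p>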
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Alternative FWL has different residuals than long regression</span></span>
<span id="cb12-2">v_hat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">resid</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> x_tilde))</span>
<span id="cb12-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all.equal</span>(u_hat, v_hat)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "Mean relative difference: 0.4905107"</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Alternative FWL has a non-zero intercept</span></span>
<span id="cb14-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> x_tilde))[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>(Intercept) 
  0.4878453 </code></pre>
</div>
</div>
</section>
<section id="the-punchline" class="level2">
<h2 class="anchored" data-anchor-id="the-punchline">The Punchline</h2>
<p>If your goal is <em>merely</em> to learn <img src="https://latex.codecogs.com/png.latex?%5Cbeta_X">, then either version of the FWL theorem will do the trick and the alternative version is <em>simpler</em> because it only involves one auxiliary regression instead of two. But if you want to ensure that you end up with the same <em>error term</em> as in the original long regression, then you need to use the <em>standard</em> version of the FWL theorem. This is crucial for the purposes of <em>inference</em> because the properties of the error term determine the standard errors of your estimates.</p>
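<p>To see what this means in practice, here is a hedged sketch comparing the conventional standard errors that <code>lm()</code> reports for the three regressions. The simulated data and the helper <code>se_of()</code> are hypothetical, introduced only for this illustration. One caveat that the sketch makes visible: even the standard FWL regression does not reproduce the long-regression standard error exactly, because <code>lm()</code> has no way of knowing that two parameters were already estimated in the auxiliary regressions, so it uses n - 1 rather than n - 3 degrees of freedom.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical data for illustration: X and W are correlated
set.seed(1234)
n &lt;- 5000
w &lt;- rnorm(n)
x &lt;- 0.5 * w + rnorm(n)
y &lt;- 0.5 + x - w + rnorm(n)

x_tilde &lt;- resid(lm(x ~ w))
y_tilde &lt;- resid(lm(y ~ w))

# Hypothetical helper: conventional standard error of one named coefficient
se_of &lt;- function(fit, name) {
  unname(sqrt(diag(vcov(fit)))[name])
}

se_long &lt;- se_of(lm(y ~ x + w), "x")
se_std &lt;- se_of(lm(y_tilde ~ x_tilde - 1), "x_tilde")
se_alt &lt;- se_of(lm(y ~ x_tilde), "x_tilde")

c(long = se_long, standard = se_std, alternative = se_alt)

# Same residuals, different degrees of freedom:
# rescaling the standard FWL value recovers the long-regression value
all.equal(se_long, se_std * sqrt((n - 1) / (n - 3)))</code></pre></div>
</div>
<p>In this particular design the alternative regression reports a noticeably larger standard error than the long regression, because its error term also contains the variation in <img src="https://latex.codecogs.com/png.latex?W">.</p>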


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Fear not: we’ll return to the first claim soon!</p></li>
</ol>
</section></div> ]]></description>
  <category>econometrics</category>
  <guid>https://www.econometrics.blog/post/two-fwl-theorems-for-the-price-of-one/</guid>
  <pubDate>Thu, 14 Aug 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Econometrics Puzzler #2: Fitting a Regression with Fitted Values</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/econometrics-puzzler-2-fitting-a-regression-with-fitted-values/</link>
  <description><![CDATA[ 




<p>Suppose I run a simple linear regression of an outcome variable on a predictor variable. If I save the fitted values from this regression and then run a <em>second</em> regression of the outcome variable on the fitted values, what will I get? For extra credit: how will the R-squared from the second regression compare to that from the first regression?</p>
<section id="example-height-and-handspan" class="level2">
<h2 class="anchored" data-anchor-id="example-height-and-handspan">Example: Height and Handspan</h2>
<p>Here’s a simple example: a regression of height, measured in inches, on handspan, measured in centimeters.<sup>1</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(broom)</span>
<span id="cb1-3">dat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'https://ditraglia.com/data/height-handspan.csv'</span>)</span>
<span id="cb1-4"></span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(dat, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> height, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> handspan)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_smooth</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lm"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Height (in)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Handspan (cm)"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/econometrics-puzzler-2-fitting-a-regression-with-fitted-values/index_files/figure-html/unnamed-chunk-1-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit the regression</span></span>
<span id="cb2-2">reg1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(height <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> handspan, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> dat)</span>
<span id="cb2-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tidy</span>(reg1)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  &lt;chr&gt;          &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt;
1 (Intercept)    40.9     1.67        24.5 9.19e-76
2 handspan        1.27    0.0775      16.3 3.37e-44</code></pre>
</div>
</div>
<p>As expected, bigger people are bigger in all dimensions, on average, so we see a positive relationship between handspan and height. Now let’s save the fitted values from this regression and run a second regression of height on the fitted values:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">dat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> reg1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">augment</span>(dat)</span>
<span id="cb4-3">reg2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(height <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> .fitted, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> dat)</span>
<span id="cb4-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tidy</span>(reg2)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 5
  term         estimate std.error statistic   p.value
  &lt;chr&gt;           &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;
1 (Intercept) -1.51e-13    4.17   -3.62e-14 1.000e+ 0
2 .fitted      1.00e+ 0    0.0612  1.63e+ 1 3.37 e-44</code></pre>
</div>
</div>
<p>The intercept isn’t <em>quite</em> zero, but it’s about as close as we can reasonably expect to get on a computer and the slope is <em>exactly</em> one. Now how about the R-squared? Let’s check:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glance</span>(reg1)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      &lt;dbl&gt;         &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt; &lt;dbl&gt;  &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1     0.452         0.450  3.02      267. 3.37e-44     1  -822. 1650. 1661.
# ℹ 3 more variables: deviance &lt;dbl&gt;, df.residual &lt;int&gt;, nobs &lt;int&gt;</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glance</span>(reg2)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      &lt;dbl&gt;         &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt; &lt;dbl&gt;  &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1     0.452         0.450  3.02      267. 3.37e-44     1  -822. 1650. 1661.
# ℹ 3 more variables: deviance &lt;dbl&gt;, df.residual &lt;int&gt;, nobs &lt;int&gt;</code></pre>
</div>
</div>
<p>The R-squared values from the two regressions are <em>identical</em>! Surprised? Now’s your last chance to think it through on your own before I give my solution.</p>
</section>
<section id="solution" class="level2">
<h2 class="anchored" data-anchor-id="solution">Solution</h2>
<p>Suppose we wanted to choose <img src="https://latex.codecogs.com/png.latex?%5Calpha_0"> and <img src="https://latex.codecogs.com/png.latex?%5Calpha_1"> to minimize <img src="https://latex.codecogs.com/png.latex?%5Csum_%7Bi=1%7D%5En%20(Y_i%20-%20%5Calpha_0%20-%20%5Calpha_1%20%5Cwidehat%7BY%7D_i)%5E2"> where <img src="https://latex.codecogs.com/png.latex?%5Cwidehat%7BY%7D_i%20=%20%5Cwidehat%7B%5Cbeta%7D_0%20+%20%5Cwidehat%7B%5Cbeta%7D_1%20X_i">. This is equivalent to minimizing <img src="https://latex.codecogs.com/png.latex?%0A%5Csum_%7Bi=1%7D%5En%20%5Cleft%5BY_i%20-%20(%5Calpha_0%20+%20%5Calpha_1%20%5Cwidehat%7B%5Cbeta%7D_0)%20-%20(%5Calpha_1%5Cwidehat%7B%5Cbeta%7D_1)X_i%5Cright%5D%5E2.%0A"> By construction <img src="https://latex.codecogs.com/png.latex?%5Cwidehat%7B%5Cbeta%7D_0"> and <img src="https://latex.codecogs.com/png.latex?%5Cwidehat%7B%5Cbeta%7D_1"> minimize <img src="https://latex.codecogs.com/png.latex?%5Csum_%7Bi=1%7D%5En%20(Y_i%20-%20%5Cbeta_0%20-%20%20%5Cbeta_1%20X_i)%5E2">, so unless <img src="https://latex.codecogs.com/png.latex?%5Cwidehat%7B%5Calpha%7D_0%20=%200"> and <img src="https://latex.codecogs.com/png.latex?%5Cwidehat%7B%5Calpha%7D_1%20=%201"> we’d have a contradiction!</p>
<p>Similar reasoning explains why the R-squared values for the two regressions are the same. The R-squared of a regression equals <img src="https://latex.codecogs.com/png.latex?1%20-%20%5Ctext%7BSS%7D_%7B%5Ctext%7Bresidual%7D%7D%20/%20%5Ctext%7BSS%7D_%7B%5Ctext%7Btotal%7D%7D">, where <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BSS%7D_%7B%5Ctext%7Btotal%7D%7D%20=%20%5Csum_%7Bi=1%7D%5En%20(Y_i%20-%20%5Cbar%7BY%7D)%5E2,%5Cquad%0A%5Ctext%7BSS%7D_%7B%5Ctext%7Bresidual%7D%7D%20=%20%5Csum_%7Bi=1%7D%5En%20(Y_i%20-%20%5Cwidehat%7BY%7D_i)%5E2%0A"> The total sum of squares is the same for both regressions because they have the same outcome variable. The residual sum of squares is the same because <img src="https://latex.codecogs.com/png.latex?%5Cwidehat%7B%5Calpha%7D_0%20=%200"> and <img src="https://latex.codecogs.com/png.latex?%5Cwidehat%7B%5Calpha%7D_1%20=%201"> together imply that both regressions have the same fitted values.</p>
<p>Here I focused on the case of a simple linear regression, one with a single predictor variable, but the same basic idea holds in general.</p>
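<p>Here’s a minimal sketch of the general case, assuming nothing beyond base R. It uses the built-in <code>mtcars</code> data rather than the height and handspan data, and the three predictors are an arbitrary choice made purely for illustration:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Illustrative check with R's built-in mtcars data (not the post's dataset)
reg_full &lt;- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Save the fitted values and regress the outcome on them
cars &lt;- transform(mtcars, fitted_mpg = fitted(reg_full))
reg_fitted &lt;- lm(mpg ~ fitted_mpg, data = cars)

# Intercept should be zero and slope one, up to floating-point error
coef(reg_fitted)

# The two R-squared values should agree up to floating-point error
all.equal(summary(reg_full)$r.squared, summary(reg_fitted)$r.squared)</code></pre></div>
</div>
<p>The argument is the same as before: the fitted values are a linear combination of the predictors, so no rescaling or shifting of them can beat the least squares fit that produced them.</p>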


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>In case you don’t know what handspan is: stretch out your dominant hand, and measure from the tip of your thumb to the tip of your pinky finger. This is your handspan. I collected this dataset from many years of <a href="https://ditraglia.com/Econ103Public">Econ 103</a> classes at UPenn.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>econometrics</category>
  <guid>https://www.econometrics.blog/post/econometrics-puzzler-2-fitting-a-regression-with-fitted-values/</guid>
  <pubDate>Thu, 24 Jul 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Econometrics Puzzler #1: To Instrument or Not?</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/econometrics-puzzler-1-to-instrument-or-not/</link>
  <description><![CDATA[ 




<p>Welcome to the first installment of the <em>Econometrics Puzzler</em>, a new series of shorter posts that will test and strengthen your econometric intuition. Here’s the format: I’ll pose a question that requires only introductory econometrics knowledge but has an unexpected answer. The idea is for you to ponder the question before reading my solution. Many of these questions are based on common misconceptions that come up year after year in my econometrics teaching. I hope you’ll find them both challenging and enlightening. Today we’ll revisit everyone’s favorite example: Angrist &amp; Krueger’s 1991 paper on the returns to education.<sup>1</sup></p>
<section id="to-instrument-or-not-to-instrument" class="level2">
<h2 class="anchored" data-anchor-id="to-instrument-or-not-to-instrument">To Instrument or Not to Instrument?</h2>
<p>Suppose I want to predict someone’s wage as accurately as possible using a linear model–that is, I want my predictions to be as close as they can be to the actual wages. (In fact we will predict the <em>log</em> of wage.) I observe a representative sample of workers that includes their log wage <img src="https://latex.codecogs.com/png.latex?Y_i"> and years of schooling <img src="https://latex.codecogs.com/png.latex?X_i">. I could use an OLS regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?X"> to make my predictions, but years of schooling are the classic example of an endogenous regressor; they’re correlated with myriad unobserved causes of wages, like “ability” and family background. Fortunately, I also have a valid and relevant instrument: quarter of birth <img src="https://latex.codecogs.com/png.latex?Z_i"> is correlated with years of schooling and (supposedly) uncorrelated with unobserved causes of wage.<sup>2</sup></p>
<p>So here’s the question: <strong>to get the best possible predictions of wage from the information I have, should I run OLS or IV?</strong> More specifically, let’s use mean squared error (MSE) as our measure of “best”. To borrow a term from <a href="https://www.3blue1brown.com/">Grant Sanderson</a>, “pause and ponder” before reading further.</p>
</section>
<section id="taking-it-to-the-data" class="level2">
<h2 class="anchored" data-anchor-id="taking-it-to-the-data">Taking it to the Data</h2>
<p>The Angrist &amp; Krueger (1991) dataset is available from Michal Kolesár’s <a href="https://github.com/kolesarm/ManyIV?tab=readme-ov-file"><code>ManyIV</code> R package</a>.<sup>3</sup> Here I’ll restrict attention to people born in the first or fourth quarter of the year. The instrument is a dummy variable for being born in the fourth quarter, relative to being born in the first quarter:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># remotes::install_github("kolesarm/ManyIV") # if needed</span></span>
<span id="cb1-2"></span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ManyIV) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Contains Angrist &amp; Krueger (1991) dataset</span></span>
<span id="cb1-4"></span>
<span id="cb1-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># For information about the dataset, see the package documentation:</span></span>
<span id="cb1-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># ?ManyIV::ak80</span></span>
<span id="cb1-7"></span>
<span id="cb1-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(dplyr)</span>
<span id="cb1-9"></span>
<span id="cb1-10">dat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> ak80 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb1-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as_tibble</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb1-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(qob <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Q1'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Q4'</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb1-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> (qob <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Q4'</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb1-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> education, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> lwage, z)</span></code></pre></div></div>
</div>
<p>To test how well OLS and IV perform as predictors, we’ll carry out a “pseudo-out-of-sample” experiment. First we’ll randomly split <code>dat</code> into a “training” sample containing 80% of the observations and a “test” sample containing the remaining 20%:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1693</span>) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># For reproducibility</span></span>
<span id="cb2-2"></span>
<span id="cb2-3">n_total <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(dat) </span>
<span id="cb2-4">n_train <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> n_total) </span>
<span id="cb2-5">n_test <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> n_total <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> n_train </span>
<span id="cb2-6"></span>
<span id="cb2-7">train_indices <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(n_total, n_train, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">replace =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>) </span>
<span id="cb2-8"></span>
<span id="cb2-9">dat_train <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> dat[train_indices, ] </span>
<span id="cb2-10">dat_test <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> dat[<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>train_indices, ] </span></code></pre></div></div>
</div>
<p>Now we’ll use <code>dat_train</code> to fit IV and OLS:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">ols_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> dat_train) </span>
<span id="cb3-2">ols_coefs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(ols_fit)</span>
<span id="cb3-3"></span>
<span id="cb3-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ivreg) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># install with `install.packages("ivreg")` if needed </span></span>
<span id="cb3-5">iv_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ivreg</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> z, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> dat_train)</span>
<span id="cb3-6">iv_coefs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(iv_fit)</span>
<span id="cb3-7"></span>
<span id="cb3-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rbind</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">OLS =</span> ols_coefs, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">IV =</span> iv_coefs)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>    (Intercept)          x
OLS    5.004283 0.07008633
IV     4.749959 0.09000644</code></pre>
</div>
</div>
<p>Now we’re ready to make our predictive comparison! We’ll “pretend” that we don’t know the wages of the people in our test sample and use the OLS and IV coefficients from above to predict the “missing” wages:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">dat_test <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> dat_test <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ols_pred =</span> ols_coefs[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> ols_coefs[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> x,</span>
<span id="cb5-3">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">iv_pred =</span> iv_coefs[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> iv_coefs[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> x) </span></code></pre></div></div>
</div>
<p>Of course we actually <em>do</em> know the wages of everyone in <code>dat_test</code>; this is the column <code>y</code>. So we can now compare our predictions against the truth.<sup>4</sup> As promised, we’ll measure predictive quality using mean squared error (MSE), the average squared difference between the truth and our predictions. Because it squares each error, MSE penalizes larger errors more heavily than smaller ones. There are other ways to measure prediction error, but MSE is a common choice and one that will play a key role in the rest of this post. And the winner is … <strong>OLS</strong>! Because it has a lower MSE, the predictions from the OLS model are, on average, closer to the true wages than the predictions from the IV model:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">dat_test <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ols_mse =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>((y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> ols_pred)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>),</span>
<span id="cb6-3">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">iv_mse =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>((y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> iv_pred)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 2
  ols_mse iv_mse
    &lt;dbl&gt;  &lt;dbl&gt;
1   0.407  0.411</code></pre>
</div>
</div>
<p>OLS beats IV by a small but appreciable margin. (The relatively small difference in this case reflects the fact that IV and OLS estimates are fairly similar in this example.) It turns out that this <em>isn’t a fluke</em>. The same will be true in <em>any example</em>, at least when the training and test samples are large enough that sampling noise doesn’t swamp the difference. Unless the instrument is perfectly correlated with the endogenous regressor, OLS will have a lower predictive MSE than IV.</p>
</section>
<section id="whats-really-going-on-here" class="level2">
<h2 class="anchored" data-anchor-id="whats-really-going-on-here">What’s really going on here?</h2>
<p>I ask this question of my introductory econometrics students every year, and most of them are surprised by the answer. If we have an endogenous regressor, OLS is biased and inconsistent; why would we ever pass up the opportunity to use a valid and relevant instrument? The answer is surprisingly simple: <em>by definition</em> the OLS estimand gives the best linear predictor of <img src="https://latex.codecogs.com/png.latex?Y">, the one that minimizes MSE: <img src="https://latex.codecogs.com/png.latex?%5Cmin_%7Ba,b%7D%20%5Cmathbb%7BE%7D%5B%5C%7BY%20-%20(a%20+%20b%20X)%5C%7D%5E2%5D">. This is true <em>regardless</em> of whether <img src="https://latex.codecogs.com/png.latex?X"> is endogenous. Indeed, from a predictive perspective, endogeneity is a feature, not a bug! The fact that years of schooling “smuggles in” information about ability and family background is exactly why OLS gives better predictions than IV. Remember: the whole point of IV is to <em>remove</em> the part of <img src="https://latex.codecogs.com/png.latex?X"> that is related to unobserved causes of <img src="https://latex.codecogs.com/png.latex?Y">. This is exactly what we want if our goal is to understand cause and effect, but it’s the <em>opposite</em> of what would make sense in a prediction problem, where we’d like to use as much information as possible.</p>
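<p>To see this logic at work in a setting where we know the truth, here’s a minimal simulation sketch in base R. Everything in it is an assumption made for illustration, not something taken from the Angrist &amp; Krueger data: the variable <code>ability</code>, the causal effect of 0.1, the instrument strength, and the sample size are all hypothetical choices.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">set.seed(1234) # arbitrary seed, for reproducibility

n &lt;- 1e5
ability &lt;- rnorm(n)                    # unobserved cause of y (hypothetical)
z &lt;- rbinom(n, size = 1, prob = 0.5)   # binary instrument, independent of ability
x &lt;- 0.5 * z + ability + rnorm(n)      # endogenous regressor
y &lt;- 1 + 0.1 * x + ability + rnorm(n)  # assumed causal effect of x is 0.1

train &lt;- seq_len(n) &lt;= 0.8 * n         # first 80% for training, the rest for testing

# OLS slope and intercept from the training sample
b_ols &lt;- cov(x[train], y[train]) / var(x[train])
a_ols &lt;- mean(y[train]) - b_ols * mean(x[train])

# IV slope (ratio of covariances) and the matching intercept
b_iv &lt;- cov(z[train], y[train]) / cov(z[train], x[train])
a_iv &lt;- mean(y[train]) - b_iv * mean(x[train])

# Predictive MSE in the test sample for a given intercept and slope
mse &lt;- function(a, b) mean((y[!train] - a - b * x[!train])^2)

c(ols_slope = b_ols, iv_slope = b_iv)
c(ols_mse = mse(a_ols, b_ols), iv_mse = mse(a_iv, b_iv))</code></pre></div>
</div>
<p>In this design the IV slope should land close to the assumed causal effect of 0.1, while the OLS slope should be far larger because it absorbs the effect of <code>ability</code>. That “contamination” is precisely what should give OLS the lower test-sample MSE.</p>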
</section>
<section id="a-red-herring-the-bias-variance-tradeoff" class="level2">
<h2 class="anchored" data-anchor-id="a-red-herring-the-bias-variance-tradeoff">A Red Herring: The Bias-Variance Tradeoff</h2>
<p>Students sometimes answer this question by invoking the <a href="https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff">bias-variance tradeoff</a>, pointing out that “OLS is biased but has a lower variance than IV, so it could have a lower MSE.” This is correct, but misses the deeper point. They’re thinking about bias in estimating the <em>causal parameter</em>.<sup>5</sup> But, again, the point here is that this isn’t relevant when prediction is our goal. When ML researchers discuss the bias-variance tradeoff in predictive settings, they mean something entirely different: bias of a linear predictive model relative to the true conditional mean function. OLS gives the best linear approximation to <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D%5BY%7CX%5D">, so it’s what we want in this example, since I stipulated we’d be working with linear models.</p>
</section>
<section id="take-home-message" class="level2">
<h2 class="anchored" data-anchor-id="take-home-message">Take Home Message</h2>
<p>Causal inference and prediction are different goals. Causality is about <em>counterfactuals</em>: what would happen if we <em>intervened</em> to change someone’s years of education? Prediction answers a different question: if I <em>observe</em> that someone has eight years of schooling, what is my best guess of their wage? If you want to predict, use OLS; if you want to estimate a causal effect, use IV.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>I’m sick of this example too, but the point of this puzzler is to get you thinking about instrumental variables; using an example that most people know will get us to the punch line faster.↩︎</p></li>
<li id="fn2"><p>If you’re unfamiliar with this example check out my <a href="https://youtu.be/NeAkMcgdWxA?si=XHsvGG5aPMNvMUfs&amp;t=2034">video overview</a>, including some discussion of why quarter of birth might <em>not</em> really be exogenous after all!↩︎</p></li>
<li id="fn3"><p>You can install this package using the <a href="https://cran.r-project.org/web/packages/remotes/index.html"><code>remotes</code></a> package, which is a convenient way to install packages from GitHub.↩︎</p></li>
<li id="fn4"><p>It’s crucial that we used one dataset to <em>estimate</em> our models and a <em>different</em> one to evaluate their predictive performance to avoid a problem called “overfitting”. This issue calls for a post of its own, but if you want a preview check out this blog post on <a href="https://sohl-dickstein.github.io/2022/11/06/strong-Goodhart.html">Goodhart’s law</a>.↩︎</p></li>
<li id="fn5"><p>When our goal is to learn the causal parameter, this bias-variance tradeoff becomes relevant. I even <a href="https://ideas.repec.org/a/eee/econom/v195y2016i2p187-208.html">wrote a paper</a>!↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>econometrics</category>
  <category>causal inference</category>
  <guid>https://www.econometrics.blog/post/econometrics-puzzler-1-to-instrument-or-not/</guid>
  <pubDate>Sun, 13 Jul 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Not Quite the James-Stein Estimator</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/not-quite-the-james-stein-estimator/</link>
  <description><![CDATA[ 




<p>If you study enough econometrics or statistics, you’ll eventually hear someone mention “Stein’s Paradox” or the <a href="https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator">“James-Stein Estimator”</a>. You’ve probably learned in your introductory econometrics course that ordinary least squares (OLS) is the <a href="https://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem">best linear unbiased estimator</a> (BLUE) in a linear regression model under the Gauss-Markov assumptions. The stipulations “linear” and “unbiased” are crucial here. If we remove them, it’s possible to do better–maybe even <em>much better</em>–than OLS.<sup>1</sup> Stein’s paradox is a famous example of this phenomenon, one that created much consternation among statisticians and fellow-travelers when it was first pointed out by <a href="https://en.wikipedia.org/wiki/Charles_M._Stein">Charles Stein</a> in the mid-1950s. The example is interesting in its own right, but it also has deep connections to ideas in Bayesian inference and machine learning, making it much more than a mere curiosity.</p>
<p>The supposed <a href="https://youtu.be/XXhJKzI1u48?si=cS--uLd09_JnAXdr">paradox</a> is most simply stated by considering a special case of linear regression–that of estimating multiple unknown means. <a href="https://www.jstor.org/stable/24954030">Efron &amp; Morris (1977)</a> introduce the basic idea as follows:</p>
<blockquote class="blockquote">
<p>A baseball player who gets seven hits in 20 official times at bat is said to have a batting average of .350. In computing this statistic we are forming an estimate of the player’s true batting ability in terms of his observed average rate of success. Asked how well the player will do in his next 100 times at bat, we would probably predict 35 more hits. In traditional statistical theory it can be proved that no other estimation rule is uniformly better than the observed average. The paradoxical element in Stein’s result is that it sometimes contradicts this elementary law of statistical theory. If we have three or more baseball players, and if we are interested in predicting future batting averages for each of them, then there is a procedure that is better than simply extrapolating from the three separate averages. Here “better” has a strong meaning. The statistician who employs Stein’s method can expect to predict the future averages more accurately no matter what the true batting abilities of the players may be.</p>
</blockquote>
<p>I first encountered Stein’s Paradox in an offhand remark by my PhD supervisor. I dutifully looked it up in an attempt to better understand the point he had been making, but lacked sufficient understanding of decision theory at the time to see what the fuss was all about. The second time I encountered it, after I knew a bit more, it seemed astounding: almost like magic. I decided to include the topic in my <a href="https://ditraglia.com/econ722">Econ 722</a> course at Penn, but struggled to make it accessible to my students. A big problem, in my view, is that the proof–see <a href="https://ditraglia.com/econ722/slides/econ722slides.pdf">lecture 1</a> or <a href="https://ditraglia.com/econ722/main.pdf">section 7.3</a>–is ultimately a bit of a let-down: algebra, followed by repeated integration by parts, and then a fact about the existence of moments for an <a href="https://en.wikipedia.org/wiki/Inverse-chi-squared_distribution">inverse-chi-squared random variable</a>. It seems like a sterile technical exercise when in fact the result itself is deep, surprising, and important. As if a benign deity were keen on making my point for me, the Wikipedia article on the <a href="https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator">James-Stein Estimator</a> is flagged as “may be too technical for readers to understand” at the time of this writing!</p>
<p>After six months of pondering, this post is my attempt to explain the James-Stein Estimator in a way that is accessible to a broad audience. The assumed background is minimal: just an introductory course in probability and statistics. I’ll show how we can arrive at something that is <em>very nearly</em> the James-Stein estimator by following some very simple and natural intuition. After you understand my “not quite James-Stein” estimator, it’s a short step to the real thing. So the “let-down” proof I mentioned before becomes merely a technical justification for a slight modification of a formula that is already intuitively compelling. As far as possible, I’ve tried to keep this post self-contained by introducing, or at least reviewing, key background material as we go along. The cost of this approach, unfortunately, is that the post is pretty long! I hope you’ll soldier on to the end and that you’ll find the payoff worth your time and effort.</p>
<p>As far as I know, the precise way that I motivate the James-Stein estimator in this post is new, but there are many other papers that aim to make sense of the supposed paradox in an intuitive way. In keeping with my injunction that you should always consider <a href="../../post/how-to-read-an-econometrics-paper/">reading something else instead</a>, here are a few references that you may find helpful. <a href="https://www.jstor.org/stable/24954030">Efron &amp; Morris (1977)</a> is a classic article aimed at the general reader without a background in statistics. <a href="https://projecteuclid.org/journals/statistical-science/volume-5/issue-1/The-1988-Neyman-Memorial-Lecture--A-Galtonian-Perspective-on/10.1214/ss/1177012274.full">Stigler (1988)</a> is a more technical but still accessible discussion of the topic while <a href="https://www.jstor.org/stable/2682801">Casella (1985)</a> is a very readable paper that discusses the James-Stein estimator in the context of empirical Bayes. A less well-known paper that I found helpful is <a href="https://www.jstor.org/stable/2490394">Ijiri &amp; Leitch (1980)</a>, who consider the James-Stein estimator in a real-world setting, namely “Audit Sampling” in accounting. They discuss several interesting practical and philosophical issues including the distinction between “composite” and “individual” risk that I’ll pick up on below.</p>
<section id="warm-up-exercise" class="level2">
<h2 class="anchored" data-anchor-id="warm-up-exercise">Warm-up Exercise</h2>
<p>This section provides some important background that we’ll need to understand Stein’s Paradox later in the post. It reviews the ideas of <strong>bias</strong>, <strong>variance</strong> and <strong>mean-squared error</strong> and introduces a very simple <strong>shrinkage estimator</strong>. To make these ideas as transparent as possible we’ll start with a ridiculously simple problem. Suppose that you observe <img src="https://latex.codecogs.com/png.latex?X%20%5Csim%20%5Ctext%7BNormal%7D(%5Cmu,%201)">, a single draw from a normal distribution with variance one and unknown mean <img src="https://latex.codecogs.com/png.latex?%5Cmu">. Your task is to estimate <img src="https://latex.codecogs.com/png.latex?%5Cmu">. This may strike you as a very silly problem: it only involves a single datapoint and we assume the variance of <img src="https://latex.codecogs.com/png.latex?X"> is one! But in fact there’s nothing special about <img src="https://latex.codecogs.com/png.latex?n%20=%201"> and a variance of one: these merely make the notation simpler. If you prefer, you can think of <img src="https://latex.codecogs.com/png.latex?X"> as the sample mean of <img src="https://latex.codecogs.com/png.latex?n"> iid draws from a population with unknown mean <img src="https://latex.codecogs.com/png.latex?%5Cmu"> where we’ve <em>rescaled</em> everything to have variance one. So how should we estimate <img src="https://latex.codecogs.com/png.latex?%5Cmu">? A natural and reasonable idea is to use the sample mean, in this case <img src="https://latex.codecogs.com/png.latex?X"> itself. This is in fact the <a href="https://en.wikipedia.org/wiki/Maximum_likelihood_estimation">maximum likelihood estimator</a> for <img src="https://latex.codecogs.com/png.latex?%5Cmu">, so I’ll define <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cmu%7D_%7B%5Ctext%7BML%7D%7D%20=%20X">. But is this estimator any good? And can we find something better?</p>
<section id="review-of-bias-variance-and-mse" class="level3">
<h3 class="anchored" data-anchor-id="review-of-bias-variance-and-mse">Review of Bias, Variance and MSE</h3>
<p>The concepts of <em>bias</em> and <em>variance</em> are key ideas that we typically reach for when considering the quality of an estimator. To refresh your memory, <em>bias</em> is the difference between an estimator’s expected value and the true value of the parameter being estimated, while <em>variance</em> is the expected squared difference between an estimator and its expected value. So if <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D"> is an estimator of some unknown parameter <img src="https://latex.codecogs.com/png.latex?%5Ctheta">, then <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BBias%7D(%5Chat%7B%5Ctheta%7D)%20=%20%5Cmathbb%7BE%7D%5B%5Chat%7B%5Ctheta%7D%5D%20-%20%5Ctheta"> while <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BVar%7D(%5Chat%7B%5Ctheta%7D)%20=%20%5Cmathbb%7BE%7D%5B(%5Chat%7B%5Ctheta%7D%20-%20%5Cmathbb%7BE%7D%5B%5Chat%7B%5Ctheta%7D%5D)%5E2%5D">. A bias of zero means that an estimator is <em>correctly centered</em>: its expectation equals the truth. We say that such an estimator is <em>unbiased</em>.<sup>2</sup> A small variance means that an estimator is <em>precise</em>: it doesn’t “jump around” too much. Ideally we’d like an estimator that is correctly centered and precise. But it turns out that there is generally a <em>trade-off</em> between bias and variance: if you want to reduce one of them, you have to accept an increase in the other.</p>
<p>A common way of trading off bias and variance relies on a concept called <em>mean-squared error</em> (MSE) defined as the <em>sum</em> of the squared bias and the variance.<sup>3</sup> In particular: <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BMSE%7D(%5Chat%7B%5Ctheta%7D)%20=%20%5Ctext%7BVar%7D(%5Chat%7B%5Ctheta%7D)%20+%20%5Ctext%7BBias%7D(%5Chat%7B%5Ctheta%7D)%5E2">. Equivalently, we can write <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BMSE%7D(%5Chat%7B%5Ctheta%7D)%20=%20%5Cmathbb%7BE%7D%5B(%5Chat%7B%5Ctheta%7D%20-%20%5Ctheta)%5E2%5D">.<sup>4</sup> To borrow some terminology from introductory microeconomics, you can think of MSE as the <em>negative</em> of a utility function over bias and variance. Both bias and variance are “bads” in that we’d prefer less of each. This formula expresses our <em>preferences</em> in terms of how much of one we’d be willing to accept in exchange for less of the other. Slightly foreshadowing something that will come later in this post, we can think of MSE as the average squared distance that an archer’s arrows land from the bulls-eye. Smaller values of MSE are better: variance measures how closely the arrows cluster together while bias measures how far the center of the cluster is from the bulls-eye, as in the following diagram:</p>
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/not-quite-the-james-stein-estimator/index_files/figure-html/unnamed-chunk-1-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
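<p>If you’d like to see the decomposition of MSE into variance plus squared bias at work, here is a minimal simulation sketch. Everything specific in it is an illustrative assumption rather than part of the argument above: the true value <code>theta = 2</code>, the deliberately off-center estimator <code>x + 0.5</code>, the seed, and the number of replications. Because the simulated bias, variance and MSE are all computed as plain averages over the same draws, the two sides of the identity should agree up to floating-point error.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Illustrative check that MSE = Var + Bias^2 (all values below are assumptions)
set.seed(1234)
theta &lt;- 2        # hypothetical true parameter value
n_reps &lt;- 1e5     # number of simulated datasets
# Each replication observes one draw X ~ Normal(theta, 1)
x &lt;- rnorm(n_reps, mean = theta, sd = 1)
# A deliberately biased estimator: shift every draw up by 0.5
theta_hat &lt;- x + 0.5
# Simulation analogues of bias, variance and MSE
bias &lt;- mean(theta_hat) - theta
variance &lt;- mean((theta_hat - mean(theta_hat))^2)
mse &lt;- mean((theta_hat - theta)^2)
# The two entries should match up to rounding error
c(mse = mse, var_plus_bias_sq = variance + bias^2)</code></pre></div>
</div>
<p>Note that the variance is computed with a divisor of <code>n_reps</code> rather than <code>n_reps - 1</code>, which is what makes the identity hold exactly in the simulated sample.</p>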
</section>
<section id="a-shrinkage-estimator" class="level3">
<h3 class="anchored" data-anchor-id="a-shrinkage-estimator">A Shrinkage Estimator</h3>
<p>Returning to our maximum likelihood estimator: it’s unbiased, <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BBias%7D(%5Chat%7B%5Cmu%7D_%7B%5Ctext%7BML%7D%7D)%20=%200">, so <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BMSE%7D(%5Chat%7B%5Cmu%7D_%7B%5Ctext%7BML%7D%7D)%20=%20%5Ctext%7BVar%7D(%5Chat%7B%5Cmu%7D_%7B%5Ctext%7BML%7D%7D)%20=%201">. Suppose that low MSE is what we’re after. Is there any way to improve on the ML estimator? In other words, can we achieve an MSE that’s lower than one? The answer turns out to be <em>yes</em>. Here’s the idea. Suppose we had some reason to believe that the true mean <img src="https://latex.codecogs.com/png.latex?%5Cmu"> isn’t very large. Then perhaps we could try to adjust our maximum likelihood estimate by <em>shrinking</em> slightly towards zero. One way to do this would be by taking a weighted average of the ML estimator and zero: <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Cmu%7D(%5Clambda)%20=%20(1%20-%20%5Clambda)%20%5Ctimes%20%5Chat%7B%5Cmu%7D_%7B%5Ctext%7BML%7D%7D%20+%20%5Clambda%20%5Ctimes%200%20=%20(1%20-%20%5Clambda)X%0A"> for <img src="https://latex.codecogs.com/png.latex?0%20%5Cleq%20%5Clambda%20%5Cleq%201">. The constant <img src="https://latex.codecogs.com/png.latex?(1%20-%20%5Clambda)"> is called the “shrinkage factor” and controls how the ML estimator gets pulled towards zero.<sup>5</sup> We get a different estimator for every value of <img src="https://latex.codecogs.com/png.latex?%5Clambda">. If <img src="https://latex.codecogs.com/png.latex?%5Clambda%20=%200"> then we get the ML estimator back. If <img src="https://latex.codecogs.com/png.latex?%5Clambda%20=%201"> then we get a very silly estimator that ignores the data and simply reports zero no matter what! So let’s see how the MSE depends on our choice of <img src="https://latex.codecogs.com/png.latex?%5Clambda">. 
Substituting the definition of <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cmu%7D(%5Clambda)"> into the formulas for bias and variance gives: <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Ctext%7BBias%7D%5B%5Chat%7B%5Cmu%7D(%5Clambda)%5D&amp;=%20%5Cmathbb%7BE%7D%5B(1%20-%20%5Clambda)%5Chat%7B%5Cmu%7D_%5Ctext%7BML%7D%5D%20-%20%5Cmu%20=%20(1%20-%20%5Clambda)%5Cmathbb%7BE%7D%5B%5Chat%7B%5Cmu%7D_%5Ctext%7BML%7D%5D%20-%20%5Cmu%20=%20(1%20-%20%5Clambda)%5Cmu%20-%20%5Cmu%20=%20-%5Clambda%5Cmu%5C%5C%20%5C%5C%0A%5Ctext%7BVar%7D%5B%5Chat%7B%5Cmu%7D(%5Clambda)%5D&amp;=%20%5Ctext%7BVar%7D%5B(1%20-%20%5Clambda)%5Chat%7B%5Cmu%7D_%5Ctext%7BML%7D%5D%20=%20(1%20-%20%5Clambda)%5E2%5Ctext%7BVar%7D%5B%5Chat%7B%5Cmu%7D_%5Ctext%7BML%7D%5D%20=%20(1%20-%20%5Clambda)%5E2%5C%5C%20%5C%5C%0A%5Ctext%7BMSE%7D%5B%5Chat%7B%5Cmu%7D(%5Clambda)%5D&amp;=%20%5Ctext%7BVar%7D%5B%5Chat%7B%5Cmu%7D(%5Clambda)%5D%20+%20%5Ctext%7BBias%7D%5B%5Chat%7B%5Cmu%7D(%5Clambda)%5D%5E2%20=%20(1%20-%20%5Clambda)%5E2%20+%20%5Clambda%5E2%5Cmu%5E2%0A%5Cend%7Baligned%7D%0A"> Unless <img src="https://latex.codecogs.com/png.latex?%5Clambda%20=%200">, the shrinkage estimator is <em>biased</em>. And while the MSE of the ML estimator is always one, regardless of the true value of <img src="https://latex.codecogs.com/png.latex?%5Cmu">, the MSE of the shrinkage estimator <em>depends on the unknown parameter</em> <img src="https://latex.codecogs.com/png.latex?%5Cmu">.</p>
<p>So why should we use a biased estimator? The answer is that by tolerating a small amount of bias we may be able to achieve a <em>larger</em> reduction in variance, resulting in a lower MSE compared to the higher variance but unbiased ML estimator. A quick plot shows us that the shrinkage estimator <em>can indeed</em> have a lower MSE than the ML estimator depending on the value of <img src="https://latex.codecogs.com/png.latex?%5Clambda"> and the true value of <img src="https://latex.codecogs.com/png.latex?%5Cmu">:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Range of values for the unknown parameter mu</span></span>
<span id="cb1-2">mu <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">length =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb1-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Try three different values of lambda</span></span>
<span id="cb1-4">lambda1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span></span>
<span id="cb1-5">lambda2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span></span>
<span id="cb1-6">lambda3 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span></span>
<span id="cb1-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Plot the MSE of the shrinkage estimator as a function of mu for all </span></span>
<span id="cb1-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># three values of lambda at once</span></span>
<span id="cb1-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matplot</span>(mu, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cbind</span>((<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> lambda1)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> lambda1<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> mu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, </span>
<span id="cb1-10">                  (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> lambda2)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> lambda2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> mu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, </span>
<span id="cb1-11">                  (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> lambda3)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> lambda3<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> mu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), </span>
<span id="cb1-12">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'l'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lty =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lwd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, </span>
<span id="cb1-13">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'red'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'blue'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'green'</span>), </span>
<span id="cb1-14">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expression</span>(mu), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'MSE'</span>, </span>
<span id="cb1-15">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">main =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'MSE of Shrinkage Estimator'</span>)</span>
<span id="cb1-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add legend</span></span>
<span id="cb1-17"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">legend</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'topright'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expression</span>(lambda <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>), </span>
<span id="cb1-18">                              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expression</span>(lambda <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>), </span>
<span id="cb1-19">                              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expression</span>(lambda <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>)), </span>
<span id="cb1-20">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'red'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'blue'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'green'</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lty =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lwd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb1-21"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add dashed line for MSE of ML estimator</span></span>
<span id="cb1-22"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">h =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lty =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lwd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/not-quite-the-james-stein-estimator/index_files/figure-html/unnamed-chunk-2-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
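<p>The curves above come straight from the formula for the MSE of the shrinkage estimator. As a sanity check on that formula, the following sketch compares it with a Monte Carlo estimate. The helper names <code>shrinkage_mse_sim</code> and <code>shrinkage_mse_formula</code>, the grid of means, the seed, and the number of replications are all hypothetical choices made for illustration.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Illustrative Monte Carlo check of MSE = (1 - lambda)^2 + lambda^2 * mu^2
set.seed(4321)
# Simulated MSE of (1 - lambda) * X when X ~ Normal(mu, 1)
shrinkage_mse_sim &lt;- function(lambda, mu, n_reps = 1e5) {
  x &lt;- rnorm(n_reps, mean = mu, sd = 1)
  mean(((1 - lambda) * x - mu)^2)
}
# MSE according to the formula derived above
shrinkage_mse_formula &lt;- function(lambda, mu) {
  (1 - lambda)^2 + lambda^2 * mu^2
}
lambda_check &lt;- 0.3            # one of the values of lambda plotted above
mu_values &lt;- c(0, 1, 2, 3)     # a few hypothetical true means
check &lt;- cbind(
  mu = mu_values,
  simulated = sapply(mu_values, function(m) shrinkage_mse_sim(lambda_check, m)),
  formula = shrinkage_mse_formula(lambda_check, mu_values)
)
check</code></pre></div>
</div>
<p>The simulated and formula columns should agree up to simulation error, and both should rise above one once the true mean is far enough from zero.</p>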
</section>
<section id="some-algebra" class="level3">
<h3 class="anchored" data-anchor-id="some-algebra">Some Algebra</h3>
<p>It’s time for some algebra. If you’re tempted to skip this <em>please don’t</em>: this section is a warm-up for our main event. If you thoroughly understand the mechanics of shrinkage in this simple example, everything that follows below will seem much more natural.</p>
<p>As seen from the plot above, the MSE of our shrinkage estimator (the solid lines) is lower than that of the ML estimator (the dashed line) provided that our chosen value of <img src="https://latex.codecogs.com/png.latex?%5Clambda"> isn’t too large relative to the true value of <img src="https://latex.codecogs.com/png.latex?%5Cmu">. With a bit of algebra, we can work out <em>precisely</em> how large <img src="https://latex.codecogs.com/png.latex?%5Clambda"> can be to make shrinkage worthwhile. Since <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BMSE%7D%5B%5Chat%7B%5Cmu%7D_%5Ctext%7BML%7D%5D=%201">, by expanding and simplifying the expression for <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BMSE%7D%5B%5Chat%7B%5Cmu%7D(%5Clambda)%5D"> we see that <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BMSE%7D%5B%5Chat%7B%5Cmu%7D(%5Clambda)%5D%20%3C%20%5Ctext%7BMSE%7D%5B%5Chat%7B%5Cmu%7D_%5Ctext%7BML%7D%5D"> if and only if <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A(1%20-%20%5Clambda)%5E2%20+%20%5Clambda%5E2%5Cmu%5E2%20&amp;%3C%201%20%5C%5C%0A1%20-%202%5Clambda%20+%20%5Clambda%5E2%20+%20%5Clambda%5E2%5Cmu%5E2%20&amp;%3C%201%20%5C%5C%0A%5Clambda%5E2%20(1%20+%20%5Cmu%5E2)%20-2%20%5Clambda%20&amp;%3C%200%20%5C%5C%0A%5Clambda%20%5B%5Clambda%20(1%20+%20%5Cmu%5E2)%20-%202%5D%20&amp;%3C%200.%0A%5Cend%7Baligned%7D%0A"> Since <img src="https://latex.codecogs.com/png.latex?%5Clambda%20%5Cgeq%200">, the final inequality can only hold if the factor inside the square brackets is negative, i.e.&nbsp; <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Clambda%20(1%20+%20%5Cmu%5E2)%20-%202%20&amp;%3C%200%20%5C%5C%0A%5Clambda%20&amp;%3C%20%5Cfrac%7B2%7D%7B1%20+%20%5Cmu%5E2%7D.%0A%5Cend%7Baligned%7D%0A"> This shows that any choice of <img src="https://latex.codecogs.com/png.latex?%5Clambda"> between <img src="https://latex.codecogs.com/png.latex?0"> and <img 
src="https://latex.codecogs.com/png.latex?2%20/%20(1%20+%20%5Cmu%5E2)"> will give us a shrinkage estimator with an MSE less than one. To check our algebra, we can change the inequality to an equality and solve for <img src="https://latex.codecogs.com/png.latex?%5Cmu"> to obtain the boundary of the region where shrinkage is better than ML: <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Clambda%20(1%20+%20%5Cmu%5E2)%20-%202%20&amp;=%200%20%5C%5C%0A1%20+%20%5Cmu%5E2%20&amp;=%202/%5Clambda%20%5C%5C%0A%5Cmu%20&amp;=%20%5Cpm%20%5Csqrt%7B2/%5Clambda%20-%201%7D.%0A%5Cend%7Baligned%7D%0A"> Adding these boundaries to a simplified version of our previous plot with only <img src="https://latex.codecogs.com/png.latex?%5Clambda%20=%200.3"> we see that everything works out correctly: the dashed red lines intersect the blue curve at the points where the MSE of the shrinkage estimator equals that of the ML estimator.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Plot the MSE of the shrinkage estimator as a function of mu for lambda = 0.3</span></span>
<span id="cb2-2">lambda <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span></span>
<span id="cb2-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(mu, (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> lambda)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> lambda<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> mu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'l'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lty =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lwd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, </span>
<span id="cb2-4">     <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'blue'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expression</span>(mu), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'MSE'</span>, </span>
<span id="cb2-5">     <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">main =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Boundary of Region Where Shrinkage is Better than ML'</span>)</span>
<span id="cb2-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add dashed line for MSE of ML estimator</span></span>
<span id="cb2-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">h =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lty =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lwd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb2-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add boundaries of region where shrinkage is better than ML estimator</span></span>
<span id="cb2-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">v =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lambda <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lambda <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lty =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lwd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,</span>
<span id="cb2-10">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'red'</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/not-quite-the-james-stein-estimator/index_files/figure-html/unnamed-chunk-3-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>But there’s still more to learn! Suppose we wanted to take things <em>one step further</em> and find the <em>optimal</em> value of <img src="https://latex.codecogs.com/png.latex?%5Clambda"> for any given value of <img src="https://latex.codecogs.com/png.latex?%5Cmu">. In other words, suppose we wanted the value of <img src="https://latex.codecogs.com/png.latex?%5Clambda"> that <em>minimizes</em> the MSE of our shrinkage estimator given a particular assumed value for <img src="https://latex.codecogs.com/png.latex?%5Cmu">. Since <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BMSE%7D%5B%5Chat%7B%5Cmu%7D(%5Clambda)%5D"> is a quadratic function of <img src="https://latex.codecogs.com/png.latex?%5Clambda">, as shown above, this turns out to be a fairly straightforward calculation. Differentiating, <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Cfrac%7Bd%7D%7Bd%5Clambda%7D%5Ctext%7BMSE%7D%5B%5Chat%7B%5Cmu%7D(%5Clambda)%5D%20&amp;=%20%5Cfrac%7Bd%7D%7Bd%5Clambda%7D%5B(1%20-%20%5Clambda)%5E2%20+%20%5Clambda%5E2%20%5Cmu%5E2%5D%20%5C%5C%0A&amp;=%20-2(1%20-%20%5Clambda)%20+%202%5Clambda%20%5Cmu%5E2%20%5C%5C%0A&amp;=%202%20%5B%5Clambda%20(1%20+%20%5Cmu%5E2)%20-%201%5D%5C%5C%20%5C%5C%0A%5Cfrac%7Bd%5E2%7D%7Bd%5Clambda%5E2%7D%5Ctext%7BMSE%7D%5B%5Chat%7B%5Cmu%7D(%5Clambda)%5D%20&amp;=%202(1%20+%20%5Cmu%5E2)%20%3E%200%0A%5Cend%7Baligned%7D%0A"> so there is a unique global minimum at <img src="https://latex.codecogs.com/png.latex?%5Clambda%5E*%20%5Cequiv%201/(1%20+%20%5Cmu%5E2)">. This gives the <em>optimal</em> shrinkage factor in the sense that it minimizes the MSE of the shrinkage estimator. 
Substituting <img src="https://latex.codecogs.com/png.latex?%5Clambda%5E*"> into the expression for <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BMSE%7D%5B%5Chat%7B%5Cmu%7D(%5Clambda)%5D"> gives: <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Ctext%7BMSE%7D%5B%5Chat%7B%5Cmu%7D(%5Clambda%5E*)%5D%20&amp;=%20%5Cleft(1%20-%20%5Cfrac%7B1%7D%7B1%20+%20%5Cmu%5E2%7D%20%5Cright)%5E2%20+%20%5Cleft(%5Cfrac%7B1%7D%7B1%20+%20%5Cmu%5E2%7D%5Cright)%5E2%20%5Cmu%5E2%20%20%5C%5C%0A&amp;=%20%5Cleft(%20%5Cfrac%7B%5Cmu%5E2%7D%7B1%20+%20%5Cmu%5E2%7D%5Cright)%5E2%20+%20%5Cleft(%5Cfrac%7B1%7D%7B1%20+%20%5Cmu%5E2%7D%5Cright)%5E2%20%5Cmu%5E2%20%5C%5C%0A&amp;=%20%5Cleft(%20%5Cfrac%7B1%7D%7B1%20+%20%5Cmu%5E2%7D%5Cright)%5E2%20(%5Cmu%5E4%20+%20%5Cmu%5E2)%20%5C%5C%0A&amp;=%20%5Cleft(%20%5Cfrac%7B1%7D%7B1%20+%20%5Cmu%5E2%7D%5Cright)%5E2%20%5Cmu%5E2(1%20+%20%5Cmu%5E2)%20%5C%5C%0A&amp;=%20%5Cfrac%7B%5Cmu%5E2%7D%7B1%20+%20%5Cmu%5E2%7D%20%3C%201.%0A%5Cend%7Baligned%7D%0A"></p>
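<p>If you’d like to check this derivation numerically, here is a minimal sketch in R. The function name <code>mse_shrink</code> and the value <code>mu = 2</code> are illustrative assumptions rather than anything used elsewhere in the post. The code compares the closed-form expressions for <img src="https://latex.codecogs.com/png.latex?%5Clambda%5E*"> and the minimized MSE against what the built-in <code>optimize()</code> routine finds.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># MSE of the shrinkage estimator (1 - lambda) * X when X ~ Normal(mu, 1)
mse_shrink &lt;- function(lambda, mu) (1 - lambda)^2 + lambda^2 * mu^2

mu &lt;- 2  # illustrative value (an assumption)

# Closed-form minimizer from the derivation above
lambda_star &lt;- 1 / (1 + mu^2)

# Numerical minimizer over lambda in [0, 1]
opt &lt;- optimize(mse_shrink, interval = c(0, 1), mu = mu)

# The two entries in each vector should agree up to numerical tolerance
c(closed_form = lambda_star, numerical = opt$minimum)
c(closed_form = mu^2 / (1 + mu^2), numerical = opt$objective)</code></pre></div>
</div>
<p>Trying a few different values of <code>mu</code> is a quick way to convince yourself that the minimized MSE stays below one, the MSE of the ML estimator.</p>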
</section>
</section>
<section id="steins-paradox" class="level2">
<h2 class="anchored" data-anchor-id="steins-paradox">Stein’s Paradox</h2>
<section id="recap" class="level3">
<h3 class="anchored" data-anchor-id="recap">Recap</h3>
<p>We’re moments away from having all the ingredients we need to introduce Stein’s Paradox! But first let’s review what we’ve uncovered thus far. We’ve seen that the shrinkage estimator can improve on the ML estimator in terms of MSE provided that <img src="https://latex.codecogs.com/png.latex?%5Clambda"> is chosen judiciously: it needs to be between zero and <img src="https://latex.codecogs.com/png.latex?2/(1%20+%20%5Cmu%5E2)">. The optimal choice of <img src="https://latex.codecogs.com/png.latex?%5Clambda">, namely <img src="https://latex.codecogs.com/png.latex?%5Clambda%5E*%20=%201%20/%20(1%20+%20%5Cmu%5E2)">, gives an MSE of <img src="https://latex.codecogs.com/png.latex?%5Cmu%5E2/(1%20+%20%5Cmu%5E2)">. This is always lower than one, the MSE of the ML estimator.</p>
<p>There’s just one massive problem we’ve ignored this whole time: <strong>we don’t know the value of</strong> <img src="https://latex.codecogs.com/png.latex?%5Cmu">! As seen from the figure plotted above, the MSE curves for different values of <img src="https://latex.codecogs.com/png.latex?%5Clambda"> <em>cross each other</em>: the best one to use depends on the true value of <img src="https://latex.codecogs.com/png.latex?%5Cmu">. This doesn’t mean that all is lost. Perhaps in practice we have some outside information about the likely value of <img src="https://latex.codecogs.com/png.latex?%5Cmu"> that could help guide our choice of <img src="https://latex.codecogs.com/png.latex?%5Clambda">. What it does mean is that there’s no “one-size-fits-all” value.</p>
</section>
<section id="admissibility" class="level3">
<h3 class="anchored" data-anchor-id="admissibility">Admissibility</h3>
<p>It’s time to introduce a bit of technical vocabulary. We say that an estimator <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7B%5Ctheta%7D"> <strong>dominates</strong> another estimator <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D"> if <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BMSE%7D%5B%5Ctilde%7B%5Ctheta%7D%5D%20%5Cleq%20%5Ctext%7BMSE%7D%5B%5Chat%7B%5Ctheta%7D%5D"> for <em>all</em> possible values of the parameter <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> being estimated and <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BMSE%7D%5B%5Ctilde%7B%5Ctheta%7D%5D%20%3C%20%5Ctext%7BMSE%7D%5B%5Chat%7B%5Ctheta%7D%5D"> for at least <em>one</em> possible value of <img src="https://latex.codecogs.com/png.latex?%5Ctheta">.<sup>6</sup> In words, this means that it never makes sense to use <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D"> in preference to <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7B%5Ctheta%7D">. No matter what the true parameter value is, you can’t do worse with <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7B%5Ctheta%7D"> and you might do better. An estimator that is <em>not dominated</em> by any other estimator is called <strong>admissible</strong>; an estimator that <em>is dominated</em> by some other estimator is called <strong>inadmissible</strong>. The concept of <em>admissibility</em> in decision theory is a bit like the concept of <a href="https://en.wikipedia.org/wiki/Pareto_efficiency">Pareto efficiency</a> in microeconomics. An admissible estimator is only “good” in the sense that it doesn’t leave any money on the table: there’s no way to do better for one parameter value without doing worse for another. In a similar way, a Pareto efficient allocation in economics is one in which no individual can be made better off without making another person worse off.</p>
<p>It’s quite challenging to prove, but in fact the ML estimator <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cmu%7D_%5Ctext%7BML%7D%20=%20X"> turns out to be admissible in our little example. So while we could potentially do better by using shrinkage, it’s not a slam-dunk case. If we really have no idea of how large <img src="https://latex.codecogs.com/png.latex?%5Cmu"> is likely to be, the ML estimator is a reasonable choice. Because it’s admissible, at the very least we know that there’s no free lunch!</p>
</section>
<section id="a-more-general-example" class="level3">
<h3 class="anchored" data-anchor-id="a-more-general-example">A More General Example</h3>
<p>Now let’s make things a bit more interesting. For the rest of this post, suppose that we observe not a single draw <img src="https://latex.codecogs.com/png.latex?X"> from a <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BNormal%7D(%5Cmu,%201)"> distribution but a <em>collection</em> of <img src="https://latex.codecogs.com/png.latex?p"> independent draws from <img src="https://latex.codecogs.com/png.latex?p"> <em>different</em> normal distributions: <img src="https://latex.codecogs.com/png.latex?%0AX_1,%20X_2,%20...,%20X_p%20%5Csim%20%5Ctext%7Bindependent%20Normal%7D(%5Cmu_j,%201),%20%5Cquad%20j%20=%201,%20...,%20p.%0A"> You can think of this as <img src="https://latex.codecogs.com/png.latex?p"> copies of our original problem: we observe <img src="https://latex.codecogs.com/png.latex?X_j%20%5Csim%20%5Ctext%7BNormal%7D(%5Cmu_j,%201)"> and our task is to estimate <img src="https://latex.codecogs.com/png.latex?%5Cmu_j">. The observations are all independent, and each comes from a distribution with a potentially <strong>different mean</strong>. At first glance it seems like these <img src="https://latex.codecogs.com/png.latex?p"> separate problems should have <em>absolutely nothing to do with each other</em>. And indeed the maximum likelihood estimator for the collection of <img src="https://latex.codecogs.com/png.latex?p"> means is simply <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cmu%7D%5E%7B(j)%7D_%5Ctext%7BML%7D%20=%20X_j">. As above in our example with <img src="https://latex.codecogs.com/png.latex?p=1">, the question is: how good is the ML estimator, and can we do any better?</p>
</section>
<section id="composite-mse" class="level3">
<h3 class="anchored" data-anchor-id="composite-mse">Composite MSE</h3>
<p>But first things first: how can we evaluate the quality of <img src="https://latex.codecogs.com/png.latex?p"> estimators for <img src="https://latex.codecogs.com/png.latex?p"> different parameters <em>at the same time</em>? A common approach, and the one we will follow here, is to take the <em>sum</em> of the MSEs of the individual estimators, yielding a quantity called <strong>composite MSE</strong>. If <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cmu%7D_1,%20%5Chat%7B%5Cmu%7D_2,%20%5Cdots,%20%5Chat%7B%5Cmu%7D_p"> is a collection of estimators for each of the individual unknown means, then the composite MSE is defined as <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BComposite%20MSE%7D%20%5Cequiv%20%5Csum_%7Bj=1%7D%5Ep%20%5Ctext%7BMSE%7D(%5Chat%7B%5Cmu%7D_j)%20=%20%5Csum_%7Bj=1%7D%5Ep%20%5Cleft%5B%20%5Ctext%7BBias%7D(%5Chat%7B%5Cmu%7D_j)%5E2%20+%20%5Ctext%7BVar%7D(%5Chat%7B%5Cmu%7D_j)%5Cright%5D%20=%20%5Csum_%7Bj=1%7D%5Ep%20%5Cmathbb%7BE%7D%5B(%5Chat%7B%5Cmu%7D_j%20-%20%5Cmu_j)%5E2%5D.%0A"> Adopting composite MSE as our measure of <em>good</em> performance means that we view each of the <img src="https://latex.codecogs.com/png.latex?p"> estimation problems as in some way “interchangeable”: we’re happy to accept a trade in which we do a slightly worse job estimating <img src="https://latex.codecogs.com/png.latex?%5Cmu_j"> in exchange for doing a much better job estimating <img src="https://latex.codecogs.com/png.latex?%5Cmu_k">. At the end of the post I’ll say a few more words about this idea and when it may or may not be reasonable. But for the rest of the post, we will assume that our goal is to <strong>minimize the composite MSE</strong>. The concept of composite MSE will be crucial in understanding why the James-Stein estimator works the way it does.</p>
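<p>To make the definition concrete, here is a small simulation sketch in R. The helper <code>composite_mse</code>, the particular means in <code>mu</code>, and the choice <code>lambda = 0.5</code> are all hypothetical, chosen purely for illustration. The code approximates the composite MSE of the ML estimator and of a common-shrinkage estimator by averaging squared errors over many simulated datasets.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(2024)  # arbitrary seed, chosen only for reproducibility

# Composite MSE from a matrix of estimates: one row per replication,
# one column per mean mu_j
composite_mse &lt;- function(est, mu) sum(colMeans(sweep(est, 2, mu)^2))

mu &lt;- c(1, -0.5, 2)  # illustrative means (an assumption)
nreps &lt;- 1e5
p &lt;- length(mu)

# Column j holds nreps draws of X_j ~ Normal(mu_j, 1)
X &lt;- matrix(rnorm(nreps * p, mean = rep(mu, each = nreps)), nrow = nreps)

lambda &lt;- 0.5  # a fixed, common shrinkage factor (an assumption)
c(ML = composite_mse(X, mu),
  shrinkage = composite_mse((1 - lambda) * X, mu),
  shrinkage_theory = p * (1 - lambda)^2 + lambda^2 * sum(mu^2))</code></pre></div>
</div>
<p>The simulated value for the ML estimator should be close to <img src="https://latex.codecogs.com/png.latex?p%20=%203">, while the simulated and theoretical values for the shrinkage estimator should roughly agree, since each component has MSE <img src="https://latex.codecogs.com/png.latex?(1%20-%20%5Clambda)%5E2%20+%20%5Clambda%5E2%20%5Cmu_j%5E2">.</p>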
</section>
<section id="steins-paradox-1" class="level3">
<h3 class="anchored" data-anchor-id="steins-paradox-1">Stein’s Paradox</h3>
<p>Putting our new idea into practice, we see that the composite MSE of the ML estimator is <img src="https://latex.codecogs.com/png.latex?p"> regardless of the true values of the individual means <img src="https://latex.codecogs.com/png.latex?%5Cmu_1,%20%5Cdots,%20%5Cmu_p"> since <img src="https://latex.codecogs.com/png.latex?%0A%5Csum_%7Bj=1%7D%5Ep%20%5Ctext%7BMSE%7D%5Cleft%5B%5Chat%7B%5Cmu%7D%5E%7B(j)%7D_%5Ctext%7BML%7D%5Cright%5D%20=%20%5Csum_%7Bj=1%7D%5Ep%20%5Ctext%7BMSE%7D(X_j)%20=%20%5Csum_%7Bj=1%7D%5Ep%20%5Ctext%7BVar%7D(X_j)%20=%20p.%0A"> If the ML estimator is admissible, then there should be no other estimator that always has a composite MSE less than or equal to <img src="https://latex.codecogs.com/png.latex?p"> and sometimes has a composite MSE strictly less than <img src="https://latex.codecogs.com/png.latex?p">. I’ve already told you that this is true when <img src="https://latex.codecogs.com/png.latex?p%20=%201">. When <img src="https://latex.codecogs.com/png.latex?p%20=%202"> it’s still true: the ML estimator remains admissible. But when <img src="https://latex.codecogs.com/png.latex?p%20%5Cgeq%203"> something very unexpected happens: it becomes possible to construct an estimator that <strong>dominates</strong> the ML estimator by using information from <em>all</em> of the <img src="https://latex.codecogs.com/png.latex?(X_1,%20...,%20X_p)"> observations to estimate <img src="https://latex.codecogs.com/png.latex?%5Cmu_j">. This is in spite of the fact that there is <em>no obvious connection</em> between the observations. Again: they are all independent and come from distributions with different means!</p>
<p>The estimator that does the trick is the so-called “James-Stein Estimator” (JS), defined according to <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Cmu%7D%5E%7B(j)%7D_%5Ctext%7BJS%7D%20=%20%5Cleft(1%20-%20%5Cfrac%7Bp%20-%202%7D%7B%5Csum_%7Bk=1%7D%5Ep%20X_k%5E2%7D%5Cright)X_j.%0A"> This estimator dominates the ML estimator when <img src="https://latex.codecogs.com/png.latex?p%20%5Cgeq%203"> in that<br>
<img src="https://latex.codecogs.com/png.latex?%0A%5Csum_%7Bj=1%7D%5Ep%20%5Ctext%7BMSE%7D%5Cleft%5B%5Chat%7B%5Cmu%7D%5E%7B(j)%7D_%5Ctext%7BJS%7D%5Cright%5D%20%5Cleq%20%5Csum_%7Bj=1%7D%5Ep%20%5Ctext%7BMSE%7D%5Cleft%5B%5Chat%7B%5Cmu%7D%5E%7B(j)%7D_%5Ctext%7BML%7D%5Cright%5D=%20p%0A"> for <em>all</em> possible values of the <img src="https://latex.codecogs.com/png.latex?p"> unknown means <img src="https://latex.codecogs.com/png.latex?%5Cmu_j"> with strict inequality for at least <em>some</em> values. Taking a closer look at the formula, we see that the James-Stein estimator is just a <em>shrinkage</em> estimator applied to each of the <img src="https://latex.codecogs.com/png.latex?p"> means, namely <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Cmu%7D%5E%7B(j)%7D_%5Ctext%7BJS%7D%20=%20(1%20-%20%5Chat%7B%5Clambda%7D_%5Ctext%7BJS%7D)X_j,%20%5Cquad%20%5Chat%7B%5Clambda%7D_%5Ctext%7BJS%7D%20%5Cequiv%20%5Cfrac%7Bp%20-%202%7D%7B%5Csum_%7Bk=1%7D%5Ep%20X_k%5E2%7D.%0A"> The shrinkage factor in the James-Stein estimator depends on the number of means we’re estimating, <img src="https://latex.codecogs.com/png.latex?p">, along with the <em>overall</em> sum of the squared observations. All else equal, the more parameters we need to estimate, the more we shrink each of them towards zero. And the farther the observations are from zero <em>overall</em>, the less we shrink <em>each of them</em> towards zero.</p>
<p>Just like our simple shrinkage estimator from above, the James-Stein estimator achieves a lower composite MSE than the unbiased ML estimator by tolerating a small bias in exchange for a larger reduction in variance. Unlike our simple shrinkage estimator, the James-Stein estimator uses the <em>data</em> to determine the shrinkage factor. And as long as <img src="https://latex.codecogs.com/png.latex?p%20%5Cgeq%203"> it is always <em>at least as good</em> as the ML estimator in terms of composite MSE and sometimes <em>much better</em>. The <strong>paradox</strong> is that this seems impossible: how can information from <em>all</em> of the observations be useful when they come from <em>different</em> distributions with no obvious connection?</p>
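<p>A simulation can’t prove dominance, but it can at least illustrate the claim at one particular parameter value. The following is a minimal sketch in R, assuming <img src="https://latex.codecogs.com/png.latex?p%20=%205"> and a hypothetical vector of means <code>mu</code> picked only for illustration; the helper <code>composite_mse</code> is likewise just a convenience defined here. It applies the James-Stein formula from above to each simulated dataset and compares the simulated composite MSE with that of the ML estimator.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(1234)  # arbitrary seed, chosen only for reproducibility

mu &lt;- c(1, -1, 0.5, 2, 0)  # illustrative means (an assumption)
p &lt;- length(mu)
nreps &lt;- 1e5

# Each row is one simulated dataset (X_1, ..., X_p)
X &lt;- matrix(rnorm(nreps * p, mean = rep(mu, each = nreps)), nrow = nreps)

# James-Stein shrinkage factor: one value per simulated dataset
lambda_js &lt;- (p - 2) / rowSums(X^2)

# Row i of X is multiplied by (1 - lambda_js[i]), since R recycles
# the vector down the columns of the matrix
js &lt;- (1 - lambda_js) * X

# Sum over j of the simulated E[(estimate_j - mu_j)^2]
composite_mse &lt;- function(est, mu) sum(colMeans(sweep(est, 2, mu)^2))

c(ML = composite_mse(X, mu), JS = composite_mse(js, mu))</code></pre></div>
</div>
<p>The ML value should be close to <img src="https://latex.codecogs.com/png.latex?p%20=%205">, and the James-Stein value should come out smaller. Moving the entries of <code>mu</code> farther from zero should narrow the gap, in line with the observation that the estimator shrinks less when the observations are far from zero overall.</p>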
<p>The rest of this post will <em>not</em> prove that the James-Stein estimator dominates the ML estimator. Instead it will try to convince you that there is some <em>very good intuition</em> for why the formula for the James-Stein estimator takes the form it does. By the end, I hope you’ll feel that, far from seeming paradoxical, using <em>all</em> of the observations to determine the shrinkage factor for one particular <img src="https://latex.codecogs.com/png.latex?%5Cmu_j"> makes perfect sense.</p>
</section>
</section>
<section id="where-does-the-james-stein-estimator-come-from" class="level2">
<h2 class="anchored" data-anchor-id="where-does-the-james-stein-estimator-come-from">Where does the James-Stein Estimator Come From?</h2>
<section id="an-infeasible-estimator-when-p-2" class="level3">
<h3 class="anchored" data-anchor-id="an-infeasible-estimator-when-p-2">An Infeasible Estimator When <img src="https://latex.codecogs.com/png.latex?p%20=%202"></h3>
<p>To start the ball rolling, let’s <a href="https://en.wikipedia.org/wiki/Assume_a_can_opener">assume a can-opener</a>: suppose that we don’t know any of the <em>individual</em> means <img src="https://latex.codecogs.com/png.latex?%5Cmu_j"> but for some strange reason a benevolent deity has told us the value of their sum of squares: <img src="https://latex.codecogs.com/png.latex?%0Ac%5E2%20%5Cequiv%20%5Csum_%7Bj=1%7D%5Ep%20%5Cmu_j%5E2.%0A"> It turns out that this is enough information to construct a shrinkage estimator that <em>always</em> has a lower composite MSE than the ML estimator. Let’s see why this is the case. If <img src="https://latex.codecogs.com/png.latex?p%20=%201">, then telling you <img src="https://latex.codecogs.com/png.latex?c%5E2"> is the same as telling you <img src="https://latex.codecogs.com/png.latex?%5Cmu%5E2">. Granted, knowledge of <img src="https://latex.codecogs.com/png.latex?%5Cmu%5E2"> isn’t as informative as knowledge of <img src="https://latex.codecogs.com/png.latex?%5Cmu">. For example, if I told you that <img src="https://latex.codecogs.com/png.latex?%5Cmu%5E2%20=%209"> you couldn’t tell whether <img src="https://latex.codecogs.com/png.latex?%5Cmu%20=%203"> or <img src="https://latex.codecogs.com/png.latex?%5Cmu%20=%20-3">. But, as we showed above, the optimal shrinkage estimator when <img src="https://latex.codecogs.com/png.latex?p=1"> sets <img src="https://latex.codecogs.com/png.latex?%5Clambda%5E*%20=%201/(1%20+%20%5Cmu%5E2)"> and yields an MSE of <img src="https://latex.codecogs.com/png.latex?%5Cmu%5E2/(1%20+%20%5Cmu%5E2)%20%3C%201">. 
Since <img src="https://latex.codecogs.com/png.latex?%5Clambda%5E*"> only depends on <img src="https://latex.codecogs.com/png.latex?%5Cmu"> through <img src="https://latex.codecogs.com/png.latex?%5Cmu%5E2">, we’ve <em>already shown</em> that knowledge of <img src="https://latex.codecogs.com/png.latex?c%5E2"> allows us to construct a shrinkage estimator that dominates the ML estimator when <img src="https://latex.codecogs.com/png.latex?p%20=%201">.</p>
<p>So what if <img src="https://latex.codecogs.com/png.latex?p"> equals 2? In this case, knowledge of <img src="https://latex.codecogs.com/png.latex?c%5E2%20=%20%5Cmu_1%5E2%20+%20%5Cmu_2%5E2"> is equivalent to knowing the <em>radius</em> of a circle centered at the origin in the <img src="https://latex.codecogs.com/png.latex?(%5Cmu_1,%20%5Cmu_2)"> plane where the two unknown means must lie. For example, if I told you that <img src="https://latex.codecogs.com/png.latex?c%5E2%20=%201"> you would know that <img src="https://latex.codecogs.com/png.latex?(%5Cmu_1,%20%5Cmu_2)"> lies somewhere on a circle of radius one centered at the origin. As illustrated in the following plot, the points <img src="https://latex.codecogs.com/png.latex?(x_1,%20x_2)"> and <img src="https://latex.codecogs.com/png.latex?(y_1,%20y_2)"> would then be potential values of <img src="https://latex.codecogs.com/png.latex?(%5Cmu_1,%20%5Cmu_2)"> as would all other points on the blue circle.</p>
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/not-quite-the-james-stein-estimator/index_files/figure-html/unnamed-chunk-4-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>So how can we construct a shrinkage estimator of <img src="https://latex.codecogs.com/png.latex?(%5Cmu_1,%20%5Cmu_2)"> with lower composite MSE than the ML estimator if <img src="https://latex.codecogs.com/png.latex?c%5E2"> is known? While there are other possibilities, the simplest would be to use the <em>same</em> shrinkage factor for each of the two coordinates. In other words, our estimator would be <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Cmu%7D_1(%5Clambda)%20=%20(1%20-%20%5Clambda)X_1,%20%5Cquad%20%5Chat%7B%5Cmu%7D_2(%5Clambda)%20=%20(1%20-%20%5Clambda)X_2%0A"> for some <img src="https://latex.codecogs.com/png.latex?%5Clambda"> between zero and one. The composite MSE of this estimator is just the sum of the MSE of each <em>individual</em> component, so we can re-use our algebra from above to obtain <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Ctext%7BMSE%7D%5B%5Chat%7B%5Cmu%7D_1(%5Clambda)%5D%20+%20%5Ctext%7BMSE%7D%5B%5Chat%7B%5Cmu%7D_2(%5Clambda)%5D%20&amp;=%20%5B(1%20-%20%5Clambda)%5E2%20+%20%5Clambda%5E2%5Cmu_1%5E2%5D%20+%20%5B(1%20-%20%5Clambda)%5E2%20+%20%5Clambda%5E2%5Cmu_2%5E2%5D%20%5C%5C%0A&amp;=%202(1%20-%20%5Clambda)%5E2%20+%20%5Clambda%5E2(%5Cmu_1%5E2%20+%20%5Cmu_2%5E2)%20%5C%5C%0A&amp;=%202(1%20-%20%5Clambda)%5E2%20+%20%5Clambda%5E2c%5E2.%0A%5Cend%7Baligned%7D%0A"> Notice that the composite MSE only depends on <img src="https://latex.codecogs.com/png.latex?(%5Cmu_1,%20%5Cmu_2)"> through their sum of squares, <img src="https://latex.codecogs.com/png.latex?c%5E2">. 
Differentiating with respect to <img src="https://latex.codecogs.com/png.latex?%5Clambda">, just as we did above in the <img src="https://latex.codecogs.com/png.latex?p=1"> case, <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Cfrac%7Bd%7D%7Bd%5Clambda%7D%5Cleft%5B2(1%20-%20%5Clambda)%5E2%20+%20%5Clambda%5E2c%5E2%5Cright%5D%20&amp;=%20-4(1%20-%20%5Clambda)%20+%202%5Clambda%20c%5E2%20%5C%5C%0A&amp;=%202%20%5Cleft%5B%5Clambda%20(2%20+%20c%5E2)%20-%202%5Cright%5D%5C%5C%20%5C%5C%0A%5Cfrac%7Bd%5E2%7D%7Bd%5Clambda%5E2%7D%5Cleft%5B2(1%20-%20%5Clambda)%5E2%20+%20%5Clambda%5E2c%5E2%5Cright%5D%20&amp;=%202(2%20+%20c%5E2)%20%3E%200%0A%5Cend%7Baligned%7D%0A"> so there is a unique global minimum at <img src="https://latex.codecogs.com/png.latex?%5Clambda%5E*%20=%202/(2%20+%20c%5E2)">. Substituting this value of <img src="https://latex.codecogs.com/png.latex?%5Clambda"> into the expression for the composite MSE, a few lines of algebra give <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Ctext%7BMSE%7D%5B%5Chat%7B%5Cmu%7D_1(%5Clambda%5E*)%5D%20+%20%5Ctext%7BMSE%7D%5B%5Chat%7B%5Cmu%7D_2(%5Clambda%5E*)%5D%20&amp;=%202%5Cleft(1%20-%20%5Cfrac%7B2%7D%7B2%20+%20c%5E2%7D%5Cright)%5E2%20+%20%5Cleft(%5Cfrac%7B2%7D%7B2%20+%20c%5E2%7D%5Cright)%5E2c%5E2%20%5C%5C%0A&amp;=%202%5Cleft(%5Cfrac%7Bc%5E2%7D%7B2%20+%20c%5E2%7D%5Cright).%0A%5Cend%7Baligned%7D%0A"> Since <img src="https://latex.codecogs.com/png.latex?c%5E2/(2%20+%20c%5E2)%20%3C%201"> for all <img src="https://latex.codecogs.com/png.latex?c%5E2%20%3E%200">, the optimal shrinkage estimator <em>always</em> has a composite MSE less than <img src="https://latex.codecogs.com/png.latex?2">, the composite MSE of the ML estimator. Strictly speaking this estimator is <strong>infeasible</strong> since we don’t know <img src="https://latex.codecogs.com/png.latex?c%5E2">. 
But it’s a crucial step on our journey to make the leap from applying shrinkage to an estimator for a <em>single</em> unknown mean, to using the same idea for <em>more than one</em> unknown mean.</p>
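<p>Here is a short R sketch that checks the algebra for the infeasible estimator over a handful of values of <img src="https://latex.codecogs.com/png.latex?c%5E2">. The grid of values in <code>csq</code> is an arbitrary assumption made for illustration. For each value, the code plugs <img src="https://latex.codecogs.com/png.latex?%5Clambda%5E*%20=%202/(2%20+%20c%5E2)"> into the composite MSE expression and compares the result with the simplified closed form.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Illustrative values of c^2 = mu_1^2 + mu_2^2 (an assumption)
csq &lt;- c(0.1, 1, 2, 10, 100)

# Optimal common shrinkage factor when p = 2
lambda_star &lt;- 2 / (2 + csq)

# Composite MSE evaluated at lambda_star, before simplification
mse_opt &lt;- 2 * (1 - lambda_star)^2 + lambda_star^2 * csq

# Compare with the simplified expression 2 * c^2 / (2 + c^2)
cbind(csq, lambda_star, mse_opt, closed_form = 2 * csq / (2 + csq))</code></pre></div>
</div>
<p>The last two columns should match, and every entry should fall below 2, the composite MSE of the ML estimator. As <img src="https://latex.codecogs.com/png.latex?c%5E2"> grows, the optimal shrinkage factor falls towards zero and the advantage over ML shrinks with it.</p>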
</section>
<section id="a-simulation-experiment-for-p-2" class="level3">
<h3 class="anchored" data-anchor-id="a-simulation-experiment-for-p-2">A Simulation Experiment for <img src="https://latex.codecogs.com/png.latex?p%20=%202"></h3>
<p>You may have already noticed that it’s easy to generalize this argument to <img src="https://latex.codecogs.com/png.latex?p%3E2">. But before we consider the general case, let’s take a moment to understand the geometry of shrinkage estimation for <img src="https://latex.codecogs.com/png.latex?p=2"> a bit more deeply. The nice thing about two-dimensional problems is that they’re easy to plot. So here’s a graphical representation of both the ML estimator and our infeasible optimum shrinkage estimator when <img src="https://latex.codecogs.com/png.latex?p%20=%202">. I’ve set the true, unknown, values of <img src="https://latex.codecogs.com/png.latex?%5Cmu_1"> and <img src="https://latex.codecogs.com/png.latex?%5Cmu_2"> to one so the true value of <img src="https://latex.codecogs.com/png.latex?c%5E2"> is <img src="https://latex.codecogs.com/png.latex?2"> and the optimal choice of <img src="https://latex.codecogs.com/png.latex?%5Clambda"> is <img src="https://latex.codecogs.com/png.latex?%5Clambda%5E*%20=%202/(2%20+%20c%5E2)%20=%202/4%20=%200.5">. The following R code simulates our estimators and visualizes their performance, helping us see the shrinkage effect in action.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1983</span>)</span>
<span id="cb3-2"></span>
<span id="cb3-3">nreps <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span></span>
<span id="cb3-4">mu1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mu2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb3-5">x1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mu1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(nreps)</span>
<span id="cb3-6">x2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mu2 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(nreps)</span>
<span id="cb3-7"></span>
<span id="cb3-8">csq <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mu1<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> mu2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb3-9">lambda <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> csq <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> csq)</span>
<span id="cb3-10"></span>
<span id="cb3-11"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">par</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mfrow =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span>
<span id="cb3-12"></span>
<span id="cb3-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Left panel: ML Estimator</span></span>
<span id="cb3-14"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(x1, x2, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">main =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'MLE'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pch =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'black'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cex =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, </span>
<span id="cb3-15">     <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expression</span>(mu[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylab =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expression</span>(mu[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]))</span>
<span id="cb3-16"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">v =</span> mu1, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lty =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'red'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lwd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb3-17"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">h =</span> mu2, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lty =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'red'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lwd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb3-18"></span>
<span id="cb3-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add MSE to the plot</span></span>
<span id="cb3-20"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"MSE ="</span>, </span>
<span id="cb3-21">                                  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>((x1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> mu1)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> (x2 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> mu2)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)))</span>
<span id="cb3-22"></span>
<span id="cb3-23"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Right panel: Shrinkage Estimator</span></span>
<span id="cb3-24"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(x1, x2, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">main =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Shrinkage'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expression</span>(mu[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]), </span>
<span id="cb3-25">     <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylab =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expression</span>(mu[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]))</span>
<span id="cb3-26"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">points</span>(lambda <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> x1, lambda <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> x2, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pch =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'blue'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cex =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb3-27"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">segments</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x0 =</span> x1, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y0 =</span> x2, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x1 =</span> lambda <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> x1, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y1 =</span> lambda <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> x2, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lty =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb3-28"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">v =</span> mu1, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lty =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'red'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lwd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb3-29"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">h =</span> mu2, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lty =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'red'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lwd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb3-30"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">v =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lty =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lwd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb3-31"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">h =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lty =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lwd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb3-32"></span>
<span id="cb3-33"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add MSE to the plot</span></span>
<span id="cb3-34"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"MSE ="</span>, </span>
<span id="cb3-35">                                  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>((lambda <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> x1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> mu1)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb3-36">                                               (lambda <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> x2 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> mu2)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)))</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/not-quite-the-james-stein-estimator/index_files/figure-html/unnamed-chunk-5-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<p>My plot has two panels. The left panel shows the raw data. Each black point is a pair <img src="https://latex.codecogs.com/png.latex?(X_1,%20X_2)"> of independent normal draws with means <img src="https://latex.codecogs.com/png.latex?(%5Cmu_1%20=%201,%20%5Cmu_2%20=%201)"> and variances <img src="https://latex.codecogs.com/png.latex?(1,%201)">. As such, each point is also the <em>ML estimate</em> (MLE) of <img src="https://latex.codecogs.com/png.latex?(%5Cmu_1,%20%5Cmu_2)"> based on <img src="https://latex.codecogs.com/png.latex?(X_1,%20X_2)">. The red cross shows the location of the true values of <img src="https://latex.codecogs.com/png.latex?(%5Cmu_1,%20%5Cmu_2)">, namely <img src="https://latex.codecogs.com/png.latex?(1,%201)">. There are 50 points in the plot, representing 50 replications of the simulation, each independent of the rest and with the same parameter values. This allows us to measure how close the ML estimator is to the true value of <img src="https://latex.codecogs.com/png.latex?(%5Cmu_1,%20%5Cmu_2)"> in repeated sampling, approximating the composite MSE.</p>
<p>The right panel is more complicated. This shows <em>both</em> the ML estimates (unfilled black circles) <em>and</em> the corresponding shrinkage estimates (filled blue circles) along with dashed lines connecting them. Each shrinkage estimate is constructed by “pulling” the corresponding MLE towards the origin by a factor of <img src="https://latex.codecogs.com/png.latex?%5Clambda%20=%200.5">. Thus, if a given unfilled black circle is located at <img src="https://latex.codecogs.com/png.latex?(X_1,%20X_2)">, the corresponding filled blue circle is located at <img src="https://latex.codecogs.com/png.latex?(0.5X_1,%200.5X_2)">. As in the left panel, the red cross in the right panel shows the true values of <img src="https://latex.codecogs.com/png.latex?(%5Cmu_1,%20%5Cmu_2)">, namely <img src="https://latex.codecogs.com/png.latex?(1,%201)">. The black cross, on the other hand, shows the point towards which the shrinkage estimator pulls the ML estimator, namely <img src="https://latex.codecogs.com/png.latex?(0,%200)">.</p>
<p>We see immediately that the ML estimator is <em>unbiased</em>: the black filled dots in the left panel (along with the unfilled ones in the right) are centered at <img src="https://latex.codecogs.com/png.latex?(1,%201)">. But the ML estimator is also <em>high-variance</em>: the black dots are quite spread out around <img src="https://latex.codecogs.com/png.latex?(1,%201)">. We can approximate the composite MSE of the ML estimator by computing the average squared Euclidean distance between the black points and the red cross.<sup>7</sup> And in keeping with our theoretical calculations, the simulation gives a composite MSE of almost exactly 2 for the ML estimator.</p>
<p>In contrast, the optimal shrinkage estimator is <em>biased</em>: the filled blue dots in the right panel are centered somewhere between the red cross (the true means) and the origin. But the shrinkage estimator also has a lower variance: the filled blue dots are much closer together than the black ones. Even more importantly, <em>they are on average closer to</em> <img src="https://latex.codecogs.com/png.latex?(%5Cmu_1,%20%5Cmu_2)">, as indicated by the red cross and as measured by composite MSE. Our theoretical calculations showed that the composite MSE of the optimal shrinkage estimator equals <img src="https://latex.codecogs.com/png.latex?2c%5E2/(2%20+%20c%5E2)">. When <img src="https://latex.codecogs.com/png.latex?c%5E2%20=%202">, as in this case, we obtain <img src="https://latex.codecogs.com/png.latex?2%5Ctimes%202/(2%20+%202)%20=%201">. Again, this is almost exactly what we see in the simulation.</p>
<p>If we had used more than 50 simulation replications, the composite MSE values would have been even closer to our theoretical predictions, at the cost of making the plot much harder to read! But I hope the key point is still clear: shrinkage <em>pulls</em> the MLE towards the origin, and can give a <em>much</em> lower composite MSE.</p>
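<p>To check that claim without the plot, here is a minimal sketch that repeats the simulation with many more replications. The seed, the number of replications <code>n_reps</code>, and the name <code>shrink</code> are my own choices for illustration, not the values used to draw the figure above.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(1234)
n_reps &lt;- 1e5                  # many more than the 50 points plotted above
mu1 &lt;- 1
mu2 &lt;- 1
csq &lt;- mu1^2 + mu2^2           # c^2 = 2 in this example
shrink &lt;- csq / (2 + csq)      # factor multiplying each observation; 0.5 here

# Independent normal draws with unit variance
x1 &lt;- rnorm(n_reps, mean = mu1)
x2 &lt;- rnorm(n_reps, mean = mu2)

# Composite MSE: average squared Euclidean distance from (mu1, mu2)
mse_mle &lt;- mean((x1 - mu1)^2 + (x2 - mu2)^2)
mse_shrink &lt;- mean((shrink * x1 - mu1)^2 + (shrink * x2 - mu2)^2)

c(MLE = mse_mle, Shrinkage = mse_shrink)</code></pre></div>
<p>With this many replications the two values should land very close to the theoretical composite MSEs of 2 and 1.</p>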
</section>
<section id="an-infeasible-estimator-the-general-case" class="level3">
<h3 class="anchored" data-anchor-id="an-infeasible-estimator-the-general-case">An Infeasible Estimator: The General Case</h3>
<p>Now that we understand the case of <img src="https://latex.codecogs.com/png.latex?p=2">, the general case is a snap. Our shrinkage estimator of each <img src="https://latex.codecogs.com/png.latex?%5Cmu_j"> will take the form <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Cmu%7D_j(%5Clambda)%20=%20(1%20-%20%5Clambda)%20X_j,%20%5Cquad%20j%20=%201,%20%5Cdots,%20p%0A"> for some <img src="https://latex.codecogs.com/png.latex?%5Clambda"> between zero and one. To find the optimal choice of <img src="https://latex.codecogs.com/png.latex?%5Clambda">, we minimize <img src="https://latex.codecogs.com/png.latex?%0A%5Csum_%7Bj=1%7D%5Ep%5Ctext%7BMSE%7D%5Cleft%5B%5Chat%7B%5Cmu%7D_j(%5Clambda)%20%5Cright%5D%20=%20%5Csum_%7Bj=1%7D%5Ep%20%5Cleft%5B(1%20-%20%5Clambda)%5E2%20+%20%5Clambda%5E2%20%5Cmu_j%5E2%5Cright%5D%20=%20p(1%20-%20%5Clambda)%5E2%20+%20%5Clambda%5E2%20c%5E2%0A"> with respect to <img src="https://latex.codecogs.com/png.latex?%5Clambda">. Again, the key is that the composite MSE only depends on the unknown means through <img src="https://latex.codecogs.com/png.latex?c%5E2">. Using almost exactly the same calculations as above for the case of <img src="https://latex.codecogs.com/png.latex?p%20=%202">, we find that <img src="https://latex.codecogs.com/png.latex?%0A%5Clambda%5E*%20=%20%5Cfrac%7Bp%7D%7Bp%20+%20c%5E2%7D,%20%5Cquad%20%5Csum_%7Bj=1%7D%5Ep%20%5Ctext%7BMSE%7D%5Cleft%5B%5Chat%7B%5Cmu%7D_j(%5Clambda%5E*)%20%5Cright%5D%20=%20p%5Cleft(%5Cfrac%7Bc%5E2%7D%7Bp%20+%20c%5E2%7D%5Cright).%0A"> Since <img src="https://latex.codecogs.com/png.latex?c%5E2/(p%20+%20c%5E2)%20%3C%201"> for all <img src="https://latex.codecogs.com/png.latex?c%5E2%20%3E%200">, the optimal shrinkage estimator <em>always</em> has a composite MSE less than <img src="https://latex.codecogs.com/png.latex?p">, the composite MSE of the ML estimator.</p>
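<p>The general formula is easy to check by simulation. The sketch below is only an illustration: the dimension <code>p = 10</code>, the vector of means <code>mu</code>, the seed, and all variable names are assumptions I made up for the example. It computes the infeasible optimal shrinkage estimator <code>(1 - lambda_star) * X</code> and compares its simulated composite MSE with the theoretical value.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(722)
p &lt;- 10
mu &lt;- seq(-1, 1, length.out = p)    # hypothetical true means
csq &lt;- sum(mu^2)                    # c^2, the sum of squared means
lambda_star &lt;- p / (p + csq)        # infeasible optimal lambda

n_reps &lt;- 1e5
# Each row is one replication of (X_1, ..., X_p); column j has mean mu[j]
X &lt;- matrix(rnorm(n_reps * p, mean = mu), nrow = n_reps, ncol = p, byrow = TRUE)

# Composite MSE: average over replications of the summed squared errors
mse_mle &lt;- mean(rowSums(sweep(X, 2, mu)^2))
mse_opt &lt;- mean(rowSums(sweep((1 - lambda_star) * X, 2, mu)^2))
mse_theory &lt;- p * csq / (p + csq)

c(MLE = mse_mle, Optimal = mse_opt, Theory = mse_theory)</code></pre></div>
<p>The simulated value for the ML estimator should be close to <code>p</code>, and the simulated value for the optimal shrinkage estimator should be close to <code>mse_theory</code>.</p>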
</section>
<section id="not-quite-the-james-stein-estimator" class="level3">
<h3 class="anchored" data-anchor-id="not-quite-the-james-stein-estimator">Not Quite the James-Stein Estimator</h3>
<p>The end is in sight! We’ve shown that if we knew the sum of squares of the unknown means, <img src="https://latex.codecogs.com/png.latex?c%5E2">, we could construct a shrinkage estimator that always has a lower composite MSE than the ML estimator. But we don’t know <img src="https://latex.codecogs.com/png.latex?c%5E2">. So what can we do? To start off, re-write <img src="https://latex.codecogs.com/png.latex?%5Clambda%5E*"> as follows <img src="https://latex.codecogs.com/png.latex?%0A%5Clambda%5E*%20=%20%5Cfrac%7Bp%7D%7Bp%20+%20c%5E2%7D%20=%20%5Cfrac%7B1%7D%7B1%20+%20c%5E2/p%7D.%0A"> This way of writing things makes it clear that it’s not <img src="https://latex.codecogs.com/png.latex?c%5E2"> <em>per se</em> that matters but rather <img src="https://latex.codecogs.com/png.latex?c%5E2/p">. And this quantity is simply the <em>average</em> of the unknown squared means: <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7Bc%5E2%7D%7Bp%7D%20=%20%5Cfrac%7B1%7D%7Bp%7D%5Csum_%7Bj=1%7D%5Ep%20%5Cmu_j%5E2.%0A"> So how could we learn <img src="https://latex.codecogs.com/png.latex?c%5E2/p">? An idea that immediately suggests itself is to estimate this quantity by replacing each unobserved <img src="https://latex.codecogs.com/png.latex?%5Cmu_j"> with the corresponding observation <img src="https://latex.codecogs.com/png.latex?X_j">, in other words <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B1%7D%7Bp%7D%5Csum_%7Bj=1%7D%5Ep%20X_j%5E2.%0A"> This is a good starting point, but we can do better. 
Since <img src="https://latex.codecogs.com/png.latex?X_j%20%5Csim%20%5Ctext%7BNormal%7D(%5Cmu_j,%201)">, we see that <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D%5Cleft%5B%5Cfrac%7B1%7D%7Bp%7D%20%5Csum_%7Bj=1%7D%5Ep%20X_j%5E2%20%5Cright%5D%20=%20%5Cfrac%7B1%7D%7Bp%7D%20%5Csum_%7Bj=1%7D%5Ep%20%5Cmathbb%7BE%7D%5BX_j%5E2%5D%20=%20%5Cfrac%7B1%7D%7Bp%7D%20%5Csum_%7Bj=1%7D%5Ep%20%5B%5Ctext%7BVar%7D(X_j)%20+%20%5Cmathbb%7BE%7D(X_j)%5E2%5D%20=%20%5Cfrac%7B1%7D%7Bp%7D%20%5Csum_%7Bj=1%7D%5Ep%20(1%20+%20%5Cmu_j%5E2)%20=%201%20+%20%5Cfrac%7Bc%5E2%7D%7Bp%7D.%0A"> This means that <img src="https://latex.codecogs.com/png.latex?(%5Csum_%7Bj=1%7D%5Ep%20X_j%5E2)/p"> will on average <em>overestimate</em> <img src="https://latex.codecogs.com/png.latex?c%5E2/p"> by one. But that’s a problem that’s easy to fix: simply subtract one! This is a rare situation in which there is <em>no bias-variance tradeoff</em>. Subtracting a constant, in this case one, doesn’t contribute any additional variation while completely removing the bias. 
Plugging into our formula for <img src="https://latex.codecogs.com/png.latex?%5Clambda%5E*">, this suggests using the estimator <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Clambda%7D%20%5Cequiv%20%5Cfrac%7B1%7D%7B1%20+%20%5Cleft%5B%5Cleft(%5Cfrac%7B1%7D%7Bp%7D%5Csum_%7Bj=1%7D%5Ep%20X_j%5E2%20%5Cright)%20-%201%5Cright%5D%7D%20=%20%5Cfrac%7B1%7D%7B%5Cfrac%7B1%7D%7Bp%7D%5Csum_%7Bj=1%7D%5Ep%20X_j%5E2%7D%20=%20%5Cfrac%7Bp%7D%7B%5Csum_%7Bj=1%7D%5Ep%20X_j%5E2%7D%0A"> as our stand-in for the unknown <img src="https://latex.codecogs.com/png.latex?%5Clambda%5E*">, yielding a shrinkage estimator that I’ll call “NQ” for “not quite” for reasons that will become apparent in a moment: <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Cmu%7D%5E%7B(j)%7D_%5Ctext%7BNQ%7D%20=%20%5Cleft(1%20-%20%5Cfrac%7Bp%7D%7B%5Csum_%7Bk=1%7D%5Ep%20X_k%5E2%7D%5Cright)X_j.%0A"> Notice what’s happening here: our optimal shrinkage estimator depends on <img src="https://latex.codecogs.com/png.latex?c%5E2/p">, something we can’t observe. But we’ve constructed an <em>unbiased estimator</em> of this quantity by using <em>all of the observations</em> <img src="https://latex.codecogs.com/png.latex?X_j">. This is the resolution of the paradox discussed above: all of the observations contain information about <img src="https://latex.codecogs.com/png.latex?c%5E2"> since this is simply the sum of the squared means. And because we’ve chosen to minimize composite MSE, the optimal shrinkage factor only depends on the individual <img src="https://latex.codecogs.com/png.latex?%5Cmu_j"> parameters through <img src="https://latex.codecogs.com/png.latex?c%5E2">! 
This is the sense in which it’s possible to learn something useful about, say, <img src="https://latex.codecogs.com/png.latex?%5Cmu_1"> from <img src="https://latex.codecogs.com/png.latex?X_2"> in spite of the fact that <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D%5BX_2%5D%20=%20%5Cmu_2"> may bear no relationship to <img src="https://latex.codecogs.com/png.latex?%5Cmu_1">.</p>
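<p>Here is a small sketch of both steps in code. The function name <code>nq_estimator</code>, the vector of means <code>mu</code>, and the seed are hypothetical choices for illustration. The simulation checks that the average of the squared observations minus one is centered at <code>c^2 / p</code>, and the last line applies the NQ formula to a single draw.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># NQ shrinkage: scale every observation by (1 - p / sum of squares)
nq_estimator &lt;- function(x) {
  p &lt;- length(x)
  (1 - p / sum(x^2)) * x
}

set.seed(2024)
mu &lt;- c(2, -1, 0.5, 0, 1)           # hypothetical true means
p &lt;- length(mu)
csq_over_p &lt;- mean(mu^2)            # the target, c^2 / p

n_reps &lt;- 1e5
# Each row is one replication of (X_1, ..., X_p); column j has mean mu[j]
X &lt;- matrix(rnorm(n_reps * p, mean = mu), nrow = n_reps, ncol = p, byrow = TRUE)

# Bias-corrected estimate of c^2 / p in each replication
csq_over_p_hat &lt;- rowMeans(X^2) - 1

c(Truth = csq_over_p, Average_estimate = mean(csq_over_p_hat))

# NQ estimate of all p means from the first replication
nq_estimator(X[1, ])</code></pre></div>
<p>Averaged over replications, the bias-corrected estimate should sit very close to the true value of <code>c^2 / p</code>, while any single replication can be far off. Nothing stops an individual estimate from being negative, for example.</p>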
<p>But wait a minute! This looks <em>suspiciously familiar</em>. Recall that the James-Stein estimator is given by <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Cmu%7D%5E%7B(j)%7D_%5Ctext%7BJS%7D%20=%20%5Cleft(1%20-%20%5Cfrac%7Bp%20-%202%7D%7B%5Csum_%7Bk=1%7D%5Ep%20X_k%5E2%7D%5Cright)X_j.%0A"> Just like the JS estimator, my NQ estimator shrinks each of the <img src="https://latex.codecogs.com/png.latex?p"> means towards zero by a factor that depends on the number of means we’re estimating, <img src="https://latex.codecogs.com/png.latex?p">, and the overall sum of the squared observations. The key difference between JS and NQ is that JS uses <img src="https://latex.codecogs.com/png.latex?p%20-%202"> in the numerator instead of <img src="https://latex.codecogs.com/png.latex?p">. This means that NQ is a more “aggressive” shrinkage estimator than JS: it pulls the means towards zero by a larger amount than JS. This difference turns out to be crucial for proving that the JS estimator dominates the ML estimator. But when it comes to understanding why the JS estimator has the <em>form</em> that it does, I would argue that the difference is minor. If you want all the gory details of where that extra <img src="https://latex.codecogs.com/png.latex?-2"> comes from, along with the closely related issue of why <img src="https://latex.codecogs.com/png.latex?p%5Cgeq%203"> is crucial for JS to dominate the ML estimator, see <a href="https://ditraglia.com/econ722/slides/econ722slides.pdf">lecture 1</a> or <a href="https://ditraglia.com/econ722/main.pdf">section 7.3</a> from my Econ 722 teaching materials.</p>
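<p>To get a feel for how much the numerator matters, the following sketch compares the simulated composite MSE of the ML, NQ, and JS estimators in one arbitrary configuration. The choice of <code>p = 10</code> with every mean equal to one, the seed, and the helper <code>composite_mse</code> are all assumptions for illustration. A single design like this says nothing about dominance in general.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(3)
p &lt;- 10
mu &lt;- rep(1, p)                     # hypothetical true means
n_reps &lt;- 1e5

# Each row is one replication of (X_1, ..., X_p)
X &lt;- matrix(rnorm(n_reps * p, mean = mu), nrow = n_reps, ncol = p, byrow = TRUE)
ssq &lt;- rowSums(X^2)                 # sum of squared observations, one per row

# The row-specific shrinkage factor multiplies every entry in that row
nq &lt;- (1 - p / ssq) * X
js &lt;- (1 - (p - 2) / ssq) * X

# Average over replications of the summed squared errors
composite_mse &lt;- function(est) mean(rowSums(sweep(est, 2, mu)^2))

mse &lt;- c(MLE = composite_mse(X), NQ = composite_mse(nq), JS = composite_mse(js))
mse</code></pre></div>
<p>In this configuration both shrinkage estimators should come in well below the ML estimator, with JS slightly ahead of NQ.</p>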
</section>
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">Conclusion</h2>
<p>Before we conclude, there’s one important caveat to bear in mind. In addition to the qualifications that NQ isn’t <em>quite</em> JS, and that JS only dominates the MLE when <img src="https://latex.codecogs.com/png.latex?p%20%5Cgeq%203">, there’s one more fundamental issue that could be easily missed. Our decision to minimize <em>composite</em> MSE is <em>absolutely crucial</em> to the reasoning given above. The magic of shrinkage depends on our willingness to accept a trade-off in which we do a worse job estimating one mean in exchange for doing a better job estimating another, as composite MSE imposes. Whether this makes sense in practice depends on the context.</p>
<p>If we’re searching for a lost submarine in the ocean (a 3-dimensional problem), it makes perfect sense to be willing to be farther from the submarine in one dimension in exchange for being closer in another. That’s because <em>Euclidean distance</em> is obviously what we’re after here. But if instead we’re estimating <a href="https://ideas.repec.org/p/nbr/nberwo/27094.html">teacher value-added</a> and the results of our estimation exercise will be used to determine which teachers lose their jobs, it’s less clear that we should be willing to be farther from one teacher in exchange for being closer to another. Certainly that would be no consolation to someone who had been wrongly dismissed! If we were merely using this information to identify teachers who might need extra help, it’s another story. But the point I’m trying to make here is that our choice of which criterion to minimize necessarily encodes our <em>values</em> in a particular problem.</p>
<p>But with that said, I hope you’re satisfied that this extremely long post was worth the effort. Without using any fancy mathematics or statistical theory, we’ve managed to invent something that is <em>nearly identical</em> to the James-Stein estimator and thus to resolve Stein’s paradox. We started by pretending that we knew <img src="https://latex.codecogs.com/png.latex?c%5E2"> and showed that this would allow us to derive a shrinkage estimator with a lower composite MSE than the ML estimator. Then we simply plugged in an unbiased estimator of the key unknown quantity: <img src="https://latex.codecogs.com/png.latex?c%5E2/p">. Because all the observations contain information about <img src="https://latex.codecogs.com/png.latex?c%5E2">, it makes sense that we should decide how much to shrink one component <img src="https://latex.codecogs.com/png.latex?X_j"> by using all of the others. At this point, I hope that the James-Stein estimator seems not only plausible but practically <em>obvious</em>, excepting of course that pesky <img src="https://latex.codecogs.com/png.latex?-2"> in the numerator.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>If I ruled the universe, the Gauss-Markov Theorem would be demoted to much less exalted status in econometrics teaching!↩︎</p></li>
<li id="fn2"><p>Don’t let words do your thinking for you: “bias” sounds like a very bad thing, like kicking puppies. But that’s because the word “bias” has a negative connotation in English. In statistics, it’s just a technical term for “not centered”. An estimator can be biased and still be very good. Indeed the punchline of this post is that the James-Stein estimator is biased but can be much better than the obvious alternative!↩︎</p></li>
<li id="fn3"><p>Why squared bias and not simply bias itself? The answer is units: bias is measured in the same units as the parameter being estimated while the variance is in squared units. It doesn’t make sense to add things with different units, so we either have to square the bias or take the square root of the variance, i.e.&nbsp;replace it with the standard deviation. But bias can be negative, and we wouldn’t want a large negative bias to cancel out a large standard deviation so MSE squares the bias instead.↩︎</p></li>
<li id="fn4"><p>See if you can prove this as a homework exercise!↩︎</p></li>
<li id="fn5"><p>In Bayesian terms, we could view this “shrinkage” idea as calculating the posterior mean of <img src="https://latex.codecogs.com/png.latex?%5Cmu"> conditional on our data <img src="https://latex.codecogs.com/png.latex?X"> under a normal prior. In this case <img src="https://latex.codecogs.com/png.latex?%5Clambda"> would equal <img src="https://latex.codecogs.com/png.latex?%5Ctau/(1%20+%20%5Ctau)"> where <img src="https://latex.codecogs.com/png.latex?%5Ctau"> is the <em>prior precision</em>, i.e.&nbsp;the reciprocal of the prior variance. But for this post we’ll mainly stick to the Frequentist perspective.↩︎</p></li>
<li id="fn6"><p>Strictly speaking all of this pre-supposes that we’re working with squared-error loss so that MSE is the right thing to minimize. There are other loss functions we could have used instead and these would lead to different risk functions. But for the purposes of this post, I prefer to keep things simple. See <a href="https://ditraglia.com/econ722/slides/econ722slides.pdf">lecture 1</a> of my Econ 722 slides for more detail.↩︎</p></li>
<li id="fn7"><p>Remember that there are two equivalent definitions of MSE: bias squared plus variance on the one hand and expected squared distance from the truth on the other hand.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>statistics</category>
  <guid>https://www.econometrics.blog/post/not-quite-the-james-stein-estimator/</guid>
  <pubDate>Sat, 10 Aug 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>How to Do Regression Adjustment</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/how-to-do-regression-adjustment/</link>
  <description><![CDATA[ 




<p>By the end of a typical introductory econometrics course students have become accustomed to the idea of “controlling” for covariates by adding them to the end of a linear regression model. But this familiarity can sometimes cause confusion when students later encounter <em>regression adjustment</em>, a widely-used approach to causal inference under the selection-on-observables assumption. While regression adjustment is simple in theory, the finer points of how and when to apply it in practice are much more subtle. One of these finer points is how to tell whether a particular covariate is a “good control” that will help us learn the causal effect of interest or a “bad control” that will only make things worse.<sup>1</sup> Another, and the topic of today’s post, is how to actually <em>implement</em> regression adjustment after we’ve decided which covariates to adjust for.</p>
<p>The pre-requisites for this post are a basic understanding of selection-on-observables and regression adjustment. If you’re a bit rusty on these points, you might find it helpful to glance at the first half of my <a href="https://www.treatment-effects.com/02-selection-on-observables.pdf">lecture slides</a> along with this series of <a href="https://youtube.com/playlist?list=PLi6qbNWpQUeM1kKYjqq36aY5WQ1Zn-I6E&amp;si=zqxf9LmexMh0cL2E">short videos</a>. If you’re still hungry for more after this, you might also enjoy this <a href="../../post/misunderstanding-selection-on-observables/">earlier post</a> from <a href="https://econometrics.blog">econometrics.blog</a> on common misunderstandings about the selection-on-observables assumption.</p>
<section id="a-quick-review" class="level2">
<h2 class="anchored" data-anchor-id="a-quick-review">A Quick Review</h2>
<p>Consider a binary treatment <img src="https://latex.codecogs.com/png.latex?D"> and an observed outcome <img src="https://latex.codecogs.com/png.latex?Y">. Let <img src="https://latex.codecogs.com/png.latex?(Y_0,%20Y_1)"> be the <a href="https://youtu.be/EXgOSj7GdSs?si=0Nhx5p2GwJHH3d69">potential outcomes</a> corresponding to the treatment <img src="https://latex.codecogs.com/png.latex?D">. Our goal is to learn the average treatment effect <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BATE%7D%20%5Cequiv%20%5Cmathbb%7BE%7D(Y_1%20-%20Y_0)"> but, unless <img src="https://latex.codecogs.com/png.latex?D"> is randomly assigned, using the difference of observed means <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y%7CD=1)%20-%20%5Cmathbb%7BE%7D(Y%7CD=0)"> to estimate the ATE in general <a href="https://youtu.be/zbgN0GLolFo?si=aYH_huGqezsWIuUv">won’t work</a>. The idea of <strong>selection-on-observables</strong> is that <img src="https://latex.codecogs.com/png.latex?D"> might be “as good as randomly assigned” after we adjust for a collection of observed covariates <img src="https://latex.codecogs.com/png.latex?X">.</p>
<p>Regression adjustment relies on two assumptions: <strong>selection-on-observables</strong> and <strong>overlap</strong>. The selection-on-observables assumption says that learning <img src="https://latex.codecogs.com/png.latex?D"> provides no additional information about the average values of <img src="https://latex.codecogs.com/png.latex?Y_0"> and <img src="https://latex.codecogs.com/png.latex?Y_1">, provided that we already know <img src="https://latex.codecogs.com/png.latex?X">. This implies that we can learn the <em>conditional average treatment effect</em> (CATE) by comparing observed outcomes of the treated and untreated <strong>holding <img src="https://latex.codecogs.com/png.latex?X"> fixed</strong>: <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BCATE%7D(x)%20%5Cequiv%20%5Cmathbb%7BE%7D%5BY_1%20-%20Y_0%7CX%20=%20x%5D%20=%20%5Cmathbb%7BE%7D%5BY%7CD=1,%20X%20=%20x%5D%20-%20%5Cmathbb%7BE%7D%5BY%7CD=0,%20X%20=%20x%5D.%0A"> For example: older people might be more likely to take a new medication but also more likely to die without it. If so, perhaps by comparing average outcomes <em>holding age fixed</em> we can learn the causal effect of the medication. The overlap assumption says that, for any fixed value <img src="https://latex.codecogs.com/png.latex?x"> of the covariates, there are some treated and some untreated people. This allows us to learn <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCATE%7D(x)"> for every value of <img src="https://latex.codecogs.com/png.latex?x"> in the population and average it using the law of iterated expectations to recover the ATE:<br>
<img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BATE%7D%20=%20%5Cmathbb%7BE%7D%5B%5Ctext%7BCATE%7D(X)%5D%20=%20%5Cmathbb%7BE%7D%5B%5Cmathbb%7BE%7D(Y%7CD=1,%20X)%20-%20%5Cmathbb%7BE%7D(Y%7CD=0,%20X)%5D.%0A"> In the medication example, this would correspond to computing the difference of means for each age group <em>separately</em>, and then averaging them using the share of people in each age group. Notice that this is only possible if there are some people who took the medication and some who didn’t in each age group. That’s exactly what the overlap assumption buys us. For example, if there were no senior citizens who <em>didn’t</em> take the medication, we wouldn’t be able to learn the effect of the medication for senior citizens.</p>
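<p>To make the averaging step concrete, here is a minimal sketch in base R using simulated data rather than anything from a real study. The sample size, the probabilities, and the coefficients are all hypothetical choices made purely for illustration: <img src="https://latex.codecogs.com/png.latex?X"> plays the role of a binary age group and <img src="https://latex.codecogs.com/png.latex?D"> the role of the medication. Under this made-up design the ATE is 2 + 1.5 × 0.4 = 2.6.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical simulation design: every number below is an assumption
set.seed(1234)
n &lt;- 10000
X &lt;- rbinom(n, 1, 0.4)                       # 1 = older age group
D &lt;- rbinom(n, 1, ifelse(X == 1, 0.7, 0.2))  # older people are more likely to be treated
Y &lt;- 1 + 2 * D - 3 * X + 1.5 * D * X + rnorm(n)

# Mean outcome in each (D, X) cell: rows index D, columns index X
ybar &lt;- tapply(Y, list(D = D, X = X), mean)

# Difference of means within each age group: sample analogue of CATE(x)
cate_hat &lt;- unname(ybar["1", ] - ybar["0", ])

# Share of people in each age group
share &lt;- c(mean(X == 0), mean(X == 1))

# Average the group-specific differences using the group shares
ate_hat &lt;- sum(cate_hat * share)

# For comparison: the raw difference of means that ignores X
naive &lt;- mean(Y[D == 1]) - mean(Y[D == 0])
c(ATE = ate_hat, naive = naive)</code></pre></div></div>
</div>
<p>Because treatment is more common in the group with lower untreated outcomes, the raw difference of means in this design understates the ATE, while the share-weighted average of within-group differences targets it directly.</p>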
</section>
<section id="which-regression-should-we-run" class="level2">
<h2 class="anchored" data-anchor-id="which-regression-should-we-run">Which regression should we run?</h2>
<p>So suppose that we’ve found a set of covariates <img src="https://latex.codecogs.com/png.latex?X"> that satisfy the required assumptions. How should we actually <em>carry out</em> regression adjustment? To answer this question, let’s start by making things a bit simpler. Suppose that <img src="https://latex.codecogs.com/png.latex?X"> is a single <em>binary</em> covariate. At the end of the post, we’ll return to the general case. Since <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?D"> are both binary, we can write the conditional mean function of <img src="https://latex.codecogs.com/png.latex?Y"> given <img src="https://latex.codecogs.com/png.latex?(D,%20X)"> as <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(Y%7CD,%20X)%20=%20%5Cbeta_0%20+%20%5Cbeta_1%20D%20+%20%5Cbeta_2%20X%20+%20%5Cbeta_3%20DX.%0A"> Since the true conditional mean function is linear, a linear regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?D">, <img src="https://latex.codecogs.com/png.latex?X">, <img src="https://latex.codecogs.com/png.latex?DX"> and an intercept will recover <img src="https://latex.codecogs.com/png.latex?(%5Cbeta_0,%20%5Cbeta_1,%20%5Cbeta_2,%20%5Cbeta_3)">. But what on earth do these coefficients actually <em>mean</em>?! 
Substituting all possible values of <img src="https://latex.codecogs.com/png.latex?(D,%20X)">, <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Cmathbb%7BE%7D(Y%7CD=0,%20X=0)%20&amp;=%20%5Cbeta_0%20%5C%5C%0A%5Cmathbb%7BE%7D(Y%7CD=1,%20X=0)%20&amp;=%20%5Cbeta_0%20+%20%5Cbeta_1%20%5C%5C%0A%5Cmathbb%7BE%7D(Y%7CD=0,%20X=1)%20&amp;=%20%5Cbeta_0%20+%20%5Cbeta_2%20%5C%5C%0A%5Cmathbb%7BE%7D(Y%7CD=1,%20X=1)%20&amp;=%20%5Cbeta_0%20+%20%5Cbeta_1%20+%20%5Cbeta_2%20+%20%5Cbeta_3.%0A%5Cend%7Baligned%7D%0A"> And so, after a bit of re-arranging, <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Cbeta_0%20&amp;=%20%5Cmathbb%7BE%7D(Y%7CD=0,%20X=0)%5C%5C%0A%5Cbeta_1%20&amp;=%20%5Cmathbb%7BE%7D(Y%7CD=1,%20X=0)%20-%20%5Cmathbb%7BE%7D(Y%7CD=0,%20X=0)%5C%5C%0A%5Cbeta_2%20&amp;=%20%5Cmathbb%7BE%7D(Y%7CD=0,%20X=1)%20-%20%5Cmathbb%7BE%7D(Y%7CD=0,%20X=0)%5C%5C%0A%5Cbeta_3%20&amp;=%20%5Cmathbb%7BE%7D(Y%7CD=1,%20X=1)%20-%20%5Cmathbb%7BE%7D(Y%7CD=1,%20X=0)%20-%20%5Cmathbb%7BE%7D(Y%7CD=0,%20X=1)%20+%20%5Cmathbb%7BE%7D(Y%7CD=0,%20X=0).%0A%5Cend%7Baligned%7D%0A"> <strong>What a mess!</strong> Alas, we’ll need a few more steps of algebra to figure out how these relate to the ATE. 
Notice that <img src="https://latex.codecogs.com/png.latex?%5Cbeta_1"> equals the CATE when <img src="https://latex.codecogs.com/png.latex?X=0"> since <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Ctext%7BCATE%7D(0)%20&amp;%5Cequiv%20%5Cmathbb%7BE%7D(Y%7CD=1,%20X=0)%20-%20%5Cmathbb%7BE%7D(Y%7CD=0,%20X=0)%5C%5C%0A&amp;=%20(%5Cbeta_0%20+%20%5Cbeta_1)%20-%20%5Cbeta_0%5C%5C%0A&amp;%20=%20%5Cbeta_1%0A%5Cend%7Baligned%7D%0A"> Proceeding similarly for the CATE when <img src="https://latex.codecogs.com/png.latex?X%20=%201">, we find that <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Ctext%7BCATE%7D(1)%20&amp;%5Cequiv%20%5Cmathbb%7BE%7D(Y%7CD=1,%20X=1)%20-%20%5Cmathbb%7BE%7D(Y%7CD=0,%20X=1)%20%5C%5C%0A&amp;=%20(%5Cbeta_0%20+%20%5Cbeta_1%20+%20%5Cbeta_2%20+%20%5Cbeta_3)%20-%20(%5Cbeta_0%20+%20%5Cbeta_2)%20%5C%5C%0A&amp;=%20%5Cbeta_1%20+%20%5Cbeta_3.%0A%5Cend%7Baligned%7D%0A"> Now that we have expressions for each of the two conditional average treatment effects, corresponding to each of the values that <img src="https://latex.codecogs.com/png.latex?X"> can take, we’re finally ready to compute the ATE: <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Ctext%7BATE%7D%20&amp;=%20%5Cmathbb%7BE%7D%5B%5Ctext%7BCATE%7D(X)%5D%20%5C%5C%0A&amp;=%20%5Ctext%7BCATE%7D(0)%20%5Ctimes%20%5Cmathbb%7BP%7D(X%20=%200)%20+%20%5Ctext%7BCATE%7D(1)%20%5Ctimes%20%5Cmathbb%7BP%7D(X%20=%201)%20%5C%5C%0A&amp;=%20%5Cbeta_1%20%5Cleft%5B1%20-%20%5Cmathbb%7BP%7D(X%20=%201)%5Cright%5D%20+%20(%5Cbeta_1%20+%20%5Cbeta_3)%20%5Cmathbb%7BP%7D(X%20=%201)%20%5C%5C%0A&amp;=%20%5Cbeta_1%20+%20%5Cbeta_3%20p%0A%5Cend%7Baligned%7D%0A"> where we define the shorthand <img src="https://latex.codecogs.com/png.latex?p%20%5Cequiv%20%5Cmathbb%7BP%7D(X=1)">. 
So to compute the ATE, we need to know the coefficients <img src="https://latex.codecogs.com/png.latex?%5Cbeta_1"> and <img src="https://latex.codecogs.com/png.latex?%5Cbeta_3"> from the regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?D">, <img src="https://latex.codecogs.com/png.latex?X">, and <img src="https://latex.codecogs.com/png.latex?DX">, <em>in addition</em> to the share of people with <img src="https://latex.codecogs.com/png.latex?X%20=%201">. Needless to say, your favorite regression package will not spit out the ATE for you if you run the regression from above. And it <em>certainly</em> won’t spit out the standard error! So what can we do besides computing everything by hand?</p>
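<p>Here is what “computing everything by hand” might look like, as a rough sketch on simulated data. The design and all of the numbers in it are hypothetical, chosen so that the ATE equals 2 + 1.5 × 0.4 = 2.6. Note that R labels the interaction coefficient <code>D:X</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical simulation design: every number below is an assumption
set.seed(1234)
n &lt;- 10000
X &lt;- rbinom(n, 1, 0.4)
D &lt;- rbinom(n, 1, ifelse(X == 1, 0.7, 0.2))
Y &lt;- 1 + 2 * D - 3 * X + 1.5 * D * X + rnorm(n)

# Regress Y on D, X, DX and an intercept
b &lt;- coef(lm(Y ~ D + X + D:X))

# Sample analogue of p = P(X = 1)
p_hat &lt;- mean(X)

# ATE = beta_1 + beta_3 * p, assembled by hand from the fitted coefficients
ate_hat &lt;- unname(b["D"] + b["D:X"] * p_hat)
ate_hat</code></pre></div></div>
</div>
<p>This gives a point estimate, but the regression output contains no single row for it and hence no ready-made standard error, which is exactly the inconvenience described above.</p>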
</section>
<section id="two-simple-alternatives" class="level2">
<h2 class="anchored" data-anchor-id="two-simple-alternatives">Two Simple Alternatives</h2>
<p>It turns out that there are two simple ways to get your favorite software package to spit out the ATE and its associated standard error. Each involves a slight <em>re-parameterization</em> of the conditional mean expression from above. The first one replaces <img src="https://latex.codecogs.com/png.latex?DX"> with <img src="https://latex.codecogs.com/png.latex?D%5Ctilde%7BX%7D"> where <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BX%7D%20%5Cequiv%20X%20-%20p"> and <img src="https://latex.codecogs.com/png.latex?p%20%5Cequiv%20%5Cmathbb%7BP%7D(X=1)">. To see why this works, notice that <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Cmathbb%7BE%7D(Y%7CD,%20X)%20&amp;=%20%5Cbeta_0%20+%20%5Cbeta_1%20D%20+%20%5Cbeta_2%20X%20+%20%5Cbeta_3%20DX%20%5C%5C%0A&amp;=%20%5Cbeta_0%20+%20%5Cbeta_1%20D%20+%20%5Cbeta_2%20X%20+%20%5Cbeta_3%20D(X%20-%20p)%20+%20%5Cbeta_3%20pD%5C%5C%0A&amp;=%20%5Cbeta_0%20+%20(%5Cbeta_1%20+%20%5Cbeta_3%20p)%20D%20+%20%5Cbeta_2%20X%20+%20%5Cbeta_3%20D%5Ctilde%7BX%7D%5C%5C%0A&amp;=%20%5Cbeta_0%20+%20%5Ctext%7BATE%7D%5Ctimes%20D%20+%20%5Cbeta_2%20X%20+%20%5Cbeta_3%20D%5Ctilde%7BX%7D.%0A%5Cend%7Baligned%7D%0A"> This works perfectly well, but there’s something about it that offends my sense of order: why subtract the mean from <img src="https://latex.codecogs.com/png.latex?X"> in <em>one place but not in another</em>? 
If you share my aesthetic sensibilities, then you can feel free to replace that offending <img src="https://latex.codecogs.com/png.latex?X"> with another <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BX%7D"> since <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Cmathbb%7BE%7D(Y%7CD,%20X)%20&amp;=%20%5Cbeta_0%20+%20%5Ctext%7BATE%7D%5Ctimes%20D%20+%20%5Cbeta_2%20X%20+%20%5Cbeta_3%20D%5Ctilde%7BX%7D%5C%5C%0A&amp;=%20%5Cbeta_0%20+%20%5Ctext%7BATE%7D%5Ctimes%20D%20+%20%5Cbeta_2%20(X-p)%20+%20p%20%5Cbeta_2%20+%20%5Cbeta_3%20D%5Ctilde%7BX%7D%5C%5C%0A&amp;=%20(%5Cbeta_0%20+%20p%20%5Cbeta_2)%20+%20%5Ctext%7BATE%7D%5Ctimes%20D%20+%20%5Cbeta_2%20%5Ctilde%7BX%7D%20+%20%5Cbeta_3%20D%5Ctilde%7BX%7D%5C%5C%0A&amp;=%20%5Ctilde%7B%5Cbeta%7D_0%20+%20%5Ctext%7BATE%7D%5Ctimes%20D%20+%20%5Cbeta_2%20%5Ctilde%7BX%7D%20+%20%5Cbeta_3%20D%5Ctilde%7BX%7D%0A%5Cend%7Baligned%7D%0A"> where we define <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7B%5Cbeta%7D_0%20%5Cequiv%20%5Cbeta_0%20+%20p%20%5Cbeta_2">. Notice that the only coefficient that changes is the intercept, and we’re typically not interested in this anyway!</p>
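<p>Both re-parameterizations are easy to try out. The following sketch uses simulated data with a hypothetical design (all numbers are assumptions) and replaces the unknown <img src="https://latex.codecogs.com/png.latex?p"> with the sample mean of <img src="https://latex.codecogs.com/png.latex?X">. Since each version is just a linear re-arrangement of the same regressors, the coefficient on <code>D</code> should agree with the by-hand calculation up to floating-point error.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical simulation design: every number below is an assumption
set.seed(1234)
n &lt;- 10000
X &lt;- rbinom(n, 1, 0.4)
D &lt;- rbinom(n, 1, ifelse(X == 1, 0.7, 0.2))
Y &lt;- 1 + 2 * D - 3 * X + 1.5 * D * X + rnorm(n)

# Sample analogue of X - p
Xtilde &lt;- X - mean(X)

# Original parameterization: the coefficient on D is beta_1, not the ATE
b &lt;- coef(lm(Y ~ D + X + D:X))
by_hand &lt;- unname(b["D"] + b["D:X"] * mean(X))

# First alternative: de-mean X in the interaction only
fit1 &lt;- lm(Y ~ D + X + D:Xtilde)

# Second alternative: de-mean X everywhere
fit2 &lt;- lm(Y ~ D + Xtilde + D:Xtilde)

c(by_hand = by_hand,
  fit1 = unname(coef(fit1)["D"]),
  fit2 = unname(coef(fit2)["D"]))</code></pre></div></div>
</div>
<p>Calling <code>summary(fit2)</code> then reports a standard error in the row for <code>D</code>, right next to the estimate.</p>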
</section>
<section id="what-if-we-ignore-the-interaction" class="level2">
<h2 class="anchored" data-anchor-id="what-if-we-ignore-the-interaction">What if we ignore the interaction?</h2>
<p>Wait a minute, you may be ready to object, when researchers claim to be “adjusting” or “controlling” for <img src="https://latex.codecogs.com/png.latex?X"> in practice, they very rarely include an interaction term between <img src="https://latex.codecogs.com/png.latex?D"> and <img src="https://latex.codecogs.com/png.latex?X"> in their regression! Instead, they just regress <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?D"> and <img src="https://latex.codecogs.com/png.latex?X">. What can we say about this approach? To answer this question, let’s continue with our example from above and define the following population linear regression model: <img src="https://latex.codecogs.com/png.latex?%0AY%20=%20%5Calpha_0%20+%20%5Calpha_1%20D%20+%20%5Calpha_2%20X%20+%20V%0A"> where <img src="https://latex.codecogs.com/png.latex?V"> is the population linear regression error term so that, <a href="../../post/why-econometrics-is-confusing-part-1-the-error-term/">by construction</a>, <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(V)%20=%20%5Cmathbb%7BE%7D(DV)%20=%20%5Cmathbb%7BE%7D(XV)%20=%200">. Notice that I’ve called the coefficients in this regression <img src="https://latex.codecogs.com/png.latex?%5Calpha"> rather than <img src="https://latex.codecogs.com/png.latex?%5Cbeta">. That’s because they will <em>not in general coincide</em> with the coefficients of the conditional mean function from above, namely <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y%7CD,%20X)%20=%20%5Cbeta_0%20+%20%5Cbeta_1%20D%20+%20%5Cbeta_2%20X%20+%20%5Cbeta_3%20DX">. In particular, the regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?D"> and <img src="https://latex.codecogs.com/png.latex?X"> without an interaction will <em>only</em> coincide with the true conditional mean function if <img src="https://latex.codecogs.com/png.latex?%5Cbeta_3%20=%200">.</p>
<p>So what, if anything, can we say about <img src="https://latex.codecogs.com/png.latex?%5Calpha_1"> in relation to the ATE? By <a href="https://en.wikipedia.org/wiki/Frisch%E2%80%93Waugh%E2%80%93Lovell_theorem">Yule’s Rule</a><sup>2</sup> we have <img src="https://latex.codecogs.com/png.latex?%0A%5Calpha_1%20=%20%5Cfrac%7B%5Ctext%7BCov%7D(Y,%20%5Ctilde%7BD%7D)%7D%7B%5Ctext%7BVar%7D(%5Ctilde%7BD%7D)%7D,%20%5Cquad%0AD%20=%20%5Cgamma_0%20+%20%5Cgamma_1%20X%20+%20%5Ctilde%7BD%7D,%20%5Cquad%20%5Cmathbb%7BE%7D(%5Ctilde%7BD%7D)%20=%20%5Cmathbb%7BE%7D(X%5Ctilde%7BD%7D)%20=%200%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BD%7D"> is the error term from a population linear regression of <img src="https://latex.codecogs.com/png.latex?D"> on <img src="https://latex.codecogs.com/png.latex?X">. In words, the way that a regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?D"> and <img src="https://latex.codecogs.com/png.latex?X"> “adjusts” for <img src="https://latex.codecogs.com/png.latex?X"> is by first regressing <img src="https://latex.codecogs.com/png.latex?D"> on <img src="https://latex.codecogs.com/png.latex?X">, taking the part of <img src="https://latex.codecogs.com/png.latex?D"> that is <em>not</em> correlated with <img src="https://latex.codecogs.com/png.latex?X">, namely <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BD%7D">, and regressing <img src="https://latex.codecogs.com/png.latex?Y"> on this alone.<sup>3</sup> As shown in the appendix to this post, <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Ctext%7BCov%7D(Y,%5Ctilde%7BD%7D)%7D%7B%5Ctext%7BVar%7D(%5Ctilde%7BD%7D)%7D%20=%20%5Cfrac%7B%5Cmathbb%7BE%7D%5B%5Ctext%7BVar%7D(D%7CX)(%5Cbeta_1%20+%20%5Cbeta_3%20X)%5D%7D%7B%5Cmathbb%7BE%7D%5B%5Ctext%7BVar%7D(D%7CX)%5D%7D.%0A"> in this example. 
And since <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCATE%7D(X)%20=%20%5Cbeta_1%20+%20%5Cbeta_3%20X"> it follows that <img src="https://latex.codecogs.com/png.latex?%0A%5Calpha_1%20=%20%5Cfrac%7B%5Cmathbb%7BE%7D%5B%5Ctext%7BVar%7D(D%7CX)%20%5Ccdot%20%5Ctext%7BCATE%7D(X)%5D%7D%7B%5Cmathbb%7BE%7D%5B%5Ctext%7BVar%7D(D%7CX)%5D%7D.%0A"> The only thing that’s random in this expression is <img src="https://latex.codecogs.com/png.latex?X">. Both expectations involve averaging over its distribution. To make this clearer, define the <strong>propensity score</strong> <img src="https://latex.codecogs.com/png.latex?%5Cpi(x)%20%5Cequiv%20%5Cmathbb%7BP%7D(D=1%7CX=x)">. Using this notation, <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Ctext%7BVar%7D(D%7CX)%20&amp;=%20%5Cmathbb%7BE%7D(D%5E2%7CX)%20-%20%5Cmathbb%7BE%7D(D%7CX)%5E2%20=%20%5Cmathbb%7BE%7D(D%7CX)%20-%20%5Cmathbb%7BE%7D(D%7CX)%5E2%5C%5C%0A&amp;=%20%5Cpi(X)%20-%20%5Cpi(X)%5E2%20=%20%5Cpi(X)%5B1%20-%20%5Cpi(X)%5D%0A%5Cend%7Baligned%7D%0A"> since <img src="https://latex.codecogs.com/png.latex?D"> is binary. 
Defining <img src="https://latex.codecogs.com/png.latex?p(x)%20%5Cequiv%20%5Cmathbb%7BP%7D(X%20=%20x)">, we see that <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Calpha_1%20%20&amp;=%20%5Cfrac%7B%5Cmathbb%7BE%7D%5B%5Cpi(X)%5C%7B1%20-%20%5Cpi(X)%5C%7D%5Ccdot%20%5Ctext%7BCATE%7D(X)%5D%7D%7B%5Cmathbb%7BE%7D%5B%5Cpi(X)%5C%7B1%20-%20%5Cpi(X)%5C%7D%5D%7D%5C%5C%20%5C%5C%0A&amp;=%20%5Cfrac%7Bp(0)%20%5Ccdot%20%5Cpi(0)%5B1%20-%20%5Cpi(0)%5D%5Ccdot%20%5Ctext%7BCATE%7D(0)%20+%20p(1)%20%5Ccdot%20%5Cpi(1)%5B1%20-%20%5Cpi(1)%5D%5Ccdot%20%5Ctext%7BCATE%7D(1)%7D%7Bp(0)%20%5Ccdot%20%5Cpi(0)%5B1%20-%20%5Cpi(0)%5D%20+%20p(1)%20%5Ccdot%20%5Cpi(1)%5B1%20-%20%5Cpi(1)%5D%7D%5C%5C%20%5C%5C%0A&amp;=%20w(0)%20%5Ccdot%20%5Ctext%7BCATE%7D(0)%20+%20w(1)%20%5Ccdot%20%5Ctext%7BCATE%7D(1)%0A%5Cend%7Baligned%7D%0A"> where we introduce the shorthand <img src="https://latex.codecogs.com/png.latex?%0Aw(x)%20%5Cequiv%20%5Cfrac%7Bp(x)%20%5Ccdot%20%5Cpi(x)%5B1%20-%20%5Cpi(x)%5D%7D%7B%5Csum_%7B%5Ctext%7Ball%20%7D%20k%7D%20p(k)%20%5Ccdot%20%5Cpi(k)%5B1%20-%20%5Cpi(k)%5D%7D.%0A"> In other words, the coefficient on <img src="https://latex.codecogs.com/png.latex?D"> in a regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?D"> and <img src="https://latex.codecogs.com/png.latex?X"> excluding the interaction term <img src="https://latex.codecogs.com/png.latex?DX"> gives a <strong>weighted average</strong> of the conditional average treatment effects for the different values of <img src="https://latex.codecogs.com/png.latex?X">. The weights are between zero and one and sum to one. Because <img src="https://latex.codecogs.com/png.latex?w(x)"> is increasing in <img src="https://latex.codecogs.com/png.latex?p(x)">, values of <img src="https://latex.codecogs.com/png.latex?X"> that are <em>more common</em> are given more weight just as they are in the ATE. 
But since <img src="https://latex.codecogs.com/png.latex?w(x)"> is <em>also</em> increasing in <img src="https://latex.codecogs.com/png.latex?%5Cpi(x)%5B1%20-%20%5Cpi(x)%5D">, values of <img src="https://latex.codecogs.com/png.latex?X"> for which <img src="https://latex.codecogs.com/png.latex?%5Cpi(x)"> is closer to 0.5 are given more weight, <em>unlike</em> in the ATE. As such, we could describe <img src="https://latex.codecogs.com/png.latex?%5Calpha_1"> as a <em>variance-weighted average</em> of the conditional average treatment effects.</p>
<p>In general, the weighted average <img src="https://latex.codecogs.com/png.latex?%5Calpha_1"> will <em>not</em> coincide with the ATE, although there are two special cases where it will. The first case is when <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCATE%7D(X)"> does not depend on <img src="https://latex.codecogs.com/png.latex?X">, i.e.&nbsp;treatment effects are <em>homogeneous</em>. In this case <img src="https://latex.codecogs.com/png.latex?%5Cbeta_3%20=%200"> so there <em>is no interaction term in the conditional mean function</em>! The second is when <img src="https://latex.codecogs.com/png.latex?%5Cpi(X)"> does not depend on <img src="https://latex.codecogs.com/png.latex?X">, in which case the probability of treatment does not depend on <img src="https://latex.codecogs.com/png.latex?X">, so we don’t need to adjust for <img src="https://latex.codecogs.com/png.latex?X"> in the first place!</p>
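<p>The weighted-average formula can also be checked numerically. The sketch below again relies on simulated data with a hypothetical design (all numbers are assumptions). It compares the coefficient on <code>D</code> from the regression without an interaction against the weighted average of within-group differences in means, using sample analogues of <img src="https://latex.codecogs.com/png.latex?p(x)"> and <img src="https://latex.codecogs.com/png.latex?%5Cpi(x)"> to build the weights.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical simulation design: every number below is an assumption
set.seed(1234)
n &lt;- 10000
X &lt;- rbinom(n, 1, 0.4)
D &lt;- rbinom(n, 1, ifelse(X == 1, 0.7, 0.2))
Y &lt;- 1 + 2 * D - 3 * X + 1.5 * D * X + rnorm(n)

# Coefficient on D from the regression that omits the interaction
alpha1_hat &lt;- unname(coef(lm(Y ~ D + X))["D"])

# Sample analogues of p(x), pi(x) and CATE(x) for x = 0, 1
p_x  &lt;- c(mean(X == 0), mean(X == 1))
pi_x &lt;- c(mean(D[X == 0]), mean(D[X == 1]))
ybar &lt;- tapply(Y, list(D = D, X = X), mean)
cate &lt;- unname(ybar["1", ] - ybar["0", ])

# Weights proportional to p(x) * pi(x) * [1 - pi(x)], normalized to sum to one
w &lt;- p_x * pi_x * (1 - pi_x)
w &lt;- w / sum(w)

c(alpha1 = alpha1_hat, weighted = sum(w * cate), ATE = sum(p_x * cate))</code></pre></div></div>
</div>
<p>With a binary <img src="https://latex.codecogs.com/png.latex?X"> the first two entries agree up to floating-point error, while the third uses the population shares alone as weights and so need not match them.</p>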
</section>
<section id="what-about-the-general-case" class="level2">
<h2 class="anchored" data-anchor-id="what-about-the-general-case">What about the general case?</h2>
<p>All of the above derivations assumed that <img src="https://latex.codecogs.com/png.latex?X"> is one-dimensional and binary. So how much of this still applies more generally? First, if <img src="https://latex.codecogs.com/png.latex?X"> is a vector of binary variables representing categories like sex, race etc., everything goes through <em>exactly</em> as above.<sup>4</sup> All that changes is that <img src="https://latex.codecogs.com/png.latex?%5Cbeta_2">, <img src="https://latex.codecogs.com/png.latex?%5Cbeta_3"> and <img src="https://latex.codecogs.com/png.latex?p%20=%20%5Cmathbb%7BE%7D(X)"> become vectors. The coefficient on <img src="https://latex.codecogs.com/png.latex?D"> in a regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?D">, <img src="https://latex.codecogs.com/png.latex?X"> and the interaction <img src="https://latex.codecogs.com/png.latex?D%20%5Ctilde%7BX%7D"> is still the ATE, and the coefficient on <img src="https://latex.codecogs.com/png.latex?D"> in a regression that <em>excludes</em> the interaction term is still a weighted average of CATEs that does <em>not in general</em> equal the ATE.</p>
<p>So whenever the covariates you need to adjust for are categorical, this post has you covered.<sup>5</sup> But what if some of our covariates are continuous? In this case things are a bit more complicated, but all of the results from above still go through if we’re willing to assume that the conditional mean functions <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y%7CD=0,%20X)">, <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y%7CD=1,X)"> and <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(D%7CX)"> are linear in <img src="https://latex.codecogs.com/png.latex?X">. This is undoubtedly a strong assumption, but not perhaps as strong as it seems. For example, <img src="https://latex.codecogs.com/png.latex?X"> could include logs, squares or other functions of some underlying continuous covariates, e.g.&nbsp;age or years of experience. In this case, the weighted average interpretation of the coefficient on <img src="https://latex.codecogs.com/png.latex?D"> in a regression that excludes the interaction term still holds but now involves an integral rather than a sum.</p>
</section>
<section id="does-it-really-work-an-empirical-example" class="level2">
<h2 class="anchored" data-anchor-id="does-it-really-work-an-empirical-example">Does it really work? An Empirical Example</h2>
<p>But perhaps you don’t trust my algebra.<sup>6</sup> To assuage your fears, let’s take this to the data! The following example is based on <a href="https://www.almendron.com/tribuna/wp-content/uploads/2018/04/electoral-effects-of-biased-media-russian-television-in-ukraine.pdf">Peisakhin &amp; Rozenas (2018) - Electoral Effects of Biased Media: Russian Television in Ukraine</a>. I’ve adapted it from Llaudet and Imai’s fantastic book <a href="https://press.princeton.edu/books/hardcover/9780691199429/data-analysis-for-social-science">Data Analysis for Social Science</a>, the perfect holiday or birthday gift for the budding social scientist in your life.</p>
<p>Here’s a bit of background. In the lead-up to Ukraine’s 2014 parliamentary election, Russian state-controlled TV mounted a fierce media campaign against the Ukrainian government. Ukrainians who lived near the border with Russia could <em>potentially</em> receive Russian TV signals. Did receiving these signals <em>cause</em> them to support pro-Russia parties in the election? To answer this question, we’ll use a dataset called <code>precincts</code> that contains aggregate election results in precincts close to the Russian border:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2">precincts <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'https://ditraglia.com/data/UA_precincts.csv'</span>)</span></code></pre></div></div>
</div>
<p>Each row of <code>precincts</code> is an electoral precinct in Ukraine that is near the Russian border. The columns <code>pro_russian</code> and <code>prior_pro_russian</code> give the vote share (in percentage points) of pro-Russian parties in the 2014 and 2012 Ukrainian elections, respectively. Our outcome of interest will be the <em>change</em> in pro-Russian vote share between the two elections, so we first need to construct this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">precincts <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> precincts <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">change =</span> pro_russian <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> prior_pro_russian) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb2-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>pro_russian, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>prior_pro_russian)</span>
<span id="cb2-4">precincts</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 3,589 × 3
   russian_tv within_25km change
        &lt;dbl&gt;       &lt;dbl&gt;  &lt;dbl&gt;
 1          0           1  -22.4
 2          0           0  -34.5
 3          1           1  -18.8
 4          0           1  -12.2
 5          0           0  -27.7
 6          1           0  -44.2
 7          0           0  -34.5
 8          0           0  -29.5
 9          0           0  -24.1
10          0           0  -25.4
# ℹ 3,579 more rows</code></pre>
</div>
</div>
<p>The column <code>russian_tv</code> equals <code>1</code> if the precinct has Russian TV reception. This is our treatment variable: <img src="https://latex.codecogs.com/png.latex?D">. But crucially, this is <em>not</em> randomly assigned. While it’s true that there is some natural variation in signal strength that is plausibly independent of other factors related to voting behavior, on average <em>precincts closer to Russia</em> are more likely to receive a signal. So suppose for the sake of argument that <em>conditional</em> on proximity to the Russian border, <code>russian_tv</code> is as good as randomly assigned. This is the <em>selection on observables</em> assumption. There’s no way to check this using our data alone. It’s something we need to justify based on our understanding of the world and the substantive problem at hand.</p>
<p>As our measure of proximity, we’ll use the dummy variable <code>within_25km</code> which equals <code>1</code> if the precinct is within 25km of the Russian border. This is our <img src="https://latex.codecogs.com/png.latex?X">-variable. The <em>overlap</em> assumption requires that there are some precincts with Russian TV reception and some without in each distance category. This is an assumption that we <em>can</em> check using the data, so let’s do so before proceeding:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">precincts <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(within_25km) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">share with Russian tv</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(russian_tv))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 2
  within_25km `share with Russian tv`
        &lt;dbl&gt;                   &lt;dbl&gt;
1           0                   0.105
2           1                   0.692</code></pre>
</div>
</div>
<p>We see that just over 10% of precincts that are <em>not</em> within 25km of the border have Russian TV reception, while just under 70% of those within 25km do. Neither of these values is close to 0% or 100%, so this dataset comfortably satisfies the overlap assumption.</p>
<p>To avoid taxing your memory about which variable is which, for the rest of this exercise, I’ll create a new dataset that renames the columns of <code>precincts</code> to <code>D</code>, <code>X</code>, and <code>Y</code> for the treatment, covariate, and outcome, respectively.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">dat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> precincts <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">D =</span> russian_tv, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">X =</span> within_25km, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Y =</span> change)</span></code></pre></div></div>
</div>
<section id="computing-the-ate-the-hard-way" class="level3">
<h3 class="anchored" data-anchor-id="computing-the-ate-the-hard-way">Computing the ATE the Hard Way</h3>
<p>Now we’re ready to verify the calculations from above. First we’ll compute the ATE “the hard way”, in other words by computing each of the CATEs separately and averaging them. Warning: there’s a fair bit of <code>dplyr</code> to come!</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 1: compute the mean Y for each combination of (D, X)</span></span>
<span id="cb7-2">means <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> dat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(D, X) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Ybar =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(Y))</span>
<span id="cb7-5">means <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># display the results</span></span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 4 × 3
# Groups:   D [2]
      D     X  Ybar
  &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1     0     0 -24.6
2     0     1 -34.2
3     1     0 -13.0
4     1     1 -32.2</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 2: reshape so the means of Y|D=0,X and Y|D=1,X are in separate cols</span></span>
<span id="cb9-2">means <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> means <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_wider</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_from =</span> D, </span>
<span id="cb9-4">              <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_from =</span> Ybar, </span>
<span id="cb9-5">              <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_prefix =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Ybar'</span>)</span>
<span id="cb9-6">means <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># display the results</span></span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 3
      X Ybar0 Ybar1
  &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1     0 -24.6 -13.0
2     1 -34.2 -32.2</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 3: attach a column with the proportion of X = 0 and X = 1</span></span>
<span id="cb11-2">regression_adjustment <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> dat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb11-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(X) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb11-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">count =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb11-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p =</span> count <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(count)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb11-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>count) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb11-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">left_join</span>(means, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"X"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb11-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">CATE =</span> Ybar1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> Ybar0) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># compute the CATEs</span></span>
<span id="cb11-9">regression_adjustment <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># display the results</span></span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 5
      X     p Ybar0 Ybar1  CATE
  &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1     0 0.849 -24.6 -13.0 11.6 
2     1 0.151 -34.2 -32.2  2.01</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 4: at long last, compute the ATE!</span></span>
<span id="cb13-2">ATE <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> regression_adjustment <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb13-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">out =</span> (Ybar1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> Ybar0) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> p) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb13-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(out) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb13-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>()</span>
<span id="cb13-6">ATE</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 10.12062</code></pre>
</div>
</div>
</section>
<section id="computing-the-ate-the-easy-way" class="level3">
<h3 class="anchored" data-anchor-id="computing-the-ate-the-easy-way">Computing the ATE the Easy Way</h3>
<p>And now the easy way, using the two regressions described above:<sup>7</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Construct Xtilde = X - mean(X) </span></span>
<span id="cb15-2">dat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> dat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb15-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Xtilde =</span> X <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(X))</span>
<span id="cb15-4"></span>
<span id="cb15-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Regression of Y on D, X, and D*Xtilde</span></span>
<span id="cb15-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(Y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> D <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> X <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> D<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>Xtilde, dat)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
lm(formula = Y ~ D + X + D:Xtilde, data = dat)

Coefficients:
(Intercept)            D            X     D:Xtilde  
    -24.591       10.121       -9.604       -9.562  </code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Regression of Y on D, Xtilde, and D*Xtilde</span></span>
<span id="cb17-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(Y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> D <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> Xtilde, dat)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
lm(formula = Y ~ D * Xtilde, data = dat)

Coefficients:
(Intercept)            D       Xtilde     D:Xtilde  
    -26.045       10.121       -9.604       -9.562  </code></pre>
</div>
</div>
<p>Everything works as it should! The coefficient on <code>D</code> in each regression equals the ATE we computed by hand, namely 10.121, and the two regressions agree with each other except for the intercept.</p>
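<p>If you’d like to see how the interacted regression encodes the two CATEs, here’s a quick sketch. It hard-codes the rounded coefficients printed above along with the sample share of <code>X = 1</code> from the <code>p</code> column of the earlier table. The names <code>b_D</code>, <code>b_DXtilde</code>, <code>xbar</code>, and <code>cate()</code> are mine, purely for illustration. Because the fitted model is saturated, the implied CATE at <code>X = x</code> is the coefficient on <code>D</code> plus the interaction coefficient times <code>x - xbar</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Rounded values copied from the printed output above (illustrative only)
b_D       &lt;- 10.121   # coefficient on D from lm(Y ~ D * Xtilde)
b_DXtilde &lt;- -9.562   # coefficient on D:Xtilde
xbar      &lt;- 0.151    # sample share of X = 1 (the p column for X = 1)

# Implied CATE at X = x: b_D + b_DXtilde * (x - xbar)
cate &lt;- function(x) b_D + b_DXtilde * (x - xbar)
cates &lt;- c(X0 = cate(0), X1 = cate(1))
cates

# Averaging the CATEs with weights P(X = 0), P(X = 1) returns b_D
p &lt;- c(1 - xbar, xbar)
sum(p * cates)</code></pre></div>
</div>
<p>Up to rounding in the inputs, the two implied CATEs match the ones from the hard way (roughly 11.6 and 2.0), and their <code>p</code>-weighted average is exactly the coefficient on <code>D</code>, because the interaction term averages to zero once <code>X</code> has been de-meaned.</p>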
</section>
<section id="standard-errors" class="level3">
<h3 class="anchored" data-anchor-id="standard-errors">Standard Errors</h3>
<p>The nice thing about computing the ATE by running a regression rather than computing it “by hand” is that we can easily obtain valid standard errors, confidence intervals, and p-values if desired. For example, if you wanted “robust” standard errors for the ATE, you could simply use <code>lm_robust()</code> from the <code>estimatr</code> package as follows</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(estimatr)</span>
<span id="cb19-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(broom)</span>
<span id="cb19-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm_robust</span>(Y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> D <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> Xtilde, dat) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb19-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tidy</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb19-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(term <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'D'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb19-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>df, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>outcome)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>  term estimate std.error statistic      p.value conf.low conf.high
1    D 10.12062 0.4838613  20.91636 9.315921e-92 9.171946  11.06929</code></pre>
</div>
</div>
<p>Getting these “by hand” would have been much more work!</p>
<p>There is one subtle point that I should mention. I’ve heard it said on numerous occasions that the above standard error calculation is “not quite right” since we <em>estimated</em> the mean of <code>X</code> and used it to re-center <code>X</code> in the regression. Surely we should account for the sampling variability in <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BX%7D"> around its mean, the argument goes.</p>
<p>Perhaps I’m about to get blacklisted by the Econometrician’s alliance for saying this, but I’m not convinced. The usual way of thinking about inference for regression is <em>conditional</em> on the regressors, in this case <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?D">. Viewed from this perspective, <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BX%7D"> <em>isn’t random</em>. Now, of course, if you prefer to see the world through finite-population design-based lenses, <img src="https://latex.codecogs.com/png.latex?D"> is <em>definitely</em> random. But in this case it’s the <em>only</em> thing that’s random. The design-based view situates randomness exclusively in the <em>treatment assignment mechanism</em>. Under this view, since the units in our dataset are not considered as having been drawn from a hypothetical super-population, any summary statistic of their covariates <img src="https://latex.codecogs.com/png.latex?X"> is <em>fixed</em>. So again, <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BX%7D"> isn’t random and doesn’t contribute any uncertainty.</p>
<p><strong>Update</strong>: I initially concluded this section with “as far as I can see, it’s perfectly reasonable to use the sample mean of <img src="https://latex.codecogs.com/png.latex?X"> to re-center <img src="https://latex.codecogs.com/png.latex?X"> in the regression” but <a href="https://twitter.com/Apoorva__Lal">apoorva.lal</a> pointed out that this elides an important distinction. The key is that whether <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BX%7D"> is random or not depends on the question you’re interested in. If you want inference for the ATE <em>computed using the population values of</em> <img src="https://latex.codecogs.com/png.latex?X">, then <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BX%7D"> is random and you should account for its variability. But if you’re interested in the ATE computed using <em>the observed values</em> of <img src="https://latex.codecogs.com/png.latex?X"> in the sample, then <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BX%7D"> is fixed and you shouldn’t:</p>
<blockquote class="twitter-tweet blockquote">
<p lang="en" dir="ltr">
Point about whether Xbar is random depends on whether you're interested in SATE v PATE right? In any case, it is surprisingly easy to propagate that uncertainty forward with (what else?) GMM (earlier posts in the thread discuss the recentering point)<a href="https://t.co/3GXfTeF9DW">https://t.co/3GXfTeF9DW</a>
</p>
— apoorva.lal (<span class="citation" data-cites="Apoorva__Lal">@Apoorva__Lal</span>) <a href="https://twitter.com/Apoorva__Lal/status/1819397448852545620?ref_src=twsrc%5Etfw">August 2, 2024</a>
</blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>This agrees with my logic about conditioning on <img src="https://latex.codecogs.com/png.latex?X"> and the design-based perspective, but it’s a much clearer way of making the relevant distinction so thanks for pointing it out!</p>
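<p>For readers who want the population version, here’s a minimal sketch of one way to let the variability of the sample mean of <code>X</code> feed into the standard error: a nonparametric bootstrap that re-centers <code>X</code> <em>within each resample</em>. To be clear, this is not the GMM approach from the tweet, and it isn’t something I ran on the precincts data. The simulated dataset <code>sim</code>, its parameter values, and the helper <code>ate_hat()</code> are all made up for illustration.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical simulated data: binary X, binary D, heterogeneous effects
set.seed(1234)
n &lt;- 2000
X &lt;- rbinom(n, 1, 0.15)
D &lt;- rbinom(n, 1, ifelse(X == 1, 0.7, 0.1))
Y &lt;- -25 + 12 * D - 10 * X - 10 * D * X + rnorm(n, sd = 10)
sim &lt;- data.frame(Y, D, X)

# ATE estimate: coefficient on D after re-centering X at its mean in d
ate_hat &lt;- function(d) {
  d$Xtilde &lt;- d$X - mean(d$X)
  unname(coef(lm(Y ~ D * Xtilde, data = d))["D"])
}

# Resample rows with replacement; mean(X) is recomputed in every resample,
# so its sampling variability is reflected in the bootstrap distribution
B &lt;- 500
boot_ate &lt;- replicate(B, ate_hat(sim[sample(n, replace = TRUE), ]))

c(estimate = ate_hat(sim), boot_se = sd(boot_ate))</code></pre></div>
</div>
<p>Comparing <code>boot_se</code> to the conventional standard error from the same regression gives a rough sense of how much the re-centering step matters in a given application.</p>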
</section>
<section id="excluding-the-interaction" class="level3">
<h3 class="anchored" data-anchor-id="excluding-the-interaction">Excluding the Interaction</h3>
<p>Finally, we’ll verify the derivations from above for <img src="https://latex.codecogs.com/png.latex?%5Calpha_1"> in the regression that <em>excludes</em> an interaction term. First we’ll compute the “variance weighted average” of CATEs by hand and check that it does not agree with the ATE:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute the propensity score pi(X)</span></span>
<span id="cb21-2">pscore <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> dat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb21-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(X) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb21-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pi =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(D))</span>
<span id="cb21-5"></span>
<span id="cb21-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute the weights w </span></span>
<span id="cb21-7">regression_adjustment <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">left_join</span>(regression_adjustment, pscore, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"X"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb21-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">w =</span> p <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> pi <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> pi) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(p <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> pi <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> pi))) </span>
<span id="cb21-9"></span>
<span id="cb21-10">regression_adjustment <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># display the results</span></span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 7
      X     p Ybar0 Ybar1  CATE    pi     w
  &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1     0 0.849 -24.6 -13.0 11.6  0.105 0.713
2     1 0.151 -34.2 -32.2  2.01 0.692 0.287</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute the variance weighted average of the CATEs</span></span>
<span id="cb23-2">wCATE <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> regression_adjustment <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb23-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">wCATE =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(w <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> CATE)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb23-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(wCATE)</span>
<span id="cb23-5"></span>
<span id="cb23-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">wCATE =</span> wCATE, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ATE =</span> ATE)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>    wCATE       ATE 
 8.822285 10.120617 </code></pre>
</div>
</div>
<p>Finally, we’ll compare this hand calculation to the results of a regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?D"> and <img src="https://latex.codecogs.com/png.latex?X"> without an interaction:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(Y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> D <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> X, dat)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
lm(formula = Y ~ D + X, data = dat)

Coefficients:
(Intercept)            D            X  
    -24.302        8.822      -14.614  </code></pre>
</div>
</div>
<p>As promised, the coefficient on <img src="https://latex.codecogs.com/png.latex?D"> equals the variance-weighted average of CATEs that we computed by hand, namely 8.822, which does not equal the ATE, 10.121. Here the CATE for <img src="https://latex.codecogs.com/png.latex?X=1"> receives <em>more weight</em> when the interaction term is omitted, pulling the coefficient on <img src="https://latex.codecogs.com/png.latex?D"> away from the ATE and towards the (smaller) CATE for <img src="https://latex.codecogs.com/png.latex?X=1">.</p>
</section>
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">Conclusion</h2>
<p>I hope this post has convinced you that regression adjustment isn’t simply a matter of tossing a collection of covariates into your regression! In general, the coefficient on <img src="https://latex.codecogs.com/png.latex?D"> in a regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?D"> will <em>not</em> equal the ATE of <img src="https://latex.codecogs.com/png.latex?D">. Instead it will be a weighted average of CATEs. To obtain the ATE we need to include an <em>interaction</em> between <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?D">. The simplest way to get your favorite statistical software package to calculate this for you, along with an appropriate standard error, is by <em>de-meaning</em> <img src="https://latex.codecogs.com/png.latex?X"> before including the interaction. And don’t forget that causal inference <em>always requires untestable assumptions</em>, in this case the selection-on-observables assumption. While implementation details are important, getting them right won’t make any difference if you’re not adjusting for the right covariates in the first place.</p>
</section>
<section id="appendix-the-missing-algebra" class="level2">
<h2 class="anchored" data-anchor-id="appendix-the-missing-algebra">Appendix: The Missing Algebra</h2>
<p>This section provides the algebra needed to justify the expression for <img src="https://latex.codecogs.com/png.latex?%5Calpha_1"> from a regression that omits the interaction between <img src="https://latex.codecogs.com/png.latex?D"> and <img src="https://latex.codecogs.com/png.latex?X">. In particular, we will show that <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Ctext%7BCov%7D(Y,%5Ctilde%7BD%7D)%7D%7B%5Ctext%7BVar%7D(%5Ctilde%7BD%7D)%7D%20=%20%5Cfrac%7B%5Cmathbb%7BE%7D%5B%5Ctext%7BVar%7D(D%7CX)(%5Cbeta_1%20+%20%5Cbeta_3%20X)%5D%7D%7B%5Cmathbb%7BE%7D%5B%5Ctext%7BVar%7D(D%7CX)%5D%7D.%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BD%7D"> is the error term from a population linear regression of <img src="https://latex.codecogs.com/png.latex?D"> on <img src="https://latex.codecogs.com/png.latex?X">, namely <img src="https://latex.codecogs.com/png.latex?D%20=%20%5Cgamma_0%20+%20%5Cgamma_1%20X%20+%20%5Ctilde%7BD%7D"> so that <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(%5Ctilde%7BD%7D)%20=%20%5Cmathbb%7BE%7D(X%5Ctilde%7BD%7D)%20=%200"> by construction. The proof isn’t too difficult, but it’s a bit tedious so I thought you might prefer to skip it on a first reading. Still here? Great! Let’s dive into the algebra.</p>
<p>We need to calculate <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(Y,%20%5Ctilde%7BD%7D)"> and <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BVar%7D(%5Ctilde%7BD%7D)">. A nice way to carry out this calculation is by applying the <a href="https://en.wikipedia.org/wiki/Law_of_total_covariance">law of total covariance</a>. You may have heard of the law of total variance, but in my view the law of total covariance is more useful. Just as you can deduce all the properties of variance from the properties of covariance, using <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(W,%20W)%20=%20%5Ctext%7BVar%7D(W)">, you can deduce the law of total variance from the law of total covariance! In the present example, the law of total covariance allows us to write <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BCov%7D(Y,%20%5Ctilde%7BD%7D)%20=%20%5Cmathbb%7BE%7D%5B%5Ctext%7BCov%7D(Y,%20%5Ctilde%7BD%7D%7CX)%5D%20+%20%5Ctext%7BCov%7D%5B%5Cmathbb%7BE%7D(Y%7CX),%20%5Cmathbb%7BE%7D(%5Ctilde%7BD%7D%7CX)%5D.%0A"> If this looks intimidating, don’t worry: we’ll break it down piece by piece. 
The second term on the RHS is a covariance between two random variables: <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y%7CX)"> and <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(%5Ctilde%7BD%7D%7CX)">.<sup>8</sup> We already have an equation for <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BD%7D">, namely the population linear regression of <img src="https://latex.codecogs.com/png.latex?D"> on <img src="https://latex.codecogs.com/png.latex?X">, so let’s use it to simplify <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(%5Ctilde%7BD%7D%7CX)">: <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(%5Ctilde%7BD%7D%7CX)%20=%20%5Cmathbb%7BE%7D(D%20-%20%5Cgamma_0%20-%20%5Cgamma_1%20X%7CX)%20=%20%5Cmathbb%7BE%7D(D%7CX)%20-%20%5Cgamma_0%20-%20%5Cgamma_1%20X.%0A"> Here’s the key thing to note: since <img src="https://latex.codecogs.com/png.latex?X"> is binary, the population linear regression of <img src="https://latex.codecogs.com/png.latex?D"> on <img src="https://latex.codecogs.com/png.latex?X"> is <em>identical</em> to the conditional mean of <img src="https://latex.codecogs.com/png.latex?D"> given <img src="https://latex.codecogs.com/png.latex?X">.<sup>9</sup> This tells us that <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(%5Ctilde%7BD%7D%7CX)=0">. Since the covariance of anything with a constant is zero, the second term on the RHS of the law of total covariance drops out, leaving us with <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BCov%7D(Y,%20%5Ctilde%7BD%7D)%20=%20%5Cmathbb%7BE%7D%5B%5Ctext%7BCov%7D(Y,%20%5Ctilde%7BD%7D%7CX)%5D%20=%20%5Cmathbb%7BE%7D%5B%5Ctext%7BCov%7D(Y,%20D%20-%20%5Cgamma_0%20-%20%5Cgamma_1%20X%20%7C%20X)%5D.%0A"> Now let’s deal with the conditional covariance inside the expectation. 
Remember that conditioning on <img src="https://latex.codecogs.com/png.latex?X"> is equivalent to saying “suppose that <img src="https://latex.codecogs.com/png.latex?X"> were known”. Anything that’s known is constant, not random. So we can treat <em>both</em> <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?%5Cgamma_0%20+%20%5Cgamma_1%20X"> as constants and apply the usual rules for covariance to obtain <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BCov%7D(Y,%20D%20-%20%5Cgamma_0%20-%20%5Cgamma_1%20X%20%7C%20X)%20=%20%5Ctext%7BCov%7D(Y,%20D%7CX).%0A"> Therefore, <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(Y,%20%5Ctilde%7BD%7D)%20=%20%5Cmathbb%7BE%7D%5B%5Ctext%7BCov%7D(Y,%20D%7CX)%5D">. A very similar calculation using the <a href="https://en.wikipedia.org/wiki/Law_of_total_variance">law of total variance</a> gives <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Ctext%7BVar%7D(%5Ctilde%7BD%7D)%20&amp;=%20%5Cmathbb%7BE%7D%5B%5Ctext%7BVar%7D(%5Ctilde%7BD%7D%7CX)%5D%20+%20%5Ctext%7BVar%7D%5B%5Cmathbb%7BE%7D(%5Ctilde%7BD%7D%7CX)%5D%20=%5Cmathbb%7BE%7D%5B%5Ctext%7BVar%7D(%5Ctilde%7BD%7D%7CX)%5D%5C%5C%0A&amp;=%20%5Cmathbb%7BE%7D%5B%5Ctext%7BVar%7D(D%20-%20%5Cgamma_0%20-%20%5Cgamma_1%20X%7C%20X)%5D%5C%5C%0A&amp;=%20%5Cmathbb%7BE%7D%5B%5Ctext%7BVar%7D(D%7CX)%5D%0A%5Cend%7Baligned%7D%0A"> since <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(%5Ctilde%7BD%7D%7CX)%20=%200"> and the variance of any constant is simply zero. So, with the help of the laws of total covariance and variance, we’ve established that<br>
<img src="https://latex.codecogs.com/png.latex?%0A%5Calpha_1%20%5Cequiv%20%5Cfrac%7B%5Ctext%7BCov%7D(Y,%20%5Ctilde%7BD%7D)%7D%7B%5Ctext%7BVar%7D(%5Ctilde%7BD%7D)%7D=%20%5Cfrac%7B%5Cmathbb%7BE%7D%5B%5Ctext%7BCov%7D(Y,%20D%7CX)%5D%7D%7B%5Cmathbb%7BE%7D%5B%5Ctext%7BVar%7D(D%7CX)%5D%7D%0A"> in this example. Note that this does <em>not</em> hold in general: it relies on the fact that <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(%5Ctilde%7BD%7D%7CX)=0">, which holds in our example because <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(D%7CX)%20=%20%5Cgamma_0%20+%20%5Cgamma_1%20X"> given that <img src="https://latex.codecogs.com/png.latex?X"> is binary.</p>
<p>We’re very nearly finished. All that remains is to simplify the numerator. To do this, we’ll use the equality <img src="https://latex.codecogs.com/png.latex?%0AY%20=%20%5Cbeta_0%20+%20%5Cbeta_1%20D%20+%20%5Cbeta_2%20X%20+%20%5Cbeta_3%20DX%20+%20U%0A"> where <img src="https://latex.codecogs.com/png.latex?U%20%5Cequiv%20Y%20-%20%5Cmathbb%7BE%7D(Y%7CD,%20X)"> satisfies <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(U%7CD,X)%20=%200"> <a href="../../post/why-econometrics-is-confusing-part-1-the-error-term/">by construction</a>. This allows us to write <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Ctext%7BCov%7D(Y,%20D%7CX)%20&amp;=%20%5Ctext%7BCov%7D(%5Cbeta_0%20+%20%5Cbeta_1%20D%20+%20%5Cbeta_2%20X%20+%20%5Cbeta_3%20DX%20+%20U,%20D%7CX)%5C%5C%0A&amp;=%20%5Cbeta_1%20%5Ctext%7BCov%7D(D,%20D%7CX)%20+%20%5Cbeta_3%20%5Ctext%7BCov%7D(DX,%20D%7CX)%20+%20%5Ctext%7BCov%7D(U,D%7CX)%5C%5C%0A&amp;=%20%5Cbeta_1%20%5Ctext%7BVar%7D(D%7CX)%20+%20%5Cbeta_3%20X%20%5Ccdot%20%5Ctext%7BVar%7D(D%7CX)%20+%20%5Ctext%7BCov%7D(U,D%7CX)%5C%5C%0A&amp;=%20%5Ctext%7BVar%7D(D%7CX)(%5Cbeta_1%20+%20%5Cbeta_3%20X)%20+%20%5Ctext%7BCov%7D(U,%20D%7C%20X).%0A%5Cend%7Baligned%7D%0A"> So what about that pesky <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(U,D%7CX)"> term? By the law of iterated expectations this turns out to equal zero, since <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Ctext%7BCov%7D(U,D%7CX)%20&amp;=%20%5Cmathbb%7BE%7D(DU%7CX)%20-%20%5Cmathbb%7BE%7D(D%7CX)%20%5Cmathbb%7BE%7D(U%7CX)%5C%5C%0A&amp;=%20%5Cmathbb%7BE%7D_%7BD%7CX%7D%5BD%5Cmathbb%7BE%7D(U%7CD,X)%5D%20-%20%5Cmathbb%7BE%7D(D%7CX)%20%5Cmathbb%7BE%7D_%7BD%7CX%7D%5B%5Cmathbb%7BE%7D(U%7CD,X)%5D%0A%5Cend%7Baligned%7D%0A"> and, again, <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(U%7CD,X)%20=%200"> <a href="../../post/why-econometrics-is-confusing-part-1-the-error-term/">by construction</a>. 
So we’re left with <img src="https://latex.codecogs.com/png.latex?%0A%5Calpha_1%20=%20%5Cfrac%7B%5Cmathbb%7BE%7D%5B%5Ctext%7BCov%7D(Y,%20D%7CX)%5D%7D%7B%5Cmathbb%7BE%7D%5B%5Ctext%7BVar%7D(D%7CX)%5D%7D%20=%20%5Cfrac%7B%5Cmathbb%7BE%7D%5B%5Ctext%7BVar%7D(D%7CX)(%5Cbeta_1%20+%20%5Cbeta_3%20X)%5D%7D%7B%5Cmathbb%7BE%7D%5B%5Ctext%7BVar%7D(D%7CX)%5D%7D.%0A"></p>
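<p>As a quick sanity check on this formula, here is a minimal Python sketch that computes both sides exactly for a toy population in which <code>D</code> and <code>X</code> are both binary. All of the parameter values (<code>q</code>, <code>p</code>, and <code>b0</code> through <code>b3</code>) are hypothetical and chosen only for illustration. Because the error term is mean-zero given <code>D</code> and <code>X</code>, it drops out of any covariance with a function of <code>(D, X)</code>, so the sketch works with the conditional mean of <code>Y</code> directly rather than simulating data.</p>

```python
from itertools import product

# Hypothetical population, for illustration only.
q = 0.4                              # P(X = 1)
p = {0: 0.2, 1: 0.5}                 # P(D = 1 | X = x)
b0, b1, b2, b3 = 1.0, 2.0, -0.5, 1.5  # coefficients of E(Y | D, X)


def prob(d, x):
    """Joint probability P(D = d, X = x)."""
    px = q if x == 1 else 1 - q
    pd = p[x] if d == 1 else 1 - p[x]
    return px * pd


def mean(g):
    """Exact expectation of g(D, X) over the four support points."""
    return sum(prob(d, x) * g(d, x) for d, x in product((0, 1), repeat=2))


def cov(g, h):
    """Exact covariance of g(D, X) and h(D, X)."""
    return mean(lambda d, x: g(d, x) * h(d, x)) - mean(g) * mean(h)


def D(d, x):
    return d


def X(d, x):
    return x


def m(d, x):
    """Conditional mean E(Y | D = d, X = x)."""
    return b0 + b1 * d + b2 * x + b3 * d * x


# Population linear regression of D on X and its residual.
gamma1 = cov(D, X) / cov(X, X)
gamma0 = mean(D) - gamma1 * mean(X)


def D_tilde(d, x):
    return d - gamma0 - gamma1 * x


# Left-hand side: Cov(Y, D_tilde) / Var(D_tilde).
alpha1_direct = cov(m, D_tilde) / cov(D_tilde, D_tilde)

# Right-hand side: variance-weighted average of (b1 + b3 * X).
px = {0: 1 - q, 1: q}
var_d = {x: p[x] * (1 - p[x]) for x in (0, 1)}   # Var(D | X = x)
numerator = sum(px[x] * var_d[x] * (b1 + b3 * x) for x in (0, 1))
denominator = sum(px[x] * var_d[x] for x in (0, 1))
alpha1_formula = numerator / denominator

print(alpha1_direct, alpha1_formula)
```

<p>The two printed numbers should agree up to floating-point error. Changing <code>p</code> changes the weights <code>Var(D | X = x)</code>, which is a simple way to see how the weighting shifts between the two values of <code>X</code>.</p>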


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>See <a href="../../post/a-good-instrument-is-a-bad-control/">this post</a> for a prototypical example of a “bad control” and the second half of my <a href="https://www.treatment-effects.com/02-selection-on-observables.pdf">slides</a> for some general discussion of “bad controls.” These <a href="https://ditraglia.com/erm/15-selection-on-observables.pdf">alternative slides</a> from my <a href="https://ditraglia.com/erm/">core ERM</a> course cover similar ground but make a more explicit connection to good and bad advice about bad controls that one encounters in introductory econometrics books.↩︎</p></li>
<li id="fn2"><p>Call it “Frisch-Waugh-Lovell” if you must, but I will continue trying to <a href="https://ideas.repec.org/p/arx/papers/2307.00369.html">make fetch happen</a>.↩︎</p></li>
<li id="fn3"><p>If you want the standard error of <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> and not just the point estimate, then replace <img src="https://latex.codecogs.com/png.latex?Y"> with the residual from a regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?X">.↩︎</p></li>
<li id="fn4"><p>This is a nice homework exercise to test your understanding of the post!↩︎</p></li>
<li id="fn5"><p>If you have a very <em>large</em> number of categories things are still fine <em>in theory</em> but can break down in practice, since you’ll typically have very few observations in each “cell” corresponding to the different values of the categorical variables. But this is a topic for another day!↩︎</p></li>
<li id="fn6"><p>I certainly don’t!↩︎</p></li>
<li id="fn7"><p>If you’re rusty on R’s formula syntax, you may find my <a href="../../post/the-r-formula-cheatsheet/">cheat sheet</a> helpful.↩︎</p></li>
<li id="fn8"><p>An unconditional expectation like <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y)"> is a constant: it’s a probability-weighted average of all possible realizations of <img src="https://latex.codecogs.com/png.latex?Y">. In contrast, a conditional expectation like <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y%7CX)"> is a random variable: it’s our “best guess” of <img src="https://latex.codecogs.com/png.latex?Y"> based on observing <img src="https://latex.codecogs.com/png.latex?X">, where “best” means “minimum mean-squared error”. See <a href="https://youtu.be/CbsZHNQX54s?si=MN80w00yj1W5yDmX">this video</a> for some more details on conditional expectation.↩︎</p></li>
<li id="fn9"><p>In general, a population linear regression gives the best linear approximation of the conditional mean, but when the conditional mean is in fact linear, the two coincide. The reason these coincide in our example is that we can write <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D%5BD%7CX%5D%20=%20X%20%5Cmathbb%7BE%7D(D%7CX=1)%20+%20(1%20-%20X)%20%5Cmathbb%7BE%7D(D%7CX=0)">. There are only two values that <img src="https://latex.codecogs.com/png.latex?X"> can take, and we are simply “picking out” the average value of <img src="https://latex.codecogs.com/png.latex?D"> in each case. But we can re-arrange this to take precisely the form <img src="https://latex.codecogs.com/png.latex?%5Cdelta%20+%20%5Ckappa%20X"> defining <img src="https://latex.codecogs.com/png.latex?%5Cdelta%20=%20%5Cmathbb%7BE%7D(D%7CX=0)"> and <img src="https://latex.codecogs.com/png.latex?%5Ckappa%20=%20%5Cmathbb%7BE%7D(D%7CX=1)%20-%20%5Cmathbb%7BE%7D(D%7CX=0)">.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>causal inference</category>
  <guid>https://www.econometrics.blog/post/how-to-do-regression-adjustment/</guid>
  <pubDate>Fri, 02 Aug 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Is it better to improve sensitivity or specificity?</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/is-it-better-to-improve-sensitivity-or-specificity/</link>
  <description><![CDATA[ 




<p>Here’s a slightly unusual exercise on the topic of Bayes’ Theorem for those of you teaching or studying introductory probability. Imagine that you’re developing a diagnostic test for a disease. The test is very simple: it either comes back positive or negative. You have a choice between slightly increasing either your test’s <a href="https://en.wikipedia.org/wiki/Sensitivity_and_specificity">sensitivity or its specificity</a>. If your goal is to maximize the <a href="https://en.wikipedia.org/wiki/Positive_and_negative_predictive_values">positive predictive value (PPV)</a> of your test, i.e.&nbsp;the probability that a patient has the disease given that the test comes back positive, which test characteristic should you choose to improve?</p>
<section id="an-open-invitation" class="level2">
<h2 class="anchored" data-anchor-id="an-open-invitation">An Open Invitation</h2>
<p>If you’re still hungry for more Bayes’ Theorem after reading this post, then why not join the <em>Summer of Bayes 2024</em> online reading group? If you’d like to be added to the mailing list, just send an email to <code>bayes [at] user.sent.as</code>. Recordings of past sessions along with slides and other materials are available to group members via the Summer of Bayes discussion board. And now back to your regularly-scheduled blog content…</p>
</section>
<section id="odds-arent-so-odd" class="level2">
<h2 class="anchored" data-anchor-id="odds-arent-so-odd">Odds aren’t so odd!</h2>
<p>While I give you a few minutes to pause and ponder this question, here’s a brief rant on the topic of odds. If you’re anything like me, the first time you encountered odds, you thought to yourself</p>
<blockquote class="blockquote">
<p>What is this $*@%^!? Why would anyone want to spoil a perfectly good probability by dividing it by one minus itself?<sup>1</sup></p>
</blockquote>
<p>But it’s time to take the red pill and see the world as it really is: <strong>the only reason you prefer to think in terms of probabilities rather than odds is because you’ve been brainwashed by the educational system</strong>. Of course I exaggerate slightly, but the point is that odds are just as natural as probabilities; we’re just not as accustomed to working with them. In many situations in probability, statistics, and econometrics, it turns out that working with odds (or their logarithm) makes life <em>much</em> simpler, as I will try to convince you with a simple example.</p>
<p>First we need to define odds. Consider some event <img src="https://latex.codecogs.com/png.latex?A"> with probability <img src="https://latex.codecogs.com/png.latex?p"> of occurring. Then we say that the <strong>odds</strong> of <img src="https://latex.codecogs.com/png.latex?A"> are <img src="https://latex.codecogs.com/png.latex?p/(1%20-%20p)">. For example, if <img src="https://latex.codecogs.com/png.latex?p%20=%201/3"> then the event <img src="https://latex.codecogs.com/png.latex?A"> is equivalent to drawing a red ball from an urn that contains one red and two blue balls: the probability gives the <em>ratio of red balls to total balls</em>. The odds of <img src="https://latex.codecogs.com/png.latex?A">, on the other hand, equal <img src="https://latex.codecogs.com/png.latex?1/2">: odds give the <em>ratio of red balls to blue balls</em>. Since probabilities are between 0 and 1, odds are between 0 and <img src="https://latex.codecogs.com/png.latex?%5Cinfty">. Odds of 0 mean that the event is impossible, while odds of <img src="https://latex.codecogs.com/png.latex?%5Cinfty"> mean that the event is certain. Odds of 1 mean that the event is just as likely to occur as not to occur.</p>
<p>Now here’s an example that you’ve surely seen before:</p>
<blockquote class="blockquote">
<p>One in a hundred women has breast cancer <img src="https://latex.codecogs.com/png.latex?(B)">. If you have breast cancer, there is a 95% chance that you will test positive <img src="https://latex.codecogs.com/png.latex?(+)">; if you do not have breast cancer <img src="https://latex.codecogs.com/png.latex?(B%5EC)">, there is a 2% chance that you will nonetheless test positive <img src="https://latex.codecogs.com/png.latex?(+)">. We know nothing about Alice other than the fact that she tested positive. How likely is it that she has breast cancer?</p>
</blockquote>
<p>It’s easy enough to solve this problem using Bayes’ Theorem, as long as you have pen and paper handy: <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0AP(B%20%7C%20+)%20&amp;=%20%5Cfrac%7BP(+%7CB)P(B)%7D%7BP(+)%7D%20=%20%5Cfrac%7BP(+%7CB)P(B)%7D%7BP(+%7CB)P(B)%20+%20P(+%7CB%5EC)P(B%5EC)%7D%5C%5C%0A&amp;=%20%5Cfrac%7B0.95%20%5Ctimes%200.01%7D%7B0.95%20%5Ctimes%200.01%20+%200.02%20%5Ctimes%200.99%7D%20%5Capprox%200.32.%0A%5Cend%7Baligned%7D%0A"> But what if I asked you how the result would change if only one in a thousand women had breast cancer? What if I changed the <a href="https://en.wikipedia.org/wiki/Sensitivity_and_specificity">sensitivity</a> of the test from 95% to 99% or the <a href="https://en.wikipedia.org/wiki/Sensitivity_and_specificity">specificity</a> from 98% to 95%? If you’re anything like me, you would struggle to do these calculations in your head. That’s because <img src="https://latex.codecogs.com/png.latex?P(B%7C+)"> is a <em>highly non-linear</em> function of <img src="https://latex.codecogs.com/png.latex?P(B)">, <img src="https://latex.codecogs.com/png.latex?P(+%7CB)">, and <img src="https://latex.codecogs.com/png.latex?P(+%7CB%5EC)">.</p>
<p>In contrast, working with odds makes this problem a snap. The key point is that <img src="https://latex.codecogs.com/png.latex?P(B%7C+)"> and <img src="https://latex.codecogs.com/png.latex?P(B%5EC%7C+)"> have the same denominator, namely <img src="https://latex.codecogs.com/png.latex?P(+)">: <img src="https://latex.codecogs.com/png.latex?%0AP(B%20%7C%20+)%20=%20%5Cfrac%7BP(+%7CB)P(B)%7D%7BP(+)%7D,%20%5Cquad%0AP(B%5EC%20%7C%20+)%20=%20%5Cfrac%7BP(+%7CB%5EC)P(B%5EC)%7D%7BP(+)%7D%0A"> Notice that <img src="https://latex.codecogs.com/png.latex?P(+)"> was the “complicated” term in <img src="https://latex.codecogs.com/png.latex?P(B%7C+)">; the numerator was simple. Since the odds of <img src="https://latex.codecogs.com/png.latex?B"> given <img src="https://latex.codecogs.com/png.latex?(+)"> is defined as the ratio of <img src="https://latex.codecogs.com/png.latex?P(B%7C+)"> to <img src="https://latex.codecogs.com/png.latex?P(B%5EC%7C+)">, the denominator cancels and we’re left with <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BOdds%7D(B%7C+)%20%5Cequiv%20%5Cfrac%7BP(B%7C+)%7D%7BP(B%5EC%7C+)%7D%20=%20%5Cfrac%7BP(+%7CB)%7D%7BP(+%7CB%5EC)%7D%20%5Ctimes%20%5Cfrac%7BP(B)%7D%7BP(B%5EC)%7D.%0A"> In other words, the <em>posterior odds</em> of <img src="https://latex.codecogs.com/png.latex?B"> equal the <em>likelihood ratio</em>, <img src="https://latex.codecogs.com/png.latex?P(+%7CB)/P(+%7CB%5EC)">, multiplied by the <em>prior odds</em> of <img src="https://latex.codecogs.com/png.latex?B">, <img src="https://latex.codecogs.com/png.latex?P(B)/P(B%5EC)">: <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BPosterior%20Odds%7D%20=%20%5Ctext%7B(Likelihood%20Ratio)%7D%20%5Ctimes%20%5Ctext%7B(Prior%20Odds)%7D.%0A"> Now we can easily solve the original problem in our head. The prior odds are 1/99 while the likelihood ratio is 95/2. Rounding these to 0.01 and 50 respectively, we find that the posterior odds are around 1/2. 
This means that Alice’s chance of having breast cancer is roughly equivalent to the chance of drawing a red ball from an urn with one red and two blue balls. There’s no need to convert this back to a probability since we can already answer the question: it’s considerably more likely that Alice <em>does not</em> have breast cancer. But if you insist, odds of 1/2 give a probability of 1/3, so in spite of rounding and calculating in our heads we’re within about one percentage point of the exact answer!</p>
<p>Repeat after me: <strong>odds are on a multiplicative scale</strong>. This is their key virtue and the reason why they make it so easy to explore variations on the original problem. If one in a thousand women has breast cancer, the prior odds become 1/999 so we simply divide our previous result by 10, giving posterior odds of around 1/20. If we instead changed the sensitivity from 95% to 99% and the specificity from 98% to 95%, then the likelihood ratio would change from <img src="https://latex.codecogs.com/png.latex?95/2%20%5Capprox%2050"> to <img src="https://latex.codecogs.com/png.latex?99/5%20%5Capprox%2020">.</p>
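<p>If you’d rather let a computer do the multiplying, the odds form of Bayes’ Theorem takes only a few lines of Python. This is a minimal sketch: the function names are my own, and the three scenarios are the ones discussed above.</p>

```python
def prob_to_odds(p):
    """Convert a probability into odds p / (1 - p)."""
    return p / (1 - p)


def odds_to_prob(odds):
    """Convert odds back into a probability."""
    return odds / (1 + odds)


def posterior_odds(prevalence, sensitivity, specificity):
    """Posterior odds = likelihood ratio * prior odds."""
    likelihood_ratio = sensitivity / (1 - specificity)
    return likelihood_ratio * prob_to_odds(prevalence)


def posterior_prob_bayes(prevalence, sensitivity, specificity):
    """Posterior probability from the usual form of Bayes' Theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# (prevalence, sensitivity, specificity) for each variation in the text.
scenarios = {
    "original problem": (0.01, 0.95, 0.98),
    "one in a thousand": (0.001, 0.95, 0.98),
    "sensitivity 99%, specificity 95%": (0.01, 0.99, 0.95),
}

for label, args in scenarios.items():
    odds = posterior_odds(*args)
    print(f"{label}: odds = {odds:.3f}, probability = {odds_to_prob(odds):.3f}")
```

<p>Converting the posterior odds back with <code>odds_to_prob</code> gives the same answer as <code>posterior_prob_bayes</code>, which is a handy check that the two forms of Bayes’ Theorem really are equivalent.</p>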
</section>
<section id="the-solution" class="level2">
<h2 class="anchored" data-anchor-id="the-solution">The Solution</h2>
<p>Have I given you enough time to come up with your own solution? Fantastic! In case you hadn’t already guessed, that little digression about odds served an important purpose: my solution will use odds rather than probabilities. Our goal is to increase the positive predictive value (PPV) of the test, namely <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BPPV%7D%20%5Cequiv%20P(%5Ctext%7BHas%20Disease%7D%7C%5Ctext%7BTest%20Positive%7D),%0A"> by as much as possible, either by improving the test’s sensitivity <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BSensitivity%7D%20%5Cequiv%20P(%5Ctext%7BTest%20Positive%7D%20%7C%20%5Ctext%7BHas%20Disease%7D)%0A"> or its specificity <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BSpecificity%7D%20%5Cequiv%20P(%5Ctext%7BTest%20Negative%7D%20%7C%20%5Ctext%7BDoesn't%20Have%20Disease%7D).%0A"> To answer this question, we’ll start by substituting these definitions into the odds form of Bayes’ Theorem introduced above, yielding <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BPosterior%20Odds%7D%20=%20%5Cfrac%7B%5Ctext%7BPPV%7D%7D%7B1%20-%20%5Ctext%7BPPV%7D%7D%20=%20%5Cfrac%7B%5Ctext%7BSensitivity%7D%7D%7B1%20-%20%5Ctext%7BSpecificity%7D%7D%20%5Ctimes%20%5Ctext%7BPrior%20Odds%7D.%0A"> This expression makes it clear that increasing either the sensitivity or specificity of the test increases the posterior odds. And because the PPV is a <em>strictly increasing</em> function of the posterior odds, namely <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BPPV%7D%20=%20%5Cfrac%7B%5Ctext%7BPosterior%20Odds%7D%7D%7B1%20+%20%5Ctext%7BPosterior%20Odds%7D%7D,%0A"> this also increases the PPV. So now the question is: which of these two possibilities gives us the most bang for our buck? A natural idea would be to compare the <em>marginal effect</em> of increasing sensitivity by a small amount to the marginal effect of increasing specificity by the same amount. 
We can do this by comparing the <em>partial derivatives</em> of the PPV with respect to sensitivity and specificity. But, again, the PPV is an <em>increasing</em> function of the posterior odds, so we can simplify our task by comparing the <em>derivatives</em> of the posterior odds with respect to sensitivity and specificity. By the chain rule, any claim about the relative magnitudes of these derivatives computed for the odds will also hold for the PPV.</p>
<p>But why stop with the odds? We can simplify our task even further by comparing the <em>derivatives</em> of the <em>logarithm</em> of the posterior odds with respect to sensitivity and specificity. This is because the logarithm is, again, an <em>increasing transformation</em> of the odds. Since <img src="https://latex.codecogs.com/png.latex?%0A%5Clog(%5Ctext%7BPosterior%20Odds%7D)%20=%20%5Clog(%5Ctext%7BSensitivity%7D)%20-%20%5Clog(1%20-%20%5Ctext%7BSpecificity%7D)%20%20+%20%5Clog(%5Ctext%7BPrior%20Odds%7D).%0A"> our required derivatives are <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Cpartial%20%5Clog(%5Ctext%7BPosterior%20Odds%7D)%7D%7B%5Cpartial%20%5Ctext%7BSensitivity%7D%7D%20=%20%5Cfrac%7B1%7D%7B%5Ctext%7BSensitivity%7D%7D%20%5Cquad%20%5Ctext%7Band%7D%20%5Cquad%20%5Cfrac%7B%5Cpartial%20%5Clog(%5Ctext%7BPosterior%20Odds%7D)%7D%7B%5Cpartial%20%5Ctext%7BSpecificity%7D%7D%20=%20%5Cfrac%7B1%7D%7B1%20-%20%5Ctext%7BSpecificity%7D%7D.%0A"> Now for the punchline: the ratio of the derivative with respect to specificity divided by that with respect to sensitivity is <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Cpartial%20%5Clog(%5Ctext%7BPosterior%20Odds%7D)/%5Cpartial%20%5Ctext%7BSpecificity%7D%7D%7B%5Cpartial%20%5Clog(%5Ctext%7BPosterior%20Odds%7D)/%5Cpartial%20%5Ctext%7BSensitivity%7D%7D%20=%20%5Cfrac%7B1/(1%20-%20%5Ctext%7BSpecificity%7D)%7D%7B1/%5Ctext%7BSensitivity%7D%7D%20=%20%5Cfrac%7B%5Ctext%7BSensitivity%7D%7D%7B1%20-%20%5Ctext%7BSpecificity%7D%7D%0A"> and this is <em>precisely the likelihood ratio</em> from the odds form of Bayes’ Theorem! Hence, <strong>whenever the likelihood ratio is greater than one we’d prefer to increase the test’s specificity; whenever it’s less than one we’d prefer to increase the sensitivity.</strong> If the likelihood ratio is equal to one, then it doesn’t matter which we choose.</p>
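<p>If you don’t trust the calculus, you can check the punchline numerically. The following Python sketch approximates the partial derivatives of the PPV itself, rather than the log odds, using central finite differences, and compares their ratio to the likelihood ratio. The baseline values are taken from the breast cancer example above; the step size <code>h</code> and the function names are my own choices for illustration.</p>

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via the odds form of Bayes' Theorem."""
    prior_odds = prevalence / (1 - prevalence)
    odds = sensitivity / (1 - specificity) * prior_odds
    return odds / (1 + odds)


def central_diff(f, x, h=1e-6):
    """Central finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Baseline values from the breast cancer example.
sens, spec, prev = 0.95, 0.98, 0.01

# Marginal effect on the PPV of a small increase in each characteristic.
d_sens = central_diff(lambda s: ppv(s, spec, prev), sens)
d_spec = central_diff(lambda c: ppv(sens, c, prev), spec)

derivative_ratio = d_spec / d_sens
likelihood_ratio = sens / (1 - spec)

print(f"dPPV/dSpecificity = {d_spec:.4f}")
print(f"dPPV/dSensitivity = {d_sens:.4f}")
print(f"ratio of derivatives = {derivative_ratio:.3f}")
print(f"likelihood ratio     = {likelihood_ratio:.3f}")
```

<p>The ratio of the two numerical derivatives should match the likelihood ratio up to approximation error, as the chain rule argument above implies.</p>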
<p>Case closed, right? Well, not quite. We can say a bit more by thinking about what it <em>means</em> for the likelihood ratio to be greater than or less than one. Examining the odds form of Bayes’ Theorem from above, we see that a likelihood ratio less than one means that our posterior probability that a person is sick <em>falls</em> when she tests positive. In other words, this corresponds to a test that is <em>worse than useless</em>: it’s actually <em>misleading</em>. In contrast, a likelihood ratio greater than one means that the test is <em>informative</em>: a positive test result increases our belief that the person is sick. Any real-world diagnostic test will have a likelihood ratio greater than one. Indeed, if we <em>had</em> such an actively misleading test, we could easily convert it into an informative one by simply reversing the test’s outcome: if someone tests positive, we tell them they’re negative, and vice versa. This reversal would result in a likelihood ratio greater than one. Therefore, in all cases–whether we start with an informative test or reverse a misleading one–<strong>we should prefer to increase the test’s specificity</strong>.</p>
</section>
<section id="epilogue" class="level2">
<h2 class="anchored" data-anchor-id="epilogue">Epilogue</h2>
<p>Of course, this exercise is predicated upon the assumption that we want to maximize the PPV and that we can freely adjust both the test’s sensitivity and its specificity. In practice, one or more of these assumptions might not hold. Indeed, PPV is not the be all and end all of diagnostic testing. A full accounting would need to consider the relative costs of false positives and false negatives along with the prevalence of the disease. Still, I hope this exercise gives you a flavor of the power of odds for simplifying complex problems in probability and statistics.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>I know first-hand that this sentiment is shared by at least one distinguished professor of probability theory, so at least I’m not completely alone in my earlier view of things!↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>statistics</category>
  <category>teaching</category>
  <guid>https://www.econometrics.blog/post/is-it-better-to-improve-sensitivity-or-specificity/</guid>
  <pubDate>Thu, 25 Jul 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>How to Read an Econometrics Paper</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/how-to-read-an-econometrics-paper/</link>
  <description><![CDATA[ 




<p>Reading and understanding econometrics papers can be hard work. Most published articles, even review articles, are written by specialists for specialists. Unless you’re already familiar with the literature, it can be a real uphill battle to make it through a recent paper. In grad school I remember our professors repeatedly admonishing me and the rest of the cohort to “read the papers!” But when I did my best to follow this advice, I nearly always felt like I was banging my head against a wall.</p>
<!--Giving a helpful presentation about an econometrics paper is even harder. You don't only need to understand the paper itself, you also need to understand the *needs of your audience*.-->
<p>Effective reading is a skill that can be learned, and the only way to learn is through practice. But you can learn the easy way or the hard way. The hard way is to keep trying and hope for the best; the easy way is to adjust your approach based on the experiences of others. With that in mind, this post offers some tips and tricks that I’ve picked up through the years for reading technical material efficiently and effectively. My target audience is PhD students in Economics, especially students in the Econometrics Reading Group at Oxford, but I hope that some of the following tips will be helpful for others as well.</p>
<p>If you have any tips of your own, or if you violently agree or disagree with any of mine, I hope to hear from you in the comments section below!</p>
<section id="read-something-else-instead" class="level2">
<h2 class="anchored" data-anchor-id="read-something-else-instead">Read Something Else Instead</h2>
<p>The first question to ask yourself is whether you should even be reading this paper in the first place. Just because White’s (1980) paper on heteroskedasticity-robust standard errors is a “classic” in econometrics, that doesn’t mean that you should read it. In fact, as a graduate student just starting out, you probably shouldn’t! The paper that introduces a new idea or procedure is rarely the paper that gives the clearest explanation. Reading a good textbook explanation is a much more effective way to get to grips with a new idea. You might, for example, try reading the relevant chapters in White’s textbook <em>Asymptotic Theory for Econometricians</em> instead.</p>
<p>But sometimes you have to read a particular paper. Maybe it’s the paper you’ve been assigned to present in a reading group, or maybe it’s highly relevant to your own research. In that case you may still want to <em>start</em> by reading something else. For example, there might be a more recent paper or review article that gives a good summary of the idea or method in question. Reading this paper first can make it much easier for you to tackle the original paper.</p>
<p>So to all those professors out there who keep telling their students to “read the papers!” I say: “read the papers, but only after you’ve read something else first!”</p>
<section id="dont-assume-you-have-to-understand-the-whole-thing" class="level3">
<h3 class="anchored" data-anchor-id="dont-assume-you-have-to-understand-the-whole-thing">Don’t Assume You Have to Understand the Whole Thing</h3>
<p>As a general rule you should <em>not expect to understand everything</em> when you read a paper. You may only get 10% on the first read, but that’s fine! Besides papers I’ve written myself, there are relatively few articles that I’ve checked line-by-line from start to finish. Even if you’ve been assigned to <em>present</em> a paper that doesn’t mean that you need to understand every detail of every lemma in the online technical appendix. Instead your goal should be to understand the <em>key ideas</em> and contributions of the paper. Like anything in life, there are diminishing returns to effort in reading a paper. When reading papers to support your own research, you can be even more selective. The key question becomes: “how is this relevant for what <em>I’m doing</em>?” It may be that you only need to understand a small part of the paper to get what you need.</p>
</section>
<section id="dont-assume-youre-stupid" class="level3">
<h3 class="anchored" data-anchor-id="dont-assume-youre-stupid">Don’t Assume You’re Stupid</h3>
<p>If you’re confused, don’t assume that it’s your fault. Notice your confusion and try to get to the bottom of it without taking things for granted or engaging in negative self-talk. The only way to learn is by getting confused and then unconfusing yourself!</p>
<p>You may be confused because the authors assume you know something that you don’t. They are likely experts in the field who have spent years thinking about this particular question. You, on the other hand, are just starting out. As you gain a bit more context, things may fall rapidly into place. (See my next tip below.)</p>
<p>You may be confused because the paper is confusingly written. Writing is hard, and technical writing is especially hard. The referee process can even make papers <em>more confusing</em>, since our <a href="https://www.sqare.org">present system for evaluating research</a> involves multiple rounds of revisions in which the authors must try to satisfy referees with differing views. The result is that published papers often contain a substantial element of “cruft” that distracts from the main message.</p>
<p>You may even be confused because the paper is wrong! As a good Bayesian, you shouldn’t immediately jump to the conclusion that you, a newcomer to this field, have stumbled upon a crucial error that everyone else has missed. On the other hand, you <em>definitely</em> shouldn’t believe everything that you see in print! All papers are wrong in some way, and some papers are wrong in serious and important ways. If you’re confused, it’s worth considering whether the authors were confused too!</p>
</section>
<section id="spread-yourself-thin" class="level3">
<h3 class="anchored" data-anchor-id="spread-yourself-thin">Spread Yourself Thin</h3>
<p>Let’s say you really need to get to grips with paper X on topic Y. You’ve read the relevant textbook material, you’ve tried a review article, and you’re still struggling. What now? Strange though it may sound, one helpful answer is to read <em>more papers</em> on topic Y in an extremely shallow way. Skim the abstracts, introductions, and conclusions. Note any terms or concepts that keep appearing, especially ones that you don’t understand.</p>
<p>I can think of many occasions when I skimmed nine papers and didn’t understand any of them, but then read a tenth and suddenly everything clicked. The key here is <em>context</em>. When you’re new to topic Y, there will be lots of little things that you’ve never thought of before but that the literature takes for granted. Since most papers are written for specialists by specialists, crucial details are often left out or glossed over as if they were obvious. Just as fish don’t realize that they’re in water, specialists often fail to realize that they’re taking a lot of things for granted. The reason that reading <em>many</em> papers can help is that different specialists will leave out <em>different</em> details. The key that you need to understand paper X might be a seemingly throwaway comment in paper Z!</p>
</section>
<section id="explain-it-to-someone-else" class="level3">
<h3 class="anchored" data-anchor-id="explain-it-to-someone-else">Explain It to Someone Else</h3>
<p>The best way to understand something is by trying to explain it to someone else. This holds true even when the “someone else” in question is just a figment of your imagination. As you read, start by trying to explain the paper to yourself <em>in your own words</em>. I find it helpful to write in the margins of the paper as I go, summarizing the key ideas with less jargon and simpler terminology and notation. When you’re confused about something, try to put your confusion into words; make it concrete and write it down.</p>
<p>Talking to a real person can be even more helpful. If you’re in a reading group, try discussing the paper informally with one of your peers who has also read it. You may be surprised at how much two people, neither of whom understands something on their own, can learn from each other. In this brave new world of LLMs like Claude and GPT-4o, you could even try uploading your paper and discussing it with an AI. You <em>cannot</em> assume that the AI will necessarily give you reliable information about the paper, but just like a peer who only partially understands it, an AI can be a useful sounding board for your own ideas and confusions. Noticing mistakes in the AI’s understanding, pointing them out and continuing the conversation can also be a great way to clarify your own thinking.</p>
</section>
<section id="head-straight-for-the-simulation-empirical-example" class="level3">
<h3 class="anchored" data-anchor-id="head-straight-for-the-simulation-empirical-example">Head Straight for the Simulation / Empirical Example</h3>
<p>Ideally every paper would have a fantastic introduction that makes it clear what the paper is about and why it’s important. In real life, introductions can be hit-or-miss. So after reading the introduction, you might consider heading straight for the simulation study and/or empirical example. Most econometrics papers propose a method that solves a particular problem. What is the problem, and why does the particular data generating process (DGP) in the simulation (or the real data in the empirical example) exhibit it? What parameters of the simulation DGP control the extent of the problem? What is the “old” method on which the paper improves? This is likely to be something familiar such as a “textbook” method. How exactly is the new method <em>implemented</em>? In other words, how exactly is it <em>computed</em> from real or simulated data? Try to write down all the steps in the implementation in a sufficiently precise way that you could code it yourself.</p>
<p>Once you know how to answer these questions you’re in a much better position to understand the rest of the paper. As you read through the assumptions and theorems, refer back to the simulation study. Why does the DGP satisfy the assumptions? Can you think of a different DGP in which the assumptions fail? Is there anything “fishy” about the simulation example? Does it seem like the authors have cooked the books in some way, e.g.&nbsp;by introducing a very “mild” version of the central problem, or something else that would be unrealistic in practice? Answering these questions will help you to <em>evaluate</em> the paper, understand its limitations and possibly think about how to improve upon it.</p>
</section>
<section id="make-things-simpler" class="level3">
<h3 class="anchored" data-anchor-id="make-things-simpler">Make Things Simpler</h3>
<p>Many econometrics papers present results at an extremely high level of generality. On the one hand this is a good thing. Much of the power of mathematics comes from abstraction, and general results are more widely applicable. But from an expositional standpoint, this is <em>terrible</em>. The history of mathematics is a history of solutions to concrete problems that were progressively generalized and expanded over time. The history of ideas mirrors the way that the average person learns most effectively: by starting with concrete examples and then generalizing.</p>
<p>With this in mind, try to simplify the theorems and examples in the paper. Getting rid of covariates often cuts down on both algebra and notation, so start with this. Try re-writing the assumptions and theorems in this simpler notation. Are some of the assumptions confusing? Try strengthening them or try to see if you can find a concrete example in which they hold, possibly taken from the simulation DGP.</p>
</section>
<section id="dont-get-hung-up-on-technicalities" class="level3">
<h3 class="anchored" data-anchor-id="dont-get-hung-up-on-technicalities">Don’t Get Hung Up on Technicalities</h3>
<p>Some parts of a paper are “core material” and some parts are “technicalities”. Keeping these separate in your mind will make it much easier to understand a paper. One helpful approach is to make a dependency tree of the assumptions, lemmas, and theorems <em>before trying to understand them.</em> Once you see how things fit together you may notice, for example, that the only role of Proposition 3 is to establish that an appropriate Central Limit Theorem holds and the only role of Assumptions 2-6 is to prove Proposition 3. Fantastic! In this case, just assume the conclusion of Proposition 3 and move on to see where this is needed in the <em>core</em> results. Even when you’re reading assumptions, lemmas, propositions, theorems, and proofs, you should be aiming to get the “big picture” rather than to assimilate every tiny detail.</p>
</section>
<section id="be-appropriately-skeptical-of-asymptotics" class="level3">
<h3 class="anchored" data-anchor-id="be-appropriately-skeptical-of-asymptotics">Be Appropriately Skeptical of Asymptotics</h3>
<p>Asymptotics are a crucial tool in econometrics but remember that it is <em>finite sample</em> properties that we actually care about. The “asymptotic distribution” of an estimator is just a thought experiment, not something you can take to the bank. An asymptotic argument is a kind of approximation that in effect supposes that certain things are “negligible.” This approximation could be fantastic or it could be terrible. It’s only through simulation studies that we can really know which is the case. Or, to quote <a href="https://doi.org/10.1017/CBO9780511802256">van der Vaart (1998)</a>,</p>
<blockquote class="blockquote">
<p>strictly speaking, most asymptotic results that are currently available are logically useless. This is because most asymptotic results are limit results, rather than approximations consisting of an approximating formula plus an accurate error bound … This is why there is good asymptotics and bad asymptotics and why two types of asymptotics sometimes lead to conflicting claims … Because it may be theoretically very hard to ascertain that approximation errors are small, one often takes recourse to simulation studies</p>
</blockquote>
<p>For an example of “good” versus “bad” asymptotics applied to power analysis, see <a href="../../post/local-asymptotics-the-simplest-possible-example/">this post</a>.</p>
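<p>As a minimal sketch of the kind of check that a simulation study provides (not taken from any paper discussed here), the following R code estimates the finite-sample coverage of the textbook asymptotic 95% interval for a mean. The Exponential(1) DGP, the sample size of 10 and all object names are illustrative assumptions.</p>
<pre class="sourceCode r"><code class="sourceCode r">set.seed(1234)

# Assumed DGP for illustration: n iid Exponential(1) draws, so the true mean is 1
n &lt;- 10
n_reps &lt;- 10000
true_mean &lt;- 1

covers &lt;- replicate(n_reps, {
  x &lt;- rexp(n, rate = 1)
  half_width &lt;- qnorm(0.975) * sd(x) / sqrt(n)
  # Does the asymptotic interval cover the true mean in this replication?
  abs(mean(x) - true_mean) &lt;= half_width
})

# Simulated coverage probability, to compare against the nominal 0.95
mean(covers)</code></pre>
<p>If the simulated coverage falls well short of 0.95, the asymptotic approximation is unreliable at this sample size; re-running with a larger <code>n</code> shows how quickly it improves.</p>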
<!--
- Assume that your readers haven't read it and don't know anything about the related literature. You will almost certainly be right!
- That said, don't try to *substitute* for reading the paper. Your goal is instead to make it *easier* for someone to read the paper later on and to communicate why the paper is interesting / important / bad / wrong. By all means try to summarize it, but this summary should *not not not* be comprehensive and technical.
- Aim lower than you think you should. You will never aim too low, I promise.
- It may help to spend time on background. E.g. a paper that relies on principal components (diffusion index forecasting): present a quick review / overview of PCA to make sure everyone is on the same page.
- Don't use notation you haven't introduced: put things in a natural order.
- Your talk does *not* necessarily have to follow the same organizational plan as the paper. Indeed it *probably should not*. 
- This is surprisingly difficult: curse of knowledge. You understand it and maybe you can't remember a time when you *didn't* understand it. But try! 
- Use pictures / simple examples / simulations to communicate key ideas. It's usually a bad idea to try to prove a theorem. It may even be a bad idea to state a theorem fully rigorously: this usually takes a lot of notation and set-up. The audience has maybe 30-45 minutes to absorb what you have to say and can only hold 7 things in working memory at once. Every additional piece of notation that you introduce is a tax on their attention.
- This is hard and it takes practice. But guess what: if you become an academic this is something that you will spend the rest of your career doing! Teaching and explaining your research to others! Start practicing now, get feedback, and improve. Nothing worth doing is easy.
-->


</section>
</section>

 ]]></description>
  <category>teaching</category>
  <guid>https://www.econometrics.blog/post/how-to-read-an-econometrics-paper/</guid>
  <pubDate>Sat, 20 Jul 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Sims and Uhlig (1991) Replication</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/sims-and-uhlig-1991-replication/</link>
  <description><![CDATA[ 




<p>As a teaser for our upcoming (2024-07-23) virtual reading group session on Bayesian macro / time series econometrics, this post replicates a classic paper by <a href="https://ideas.repec.org/a/ecm/emetrp/v59y1991i6p1591-99.html">Sims &amp; Uhlig (1991)</a> contrasting Bayesian and Frequentist inferences for a unit root. In the post I’ll focus on explaining and implementing the authors’ simulation design. In the reading group session (and possibly a future post) we’ll talk more about the paper’s implications for the Bayesian-Frequentist debate and relate it to more recent work by <a href="https://ideas.repec.org/a/taf/jnlasa/v111y2016i515p1233-1241.html">Mueller &amp; Norets (2016)</a>. We’ll also be joined by special guest <a href="https://web.sas.upenn.edu/schorf/">Frank Schorfheide</a> who will help guide us through the recent literature on Bayesian approaches to VARs, including <a href="https://ideas.repec.org/a/tpr/restat/v97y2015i2p436-451.html">Giannone et al (2015)</a> and <a href="https://ideas.repec.org/a/taf/jnlasa/v114y2019i526p565-580.html">(2019)</a>. If you’re an Oxford student or staff member, you can sign up for the reading group <a href="https://edstem.org/us/join/6j2hay">here</a>. Otherwise, send me an email and I’ll add you manually.</p>
<section id="a-simple-example" class="level2">
<h2 class="anchored" data-anchor-id="a-simple-example">A Simple Example</h2>
<p>To set the stage for Sims &amp; Uhlig (1991), consider the following simple example: <img src="https://latex.codecogs.com/png.latex?X_1,%20X_2,%20%5Cdots,%20X_%7B100%7D%20%5Csim%20%5Ctext%7BNormal%7D(%5Cmu,%20%5Csigma%5E2)"> where <img src="https://latex.codecogs.com/png.latex?%5Cmu"> is unknown but <img src="https://latex.codecogs.com/png.latex?%5Csigma"> is known to equal <img src="https://latex.codecogs.com/png.latex?1">. Let <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BX%7D%20=%20%5Cfrac%7B1%7D%7B100%7D%20%5Csum_%7Bi=1%7D%5E%7B100%7D%20X_i"> be the sample mean. Then <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BX%7D%20%5Cpm%200.2"> is an approximate 95% Frequentist confidence interval for <img src="https://latex.codecogs.com/png.latex?%5Cmu">. In words: among 95% of the possible datasets that we could potentially observe, the interval <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BX%7D%20%5Cpm%200.2"> will cover the true, unknown value of <img src="https://latex.codecogs.com/png.latex?%5Cmu">; in the remaining <img src="https://latex.codecogs.com/png.latex?5%5C%25"> of datasets, the interval will not cover <img src="https://latex.codecogs.com/png.latex?%5Cmu">.</p>
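<p>To make the coverage statement concrete, here is a small simulation sketch. The true mean (set to 0.5) and the number of replications are arbitrary choices of mine; any other true mean should give similar coverage.</p>
<pre class="sourceCode r"><code class="sourceCode r">set.seed(4321)

mu &lt;- 0.5        # assumed true mean, unknown to the analyst in practice
n &lt;- 100         # sample size from the example
n_reps &lt;- 10000  # number of simulated datasets

# Sample mean from each simulated dataset of 100 Normal(mu, 1) draws
x_bar &lt;- replicate(n_reps, mean(rnorm(n, mean = mu, sd = 1)))

# Share of datasets in which x_bar +/- 0.2 covers mu
coverage &lt;- mean(abs(x_bar - mu) &lt;= 0.2)
coverage</code></pre>
<p>Because 0.2 is twice the standard error of 0.1, rather than 1.96 times, the simulated coverage should come out slightly above 95%.</p>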
<p>The Frequentist interval conditions on <img src="https://latex.codecogs.com/png.latex?%5Cmu"> and treats <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BX%7D"> as random. In contrast, a Bayesian credible interval conditions on <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BX%7D"> and treats <img src="https://latex.codecogs.com/png.latex?%5Cmu"> as random. This doesn’t require us to believe that <img src="https://latex.codecogs.com/png.latex?%5Cmu"> is “really” random. Bayesian reasoning simply uses the language of probability to express uncertainty about <em>any quantity that we cannot observe</em>. Let <img src="https://latex.codecogs.com/png.latex?%5Cbar%7Bx%7D"> be the observed value of <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BX%7D">. Under a vague prior for <img src="https://latex.codecogs.com/png.latex?%5Cmu">, e.g.&nbsp;a Normal(0, 100) distribution, the 95% Bayesian <a href="https://en.wikipedia.org/wiki/Credible_interval">highest posterior density interval</a> for <img src="https://latex.codecogs.com/png.latex?%5Cmu"> is approximately <img src="https://latex.codecogs.com/png.latex?%5Cbar%7Bx%7D%20%5Cpm%200.2">. In words: given that we have observed <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BX%7D%20=%20%5Cbar%7Bx%7D">, there is a 95% probability that <img src="https://latex.codecogs.com/png.latex?%5Cmu"> lies in the interval <img src="https://latex.codecogs.com/png.latex?%5Cbar%7Bx%7D%20%5Cpm%200.2">.</p>
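<p>The posterior calculation behind this interval can be sketched with the standard conjugate normal update. I am assuming here that the second argument of Normal(0, 100) is a variance, and the observed sample mean of 0.3 is a hypothetical value chosen only for illustration.</p>
<pre class="sourceCode r"><code class="sourceCode r">n &lt;- 100          # sample size from the example
sigma2 &lt;- 1       # known variance of each observation
prior_mean &lt;- 0
prior_var &lt;- 100  # assumption: the second argument of Normal(0, 100) is a variance
x_bar &lt;- 0.3      # hypothetical observed sample mean

# Conjugate normal-normal update: precisions add
post_var &lt;- 1 / (1 / prior_var + n / sigma2)
post_mean &lt;- post_var * (prior_mean / prior_var + n * x_bar / sigma2)

# The posterior is normal, hence symmetric and unimodal, so the 95% HPD
# interval coincides with the equal-tailed interval
hpd &lt;- qnorm(c(0.025, 0.975), mean = post_mean, sd = sqrt(post_var))
hpd</code></pre>
<p>The prior barely moves the posterior mean away from the sample mean, and the half-width of the interval is essentially 1.96 times 0.1, which rounds to the 0.2 used in the text.</p>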
<p>The comforting thing about this example is that, regardless of whether we choose a Bayesian or Frequentist perspective, our inference remains the same: compute the sample mean, then add and subtract <img src="https://latex.codecogs.com/png.latex?0.2">. This means that the Frequentist interval inherits all the nice properties of Bayesian inferences, and the Bayesian interval has correct Frequentist coverage. This equivalence between Bayesian and Frequentist methods crops up in many simple examples, especially in situations where the sample size is large. But in more complex settings, the two approaches can give radically different answers. And to head off a common misunderstanding, this <em>isn’t</em> because Bayesians use priors. In the limit as we accumulate more and more data, the influence of the prior wanes. The key difference is that Bayesian inference adheres to the <a href="https://en.wikipedia.org/wiki/Likelihood_principle">likelihood principle</a>, whereas common Frequentist methods do not.<sup>1</sup></p>
</section>
<section id="a-not-so-simple-example" class="level2">
<h2 class="anchored" data-anchor-id="a-not-so-simple-example">A Not-so-simple Example</h2>
<p>Sims &amp; Uhlig consider the AR(1) model <img src="https://latex.codecogs.com/png.latex?%0Ay_t%20=%20%5Crho%20y_%7Bt-1%7D%20+%20%5Cvarepsilon_t,%20%5Cquad%20%5Cvarepsilon_t%20%5Csim%20%5Ctext%7Biid%20Normal%7D(0,%201)%0A"> and the conditional maximum likelihood estimator given the initial <img src="https://latex.codecogs.com/png.latex?y_0">, namely <img src="https://latex.codecogs.com/png.latex?%0A%5Cwidehat%7B%5Crho%7D%20=%20%5Cfrac%7B%5Csum_%7Bt=1%7D%5ET%20y_%7Bt-1%7D%20y_t%7D%7B%5Csum_%7Bt=1%7D%5ET%20y_%7Bt-1%7D%5E2%7D.%0A"> Their simulation contrasts the Frequentist sampling distribution of <img src="https://latex.codecogs.com/png.latex?%5Cwidehat%7B%5Crho%7D%7C%5Crho"> with the Bayesian posterior distribution of <img src="https://latex.codecogs.com/png.latex?%5Crho%7C%5Cwidehat%7B%5Crho%7D"> under a flat prior on <img src="https://latex.codecogs.com/png.latex?%5Crho">. When <img src="https://latex.codecogs.com/png.latex?%5Crho"> is near one, these two distributions differ markedly: while the Bayesian posterior is always symmetric and centered at <img src="https://latex.codecogs.com/png.latex?%5Cwidehat%7B%5Crho%7D">, the Frequentist sampling distribution is highly skewed when <img src="https://latex.codecogs.com/png.latex?%5Crho"> is close to one. This shows that the Bayesian-Frequentist equivalence we found in our simple population mean example from above breaks down completely in this more complex example.</p>
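<p>As a rough sketch of the two objects being compared, the code below simulates a single series with a unit root and computes the conditional maximum likelihood estimate. It also computes the standard deviation of the flat-prior posterior given the full sample: with the error variance known to equal one, the conditional likelihood is Gaussian in the autoregressive parameter, so this posterior is normal, centered at <code>rho_hat</code>, with variance equal to one over the sum of squared lagged values. Note that this conditions on the whole series, whereas the simulation approximates the posterior given the estimate alone. The seed and variable names are my own choices.</p>
<pre class="sourceCode r"><code class="sourceCode r">set.seed(2024)

n_T &lt;- 100

# Simulate y_0 = 0, y_1, ..., y_T under rho = 1, i.e. a random walk
y &lt;- c(0, cumsum(rnorm(n_T)))
y_t &lt;- y[-1]                # y_1, ..., y_T
y_tminus1 &lt;- y[-length(y)]  # y_0, ..., y_{T-1}

# Conditional MLE of rho
rho_hat &lt;- sum(y_tminus1 * y_t) / sum(y_tminus1^2)

# Flat-prior posterior given the full sample: Normal(rho_hat, 1 / sum(y_{t-1}^2))
post_sd &lt;- 1 / sqrt(sum(y_tminus1^2))
c(rho_hat = rho_hat, post_sd = post_sd)</code></pre>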
<p>Sims &amp; Uhlig argue that the Bayesian posterior provides a much more sensible and useful characterization of the information contained in the data and, after reading the paper, I’m inclined to agree. My replication code follows below, along with plots of the joint distribution of <img src="https://latex.codecogs.com/png.latex?(%5Crho,%20%5Cwidehat%7B%5Crho%7D)"> under a uniform prior for <img src="https://latex.codecogs.com/png.latex?%5Crho"> and the conditional distributions <img src="https://latex.codecogs.com/png.latex?%5Cwidehat%7B%5Crho%7D%7C%5Crho=1"> (Frequentist Sampling Distribution) and <img src="https://latex.codecogs.com/png.latex?%5Crho%7C%5Cwidehat%7B%5Crho%7D%20=%201"> (Bayesian Posterior).<sup>2</sup></p>
</section>
<section id="the-replication" class="level2">
<h2 class="anchored" data-anchor-id="the-replication">The Replication</h2>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#-------------------------------------------------------------------------------</span></span>
<span id="cb1-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sims, C. A., &amp; Uhlig, H. (1991). Understanding unit rooters: A helicopter tour</span></span>
<span id="cb1-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#</span></span>
<span id="cb1-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># (See also: Example 6.10.6 from Poirier "Intermediate Statistics and 'Metrics")</span></span>
<span id="cb1-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#-------------------------------------------------------------------------------</span></span>
<span id="cb1-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># In the next section we will proceed to construct, by Monte Carlo, an estimated</span></span>
<span id="cb1-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># joint pdf for \rho and \hat{\rho} under a uniform prior pdf on \rho. We choose</span></span>
<span id="cb1-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 31 values of \rho, from 0.8 to 1.1 at intervals of 0.01. We draw 10000 100 x 1</span></span>
<span id="cb1-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># iid N(0,1) vectors of random variables to use as realizations of \epsilon. For</span></span>
<span id="cb1-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># each of the 10000 \epsilon vectors and each of the 31 \rho values, we</span></span>
<span id="cb1-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># construct a y vector with y(0) = 0, y(t) generated by equation (1).</span></span>
<span id="cb1-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#</span></span>
<span id="cb1-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Equation (1): y(t) = \rho y(t-1) + \epsilon(t), t = 1, ..., T</span></span>
<span id="cb1-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#</span></span>
<span id="cb1-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># For each of these y vectors, we construct \hat{\rho}. Using as bins the</span></span>
<span id="cb1-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># intervals [-\infty, 0.795), [0.795, 0.805), [0.805, 0.815), etc. we construct</span></span>
<span id="cb1-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># a histogram that estimates the pdf of \hat{\rho} for each fixed \rho value.</span></span>
<span id="cb1-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># When these histograms are lined up side by side, they form a surface that is</span></span>
<span id="cb1-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># the joint pdf for \rho and \hat{\rho} under a flat prior on \rho.</span></span>
<span id="cb1-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#-------------------------------------------------------------------------------</span></span>
<span id="cb1-21"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1693</span>)</span>
<span id="cb1-22"></span>
<span id="cb1-23"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-24"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tictoc)</span>
<span id="cb1-25"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(patchwork)</span>
<span id="cb1-26"></span>
<span id="cb1-27">draw_rho_hat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(rho) {</span>
<span id="cb1-28"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Carry out the simulation once for a fixed value of rho; return rho_hat</span></span>
<span id="cb1-29">  nT <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span></span>
<span id="cb1-30">  y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, nT <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb1-31">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> (t <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>(nT <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)) {</span>
<span id="cb1-32">    y[t] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> rho <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> y[t <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb1-33">  }</span>
<span id="cb1-34">  y_t <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> y[<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb1-35">  y_tminus1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> y[<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(y)]</span>
<span id="cb1-36">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(y_t <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> y_tminus1) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(y_tminus1<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb1-37">}</span>
<span id="cb1-38"></span>
<span id="cb1-39"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Function to run the simulation for a fixed value of rho (10000 times)</span></span>
<span id="cb1-40">run_sim <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> \(rho) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1e4</span>, \(i) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">draw_rho_hat</span>(rho))</span>
<span id="cb1-41"></span>
<span id="cb1-42"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tic</span>()</span>
<span id="cb1-43">foo <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">run_sim</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span>)</span>
<span id="cb1-44"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">toc</span>() <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># ~0.6 seconds on my machine</span></span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>0.368 sec elapsed</code></pre>
</div>
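<p>The explicit loop above is easy to read. As an optional aside that is not part of the original replication, a loop-free sketch of the same draw uses the recursive mode of <code>stats::filter()</code>. The name <code>draw_rho_hat_filter</code> and the small set of fixed shocks used for the check are hypothetical. The <code>stats::</code> prefix matters because loading the tidyverse masks <code>filter()</code> with the dplyr version.</p>
<pre class="sourceCode r"><code class="sourceCode r">draw_rho_hat_filter &lt;- function(rho, nT = 100) {
  # Recursive filter computes y[t] = rho * y[t - 1] + eps[t], starting from y[0] = 0
  eps &lt;- rnorm(nT)
  y &lt;- c(0, as.numeric(stats::filter(eps, filter = rho, method = "recursive")))
  y_t &lt;- y[-1]
  y_tminus1 &lt;- y[-length(y)]
  sum(y_t * y_tminus1) / sum(y_tminus1^2)
}

# Check the recursive filter against an explicit loop on fixed, made-up shocks
eps_check &lt;- c(0.5, -1.2, 0.3, 0.8, -0.4)
y_loop &lt;- numeric(5)
y_loop[1] &lt;- eps_check[1]
for (t in 2:5) y_loop[t] &lt;- 0.9 * y_loop[t - 1] + eps_check[t]
y_filter &lt;- as.numeric(stats::filter(eps_check, filter = 0.9, method = "recursive"))
all.equal(y_loop, y_filter)</code></pre>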
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Full sequence of rho values from Sims &amp; Uhlig (1991)</span></span>
<span id="cb3-2">rho <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">from =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">to =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.01</span>)</span>
<span id="cb3-3"></span>
<span id="cb3-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tic</span>()</span>
<span id="cb3-5">results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rho =</span> rho,</span>
<span id="cb3-6">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rho_hat =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(rho, run_sim)) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># List columns</span></span>
<span id="cb3-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">toc</span>() <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># ~17 seconds on my machine (1991 was a long time ago!)</span></span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>11.287 sec elapsed</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># The results tibble uses a list column for rho_hat. This is convenient for</span></span>
<span id="cb5-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># making histograms of the frequentist sampling distribution (rho fixed) but</span></span>
<span id="cb5-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># not for making histograms of the Bayesian posterior (rho_hat fixed). For the</span></span>
<span id="cb5-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># latter, we will use the unnest() function to "expand" the list column rho_hat</span></span>
<span id="cb5-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># into a regular column. This is the "joint" distribution of rho and rho_hat.</span></span>
<span id="cb5-6">joint <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> results <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(rho_hat)</span>
<span id="cb5-8"></span>
<span id="cb5-9">joint <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> rho, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> rho_hat)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_density2d_filled</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_cartesian</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylim =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.1</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Restrict rho_hat axis</span></span>
<span id="cb5-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Joint Distribution"</span>,</span>
<span id="cb5-14">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expression</span>(rho),</span>
<span id="cb5-15">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expression</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hat</span>(rho))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/sims-and-uhlig-1991-replication/index_files/figure-html/unnamed-chunk-1-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">joint <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(rho_hat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.995</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> rho_hat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.005</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> rho)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">binwidth =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.01</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"skyblue"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expression</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hat</span>(rho) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>),</span>
<span id="cb6-6">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expression</span>(rho),</span>
<span id="cb6-7">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Frequency"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/sims-and-uhlig-1991-replication/index_files/figure-html/unnamed-chunk-1-2.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">joint <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(rho <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> rho_hat)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">binwidth =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.01</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"skyblue"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expression</span>(rho <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>),</span>
<span id="cb7-6">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expression</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hat</span>(rho)),</span>
<span id="cb7-7">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Frequency"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/sims-and-uhlig-1991-replication/index_files/figure-html/unnamed-chunk-1-3.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Function that makes the preceding two plots, puts them side-by-side and lets</span></span>
<span id="cb8-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># the user specify the value of rho/rho_hat that we condition on:</span></span>
<span id="cb8-3">plot_Bayes_vs_Freq <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> \(r) {</span>
<span id="cb8-4">  p1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> joint <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(rho_hat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> r <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.005</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> rho_hat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> r <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.005</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> rho)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">after_stat</span>(density)),</span>
<span id="cb8-8">                   <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">binwidth =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.01</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"skyblue"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-9">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_vline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xintercept =</span> r, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dashed"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linewidth =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-10">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bquote</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hat</span>(rho) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> .(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(r, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))),</span>
<span id="cb8-11">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expression</span>(rho)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-12">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span>
<span id="cb8-13"></span>
<span id="cb8-14">  p2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> joint <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-15">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(rho <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> r <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.005</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> rho <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> r <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.005</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-16">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> rho_hat)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-17">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">after_stat</span>(density)),</span>
<span id="cb8-18">                   <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">binwidth =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.01</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"skyblue"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-19">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_vline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xintercept =</span> r, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dashed"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linewidth =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-20">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bquote</span>(rho <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> .(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(r, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))),</span>
<span id="cb8-21">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expression</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hat</span>(rho))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-22">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span>
<span id="cb8-23"></span>
<span id="cb8-24">  p1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> p2</span>
<span id="cb8-25">}</span>
<span id="cb8-26"></span>
<span id="cb8-27"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_Bayes_vs_Freq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.98</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/sims-and-uhlig-1991-replication/index_files/figure-html/unnamed-chunk-1-4.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_Bayes_vs_Freq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.99</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/sims-and-uhlig-1991-replication/index_files/figure-html/unnamed-chunk-1-5.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_Bayes_vs_Freq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/sims-and-uhlig-1991-replication/index_files/figure-html/unnamed-chunk-1-6.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_Bayes_vs_Freq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.01</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/sims-and-uhlig-1991-replication/index_files/figure-html/unnamed-chunk-1-7.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_Bayes_vs_Freq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.02</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/sims-and-uhlig-1991-replication/index_files/figure-html/unnamed-chunk-1-8.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>A detailed discussion of the likelihood principle would require at least a whole post of its own. If you want to learn more, I highly recommend the classic monograph by <a href="https://external.dandelon.com/download/attachments/dandelon/ids/DE004496C87987070706BC125794B00403A1A.pdf">Berger &amp; Wolpert</a>.↩︎</p></li>
<li id="fn2"><p>For further discussion of Sims and Uhlig’s illuminating simulation experiment, see Chapter 6 of <a href="https://mitpress.mit.edu/9780262660945/intermediate-statistics-and-econometrics/">Poirier</a>.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>time series</category>
  <guid>https://www.econometrics.blog/post/sims-and-uhlig-1991-replication/</guid>
  <pubDate>Mon, 15 Jul 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>The Return of econometrics.blog!</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/the-return-of-econometrics-blog/</link>
  <description><![CDATA[ 




<p>After a year-long hiatus, I’m excited to return to regular blogging about econometrics! I have a long list of posts that I’m eager to write, and I hope you’ll find them interesting. To whet your appetite, here’s a preview of some of the topics I plan to cover in the coming weeks:</p>
<ul>
<li>Bayesian versus Frequentist Approaches to Unit Roots</li>
<li>How Not To Do Regression Adjustment</li>
<li>Understanding the James-Stein Estimator</li>
</ul>
<p>In the meantime, I have a few econometrics-related announcements:</p>
<ol type="1">
<li>I’ll be teaching a summer course on causal inference at Oxford this September. If you’re interested in attending, here are the <a href="https://ouess.web.ox.ac.uk/event/treatment-effects-beyond-the-basics-econometrics">registration details</a> and here’s the <a href="https://www.treatment-effects.com/beyond/">course website</a>.</li>
<li>I’m currently running a virtual summer reading group on Bayesian Econometrics that will continue at least until September and potentially beyond, depending on interest. If you have an email address that ends in <code>.ox.ac.uk</code>, you can self-register <a href="https://edstem.org/us/join/6j2hay">here</a>. If you don’t have an Oxford email address, send me an email and I’ll add you manually.</li>
<li><a href="https://users.ox.ac.uk/~econ0610/">Martin Weidner</a> and I recently kicked off an initiative to change the way that research in econometrics is assessed. To find out more, visit <a href="https://sqare.org/">sqare.org</a>.</li>
</ol>
<p>I’m looking forward to getting back to regular posting. If you have any special requests, please add them in the comments below.</p>



 ]]></description>
  <category>meta</category>
  <guid>https://www.econometrics.blog/post/the-return-of-econometrics-blog/</guid>
  <pubDate>Sun, 14 Jul 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>A Good Instrument is a Bad Control</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/a-good-instrument-is-a-bad-control/</link>
  <description><![CDATA[ 




<p>Here’s a puzzle for you. What will happen if we regress some outcome of interest on <em>both</em> an endogenous regressor <em>and</em> a valid instrument for that regressor? I hadn’t thought about this question until 2018, when one of my undergraduate students asked it during class. If memory serves, my off-the-cuff answer left much to be desired.<sup>1</sup> Five years later, I’m finally ready to give a fully satisfactory answer; better late than never, I suppose!</p>
<section id="the-model" class="level1">
<h1>The Model</h1>
<p>We’ll start by being a bit more precise about the setup. Suppose that <img src="https://latex.codecogs.com/png.latex?Y"> is related to <img src="https://latex.codecogs.com/png.latex?X"> according to the following <strong>linear causal model</strong> <img src="https://latex.codecogs.com/png.latex?%0AY%20%5Cleftarrow%20%5Calpha%20+%20%5Cbeta%20X%20+%20U%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> is the causal effect of interest and <img src="https://latex.codecogs.com/png.latex?U"> represents unobserved causes of <img src="https://latex.codecogs.com/png.latex?Y"> that may be related to <img src="https://latex.codecogs.com/png.latex?X">. Now, for <em>any</em> observed random variable <img src="https://latex.codecogs.com/png.latex?Z">, we can define <img src="https://latex.codecogs.com/png.latex?%0AV%20%5Cequiv%20X%20-%20(%5Cpi_0%20+%20%5Cpi_1%20Z),%20%5Cquad%20%5Cpi_0%20%5Cequiv%20%5Cmathbb%7BE%7D%5BX%5D%20-%20%5Cpi_1%20%5Cmathbb%7BE%7D%5BZ%5D,%20%5Cquad%20%5Cpi_1%20%5Cequiv%20%5Cfrac%7B%5Ctext%7BCov%7D(X,Z)%7D%7B%5Ctext%7BVar%7D(Z)%7D.%0A"> This is the <strong>population linear regression</strong> of <img src="https://latex.codecogs.com/png.latex?X"> on <img src="https://latex.codecogs.com/png.latex?Z">. By construction it satisfies <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D%5BV%5D%20=%20%5Ctext%7BCov%7D(Z,V)%20=%200">.<sup>2</sup> Thus we can write, <img src="https://latex.codecogs.com/png.latex?%0AX%20=%20%5Cpi_0%20+%20%5Cpi_1%20Z%20+%20V,%20%5Cquad%20%5Cmathbb%7BE%7D%5BV%5D%20=%20%5Ctext%7BCov%7D(Z,V)%20=%200%0A"> for <em>any</em> random variables <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Z">, simply by constructing <img src="https://latex.codecogs.com/png.latex?V"> as described above. If <img src="https://latex.codecogs.com/png.latex?%5Cpi_1%20%5Cneq%200">, we say that <img src="https://latex.codecogs.com/png.latex?Z"> is <strong>relevant</strong>. 
If <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(Z,U)%20=%200">, we say that <img src="https://latex.codecogs.com/png.latex?Z"> is <strong>exogenous</strong>. If <img src="https://latex.codecogs.com/png.latex?Z"> is both relevant and exogenous, we say that it is a <strong>valid instrument</strong> for <img src="https://latex.codecogs.com/png.latex?X">.</p>
<p>As we’ve defined it above, <img src="https://latex.codecogs.com/png.latex?V"> is simply a regression residual. But if <img src="https://latex.codecogs.com/png.latex?Z"> is a valid instrument, it turns out that we can think of <img src="https://latex.codecogs.com/png.latex?V"> as the “endogenous part” of <img src="https://latex.codecogs.com/png.latex?X">. To see why, expand <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(X,U)"> as follows: <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BCov%7D(X,U)%20=%20%5Ctext%7BCov%7D(%5Cpi_0%20+%20%5Cpi_1%20Z%20+%20V,%20%5C,U)%20=%20%5Cpi_1%20%5Ctext%7BCov%7D(Z,U)%20+%20%5Ctext%7BCov%7D(U,V)%20=%20%5Ctext%7BCov%7D(U,V)%0A"> since we have assumed that <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(Z,U)%20=%200">. In words, the endogeneity of <img src="https://latex.codecogs.com/png.latex?X"> is <em>precisely the same thing</em> as the covariance between <img src="https://latex.codecogs.com/png.latex?U"> and <img src="https://latex.codecogs.com/png.latex?V">.</p>
<p>Here’s a helpful way of thinking about this. If <img src="https://latex.codecogs.com/png.latex?Z"> is exogenous, then our regression of <img src="https://latex.codecogs.com/png.latex?X"> on <img src="https://latex.codecogs.com/png.latex?Z"> <em>partitions</em> the overall variation in <img src="https://latex.codecogs.com/png.latex?X"> into two components: the “good” (exogenous) variation <img src="https://latex.codecogs.com/png.latex?%5Cpi_1%20Z"> is uncorrelated with <img src="https://latex.codecogs.com/png.latex?U">, while the “bad” (endogenous) variation <img src="https://latex.codecogs.com/png.latex?V"> is correlated with <img src="https://latex.codecogs.com/png.latex?U">. The logic of two-stage least squares is that regressing <img src="https://latex.codecogs.com/png.latex?Y"> on the “good” variation, <img src="https://latex.codecogs.com/png.latex?%5Cpi_1%20Z">, allows us to recover <img src="https://latex.codecogs.com/png.latex?%5Cbeta">, the causal effect of interest.<sup>3</sup></p>
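<p>To spell out that last claim, here is a short derivation sketch that uses only the definitions above and assumes that <img src="https://latex.codecogs.com/png.latex?Z"> is a valid instrument. Substituting the decomposition of <img src="https://latex.codecogs.com/png.latex?X"> into the causal model gives <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Ctext%7BCov%7D(%5Cpi_1%20Z,%20Y)%20&amp;=%20%5Ctext%7BCov%7D(%5Cpi_1%20Z,%20%5C,%5Calpha%20+%20%5Cbeta%5B%5Cpi_0%20+%20%5Cpi_1%20Z%20+%20V%5D%20+%20U)%5C%5C%0A&amp;=%20%5Cbeta%20%5Cpi_1%5E2%20%5Ctext%7BVar%7D(Z)%20+%20%5Cbeta%20%5Cpi_1%20%5Ctext%7BCov%7D(Z,V)%20+%20%5Cpi_1%20%5Ctext%7BCov%7D(Z,U)%5C%5C%0A&amp;=%20%5Cbeta%20%5Cpi_1%5E2%20%5Ctext%7BVar%7D(Z)%0A%5Cend%7Baligned%7D%0A"> because <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(Z,V)%20=%200"> by construction and <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(Z,U)%20=%200"> by exogeneity. Since <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BVar%7D(%5Cpi_1%20Z)%20=%20%5Cpi_1%5E2%20%5Ctext%7BVar%7D(Z)"> is nonzero when <img src="https://latex.codecogs.com/png.latex?Z"> is relevant, the slope from a population linear regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?%5Cpi_1%20Z"> is <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(%5Cpi_1%20Z,%20Y)/%5Ctext%7BVar%7D(%5Cpi_1%20Z)%20=%20%5Cbeta">.</p>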
</section>
<section id="a-simulation-example" class="level1">
<h1>A Simulation Example</h1>
<p>Using the model and derivations from above, let’s run a little simulation. To simulate a valid instrument <img src="https://latex.codecogs.com/png.latex?Z"> and an endogenous regressor <img src="https://latex.codecogs.com/png.latex?X"> we can proceed as follows. First generate independent standard normal draws <img src="https://latex.codecogs.com/png.latex?%5C%7BZ_i%5C%7D_%7Bi=1%7D%5En">. Next independently generate pairs of correlated standard normal draws <img src="https://latex.codecogs.com/png.latex?%5C%7B(U_i,%20V_i)%5C%7D_%7Bi=1%7D%5En"> with <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCorr%7D(U_i,%20V_i)%20=%20%5Crho">. Finally, set <img src="https://latex.codecogs.com/png.latex?%0AX_i%20=%20%5Cpi_0%20+%20%5Cpi_1%20Z_i%20+%20V_i%20%5Cquad%20%5Ctext%7Band%7D%20%5Cquad%0AY_i%20=%20%5Calpha%20+%20%5Cbeta%20X_i%20+%20U_i%0A"> for each value of <img src="https://latex.codecogs.com/png.latex?i"> between <img src="https://latex.codecogs.com/png.latex?1"> and <img src="https://latex.codecogs.com/png.latex?n">.<sup>4</sup> The following chunk of R code runs this simulation with <img src="https://latex.codecogs.com/png.latex?n%20=%205000">, <img src="https://latex.codecogs.com/png.latex?%5Crho%20=%200.5">, <img src="https://latex.codecogs.com/png.latex?%5Cpi_0%20=%200.5">, <img src="https://latex.codecogs.com/png.latex?%5Cpi_1%20=%200.8">, <img src="https://latex.codecogs.com/png.latex?%5Calpha%20=%20-0.3"> and <img src="https://latex.codecogs.com/png.latex?%5Cbeta%20=%201">:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1234</span>)</span>
<span id="cb1-2">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5000</span></span>
<span id="cb1-3">z <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n)</span>
<span id="cb1-4"></span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(mvtnorm)</span>
<span id="cb1-6">Rho <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, </span>
<span id="cb1-7">                <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">byrow =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb1-8">errors <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rmvnorm</span>(n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sigma =</span> Rho)</span>
<span id="cb1-9"></span>
<span id="cb1-10">u <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> errors[, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb1-11">v <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> errors[, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]</span>
<span id="cb1-12">x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> z <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> v</span>
<span id="cb1-13">y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> u</span></code></pre></div></div>
</div>
<p>In the simulation <img src="https://latex.codecogs.com/png.latex?Z"> is a valid instrument, <img src="https://latex.codecogs.com/png.latex?X"> is an endogenous regressor, and the true causal effect of interest equals one. Using our simulation data, let’s test out three possible estimators:</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?%5Cwidehat%7B%5Cbeta%7D_%5Ctext%7BOLS%7D%5Cequiv"> the slope coefficient from an OLS regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?X">.</li>
<li><img src="https://latex.codecogs.com/png.latex?%5Cwidehat%7B%5Cbeta%7D_%5Ctext%7BIV%7D%5Cequiv"> the slope coefficient from an IV regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?X"> with <img src="https://latex.codecogs.com/png.latex?Z"> as an instrument.</li>
<li><img src="https://latex.codecogs.com/png.latex?%5Cwidehat%7B%5Cbeta%7D_%7BX.Z%7D%5Cequiv"> the coefficient on <img src="https://latex.codecogs.com/png.latex?X"> in an OLS regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Z">.</li>
</ul>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">truth =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,</span>
<span id="cb2-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">b_OLS =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cov</span>(x, y) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">var</span>(x), </span>
<span id="cb2-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">b_IV =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cov</span>(z, y) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cov</span>(z, x), </span>
<span id="cb2-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">b_x.z =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unname</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> z))[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>])) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># unname() makes the names prettier!</span></span>
<span id="cb2-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>truth b_OLS  b_IV b_x.z 
 1.00  1.31  1.01  1.49 </code></pre>
</div>
</div>
<p>As expected, OLS is far from the truth while IV pretty much nails it. Interestingly, the regression of <code>y</code> on <code>x</code> and <code>z</code> gives the worst performance of all! Is this just a fluke? Perhaps it’s an artifact of the simulation parameters I chose, or just bad luck arising from some unusual simulation draws. To find out, we’ll need a bit more algebra. But stay with me: the payoff is worth it, and there’s not too much extra math required!</p>
</section>
<section id="the-general-result" class="level1">
<h1>The General Result</h1>
<section id="regression-of-y-on-x-and-z" class="level2">
<h2 class="anchored" data-anchor-id="regression-of-y-on-x-and-z">Regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Z"></h2>
<p>The coefficient on <img src="https://latex.codecogs.com/png.latex?X"> in a population linear regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Z"> is given by <img src="https://latex.codecogs.com/png.latex?%0A%5Cbeta_%7BX.Z%7D%20=%20%5Cfrac%7B%5Ctext%7BCov%7D(%5Ctilde%7BX%7D,%20Y)%7D%7B%5Ctext%7BVar%7D(%5Ctilde%7BX%7D)%7D%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BX%7D"> is defined as the <em>residual</em> in another population linear regression: the regression of <img src="https://latex.codecogs.com/png.latex?X"> on <img src="https://latex.codecogs.com/png.latex?Z">.<sup>5</sup> But wait a minute: we’ve <em>already seen</em> this residual! Above we called it <img src="https://latex.codecogs.com/png.latex?V"> and used it to write <img src="https://latex.codecogs.com/png.latex?X%20=%20%5Cpi_0%20+%20%5Cpi_1%20Z%20+%20V">. Using this equation, along with the linear causal model relating <img src="https://latex.codecogs.com/png.latex?Y"> to <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?U">, we can re-express <img src="https://latex.codecogs.com/png.latex?%5Cbeta_%7BX.Z%7D"> as <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Cbeta_%7BX.Z%7D%20&amp;=%20%5Cfrac%7B%5Ctext%7BCov%7D(V,%20Y)%7D%7B%5Ctext%7BVar%7D(V)%7D%20=%20%5Cfrac%7B%5Ctext%7BCov%7D(V,%20%5Calpha%20+%20%5Cbeta%20X%20+%20U)%7D%7B%5Ctext%7BVar%7D(V)%7D%5C%5C%0A&amp;=%20%5Cfrac%7B%5Ctext%7BCov%7D(U,V)%20+%20%5Cbeta%5Ctext%7BCov%7D(V,%20%5Cpi_0%20+%20%5Cpi_1%20Z%20+%20V)%7D%7B%5Ctext%7BVar%7D(V)%7D%5C%5C%0A&amp;=%20%5Cbeta%20+%20%5Cfrac%7B%5Ctext%7BCov%7D(U,V)%7D%7B%5Ctext%7BVar%7D(V)%7D%0A%5Cend%7Baligned%7D%0A"> since <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(Z,%20V)%20=%200"> by construction. 
We have some simulation data at our disposal, so let’s check this calculation. In the simulation <img src="https://latex.codecogs.com/png.latex?%5Cbeta%20=%201"> and <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Ctext%7BCov%7D(U,%20V)%7D%7B%5Ctext%7BVar%7D(V)%7D%20=%200.5%0A"> since <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BVar%7D(U)%20=%20%5Ctext%7BVar%7D(V)%20=%201"> and <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(U,%20V)%20=%200.5">. Therefore <img src="https://latex.codecogs.com/png.latex?%5Cbeta_%7BX.Z%7D%20=%201.5">. And, indeed, this is almost <em>exactly</em> the value of our estimate from our simulation above.</p>
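<p>The partialling-out formula is easy to check numerically. What follows is a minimal sketch rather than the code behind the results above: it rebuilds the simulation design in base R, constructing the correlated errors by hand instead of calling <code>rmvnorm()</code>, so the draws and estimates will differ slightly from those reported earlier. The names <code>x_tilde</code>, <code>b_fwl</code>, and <code>b_lm</code> are purely illustrative.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">set.seed(1234)
n &lt;- 5000
rho &lt;- 0.5

# Same design as above, but with (u, v) built by hand (an assumption of this sketch)
z &lt;- rnorm(n)
u &lt;- rnorm(n)
v &lt;- rho * u + sqrt(1 - rho^2) * rnorm(n)  # Var(v) = 1, Corr(u, v) = rho
x &lt;- 0.5 + 0.8 * z + v
y &lt;- -0.3 + x + u

# Sample analogue of V: the residual from regressing x on z
x_tilde &lt;- resid(lm(x ~ z))

# Partialling-out formula versus the coefficient from the long regression
b_fwl &lt;- cov(x_tilde, y) / var(x_tilde)
b_lm &lt;- unname(coef(lm(y ~ x + z))["x"])
c(b_fwl = b_fwl, b_lm = b_lm)</code></pre></div>
</div>
<p>The two numbers agree up to floating-point error, and with this sample size both should land close to the population value of 1.5.</p>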
</section>
<section id="regression-of-y-on-x-only" class="level2">
<h2 class="anchored" data-anchor-id="regression-of-y-on-x-only">Regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?X"> Only</h2>
<p>So far so good. Now what about the “usual” OLS estimator? A quick calculation gives <img src="https://latex.codecogs.com/png.latex?%0A%5Cbeta_%7B%5Ctext%7BOLS%7D%7D%20=%20%5Cbeta%20+%20%5Cfrac%7B%5Ctext%7BCov%7D(X,U)%7D%7B%5Ctext%7BVar%7D(X)%7D%20=%20%5Cbeta%20+%20%5Cfrac%7B%5Ctext%7BCov%7D(V,U)%7D%7B%5Ctext%7BVar%7D(X)%7D%0A"> using the fact that <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(X,U)%20=%20%5Ctext%7BCov%7D(U,V)">, as explained above. Again, we can check this against our simulation results. We know that <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(V,U)%20=%200.5"> and <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BVar%7D(X)%20=%20%5Ctext%7BVar%7D(%5Cpi_0%20+%20%5Cpi_1%20Z%20+%20V)%20=%20%5Cpi_1%5E2%20%5Ctext%7BVar%7D(Z)%20+%20%5Ctext%7BVar%7D(V)%20=%20(0.8)%5E2%20+%201%20=%2041/25%0A"> since <img src="https://latex.codecogs.com/png.latex?Z"> and <img src="https://latex.codecogs.com/png.latex?V"> are uncorrelated by construction, <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BVar%7D(Z)%20=%20%5Ctext%7BVar%7D(V)%20=%201"> and <img src="https://latex.codecogs.com/png.latex?%5Cpi_1%20=%200.8"> in the simulation design. Hence, <img src="https://latex.codecogs.com/png.latex?%5Cbeta_%7B%5Ctext%7BOLS%7D%7D%20=%201%20+%2025/82%20%5Capprox%201.305">. Again, this agrees almost perfectly with our simulation.</p>
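<p>The arithmetic behind these population values can be reproduced in a few lines. This is only a sketch: the variable names are illustrative, and the inputs are the simulation parameters stated above.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Simulation design parameters from the text
pi1 &lt;- 0.8
beta &lt;- 1
cov_uv &lt;- 0.5
var_z &lt;- 1
var_v &lt;- 1

# Var(X) = pi1^2 Var(Z) + Var(V), since Z and V are uncorrelated
var_x &lt;- pi1^2 * var_z + var_v

beta_ols &lt;- beta + cov_uv / var_x  # regression of Y on X only
beta_xz &lt;- beta + cov_uv / var_v   # regression of Y on X and Z
c(var_x = var_x, beta_OLS = beta_ols, beta_X.Z = beta_xz)</code></pre></div>
</div>
<p>This gives a variance of 1.64 for <code>x</code>, roughly 1.305 for the OLS slope, and 1.5 for the coefficient that controls for <code>z</code>, matching the values derived in the text.</p>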
</section>
<section id="comparing-the-results" class="level2">
<h2 class="anchored" data-anchor-id="comparing-the-results">Comparing the Results</h2>
<p>To summarize, we have shown that <img src="https://latex.codecogs.com/png.latex?%0A%5Cbeta_%7BX.Z%7D%20=%20%5Cbeta%20+%20%5Cfrac%7B%5Ctext%7BCov%7D(U,V)%7D%7B%5Ctext%7BVar%7D(V)%7D,%20%5Cquad%20%5Ctext%7Bwhile%7D%20%5Cquad%0A%5Cbeta_%7B%5Ctext%7BOLS%7D%7D%20=%20%5Cbeta%20+%20%5Cfrac%7B%5Ctext%7BCov%7D(U,V)%7D%7B%5Ctext%7BVar%7D(X)%7D.%0A"> There is only one difference between these two expressions: <img src="https://latex.codecogs.com/png.latex?%5Cbeta_%7BX.Z%7D"> has <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BVar%7D(V)"> where <img src="https://latex.codecogs.com/png.latex?%5Cbeta_%7B%5Ctext%7BOLS%7D%7D"> has <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BVar%7D(X)">. Returning to our expression for <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BVar%7D(X)"> from above, <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BVar%7D(X)%20=%20%5Cpi_1%5E2%20%5Ctext%7BVar%7D(Z)%20+%20%5Ctext%7BVar%7D(V)%20%3E%20%5Ctext%7BVar%7D(V)%0A"> as long as <img src="https://latex.codecogs.com/png.latex?%5Cpi_1%20%5Cneq%200"> and <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BVar%7D(Z)%20%5Cneq%200">. In other words, there is always <em>more</em> variation in <img src="https://latex.codecogs.com/png.latex?X"> than there is in <img src="https://latex.codecogs.com/png.latex?V">, since <img src="https://latex.codecogs.com/png.latex?V"> is the “leftover” part of <img src="https://latex.codecogs.com/png.latex?X"> after regressing on <img src="https://latex.codecogs.com/png.latex?Z">. 
Because the variances of <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?V"> appear in the denominators of our expressions from above, it follows that <img src="https://latex.codecogs.com/png.latex?%0A%5Cleft%7C%20%5Ctext%7BCov%7D(U,V)/%5Ctext%7BVar%7D(V)%5Cright%7C%20%3E%20%5Cleft%7C%20%5Ctext%7BCov%7D(U,V)/%5Ctext%7BVar%7D(X)%5Cright%7C%0A"> whenever <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(U,V)%20%5Cneq%200">, i.e. whenever <img src="https://latex.codecogs.com/png.latex?X"> is endogenous. In other words, <img src="https://latex.codecogs.com/png.latex?%5Cbeta_%7BX.Z%7D"> is <strong>always farther from the truth</strong> than <img src="https://latex.codecogs.com/png.latex?%5Cbeta_%7B%5Ctext%7BOLS%7D%7D">, exactly as we found in our simulation.</p>
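<p>To rule out bad luck in a single draw, the comparison can be repeated over many simulated datasets. Here’s a minimal Monte Carlo sketch under the same design. The function <code>sim_once()</code>, the seed, the sample size, and the number of replications are all illustrative choices, and the correlated errors are again built by hand in base R rather than with <code>rmvnorm()</code>.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">set.seed(4321)

# One simulated dataset, returning all three estimators (hypothetical helper)
sim_once &lt;- function(n = 1000, rho = 0.5, pi1 = 0.8) {
  z &lt;- rnorm(n)
  u &lt;- rnorm(n)
  v &lt;- rho * u + sqrt(1 - rho^2) * rnorm(n)  # Corr(u, v) = rho
  x &lt;- 0.5 + pi1 * z + v
  y &lt;- -0.3 + x + u
  c(b_OLS = cov(x, y) / var(x),
    b_IV = cov(z, y) / cov(z, x),
    b_x.z = unname(coef(lm(y ~ x + z))["x"]))
}

# Each column of draws holds the three estimates from one replication
draws &lt;- replicate(500, sim_once())
rowMeans(draws) |&gt;
  round(2)</code></pre></div>
</div>
<p>If the derivations are right, the averages should sit near the population values from above: about 1.305 for OLS, 1 for IV, and 1.5 for the regression that controls for <code>z</code>.</p>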
</section>
</section>
<section id="some-intuition" class="level1">
<h1>Some Intuition</h1>
<p>In our simulation, <img src="https://latex.codecogs.com/png.latex?%5Cwidehat%7B%5Cbeta%7D_%7BX.Z%7D"> gave a <em>worse</em> estimate of <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> than <img src="https://latex.codecogs.com/png.latex?%5Cwidehat%7B%5Cbeta%7D_%5Ctext%7BOLS%7D">. The derivations from above show that this wasn’t a fluke: adding a valid instrument <img src="https://latex.codecogs.com/png.latex?Z"> as an additional control regressor only makes the bias in our estimated causal effect <em>worse</em> than it was to begin with. This holds for any valid instrument and any endogenous regressor in a linear causal model. I hope you found the derivations from above convincing. But even so, you may be wondering if there’s an intuitive explanation for this phenomenon. I am pleased to inform you that the answer is yes!</p>
<p>In an <a href="../../post/three-ways-of-thinking-about-instrumental-variables/">earlier post</a> I described the <strong>control function</strong> approach to instrumental variables regression. That post showed that the coefficient on <img src="https://latex.codecogs.com/png.latex?X"> in a regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?V"> gives the <em>correct</em> causal effect. We don’t know <img src="https://latex.codecogs.com/png.latex?V">, but we can estimate it by regressing <img src="https://latex.codecogs.com/png.latex?X"> on <img src="https://latex.codecogs.com/png.latex?Z"> and saving the residuals. The logic of multiple regression shows that including <img src="https://latex.codecogs.com/png.latex?V"> as a control regressor “soaks up” the portion of <img src="https://latex.codecogs.com/png.latex?X"> that is explained by <img src="https://latex.codecogs.com/png.latex?V">. Because <img src="https://latex.codecogs.com/png.latex?V"> represents the “bad” (endogenous) variation in <img src="https://latex.codecogs.com/png.latex?X">, this solves our endogeneity problem. In effect, <img src="https://latex.codecogs.com/png.latex?V"> captures the unobserved “omitted variables” that play havoc with a naive regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?X">.</p>
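<p>The control function idea is easy to try out. The sketch below is not the code from the earlier post: it regenerates data from the same design in base R, with hand-built correlated errors, and the names <code>v_hat</code>, <code>b_cf</code>, and <code>b_iv</code> are illustrative.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">set.seed(1234)
n &lt;- 5000
rho &lt;- 0.5
z &lt;- rnorm(n)
u &lt;- rnorm(n)
v &lt;- rho * u + sqrt(1 - rho^2) * rnorm(n)  # Corr(u, v) = rho
x &lt;- 0.5 + 0.8 * z + v
y &lt;- -0.3 + x + u

# Step 1: estimate V by the residuals from regressing x on z
v_hat &lt;- resid(lm(x ~ z))

# Step 2: regress y on x and the estimated residual
b_cf &lt;- unname(coef(lm(y ~ x + v_hat))["x"])

# Simple IV estimator for comparison
b_iv &lt;- cov(z, y) / cov(z, x)
c(b_CF = b_cf, b_IV = b_iv)</code></pre></div>
</div>
<p>With a single instrument and a linear first stage, the control function coefficient coincides with the IV estimate up to floating-point error, and both should be close to the true effect of one.</p>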
<p>Now, contrast this with a regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Z">. In this case, we soak up the variation in <img src="https://latex.codecogs.com/png.latex?X"> that is explained by <img src="https://latex.codecogs.com/png.latex?Z">. But <img src="https://latex.codecogs.com/png.latex?Z"> represents the <strong>good</strong> (exogenous) variation in <img src="https://latex.codecogs.com/png.latex?X">! Soaking up this variation leaves only the bad variation behind, making our endogeneity problem worse than it was to begin with. In this example, <img src="https://latex.codecogs.com/png.latex?Z"> is what is known as a <a href="https://ditraglia.com/erm/16-DAGs-bad-controls.pdf">bad control</a>, a control regressor that makes things worse rather than better. A common piece of advice for avoiding bad controls is to only include control regressors that are correlated with <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> but are <em>not themselves</em> caused by <img src="https://latex.codecogs.com/png.latex?X">. The example in this post shows that this advice <strong>is wrong</strong>. Here <img src="https://latex.codecogs.com/png.latex?Z"> is not caused by <img src="https://latex.codecogs.com/png.latex?X">, and is correlated with both <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y">. Nevertheless, it is a bad control. In short, a valid instrument provides a powerful way to carry out causal inference from observational data, but only if you use it in the right way. A good instrument is a bad control!</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>I seem to recall saying something like “this won’t in general give us the causal effect we’re interested in, but I don’t think it’s possible to say anything more without extra assumptions.” Fortunately my lackluster response didn’t derail the student who asked the question: he’s currently pursuing a PhD in Economics at UChicago!↩︎</p></li>
<li id="fn2"><p>Check if you don’t believe me: substitute the expressions for <img src="https://latex.codecogs.com/png.latex?%5Cpi_0"> and <img src="https://latex.codecogs.com/png.latex?%5Cpi_1">, take expectations / covariances, and simplify.↩︎</p></li>
<li id="fn3"><p>See <a href="../../post/three-ways-of-thinking-about-instrumental-variables/">this blog post</a> for more discussion.↩︎</p></li>
<li id="fn4"><p>We don’t necessarily need <img src="https://latex.codecogs.com/png.latex?Z_i"> to be normally distributed, as long as it’s independent of <img src="https://latex.codecogs.com/png.latex?(U_i,%20V_i)">, so you could use e.g.&nbsp;uniform draws if you prefer. Generating <img src="https://latex.codecogs.com/png.latex?(U_i,%20V_i)"> from a bivariate normal distribution isn’t necessary either, but it’s a simple way of controlling the endogeneity in <img src="https://latex.codecogs.com/png.latex?X">.↩︎</p></li>
<li id="fn5"><p>This is a special case of the so-called <a href="https://en.wikipedia.org/wiki/Frisch%E2%80%93Waugh%E2%80%93Lovell_theorem">FWL Theorem</a>, although I’d argue that we should call it “Yule’s Rule” since <a href="https://en.wikipedia.org/wiki/Udny_Yule">George Udny Yule</a> was arguably the first person to popularize it, decades before F, W, or L.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>econometrics</category>
  <category>causal inference</category>
  <guid>https://www.econometrics.blog/post/a-good-instrument-is-a-bad-control/</guid>
  <pubDate>Thu, 29 Jun 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>The R Formula Cheatsheet</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/the-r-formula-cheatsheet/</link>
  <description><![CDATA[ 




<p>R’s formula syntax is extremely powerful but can be confusing for beginners.<sup>1</sup> This post is a quick reference covering the most common symbols that have a “special” meaning inside an R formula: <code>~, +, ., -, 1, :, *, ^</code>, and <code>I()</code>. You may never use some of these in practice, but it’s nice to know that they exist. It was many years before I realized that I could simply type <code>y ~ x * z</code> instead of the lengthier <code>y ~ x + z + x:z</code>, for example. While R formulas crop up in a variety of places, they are probably most familiar as the first argument of <code>lm()</code>. For this reason, my verbal explanations assume a linear regression setting in which we hope to predict <code>y</code> using the regressors <code>x</code>, <code>z</code>, and <code>w</code>.</p>
<table class="caption-top table">
<colgroup>
<col style="width: 14%">
<col style="width: 29%">
<col style="width: 15%">
<col style="width: 40%">
</colgroup>
<thead>
<tr class="header">
<th style="text-align: left;">Symbol</th>
<th style="text-align: left;">Purpose</th>
<th style="text-align: left;">Example</th>
<th style="text-align: left;">In Words</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;"><code>~</code></td>
<td style="text-align: left;">separate LHS and RHS of formula</td>
<td style="text-align: left;"><code>y ~ x</code></td>
<td style="text-align: left;">regress <code>y</code> on <code>x</code></td>
</tr>
<tr class="even">
<td style="text-align: left;"><code>+</code></td>
<td style="text-align: left;">add variable to a formula</td>
<td style="text-align: left;"><code>y ~ x + z</code></td>
<td style="text-align: left;">regress <code>y</code> on <code>x</code> <em>and</em> <code>z</code></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><code>.</code></td>
<td style="text-align: left;">denotes “everything else”</td>
<td style="text-align: left;"><code>y ~ .</code></td>
<td style="text-align: left;">regress <code>y</code> on <em>all other variables</em> in a data frame</td>
</tr>
<tr class="even">
<td style="text-align: left;"><code>-</code></td>
<td style="text-align: left;">remove variable from a formula</td>
<td style="text-align: left;"><code>y ~ . - x</code></td>
<td style="text-align: left;">regress <code>y</code> on all other variables <em>except</em> <code>x</code></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><code>1</code></td>
<td style="text-align: left;">denotes intercept</td>
<td style="text-align: left;"><code>y ~ x - 1</code></td>
<td style="text-align: left;">regress <code>y</code> on <code>x</code> <em>without an intercept</em></td>
</tr>
<tr class="even">
<td style="text-align: left;"><code>:</code></td>
<td style="text-align: left;">construct interaction term</td>
<td style="text-align: left;"><code>y ~ x + z + x:z</code></td>
<td style="text-align: left;">regress <code>y</code> on <code>x</code>, <code>z</code>, and the product <code>x</code> times <code>z</code></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><code>*</code></td>
<td style="text-align: left;">shorthand for levels plus interaction</td>
<td style="text-align: left;"><code>y ~ x * z</code></td>
<td style="text-align: left;">regress <code>y</code> on <code>x</code>, <code>z</code>, and the product <code>x</code> times <code>z</code></td>
</tr>
<tr class="even">
<td style="text-align: left;"><code>^</code></td>
<td style="text-align: left;">higher order interactions</td>
<td style="text-align: left;"><code>y ~ (x + z + w)^3</code></td>
<td style="text-align: left;">regress <code>y</code> on <code>x</code>, <code>z</code>, <code>w</code>, all two-way interactions, and the three-way interaction</td>
</tr>
<tr class="odd">
<td style="text-align: left;"><code>I()</code></td>
<td style="text-align: left;">“as-is” - override special meanings of other symbols from this table</td>
<td style="text-align: left;"><code>y ~ x + I(x^2)</code></td>
<td style="text-align: left;">regress <code>y</code> on <code>x</code> and <code>x</code> squared</td>
</tr>
</tbody>
</table>
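<p>One way to confirm what a formula expands to, without fitting a model, is to inspect its term labels. The sketch below uses a hypothetical helper, <code>expand()</code>, that simply wraps base R’s <code>terms()</code>; no data are needed because the formulas are only parsed, not evaluated.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: list the terms on the right-hand side of a formula
expand &lt;- function(f) attr(terms(f), "term.labels")

expand(y ~ x * z)          # levels plus interaction
expand(y ~ x + z + x:z)    # the same terms written out by hand
expand(y ~ (x + z + w)^3)  # main effects and all interactions up to order three
expand(y ~ x + I(x^2))     # I() protects ^ from its formula meaning

# The intercept attribute is 0 when the intercept has been removed
attr(terms(y ~ x - 1), "intercept")</code></pre></div>
</div>
<p>The first two calls return identical labels, the third returns seven terms (three main effects, three two-way interactions, and one three-way interaction), and the last line confirms that <code>- 1</code> drops the intercept.</p>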




<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Fun fact: R’s formula syntax originated in <a href="https://www.jstor.org/stable/2346786">this 1973 paper</a> by Wilkinson and Rogers.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>computing</category>
  <guid>https://www.econometrics.blog/post/the-r-formula-cheatsheet/</guid>
  <pubDate>Wed, 19 Apr 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Random Variables Cheatsheet</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/random-variables-cheatsheet/</link>
  <description><![CDATA[ 




<p>To do well in an econometrics or statistics course at any level, you need to have a large number of simple properties of random variables at your fingertips. Some years back I made a handout containing the most important properties for my undergraduate students at the University of Pennsylvania. In the hope that this might be of use to others, I’ve released an <a href="https://github.com/fditraglia/random-variables-cheatsheet/blob/main/random-variables-cheatsheet.pdf">updated PDF on GitHub</a>. You can fork the repository <a href="https://github.com/fditraglia/random-variables-cheatsheet">here</a>. If you spot any errors or want to suggest any additions, feel free to raise an <a href="https://github.com/fditraglia/random-variables-cheatsheet/issues">issue</a> or send me a <a href="https://github.com/fditraglia/random-variables-cheatsheet/pulls">pull request</a>.</p>



 ]]></description>
  <category>statistics</category>
  <guid>https://www.econometrics.blog/post/random-variables-cheatsheet/</guid>
  <pubDate>Sat, 07 Jan 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Why Econometrics is Confusing Part II: The Independence Zoo</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/why-econometrics-is-confusing-part-ii-the-independence-zoo/</link>
  <description><![CDATA[ 




<p>In econometrics it’s absolutely crucial to keep track of which things are <em>dependent</em> and which are <em>independent</em>. To make this as confusing as possible for students, a typical introductory econometrics course moves back and forth between different notions of dependence, stopping occasionally to mention that they’re not equivalent but never fully explaining why, on the premise that “you’ve certainly already learned this in your introductory probability and statistics course.” I remember finding this extremely frustrating as a student, but only recently managed to translate this frustration into meaningful changes in my own teaching.<sup>1</sup> Building on some of my recent teaching materials, this post is a field guide to the menagerie–or at least petting zoo–of “dependence” notions that appear regularly in econometrics. We’ll examine each property on its own along with the relationships between them, using two simple examples to build your intuition. Since a picture is worth a thousand words, here’s one that summarizes the entire post:</p>
<div class="cell">
<div class="cell-output-display">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/why-econometrics-is-confusing-part-ii-the-independence-zoo/index_files/figure-html/tikz-ex-1.png" class="img-fluid figure-img" width="480"></p>
<figcaption>Different notions of dependence in econometrics and their relationships. A directed double arrow indicates that one property implies another.</figcaption>
</figure>
</div>
</div>
</div>
<section id="prerequisites" class="level2">
<h2 class="anchored" data-anchor-id="prerequisites">Prerequisites</h2>
<p>While written at an introductory level, this post assumes basic familiarity with calculations involving discrete and continuous random variables. In particular, I assume that:</p>
<ul>
<li>You know the definitions of expected value, variance, covariance, and correlation.</li>
<li>You are comfortable working with joint, marginal, and conditional distributions of a pair of discrete random variables.</li>
<li>You understand the uniform distribution and how to compute its moments (mean, variance, etc.).</li>
<li>You’ve encountered the notion of conditional expectation and the law of iterated expectations.</li>
</ul>
<p>If you’re a bit rusty on this material, lectures 7-11 from <a href="http://ditraglia.com/Econ103Public/slides/lecture_slides.pdf">these slides</a> should be helpful. For bivariate, discrete distributions I also suggest watching <a href="https://vimeo.com/119881985">this video</a> from 1:07:00 to the end and <a href="https://vimeo.com/141473625">this other video</a> from 0:00:00 up to the one hour mark.</p>
</section>
<section id="two-examples" class="level2">
<h2 class="anchored" data-anchor-id="two-examples">Two Examples</h2>
<section id="example-1---discrete-rvs-xy" class="level3">
<h3 class="anchored" data-anchor-id="example-1---discrete-rvs-xy">Example #1 - Discrete RVs <img src="https://latex.codecogs.com/png.latex?(X,Y)"></h3>
<p>My first example involves two discrete random variables <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> with joint probability mass function <img src="https://latex.codecogs.com/png.latex?p_%7BXY%7D(x,y)"> given by</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th style="text-align: right;"></th>
<th style="text-align: center;"><img src="https://latex.codecogs.com/png.latex?Y=0"></th>
<th style="text-align: center;"><img src="https://latex.codecogs.com/png.latex?Y=1"></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: right;"><img src="https://latex.codecogs.com/png.latex?X%20=%20-1"></td>
<td style="text-align: center;"><img src="https://latex.codecogs.com/png.latex?1/3"></td>
<td style="text-align: center;"><img src="https://latex.codecogs.com/png.latex?0"></td>
</tr>
<tr class="even">
<td style="text-align: right;"><img src="https://latex.codecogs.com/png.latex?X%20=%200"></td>
<td style="text-align: center;"><img src="https://latex.codecogs.com/png.latex?0"></td>
<td style="text-align: center;"><img src="https://latex.codecogs.com/png.latex?1/3"></td>
</tr>
<tr class="odd">
<td style="text-align: right;"><img src="https://latex.codecogs.com/png.latex?X=%201"></td>
<td style="text-align: center;"><img src="https://latex.codecogs.com/png.latex?1/3"></td>
<td style="text-align: center;"><img src="https://latex.codecogs.com/png.latex?0"></td>
</tr>
</tbody>
</table>
<p>Even without doing any math, we see that knowing <img src="https://latex.codecogs.com/png.latex?X"> conveys information about <img src="https://latex.codecogs.com/png.latex?Y">, and <em>vice-versa</em>. For example, if <img src="https://latex.codecogs.com/png.latex?X%20=%20-1"> then we know that <img src="https://latex.codecogs.com/png.latex?Y"> must equal zero. Similarly, if <img src="https://latex.codecogs.com/png.latex?Y=1"> then <img src="https://latex.codecogs.com/png.latex?X"> must equal zero. Spend a bit of time thinking about this joint distribution before reading further. We’ll have plenty of time for mathematics below, but it’s always worth seeing where our intuition takes us <em>before</em> calculating everything.</p>
<p>To streamline our discussion below, it will be helpful to work out a few basic results about <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y">. A quick calculation with <img src="https://latex.codecogs.com/png.latex?p_%7BXY%7D"> shows that <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(XY)%20%5Cequiv%20%5Csum_%7B%5Ctext%7Ball%20%7D%20x%7D%20%5Csum_%7B%5Ctext%7Ball%20%7D%20y%7D%20x%20y%20%5Ccdot%20p_%7BXY%7D(x,y)%20=%200.%0A"> Calculating the marginal pmfs for <img src="https://latex.codecogs.com/png.latex?X"> we see that <img src="https://latex.codecogs.com/png.latex?%0Ap_X(-1)%20=%20p_X(0)%20=%20p_X(1)%20=%201/3%20%5Cimplies%20%5Cmathbb%7BE%7D(X)%20%5Cequiv%20%5Csum_%7B%5Ctext%7Ball%20%7D%20x%7D%20x%20%5Ccdot%20p_X(x)%20=%200.%0A"> Similarly, calculating the marginal pmf of <img src="https://latex.codecogs.com/png.latex?Y">, we obtain <img src="https://latex.codecogs.com/png.latex?%0Ap_Y(0)%20=%202/3,%5C,%20p_Y(1)%20=%201/3%20%5Cimplies%20%5Cmathbb%7BE%7D(Y)%20%5Cequiv%20%5Csum_%7B%5Ctext%7Ball%20%7D%20y%7D%20y%20%5Ccdot%20p_Y(y)%20=%201/3.%0A"> We’ll use these results as ingredients below as we explain and relate three key notions of dependence: <em>correlation</em>, <em>conditional mean independence</em>, and <em>statistical independence</em>.</p>
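<p>If you’d like to double-check these numbers, here is a minimal sketch in R that enters the table as a matrix and computes the marginals and expectations directly. The object names (<code>p_xy</code>, <code>x_vals</code>, <code>y_vals</code>, and so on) are illustrative choices for this sketch only.</p>

```r
# Joint pmf of (X, Y) from the table: rows are x = -1, 0, 1; columns are y = 0, 1
x_vals <- c(-1, 0, 1)
y_vals <- c(0, 1)
p_xy <- matrix(c(1/3, 0,
                 0,   1/3,
                 1/3, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(x = c("-1", "0", "1"), y = c("0", "1")))

p_x <- rowSums(p_xy)  # marginal pmf of X
p_y <- colSums(p_xy)  # marginal pmf of Y

E_X  <- sum(x_vals * p_x)                   # sum over x of x * p_X(x)
E_Y  <- sum(y_vals * p_y)                   # sum over y of y * p_Y(y)
E_XY <- sum(outer(x_vals, y_vals) * p_xy)   # sum over (x, y) of x * y * p_XY(x, y)

c(E_X = E_X, E_Y = E_Y, E_XY = E_XY)
```

<p>The three values should agree with the hand calculations above: zero, one third, and zero.</p>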
</section>
<section id="example-2---continuous-rvs-wz" class="level3">
<h3 class="anchored" data-anchor-id="example-2---continuous-rvs-wz">Example #2 - Continuous RVs <img src="https://latex.codecogs.com/png.latex?(W,Z)"></h3>
<p>My second example concerns two continuous random variables <img src="https://latex.codecogs.com/png.latex?W"> and <img src="https://latex.codecogs.com/png.latex?Z">, where <img src="https://latex.codecogs.com/png.latex?W%20%5Csim%20%5Ctext%7BUniform%7D(-1,%201)"> and <img src="https://latex.codecogs.com/png.latex?Z%20=%20W%5E2">. In this example, <img src="https://latex.codecogs.com/png.latex?W"> and <img src="https://latex.codecogs.com/png.latex?Z"> are very strongly related: if I tell you that the realization of <img src="https://latex.codecogs.com/png.latex?W"> is <img src="https://latex.codecogs.com/png.latex?w">, then you know for sure that the realization of <img src="https://latex.codecogs.com/png.latex?Z"> must be <img src="https://latex.codecogs.com/png.latex?w%5E2">. Again, keep this intuition in mind as we work through the mathematics below.</p>
<p>In the remainder of the post, we’ll find it helpful to refer to a few properties of <img src="https://latex.codecogs.com/png.latex?W"> and <img src="https://latex.codecogs.com/png.latex?Z">, namely <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Cmathbb%7BE%7D%5BW%5D%20&amp;%5Cequiv%20%5Cint_%7B-%5Cinfty%7D%5E%5Cinfty%20w%5Ccdot%20f_W(w)%5C,%20dw%20=%20%5Cint_%7B-1%7D%5E1%20w%5Ccdot%20%5Cfrac%7B1%7D%7B2%7D%5C,dw%20=%20%5Cleft.%20%5Cfrac%7Bw%5E2%7D%7B4%7D%5Cright%7C_%7B-1%7D%5E1%20=%200%5C%5C%0A%5Cmathbb%7BE%7D%5BZ%5D%20&amp;%5Cequiv%20%5Cmathbb%7BE%7D%5BW%5E2%5D%20=%20%5Cint_%7B-%5Cinfty%7D%5E%7B%5Cinfty%7D%20w%5E2%20%5Ccdot%20f_W(w)%5C,%20dw%20=%20%5Cint_%7B-1%7D%5E1%20w%5E2%20%5Ccdot%20%5Cfrac%7B1%7D%7B2%7D%20%5C,%20dw%20=%20%5Cleft.%20%5Cfrac%7Bw%5E3%7D%7B6%7D%5Cright%7C_%7B-1%7D%5E1%20=%20%5Cfrac%7B1%7D%7B3%7D%5C%5C%0A%5Cmathbb%7BE%7D%5BWZ%5D%20&amp;=%20%5Cmathbb%7BE%7D%5BW%5E3%5D%20%5Cequiv%20%5Cint_%7B-%5Cinfty%7D%5E%5Cinfty%20w%5E3%20%5Ccdot%20f_W(w)%5C,%20dw%20=%5Cint_%7B-1%7D%5E1%20w%5E3%20%5Ccdot%20%5Cfrac%7B1%7D%7B2%7D%5C,%20dw%20=%20%5Cleft.%20%5Cfrac%7Bw%5E4%7D%7B8%7D%20%20%5Cright%7C_%7B-1%7D%5E1%20=%200.%0A%5Cend%7Baligned%7D%0A"> Since <img src="https://latex.codecogs.com/png.latex?W"> is uniform on the interval <img src="https://latex.codecogs.com/png.latex?%5B-1,1%5D">, its pdf is simply <img src="https://latex.codecogs.com/png.latex?1/2"> on this interval, and zero otherwise. All else equal, I prefer easy integration problems!</p>
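<p>The same three integrals can be checked numerically. The sketch below uses base R’s <code>integrate()</code> together with <code>dunif()</code> for the density; the helper name <code>f_w</code> is an assumption made for illustration.</p>

```r
# Density of W ~ Uniform(-1, 1): equal to 1/2 on [-1, 1] and zero elsewhere
f_w <- function(w) dunif(w, min = -1, max = 1)

# Numerical versions of the three integrals computed by hand above
E_W  <- integrate(function(w) w   * f_w(w), lower = -1, upper = 1)$value
E_Z  <- integrate(function(w) w^2 * f_w(w), lower = -1, upper = 1)$value
E_WZ <- integrate(function(w) w^3 * f_w(w), lower = -1, upper = 1)$value

round(c(E_W = E_W, E_Z = E_Z, E_WZ = E_WZ), 6)
```

<p>Up to numerical error, this should reproduce the values zero, one third, and zero.</p>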
</section>
</section>
<section id="uncorrelatedness" class="level2">
<h2 class="anchored" data-anchor-id="uncorrelatedness">Uncorrelatedness</h2>
<p>Recall that the correlation between two random variables <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> is defined as <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BCorr%7D(X,Y)%20%5Cequiv%20%5Cfrac%7B%5Ctext%7BCov%7D(X,Y)%7D%7B%5Ctext%7BSD%7D(X)%5Ctext%7BSD%7D(Y)%7D%20=%20%5Cfrac%7B%5Cmathbb%7BE%7D%5B(X%20-%20%5Cmu_X)(Y%20-%20%5Cmu_Y)%5D%7D%7B%5Csqrt%7B%5Cmathbb%7BE%7D%5B(X%20-%20%5Cmu_X)%5E2%5D%5Cmathbb%7BE%7D%5B(Y%20-%20%5Cmu_Y)%5E2%5D%7D%7D%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Cmu_X%20%5Cequiv%20%5Cmathbb%7BE%7D(X)"> and <img src="https://latex.codecogs.com/png.latex?%5Cmu_Y%20%5Cequiv%20%5Cmathbb%7BE%7D(Y)">. We say that <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are <strong>uncorrelated</strong> if <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCorr%7D(X,Y)=%200">. Provided that neither <img src="https://latex.codecogs.com/png.latex?X"> nor <img src="https://latex.codecogs.com/png.latex?Y"> is a constant, both of their variances are positive. This means that the denominator of our expression for <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCorr%7D(X,Y)"> is likewise positive. It follows that <em>zero correlation is the same thing as zero covariance</em>. Correlation is simply covariance <em>rescaled</em> so that the units of <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> cancel out and the result always lies between <img src="https://latex.codecogs.com/png.latex?-1"> and <img src="https://latex.codecogs.com/png.latex?1">.</p>
<p>Correlation and covariance are both measures of <em>linear dependence</em>. If <img src="https://latex.codecogs.com/png.latex?X"> is, on average, above its mean when <img src="https://latex.codecogs.com/png.latex?Y"> is above its mean, then <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCorr%7D(X,Y)"> and <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(X,Y)"> are both positive. If <img src="https://latex.codecogs.com/png.latex?X"> is, on average, below its mean when <img src="https://latex.codecogs.com/png.latex?Y"> is above its mean, then <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCorr%7D(X,Y)"> and <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(X,Y)"> are both negative. If there is, on average, no linear relationship between <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y">, then both the correlation and covariance between them are zero. Using the “shortcut formula” for covariance, namely <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BCov%7D(X,Y)%20%5Cequiv%20%5Cmathbb%7BE%7D%5B(X%20-%20%5Cmu_X)(Y%20-%20%5Cmu_Y)%5D%20=%20%5Cmathbb%7BE%7D%5BXY%5D%20-%20%5Cmathbb%7BE%7D%5BX%5D%5Cmathbb%7BE%7D%5BY%5D,%0A"> it follows that uncorrelatedness is equivalent to <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D%5BXY%5D%20=%20%5Cmathbb%7BE%7D%5BX%5D%5Cmathbb%7BE%7D%5BY%5D.%0A"> Rendering this in English rather than mathematics,</p>
<blockquote class="blockquote">
<p>Two random variables <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are <strong>uncorrelated</strong> if and only if the expectation of their product equals the product of their expectations.</p>
</blockquote>
<section id="example-1-x-and-y-are-uncorrelated." class="level3">
<h3 class="anchored" data-anchor-id="example-1-x-and-y-are-uncorrelated.">Example #1: <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are uncorrelated.</h3>
<p>In Example #1 from above, <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D%5BXY%5D=0"> and <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(X)%5Cmathbb%7BE%7D(Y)%20=%200%20%5Ctimes%201/3%20=%200"> so <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are uncorrelated. Lack of correlation is one possible way in which two random variables can be thought of as “unrelated.” But it is a relatively <em>weak</em> property. Indeed, <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are in fact <em>highly dependent</em> in Example #1. For example, if <img src="https://latex.codecogs.com/png.latex?X=-1"> then we know for sure that <img src="https://latex.codecogs.com/png.latex?Y=0">. I simply cooked up the numbers to ensure that <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D%5BXY%5D=%5Cmathbb%7BE%7D%5BX%5D%5Cmathbb%7BE%7D%5BY%5D"> in spite of this.</p>
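<p>To see the “shortcut formula” at work, the following sketch computes the covariance for Example #1 twice: once from the definition and once from the shortcut. It re-enters the joint pmf so that it runs on its own; all object names are illustrative assumptions.</p>

```r
# Joint pmf of (X, Y): rows are x = -1, 0, 1; columns are y = 0, 1
x_vals <- c(-1, 0, 1)
y_vals <- c(0, 1)
p_xy <- matrix(c(1/3, 0,
                 0,   1/3,
                 1/3, 0), nrow = 3, byrow = TRUE)

mu_x <- sum(x_vals * rowSums(p_xy))  # E[X]
mu_y <- sum(y_vals * colSums(p_xy))  # E[Y]

# Definition: E[(X - mu_X)(Y - mu_Y)]
cov_def <- sum(outer(x_vals - mu_x, y_vals - mu_y) * p_xy)

# Shortcut: E[XY] - E[X]E[Y]
cov_short <- sum(outer(x_vals, y_vals) * p_xy) - mu_x * mu_y

c(definition = cov_def, shortcut = cov_short)
```

<p>Both routes should give a covariance of zero, up to floating-point error, even though the table makes it obvious that the two variables are related.</p>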
</section>
<section id="example-2-w-and-z-are-uncorrelated." class="level3">
<h3 class="anchored" data-anchor-id="example-2-w-and-z-are-uncorrelated.">Example #2: <img src="https://latex.codecogs.com/png.latex?W"> and <img src="https://latex.codecogs.com/png.latex?Z"> are uncorrelated.</h3>
<p>Because Example #1 is discrete, it can be a bit tricky to think about what it would mean for a dependence relationship to be <em>nonlinear</em>. Here Example #2 can help. As mentioned above, there is clearly a relationship between <img src="https://latex.codecogs.com/png.latex?Z"> and <img src="https://latex.codecogs.com/png.latex?W">. But this relationship is <em>nonlinear</em> in that <img src="https://latex.codecogs.com/png.latex?Z"> is a <em>quadratic</em> function of <img src="https://latex.codecogs.com/png.latex?W">. Since <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(WZ)%20=%200"> and <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(W)%20%5Ctimes%20%5Cmathbb%7BE%7D(Z)%20=%200%20%5Ctimes%20%5Cmathbb%7BE%7D(Z)%20=%200">, we see that <img src="https://latex.codecogs.com/png.latex?W"> and <img src="https://latex.codecogs.com/png.latex?Z"> are <em>uncorrelated</em>. Another way to see this is by simulating some data with the same properties as Example #2:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1983</span>)</span>
<span id="cb1-2">n_sims <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">250</span></span>
<span id="cb1-3">w <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">runif</span>(n_sims, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb1-4">z <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> w<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cor</span>(w, z)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 0.008379101</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(w, z)</span>
<span id="cb3-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abline</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(z <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> w))</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/why-econometrics-is-confusing-part-ii-the-independence-zoo/index_files/figure-html/unnamed-chunk-1-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>The regression line is essentially flat despite there being an obvious relationship between <img src="https://latex.codecogs.com/png.latex?W"> and <img src="https://latex.codecogs.com/png.latex?Z">. When <img src="https://latex.codecogs.com/png.latex?W"> is positive, there is a positive relationship between the two RVs; but when <img src="https://latex.codecogs.com/png.latex?W"> is negative, the picture reverses. The line of best fit “averages out” the increasing and decreasing relationships on either side of zero to give an overall slope of zero.<sup>2</sup></p>
</section>
</section>
<section id="conditional-mean-independence" class="level2">
<h2 class="anchored" data-anchor-id="conditional-mean-independence">Conditional Mean Independence</h2>
<p>We say that <img src="https://latex.codecogs.com/png.latex?Y"> is mean independent of <img src="https://latex.codecogs.com/png.latex?X"> if <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y%7CX)%20=%20%5Cmathbb%7BE%7D(Y)">. In words,</p>
<blockquote class="blockquote">
<p><img src="https://latex.codecogs.com/png.latex?Y"> is mean independent of <img src="https://latex.codecogs.com/png.latex?X"> if the conditional mean of <img src="https://latex.codecogs.com/png.latex?Y"> given <img src="https://latex.codecogs.com/png.latex?X"> equals the unconditional mean of <img src="https://latex.codecogs.com/png.latex?Y">.</p>
</blockquote>
<p>Just to make things confusing, this property is sometimes called “conditional mean independence” and sometimes called simply “mean independence.” The terms are completely interchangeable. Reversing the roles of <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y">, we say that <img src="https://latex.codecogs.com/png.latex?X"> is mean independent of <img src="https://latex.codecogs.com/png.latex?Y"> if the conditional mean of <img src="https://latex.codecogs.com/png.latex?X"> given <img src="https://latex.codecogs.com/png.latex?Y"> is the same as the unconditional mean of <img src="https://latex.codecogs.com/png.latex?X">. <strong>Spoiler alert:</strong> it is possible for <img src="https://latex.codecogs.com/png.latex?X"> to be mean independent of <img src="https://latex.codecogs.com/png.latex?Y"> while <img src="https://latex.codecogs.com/png.latex?Y"> is <em>not</em> mean independent of <img src="https://latex.codecogs.com/png.latex?X">. We’ll discuss this further below.</p>
<p>To better understand the concept of mean independence, let’s quickly review the difference between an unconditional mean and a conditional mean. The unconditional mean <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y)">, also known as the “expected value” or “expectation” of <img src="https://latex.codecogs.com/png.latex?Y">, is a <em>constant number</em>.<sup>3</sup> If <img src="https://latex.codecogs.com/png.latex?Y"> is discrete, this is simply the probability-weighted average of all possible realizations of <img src="https://latex.codecogs.com/png.latex?Y">, namely <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(Y)%20=%20%5Csum_%7B%5Ctext%7Ball%20%7D%20y%7D%20y%20%5Ccdot%20p_Y(y).%0A"> If <img src="https://latex.codecogs.com/png.latex?Y"> is continuous, it’s the same idea but with an integral replacing the sum and a probability density <img src="https://latex.codecogs.com/png.latex?f_Y(y)"> multiplied by <img src="https://latex.codecogs.com/png.latex?dy"> replacing the probability mass function <img src="https://latex.codecogs.com/png.latex?p_Y(y)">. Either way, we’re simply multiplying numbers together and adding up the result. Despite the similarity in notation, the conditional expectation <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y%7CX)"> is a <em>function of <img src="https://latex.codecogs.com/png.latex?X"></em> that tells us how the mean of <img src="https://latex.codecogs.com/png.latex?Y"> varies with <img src="https://latex.codecogs.com/png.latex?X">. Since <img src="https://latex.codecogs.com/png.latex?X"> is a random variable, so is <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y%7CX)">. 
If <img src="https://latex.codecogs.com/png.latex?Y"> is conditionally mean independent of <img src="https://latex.codecogs.com/png.latex?X"> then <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y%7CX)"> equals <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y)">. In words, the mean of <img src="https://latex.codecogs.com/png.latex?Y"> <em>does not vary with <img src="https://latex.codecogs.com/png.latex?X"></em>. Regardless of the value that <img src="https://latex.codecogs.com/png.latex?X"> takes on, the mean of <img src="https://latex.codecogs.com/png.latex?Y"> is the same: <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y)">.</p>
<p>There’s another way to think about this property in terms of <em>prediction</em>. With a bit of calculus, we can show that <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y)"> solves the following optimization problem: <img src="https://latex.codecogs.com/png.latex?%0A%5Cmin_%7B%5Ctext%7Ball%20constants%20%7D%20c%7D%20%5Cmathbb%7BE%7D%5B(Y%20-%20c)%5E2%5D.%0A"> In other words, <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y)"> is the <em>constant number</em> that is as close as possible to <img src="https://latex.codecogs.com/png.latex?Y"> on average, where “close” is measured by squared Euclidean distance. In this sense, we can think of <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y)"> as our “best guess” of the value that <img src="https://latex.codecogs.com/png.latex?Y"> will take. Again using a bit of calculus, it turns out that <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y%7CX)"> solves the following optimization problem: <img src="https://latex.codecogs.com/png.latex?%0A%5Cmin_%7B%5Ctext%7Ball%20functions%20%7D%20g%7D%20%5Cmathbb%7BE%7D%5B%5C%7BY%20-%20g(X)%20%5C%7D%5E2%5D.%0A"> (See <a href="https://drive.explaineverything.com/thecode/YZFFBCH">this video</a> for a proof.) Thus, <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y%7CX)"> is the <em>function of <img src="https://latex.codecogs.com/png.latex?X"></em> that is <em>as close as possible</em> to <img src="https://latex.codecogs.com/png.latex?Y"> on average, where “close” is again measured using squared Euclidean distance. In this sense, <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y%7CX)"> is our “best guess” of <img src="https://latex.codecogs.com/png.latex?Y"> after observing <img src="https://latex.codecogs.com/png.latex?X">. 
We have seen that <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y)"> and <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y%7CX)"> are the solutions to two related but distinct optimization problems; the former is a <em>constant number</em> that doesn’t depend on the realization of <img src="https://latex.codecogs.com/png.latex?X"> whereas the latter is a <em>function of <img src="https://latex.codecogs.com/png.latex?X"></em>. Mean independence is the special case in which the solutions to the two optimization problems coincide: <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y%7CX)%20=%20%5Cmathbb%7BE%7D(Y)">. Therefore,</p>
<blockquote class="blockquote">
<p><img src="https://latex.codecogs.com/png.latex?Y"> is mean independent of <img src="https://latex.codecogs.com/png.latex?X"> if our best guess of <img src="https://latex.codecogs.com/png.latex?Y"> taking <img src="https://latex.codecogs.com/png.latex?X"> into account is the same as our best guess of <img src="https://latex.codecogs.com/png.latex?Y"> ignoring <img src="https://latex.codecogs.com/png.latex?X">, where “best” is defined by “minimizes average squared distance to <img src="https://latex.codecogs.com/png.latex?Y">.”</p>
</blockquote>
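<p>The two optimization problems can also be solved numerically. The sketch below does so for the joint pmf from Example #1, using base R’s <code>optimize()</code> for the best constant and <code>optim()</code> for the best function of <code>X</code>. Because <code>X</code> takes only three values, a “function of <code>X</code>” is just a vector of three numbers. The helper names <code>mse_const</code> and <code>mse_fun</code>, the search interval, and the starting values are all assumptions made for illustration.</p>

```r
# Joint pmf of (X, Y): rows are x = -1, 0, 1; columns are y = 0, 1
y_vals <- c(0, 1)
p_xy <- matrix(c(1/3, 0,
                 0,   1/3,
                 1/3, 0), nrow = 3, byrow = TRUE)
p_y <- colSums(p_xy)

# Expected squared error of a constant guess for Y
mse_const <- function(guess) sum((y_vals - guess)^2 * p_y)
best_const <- optimize(mse_const, interval = c(-1, 2))$minimum

# Expected squared error of a guess g(X), where g = (g(-1), g(0), g(1))
y_grid <- matrix(y_vals, nrow = 3, ncol = 2, byrow = TRUE)  # y value in each cell
mse_fun <- function(g) sum((y_grid - g)^2 * p_xy)           # g recycles down rows
best_fun <- optim(c(0.5, 0.5, 0.5), mse_fun, method = "BFGS")$par

round(best_const, 3)
round(best_fun, 3)
```

<p>The best constant should land close to one third, the unconditional mean of <code>Y</code>. The best function should land close to the values zero, one, zero, so in this example the two best guesses do not coincide.</p>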
<section id="example-1-x-is-mean-independent-of-y." class="level3">
<h3 class="anchored" data-anchor-id="example-1-x-is-mean-independent-of-y.">Example #1: <img src="https://latex.codecogs.com/png.latex?X"> is mean independent of <img src="https://latex.codecogs.com/png.latex?Y">.</h3>
<p>Using the table of joint probabilities for Example #1 above, we found that <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(X)%20=%200">. To determine whether <img src="https://latex.codecogs.com/png.latex?X"> is mean independent of <img src="https://latex.codecogs.com/png.latex?Y">, we need to calculate <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(X%7CY=y)">, which we can accomplish as follows: <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Cmathbb%7BE%7D(X%7CY=0)%20&amp;=%20%5Csum_%7B%5Ctext%7Ball%20%7D%20x%7D%20x%20%5Ccdot%20%5Cmathbb%7BP%7D(X=x%7CY=0)%20=%20%5Csum_%7B%5Ctext%7Ball%20%7D%20x%7D%20x%20%5Ccdot%20%5Cfrac%7B%5Cmathbb%7BP%7D(X=x,Y=0)%7D%7B%5Cmathbb%7BP%7D(Y=0)%7D%5C%5C%20%5C%5C%0A%5Cmathbb%7BE%7D(X%7CY=1)%20&amp;=%20%5Csum_%7B%5Ctext%7Ball%20%7D%20x%7D%20x%20%5Ccdot%20%5Cmathbb%7BP%7D(X=x%7CY=1)%20=%20%5Csum_%7B%5Ctext%7Ball%20%7D%20x%7D%20x%20%5Ccdot%20%5Cfrac%7B%5Cmathbb%7BP%7D(X=x,Y=1)%7D%7B%5Cmathbb%7BP%7D(Y=1)%7D.%0A%5Cend%7Baligned%7D%0A"> Substituting the joint and marginal probabilities from the table above, we find that <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(X%7CY=0)%20=%200,%20%5Cquad%0A%5Cmathbb%7BE%7D(X%7CY=1)%20=%200.%0A"> Thus <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(X%7CY=y)"> simply equals zero, regardless of the realization <img src="https://latex.codecogs.com/png.latex?y"> of <img src="https://latex.codecogs.com/png.latex?Y">. Since <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(X)%20=%200"> we have shown that <img src="https://latex.codecogs.com/png.latex?X"> is conditionally mean independent of <img src="https://latex.codecogs.com/png.latex?Y">.</p>
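<p>Here is a minimal sketch of the same substitution in R: divide each column of the joint pmf by the corresponding marginal probability of <code>Y</code> to obtain the conditional pmf of <code>X</code>, then take the weighted average of the support points. Object names are illustrative assumptions.</p>

```r
# Joint pmf of (X, Y): rows are x = -1, 0, 1; columns are y = 0, 1
x_vals <- c(-1, 0, 1)
p_xy <- matrix(c(1/3, 0,
                 0,   1/3,
                 1/3, 0), nrow = 3, byrow = TRUE)

p_y <- colSums(p_xy)  # marginal pmf of Y

# Conditional pmf of X given Y = y: column y of the joint divided by p_Y(y)
p_x_given_y <- sweep(p_xy, 2, p_y, "/")

# E(X | Y = y) for y = 0 and y = 1; x_vals recycles down each column
E_X_given_y <- colSums(x_vals * p_x_given_y)
E_X_given_y
```

<p>Both entries should equal zero, matching the unconditional mean of <code>X</code>.</p>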
</section>
<section id="example-1-y-is-not-mean-independent-of-x." class="level3">
<h3 class="anchored" data-anchor-id="example-1-y-is-not-mean-independent-of-x.">Example #1: <img src="https://latex.codecogs.com/png.latex?Y"> is <em>NOT</em> mean independent of <img src="https://latex.codecogs.com/png.latex?X">.</h3>
<p>To determine whether <img src="https://latex.codecogs.com/png.latex?Y"> is mean independent of <img src="https://latex.codecogs.com/png.latex?X"> we need to calculate <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y%7CX)">. But this is easy. From the table we see that <img src="https://latex.codecogs.com/png.latex?Y"> is <em>known with certainty</em> after we observe <img src="https://latex.codecogs.com/png.latex?X">: if <img src="https://latex.codecogs.com/png.latex?X%20=%20-1"> then <img src="https://latex.codecogs.com/png.latex?Y%20=%200">, if <img src="https://latex.codecogs.com/png.latex?X%20=%200"> then <img src="https://latex.codecogs.com/png.latex?Y%20=%201">, and if <img src="https://latex.codecogs.com/png.latex?X%20=%201"> then <img src="https://latex.codecogs.com/png.latex?Y%20=%200">. Thus, without doing any math at all we find that <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(Y%7CX=-1)%20=%200,%20%5Cquad%0A%5Cmathbb%7BE%7D(Y%7CX=0)%20=%201,%20%5Cquad%0A%5Cmathbb%7BE%7D(Y%7CX=1)%20=%200.%0A"> (If you don’t believe me, work through the arithmetic yourself!) This <em>clearly</em> depends on <img src="https://latex.codecogs.com/png.latex?X">, so <img src="https://latex.codecogs.com/png.latex?Y"> is <em>not</em> mean independent of <img src="https://latex.codecogs.com/png.latex?X">.</p>
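<p>For anyone who does want to work through the arithmetic, this sketch conditions in the other direction: each row of the joint pmf is divided by the corresponding marginal probability of <code>X</code>. Object names are illustrative assumptions.</p>

```r
# Joint pmf of (X, Y): rows are x = -1, 0, 1; columns are y = 0, 1
y_vals <- c(0, 1)
p_xy <- matrix(c(1/3, 0,
                 0,   1/3,
                 1/3, 0), nrow = 3, byrow = TRUE)

p_x <- rowSums(p_xy)  # marginal pmf of X

# Conditional pmf of Y given X = x: row x of the joint divided by p_X(x)
p_y_given_x <- p_xy / p_x

# E(Y | X = x) for x = -1, 0, 1
E_Y_given_x <- as.vector(p_y_given_x %*% y_vals)
E_Y_given_x
```

<p>The result should be zero, one, zero, which plainly varies with the value of <code>X</code>.</p>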
</section>
<section id="example-2-z-is-not-mean-independent-of-w." class="level3">
<h3 class="anchored" data-anchor-id="example-2-z-is-not-mean-independent-of-w.">Example #2: <img src="https://latex.codecogs.com/png.latex?Z"> is <em>NOT</em> mean independent of <img src="https://latex.codecogs.com/png.latex?W">.</h3>
<p>Above we calculated that <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Z)%20=%20%5Cmathbb%7BE%7D(W%5E2)%20=%201/3">. But the conditional expectation is <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(Z%7CW)%20=%20%5Cmathbb%7BE%7D(W%5E2%7CW)%20=%20W%5E2%0A"> using the “taking out what is known” property: conditional on <img src="https://latex.codecogs.com/png.latex?W">, we know <img src="https://latex.codecogs.com/png.latex?W%5E2"> and can hence treat it as though it were a constant in an unconditional expectation, pulling it in front of the <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D"> operator. We see that <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Z%7CW)"> does not equal <img src="https://latex.codecogs.com/png.latex?1/3">: its value depends on <img src="https://latex.codecogs.com/png.latex?W">. Therefore <img src="https://latex.codecogs.com/png.latex?Z"> is not mean independent of <img src="https://latex.codecogs.com/png.latex?W">.</p>
</section>
<section id="example-2-w-is-mean-independent-of-z." class="level3">
<h3 class="anchored" data-anchor-id="example-2-w-is-mean-independent-of-z.">Example #2: <img src="https://latex.codecogs.com/png.latex?W"> is mean independent of <img src="https://latex.codecogs.com/png.latex?Z">.</h3>
<p>This one is trickier. To keep this post at an elementary level, my explanation won’t be completely rigorous. For more details <a href="https://math.stackexchange.com/questions/829779/conditional-expectation-of-x-given-x2">see here</a>. We need to calculate <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(W%7CZ)">. Since <img src="https://latex.codecogs.com/png.latex?Z%20%5Cequiv%20W%5E2"> this is the same thing as <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(W%7CW%5E2)">. Let’s start with an example. Suppose we observe <img src="https://latex.codecogs.com/png.latex?Z%20=%201">. This means that <img src="https://latex.codecogs.com/png.latex?W%5E2%20=%201"> so <img src="https://latex.codecogs.com/png.latex?W"> either equals <img src="https://latex.codecogs.com/png.latex?1"> or <img src="https://latex.codecogs.com/png.latex?-1">. How likely is each of these possible realizations of <img src="https://latex.codecogs.com/png.latex?W"> given that <img src="https://latex.codecogs.com/png.latex?W%5E2%20=%201">? Because the density of <img src="https://latex.codecogs.com/png.latex?W"> is <em>symmetric about zero</em>, <img src="https://latex.codecogs.com/png.latex?f_W(-1)%20=%20f_W(1)">. So given that <img src="https://latex.codecogs.com/png.latex?W%5E2%20=%201">, it is just as likely that <img src="https://latex.codecogs.com/png.latex?W%20=%201"> as it is that <img src="https://latex.codecogs.com/png.latex?W%20=%20-1">. Therefore, <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(W%7CW%5E2%20=%201)%20=%200.5%20%5Ctimes%201%20+%200.5%20%5Ctimes%20(-1)%20=%200.%0A"> Generalizing this idea, if we observe <img src="https://latex.codecogs.com/png.latex?Z%20=%20z"> then <img src="https://latex.codecogs.com/png.latex?W%20=%20%5Csqrt%7Bz%7D"> or <img src="https://latex.codecogs.com/png.latex?-%5Csqrt%7Bz%7D">. 
But since <img src="https://latex.codecogs.com/png.latex?f_W(%5Ccdot)"> is symmetric about zero, these possibilities are equally likely. Therefore, <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(W%7CZ=z)%20=%200.5%20%5Ctimes%20%5Csqrt%7Bz%7D%20-%200.5%20%5Ctimes%20%5Csqrt%7Bz%7D%20=%200.%0A"> Above we calculated that <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(W)%20=%200">. Therefore, <img src="https://latex.codecogs.com/png.latex?W"> is mean independent of <img src="https://latex.codecogs.com/png.latex?Z">.</p>
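<p>Since the argument above is only heuristic, a quick simulation offers a rough sanity check. The sketch below draws from Example #2, groups the draws into bins, and compares average values within bins. The seed, the number of draws, and the bin widths are arbitrary assumptions, and binning only approximates conditioning on an exact value.</p>

```r
set.seed(1234)   # arbitrary seed, used only for reproducibility
n_sims <- 1e5    # arbitrary number of simulation draws
w <- runif(n_sims, -1, 1)
z <- w^2

# Approximate E(W | Z): average w within five bins of z
z_bin <- cut(z, breaks = seq(0, 1, by = 0.2), include.lowest = TRUE)
mean_w_by_z <- tapply(w, z_bin, mean)

# Approximate E(Z | W): average z within five bins of w
w_bin <- cut(w, breaks = seq(-1, 1, by = 0.4), include.lowest = TRUE)
mean_z_by_w <- tapply(z, w_bin, mean)

round(mean_w_by_z, 3)
round(mean_z_by_w, 3)
```

<p>The bin averages of <code>w</code> should all sit near zero, in line with mean independence of <code>W</code> from <code>Z</code>. The bin averages of <code>z</code> should be large in the outer bins and small in the middle one, in line with the earlier result that <code>Z</code> is not mean independent of <code>W</code>.</p>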
</section>
</section>
<section id="statistical-independence" class="level2">
<h2 class="anchored" data-anchor-id="statistical-independence">Statistical Independence</h2>
<p>When you see the word “independent” without any qualification, this means “statistically independent.” In keeping with this usage, I often write “independent” rather than “statistically independent.” Whichever terminology you prefer, there are three equivalent ways of defining this idea:</p>
<blockquote class="blockquote">
<p><img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are statistically independent if and only if:</p>
<ol type="1">
<li>their joint distribution equals the product of their marginals, or</li>
<li>the conditional distribution of <img src="https://latex.codecogs.com/png.latex?Y%7CX"> equals the unconditional distribution of <img src="https://latex.codecogs.com/png.latex?Y">, or</li>
<li>the conditional distribution of <img src="https://latex.codecogs.com/png.latex?X%7CY"> equals the unconditional distribution of <img src="https://latex.codecogs.com/png.latex?X">.</li>
</ol>
</blockquote>
<p>The link between these three alternatives is the <em>definition of conditional probability</em>. Suppose that <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are discrete random variables with joint pmf <img src="https://latex.codecogs.com/png.latex?p_%7BXY%7D">, marginal pmfs <img src="https://latex.codecogs.com/png.latex?p_X"> and <img src="https://latex.codecogs.com/png.latex?p_Y">, and conditional pmfs <img src="https://latex.codecogs.com/png.latex?p_%7BX%7CY%7D"> and <img src="https://latex.codecogs.com/png.latex?p_%7BY%7CX%7D">. Version 1 requires that <img src="https://latex.codecogs.com/png.latex?p_%7BXY%7D(x,y)%20=%20p_X(x)%20p_Y(y)"> for all realizations <img src="https://latex.codecogs.com/png.latex?x,y">. But by the definition of conditional probability, <img src="https://latex.codecogs.com/png.latex?%0Ap_%7BX%7CY%7D(x%7Cy)%20%5Cequiv%20%5Cfrac%7Bp_%7BXY%7D(x,y)%7D%7Bp_Y(y)%7D,%20%5Cquad%0Ap_%7BY%7CX%7D(y%7Cx)%20%5Cequiv%20%5Cfrac%7Bp_%7BXY%7D(x,y)%7D%7Bp_X(x)%7D.%0A"> If <img src="https://latex.codecogs.com/png.latex?p_%7BXY%7D%20=%20p_X%20p_Y">, these expressions simplify to <img src="https://latex.codecogs.com/png.latex?%0Ap_%7BX%7CY%7D(x%7Cy)%20%5Cequiv%20%5Cfrac%7Bp_%7BX%7D(x)p_Y(y)%7D%7Bp_Y(y)%7D%20=%20p_X(x),%20%5Cquad%0Ap_%7BY%7CX%7D(y%7Cx)%20%5Cequiv%20%5Cfrac%7Bp_%7BX%7D(x)p_Y(y)%7D%7Bp_X(x)%7D%20=%20p_Y(y)%0A"> so 1 implies 2 and 3. Similarly, if <img src="https://latex.codecogs.com/png.latex?p_%7BX%7CY%7D=p_X"> then by the definition of conditional probability <img src="https://latex.codecogs.com/png.latex?%0Ap_%7BX%7CY%7D(x%7Cy)%20%5Cequiv%20%5Cfrac%7Bp_%7BXY%7D(x,y)%7D%7Bp_Y(y)%7D%20=%20p_X(x).%0A"> Re-arranging, this shows that <img src="https://latex.codecogs.com/png.latex?p_%7BXY%7D%20=%20p_X%20p_Y">, so 3 implies 1. 
An almost identical argument shows that 2 implies 1, completing our proof that these three seemingly different definitions of statistical independence are equivalent. If <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are continuous, the idea is the same but with densities replacing probability mass functions, e.g.&nbsp;<img src="https://latex.codecogs.com/png.latex?f_%7BXY%7D(x,y)%20=%20f_X(x)%20f_Y(y)"> and so on.</p>
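<p>As a quick numerical check, here is a minimal R sketch of this equivalence in the discrete case. The marginal pmfs <code>p_X</code> and <code>p_Y</code> are made-up values chosen purely for illustration. We build the joint pmf as the product of the marginals (version 1) and then confirm that the conditional pmfs, computed from the definition of conditional probability, coincide with the marginals (versions 2 and 3).</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical marginal pmfs, chosen only for illustration
p_X &lt;- c(0.2, 0.5, 0.3)  # pmf of X over three support points
p_Y &lt;- c(0.4, 0.6)       # pmf of Y over two support points

# Version 1: the joint pmf equals the product of the marginals
p_XY &lt;- outer(p_X, p_Y)  # rows index x, columns index y

# Conditional pmfs from the definition of conditional probability
p_X_given_Y &lt;- sweep(p_XY, 2, colSums(p_XY), "/")  # column j holds p_{X|Y}(. | y_j)
p_Y_given_X &lt;- sweep(p_XY, 1, rowSums(p_XY), "/")  # row i holds p_{Y|X}(. | x_i)

# Versions 2 and 3: each conditional pmf matches the corresponding marginal
all.equal(p_X_given_Y[, 1], p_X)
all.equal(p_Y_given_X[1, ], p_Y)</code></pre></div>
</div>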
<p>In most examples, it’s easier to show independence (or the lack thereof) using 2 or 3 rather than 1. These latter two definitions are also more intuitively appealing. To say that the conditional distribution of <img src="https://latex.codecogs.com/png.latex?X%7CY"> equals the unconditional distribution of <img src="https://latex.codecogs.com/png.latex?X"> is to say that</p>
<blockquote class="blockquote">
<p><img src="https://latex.codecogs.com/png.latex?Y"> provides absolutely no information about <img src="https://latex.codecogs.com/png.latex?X"> whatsoever.</p>
</blockquote>
<p>If learning <img src="https://latex.codecogs.com/png.latex?Y"> tells us anything at all about <img src="https://latex.codecogs.com/png.latex?X">, then <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are not independent. Similarly, if <img src="https://latex.codecogs.com/png.latex?X"> tells us anything about <img src="https://latex.codecogs.com/png.latex?Y"> at all, then <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are not independent.</p>
<section id="example-1-x-and-y-are-not-independent." class="level3">
<h3 class="anchored" data-anchor-id="example-1-x-and-y-are-not-independent.">Example #1: <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are <em>NOT</em> independent.</h3>
<p>If I tell you that <img src="https://latex.codecogs.com/png.latex?X%20=%200">, then you know for sure that <img src="https://latex.codecogs.com/png.latex?Y%20=%200">. Before I told you this, you did not know that <img src="https://latex.codecogs.com/png.latex?Y"> would equal zero: it’s a random variable with support set <img src="https://latex.codecogs.com/png.latex?%5C%7B0,1%5C%7D">. Since learning <img src="https://latex.codecogs.com/png.latex?X"> has the potential to tell you something about <img src="https://latex.codecogs.com/png.latex?Y">, <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are not independent. That was easy! For extra credit, <img src="https://latex.codecogs.com/png.latex?p_%7BXY%7D(-1,0)%20=%201/3"> but <img src="https://latex.codecogs.com/png.latex?p_X(-1)p_Y(0)%20=%201/3%20%5Ctimes%202/3%20=%202/9">. Since these are not equal, <img src="https://latex.codecogs.com/png.latex?p_%7BXY%7D%5Cneq%20p_X%20p_Y"> so the joint doesn’t equal the product of the marginals. We didn’t need to check this, but it’s reassuring to see that everything works out as it should.</p>
</section>
<section id="example-2-w-and-z-are-not-independent." class="level3">
<h3 class="anchored" data-anchor-id="example-2-w-and-z-are-not-independent.">Example #2: <img src="https://latex.codecogs.com/png.latex?W"> and <img src="https://latex.codecogs.com/png.latex?Z"> are <em>NOT</em> independent.</h3>
<p>Again, this one is easy: learning that <img src="https://latex.codecogs.com/png.latex?W%20=%20w"> tells us that <img src="https://latex.codecogs.com/png.latex?Z%20=%20w%5E2">. We didn’t know this before, so <img src="https://latex.codecogs.com/png.latex?W"> and <img src="https://latex.codecogs.com/png.latex?Z"> cannot be independent.</p>
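<p>A small simulation makes the same point. This is only a sketch: it assumes, purely for illustration, that <img src="https://latex.codecogs.com/png.latex?W"> is standard normal, and the seed and number of draws are arbitrary choices.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(1234)
n_sims &lt;- 1e5
W &lt;- rnorm(n_sims)  # assumed distribution for W (illustration only)
Z &lt;- W^2            # Z is a deterministic function of W

# The sample correlation is close to zero ...
cor(W, Z)

# ... but the average of Z changes sharply once we learn whether |W| exceeds 1,
# so learning W clearly tells us something about Z
tapply(Z, abs(W) &gt; 1, mean)</code></pre></div>
</div>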
</section>
</section>
<section id="relating-the-three-properties" class="level2">
<h2 class="anchored" data-anchor-id="relating-the-three-properties">Relating the Three Properties</h2>
<p>Now that we’ve described uncorrelatedness, mean independence, and statistical independence, we’re ready to see how these properties relate to one another. Let’s start by reviewing what we learned from the examples given above. In example #1:</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are uncorrelated.</li>
<li><img src="https://latex.codecogs.com/png.latex?X"> is mean independent of <img src="https://latex.codecogs.com/png.latex?Y">.</li>
<li><img src="https://latex.codecogs.com/png.latex?Y"> is <em>not</em> mean independent of <img src="https://latex.codecogs.com/png.latex?X">.</li>
<li><img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are <em>not</em> independent.</li>
</ul>
<p>In example #2, we found that</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?W"> and <img src="https://latex.codecogs.com/png.latex?Z"> are uncorrelated.</li>
<li><img src="https://latex.codecogs.com/png.latex?W"> is mean independent of <img src="https://latex.codecogs.com/png.latex?Z">.</li>
<li><img src="https://latex.codecogs.com/png.latex?Z"> is <em>not</em> mean independent of <img src="https://latex.codecogs.com/png.latex?W">.</li>
<li><img src="https://latex.codecogs.com/png.latex?W"> and <img src="https://latex.codecogs.com/png.latex?Z"> are <em>not</em> independent.</li>
</ul>
<p>These are worth remembering, because they are relatively simple and provide a source of <em>counterexamples</em> to help you avoid making tempting but incorrect statements about correlation, mean independence, and statistical independence. For example:</p>
<ol type="1">
<li>Uncorrelatedness does <strong>NOT IMPLY</strong> statistical independence: <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are not independent, but they are uncorrelated. (Ditto for <img src="https://latex.codecogs.com/png.latex?W"> and <img src="https://latex.codecogs.com/png.latex?Z">.)</li>
<li>Mean independence does <strong>NOT IMPLY</strong> statistical independence: <img src="https://latex.codecogs.com/png.latex?W"> is mean independent of <img src="https://latex.codecogs.com/png.latex?Z"> but these random variables are not independent.</li>
<li>Mean independence is <strong>NOT SYMMETRIC</strong>: <img src="https://latex.codecogs.com/png.latex?X"> is mean independent of <img src="https://latex.codecogs.com/png.latex?Y">, but <img src="https://latex.codecogs.com/png.latex?Y"> is not mean independent of <img src="https://latex.codecogs.com/png.latex?X">.</li>
</ol>
<p>Now that we have a handle on what’s <em>not true</em>, let’s see what can be said about correlation, mean independence, and statistical independence.</p>
<section id="uncorrelatedness-and-statistical-independence-are-symmetric" class="level3">
<h3 class="anchored" data-anchor-id="uncorrelatedness-and-statistical-independence-are-symmetric">Uncorrelatedness and Statistical Independence are Symmetric</h3>
<p>In the equality <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(XY)%20=%20%5Cmathbb%7BE%7D(X)%20%5Cmathbb%7BE%7D(Y)">, nothing changes if we swap the roles of <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y">; this statement is equivalent to <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(YX)%20=%20%5Cmathbb%7BE%7D(Y)%20%5Cmathbb%7BE%7D(X)">. This shows that uncorrelatedness is <em>symmetric</em>. The same goes for statistical independence: we showed that <img src="https://latex.codecogs.com/png.latex?p_%7BY%7CX%7D%20=%20p_Y"> is equivalent to <img src="https://latex.codecogs.com/png.latex?p_%7BX%7CY%7D%20=%20p_X"> above. In contrast, mean independence is not symmetric: <img src="https://latex.codecogs.com/png.latex?X"> can be mean independent of <img src="https://latex.codecogs.com/png.latex?Y"> without <img src="https://latex.codecogs.com/png.latex?Y"> being mean independent of <img src="https://latex.codecogs.com/png.latex?X">.</p>
<p>Here’s an analogy: uncorrelatedness and independence are like the relation “being biological siblings.” If <img src="https://latex.codecogs.com/png.latex?X"> is the sibling of <img src="https://latex.codecogs.com/png.latex?Y">, then <img src="https://latex.codecogs.com/png.latex?Y"> must be the sibling of <img src="https://latex.codecogs.com/png.latex?X"> because “being siblings” is defined as “having the same parents.” In contrast, mean independence is like the relation “being in love.” Sadly, it’s possible for <img src="https://latex.codecogs.com/png.latex?X"> to be in love with <img src="https://latex.codecogs.com/png.latex?Y"> despite <img src="https://latex.codecogs.com/png.latex?Y"> not being in love with <img src="https://latex.codecogs.com/png.latex?X">.<sup>4</sup></p>
</section>
<section id="statistical-independence-implies-conditional-mean-independence" class="level3">
<h3 class="anchored" data-anchor-id="statistical-independence-implies-conditional-mean-independence">Statistical Independence Implies Conditional Mean Independence</h3>
<p>Statistical independence is the “strongest” of the three properties: it implies both mean independence and uncorrelatedness. We’ll show this in two steps. In the first step, we’ll show that statistical independence implies mean independence. In the second step we’ll show that mean independence implies uncorrelatedness. Then we’ll bring this overly-long blog post to a close! Suppose that <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are discrete random variables. (For the continuous case, replace sums with integrals.) If <img src="https://latex.codecogs.com/png.latex?X"> is statistically independent of <img src="https://latex.codecogs.com/png.latex?Y">, then <img src="https://latex.codecogs.com/png.latex?p_%7BY%7CX%7D%20=%20p_Y"> and <img src="https://latex.codecogs.com/png.latex?p_%7BX%7CY%7D%20=%20p_X">. Hence, <img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Cmathbb%7BE%7D(Y%7CX=x)%20&amp;%5Cequiv%20%5Csum_%7B%5Ctext%7Ball%20%7D%20y%7D%20y%20%5Ccdot%20p_%7BY%7CX%7D(y%7Cx)%20=%20%5Csum_%7B%5Ctext%7Ball%20%7D%20y%7D%20y%20%5Ccdot%20p_Y(y)%20%5Cequiv%20%5Cmathbb%7BE%7D(Y)%5C%5C%0A%5Cmathbb%7BE%7D(X%7CY=y)%20&amp;%5Cequiv%20%5Csum_%7B%5Ctext%7Ball%20%7D%20x%7D%20x%20%5Ccdot%20p_%7BX%7CY%7D(x%7Cy)%20=%20%5Csum_%7B%5Ctext%7Ball%20%7D%20x%7D%20x%20%5Ccdot%20p_X(x)%20%5Cequiv%20%5Cmathbb%7BE%7D(X)%0A%5Cend%7Baligned%7D%0A"> so <img src="https://latex.codecogs.com/png.latex?Y"> is mean independent of <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?X"> is mean independent of <img src="https://latex.codecogs.com/png.latex?Y">.</p>
<!--**Bonus fact: statistical independence implies $\mathbb{E}[g(X) h(Y)] = \mathbb{E}[g(X)]\mathbb{E}[h(Y)]$ for any functions $h$ and $g$.**-->
</section>
<section id="conditional-mean-independence-implies-uncorrelatedness" class="level3">
<h3 class="anchored" data-anchor-id="conditional-mean-independence-implies-uncorrelatedness">Conditional Mean Independence Implies Uncorrelatedness</h3>
<p>If <em>either</em> <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y%7CX)%20=%20%5Cmathbb%7BE%7D(Y)"> <em>or</em> <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(X%7CY)%20=%20%5Cmathbb%7BE%7D(X)">, then <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are uncorrelated. To show this, we use the <strong>Law of Iterated Expectations</strong> and the “taking out what is known” property, along with the fact that <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(X)"> and <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y)"> are constants. Suppose first that <img src="https://latex.codecogs.com/png.latex?Y"> is mean independent of <img src="https://latex.codecogs.com/png.latex?X">, i.e.&nbsp;<img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(Y%7CX)%20=%20%5Cmathbb%7BE%7D(Y)">. Then, taking iterated expectations over <img src="https://latex.codecogs.com/png.latex?X">, <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(XY)%20=%20%5Cmathbb%7BE%7D%5B%5Cmathbb%7BE%7D(XY%7CX)%5D%20=%20%5Cmathbb%7BE%7D%5BX%20%5Cmathbb%7BE%7D(Y%7CX)%5D%20=%20%5Cmathbb%7BE%7D%5BX%20%5Cmathbb%7BE%7D(Y)%5D%20=%20%5Cmathbb%7BE%7D(X)%20%5Cmathbb%7BE%7D(Y).%0A"> Alternatively, suppose that <img src="https://latex.codecogs.com/png.latex?X"> is mean independent of <img src="https://latex.codecogs.com/png.latex?Y">, i.e.&nbsp;<img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D(X%7CY)%20=%20%5Cmathbb%7BE%7D(X)">. 
Then, taking iterated expectations over <img src="https://latex.codecogs.com/png.latex?Y">, <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D(XY)%20=%20%5Cmathbb%7BE%7D%5B%5Cmathbb%7BE%7D(XY%7CY)%5D%20=%20%5Cmathbb%7BE%7D%5BY%5Cmathbb%7BE%7D(X%7CY)%5D%20=%20%5Cmathbb%7BE%7D%5BY%20%5Cmathbb%7BE%7D(X)%5D%20=%20%5Cmathbb%7BE%7D(Y)%20%5Cmathbb%7BE%7D(X).%0A"> Therefore, if either <img src="https://latex.codecogs.com/png.latex?X"> is mean independent of <img src="https://latex.codecogs.com/png.latex?Y">, or <img src="https://latex.codecogs.com/png.latex?Y"> is mean independent of <img src="https://latex.codecogs.com/png.latex?X">, or both, then <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are uncorrelated. Since statistical independence implies mean independence, it follows that statistical independence implies uncorrelatedness. And we’re finally done!</p>
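<p>To see this result in numbers, here is a minimal sketch using a hypothetical joint pmf that I made up for illustration. In it, <img src="https://latex.codecogs.com/png.latex?X"> is mean independent of <img src="https://latex.codecogs.com/png.latex?Y"> but not the other way around. One direction is enough to make the two uncorrelated.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical joint pmf (illustration only):
# rows are x in {-1, 0, 1}, columns are y in {0, 1}
x_vals &lt;- c(-1, 0, 1)
y_vals &lt;- c(0, 1)
p_XY &lt;- matrix(c(0,   1/3,
                 1/3, 0,
                 0,   1/3),
               nrow = 3, byrow = TRUE)
p_X &lt;- rowSums(p_XY)  # marginal pmf of X
p_Y &lt;- colSums(p_XY)  # marginal pmf of Y

# E(X | Y = y) for each y: sum over x of x * p_XY(x, y), divided by p_Y(y)
E_X_given_Y &lt;- colSums(x_vals * p_XY) / p_Y
# E(Y | X = x) for each x: sum over y of y * p_XY(x, y), divided by p_X(x)
E_Y_given_X &lt;- as.vector(p_XY %*% y_vals) / p_X

E_X  &lt;- sum(x_vals * p_X)
E_Y  &lt;- sum(y_vals * p_Y)
E_XY &lt;- sum(outer(x_vals, y_vals) * p_XY)

E_X_given_Y  # does not vary with y and equals E(X): X is mean independent of Y
E_Y_given_X  # varies with x: Y is not mean independent of X
c(E_XY = E_XY, E_X_times_E_Y = E_X * E_Y)  # equal, so X and Y are uncorrelated</code></pre></div>
</div>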
</section>
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<p>In this post we have shown that:</p>
<ul>
<li>Statistical Independence <img src="https://latex.codecogs.com/png.latex?%5Cimplies"> Mean Independence <img src="https://latex.codecogs.com/png.latex?%5Cimplies"> Uncorrelatedness.</li>
<li>Uncorrelatedness does not imply mean independence or statistical independence.</li>
<li>Mean independence does not imply statistical independence.</li>
<li>Statistical independence and uncorrelatedness are symmetric; mean independence is not.</li>
</ul>
<p>Reading the figure from the very beginning of this post from top to bottom: statistical independence is the <em>strongest</em> notion, followed by mean independence, followed by uncorrelatedness.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>It turns out that teaching well is extremely hard. I am incredibly grateful to those intrepid souls who bravely raise their hand and inform me that no one in the room has any idea what I’m talking about!↩︎</p></li>
<li id="fn2"><p>I used a small number of simulation draws so it would be easier to see the data in the plot. If you use a larger number of simulations, the correlation will be even closer to zero and the line almost perfectly flat.↩︎</p></li>
<li id="fn3"><p>Throughout this post, I make the tacit assumption that all means–conditional or unconditional–exist and are finite.↩︎</p></li>
<li id="fn4"><p>But on the plus side, we got a lot of great pop songs out of the deal!↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>econometrics</category>
  <category>teaching</category>
  <guid>https://www.econometrics.blog/post/why-econometrics-is-confusing-part-ii-the-independence-zoo/</guid>
  <pubDate>Sun, 01 Jan 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>From the Poisson Distribution to Stirling’s Approximation</title>
  <dc:creator>Francis J. DiTraglia</dc:creator>
  <link>https://www.econometrics.blog/post/from-the-poisson-distribution-to-stirling-s-approximation/</link>
  <description><![CDATA[ 




<p>The <a href="https://en.wikipedia.org/wiki/Poisson_distribution">Poisson distribution</a> is the most famous probability model for <em>counts</em>, i.e.&nbsp;non-negative integer values. Many real-world phenomena are well approximated by this distribution, including the <a href="https://www.jstor.org/stable/41138751">number of German bombs</a> that landed in quarter-square-kilometer grid squares in south London during WWII. Formally, we say that a discrete random variable <img src="https://latex.codecogs.com/png.latex?X"> follows a Poisson distribution with rate parameter <img src="https://latex.codecogs.com/png.latex?%5Cmu%20%3E%200">, abbreviated <img src="https://latex.codecogs.com/png.latex?X%20%5Csim%20%5Ctext%7BPoisson%7D(%5Cmu)">, if <img src="https://latex.codecogs.com/png.latex?X"> has support set <img src="https://latex.codecogs.com/png.latex?%5C%7B0,%201,%202,%20...%5C%7D"> and probability mass function <img src="https://latex.codecogs.com/png.latex?%0Ap(x)%20%5Cequiv%20%5Cmathbb%7BP%7D(X=x)%20=%20%5Cfrac%7Be%5E%7B-%5Cmu%20%7D%5Cmu%5Ex%7D%7Bx!%7D.%0A"> Using some <a href="https://drive.explaineverything.com/thecode/CHAKTHR">clever algebra with sums</a> it’s not too hard to show that the rate parameter, <img src="https://latex.codecogs.com/png.latex?%5Cmu">, is <em>both the mean and the variance</em> of <img src="https://latex.codecogs.com/png.latex?X">.</p>
<section id="numerical-problems-try-taking-logs." class="level2">
<h2 class="anchored" data-anchor-id="numerical-problems-try-taking-logs.">Numerical problems? Try taking logs.</h2>
<p>Now, suppose that we wanted to plot the pmf of a Poisson RV with rate <img src="https://latex.codecogs.com/png.latex?%5Cmu%20=%20171">. The R function for the pmf of a Poisson RV is <code>dpois()</code>, so we can make our plot as follows (indicating the rate parameter as a vertical line)</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">300</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dpois</span>(x, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">171</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(x, p)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_vline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xintercept =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">171</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Poisson(171) pmf'</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.econometrics.blog/post/from-the-poisson-distribution-to-stirling-s-approximation/index_files/figure-html/unnamed-chunk-1-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>For such a large value of <img src="https://latex.codecogs.com/png.latex?%5Cmu">, this distribution looks decidedly bell-shaped. And indeed, it turns out to be extremely well-approximated by a normal distribution, as we’ll see below. It’s also clear that <img src="https://latex.codecogs.com/png.latex?X"> is most likely to take on a value relatively close to 171. We can use <code>dpois()</code> to calculate the exact probability that <img src="https://latex.codecogs.com/png.latex?X%20=%20171"> as follows: the answer is just over 3%.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dpois</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">171</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">171</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 0.03049301</code></pre>
</div>
</div>
<p>Now let’s try to calculate exactly the same probability <em>by hand</em>, that is by using the formula for the Poisson pmf from above.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">my_dpois <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(x, mu) {</span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>mu) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> mu<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span>x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factorial</span>(x)</span>
<span id="cb4-3">}</span>
<span id="cb4-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">my_dpois</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">171</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">171</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] NaN</code></pre>
</div>
</div>
<p>What gives?! The abbreviation <code>NaN</code> stands for “not a number.” The problem in this case is that both the numerator and denominator of the fraction inside of <code>my_dpois()</code> evaluate to infinity when <code>mu</code> and <code>x</code> are 171, and the ratio <img src="https://latex.codecogs.com/png.latex?%5Cinfty/%5Cinfty"> is undefined.<sup>1</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">numerator =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">171</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">171</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">171</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">denominator =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factorial</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">171</span>))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>  numerator denominator 
        Inf         Inf </code></pre>
</div>
</div>
<p>As I discussed in an <a href="../../post/street-fighting-numerical-analysis-part-1/">earlier post</a>, computers can only store a finite number of distinct numeric values. It’s not literally true that <code>factorial(171)</code> equals <img src="https://latex.codecogs.com/png.latex?%5Cinfty">. What’s really going on here is that <code>factorial(171)</code> is <em>such a large number</em> that it can’t be stored as a <a href="https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html">floating-point number</a>. In this case there’s a very simple fix. If you haven’t seen this trick before, it’s a helpful one to keep up your sleeve: <strong>if you run into numerical problems with very large or very small values, try taking logs.</strong><sup>2</sup> The log of the Poisson pmf is simply <img src="https://latex.codecogs.com/png.latex?%0A%5Clog%20p(x)%20=%20-%5Cmu%20+%20x%20%5Clog(%5Cmu)%20-%20%5Clog(x!).%0A"> R even has a convenient, built-in function for evaluating the natural log of a factorial: <code>lfactorial()</code>. Now we can compute the log of our desired probability as follows:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">171</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">171</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">171</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lfactorial</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">171</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] -3.490258</code></pre>
</div>
</div>
<p>To obtain the probability, simply exponentiate:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">171</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">171</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">171</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lfactorial</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">171</span>))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 0.03049301</code></pre>
</div>
</div>
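<p>If you’d like to reuse this trick, one option is to wrap it in a function. The sketch below is just one way to do so: the name <code>my_dpois_log()</code> and its <code>log_scale</code> argument are my own inventions for illustration, and this is not a description of how <code>dpois()</code> works internally. Because all of the arithmetic happens on the log scale, nothing overflows before the final call to <code>exp()</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Poisson pmf computed on the log scale (hypothetical helper, for illustration)
my_dpois_log &lt;- function(x, mu, log_scale = FALSE) {
  # log p(x) = -mu + x * log(mu) - log(x!)
  log_p &lt;- -mu + x * log(mu) - lfactorial(x)
  if (log_scale) log_p else exp(log_p)
}

# No more NaN at x = mu = 171
my_dpois_log(171, 171)

# Compare against R's built-in pmf over the whole range plotted above
all.equal(my_dpois_log(0:300, 171), dpois(0:300, 171))</code></pre></div>
</div>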
<p>Of course this just passes the buck to <code>lfactorial()</code>. So how does this mysterious function work? The bad news is that I’m not going to tell you; the good news is that I’m going to show you something <em>even better</em>, namely <a href="https://en.wikipedia.org/wiki/Stirling%27s_approximation">Stirling’s approximation</a>: a way to understand how <img src="https://latex.codecogs.com/png.latex?n!"> behaves <em>qualitatively</em> that turns out to give a pretty darned good approximation to <code>lfactorial()</code>. This may seem like an odd topic for a blog devoted to econometrics and statistics, so allow me to offer a few words of justification. First, computations involving <img src="https://latex.codecogs.com/png.latex?n!"> come up all the time in applied work. Second, it can be extremely helpful for certain theoretical arguments to have good approximations to <img src="https://latex.codecogs.com/png.latex?n!"> for large values of <img src="https://latex.codecogs.com/png.latex?n">. Finally, and most importantly from my perspective, the heuristic argument I’ll use below relies on none other than the <a href="../../post/thirty-isn-t-the-magic-number/">central limit theorem</a>. So even if you’ve seen a more traditional proof of Stirling’s approximation, I hope you’ll enjoy this alternative approach.<sup>3</sup></p>
</section>
<section id="stirlings-approximation" class="level2">
<h2 class="anchored" data-anchor-id="stirlings-approximation">Stirling’s Approximation</h2>
<p>The key step in our argument is to show that the pmf of a <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BPoisson%7D(%5Cmu)"> random variable is well-approximated by the <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BNormal%7D(%5Cmu,%20%5Cmu)"> density. This explains the bell-shaped curve that we plotted above. To obtain this result, we’ll use the central limit theorem. But there is one fact that you will need to take on faith if you don’t already know it: if <img src="https://latex.codecogs.com/png.latex?X_1%20%5Csim%20%5Ctext%7BPoisson%7D(%5Cmu_1)"> is independent of <img src="https://latex.codecogs.com/png.latex?X_2%20%5Csim%20%5Ctext%7BPoisson%7D(%5Cmu_2)"> then <img src="https://latex.codecogs.com/png.latex?X_1%20+%20X_2%20%5Csim%20%5Ctext%7BPoisson%7D(%5Cmu_1%20+%20%5Cmu_2)">. Proceeding <a href="https://en.wikipedia.org/wiki/Mathematical_induction">by induction</a> we can view a Poisson(171) random variable as the sum of 171 independent Poisson(1) random variables. More generally, we can view a Poisson RV with rate parameter <img src="https://latex.codecogs.com/png.latex?n"> as the sum of <img src="https://latex.codecogs.com/png.latex?n"> iid Poisson(1) random variables. By the <a href="../../post/thirty-isn-t-the-magic-number/">central limit theorem</a>, it follows that <img src="https://latex.codecogs.com/png.latex?%0A%5Csqrt%7Bn%7D(%5Cbar%7BX%7D_n%20-%201)%20%5Crightarrow_d%20%5Ctext%7BN%7D(0,1)%0A"> since the mean and variance of a Poisson(1) RV are both equal to one. From a practical perspective, this means that <img src="https://latex.codecogs.com/png.latex?%5Csqrt%7Bn%7D(%5Cbar%7BX%7D_n%20-%201)"> is approximately equal to <img src="https://latex.codecogs.com/png.latex?Z">, a standard normal random variable. 
Re-arranging, <img src="https://latex.codecogs.com/png.latex?%0AX_1%20+%20X_2%20+%20...%20+%20X_n%20=%20n%5Cbar%7BX%7D_n%20=%20n%20+%20%5Csqrt%7Bn%7D%20%5Ctimes%20%5B%5Csqrt%7Bn%7D(%5Cbar%7BX%7D_n%20-%201)%5D%20%5Capprox%20n%20+%20%5Csqrt%7Bn%7D%20Z%0A"> and <img src="https://latex.codecogs.com/png.latex?n%20+%20%5Csqrt%7Bn%7D%20Z"> is simply a <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BN%7D(n,%20n)"> random variable! This is a quick way of seeing why the <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BPoisson%7D(%5Cmu)"> distribution is well-approximated by the <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BN%7D(%5Cmu,%20%5Cmu)"> distribution when <img src="https://latex.codecogs.com/png.latex?%5Cmu"> is large.</p>
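<p>If you’d like to check this numerically, here is a minimal sketch in base R that compares the Poisson pmf with the matching normal density at a handful of points. The choice of 171 as the rate, the grid of <code>x</code> values, and the name <code>comparison</code> are my own assumptions for illustration; nothing here is special to those choices.</p>

```r
# Compare the Poisson(mu) pmf with the N(mu, mu) density at a few integer points.
mu <- 171                               # rate parameter; also the mean and the variance
x <- seq(mu - 30, mu + 30, by = 10)     # points within roughly 2.3 standard deviations of the mean

comparison <- data.frame(
  x = x,
  poisson = dpois(x, lambda = mu),
  normal = dnorm(x, mean = mu, sd = sqrt(mu))   # dnorm() takes the sd, hence sqrt(mu)
)

# A ratio close to one means the normal density is a good stand-in for the pmf.
comparison$ratio <- comparison$poisson / comparison$normal

print(round(comparison, 5))
```

<p>The <code>ratio</code> column should be closest to one at <code>x = mu</code> and drift away from one in the tails, where the skewness of the Poisson distribution starts to matter.</p>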
<p>Now let’s run with this. As we just saw, for large <img src="https://latex.codecogs.com/png.latex?%5Cmu"> the Poisson<img src="https://latex.codecogs.com/png.latex?(%5Cmu)"> pmf is well-approximated by the Normal<img src="https://latex.codecogs.com/png.latex?(%5Cmu,%20%5Cmu)"> density: <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7Be%5E%7B-%5Cmu%7D%5Cmu%5Ex%7D%7Bx!%7D%20%5Capprox%20%5Cfrac%7B1%7D%7B%5Csqrt%7B2%5Cpi%20%5Cmu%7D%7D%20%5Cexp%5Cleft%5C%7B%20-%5Cfrac%7B1%7D%7B2%7D%5Cleft(%20%5Cfrac%7Bx%20-%20%5Cmu%7D%7B%5Csqrt%7B%5Cmu%7D%7D%5Cright)%5E2%5Cright%5C%7D%0A"> This approximation is particularly accurate for <img src="https://latex.codecogs.com/png.latex?x"> near the <em>mean</em>. This is convenient: taking <img src="https://latex.codecogs.com/png.latex?%5Cmu"> to be a positive integer, so that its factorial is defined, we can substitute <img src="https://latex.codecogs.com/png.latex?%5Cmu"> for <img src="https://latex.codecogs.com/png.latex?x">. The exponent then vanishes, which considerably simplifies the right-hand side: <img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7Be%5E%7B-%5Cmu%7D%5Cmu%5E%5Cmu%7D%7B%5Cmu!%7D%20%5Capprox%20%5Cfrac%7B1%7D%7B%5Csqrt%7B2%5Cpi%5Cmu%7D%7D%0A"> Re-arranging, we obtain <img src="https://latex.codecogs.com/png.latex?%0A%5Cmu!%20%5Capprox%20%5Cmu%5E%5Cmu%20e%5E%7B-%5Cmu%7D%20%5Csqrt%7B2%20%5Cpi%20%5Cmu%7D%0A"> Taking logs of both sides gives: <img src="https://latex.codecogs.com/png.latex?%0A%5Clog(%5Cmu!)%20%5Capprox%20%5Cmu%20%5Clog(%5Cmu)%20-%20%5Cmu%20+%20%5Cfrac%7B1%7D%7B2%7D%20%5Clog(2%20%5Cpi%20%5Cmu)%0A"> Writing this with <img src="https://latex.codecogs.com/png.latex?n"> in place of <img src="https://latex.codecogs.com/png.latex?%5Cmu"> gives the following: <img src="https://latex.codecogs.com/png.latex?%0A%5Clog(n!)%20%5Capprox%20n%20%5Clog(n)%20-%20n%20+%20%5Cfrac%7B1%7D%7B2%7D%20%5Clog(2%20%5Cpi%20n)%0A"> This is called <em>Stirling’s Approximation</em>.
A common shorthand drops the <img src="https://latex.codecogs.com/png.latex?%5Clog(2%5Cpi%20n)/2"> term, yielding <img src="https://latex.codecogs.com/png.latex?%5Clog(n!)%20%5Capprox%20n%5Clog(n)%20-%20n">, which is fairly easy to remember. Including the extra term, however, gives increased accuracy, especially for smaller values of <img src="https://latex.codecogs.com/png.latex?n">. While I haven’t formally proved this, it turns out that <img src="https://latex.codecogs.com/png.latex?%0A%5Clog(n!)%20%5Csim%20n%20%5Clog(n)%20-%20n%20+%20%5Cfrac%7B1%7D%7B2%7D%20%5Clog(2%20%5Cpi%20n)%0A"> as <img src="https://latex.codecogs.com/png.latex?n%20%5Crightarrow%20%5Cinfty">. In other words, the ratio of the LHS and RHS tends to one in the large <img src="https://latex.codecogs.com/png.latex?n"> limit. In fact more is true: with the extra term included, the <em>difference</em> between the two sides also tends to zero. Perhaps surprisingly, this approximation is extremely accurate even for fairly small values of <img src="https://latex.codecogs.com/png.latex?n">, as we can see by comparing it against <code>lfactorial()</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">stirling1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(n) n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(n) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> n</span>
<span id="cb12-2">stirling2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(n) n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(n) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> pi <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> n)</span>
<span id="cb12-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb12-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Stirling1 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stirling1</span>(n),</span>
<span id="cb12-5">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Stirling2 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stirling2</span>(n),</span>
<span id="cb12-6">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">R =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lfactorial</span>(n)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb12-7">  knitr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">kable</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<table class="caption-top table table-sm table-striped small">
<thead>
<tr class="header">
<th style="text-align: right;">n</th>
<th style="text-align: right;">Stirling1</th>
<th style="text-align: right;">Stirling2</th>
<th style="text-align: right;">R</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: right;">1</td>
<td style="text-align: right;">-1.000</td>
<td style="text-align: right;">-0.081</td>
<td style="text-align: right;">0.000</td>
</tr>
<tr class="even">
<td style="text-align: right;">2</td>
<td style="text-align: right;">-0.614</td>
<td style="text-align: right;">0.652</td>
<td style="text-align: right;">0.693</td>
</tr>
<tr class="odd">
<td style="text-align: right;">3</td>
<td style="text-align: right;">0.296</td>
<td style="text-align: right;">1.764</td>
<td style="text-align: right;">1.792</td>
</tr>
<tr class="even">
<td style="text-align: right;">4</td>
<td style="text-align: right;">1.545</td>
<td style="text-align: right;">3.157</td>
<td style="text-align: right;">3.178</td>
</tr>
<tr class="odd">
<td style="text-align: right;">5</td>
<td style="text-align: right;">3.047</td>
<td style="text-align: right;">4.771</td>
<td style="text-align: right;">4.787</td>
</tr>
<tr class="even">
<td style="text-align: right;">6</td>
<td style="text-align: right;">4.751</td>
<td style="text-align: right;">6.565</td>
<td style="text-align: right;">6.579</td>
</tr>
<tr class="odd">
<td style="text-align: right;">7</td>
<td style="text-align: right;">6.621</td>
<td style="text-align: right;">8.513</td>
<td style="text-align: right;">8.525</td>
</tr>
<tr class="even">
<td style="text-align: right;">8</td>
<td style="text-align: right;">8.636</td>
<td style="text-align: right;">10.594</td>
<td style="text-align: right;">10.605</td>
</tr>
<tr class="odd">
<td style="text-align: right;">9</td>
<td style="text-align: right;">10.775</td>
<td style="text-align: right;">12.793</td>
<td style="text-align: right;">12.802</td>
</tr>
<tr class="even">
<td style="text-align: right;">10</td>
<td style="text-align: right;">13.026</td>
<td style="text-align: right;">15.096</td>
<td style="text-align: right;">15.104</td>
</tr>
<tr class="odd">
<td style="text-align: right;">11</td>
<td style="text-align: right;">15.377</td>
<td style="text-align: right;">17.495</td>
<td style="text-align: right;">17.502</td>
</tr>
<tr class="even">
<td style="text-align: right;">12</td>
<td style="text-align: right;">17.819</td>
<td style="text-align: right;">19.980</td>
<td style="text-align: right;">19.987</td>
</tr>
<tr class="odd">
<td style="text-align: right;">13</td>
<td style="text-align: right;">20.344</td>
<td style="text-align: right;">22.546</td>
<td style="text-align: right;">22.552</td>
</tr>
<tr class="even">
<td style="text-align: right;">14</td>
<td style="text-align: right;">22.947</td>
<td style="text-align: right;">25.185</td>
<td style="text-align: right;">25.191</td>
</tr>
<tr class="odd">
<td style="text-align: right;">15</td>
<td style="text-align: right;">25.621</td>
<td style="text-align: right;">27.894</td>
<td style="text-align: right;">27.899</td>
</tr>
<tr class="even">
<td style="text-align: right;">16</td>
<td style="text-align: right;">28.361</td>
<td style="text-align: right;">30.667</td>
<td style="text-align: right;">30.672</td>
</tr>
<tr class="odd">
<td style="text-align: right;">17</td>
<td style="text-align: right;">31.165</td>
<td style="text-align: right;">33.500</td>
<td style="text-align: right;">33.505</td>
</tr>
<tr class="even">
<td style="text-align: right;">18</td>
<td style="text-align: right;">34.027</td>
<td style="text-align: right;">36.391</td>
<td style="text-align: right;">36.395</td>
</tr>
<tr class="odd">
<td style="text-align: right;">19</td>
<td style="text-align: right;">36.944</td>
<td style="text-align: right;">39.335</td>
<td style="text-align: right;">39.340</td>
</tr>
<tr class="even">
<td style="text-align: right;">20</td>
<td style="text-align: right;">39.915</td>
<td style="text-align: right;">42.331</td>
<td style="text-align: right;">42.336</td>
</tr>
</tbody>
</table>
</div>
</div>
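<p>The table stops at twenty, so here is a rough sketch of how one might look at the errors for larger values. It re-defines the two functions so that it runs on its own in base R; the grid of values and the column names in <code>errors</code> are arbitrary choices of mine rather than anything canonical.</p>

```r
# Re-define the two approximations so this block is self-contained.
stirling1 <- function(n) n * log(n) - n
stirling2 <- function(n) n * log(n) - n + 0.5 * log(2 * pi * n)

n <- c(10, 100, 1000, 10000)
exact <- lfactorial(n)                   # R's built-in log-factorial, used as the benchmark

errors <- data.frame(
  n = n,
  abs_err1 = exact - stirling1(n),       # includes the omitted 0.5 * log(2 * pi * n) term
  abs_err2 = exact - stirling2(n),
  rel_err1 = (exact - stirling1(n)) / exact,
  rel_err2 = (exact - stirling2(n)) / exact
)

print(signif(errors, 3))
```

<p>With these definitions, <code>abs_err2</code> should shrink towards zero as <code>n</code> grows. In contrast, <code>abs_err1</code> should grow slowly, since it contains the omitted term, even though <code>rel_err1</code> still shrinks. That is the sense in which the shorter formula remains a valid approximation: the ratio tends to one although the difference does not vanish.</p>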
</section>
<section id="epilogue" class="level2">
<h2 class="anchored" data-anchor-id="epilogue">Epilogue</h2>
<p>I have a bad habit of trying to add a “moral” or “lesson” to the end of my posts, but I suppose there’s no point trying to break the habit today! While there are easier ways to derive Stirling’s approximation, there are two things I enjoy about this one. First, we get a more accurate approximation than <img src="https://latex.codecogs.com/png.latex?n%20%5Clog(n)%20-%20n"> with practically no effort. Second, making unexpected connections between facts that we already know both <em>deepens our understanding</em> and helps us “compress” information. If you ever forget Stirling’s approximation, now you know how to very quickly re-derive it on the spot!</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>There’s an important but subtle difference between <code>NA</code> and <code>NaN</code>. The former is synonymous with “missing.” If <code>x</code> equals <code>NA</code> this means “we don’t know the value of <code>x</code>.” If instead <code>x</code> equals <code>NaN</code>, this means “<code>x</code> isn’t missing, but it’s not a well-defined numeric value either.”↩︎</p></li>
<li id="fn2"><p>Unless otherwise specified, “log” always means “natural logarithm” on this blog :)↩︎</p></li>
<li id="fn3"><p>I first came across this argument in the late David MacKay’s fantastic book <a href="http://www.inference.org.uk/mackay/itila/book.html">Information Theory, Inference, and Learning Algorithms</a>. His book on <a href="http://www.withouthotair.com/">sustainable energy</a>, while a bit out-of-date at this point, is also spectacularly good.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>statistics</category>
  <category>computing</category>
  <guid>https://www.econometrics.blog/post/from-the-poisson-distribution-to-stirling-s-approximation/</guid>
  <pubDate>Fri, 18 Nov 2022 00:00:00 GMT</pubDate>
</item>
</channel>
</rss>
