IV — Motivation

Natasha Kang

Xiamen University, Chow Institute

May, 2026

Roadmap

  1. When conditioning fails: three sources of endogeneity
  2. Measurement error and attenuation bias
  3. The IV idea: relevance and exogeneity
  4. Identifying the effect: the Wald ratio
  5. What does IV estimate? LATE

The Story So Far

  • Parts II–III: OLS identifies causal effects when we can condition on the right controls.
  • Lecture 4c: DAGs help us choose controls. But what if the confounder is unobservable (ability, motivation, talent)?
  • Then no set of observed controls can close the back-door path.
  • We need a fundamentally different strategy.

Three Sources of Endogeneity

All three violate the zero conditional mean (ZCM) assumption \(E[u \mid X] = 0\); that is, in each case \(E[u \mid X] \neq 0\):

| Source | Description | Example |
|---|---|---|
| Omitted variables | Unobserved confounder in \(u\) | Ability in wage equation |
| Simultaneity | \(X\) and \(Y\) determined jointly | Price and quantity |
| Measurement error | Observed \(X\) differs from true \(X^*\) | Self-reported schooling |

In all three cases, OLS is inconsistent — more data does not help.

Simultaneity

Goal: estimate the demand curve — how does quantity demanded respond to price?

\[ Q^d = \alpha_0 + \alpha_1 P + U^d \]

But price is not set exogenously — it is determined by the interaction of supply and demand:

\[ Q^s = \gamma_0 + \gamma_1 P + U^s \]

In equilibrium, \(Q^d = Q^s\) and \(P\) adjusts to clear the market.

The observed price \(P\) reflects both demand shocks (\(U^d\)) and supply shocks (\(U^s\)).

Simultaneity — The Naive OLS

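Suppose we collect equilibrium \((P, Q)\) data and regress \(Q\) on \(P\).

The OLS slope is neither the demand slope nor the supply slope. Why? Each observed \((P, Q)\) point is the intersection of a shifted demand curve and a shifted supply curve, so \(P\) is correlated with \(U^d\) and the fitted line traces out neither curve.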

Simultaneity — Identification

To estimate the demand curve, we need a variable that shifts only supply (e.g., input costs, weather). Conversely, to estimate the supply curve, we need one that shifts only demand (e.g., income, tastes). This is the instrumental variables idea — and historically, simultaneity was its original motivation (Wright, 1928).
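The identification argument can be checked with a small simulation. The sketch below is illustrative only: the parameter values, the cost shifter `C` (added to the supply equation as an assumed supply-only shifter), and the helper names are assumptions, not part of the lecture. It compares the naive OLS slope with the covariance-ratio IV slope \(\text{Cov}(C, Q) / \text{Cov}(C, P)\).

```python
import random

def cov(x, y):
    """Covariance with 1/n normalisation (the scale cancels in the ratios below)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def simulate_market(n=50_000, seed=0):
    """Draw equilibrium (P, Q) together with a supply-only cost shifter C."""
    rng = random.Random(seed)
    a0, a1 = 10.0, -1.0           # demand: Q = a0 + a1*P + Ud   (assumed values)
    g0, g1, g2 = 2.0, 1.0, -1.0   # supply: Q = g0 + g1*P + g2*C + Us (assumed values)
    P, Q, C = [], [], []
    for _ in range(n):
        ud, us, c = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        p = (a0 - g0 + ud - us - g2 * c) / (g1 - a1)  # price that clears the market
        q = a0 + a1 * p + ud                          # quantity read off the demand curve
        P.append(p)
        Q.append(q)
        C.append(c)
    return P, Q, C

P, Q, C = simulate_market()
ols_slope = cov(P, Q) / cov(P, P)  # uses all price variation, including demand shocks
iv_slope = cov(C, Q) / cov(C, P)   # uses only the cost-driven price variation
print(f"OLS slope: {ols_slope:.3f}  IV slope: {iv_slope:.3f}  true demand slope: -1")
```

With these assumed parameters the population OLS slope is \(-1/3\), which is neither the demand slope (\(-1\)) nor the supply slope (\(+1\)), while the supply shifter recovers the demand slope.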

Measurement Error

A different source of endogeneity — the regressor itself is measured with error.

| Source | Example |
|---|---|
| Self-reporting | Individuals misreport income, schooling, hours worked |
| Proxy variables | IQ score as proxy for ability |
| Data processing | Rounding, aggregation, privacy masking |

Canonical case: self-reported schooling. Wages respond to true schooling \(E^*\), but the Census records the reported value \(E = E^* + V\). People misremember, round, or inflate.

Classical vs Non-Classical ME

  • Classical ME: error is random, uncorrelated with the truth. \(\Rightarrow\) predictable bias toward zero (attenuation).
  • Non-classical ME: error is systematic or correlated with the truth (e.g., high earners underreport income). \(\Rightarrow\) bias direction depends on the structure.

We derive the classical case next.

Classical Measurement Error — Setup

Suppose the true model is:

\[ Y = \beta_0 + \beta_1 E^* + U, \quad E[U \mid E^*] = 0 \]

But we observe \(E = E^* + V\) instead of \(E^*\).

Classical measurement error assumptions:

  1. \(E[V] = 0\)
  2. \(\text{Cov}(V, E^*) = 0\) — error is uncorrelated with the truth
  3. \(\text{Cov}(V, U) = 0\) — error is uncorrelated with the structural error

OLS with Measurement Error

Substituting \(E^* = E - V\) into the true model:

\[ Y = \beta_0 + \beta_1 E + (U - \beta_1 V) \]

Define the composite error \(W = U - \beta_1 V\). Then:

\[ \text{Cov}(E, W) = \text{Cov}(E^* + V, \; U - \beta_1 V) = -\beta_1 \sigma_V^2 \neq 0 \]

  • The regressor \(E\) is correlated with the composite error \(W\).
  • ZCM fails, so OLS is inconsistent.

Attenuation Bias — Derivation

\[ \text{plim}\, \hat\beta_1 = \frac{\text{Cov}(E, Y)}{\text{Var}(E)} \]

The numerator:

\[ \text{Cov}(E, Y) = \text{Cov}(E^* + V, \; \beta_0 + \beta_1 E^* + U) = \beta_1 \sigma_{E^*}^2 \]

The denominator:

\[ \text{Var}(E) = \text{Var}(E^* + V) = \sigma_{E^*}^2 + \sigma_V^2 \]

Therefore:

\[ \text{plim}\, \hat\beta_1 = \beta_1 \cdot \underbrace{\frac{\sigma_{E^*}^2}{\sigma_{E^*}^2 + \sigma_V^2}}_{\lambda} \]

The Attenuation Factor

\[ \lambda = \frac{\sigma_{E^*}^2}{\sigma_{E^*}^2 + \sigma_V^2}, \quad 0 < \lambda < 1 \]

  • \(\lambda\) is a signal-to-total-variance ratio.
  • \(\hat\beta_1\) is biased toward zero — this is attenuation bias.
  • More noise (\(\sigma_V^2 \uparrow\)) \(\Rightarrow\) more attenuation (\(\lambda \downarrow\)).

Schooling: if half the variance in reported schooling is noise (\(\sigma_V^2 = \sigma_{E^*}^2\)), then \(\lambda = 1/2\) — OLS recovers only half of \(\beta_1\).
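A quick simulation makes the attenuation factor concrete. This is a minimal sketch under assumed values (\(\beta_1 = 1\), \(\sigma_{E^*}^2 = \sigma_V^2 = 4\), standard normal \(U\)); the variable names are illustrative.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x: Cov(x, y) / Var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(1)
n = 50_000
beta0, beta1 = 1.0, 1.0        # assumed true coefficients
sd_true, sd_noise = 2.0, 2.0   # equal variances, so lambda = 1/2

e_true = [rng.gauss(12, sd_true) for _ in range(n)]           # true schooling E*
e_obs = [e + rng.gauss(0, sd_noise) for e in e_true]          # reported schooling E = E* + V
y = [beta0 + beta1 * e + rng.gauss(0, 1) for e in e_true]     # outcome depends on E*

lam = sd_true**2 / (sd_true**2 + sd_noise**2)
slope_true = ols_slope(e_true, y)   # infeasible regression on E*
slope_obs = ols_slope(e_obs, y)     # feasible regression on noisy E
print(f"lambda = {lam:.2f}, slope on E* = {slope_true:.3f}, slope on E = {slope_obs:.3f}")
```

The regression on the noisy regressor should land near \(\lambda \beta_1 = 0.5\), and increasing `n` does not move it back toward \(\beta_1\).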

What About Measurement Error in \(Y\)?

Suppose instead \(Y\) is measured with error: \(\tilde{Y} = Y + \eta\), with \(E[\eta] = 0\), \(\text{Cov}(\eta, E) = 0\).

\[ \tilde{Y} = \beta_0 + \beta_1 E + (U + \eta) \]

The composite error \((U + \eta)\) is still uncorrelated with \(E\). ZCM holds.

  • OLS is consistent — the slope is unbiased.
  • But \(\text{Var}(U + \eta) > \text{Var}(U)\): larger error variance \(\Rightarrow\) less precision, wider confidence intervals.

Bottom line: ME in the regressor biases the slope. ME in \(Y\) only inflates the variance.
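The contrast with error in \(Y\) can be seen the same way. The sketch below uses assumed values (\(\beta_1 = 1\), \(\text{Var}(U) = 1\), \(\text{Var}(\eta) = 4\)) and a hypothetical helper `fit`; it reports the slope and the residual variance with and without noise in the outcome.

```python
import random

def fit(x, y):
    """Return (slope, residual variance) from a simple OLS fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid_var = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y)) / (n - 2)
    return slope, resid_var

rng = random.Random(5)
n = 50_000
beta0, beta1 = 1.0, 1.0                                  # assumed true coefficients
e = [rng.gauss(12, 2) for _ in range(n)]                 # regressor measured without error
y = [beta0 + beta1 * ei + rng.gauss(0, 1) for ei in e]   # true outcome
y_noisy = [yi + rng.gauss(0, 2) for yi in y]             # observed outcome, eta has variance 4

slope_clean, var_clean = fit(e, y)
slope_noisy, var_noisy = fit(e, y_noisy)
print(f"clean Y: slope {slope_clean:.3f}, residual variance {var_clean:.2f}")
print(f"noisy Y: slope {slope_noisy:.3f}, residual variance {var_noisy:.2f}")
```

Both slopes should sit near \(\beta_1\); only the residual variance (and hence the standard error) grows.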

Bridge to IV

All three sources of endogeneity share the same structure: \(\text{Cov}(X, u) \neq 0\), and OLS is inconsistent — more data does not help.

The common solution: find a variable \(Z\) that is:

  1. Correlated with \(X\) (relevance)
  2. Uncorrelated with the error \(u\) (exogeneity)

Such a variable \(Z\) is called an instrumental variable (or instrument).

  • Omitted variables: a variable that shifts \(X\) via a channel unrelated to the confounder.
  • Simultaneity: a supply shifter (input costs, weather) is an instrument for price.
  • Measurement error: a variable correlated with the true \(X^*\), not with the noise \(V\).

IV Conditions

We seek a variable \(Z\) (the instrument) satisfying:

1. Relevance: \(\text{Cov}(Z, X) \neq 0\)

  • \(Z\) is correlated with the endogenous regressor \(X\).
  • Testable — we can check this in the data (more in 5b).

2. Exogeneity: \(\text{Cov}(Z, u) = 0\)

  • \(Z\) is uncorrelated with the error.
  • Partially testable — overidentification tests (more in 5b), but primarily argued from theory/institutional knowledge.

DAG Representation

Example: Vietnam Draft Lottery (Angrist 1990)

Background: During the Vietnam War, the U.S. drafted young men into the military. The pre-1969 Selective Service System granted deferments, most prominently for college enrollment, so the burden of the draft fell disproportionately on the less-educated and the system was widely seen as unfair.

The lottery (1970–72): To equalize exposure, draft priority was assigned by random lottery on date of birth — capsules drawn live on national TV. Each date received a random sequence number (RSN) from 1 to 365; low numbers were called first. For the 1950 cohort, \(\text{RSN} \leq 195\) meant draft-eligible.

Question: What is the causal effect of military service on later civilian earnings?

  • \(Y\): civilian earnings in 1981 (about 10 years after the lottery)
  • \(D\): Vietnam-era veteran status (binary)
  • \(Z\): lottery eligibility (\(\text{RSN} \leq 195\), binary)

Why the Lottery as Instrument?

Relevance: Eligibility raises the probability of veteran status.

  • Eligible: 35% became veterans
  • Non-eligible: 19% became veterans
  • Effect of \(Z\) on \(D\): ~16 percentage points

Exogeneity: RSN is assigned by date of birth — literally random. No selection on ability, motivation, or family background.

Why we still need IV: not everyone eligible served (deferments, failed physicals); some non-eligible served voluntarily. So \(Z \neq D\) — the lottery shifts but does not determine veteran status.

IV Intuition

OLS comparing veterans to non-veterans uses all variation in \(D\) — including the part driven by self-selection.

IV uses only the variation in \(D\) driven by the lottery — the random part.

  • Since \(Z\) is random, that variation is exogenous by design.
  • The price: most of the variation in \(D\) is discarded (only ~16 pp is moved by \(Z\)), so IV is less efficient.

Three Natural Estimators

1. ITT (Intent-to-Treat): \(\widehat{\text{ITT}} = \bar Y_{Z=1} - \bar Y_{Z=0}\)

Effect of being eligible — clean, but not the effect of serving.

2. As-Treated: \(\hat\beta^{OLS} = \bar Y_{D=1} - \bar Y_{D=0}\)

Effect of being a veteran — but biased: \(D\) is self-selected on unobservables.

3. IV / Wald: \(\hat\beta^{IV} = \dfrac{\bar Y_{Z=1} - \bar Y_{Z=0}}{\bar D_{Z=1} - \bar D_{Z=0}}\)

ITT scaled by the effect of \(Z\) on \(D\). Why is this the right thing to compute?

IV — The Wald Ratio

Start from \(Y = \beta_0 + \beta_1 D + U\) with exogeneity \(\text{Cov}(Z, U) = 0\).

Take expectations conditional on \(Z\):

\[ E[Y \mid Z = z] = \beta_0 + \beta_1 E[D \mid Z = z] + E[U \mid Z = z] \]

Differencing \(Z=1\) and \(Z=0\), with \(E[U \mid Z=1] = E[U \mid Z=0]\) by exogeneity:

\[ E[Y \mid Z{=}1] - E[Y \mid Z{=}0] = \beta_1 \big(E[D \mid Z{=}1] - E[D \mid Z{=}0]\big) \]

\[ \beta_1 = \frac{E[Y \mid Z{=}1] - E[Y \mid Z{=}0]}{E[D \mid Z{=}1] - E[D \mid Z{=}0]} \quad \text{(Wald ratio)} \]
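The sample analog of the Wald ratio is just two differences in means. The sketch below simulates a constant-effect model with an unobserved confounder `a`; all numbers (\(\beta_1 = -2\), the selection rule for \(D\)) and the helper names are assumptions chosen for illustration.

```python
import random

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def wald(y, d, z):
    """Difference in mean Y by Z, divided by difference in mean D by Z."""
    itt = mean(yi for yi, zi in zip(y, z) if zi == 1) - mean(yi for yi, zi in zip(y, z) if zi == 0)
    first_stage = mean(di for di, zi in zip(d, z) if zi == 1) - mean(di for di, zi in zip(d, z) if zi == 0)
    return itt / first_stage

rng = random.Random(2)
n = 200_000
beta1 = -2.0                                        # assumed constant treatment effect
y, d, z = [], [], []
for _ in range(n):
    zi = 1 if rng.random() < 0.5 else 0             # randomly assigned binary instrument
    a = rng.gauss(0, 1)                             # unobserved confounder (sits in U)
    di = 1 if zi + a + rng.gauss(0, 1) > 1 else 0   # treatment depends on Z and on a
    yi = 1.0 + beta1 * di + 2.0 * a + rng.gauss(0, 1)
    y.append(yi)
    d.append(di)
    z.append(zi)

wald_est = wald(y, d, z)
naive = mean(yi for yi, di in zip(y, d) if di == 1) - mean(yi for yi, di in zip(y, d) if di == 0)
print(f"Wald: {wald_est:.2f}   naive difference in means: {naive:.2f}   true beta1: {beta1}")
```

Because `a` raises both \(D\) and \(Y\) in this setup, the naive comparison is biased upward, while the Wald ratio should be close to \(\beta_1\).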

Angrist Numbers

White men born 1950, 1981 earnings (MHE Table 4.1.3):

| Quantity | Estimate | SE |
|---|---|---|
| Mean 1981 earnings | $16,461 | |
| \(\bar Y_{Z=1} - \bar Y_{Z=0}\) | -$435.8 | 210.5 |
| \(\bar D_{Z=1} - \bar D_{Z=0}\) | 0.159 | 0.040 |
| Wald | -$2,741 | 1,324 |

  • Wald estimate is ~17% of mean civilian earnings.
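The arithmetic behind the Wald row can be reproduced directly from the reported numbers. The standard-error line is a rough approximation that treats the first stage as fixed (an assumption made here for illustration, not necessarily how the published SE was computed).

```python
mean_earnings = 16_461.0       # mean 1981 earnings
itt, itt_se = -435.8, 210.5    # difference in mean earnings by eligibility, and its SE
first_stage = 0.159            # difference in veteran share by eligibility

wald = itt / first_stage
wald_se_approx = itt_se / first_stage   # ignores sampling error in the first stage
share_of_mean = wald / mean_earnings

print(f"Wald estimate: {wald:,.0f}")
print(f"Approximate SE: {wald_se_approx:,.0f}")
print(f"Share of mean earnings: {share_of_mean:.1%}")
```

Dividing by a first stage of about 0.16 scales both the intent-to-treat effect and its standard error up by a factor of roughly six.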

What Does IV Estimate?

The Wald ratio identifies \(\beta_1\) — a regression coefficient. But what causal quantity does \(\beta_1\) correspond to? Under what assumptions?

  • RCT case (Lecture 1): \(D \perp\!\!\!\perp (Y(0), Y(1)) \Rightarrow \beta_1 = E[Y(1) - Y(0)] = \text{ATE}\).
  • IV case: \(D\) is endogenous — so \(\beta_1\) is not the ATE.

Preview: IV identifies the ATE for a specific subpopulation — the compliers (units whose treatment status is moved by \(Z\)). This is the Local Average Treatment Effect (LATE).

To derive this, we need potential outcomes for the treatment as well as \(Y\).

Potential Outcomes in IV

In an IV setting, both \(D\) and \(Y\) have potential values:

Potential treatment: \(D_i(z)\) = the treatment unit \(i\) would take if \(Z = z\).

Captures how \(Z\) moves \(D\) for each individual.

Potential outcome: \(Y_i(z, d)\) = the outcome unit \(i\) would have if \(Z = z\) and \(D = d\).

The two arguments allow \(Z\) to enter \(Y\) both through \(D\) and directly. For IV to be valid, \(Z\) must enter only through \(D\): \(Y_i(z, d) = Y_i(d)\) — the exclusion restriction.

The observed pair \((D_i, Y_i)\) corresponds to the realized \(Z_i\).

Compliance Types

Each unit’s \((D_i(0), D_i(1))\) classifies how it responds to \(Z\):

| Type | \(D(0)\) | \(D(1)\) | Lottery analog |
|---|---|---|---|
| Complier | 0 | 1 | Drafted; would not have served otherwise |
| Always-taker | 1 | 1 | Volunteer; serves regardless of lottery |
| Never-taker | 0 | 0 | Deferred or disqualified |
| Defier | 1 | 0 | Opposite (no natural analog here) |

  • Types are latent: we observe \(D_i\), not \((D_i(0), D_i(1))\) separately.
  • Only compliers respond to \(Z\) — always-takers and never-takers are unmoved by the instrument.
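To see why types are latent, it helps to list which types are consistent with each observed \((Z, D)\) cell. The sketch below is illustrative; the function names are hypothetical, and the `monotonicity` flag rules out defiers (the no-defiers condition).

```python
def compliance_type(d0, d1):
    """Map the pair of potential treatments (D(0), D(1)) to a type label."""
    labels = {
        (0, 1): "complier",
        (1, 1): "always-taker",
        (0, 0): "never-taker",
        (1, 0): "defier",
    }
    return labels[(d0, d1)]

def possible_types(z, d, monotonicity=True):
    """Types consistent with one observed (Z, D) pair."""
    candidates = [
        (d0, d1)
        for d0 in (0, 1)
        for d1 in (0, 1)
        if (d1 if z == 1 else d0) == d   # observed D must equal D(z)
    ]
    names = [compliance_type(d0, d1) for d0, d1 in candidates]
    if monotonicity:
        names = [name for name in names if name != "defier"]
    return sorted(names)

for z in (0, 1):
    for d in (0, 1):
        print(f"Z={z}, D={d}: {possible_types(z, d)}")
```

Without defiers, an ineligible veteran must be an always-taker and an eligible non-veteran must be a never-taker, but the other two cells each mix compliers with one other type.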

LATE Assumptions

  1. Exogeneity (structural — on the counterfactual DGP):
    • Independence: \(Z \perp\!\!\!\perp (Y(z, d), D(z))\) for all \(z, d\). Lottery: guaranteed by randomization.
    • Exclusion: \(Y_i(z, d) = Y_i(d)\) — no direct \(Z \to Y\) channel. Lottery: violated if draft-eligible (\(Z=1\)) men enroll in college to avoid serving, shifting \(Y\) through schooling rather than \(D\).
    These structural restrictions imply the IV moment condition \(\text{Cov}(Z, U) = 0\).
  2. Relevance: \(P(D=1 \mid Z=1) \neq P(D=1 \mid Z=0)\).
  3. Monotonicity: \(D(1) \geq D(0)\) for all units — no defiers.

LATE — Statement

Theorem (Imbens & Angrist, 1994): Under Exogeneity + Relevance + Monotonicity,

\[ \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]} = E[Y(1) - Y(0) \mid D(1) > D(0)] \]

  • LHS: the IV estimand for binary \(Z\) (the Wald ratio).
  • RHS: the Local Average Treatment Effect (LATE) — the ATE among compliers, i.e., units with \(D(1) > D(0)\).

LATE is the ATE for a specific subpopulation — those whose treatment is moved by the instrument — not the population ATE nor the ATT.

LATE — Proof Sketch (Numerator)

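Under monotonicity there are three types: compliers (\(C\)), always-takers (\(A\)), and never-takers (\(N\)). Decompose, using independence so that \(P(T \mid Z=z) = P(T)\):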

\[ E[Y \mid Z=z] = \sum_{T \in \{C, A, N\}} P(T)\, E[Y \mid Z=z, T] \]

  • \(A\) (\(D=1\)): \(E[Y \mid Z=z, A] \underset{\text{excl.}}{=} E[Y(1) \mid Z=z, A] \underset{\text{indep.}}{=} E[Y(1) \mid A]\).
  • \(N\) (\(D=0\)): same logic → \(E[Y(0) \mid N]\).
  • \(C\) (\(D=Z\)): \(E[Y \mid Z=z, C] \underset{\text{excl.}}{=} E[Y(z) \mid Z=z, C] \underset{\text{indep.}}{=} E[Y(z) \mid C]\).

\[ E[Y \mid Z=1] - E[Y \mid Z=0] = P(C) \cdot E[Y(1) - Y(0) \mid C] \]

LATE — Proof Sketch (Denominator and Ratio)

Denominator: same type decomposition applied to \(D\):

  • \(A\): \(D=1\) always → contributes \(P(A)\) under both \(z\), cancels.
  • \(N\): \(D=0\) always → contributes \(0\), cancels.
  • \(C\): \(D=1\) when \(Z=1\), \(D=0\) when \(Z=0\) → contributes \(P(C)\).

\[ E[D \mid Z=1] - E[D \mid Z=0] = P(C) \]

Ratio:

\[ \frac{P(C) \cdot E[Y(1) - Y(0) \mid C]}{P(C)} = E[Y(1) - Y(0) \mid C] = \text{LATE} \qquad \square \]
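The result can be checked numerically with heterogeneous effects. In the sketch below the type shares (16% compliers, 19% always-takers, 65% never-takers) and the type-specific effects are assumed values chosen for illustration; with them, the complier effect is \(-3\) while the population ATE is \(-0.29\).

```python
import random

def mean(values):
    values = list(values)
    return sum(values) / len(values)

rng = random.Random(3)
n = 200_000
effects = {"C": -3.0, "A": 1.0, "N": 0.0}    # assumed effect of D on Y, by type

data = []
for _ in range(n):
    u = rng.random()
    t = "C" if u < 0.16 else ("A" if u < 0.35 else "N")   # latent compliance type
    z = 1 if rng.random() < 0.5 else 0                    # random instrument
    d = {"C": z, "A": 1, "N": 0}[t]                       # D(z) by type, no defiers
    y = rng.gauss(0, 1) + effects[t] * d                  # Y = Y(0) + effect * D
    data.append((t, z, d, y))

itt = mean(y for _, z, _, y in data if z == 1) - mean(y for _, z, _, y in data if z == 0)
first_stage = mean(d for _, z, d, _ in data if z == 1) - mean(d for _, z, d, _ in data if z == 0)
wald = itt / first_stage
ate = mean(effects[t] for t, _, _, _ in data)             # average effect over everyone

print(f"first stage (share of compliers): {first_stage:.3f}")
print(f"Wald: {wald:.2f}   complier effect: {effects['C']}   ATE: {ate:.2f}")
```

The Wald ratio should track the complier effect rather than the ATE, and the first stage should recover the complier share.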

LATE — Intuition

  • Only compliers change their treatment status in response to \(Z\).
  • They generate all the identifying variation.
  • Always-takers and never-takers are unaffected by \(Z\), so they contribute nothing.

LATE is a local effect: it applies to the complier subpopulation, not the full population.

LATE — External Validity

Different instruments move different compliers \(\Rightarrow\) potentially different LATEs.

  • External validity: LATE does not generalize unless compliers are representative.
  • Policy relevance: LATE matches the effect of a universal (ATE) or targeted (ATT) policy only when compliers coincide with the target group.
  • Cross-study: different instruments for the same treatment can give different LATEs.

Question

When is LATE = ATE?

Think about what property of individual treatment effects \(Y_i(1) - Y_i(0)\) would make the “complier-specific” distinction irrelevant.

LATE — Beyond the Binary Case

The LATE theorem is for binary \(Z\) and binary \(D\). In practice:

  • Non-binary \(D\) (e.g., years of schooling): IV identifies a weighted average of effects along the treatment ladder.
  • Non-binary \(Z\) (e.g., a multi-valued instrument): IV identifies a weighted average of pairwise LATEs.

LATE — Angrist Revisited

Who are the compliers in Angrist (1990)?

  • Men whose veteran status is shifted by the lottery — they would serve if drafted, but not voluntarily.
  • Pulled into service by a low draft number.
  • Always-takers: volunteers — would serve regardless of the lottery.
  • Never-takers: deferred or disqualified — would not serve regardless.
  • Compliers: would prefer civilian life but accepted the draft.
  • LATE \(\approx\) -17% of earnings is the effect on compliers — men with relatively better civilian alternatives.
  • The average effect among all who served (volunteers + drafted compliers, i.e., the ATT) is likely smaller in magnitude: volunteers self-selected into service, so it was (in expectation) beneficial for them.
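Although individual types are latent, their population shares are identified under monotonicity and independence. A minimal sketch using the rounded first-stage figures quoted for the lottery (35% of eligible men and 19% of non-eligible men became veterans):

```python
p_vet_eligible = 0.35       # P(D=1 | Z=1), rounded figure from the lecture
p_vet_not_eligible = 0.19   # P(D=1 | Z=0), rounded figure from the lecture

# With no defiers: veterans among the non-eligible are always-takers,
# non-veterans among the eligible are never-takers, and the rest are compliers.
share_always = p_vet_not_eligible
share_never = 1 - p_vet_eligible
share_complier = p_vet_eligible - p_vet_not_eligible

print(f"always-takers: {share_always:.2f}")
print(f"never-takers:  {share_never:.2f}")
print(f"compliers:     {share_complier:.2f}")
```

On these numbers the LATE describes roughly one in six men in the cohort.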

Threats to the Story

The Wald ratio identifies \(\beta_1\) only under exogeneity: \(\text{Cov}(Z, U) = 0\). We argued this from random assignment.

But random \(Z\) alone is not enough. If \(Z\) shifts \(Y\) through channels other than \(D\), those channels live in \(U\) — and \(\text{Cov}(Z, U) \neq 0\) even when \(Z\) is random.

Lottery responses (men with low numbers had months to act):

  • Enrolled in college for student deferments (II-S).
  • Volunteered preemptively to choose their branch.
  • Marriage / dependents → hardship deferments.

These shift schooling, family timing, occupation — affecting earnings without going through veteran status.

So the IV estimate may be biased even though \(Z\) was randomly assigned. Random assignment delivers independence, but not the exclusion restriction.
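A small simulation shows how a direct channel contaminates the Wald ratio even with a perfectly random instrument. In this sketch \(Y = \beta_1 D + \gamma Z + \text{noise}\), where \(\gamma\) stands in for any effect of \(Z\) on \(Y\) that bypasses \(D\); the values \(\beta_1 = -2\), \(\gamma = 0.16\), and the 35%/19% treatment rates are assumptions for illustration, and \(D\) is kept free of confounding so that the exclusion violation is the only problem.

```python
import random

def mean(values):
    values = list(values)
    return sum(values) / len(values)

rng = random.Random(4)
n = 200_000
beta1 = -2.0    # assumed effect of D on Y
gamma = 0.16    # assumed direct effect of Z on Y (exclusion violation)

data = []
for _ in range(n):
    z = 1 if rng.random() < 0.5 else 0                         # random instrument
    d = 1 if rng.random() < (0.35 if z == 1 else 0.19) else 0  # first stage of 0.16
    y = beta1 * d + gamma * z + rng.gauss(0, 1)
    data.append((z, d, y))

itt = mean(y for z, _, y in data if z == 1) - mean(y for z, _, y in data if z == 0)
first_stage = mean(d for z, d, _ in data if z == 1) - mean(d for z, d, _ in data if z == 0)
wald = itt / first_stage
predicted = beta1 + gamma / 0.16    # population Wald ratio: beta1 + gamma / first stage

print(f"Wald: {wald:.2f}   predicted: {predicted:.2f}   true beta1: {beta1}")
```

The bias equals the direct effect divided by the first stage, so a weak first stage magnifies even a small exclusion violation.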

Sanity Checks (Angrist 1990)

What can we check, given these threats?

Pre-service placebo: if \(Z\) is truly random, it shouldn’t predict outcomes from before service. Computing \(\bar Y_{Z=1} - \bar Y_{Z=0}\) on 1969 earnings (pre-lottery): -$2 (SE 34.5) — statistically zero. ✓ Independence is empirically supported.

Schooling response (Angrist & Krueger 1995): the lottery raised completed schooling by ~0.1 extra years on average. The exclusion violation through schooling exists but is small in magnitude.

Stability: Angrist’s IV estimates are similar across race subgroups, birth cohorts (1950–53), and outcome years (1981–84).

None of these prove exclusion. They make the identification more credible.
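The placebo comparison can be read off as a ratio of estimate to standard error. A minimal sketch using the reported figures for 1969 and 1981 earnings, with the conventional 1.96 cutoff as the assumed threshold:

```python
def z_stat(estimate, se):
    """Ratio of an estimate to its standard error."""
    return estimate / se

z_1969 = z_stat(-2.0, 34.5)      # pre-lottery earnings difference by eligibility
z_1981 = z_stat(-435.8, 210.5)   # post-service earnings difference by eligibility

for year, value in (("1969", z_1969), ("1981", z_1981)):
    verdict = "distinguishable from zero" if abs(value) > 1.96 else "not distinguishable from zero"
    print(f"{year}: z = {value:.2f} ({verdict} at the 5% level)")
```

The lottery does not predict earnings before it took place, but it does predict earnings a decade later.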

Summary

When OLS fails, an instrument \(Z\) that is relevant (shifts \(X\)) and exogenous (uncorrelated with \(U\)) identifies \(\beta_1\) via the Wald ratio.

Causal interpretation (LATE): with binary \(Z, D\) + monotonicity, the Wald ratio identifies the average effect on compliers — those moved by \(Z\). Not the ATE.

  • LATE is local: applies only to compliers; different instruments give different LATEs.
  • Beyond binary \(Z\) or \(D\): IV identifies a weighted average of effects.

Exogeneity is the binding assumption — argued from theory, not testable in the just-identified case. Even random assignment doesn’t make it free.

What’s Next

Lecture 5b — IV Estimation:

  • The IV estimator: sample analog and consistency
  • 2SLS: multiple instruments and controls
  • Diagnostics: weak instruments, overidentification, endogeneity test
  • Applications