Xiamen University, Chow Institute
May, 2026
All three violate the zero conditional mean assumption \(E[u \mid X] = 0\); that is, in each case \(E[u \mid X] \neq 0\):
| Source | Description | Example |
|---|---|---|
| Omitted variables | Unobserved confounder in \(u\) | Ability in wage equation |
| Simultaneity | \(X\) and \(Y\) determined jointly | Price and quantity |
| Measurement error | Observed \(X\) differs from true \(X^*\) | Self-reported schooling |
In all three cases, OLS is inconsistent — more data does not help.
Goal: estimate the demand curve — how does quantity demanded respond to price?
\[ Q^d = \alpha_0 + \alpha_1 P + U^d \]
But price is not set exogenously — it is determined by the interaction of supply and demand:
\[ Q^s = \gamma_0 + \gamma_1 P + U^s \]
In equilibrium, \(Q^d = Q^s\) and \(P\) adjusts to clear the market.
The observed price \(P\) reflects both demand shocks (\(U^d\)) and supply shocks (\(U^s\)).
Suppose we collect equilibrium \((P, Q)\) data and regress \(Q\) on \(P\):
The OLS slope is neither the demand slope nor the supply slope. Why?
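A brief sketch of the reason, assuming the two shocks are uncorrelated with each other (an assumption not stated above). Setting \(Q^d = Q^s\) and solving for the equilibrium price gives
\[ P = \frac{\alpha_0 - \gamma_0 + U^d - U^s}{\gamma_1 - \alpha_1} \]
so that
\[ \text{Cov}(P, U^d) = \frac{\text{Var}(U^d)}{\gamma_1 - \alpha_1} \neq 0 \]
Price is therefore correlated with the demand error, and a regression of \(Q\) on \(P\) mixes movements along both curves.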
To estimate the demand curve, we need a variable that shifts only supply (e.g., input costs, weather). Conversely, to estimate the supply curve, we need one that shifts only demand (e.g., income, tastes). This is the instrumental variables idea — and historically, simultaneity was its original motivation (Wright, 1928).
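The argument can be checked with a small simulation. The sketch below is illustrative only: the parameter values, the supply shifter `C`, and its coefficient `pi_c` are hypothetical, chosen so that `C` enters the supply equation and not the demand equation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical structural parameters (not from the lecture).
alpha0, alpha1 = 10.0, -1.0   # demand: Q = alpha0 + alpha1 * P + Ud
gamma0, gamma1 = 2.0, 1.0     # supply: Q = gamma0 + gamma1 * P + pi_c * C + Us
pi_c = -1.0                   # assumed effect of the cost shifter on supply

C = rng.normal(size=n)        # supply-only shifter (e.g. input costs)
Ud = rng.normal(size=n)       # demand shock
Us = rng.normal(size=n)       # supply shock

# Market clearing: solve demand = supply for the equilibrium price.
P = (alpha0 - gamma0 + Ud - pi_c * C - Us) / (gamma1 - alpha1)
Q = alpha0 + alpha1 * P + Ud

# OLS slope of Q on P, and the IV slope using C as the instrument.
cov_pq = np.cov(P, Q)
ols_slope = cov_pq[0, 1] / cov_pq[0, 0]
iv_slope = np.cov(C, Q)[0, 1] / np.cov(C, P)[0, 1]

print(f"demand slope: {alpha1}, supply slope: {gamma1}")
print(f"OLS slope: {ols_slope:.3f}, IV slope: {iv_slope:.3f}")
```

With these values the OLS slope settles near \(-1/3\), which is neither structural slope, while the ratio using the supply shifter lands close to the demand slope of \(-1\).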
A different source of endogeneity — the regressor itself is measured with error.
| Source | Example |
|---|---|
| Self-reporting | Individuals misreport income, schooling, hours worked |
| Proxy variables | IQ score as proxy for ability |
| Data processing | Rounding, aggregation, privacy masking |
Canonical case: self-reported schooling. Wages respond to true schooling \(E^*\), but the Census records the reported value \(E = E^* + V\). People misremember, round, or inflate.
We derive the classical case next.
Suppose the true model is:
\[ Y = \beta_0 + \beta_1 E^* + U, \quad E[U \mid E^*] = 0 \]
But we observe \(E = E^* + V\) instead of \(E^*\).
Classical measurement error assumptions: \(E[V] = 0\), \(\text{Cov}(V, E^*) = 0\), and \(\text{Cov}(V, U) = 0\).
Substituting \(E^* = E - V\) into the true model:
\[ Y = \beta_0 + \beta_1 E + (U - \beta_1 V) \]
Define the composite error \(W = U - \beta_1 V\). Then:
\[ \text{Cov}(E, W) = \text{Cov}(E^* + V, \; U - \beta_1 V) = -\beta_1 \sigma_V^2 \neq 0 \]
\[ \text{plim}\, \hat\beta_1 = \frac{\text{Cov}(E, Y)}{\text{Var}(E)} \]
The numerator:
\[ \text{Cov}(E, Y) = \text{Cov}(E^* + V, \; \beta_0 + \beta_1 E^* + U) = \beta_1 \sigma_{E^*}^2 \]
The denominator:
\[ \text{Var}(E) = \text{Var}(E^* + V) = \sigma_{E^*}^2 + \sigma_V^2 \]
Therefore:
\[ \text{plim}\, \hat\beta_1 = \beta_1 \cdot \underbrace{\frac{\sigma_{E^*}^2}{\sigma_{E^*}^2 + \sigma_V^2}}_{\lambda} \]
\[ \lambda = \frac{\sigma_{E^*}^2}{\sigma_{E^*}^2 + \sigma_V^2}, \quad 0 < \lambda < 1 \]
Schooling: if half the variance in reported schooling is noise (\(\sigma_V^2 = \sigma_{E^*}^2\)), then \(\lambda = 1/2\) — OLS recovers only half of \(\beta_1\).
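The attenuation factor can be seen in a simulation. This is a minimal sketch with made-up values for \(\beta_0\), \(\beta_1\), and the variances, set so that \(\sigma_V^2 = \sigma_{E^*}^2\) as in the schooling example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical values chosen so that lambda = 1/2.
beta0, beta1 = 1.0, 0.10
sigma_star, sigma_v = 2.0, 2.0

E_star = 12 + sigma_star * rng.normal(size=n)  # true schooling
V = sigma_v * rng.normal(size=n)               # classical measurement error
U = rng.normal(size=n)

Y = beta0 + beta1 * E_star + U
E_obs = E_star + V                             # reported schooling

def ols_slope(x, y):
    c = np.cov(x, y)
    return c[0, 1] / c[0, 0]

lam = sigma_star**2 / (sigma_star**2 + sigma_v**2)
b_true_regressor = ols_slope(E_star, Y)
b_observed_regressor = ols_slope(E_obs, Y)

print(f"lambda = {lam:.2f}")
print(f"slope using E*: {b_true_regressor:.4f}")
print(f"slope using E : {b_observed_regressor:.4f} (lambda * beta1 = {lam * beta1:.4f})")
```

Increasing `n` tightens both estimates but leaves the second one centered on \(\lambda \beta_1\), not \(\beta_1\).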
Suppose instead \(Y\) is measured with error: \(\tilde{Y} = Y + \eta\), with \(E[\eta] = 0\), \(\text{Cov}(\eta, E) = 0\).
\[ \tilde{Y} = \beta_0 + \beta_1 E + (U + \eta) \]
The composite error \((U + \eta)\) is still uncorrelated with \(E\). ZCM holds.
Bottom line: ME in the regressor biases the slope. ME in \(Y\) only inflates the variance.
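The contrast with error in \(Y\) can be illustrated the same way. In this sketch the parameter values and the noise scale of `eta` are assumptions; the standard error is the usual homoskedastic formula for a simple regression.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
beta0, beta1 = 1.0, 0.10      # hypothetical values

E = 12 + 2.0 * rng.normal(size=n)
U = rng.normal(size=n)
eta = 2.0 * rng.normal(size=n)  # measurement error in Y, independent of E

Y = beta0 + beta1 * E + U
Y_noisy = Y + eta

def ols_slope_and_se(x, y):
    """Slope and homoskedastic standard error for a simple regression of y on x."""
    c = np.cov(x, y)
    slope = c[0, 1] / c[0, 0]
    resid = (y - y.mean()) - slope * (x - x.mean())
    se = np.sqrt(resid.var(ddof=2) / (len(x) * x.var()))
    return slope, se

b_clean, se_clean = ols_slope_and_se(E, Y)
b_noisy, se_noisy = ols_slope_and_se(E, Y_noisy)

print(f"clean Y: slope {b_clean:.4f}, SE {se_clean:.4f}")
print(f"noisy Y: slope {b_noisy:.4f}, SE {se_noisy:.4f}")
```

Both slopes are centered on \(\beta_1\); only the standard error grows when \(Y\) is noisy.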
All three sources of endogeneity share the same structure: \(\text{Cov}(X, u) \neq 0\), and OLS is inconsistent — more data does not help.
The common solution: find a variable \(Z\) that is correlated with \(X\) but uncorrelated with \(u\).
Such a variable \(Z\) is called an instrumental variable (or instrument).
We seek a variable \(Z\) (the instrument) satisfying:
1. Relevance: \(\text{Cov}(Z, X) \neq 0\)
2. Exogeneity: \(\text{Cov}(Z, u) = 0\)
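Why both conditions are needed can be seen in one line. Assuming the linear model \(Y = \beta_0 + \beta_1 X + u\) (written here in generic notation), take the covariance of both sides with \(Z\):
\[ \text{Cov}(Z, Y) = \beta_1 \, \text{Cov}(Z, X) + \text{Cov}(Z, u) \]
Exogeneity removes the last term, and relevance allows division by \(\text{Cov}(Z, X)\):
\[ \beta_1 = \frac{\text{Cov}(Z, Y)}{\text{Cov}(Z, X)} \]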
Background: During the Vietnam War, the U.S. drafted young men into the military. The pre-1969 Selective Service System granted deferments, most prominently for college enrollment, so the burden of the draft fell disproportionately on the less-educated and the system was widely seen as unfair.
The lottery (1970–72): To equalize exposure, draft priority was assigned by random lottery on date of birth — capsules drawn live on national TV. Each date received a random sequence number (RSN) from 1 to 365; low numbers were called first. For the 1950 cohort, \(\text{RSN} \leq 195\) meant draft-eligible.
Question: What is the causal effect of military service on later civilian earnings?
Relevance: Eligibility raises the probability of veteran status.
Exogeneity: RSN is assigned by date of birth — literally random. No selection on ability, motivation, or family background.
Why we still need IV: not everyone eligible served (deferments, failed physicals); some non-eligible served voluntarily. So \(Z \neq D\) — the lottery shifts but does not determine veteran status.
OLS comparing veterans to non-veterans uses all variation in \(D\) — including the part driven by self-selection.
IV uses only the variation in \(D\) driven by the lottery — the random part.
1. ITT (Intent-to-Treat): \(\widehat{\text{ITT}} = \bar Y_{Z=1} - \bar Y_{Z=0}\)
Effect of being eligible — clean, but not the effect of serving.
2. As-Treated: \(\hat\beta^{OLS} = \bar Y_{D=1} - \bar Y_{D=0}\)
Effect of being a veteran — but biased: \(D\) is self-selected on unobservables.
3. IV / Wald: \(\hat\beta^{IV} = \dfrac{\bar Y_{Z=1} - \bar Y_{Z=0}}{\bar D_{Z=1} - \bar D_{Z=0}}\)
ITT scaled by the effect of \(Z\) on \(D\). Why is this the right thing to compute?
Start from \(Y = \beta_0 + \beta_1 D + U\) with exogeneity \(\text{Cov}(Z, U) = 0\).
Take expectations conditional on \(Z\):
\[ E[Y \mid Z = z] = \beta_0 + \beta_1 E[D \mid Z = z] + E[U \mid Z = z] \]
Differencing \(Z=1\) and \(Z=0\), with \(E[U \mid Z=1] = E[U \mid Z=0]\) by exogeneity:
\[ E[Y \mid Z{=}1] - E[Y \mid Z{=}0] = \beta_1 \big(E[D \mid Z{=}1] - E[D \mid Z{=}0]\big) \]
\[ \beta_1 = \frac{E[Y \mid Z{=}1] - E[Y \mid Z{=}0]}{E[D \mid Z{=}1] - E[D \mid Z{=}0]} \quad \text{(Wald ratio)} \]
White men born 1950, 1981 earnings (MHE Table 4.1.3):
| Quantity | Estimate | SE |
|---|---|---|
| Mean 1981 earnings | $16,461 | — |
| \(\bar Y_{Z=1} - \bar Y_{Z=0}\) | -$435.8 | 210.5 |
| \(\bar D_{Z=1} - \bar D_{Z=0}\) | 0.159 | 0.040 |
| Wald | -$2,741 | 1,324 |
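The Wald row can be reproduced from the two rows above it. In the sketch below, dividing the ITT standard error by the first stage treats the first stage as known; that simplification is an assumption here, not necessarily how the published standard error was computed.

```python
# Numbers copied from the table above (white men born 1950, 1981 earnings).
mean_earnings = 16461.0
itt, itt_se = -435.8, 210.5          # difference in mean earnings by eligibility
first_stage = 0.159                  # difference in veteran share by eligibility

wald = itt / first_stage
# Rough SE that ignores sampling error in the first stage (an assumption).
wald_se_approx = itt_se / first_stage
share_of_mean = wald / mean_earnings

print(round(wald))                   # -2741
print(round(wald_se_approx))         # 1324
print(f"{share_of_mean:.3f}")
```

Both rounded figures match the table, and the point estimate is about 17 percent of mean 1981 earnings in absolute value.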
The Wald ratio identifies \(\beta_1\) — a regression coefficient. But what causal quantity does \(\beta_1\) correspond to? Under what assumptions?
Preview: IV identifies the ATE for a specific subpopulation — the compliers (units whose treatment status is moved by \(Z\)). This is the Local Average Treatment Effect (LATE).
To derive this, we need potential outcomes for the treatment as well as \(Y\).
In an IV setting, both \(D\) and \(Y\) have potential values:
Potential treatment: \(D_i(z)\) = the treatment unit \(i\) would take if \(Z = z\).
Captures how \(Z\) moves \(D\) for each individual.
Potential outcome: \(Y_i(z, d)\) = the outcome unit \(i\) would have if \(Z = z\) and \(D = d\).
The two arguments allow \(Z\) to enter \(Y\) both through \(D\) and directly. For IV to be valid, \(Z\) must enter only through \(D\): \(Y_i(z, d) = Y_i(d)\) — the exclusion restriction.
The observed pair \((D_i, Y_i)\) corresponds to the realized \(Z_i\).
Each unit’s \((D_i(0), D_i(1))\) classifies how it responds to \(Z\):
| Type | \(D(0)\) | \(D(1)\) | Lottery analog |
|---|---|---|---|
| Complier | 0 | 1 | Drafted; would not have served otherwise |
| Always-taker | 1 | 1 | Volunteer; serves regardless of lottery |
| Never-taker | 0 | 0 | Deferred or disqualified |
| Defier | 1 | 0 | Opposite (no natural analog here) |
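The table can be read as a lookup from \((D(0), D(1))\) to a type. The sketch below uses hypothetical helper names and also shows why types are not directly observed: only \(D(Z)\) is seen for the realized \(Z\).

```python
# Hypothetical helpers illustrating the type table; names are not from the lecture.
TYPES = {
    (0, 1): "complier",
    (1, 1): "always-taker",
    (0, 0): "never-taker",
    (1, 0): "defier",
}

def compliance_type(d0: int, d1: int) -> str:
    """Map potential treatments (D(0), D(1)) to the type name."""
    return TYPES[(d0, d1)]

def observed_treatment(d0: int, d1: int, z: int) -> int:
    """Only the potential treatment for the realized Z is observed."""
    return d1 if z == 1 else d0

def types_consistent_with(z: int, d: int) -> list[str]:
    """All types that would produce observed treatment d under assignment z."""
    return sorted(
        name for (d0, d1), name in TYPES.items()
        if observed_treatment(d0, d1, z) == d
    )

# An eligible man who served could be a complier or an always-taker.
print(types_consistent_with(z=1, d=1))
# An ineligible man who did not serve could be a complier or a never-taker.
print(types_consistent_with(z=0, d=0))
```

Each observed \((Z, D)\) cell mixes two types, which is why identifying the complier effect needs the additional assumptions introduced next.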
Theorem (Imbens & Angrist, 1994): Under Exogeneity + Relevance + Monotonicity,
\[ \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]} = E[Y(1) - Y(0) \mid D(1) > D(0)] \]
LATE is the ATE for a specific subpopulation — those whose treatment is moved by the instrument — not the population ATE nor the ATT.
Monotonicity rules out defiers, leaving three types: \(C, A, N\). Numerator: decompose by type:
\[ E[Y \mid Z=z] = \sum_{T \in \{C, A, N\}} P(T)\, E[Y \mid Z=z, T] \]
\[ E[Y \mid Z=1] - E[Y \mid Z=0] = P(C) \cdot E[Y(1) - Y(0) \mid C] \]
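The step to this numerator can be spelled out, assuming the exclusion restriction and that \(Z\) is independent of type and of potential outcomes. Always-takers have \(D = 1\) and never-takers have \(D = 0\) whatever the value of \(Z\), so their outcomes do not move with \(Z\):
\[ E[Y \mid Z=1, A] - E[Y \mid Z=0, A] = E[Y(1) \mid A] - E[Y(1) \mid A] = 0 \]
\[ E[Y \mid Z=1, N] - E[Y \mid Z=0, N] = E[Y(0) \mid N] - E[Y(0) \mid N] = 0 \]
Compliers switch from \(D = 0\) to \(D = 1\) when \(Z\) goes from 0 to 1:
\[ E[Y \mid Z=1, C] - E[Y \mid Z=0, C] = E[Y(1) - Y(0) \mid C] \]
Only the complier term survives, weighted by \(P(C)\).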
Denominator: same type decomposition applied to \(D\):
\[ E[D \mid Z=1] - E[D \mid Z=0] = P(C) \]
Ratio:
\[ \frac{P(C) \cdot E[Y(1) - Y(0) \mid C]}{P(C)} = E[Y(1) - Y(0) \mid C] = \text{LATE} \qquad \square \]
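The result can be checked numerically. The following sketch uses made-up type shares and type-specific effects (all hypothetical), with no defiers and an instrument that affects \(Y\) only through \(D\).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000

# Hypothetical type shares: compliers, always-takers, never-takers.
types = rng.choice(["C", "A", "N"], size=n, p=[0.2, 0.3, 0.5])
Z = rng.integers(0, 2, size=n)               # randomly assigned binary instrument

# Potential treatments; monotonicity holds because there are no defiers.
D0 = (types == "A").astype(int)
D1 = ((types == "A") | (types == "C")).astype(int)
D = np.where(Z == 1, D1, D0)

# Hypothetical heterogeneous effects by type.
effect = np.select([types == "C", types == "A", types == "N"], [-2.0, 1.0, 0.5])

# Always-takers get a higher baseline, so treatment is self-selected.
Y0 = rng.normal(size=n) + 1.0 * (types == "A")
Y1 = Y0 + effect
Y = np.where(D == 1, Y1, Y0)                 # exclusion: Z enters only through D

itt = Y[Z == 1].mean() - Y[Z == 0].mean()
first_stage = D[Z == 1].mean() - D[Z == 0].mean()
wald = itt / first_stage

late = effect[types == "C"].mean()
ate = effect.mean()
naive = Y[D == 1].mean() - Y[D == 0].mean()

print(f"first stage {first_stage:.3f} (share of compliers 0.2)")
print(f"Wald {wald:.3f}, LATE {late:.3f}, ATE {ate:.3f}, as-treated {naive:.3f}")
```

In this design the Wald ratio tracks the complier effect of \(-2\), while the population ATE is positive and the as-treated comparison matches neither.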
LATE is a local effect: it applies to the complier subpopulation, not the full population.
Different instruments move different compliers \(\Rightarrow\) potentially different LATEs.
When is LATE = ATE?
Think about what property of individual treatment effects \(Y_i(1) - Y_i(0)\) would make the “complier-specific” distinction irrelevant.
The LATE theorem is for binary \(Z\) and binary \(D\). In practice:
Who are the compliers in Angrist (1990)?
The Wald ratio identifies \(\beta_1\) only under exogeneity: \(\text{Cov}(Z, U) = 0\). We argued this from random assignment.
But random \(Z\) alone is not enough. If \(Z\) shifts \(Y\) through channels other than \(D\), those channels live in \(U\) — and \(\text{Cov}(Z, U) \neq 0\) even when \(Z\) is random.
Lottery responses (men with low numbers had months to act):
These shift schooling, family timing, occupation — affecting earnings without going through veteran status.
So the IV estimate may be biased even though \(Z\) was randomly assigned: random assignment alone is not sufficient for exogeneity.
What can we check, given these threats?
Pre-service placebo: if \(Z\) is truly random, it shouldn’t predict outcomes from before service. Computing \(\bar Y_{Z=1} - \bar Y_{Z=0}\) on 1969 earnings (pre-lottery): -$2 (SE 34.5) — statistically zero. ✓ Independence is empirically supported.
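The phrase "statistically zero" can be made concrete with a z-statistic and a normal-approximation confidence interval. This is a small sketch using the figures quoted above; the 1.96 critical value assumes a two-sided 5 percent test.

```python
# Placebo: eligibility effect on 1969 (pre-lottery) earnings, from the text above.
placebo, placebo_se = -2.0, 34.5
# For comparison: eligibility effect on 1981 earnings, from the Wald table.
itt_1981, itt_1981_se = -435.8, 210.5

def z_stat(estimate: float, se: float) -> float:
    return estimate / se

def ci95(estimate: float, se: float) -> tuple[float, float]:
    """Normal-approximation 95 percent confidence interval."""
    return estimate - 1.96 * se, estimate + 1.96 * se

z_placebo = z_stat(placebo, placebo_se)
z_1981 = z_stat(itt_1981, itt_1981_se)
lo, hi = ci95(placebo, placebo_se)

print(f"placebo z = {z_placebo:.3f}, 95% CI = ({lo:.1f}, {hi:.1f})")
print(f"1981 ITT z = {z_1981:.3f}")
```

The placebo interval comfortably contains zero, while the 1981 eligibility effect is a little more than two standard errors from zero.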
Schooling response (Angrist & Krueger 1995): the lottery raised completed schooling by ~0.1 extra years on average. The exclusion violation through schooling exists but is small in magnitude.
Stability: Angrist’s IV estimates are similar across race subgroups, birth cohorts (1950–53), and outcome years (1981–84).
None of these prove exclusion. They make the identification more credible.
When OLS fails, an instrument \(Z\) that is relevant (shifts \(X\)) and exogenous (uncorrelated with \(U\)) identifies \(\beta_1\) via the Wald ratio.
Causal interpretation (LATE): with binary \(Z, D\) + monotonicity, the Wald ratio identifies the average effect on compliers — those moved by \(Z\). Not the ATE.
Exogeneity is the binding assumption — argued from theory, not testable in the just-identified case. Even random assignment doesn’t make it free.
Lecture 5b — IV Estimation: