Xiamen University, Chow Institute
March, 2026
| | Exact (finite-sample) | Asymptotic (large-sample) |
|---|---|---|
| Assumptions | MLR.1–MLR.6 (normality) | MLR.1–MLR.5 (no normality) |
| Distributions | Exact \(t\) and \(F\) | Approximate (via CLT) |
| Sample size | Any \(n\) | Requires \(n\) large |
We need two key tools: the Law of Large Numbers and the Central Limit Theorem.
Let \(\theta_n\) be a sequence of random variables indexed by \(n\). We say \(\theta_n\) converges in probability to \(\theta\) if
\[ P(|\theta_n - \theta| \geq \varepsilon) \to 0 \quad \text{for all } \varepsilon > 0 \]
Notation: \(\theta_n \xrightarrow{p} \theta\) or \(\text{plim}\, \theta_n = \theta\).
Informally: as \(n\) grows, \(\theta_n\) is increasingly likely to be close to \(\theta\).
Let \(X_1, X_2, \ldots, X_n\) be i.i.d. with \(E[X_i] = \mu\) and \(\text{Var}(X_i) < \infty\). Then:
\[ \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{p} \mu \]
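A quick simulation can make the LLN concrete. The sketch below assumes NumPy is available and uses exponential draws with mean \(\mu = 2\) (an arbitrary illustrative choice with finite variance). The gap \(|\bar{X}_n - \mu|\) should shrink as \(n\) grows, although any single simulated path can wiggle along the way.

```python
import numpy as np

def running_means(draws: np.ndarray) -> np.ndarray:
    """Sample mean of the first n draws, for n = 1, ..., len(draws)."""
    return np.cumsum(draws) / np.arange(1, len(draws) + 1)

rng = np.random.default_rng(0)
mu = 2.0  # illustrative population mean
# Exponential draws with mean mu (NumPy's `scale`), so Var(X_i) = mu**2 is finite.
draws = rng.exponential(scale=mu, size=100_000)
xbar = running_means(draws)

for n in (10, 1_000, 100_000):
    print(f"n = {n:>6}: |Xbar_n - mu| = {abs(xbar[n - 1] - mu):.4f}")
```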
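If \(\theta_n \xrightarrow{p} \theta\) and \(\phi_n \xrightarrow{p} \phi\):
Sums, products, and ratios: \(\theta_n + \phi_n \xrightarrow{p} \theta + \phi\), \(\theta_n \phi_n \xrightarrow{p} \theta\phi\), and \(\theta_n / \phi_n \xrightarrow{p} \theta / \phi\) provided \(\phi \neq 0\).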
Continuous mapping: if \(g(\cdot)\) is continuous at \(\theta\), then \(g(\theta_n) \xrightarrow{p} g(\theta)\).
An estimator \(\hat\theta_n\) is consistent for \(\theta\) if \(\hat\theta_n \xrightarrow{p} \theta\).
“If you can’t get it right as \(n\) goes to infinity, you shouldn’t be in this business.” — C. W. J. Granger
Theorem: Under MLR.1–MLR.4, the OLS estimator \(\hat\beta_j\) is consistent for \(\beta_j\).
Note: we do not need homoskedasticity (MLR.5) or normality (MLR.6).
Write \(\hat\beta_1 = \beta_1 + \frac{n^{-1}\sum(x_i - \bar{x})u_i}{n^{-1}\sum(x_i - \bar{x})^2}\).
Denominator: \(\frac{1}{n}\sum(x_i - \bar{x})^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2 \xrightarrow{p} E[x_i^2] - (E[x_i])^2 = \text{Var}(x_i)\)
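Numerator: expand as \(\frac{1}{n}\sum x_i u_i - \bar{x}\cdot\frac{1}{n}\sum u_i \xrightarrow{p} E[x_i u_i] - E[x_i]\,E[u_i] = 0\) under MLR.4. Since the denominator converges to \(\text{Var}(x_i) > 0\), the ratio converges to 0 and \(\text{plim}\,\hat\beta_1 = \beta_1\).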
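The consistency argument can be checked by simulation. This is a minimal sketch assuming NumPy; the parameter values are illustrative. The errors are built to satisfy \(E(u \mid x) = 0\) while being heteroskedastic and skewed, so MLR.5 and MLR.6 both fail, yet \(\hat\beta_1\) should still settle near \(\beta_1 = 0.5\) as \(n\) grows.

```python
import numpy as np

def ols_slope(x: np.ndarray, y: np.ndarray) -> float:
    """SLR slope estimate: sum((x - xbar) * y) / sum((x - xbar)**2)."""
    xd = x - x.mean()
    return float(xd @ y / (xd @ xd))

rng = np.random.default_rng(1)
beta0, beta1 = 1.0, 0.5  # illustrative true parameters
estimates = {}

for n in (50, 5_000, 500_000):
    x = rng.normal(size=n)
    # E(u | x) = 0, but u is heteroskedastic and skewed: MLR.5 and MLR.6 fail.
    u = np.abs(x) * (rng.exponential(size=n) - 1.0)
    y = beta0 + beta1 * x + u
    estimates[n] = ols_slope(x, y)
    print(f"n = {n:>6}: beta1_hat = {estimates[n]:.4f}")
```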
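The proof only used \(E[u_i] = 0\) and \(E[x_i u_i] = 0\), not the full \(E(u \mid X) = 0\). So we can replace MLR.4 with a weaker assumption, MLR.4′ (zero mean and zero correlation): \(E(u) = 0\) and \(\text{Cov}(x_j, u) = 0\) for \(j = 1, \ldots, k\).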
Example: True model is \(y = \beta_0 + \beta_1 x + \beta_2 x^2 + u\), with \(E(u \mid x) = 0\) and \(x \sim N(0,1)\).
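Suppose we estimate the misspecified model \(y = \alpha_0 + \beta_1 x + v\), where \(\alpha_0 = \beta_0 + \beta_2\) and \(v = \beta_2(x^2 - 1) + u\). For \(\beta_2 \neq 0\), \(E(v \mid x) = \beta_2(x^2 - 1) \neq 0\), so MLR.4 fails. But \(E(v) = 0\) and \(\text{Cov}(x, v) = \beta_2 E[x^3] = 0\) because \(x\) is symmetric about zero, so the zero-mean, zero-correlation assumption holds and the OLS slope is still consistent for \(\beta_1\).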
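The quadratic example can be sketched numerically, assuming NumPy and illustrative values \(\beta_0 = 1\), \(\beta_1 = 0.5\), \(\beta_2 = 2\). With \(x \sim N(0,1)\), \(\text{Cov}(x, x^2) = E[x^3] = 0\), so the short-regression slope should land near \(\beta_1\). As a contrast not on the slide, the sketch also draws \(x \sim N(1,1)\), where \(\text{Cov}(x, x^2) = E[x^3] - E[x]E[x^2] = 4 - 2 = 2\), so the slope should instead approach \(\beta_1 + \beta_2 \cdot 2 / \text{Var}(x) = 4.5\).

```python
import numpy as np

def ols_slope(x: np.ndarray, y: np.ndarray) -> float:
    """SLR slope estimate: sum((x - xbar) * y) / sum((x - xbar)**2)."""
    xd = x - x.mean()
    return float(xd @ y / (xd @ xd))

rng = np.random.default_rng(2)
n = 1_000_000
beta0, beta1, beta2 = 1.0, 0.5, 2.0  # illustrative true parameters

def fitted_slope(mean_x: float) -> float:
    """Simulate the true quadratic model, then regress y on x alone (x**2 omitted)."""
    x = rng.normal(loc=mean_x, scale=1.0, size=n)
    u = rng.normal(size=n)  # E(u | x) = 0
    y = beta0 + beta1 * x + beta2 * x**2 + u
    return ols_slope(x, y)

slope_symmetric = fitted_slope(0.0)  # Cov(x, x^2) = 0  -> plim = beta1 = 0.5
slope_shifted = fitted_slope(1.0)    # Cov(x, x^2) = 2  -> plim = 0.5 + 2 * 2 = 4.5
print(round(slope_symmetric, 3), round(slope_shifted, 3))
```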
When \(\text{Cov}(x_j, u) \neq 0\), OLS is inconsistent. In the SLR case:
\[ \text{plim}\, \hat\beta_1 = \beta_1 + \frac{\text{Cov}(x_i, u_i)}{\text{Var}(x_i)} \]
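The inconsistency formula can be checked with a simulated omitted common factor. In this sketch (assuming NumPy, with hypothetical data-generating choices), an unobserved \(c\) enters both \(x\) and \(u\), so \(\text{Cov}(x, u) = \text{Var}(c) = 1\) and \(\text{Var}(x) = 2\); the formula then predicts \(\text{plim}\,\hat\beta_1 = \beta_1 + 1/2\).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
beta0, beta1 = 1.0, 0.5  # illustrative true parameters

c = rng.normal(size=n)       # unobserved factor entering both x and u
x = rng.normal(size=n) + c   # Var(x) = 2
u = c + rng.normal(size=n)   # Cov(x, u) = Var(c) = 1
y = beta0 + beta1 * x + u

xd = x - x.mean()
beta1_hat = float(xd @ y / (xd @ xd))
plim_formula = beta1 + 1.0 / 2.0  # beta1 + Cov(x, u) / Var(x)
print(round(beta1_hat, 3), plim_formula)
```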
True model: \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + v\), with \(E(v \mid x_1, x_2) = 0\).
Misspecified model omits \(x_2\): \(y = \beta_0 + \beta_1 x_1 + u\).
\[ \text{plim}\, \tilde\beta_1 = \beta_1 + \beta_2 \frac{\text{Cov}(x_1, x_2)}{\text{Var}(x_1)} \]
Same structure as finite-sample OVB, but now stated as a probability limit.
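A simulated version of the omitted-variable case, as a sketch assuming NumPy and illustrative values \(\beta_1 = 0.5\), \(\beta_2 = 2\), with \(x_2 = 0.6\,x_1 + w\) so that \(\text{Cov}(x_1, x_2)/\text{Var}(x_1) = 0.6\). The short regression should approach \(0.5 + 2 \times 0.6 = 1.7\), while the long regression that includes \(x_2\) should recover \(\beta_1\).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000
beta0, beta1, beta2 = 1.0, 0.5, 2.0  # illustrative true parameters

x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)   # Cov(x1, x2) = 0.6, Var(x1) = 1
v = rng.normal(size=n)               # E(v | x1, x2) = 0
y = beta0 + beta1 * x1 + beta2 * x2 + v

# Short regression (x2 omitted): y on [1, x1]
X_short = np.column_stack([np.ones(n), x1])
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)

# Long regression (correctly specified): y on [1, x1, x2]
X_long = np.column_stack([np.ones(n), x1, x2])
b_long, *_ = np.linalg.lstsq(X_long, y, rcond=None)

plim_short = beta1 + beta2 * 0.6 / 1.0  # beta1 + beta2 * Cov(x1, x2) / Var(x1)
print(round(b_short[1], 3), plim_short, round(b_long[1], 3))
```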
A sequence \(W_n\) converges in distribution to \(W\) if:
\[ P(W_n \leq x) \to P(W \leq x) \]
at every point \(x\) where the CDF of \(W\) is continuous.
Notation: \(W_n \xrightarrow{d} W\).
Let \(X_1, \ldots, X_n\) be i.i.d. with \(E[X_i] = \mu\) and \(\text{Var}(X_i) = \sigma^2 < \infty\). Then:
\[ \sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2) \]
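To see the CLT at work with a skewed population, the sketch below (assuming NumPy) draws many samples of size \(n = 200\) from an exponential distribution with \(\mu = \sigma^2 = 1\) and computes \(\sqrt{n}(\bar{X}_n - \mu)\) for each. Its mean and variance should be close to \(0\) and \(\sigma^2\), and about 95% of draws should fall below \(1.645\,\sigma\), as under \(N(0, \sigma^2)\). The sample size and replication count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma2 = 1.0, 1.0  # exponential(scale=1): mean 1, variance 1
n, reps = 200, 20_000  # illustrative sample size and number of replications

samples = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - mu)  # one draw of sqrt(n)(Xbar - mu) per row

share_below = np.mean(z <= 1.645 * np.sqrt(sigma2))  # about 0.95 under N(0, sigma^2)
print("mean:", round(z.mean(), 3), "variance:", round(z.var(), 3), "share:", round(share_below, 3))
```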
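If \(W_n \xrightarrow{d} W\) and \(\theta_n \xrightarrow{p} \theta\) for a constant \(\theta\) (Slutsky’s theorem):
\(W_n + \theta_n \xrightarrow{d} W + \theta\), \(\theta_n W_n \xrightarrow{d} \theta W\), and \(W_n / \theta_n \xrightarrow{d} W / \theta\) provided \(\theta \neq 0\).
Continuous mapping also holds for convergence in distribution: if \(Z_n \xrightarrow{d} N(0,1)\), then \(Z_n^2 \xrightarrow{d} \chi^2_1\).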
These results let us combine convergence in probability (for consistent estimators) with convergence in distribution (from the CLT).
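A standard use of Slutsky's theorem is replacing the unknown \(\sigma\) with the sample standard deviation \(s \xrightarrow{p} \sigma\): then \(\sqrt{n}(\bar{X}_n - \mu)/s \xrightarrow{d} N(0,1)\). The sketch below (assuming NumPy; Uniform(0, 1) data with \(n = 100\) are illustrative choices) checks that a nominal 5% two-sided test rejects about 5% of the time, and that squaring the statistic and using the \(\chi^2_1\) critical value 3.841 gives essentially the same decision.

```python
import numpy as np

rng = np.random.default_rng(6)
mu = 0.5               # mean of Uniform(0, 1); sigma is treated as unknown
n, reps = 100, 20_000  # illustrative sample size and number of replications

samples = rng.uniform(0.0, 1.0, size=(reps, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)    # s ->p sigma
t = np.sqrt(n) * (xbar - mu) / s   # Slutsky: ->d N(0, 1)

reject_t = np.mean(np.abs(t) > 1.96)  # nominal 5% two-sided test
reject_chi2 = np.mean(t**2 > 3.841)   # chi^2_1 95% critical value
print(round(reject_t, 4), round(reject_chi2, 4))
```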
Theorem: Under MLR.1–MLR.4 and \(\text{Var}(u \mid X) = \sigma^2\) (MLR.5):
\[ \sqrt{n}(\hat\beta_1 - \beta_1) \xrightarrow{d} N\!\left(0, \; \frac{\sigma^2}{\text{Var}(x_i)}\right) \]
\[ \sqrt{n}(\hat\beta_1 - \beta_1) = \frac{\frac{1}{\sqrt{n}}\sum(x_i - \bar{x})u_i}{\frac{1}{n}\sum(x_i - \bar{x})^2} \]
Denominator: \(\xrightarrow{p} \text{Var}(x_i)\) (as in the consistency proof)
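Numerator: expand as \(\frac{1}{\sqrt{n}}\sum(x_i - E[x_i])u_i + (E[x_i] - \bar{x})\frac{1}{\sqrt{n}}\sum u_i\). The first term is a scaled sum of i.i.d. terms with mean zero (since \(E[x_i u_i] = E[u_i] = 0\)), so by the CLT it converges in distribution to \(N\!\left(0,\, E[(x_i - E[x_i])^2 u_i^2]\right)\). In the second term, \(E[x_i] - \bar{x} \xrightarrow{p} 0\) while \(\frac{1}{\sqrt{n}}\sum u_i \xrightarrow{d} N(0, \sigma^2)\), so the product converges in probability to 0.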
Under homoskedasticity (MLR.5):
\[ E[(x_i - E[x_i])^2 u_i^2] = \sigma^2 \text{Var}(x_i) \]
Combining by Slutsky’s theorem:
\[ \sqrt{n}(\hat\beta_1 - \beta_1) \xrightarrow{d} N\!\left(0, \; \frac{\sigma^2}{\text{Var}(x_i)}\right) \]
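The variance in this result can be checked by Monte Carlo. The sketch assumes NumPy and uses hypothetical choices: \(x \sim \text{Uniform}(0, 2)\), so \(\text{Var}(x_i) = 1/3\), and homoskedastic but skewed errors with \(\sigma = 2\). The sample variance of \(\sqrt{n}(\hat\beta_1 - \beta_1)\) across replications should then be near \(\sigma^2/\text{Var}(x_i) = 12\), even though the errors are not normal.

```python
import numpy as np

rng = np.random.default_rng(7)
beta0, beta1 = 1.0, 0.5  # illustrative true parameters
sigma = 2.0
n, reps = 500, 5_000

x = rng.uniform(0.0, 2.0, size=(reps, n))            # Var(x) = 4/12 = 1/3
u = sigma * (rng.exponential(size=(reps, n)) - 1.0)  # skewed, mean 0, Var(u | x) = sigma**2
y = beta0 + beta1 * x + u

xd = x - x.mean(axis=1, keepdims=True)
beta1_hat = np.sum(xd * y, axis=1) / np.sum(xd**2, axis=1)  # one OLS slope per replication
scaled = np.sqrt(n) * (beta1_hat - beta1)

target_var = sigma**2 / (1.0 / 3.0)  # sigma^2 / Var(x) = 12
print(round(scaled.var(), 2), target_var)
```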
In SLR, the asymptotic variance involves \(\text{Var}(x_i)\): the total variation in \(x\).
In MLR, the variation that identifies \(\beta_j\) is the variation in \(x_j\) after partialling out the other regressors.
General result (MLR.1–MLR.5):
\[ \hat\beta_j \overset{a}{\sim} N\!\left(\beta_j, \; \frac{\sigma^2}{\text{SST}_j(1 - R_j^2)}\right) \]
where \(\text{SST}_j = \sum(x_{ij} - \bar{x}_j)^2\) and \(R_j^2\) is the \(R^2\) from regressing \(x_j\) on the other regressors. The denominator \(\text{SST}_j(1 - R_j^2) = \sum \hat{r}_{ij}^2\) is the residual variation in \(x_j\).
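The identity \(\text{SST}_j(1 - R_j^2) = \sum \hat{r}_{ij}^2\) and the resulting standard error can be verified numerically. This sketch assumes NumPy and simulated data with two correlated regressors (all values illustrative). It runs the auxiliary regression of \(x_1\) on a constant and \(x_2\), then checks that \(\hat\sigma^2 / [\text{SST}_1(1 - R_1^2)]\) matches the \((1,1)\) element of \(\hat\sigma^2 (X'X)^{-1}\), and that \(\hat\beta_1 = \sum \hat{r}_{i1} y_i / \sum \hat{r}_{i1}^2\).

```python
import numpy as np

rng = np.random.default_rng(8)
n = 1_000
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)  # correlated regressors
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
k = X.shape[1] - 1
sigma2_hat = resid @ resid / (n - k - 1)

# Auxiliary regression: x1 on the other regressors (constant and x2)
Z = np.column_stack([np.ones(n), x2])
g, *_ = np.linalg.lstsq(Z, x1, rcond=None)
r_hat = x1 - Z @ g                       # residual variation in x1
sst1 = np.sum((x1 - x1.mean()) ** 2)
r2_1 = 1.0 - (r_hat @ r_hat) / sst1      # R_1^2

se_formula = np.sqrt(sigma2_hat / (sst1 * (1.0 - r2_1)))
se_matrix = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])
print(se_formula, se_matrix)
```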
To test \(H_0: \beta_j = \beta_{j,0}\):
\[ t = \frac{\hat\beta_j - \beta_{j,0}}{\text{se}(\hat\beta_j)} \xrightarrow{d} N(0,1) \quad \text{under } H_0 \]
The standard error \(\text{se}(\hat\beta_j)\) is called the asymptotic standard error when MLR.6 is not assumed.
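In code, the asymptotic test only needs the estimate, its standard error, and the standard normal CDF. A minimal standard-library sketch (the function name is hypothetical), using the identity \(2[1 - \Phi(|t|)] = \text{erfc}(|t|/\sqrt{2})\) for the two-sided p-value:

```python
import math

def asymptotic_t_test(beta_hat: float, se: float, beta_null: float = 0.0) -> tuple[float, float]:
    """Return (t, two-sided p-value) using the N(0, 1) approximation to the t statistic."""
    t = (beta_hat - beta_null) / se
    p = math.erfc(abs(t) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|t|))
    return t, p

t_stat, p_value = asymptotic_t_test(0.98, 0.5)  # t = 1.96, p close to 0.05
print(round(t_stat, 2), round(p_value, 3))
```

Many packages instead compare the statistic with a \(t_{n-k-1}\) distribution; for large \(n\) the two give nearly identical p-values.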
| | CLM (MLR.1–6) | Gauss-Markov (MLR.1–5) |
|---|---|---|
| \(t\) and \(F\) distributions | Exact, any \(n\) | Approximate, large \(n\) |
| Normality assumed? | Yes | No |
Theorem: Under MLR.1–MLR.5, OLS is asymptotically efficient among linear estimators: no other linear, consistent, asymptotically normal estimator has a smaller asymptotic variance.
This is the large-sample analog of the Gauss-Markov theorem (BLUE).
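Finite-sample (MLR.1–MLR.6): OLS is unbiased and BLUE, and the \(t\) and \(F\) statistics have exact \(t\) and \(F\) distributions for any \(n\).
Large-sample (MLR.1–MLR.5): OLS is consistent and asymptotically normal, and the usual \(t\) and \(F\) statistics are approximately valid when \(n\) is large.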
The key tradeoff: we drop normality (MLR.6), but results are now approximations that require large \(n\).
All results above assumed MLR.5 (homoskedasticity):
\[ \text{Var}(u \mid X) = \sigma^2 \]
What if MLR.5 fails?
If the error variance depends on \(X\):
\[ \text{Var}(u \mid X) = \sigma^2(X) \]
the errors are heteroskedastic.
Under heteroskedasticity (MLR.5 fails, MLR.1–4 hold):
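Still valid: OLS remains unbiased and consistent.
No longer valid: the usual variance formula and standard errors, and therefore the usual \(t\) and \(F\) tests and confidence intervals, even in large samples; OLS is also no longer BLUE.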
The asymptotic variance of \(\hat\beta_1\) (SLR) is:
\[ V_1 = \frac{E[(x_i - E[x_i])^2 u_i^2]}{[\text{Var}(x_i)]^2} \]
Under homoskedasticity, \(E[u_i^2 \mid x_i] = \sigma^2\), so we can simplify:
\[ \begin{aligned} E[(x_i - E[x_i])^2 u_i^2] &= E\!\big[(x_i - E[x_i])^2\, E[u_i^2 \mid x_i]\big] \\ &= \sigma^2\, E[(x_i - E[x_i])^2] = \sigma^2 \text{Var}(x_i) \end{aligned} \]
and \(V_1 = \sigma^2 / \text{Var}(x_i)\).
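Under heteroskedasticity, \(E[u_i^2 \mid x_i]\) varies with \(x_i\), so \(\sigma^2\) cannot be pulled out. The usual formula can understate or overstate the true variance (it understates it when the error variance tends to be larger for \(x_i\) far from \(E[x_i]\)), so the usual standard errors and \(t\) statistics are invalid even in large samples.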
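A concrete case where the usual formula understates: let \(x \sim N(0,1)\) and \(u = |x|\,e\) with \(e \sim N(0,1)\) independent of \(x\), so \(E[u^2 \mid x] = x^2\). Then \(V_1 = E[x^4]/[\text{Var}(x)]^2 = 3\), while the homoskedastic formula gives \(E[u^2]/\text{Var}(x) = 1\). The sketch below (assuming NumPy; the design is hypothetical) estimates the true \(V_1\) by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(9)
beta0, beta1 = 1.0, 0.5  # illustrative true parameters
n, reps = 500, 4_000

x = rng.normal(size=(reps, n))
u = np.abs(x) * rng.normal(size=(reps, n))  # E[u^2 | x] = x^2: variance grows with |x|
y = beta0 + beta1 * x + u

xd = x - x.mean(axis=1, keepdims=True)
beta1_hat = np.sum(xd * y, axis=1) / np.sum(xd**2, axis=1)
mc_var = np.var(np.sqrt(n) * (beta1_hat - beta1))  # Monte Carlo estimate of V_1

v1_true = 3.0   # E[x^2 * x^2] / Var(x)^2 = E[x^4] = 3 for x ~ N(0, 1)
v1_usual = 1.0  # sigma^2 / Var(x) with sigma^2 = E[u^2] = E[x^2] = 1
print(round(mc_var, 2), v1_true, v1_usual)
```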
Idea: estimate the general \(V_1\) directly, without assuming homoskedasticity.
In SLR:
\[ \hat{V}_1^{HC} = \frac{n^{-1}\sum(x_i - \bar{x})^2 \hat{u}_i^2}{\left(n^{-1}\text{SST}_x\right)^2} \]
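Taking the square root of \(\hat{V}_1^{HC}/n\) gives \(\text{se}^{HC}(\hat\beta_1) = \sqrt{\sum(x_i - \bar{x})^2 \hat{u}_i^2}\,/\,\text{SST}_x\). Standard errors built this way are called heteroskedasticity-robust (or HC) standard errors (White, 1980).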
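The SLR robust standard error can be computed directly from the formula for \(\hat{V}_1^{HC}\). This is a sketch assuming NumPy; the function name is hypothetical. It returns the slope, the usual standard error, and the robust one without any finite-sample adjustment (often labeled HC0). Statistical packages frequently report a degrees-of-freedom-adjusted version (HC1 scales the variance by \(n/(n-k-1)\), here \(n/(n-2)\)), so their numbers can differ slightly.

```python
import numpy as np

def slr_standard_errors(x: np.ndarray, y: np.ndarray) -> tuple[float, float, float]:
    """Return (slope, usual se, HC0 robust se) for a simple regression of y on x."""
    n = len(x)
    xd = x - x.mean()
    sst_x = xd @ xd
    b1 = (xd @ y) / sst_x
    b0 = y.mean() - b1 * x.mean()
    u_hat = y - b0 - b1 * x
    sigma2_hat = (u_hat @ u_hat) / (n - 2)
    se_usual = np.sqrt(sigma2_hat / sst_x)
    # sqrt(V_hat / n), where V_hat = [n^-1 sum xd^2 u_hat^2] / (n^-1 SST_x)^2
    se_hc0 = np.sqrt(np.sum(xd**2 * u_hat**2)) / sst_x
    return float(b1), float(se_usual), float(se_hc0)

# Usage with hypothetical heteroskedastic data: Var(u | x) = x^2
rng = np.random.default_rng(10)
x = rng.normal(size=2_000)
y = 1.0 + 0.5 * x + np.abs(x) * rng.normal(size=2_000)
b1, se_usual, se_robust = slr_standard_errors(x, y)
print(round(b1, 3), round(se_usual, 4), round(se_robust, 4))
```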
With robust standard errors, we form the \(t\)-statistic as before:
\[ t = \frac{\hat\beta_j - \beta_{j,0}}{\text{se}^{HC}(\hat\beta_j)} \]
Caveat: robust standard errors rely on large-sample theory. In small samples, they can be unreliable.
\[ \begin{aligned} \widehat{\log(\text{wage})} = \underset{(.105)\; [.107]}{-1.28} &+ \underset{(.0075)\; [.0078]}{.0904}\; \text{educ} + \underset{(.0052)\; [.0050]}{.0410}\; \text{exper} \\ &- \underset{(.0001)\; [.0001]}{.0007}\; \text{exper}^2 \end{aligned} \]
Usual standard errors in \((\cdot)\), robust in \([\cdot]\).
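As a quick check on the table, the \(t\) statistics for \(\text{educ}\) and \(\text{exper}\) can be computed from the reported numbers. Because the displayed coefficients and standard errors are rounded, these values are only approximate. The robust standard error is larger for \(\text{educ}\) and smaller for \(\text{exper}\), so robust inference need not be more conservative; here both variables remain highly significant either way.

```python
# Coefficients and (usual, robust) standard errors as reported in the wage equation above.
estimates = {
    "educ": (0.0904, 0.0075, 0.0078),
    "exper": (0.0410, 0.0052, 0.0050),
}

t_stats = {name: (b / se_u, b / se_r) for name, (b, se_u, se_r) in estimates.items()}
for name, (t_usual, t_robust) in t_stats.items():
    print(f"{name}: usual t = {t_usual:.2f}, robust t = {t_robust:.2f}")
```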
Heteroskedasticity is the norm in cross-sectional data, not the exception.
Lecture 4a — Functional Form & Scaling: