Inference
Natasha Kang
Xiamen University, Chow Institute
March, 2026
From Estimation to Inference
- In Lecture 2, we estimated \(\hat\beta_j\) and studied its properties (unbiasedness, variance).
- But an estimate alone is just a number — it doesn’t tell us whether the true \(\beta_j\) is zero, positive, or equal to some hypothesized value.
- Statistical inference uses the sampling distribution of estimators to make statements about population parameters.
- Two main tools: hypothesis tests and confidence intervals.
Hypothesis Testing
- A hypothesis is a statement about the unknown population parameter \(\beta_j\).
- Using data, we assess whether the evidence supports or contradicts this statement.
- Null hypothesis (\(H_0\)): the claim held to be true unless the data provide sufficient evidence against it.
- Alternative hypothesis (\(H_1\)): the claim we are trying to establish.
- The econometrician carries the burden of proof — must show the data provide enough evidence to reject \(H_0\) in favor of \(H_1\).
Hypothesis Testing as a Trial
| Courtroom trial | Hypothesis test |
| --- | --- |
| Defendant is innocent | \(H_0\) is true |
| Defendant is guilty | \(H_1\) is true |
| Prosecutor presents evidence | Econometrician computes test statistic |
| Jury decides | Reject or fail to reject \(H_0\) |
- Type I error: wrongful conviction — rejecting \(H_0\) when it is true.
- Type II error: letting a guilty person go free — failing to reject \(H_0\) when \(H_1\) is true.
The Tradeoff
- We control Type I error by choosing how much evidence we require to reject \(H_0\).
- The significance level (\(\alpha\)) is the probability of Type I error we are willing to tolerate:
\[
P(\text{Reject } H_0 \mid H_0 \text{ is true}) = \alpha
\]
- But lowering \(\alpha\) comes at a cost: requiring stronger evidence also makes it harder to reject when \(H_1\) is true, so the probability of a Type II error increases.
- By convention, \(\alpha\) is set at 5% or 1%.
Steps of Hypothesis Testing
1. Specify \(H_0\) and \(H_1\).
2. Choose a significance level \(\alpha\).
3. Define a decision rule (critical region).
4. Compute the test statistic and see if it falls in the critical region.
To carry out step 3, we need to know the distribution of the test statistic under \(H_0\).
What Do We Need?
- Under MLR.1–MLR.5, we know the mean \(E[\hat\beta_j] = \beta_j\) and have a formula for the variance \(\text{Var}(\hat\beta_j \mid X)\).
- But mean and variance alone don’t determine a distribution — we need more.
- This requires one more assumption.
The Normality Assumption
MLR.6 — Normality: \(U_i \mid X \sim N(0, \sigma^2)\) for each \(i\)
- Justification: \(U\) is the sum of many unobserved factors. By the CLT, such sums are approximately normal.
- Limitations:
  - How many factors? Are they independent? Is the combination additive?
  - Normality may be poor in skewed or heavy-tailed settings (e.g., wages, which are bounded below).
  - For binary or count outcomes, the linear model itself is often problematic for broader reasons.
- With large samples, we can relax this assumption using asymptotic approximations (Lecture 3b).
Normal Sampling Distribution
Theorem: Under MLR.1–MLR.6 (the Classical Linear Model assumptions), conditional on \(X\):
\[
\hat\beta_j \sim N\left(\beta_j, \, \text{Var}(\hat\beta_j \mid X)\right), \qquad j = 0, 1, \ldots, k
\]
- Denote \(\text{sd}(\hat\beta_j) = \sqrt{\text{Var}(\hat\beta_j \mid X)} = \sigma / \sqrt{\text{SST}_j(1 - R_j^2)}\) for the slope coefficients (\(j = 1, \ldots, k\)). Standardizing:
\[
\frac{\hat\beta_j - \beta_j}{\text{sd}(\hat\beta_j)} \sim N(0, 1)
\]
Two-Sided Test: Setup
Consider testing:
\[
H_0: \beta_j = \beta_{j,0} \qquad \text{vs.} \qquad H_1: \beta_j \neq \beta_{j,0}
\]
where \(\beta_{j,0}\) is a known value specified under \(H_0\).
- If \(\sigma^2\) were known, we could use:
\[
Z = \frac{\hat\beta_j - \beta_{j,0}}{\color{red}{\text{sd}(\hat\beta_j)}} \sim N(0, 1) \quad \text{under } H_0
\]
- Reject \(H_0\) if \(|Z| > z_{1-\alpha/2}\), where the critical value \(z_{1-\alpha/2}\) is chosen so that \(P(|Z| > z_{1-\alpha/2}) = \alpha\) under \(H_0\).
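As a small illustration, the critical value \(z_{1-\alpha/2}\) can be read off the standard normal quantile function. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is an illustrative choice, not something defined in the lecture.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Two-sided critical value z_{1-alpha/2} of the standard normal."""
    return NormalDist().inv_cdf(1 - alpha / 2)

# Stricter significance levels push the critical value outward.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: reject H0 if |Z| > {z_critical(alpha):.3f}")
```

The three thresholds come out at about 1.645, 1.960, and 2.576.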
The \(t\)-Statistic
Since \(\sigma^2\) is unknown, we replace \(\color{red}{\text{sd}(\hat\beta_j)}\) with \(\color{blue}{\text{se}(\hat\beta_j)} = \hat\sigma / \sqrt{\text{SST}_j(1-R_j^2)}\):
\[
t = \frac{\hat\beta_j - \beta_{j,0}}{\color{blue}{\text{se}(\hat\beta_j)}}
\]
- But this substitution has a cost: \(\hat\sigma\) is itself estimated from data, introducing extra randomness. The resulting statistic is not \(N(0,1)\).
- Gosset (1908), working with small samples at the Guinness brewery under the pseudonym “Student,” showed that it follows a distribution with heavier tails — the \(t\)-distribution.
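The cost of estimating \(\sigma\) can be seen in a quick Monte Carlo sketch. As a simplifying assumption, it uses the intercept-only case (\(k = 0\)), where the estimator is the sample mean and the \(t\)-statistic has \(n - 1\) degrees of freedom; the sample size, seed, and number of replications are arbitrary choices.

```python
import random
from statistics import mean, stdev

random.seed(1)
n, reps = 5, 20000          # tiny sample, so the effect is easy to see
z_crit = 1.96               # normal critical value for alpha = 0.05

rejections = 0
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]   # H0: mean = 0 is true
    se = stdev(sample) / n ** 0.5                         # estimated standard error
    t_stat = mean(sample) / se
    if abs(t_stat) > z_crit:
        rejections += 1

rejection_rate = rejections / reps
print(f"Rejection rate using the normal critical value: {rejection_rate:.3f}")
```

With 4 degrees of freedom the rejection rate should come out near 0.12 rather than the nominal 0.05, which is the heavier-tail effect described above.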
The \(t\)-Distribution
Theorem: Under the CLM assumptions (MLR.1–MLR.6):
\[
t \sim t_{n-k-1} \quad \text{under } H_0
\]
where \(k\) is the number of slope regressors (excluding the intercept), so \(\text{df} = n - k - 1\).
- Decision rule: reject \(H_0\) if \(|t| > t_{1-\alpha/2, \, n-k-1}\) (the critical value).
- This is an exact finite-sample result under MLR.1–MLR.6. In Lecture 3b, we develop large-sample alternatives that do not require normality.
Under the Alternative
What happens to the \(t\)-statistic when \(H_0\) is false?
\[
t = \frac{\hat\beta_j - \beta_{j,0}}{\text{se}(\hat\beta_j)} = \underbrace{\frac{\hat\beta_j - \beta_j}{\text{se}(\hat\beta_j)}}_{\sim \, t_{n-k-1}} + \underbrace{\frac{\beta_j - \beta_{j,0}}{\text{se}(\hat\beta_j)}}_{\text{nonzero shift}}
\]
- The \(t\)-statistic is shifted away from zero — the larger the shift, the more likely we reject.
- The probability of correctly rejecting \(H_0\) when \(H_1\) is true is called power.
- Power is higher when:
  - The effect is larger: \(|\beta_j - \beta_{j,0}|\) is big
  - The estimation is more precise: \(\text{se}(\hat\beta_j)\) is small
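To make the role of the shift concrete, here is a minimal sketch of a power calculation. It assumes the known-\(\sigma\) case, so the statistic is \(N(\text{shift}, 1)\) with shift \(= (\beta_j - \beta_{j,0}) / \text{sd}(\hat\beta_j)\); the function name `power_two_sided` is hypothetical.

```python
from statistics import NormalDist

def power_two_sided(shift, alpha=0.05):
    """P(|Z + shift| > z_{1-alpha/2}) for Z ~ N(0, 1)."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)
    # Probability of landing in the upper tail plus the lower tail.
    return (1 - std_normal.cdf(z - shift)) + std_normal.cdf(-z - shift)

for shift in (0.0, 1.0, 2.0, 3.0):
    print(f"shift = {shift:.1f}: power at 5% = {power_two_sided(shift):.3f}, "
          f"at 1% = {power_two_sided(shift, alpha=0.01):.3f}")
```

At a shift of zero the "power" equals \(\alpha\). It rises with the shift, and at any given shift it is lower for the stricter 1% level.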
The Tradeoff, Revisited
- Moving the critical value \(c\) outward (lowering \(\alpha\)) shrinks the Type I error probability (blue area) but enlarges the Type II error probability (red area): fewer false rejections, more missed detections.
Power Under Different Alternatives
- The further \(\beta_j\) is from \(\beta_{j,0}\), the more the distribution of \(t\) under \(H_1\) shifts away from its distribution under \(H_0\), and the larger the power (the probability mass beyond the critical value \(c\)).
\(p\)-Values
- Rather than fixing \(\alpha\) and looking up a critical value, we can ask: how strong is the evidence against \(H_0\)?
- The \(p\)-value is the smallest significance level at which \(H_0\) would be rejected.
\[
p\text{-value} = P(|T| > |t|) \quad \text{where } T \sim t_{n-k-1}
\]
- Decision rule: reject \(H_0\) if \(p\text{-value} < \alpha\).
- The \(p\)-value summarizes the strength of evidence against \(H_0\) — smaller means stronger evidence.
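The formula above can be evaluated numerically. The following is a standard-library sketch that integrates the \(t\) density with Simpson's rule; the function names are illustrative, and in practice a statistics library would supply this tail probability (for example `scipy.stats.t.sf`).

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| > |t_stat|) for T ~ t_df, using Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3       # P(-|t| <= T <= |t|), by symmetry
    return 1 - central_mass

p = two_sided_p_value(-3.197, df=137)
print(f"p-value = {p:.5f}")
```

The inputs are the \(t\) value and degrees of freedom for skipped in the college GPA example later in this lecture, so the result should land close to the 0.00173 that R reports there.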
Testing \(H_0: \beta_j = 0\)
- The null \(H_0: \beta_j = 0\) is the most commonly tested hypothesis.
- Software automatically reports the \(t\)-statistic and \(p\)-value for this test.
- If \(H_0: \beta_j = 0\) is rejected: \(X_j\) is statistically significant.
- If not rejected: \(X_j\) is statistically insignificant.
- Caution: failing to reject does not mean \(H_0\) is true — it may mean we lack the precision to detect the effect (low power).
Example: Determinants of College GPA
\[
\text{colGPA} = \beta_0 + \beta_1 \, \text{hsGPA} + \beta_2 \, \text{ACT} + \beta_3 \, \text{skipped} + U
\]
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.38955 0.33155 4.191 4.95e-05 ***
hsGPA 0.41182 0.09367 4.396 2.19e-05 ***
ACT 0.01472 0.01056 1.393 0.16578
skipped -0.08311 0.02600 -3.197 0.00173 **
Test \(H_0: \beta_{\text{skipped}} = 0\) against \(H_1: \beta_{\text{skipped}} \neq 0\) at the 5% level. Do you reject?
- \(t = -0.08311 / 0.02600 = -3.197\)
- Critical value at 5% with 137 df: \(\approx 1.98\)
- \(|t| = 3.197 > 1.98\): reject \(H_0\). The coefficient on skipped is statistically significant at the 5% level.
One-Sided Tests
When theory suggests a direction, we use a one-sided test — the entire \(\alpha\) is placed in one tail.
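- Reject \(H_0\) if \(t\) falls in the rejection region: \(t > t_{1-\alpha, \, n-k-1}\) when \(H_1: \beta_j > \beta_{j,0}\), or \(t < -t_{1-\alpha, \, n-k-1}\) when \(H_1: \beta_j < \beta_{j,0}\).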
- One-sided tests have more power to detect effects in the hypothesized direction.
- Important: the direction must be chosen before seeing the data. Choosing it after seeing the sign of \(\hat\beta_j\) inflates the Type I error rate, to as much as \(2\alpha\).
Example: Wages and Experience
\[
\log(\text{wage}) = \beta_0 + \beta_1 \, \text{educ} + \beta_2 \, \text{exper} + \beta_3 \, \text{tenure} + U
\]
Estimate Std. Error t value
educ .0920 .0073 12.56
exper .0041 .0017 2.41
tenure .0221 .0031 7.13
Test whether experience has a positive effect on wages, after controlling for education and tenure.
- \(H_0: \beta_{\text{exper}} \leq 0\) vs. \(H_1: \beta_{\text{exper}} > 0\)
- \(t = 0.0041 / 0.0017 = 2.41\)
- Critical value at 5% (522 df): \(\approx 1.65\)
- \(t = 2.41 > 1.65\): reject \(H_0\).
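The comparison between one-sided and two-sided thresholds can be sketched as follows. As an assumption, the \(t_{522}\) quantiles are approximated by standard normal quantiles, which is what the rounded value 1.65 above amounts to with this many degrees of freedom.

```python
from statistics import NormalDist

t_stat = 0.0041 / 0.0017                      # exper coefficient / standard error
one_sided_crit = NormalDist().inv_cdf(0.95)   # all 5% in the upper tail
two_sided_crit = NormalDist().inv_cdf(0.975)  # 2.5% in each tail

print(f"t = {t_stat:.2f}")
print(f"one-sided 5% critical value ~ {one_sided_crit:.3f}, "
      f"reject: {t_stat > one_sided_crit}")
print(f"two-sided 5% critical value ~ {two_sided_crit:.3f}, "
      f"reject: {abs(t_stat) > two_sided_crit}")
```

The one-sided threshold is lower than the two-sided one at the same \(\alpha\), which is the source of the extra power in the hypothesized direction.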
Testing Other Values of \(\beta_j\)
The \(t\)-test works for any hypothesized value, not just zero.
In the college GPA example, test \(H_0: \beta_{\text{skipped}} = -0.1\) against a two-sided alternative.
\[
t = \frac{-0.08311 - (-0.1)}{0.026} = \frac{0.01689}{0.026} \approx 0.65
\]
- \(|t| = 0.65 < 1.98\): fail to reject. The data are consistent with each skipped class reducing GPA by 0.1 points.
Confidence Intervals
We know that under the CLM assumptions:
\[
P\!\left( |t| \leq t_{1-\alpha/2, \, n-k-1} \right) = 1 - \alpha
\]
Substituting \(t = (\hat\beta_j - \beta_j) / \text{se}(\hat\beta_j)\) and rearranging:
\[
P\!\left( \hat\beta_j - t_{1-\alpha/2, \, n-k-1} \cdot \text{se}(\hat\beta_j) \leq \beta_j \leq \hat\beta_j + t_{1-\alpha/2, \, n-k-1} \cdot \text{se}(\hat\beta_j) \right) = 1 - \alpha
\]
The \(100(1-\alpha)\%\) confidence interval for \(\beta_j\):
\[
\hat\beta_j \pm t_{1-\alpha/2, \, n-k-1} \cdot \text{se}(\hat\beta_j)
\]
Interpreting Confidence Intervals
- Interpretation: in repeated sampling, \(100(1-\alpha)\%\) of such intervals will contain the true \(\beta_j\).
- For a given sample, \(\beta_j\) is either in the interval or not — the probability is 0 or 1.
What is the relationship between a CI and a two-sided test?
- Fail to reject \(H_0: \beta_j = \beta_{j,0}\) at level \(\alpha\) \(\iff\) \(\beta_{j,0}\) lies inside the \(100(1-\alpha)\%\) CI.
- The CI is the set of all values that would not be rejected by a two-sided test.
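The repeated-sampling interpretation can be checked by simulation. This sketch again assumes the intercept-only case (the parameter is a population mean, with \(n - 1\) degrees of freedom); the true mean, error standard deviation, seed, and number of replications are arbitrary, and 2.262 is the tabulated value of \(t_{0.975, \, 9}\).

```python
import random
from statistics import mean, stdev

random.seed(2)
true_mean, n, reps = 3.0, 10, 20000
t_crit = 2.262                      # t_{0.975, 9} from a t-table (df = n - 1)

covered = 0
for _ in range(reps):
    sample = [random.gauss(true_mean, 2.0) for _ in range(n)]
    half_width = t_crit * stdev(sample) / n ** 0.5
    center = mean(sample)
    # Each sample produces a different interval; the parameter stays fixed.
    if center - half_width <= true_mean <= center + half_width:
        covered += 1

coverage = covered / reps
print(f"Share of 95% intervals containing the true value: {coverage:.3f}")
```

The share should be close to 0.95. Any single interval either contains the true value or does not.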
CI Example: College GPA
95% CI for \(\beta_{\text{skipped}}\)?
\[
-0.08311 \pm 1.98 \times 0.026 = [-0.135, -0.032]
\]
- Zero is outside this interval — consistent with rejecting \(H_0: \beta_{\text{skipped}} = 0\).
- \(-0.1\) is inside this interval — consistent with failing to reject \(H_0: \beta_{\text{skipped}} = -0.1\).
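The interval and its link to the two tests can be reproduced with a few lines of arithmetic. The helper name `confidence_interval` is illustrative; the inputs are the estimate and standard error from the regression output and the critical value 1.98 used above.

```python
def confidence_interval(estimate, std_error, critical_value):
    """Return (lower, upper) = estimate -/+ critical_value * std_error."""
    margin = critical_value * std_error
    return estimate - margin, estimate + margin

# Values from the regression output; 1.98 is the 5% critical value with 137 df.
lower, upper = confidence_interval(-0.08311, 0.02600, 1.98)
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")

# A hypothesized value is rejected exactly when it lies outside the interval.
for null_value in (0.0, -0.1):
    inside = lower <= null_value <= upper
    t_stat = (-0.08311 - null_value) / 0.02600
    print(f"H0 value {null_value}: inside CI = {inside}, "
          f"|t| = {abs(t_stat):.2f}, reject = {abs(t_stat) > 1.98}")
```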
Economic vs. Statistical Significance
Statistical significance tells us whether an effect is detectable — not whether it is important.
Example: 401(k) participation and firm size
\[
\text{prate} = \beta_0 + \beta_1 \, \text{mrate} + \beta_2 \, \text{age} + \beta_3 \, \text{totemp} + U
\]
where prate = participation rate (%), mrate = employer match rate, age = plan age (years), totemp = total employees.
Estimate Std. Error t value Pr(>|t|)
totemp -1.291e-04 3.666e-05 -3.521 0.000443 ***
- \(\hat\beta_3\) is highly statistically significant (\(p < 0.001\)).
- But a 10,000-employee increase reduces participation by only \(10{,}000 \times 0.00013 = 1.3\) percentage points.
- Is that economically meaningful? That depends on context — statistical significance alone doesn’t answer the question.
Testing Linear Combinations of \(\beta_j\)
Sometimes the hypothesis involves multiple parameters.
Example: Is one year at a junior college worth as much as one year at a university?
\[
\log(\text{wage}) = \beta_0 + \beta_1 \, \text{jc} + \beta_2 \, \text{univ} + \beta_3 \, \text{exper} + U
\]
\[
H_0: \beta_1 = \beta_2 \qquad \text{vs.} \qquad H_1: \beta_1 < \beta_2
\]
\[
t = \frac{\hat\beta_1 - \hat\beta_2}{\text{se}(\hat\beta_1 - \hat\beta_2)}
\]
- But \(\text{se}(\hat\beta_1 - \hat\beta_2)\) requires \(\text{Cov}(\hat\beta_1, \hat\beta_2)\), which is not always reported.
- In practice, software handles this directly (linearHypothesis in R, test in Stata). But the idea behind it is instructive.
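For reference, the standard error in the denominator follows from the variance of a difference of two random variables. Here \(\widehat{\text{Cov}}\) denotes the estimated covariance, a notation introduced only for this display:

\[
\text{se}(\hat\beta_1 - \hat\beta_2) = \sqrt{\text{se}(\hat\beta_1)^2 + \text{se}(\hat\beta_2)^2 - 2 \, \widehat{\text{Cov}}(\hat\beta_1, \hat\beta_2)}
\]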
The Reparametrization Trick
Define \(\theta = \beta_1 - \beta_2\). Then \(H_0: \theta = 0\).
Substitute \(\beta_1 = \theta + \beta_2\) into the regression:
\[
\begin{aligned}
\log(\text{wage}) &= \beta_0 + (\theta + \beta_2) \, \text{jc} + \beta_2 \, \text{univ} + \beta_3 \, \text{exper} + U \\
&= \beta_0 + \theta \, \text{jc} + \beta_2 (\text{jc} + \text{univ}) + \beta_3 \, \text{exper} + U
\end{aligned}
\]
- Run this transformed regression. The coefficient on jc is \(\hat\theta\), and its standard error is \(\text{se}(\hat\theta)\).
- Test \(H_0: \theta = 0\) with the usual \(t\)-test — no need for covariance.
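The equivalence can be checked numerically. The sketch below is self-contained and uses simulated data: the coefficients, sample size, and seed are made up for illustration, and `ols` is a small hand-written helper rather than a library routine. It fits both regressions and compares the two routes to \(\hat\theta\) and \(\text{se}(\hat\theta)\).

```python
import random

def ols(X, y):
    """Return (coefficients, estimated covariance matrix) for OLS of y on X."""
    n, p = len(X), len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    # Invert X'X by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[a] + [1.0 if a == b else 0.0 for b in range(p)] for a in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    inv = [row[p:] for row in aug]
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - p)          # SSR / (n - k - 1)
    cov = [[sigma2 * inv[a][b] for b in range(p)] for a in range(p)]
    return beta, cov

# Simulated data (illustrative values only).
random.seed(0)
n = 200
jc = [random.uniform(0, 2) for _ in range(n)]
univ = [random.uniform(0, 4) for _ in range(n)]
exper = [random.uniform(0, 20) for _ in range(n)]
y = [1.5 + 0.07 * j + 0.08 * u + 0.005 * e + random.gauss(0, 0.4)
     for j, u, e in zip(jc, univ, exper)]

X_orig = [[1.0, j, u, e] for j, u, e in zip(jc, univ, exper)]
X_repar = [[1.0, j, j + u, e] for j, u, e in zip(jc, univ, exper)]

b, V = ols(X_orig, y)
g, W = ols(X_repar, y)

theta_direct = b[1] - b[2]                                  # needs the covariance
se_direct = (V[1][1] + V[2][2] - 2 * V[1][2]) ** 0.5
theta_repar, se_repar = g[1], W[1][1] ** 0.5                # read off directly

print(f"direct:          theta = {theta_direct:.6f}, se = {se_direct:.6f}")
print(f"reparametrized:  theta = {theta_repar:.6f}, se = {se_repar:.6f}")
```

Both routes give the same estimate and standard error up to rounding error; the transformed regression simply makes them appear as an ordinary coefficient row.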
Multiple Linear Restrictions
- The \(t\)-test (and reparametrization trick) handles one restriction at a time.
- When there are multiple restrictions, reparametrization becomes cumbersome — we need a different approach.
Example: Do performance statistics matter for baseball players’ salaries?
\[
\begin{aligned}
\log(\text{salary}) = \beta_0 &+ \beta_1 \, \text{years} + \beta_2 \, \text{gamesyr} \\
&+ \beta_3 \, \text{bavg} + \beta_4 \, \text{hrunsyr} + \beta_5 \, \text{rbisyr} + U
\end{aligned}
\]
where bavg = batting average, hrunsyr = home runs/year, rbisyr = RBIs/year.
\[
H_0: \beta_3 = \beta_4 = \beta_5 = 0 \qquad \text{vs.} \qquad H_1: \text{at least one} \neq 0
\]
Why Not Use Separate \(t\)-Tests?
Can we just test \(\beta_3 = 0\), \(\beta_4 = 0\), and \(\beta_5 = 0\) individually and reject the joint null if any individual test rejects?
- This is not a valid size-\(\alpha\) test. Even if the tests were independent:
\[
P(\text{reject at least one} \mid H_0) = 1 - (1 - \alpha)^q
\]
- With \(q = 3\) tests at \(\alpha = 0.05\): \(1 - 0.95^3 \approx 0.143\) — nearly three times the nominal level.
- We need a test designed for joint hypotheses.
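The size inflation is easy to tabulate. A minimal sketch, assuming independent tests as in the formula above (the function name is illustrative):

```python
def prob_any_rejection(alpha, q):
    """P(at least one of q independent level-alpha tests rejects | all nulls true)."""
    return 1 - (1 - alpha) ** q

for q in (1, 3, 5, 10):
    print(f"q = {q:2d}: {prob_any_rejection(0.05, q):.3f}")
```

The probability of a false rejection grows quickly with the number of separate tests.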
The \(F\)-Test: Idea
- Unrestricted model: includes all regressors.
- Restricted model: imposes \(H_0\) (e.g., drops the variables whose coefficients are set to zero).
- If \(H_0\) is true, the excluded variables don’t help explain \(Y\) — dropping them should barely worsen the fit.
- If dropping them worsens the fit substantially, that’s evidence against \(H_0\).
- How do we measure “worsening the fit”? Compare the sum of squared residuals: \(\text{SSR}_r\) vs. \(\text{SSR}_{ur}\).
The \(F\)-Statistic
\[
F = \frac{(\text{SSR}_r - \text{SSR}_{ur}) / q}{\text{SSR}_{ur} / (n - k - 1)}
\]
where \(q\) is the number of restrictions.
- \(\text{SSR}_r \geq \text{SSR}_{ur}\) always, so \(F \geq 0\).
- Under \(H_0\): the excluded variables have no effect, so \(\text{SSR}_r\) exceeds \(\text{SSR}_{ur}\) only by chance and \(F\) tends to be small (its mean under \(H_0\) is close to 1).
- Under \(H_1\): dropping them worsens fit substantially, so \(F\) is large.
Distribution and Decision Rule
Theorem: Under the CLM assumptions and \(H_0\): \(F \sim F_{q, \, n-k-1}\).
Reject \(H_0\) if \(F > c\), where the critical value \(c = F_{1-\alpha, \, q, \, n-k-1}\) is the \((1-\alpha)\) quantile of the \(F_{q, \, n-k-1}\) distribution.
Example: MLB Salaries
| Model | Regression | SSR |
| --- | --- | --- |
| Unrestricted | \(\log(\text{salary})\) on years, gamesyr, bavg, hrunsyr, rbisyr | 183.19 |
| Restricted | \(\log(\text{salary})\) on years, gamesyr | 198.31 |
\(H_0: \beta_{\text{bavg}} = \beta_{\text{hrunsyr}} = \beta_{\text{rbisyr}} = 0\) (\(q = 3\), \(n - k - 1 = 347\))
\[
F = \frac{(198.31 - 183.19)/3}{183.19/347} = \frac{5.04}{0.528} = 9.55
\]
- Critical value at 5% with \((3, 347)\) df: \(2.63\)
- \(F = 9.55 > 2.63\): reject \(H_0\). Performance statistics are jointly significant.
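The same computation as a short sketch, using the SSR values, \(q\), and degrees of freedom from this example (the function name `f_statistic` is illustrative):

```python
def f_statistic(ssr_restricted, ssr_unrestricted, q, df_unrestricted):
    """F = [(SSR_r - SSR_ur) / q] / [SSR_ur / (n - k - 1)]."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator

f_stat = f_statistic(198.31, 183.19, q=3, df_unrestricted=347)
print(f"F = {f_stat:.2f}; reject at 5%: {f_stat > 2.63}")
```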
Joint vs. Individual Significance
- The \(F\)-test can reject the joint null even when no individual \(t\)-test rejects.
- Why? Multicollinearity: correlated regressors make individual effects hard to isolate, but their joint contribution can still be large.
- This is precisely when the \(F\)-test is most valuable.
Overall Significance of a Regression
Testing whether all regressors are jointly significant:
\[
H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0
\]
- The restricted model is just the intercept (\(R^2_r = 0\)), so:
\[
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
\]
- Most software reports this \(F\)-statistic automatically.
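The \(R^2\) form is the general \(F\)-statistic with \(q = k\). Writing \(\text{SST}_Y\) for the total sum of squares of \(Y\) (a symbol introduced only for this derivation), the intercept-only model has \(\text{SSR}_r = \text{SST}_Y\) and the full model has \(\text{SSR}_{ur} = (1 - R^2) \, \text{SST}_Y\), so:

\[
F = \frac{\left(\text{SST}_Y - (1 - R^2) \, \text{SST}_Y\right) / k}{(1 - R^2) \, \text{SST}_Y / (n - k - 1)} = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
\]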
\(t\)-Test vs. \(F\)-Test
| | \(t\)-test | \(F\)-test |
| --- | --- | --- |
| Restrictions | Single | Multiple |
| Distribution | \(t_{n-k-1}\) | \(F_{q, \, n-k-1}\) |
| Direction | One- or two-sided | Always one-sided (\(F \geq 0\)) |
- For a single restriction tested against a two-sided alternative, the two are equivalent: \(t^2 = F\), and if \(T \sim t_{n-k-1}\) then \(T^2 \sim F_{1, \, n-k-1}\).
Summary
- Under the CLM assumptions (MLR.1–MLR.6), OLS estimators are normally distributed, enabling exact finite-sample inference.
- \(t\)-test: tests a single linear restriction using \(t = (\hat\beta_j - \beta_{j,0}) / \text{se}(\hat\beta_j)\).
- \(F\)-test: tests multiple linear restrictions by comparing restricted and unrestricted model fit.
- Confidence intervals: the set of values not rejected by a two-sided test.
- Statistical significance \(\neq\) economic significance — always assess the magnitude of the effect.
What’s Next
Lecture 3b — Asymptotics and Heteroskedasticity:
- Large-sample properties of OLS (consistency, asymptotic normality)
- Inference without normality
- Heteroskedasticity-robust standard errors