Panel Regression

Natasha Kang

Xiamen University, Chow Institute

May, 2026

When Cross-Sectional Tools Aren’t Enough

Two cross-sectional levers from earlier:

Lect4: condition on observable confounders.
Lect5: use exogenous variation via instruments.

Both fail when the confounder is unobserved and no valid instrument exists.

Canonical example — returns to schooling: ability is unobserved and correlated with schooling. OLS is biased; no clean instrument is universally available. Need a different route.

A New Source of Variation

When the same unit is observed at multiple periods, we can compare the unit to itself:

$X_{it}$ takes different values across periods $t$, for fixed unit $i$.
$Y_{it}$ moves with it.
Anything fixed about unit $i$ — observed or not — is held constant by construction.

Payoff: no instrument needed, and no need to know what the confounder is, as long as it is fixed over time.

Requires: assumptions on how $X_{it}$ and $\varepsilon_{it}$ co-move within a unit over time. These assumptions shape the rest of the lecture.

Potential Outcomes

For each unit-period $(i, t)$, the potential outcome $Y_{it}(x)$ is the value $Y$ would take if the regressor were set to $x$:

\[ Y_{it}(x) \;=\; \text{unit } i\text{'s outcome at time } t \text{ when } X_{it} = x \]

Notation:

Capital $X_{it}$ — the realized random regressor.
Lowercase $x$ — a hypothetical value.
$Y_{it} = Y_{it}(X_{it})$ — the observed outcome is the PO evaluated at the realized $X_{it}$.

Cross-section: at a single $t$, we see only one PO per unit. Every other value of $x$ is counterfactual — the fundamental problem of causal inference.

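Panel: across periods, the same unit is observed at different realized $X$ values, $Y_{it}(X_{it})$ and $Y_{it'}(X_{it'})$. In a sense, we observe unit $i$’s outcome under several values of the regressor, just at different times. Comparing them is informative only under restrictions on how $Y_{it}(\cdot)$ changes with $t$, which the next slide imposes.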

A Linear PO Assumption

Assume the PO function is linear and additively separable:

\[ Y_{it}(x) \;=\; \alpha_i \;+\; \lambda_t \;+\; x'\beta \;+\; \varepsilon_{it} \]

$\alpha_i$ — unit-fixed term: every time-invariant feature of individual $i$, observed or unobserved (ability, gender, family background, …).
$\lambda_t$ — time-fixed term: every unit-invariant feature of period $t$ (national price level, common macro shock, period-specific common event).
$\beta$ — the causal slope in $x$, common across units and periods.
$\varepsilon_{it}$ — idiosyncratic disturbance, varying across $i$ and $t$.

The Empirical Model

Setting $x = X_{it}$ in the linear PO function:

\[ Y_{it} \;=\; \alpha_i + \lambda_t + X_{it}'\beta + \varepsilon_{it} \]

$X_{it}$ is allowed to correlate with both:

$\alpha_i$ — unit-level fixed factors.
$\lambda_t$ — common time shocks.

In a cross-section, $\alpha_i$ would sit inside the error, so consistency of OLS would require $\mathrm{Cov}(X_i, \alpha_i) = 0$.

Estimating $\beta$

The parameter we want is $\beta$ — the causal slope. $\alpha_i$ and $\lambda_t$ are unobserved nuisance terms in the regression.

Standard approach — Two-Way Fixed Effects (TWFE), via the within transformation:

Demean $Y, X$ by unit and period means (adding back the grand mean).
$\alpha_i$ and $\lambda_t$ are eliminated from the regression.
Leaves an estimating equation in $\beta$ alone.

The Within Transformation

Take averages of both sides of $Y_{it} = \alpha_i + \lambda_t + X_{it}'\beta + \varepsilon_{it}$:

Unit mean: $\;\;\bar Y_i \;=\; \alpha_i + \bar\lambda + \bar X_i'\beta + \bar\varepsilon_i$
Period mean: $\;\;\bar Y_t \;=\; \bar\alpha + \lambda_t + \bar X_t'\beta + \bar\varepsilon_t$
Grand mean: $\;\;\bar Y \;=\; \bar\alpha + \bar\lambda + \bar X'\beta + \bar\varepsilon$

In a balanced panel, combine $\tilde Y_{it} \equiv Y_{it} - \bar Y_i - \bar Y_t + \bar Y$ (and likewise $\tilde X_{it}$, $\tilde\varepsilon_{it}$); the $\alpha$ and $\lambda$ terms cancel exactly:

\[ \tilde Y_{it} \;=\; \tilde X_{it}'\beta + \tilde\varepsilon_{it} \]
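A minimal NumPy sketch of the within transformation on a simulated balanced panel. The helper `twoway_demean` and all parameter values are illustrative assumptions, not part of any library. It checks that $\alpha_i + \lambda_t$ is wiped out exactly and that OLS on the demeaned data recovers $\beta$, while pooled OLS does not when $X_{it}$ is correlated with $\alpha_i$ and $\lambda_t$.

```python
import numpy as np

def twoway_demean(a):
    """Double-demean a balanced (N, T) panel: a_it - mean_i - mean_t + grand mean."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

rng = np.random.default_rng(0)
N, T, beta = 2000, 5, 1.5

alpha = rng.normal(size=(N, 1))               # unit effects alpha_i (unobserved)
lam = np.linspace(0.0, 2.0, T)[None, :]       # period effects lambda_t
# X_it is deliberately correlated with both alpha_i and lambda_t
X = 0.8 * alpha + 0.5 * lam + rng.normal(size=(N, T))
Y = alpha + lam + beta * X + rng.normal(size=(N, T))

# The fixed-effect part alone is annihilated by the transformation
fe_residual = np.abs(twoway_demean(alpha + lam)).max()

# OLS on the double-demeaned data (no intercept needed)
Xt, Yt = twoway_demean(X), twoway_demean(Y)
beta_twfe = (Xt * Yt).sum() / (Xt ** 2).sum()

# Pooled OLS with an intercept leaves alpha_i + lambda_t in the error
xc, yc = X - X.mean(), Y - Y.mean()
beta_pooled = (xc * yc).sum() / (xc ** 2).sum()

print(fe_residual, beta_twfe, beta_pooled)
```

With these settings the pooled slope is pulled well above 1.5, while the TWFE slope stays close to it.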

TWFE Estimator: Formal Derivation

Apply OLS to the double-demeaned model:

\[ \hat\beta_{\text{TWFE}} \;=\; \left(\sum_{i,t} \tilde X_{it}\tilde X_{it}'\right)^{-1} \sum_{i,t} \tilde X_{it}\tilde Y_{it} \]

Substitute $\tilde Y_{it} = \tilde X_{it}'\beta + \tilde\varepsilon_{it}$:

\[ \hat\beta_{\text{TWFE}} - \beta \;=\; \left(\sum_{i,t} \tilde X_{it}\tilde X_{it}'\right)^{-1} \sum_{i,t} \tilde X_{it}\tilde\varepsilon_{it} \]

Whether $\hat\beta_{\text{TWFE}} \to_p \beta$ depends on the behavior of these two sums. The substantive condition is the identifying assumption — strict exogeneity, next.
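As a numerical check of the closed form, the sketch below (simulated data; names are illustrative) computes $\hat\beta_{\text{TWFE}}$ from the double-demeaned sums and compares it with least-squares dummy variables (LSDV): OLS of $Y$ on $X$, $N$ unit dummies and $T-1$ period dummies. By Frisch-Waugh-Lovell the two coincide in a balanced panel.

```python
import numpy as np

def twoway_demean(a):
    """Double-demean a balanced (N, T) panel: a_it - mean_i - mean_t + grand mean."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

rng = np.random.default_rng(1)
N, T, beta = 50, 4, -0.7
alpha = rng.normal(size=(N, 1))
lam = rng.normal(size=(1, T))
X = alpha + lam + rng.normal(size=(N, T))
Y = alpha + lam + beta * X + rng.normal(size=(N, T))

# (1) Within estimator: the closed form with a scalar regressor
Xt, Yt = twoway_demean(X), twoway_demean(Y)
beta_within = (Xt * Yt).sum() / (Xt ** 2).sum()

# (2) LSDV: OLS of Y on X, N unit dummies and T-1 period dummies (period 0 dropped)
unit = np.repeat(np.arange(N), T)      # matches the row order of Y.ravel()
period = np.tile(np.arange(T), N)
D_unit = (unit[:, None] == np.arange(N)).astype(float)
D_period = (period[:, None] == np.arange(1, T)).astype(float)
Z = np.column_stack([X.ravel(), D_unit, D_period])
coef, *_ = np.linalg.lstsq(Z, Y.ravel(), rcond=None)
beta_lsdv = coef[0]

print(beta_within, beta_lsdv)
```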

Strict Exogeneity

The identifying assumption for $\beta$:

\[ E[\varepsilon_{it} \mid X_{i1}, \dots, X_{iT}, \alpha_i, \lambda_t] = 0 \]

The residual $\varepsilon_{it}$ is mean-zero conditional on the entire $X$ trajectory and the FEs.

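Why “strict”? Unit-demeaning alone turns the error into

\[ \varepsilon_{it} - \bar\varepsilon_i \;=\; \varepsilon_{it} - \frac{1}{T}\sum_{s=1}^{T}\varepsilon_{is} \]

a linear combination of errors at every period; likewise $\tilde X_{it}$ contains $\bar X_i$, which involves $X$ at every period. So $E[\tilde X_{it}\tilde\varepsilon_{it}]$ involves $E[X_{ir}\varepsilon_{is}]$ at every pair $(r, s)$, and strict exogeneity sets every one of these to zero.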

When Strict Exogeneity Fails

The natural condition $E[\varepsilon_{it} \mid X_{it}, \alpha_i, \lambda_t] = 0$ can hold while strict exogeneity fails — when $X$ at other periods correlates with $\varepsilon_{it}$:

Lagged dependent variable: $X_{it} = Y_{i,t-1}$ depends on $\varepsilon_{i,t-1}$ → $\hat\beta_{\text{TWFE}}$ inconsistent (Nickell bias).
Feedback / Ashenfelter dip: treatment adoption $X_{i,t+1}$ responds to past outcomes (and hence past errors).
Anticipation: future $X$ enters today’s $\varepsilon$ via expectations.
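To make the lagged-dependent-variable case concrete, here is a small simulation (an illustrative sketch; all settings are assumptions). It generates a stationary AR(1) panel $Y_{it} = \alpha_i + \rho Y_{i,t-1} + \varepsilon_{it}$, sets $X_{it} = Y_{i,t-1}$, and applies the unit-demeaned within estimator with short $T$. Even with large $N$, $\hat\rho$ stays well below the true $\rho$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, rho, burn = 5000, 5, 0.5, 50

alpha = rng.normal(size=N)
y = np.zeros((N, burn + T + 1))
for t in range(1, burn + T + 1):
    y[:, t] = alpha + rho * y[:, t - 1] + rng.normal(size=N)
y = y[:, burn:]                 # keep T + 1 periods after the burn-in

Y = y[:, 1:]                    # outcomes Y_it, t = 1..T
X = y[:, :-1]                   # regressor X_it = Y_{i,t-1}

# Within (unit-demeaned) estimator. X_{i,s} = Y_{i,s-1} depends on eps_it for s > t,
# so strict exogeneity fails and the unit means drag that correlation in.
Xd = X - X.mean(axis=1, keepdims=True)
Yd = Y - Y.mean(axis=1, keepdims=True)
rho_hat = (Xd * Yd).sum() / (Xd ** 2).sum()
print(rho, rho_hat)
```

The bias is of order $1/T$ and does not vanish as $N \to \infty$ with $T$ fixed, which is why a large cross-section does not rescue this design.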

TWFE Estimator: Consistency

Fixed $T$, $N \to \infty$, units i.i.d. The double sum is an average over i.i.d. units:

\[ \frac{1}{NT}\sum_{i,t}\tilde X_{it}\tilde X_{it}' \;=\; \frac{1}{N}\sum_i \;\frac{1}{T}\sum_t \tilde X_{it}\tilde X_{it}' \]

LLN applies to the outer sum (over $i$):

Rank: $\frac{1}{NT}\sum_{i,t}\tilde X_{it}\tilde X_{it}' \;\to_p\; Q_{\tilde X \tilde X} \;\equiv\; E\!\left[\frac{1}{T}\sum_t \tilde X_{it}\tilde X_{it}'\right]$, positive definite by assumption.
Orthogonality: $\frac{1}{NT}\sum_{i,t}\tilde X_{it}\tilde\varepsilon_{it} \;\to_p\; E\!\left[\frac{1}{T}\sum_t \tilde X_{it}\tilde\varepsilon_{it}\right] \;=\; 0$ under strict exogeneity.

By Slutsky: $\;\;\hat\beta_{\text{TWFE}} \;\xrightarrow{p}\; \beta$.

TWFE Estimator: Asymptotic Normality

By a CLT over units (fixed $T$, $N \to \infty$):

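\[ \sqrt{N}(\hat\beta_{\text{TWFE}} - \beta) \;\xrightarrow{d}\; \mathcal{N}\!\left(0,\; \tfrac{1}{T^2}\,Q_{\tilde X \tilde X}^{-1}\,\Omega\,Q_{\tilde X \tilde X}^{-1}\right) \]

(The factor $1/T^2$ appears because $Q_{\tilde X \tilde X}$ carries a $1/T$ normalization and $\Omega$ below does not.)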

The “meat” of the sandwich:

\[ \Omega \;\equiv\; \mathrm{Var}\!\left(\frac{1}{\sqrt N}\sum_{i,t} \tilde X_{it}\tilde\varepsilon_{it}\right) \;=\; E\!\left[\Bigl(\sum_t \tilde X_{it}\tilde\varepsilon_{it}\Bigr)\Bigl(\sum_t \tilde X_{it}\tilde\varepsilon_{it}\Bigr)'\right] \]

(Last equality: i.i.d. units with mean-zero scores.) Estimating $\Omega$ correctly is the subject of the next two slides.

Decomposing $\Omega$

Recall (Asymptotic Normality slide):

\[ \Omega \;=\; E\!\left[\Bigl(\sum_t \tilde X_{it}\tilde\varepsilon_{it}\Bigr)\Bigl(\sum_t \tilde X_{it}\tilde\varepsilon_{it}\Bigr)'\right] \]

Expand the outer sum-product:

\[ \Omega \;=\; \underbrace{\sum_{t} E[\tilde\varepsilon_{it}^2\, \tilde X_{it}\tilde X_{it}']}_{\text{same-period (within } (i,t)\text{)}} \;+\; \underbrace{\sum_{t \neq s} E[\tilde\varepsilon_{it}\tilde\varepsilon_{is}\, \tilde X_{it}\tilde X_{is}']}_{\text{cross-period (within unit } i\text{)}} \]

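Within a unit, residuals $\tilde\varepsilon_{i,1}, \dots, \tilde\varepsilon_{i,T}$ are typically serially correlated: shocks persist over time, and even serially uncorrelated $\varepsilon_{it}$ become correlated after unit-demeaning (correlation $-1/(T-1)$ under homoskedasticity). So the cross-period sum is generally non-zero.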

Naive vs. Cluster-Robust

Naive SE treats the $(i,t)$ observations as independent: it sets the cross-period sum to zero and keeps only the same-period sum:

\[ \hat\Omega_{\text{naive}} \;=\; \frac{1}{N}\sum_{i,t}\hat\varepsilon_{it}^2\, \tilde X_{it}\tilde X_{it}' \]

Under positive within-unit correlation of the scores $\tilde X_{it}\tilde\varepsilon_{it}$, it underestimates $\Omega$, so standard errors are too small.

Cluster-robust at unit level — keeps both sums:

\[ \hat\Omega_{\text{cluster}} \;=\; \frac{1}{N}\sum_i \Bigl(\sum_t \tilde X_{it}\hat\varepsilon_{it}\Bigr)\Bigl(\sum_t \tilde X_{it}\hat\varepsilon_{it}\Bigr)' \]

with $\hat\varepsilon_{it} = \tilde Y_{it} - \tilde X_{it}'\hat\beta_{\text{TWFE}}$.
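A sketch of both variance estimators for a scalar $X$, on simulated data with AR(1) regressors and AR(1) errors within each unit. All settings are illustrative assumptions, and the finite-sample degrees-of-freedom corrections that statistical packages apply are omitted. For a scalar regressor the sandwich reduces to $\widehat{\mathrm{Var}}(\hat\beta) = \text{meat} / (\sum_{i,t}\tilde X_{it}^2)^2$, with the $1/N$ factors canceling.

```python
import numpy as np

def twoway_demean(a):
    """Double-demean a balanced (N, T) panel."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

def ar1_panel(n, t, r, rng):
    """(n, t) array of stationary AR(1) series, coefficient r, unit innovation variance."""
    out = np.empty((n, t))
    out[:, 0] = rng.normal(size=n) / np.sqrt(1 - r ** 2)
    for s in range(1, t):
        out[:, s] = r * out[:, s - 1] + rng.normal(size=n)
    return out

rng = np.random.default_rng(3)
N, T, beta, rho = 500, 10, 1.0, 0.9
alpha = rng.normal(size=(N, 1))
X = alpha + ar1_panel(N, T, rho, rng)              # persistent regressor
Y = alpha + beta * X + ar1_panel(N, T, rho, rng)   # persistent errors

Xt, Yt = twoway_demean(X), twoway_demean(Y)
Sxx = (Xt ** 2).sum()
beta_hat = (Xt * Yt).sum() / Sxx
e = Yt - Xt * beta_hat                         # residuals hat(eps)_it

meat_naive = (Xt ** 2 * e ** 2).sum()          # same-period terms only
scores = (Xt * e).sum(axis=1)                  # sum_t X~_it e_it, one per unit
meat_cluster = (scores ** 2).sum()             # same-period + cross-period terms

se_naive = np.sqrt(meat_naive) / Sxx
se_cluster = np.sqrt(meat_cluster) / Sxx
print(se_naive, se_cluster, se_cluster / se_naive)
```

Clustering by unit only requires summing scores within each unit before squaring; everything else is the same computation.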

Bertrand, Duflo & Mullainathan (2004 QJE): in placebo simulations on state-level DiD panels with serially correlated outcomes, ignoring within-unit correlation yields “significant” effects at the nominal 5% level for up to 45% of placebo interventions.

Which $X$’s Can TWFE Identify?

TWFE includes $\alpha_i$ and $\lambda_t$. These absorb:

Anything that’s a function of $i$ alone (constant within unit).
Anything that’s a function of $t$ alone (constant within period).

Implication for the choice of $X$:

Time-invariant within unit (gender, race, baseline characteristics): absorbed by $\alpha_i$. Cannot be identified in TWFE — its coefficient drops out with the FE.
Common across units in each period (national-level policy, macro shock that hits everyone): absorbed by $\lambda_t$. Cannot be identified.
Two-way varying (varies across both $i$ and $t$): identifiable. This is what TWFE can estimate.
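A quick check of which regressors survive the within transformation, using toy arrays with hypothetical names: a unit-only column and a period-only column become identically zero after double demeaning, so their coefficients cannot be estimated, while a two-way varying column keeps its variation.

```python
import numpy as np

def twoway_demean(a):
    """Double-demean a balanced (N, T) panel."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

rng = np.random.default_rng(4)
N, T = 6, 4
candidates = {
    # e.g. a time-invariant trait: constant within each unit
    "unit_only": np.repeat(rng.integers(0, 2, size=(N, 1)), T, axis=1).astype(float),
    # e.g. a national shock: identical across units in each period
    "period_only": np.repeat(rng.normal(size=(1, T)), N, axis=0),
    # varies over both i and t
    "two_way": rng.normal(size=(N, T)),
}
max_abs = {name: np.abs(twoway_demean(col)).max() for name, col in candidates.items()}
print(max_abs)
```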

Continuous Treatment: Autor-Dorn-Hanson (ADH) China Shock

Research question (ADH 2013): Did the rise in Chinese import competition between 1990 and 2007 reduce manufacturing employment in U.S. local labor markets exposed to it?

What is a local labor market?

A geographic area where workers can take jobs without relocating.
The scope where displaced workers search and employers recruit.

Commuting zones (CZs) operationalize this:

Groups of counties merged by commuting flows; ~700 cover the U.S.
Capture the boundary of where workers actually look for jobs.
Differ in pre-1990 industrial mix → variation in exposure to Chinese imports.

ADH: Setup

Unit $c$: commuting zone. Period $t$: 1990–2007.
Outcome $L_{ct}$: manufacturing employment as a share of the working-age population.
Treatment $\text{ChinaShock}_{ct}$: continuous, shift-share construction.

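Shift-share treatment: sum across industries $j$ of (local share) $\times$ (national shift). Here $L_{cj,t-1}$, $L_{c,t-1}$ and $L_{j,t-1}$ are baseline employment counts (CZ-industry, CZ total, national industry), not the outcome $L_{ct}$:

\[ \text{ChinaShock}_{ct} \;=\; \sum_{j} \underbrace{\frac{L_{cj,t-1}}{L_{c,t-1}}}_{\text{share}_{cj}} \cdot \underbrace{\frac{\Delta M_{j,\text{CN}\to\text{US},t}}{L_{j,t-1}}}_{\text{shift}_{jt}} \]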

share$_{cj}$: industry $j$’s employment share in CZ $c$ at baseline. Varies across CZs; fixed in $t$.
shift$_{jt}$: national change in Chinese imports in industry $j$, per worker. Varies across industries and years; common to all CZs.
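A toy computation of the shift-share formula with made-up numbers: three CZs and two industries, treating the three CZs as the whole country so that national industry employment is the column sum. Import changes are in thousands of dollars, so the shift is in thousands of dollars per worker.

```python
import numpy as np

# Hypothetical baseline employment L_{cj,t-1}: rows = 3 CZs, columns = 2 industries
L_cj = np.array([[800.0, 200.0],
                 [100.0, 900.0],
                 [500.0, 500.0]])
share = L_cj / L_cj.sum(axis=1, keepdims=True)    # share_cj = L_{cj,t-1} / L_{c,t-1}

# Treat these 3 CZs as the whole country: national industry employment L_{j,t-1}
L_j = L_cj.sum(axis=0)                            # [1400, 1600]
dM = np.array([2800.0, 800.0])                    # hypothetical change in imports, $1,000s
shift = dM / L_j                                  # shift_jt, $1,000 per worker: [2.0, 0.5]

china_shock = share @ shift                       # ChinaShock_c = sum_j share_cj * shift_jt
print(china_shock)
```

CZ 1, concentrated in the high-shift industry, gets the largest exposure (1.7), even though every CZ faces the same national shifts.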

A Pedagogical TWFE Adaptation

We now write down a yearly TWFE specification for the ADH setup:

\[ L_{ct} = \alpha_c + \lambda_t + \beta \cdot \text{ChinaShock}_{ct} + \varepsilon_{ct} \]

$\alpha_c$: CZ fixed effect — absorbs time-invariant CZ characteristics.
$\lambda_t$: year fixed effect — absorbs national-year shocks common to all CZs.
$\beta$: average causal effect of one extra unit of import exposure on the manufacturing employment share.

Honest disclosure: ADH (2013) does not run this yearly TWFE. They use stacked long-differences (1990→2000, 2000→2007). The framework below is an adaptation we use to illustrate panel-regression machinery — same identifying logic, different estimation.

Within-Transformed Treatment

Shorthand: $s_{cj} \equiv \text{share}_{cj}$, $x_{jt} \equiv \text{shift}_{jt}$, $C_{ct} \equiv \text{ChinaShock}_{ct} = \sum_j s_{cj}\, x_{jt}$.

Three means (using $s_{cj}$ fixed in $t$, $x_{jt}$ fixed in $c$):

\[ \bar C_c = \sum_j s_{cj}\,\bar x_j, \quad \bar C_t = \sum_j \bar s_j\, x_{jt}, \quad \bar C = \sum_j \bar s_j\,\bar x_j \]

Within transformation:

\[ \begin{aligned} \widetilde C_{ct} \;=\; C_{ct} - \bar C_c - \bar C_t + \bar C &= \sum_j \big[\,s_{cj}\,x_{jt} - s_{cj}\,\bar x_j - \bar s_j\,x_{jt} + \bar s_j\,\bar x_j\,\big] \\ &= \sum_j (s_{cj} - \bar s_j)(x_{jt} - \bar x_j) \end{aligned} \]
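The factorization can be checked numerically with random shares and shifts (sizes are arbitrary; `twoway_demean` is an illustrative helper): double-demeaning $C_{ct} = \sum_j s_{cj}x_{jt}$ gives the same array as $\sum_j (s_{cj} - \bar s_j)(x_{jt} - \bar x_j)$.

```python
import numpy as np

def twoway_demean(a):
    """Double-demean a balanced (N, T) panel."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

rng = np.random.default_rng(5)
n_cz, n_ind, T = 40, 6, 5
s = rng.dirichlet(np.ones(n_ind), size=n_cz)   # s_cj: each CZ's shares sum to 1, fixed in t
x = rng.normal(size=(n_ind, T))                # x_jt: common to all CZs

C = s @ x                                      # C_ct = sum_j s_cj x_jt, shape (n_cz, T)

lhs = twoway_demean(C)
# sum_j (s_cj - s_bar_j)(x_jt - x_bar_j) as a matrix product
rhs = (s - s.mean(axis=0)) @ (x - x.mean(axis=1, keepdims=True))
print(np.abs(lhs - rhs).max())
```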

What Identifies $\beta$

\[ \widetilde C_{ct} \;=\; \sum_j (s_{cj} - \bar s_j)(x_{jt} - \bar x_j) \]

$\widetilde C_{ct}$ is the idiosyncratic part of CZ-$c$’s exposure in year $t$ — what’s left after netting out CZ-$c$’s average across years and year-$t$’s average across CZs.

Where does the variation come from?

Across years: in year $t$, some industries had unusually large national shifts (large $x_{jt} - \bar x_j$).
Across CZs: CZ $c$ is over-exposed to certain industries relative to the average CZ (large $s_{cj} - \bar s_j$).

By FWL, $\hat\beta$ is the OLS slope of $\widetilde L_{ct}$ on $\widetilde C_{ct}$ across the panel — how much idiosyncratic employment moves per unit of idiosyncratic exposure.

What Strict Exogeneity Requires

For $\hat\beta$ to be consistent, $\widetilde C_{ct}$ must be uncorrelated with the error. The simplest case to argue is that both pieces of $C_{ct} = \sum_j s_{cj} x_{jt}$ are uncorrelated with it:

shares $s_{cj}$ (baseline industry mix) uncorrelated with CZ-year shocks $\varepsilon_{ct}$,
shifts $x_{jt}$ (national industry shocks) uncorrelated with $\varepsilon_{ct}$,

given the FE.

ADH defend each separately:

Shares: measured at $t-1=1990$ — predetermined relative to later $\varepsilon_{ct}$.
Shifts: are they exogenous? $\to$ next slide.

Are the Shifts Exogenous?

If some unobserved $U$ affects both the shifter $x_{jt}$ and the CZ-year shock $\varepsilon_{ct}$, then $x_{jt}$ is correlated with $\varepsilon_{ct}$ — strict exogeneity fails.

Question: what could $U$ plausibly be? $\to$ next slide.

Why the Shifts Aren’t Exogenous

$U$ = US-side demand shocks. When industry $j$ weakens domestically (shifting tastes, productivity slowdown, etc.):

US firms can’t compete $\Rightarrow$ imports fill the gap $\Rightarrow$ raises $x_{jt}$.
Employment in CZs exposed to industry $j$ falls for non-China reasons $\Rightarrow$ shows up in $\varepsilon_{ct}$.

So $\Delta M_{j,\text{CN}\to\text{US},t}$ confounds Chinese supply (what we want — productivity, WTO entry, capacity expansion) with US demand (the confounder).

ADH: The Instrument

ADH IV: keep the same shift-share construction, but replace the shifter — use Chinese imports to other high-income countries instead of US:

\[ \text{ChinaShock}^{\text{IV}}_{ct} \;=\; \sum_j s_{cj} \cdot \frac{\Delta M_{j,\text{CN}\to\text{other},t}}{L_{j,t-1}} \]

(8 countries: Australia, Denmark, Finland, Germany, Japan, New Zealand, Spain, Switzerland.)

Relevance: same Chinese supply surge hits all 8 destinations $\Rightarrow$ correlated with the US shifter.
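Exclusion (assumed): other countries’ imports are not driven by US-specific demand shocks $\Rightarrow$ purges the demand confound. This fails if industry demand shocks are correlated between the US and the 8 comparison countries.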
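A stylized simulation of the confounding story and the IV fix. To keep it short, it collapses the shift-share structure to one exposure per unit and uses made-up variables: US exposure mixes a supply push with a US-side weakness shock that also lowers employment, while other-country exposure shares only the supply push. OLS is biased; the just-identified IV ratio $\widehat{\mathrm{Cov}}(Z, Y)/\widehat{\mathrm{Cov}}(Z, X)$ is not. This sketches the logic only and is not a replication of ADH.

```python
import numpy as np

rng = np.random.default_rng(6)
n, beta = 20_000, -0.6

supply = rng.normal(size=n)        # Chinese supply push (the variation we want)
us_weakness = rng.normal(size=n)   # US-side shock U: raises imports, lowers employment
noise_other = rng.normal(size=n)   # idiosyncratic part of other-country imports

x_us = supply + us_weakness        # US import exposure: supply + US demand confound
z_other = supply + noise_other     # other-country exposure: supply, no US confound
# U also lowers employment directly, so it sits in the error term
y = beta * x_us - us_weakness + rng.normal(size=n)

def cov(a, b):
    return np.mean((a - a.mean()) * (b - b.mean()))

beta_ols = cov(x_us, y) / cov(x_us, x_us)
beta_iv = cov(z_other, y) / cov(z_other, x_us)   # just-identified IV (Wald ratio)
print(beta_ols, beta_iv)
```

In this design half of the exposure variance comes from the confounder, so OLS is pulled toward about $-1.1$ in large samples, overstating the job loss.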

ADH: Controls Strengthen Identification

ADH’s preferred specification adds, interacted with the period dummy:

Census division $\times$ period — region-specific time trends.
Baseline CZ characteristics $\times$ period — % college, % female, % manufacturing employment, % foreign-born — measured at a single pre-period baseline.

Why pre-period?

Predetermined: values fixed before the shock arrives.
Using later values would mix in the outcome of treatment — they’d absorb the very effect we want to estimate (“bad controls”).
Same logic as fixing shares $s_{cj}$ at baseline.

We’ll write these as $W_c^{\text{pre}\prime}\gamma_t$ — interactions of $W^{\text{pre}}$ with period dummies, with a separate coefficient vector per period. Formalized in the next section.

Why Add $W^{\text{pre}}$ Controls?

TWFE assumes the time path $\lambda_t$ is common to all CZs.
Strict exogeneity (given $\alpha_c, \lambda_t$) then requires the error to be mean-zero given the entire $X$ path.

Problem: CZs with different baseline characteristics may follow different time paths.

A common $\lambda_t$ is misspecified.
$\varepsilon_{ct}$ inherits CZ-specific time variation correlated with $X_{ct}$ — strict exogeneity fails.

Resolution: let $\lambda_t$ depend on a baseline characteristic $W_c^{\text{pre}}$.

Stratifies the time path by pre-period composition.
Relaxes strict exogeneity to a conditional version — exogeneity within strata of $W^{\text{pre}}$.
Formalized next.

General Form: $W^{\text{pre}}$ Interactions

For continuous or multi-dimensional baseline characteristic $W_c^{\text{pre}}$:

\[ Y_{ct} \;=\; \alpha_c + \underbrace{\lambda_t + W_c^{\text{pre}\prime}\gamma_t}_{\text{CZ-specific time path}} + X_{ct}'\beta + \varepsilon_{ct} \]

$\gamma_t \in \mathbb{R}^K$: a period-specific coefficient vector on $W^{\text{pre}}$.
$W_c^{\text{pre}\prime}\gamma_t$ is CZ $c$’s time-$t$ tilt away from the common $\lambda_t$ — CZs with different $W^{\text{pre}}$ travel different time paths.

$W_c^{\text{pre}}$ must be time-invariant (measured pre-treatment).

Adding it as a level alone (without time interaction) is absorbed by $\alpha_c$.
Only the time-varying interaction $W_c^{\text{pre}\prime}\gamma_t$ identifies anything new.

The Fully Expanded Form

Period dummies and their $W^{\text{pre}}$ interactions written out (observation indexed by $(c, t')$ on this slide; $t$ is the summation index):

\[ Y_{c,t'} = \alpha_c \;+\; \sum_{t=2}^{T} \mathbf{1}[t'=t]\,\lambda_{t} \;+\; \sum_{t=2}^{T} \mathbf{1}[t'=t] \cdot W_c^{\text{pre}\prime}\gamma_{t} \;+\; X_{c,t'}'\beta + \varepsilon_{c,t'} \]

Parameters:

$\alpha_c$: $N$ unit FE.
$\lambda_{t}$ for $t = 2, \ldots, T$: $T-1$ scalar coefficients on period dummies.
$\gamma_{t}$ for $t = 2, \ldots, T$: each a $K$-vector — $K(T-1)$ coefficients on (period $\times W^{\text{pre}}$) interactions.
$\beta$: causal slope on $X$.

Dummy Trap and Reference Period

Why drop one period for both $\lambda$ and $\gamma$?

Standalone period dummies: $\sum_{t=1}^{T} \mathbf{1}[t'=t] = 1$ for all $(c, t')$ — the constant is in span($\alpha_c$). Including all $T$ → perfect collinearity. Drop $\lambda_{t_0}$.

$W^{\text{pre}}$-interaction dummies: $\sum_{t=1}^{T} \mathbf{1}[t'=t] \cdot W_c^{\text{pre}} = W_c^{\text{pre}}$ — time-invariant, also in span($\alpha_c$). Drop $\gamma_{t_0}$.

Convention: drop the same reference period $t_0$ for both. The expanded form above uses $t_0 = 1$; event-study designs typically use the last pre-treatment period. Coefficients $\lambda_{t}, \gamma_{t}$ are then interpreted relative to $t_0$.
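The dummy trap can be seen directly in the design matrix (a small simulated example; sizes and the helper `design` are illustrative). With all $T$ period dummies and all $T$ blocks of $W^{\text{pre}}$ interactions, the matrix loses $1 + K$ ranks to the unit dummies; dropping the reference period (Python index 0, i.e. $t = 1$) from both sets restores full column rank.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T, K = 8, 4, 2
W = rng.normal(size=(N, K))                    # W_c^pre, time-invariant
unit = np.repeat(np.arange(N), T)              # rows (c, t') in unit-major order
period = np.tile(np.arange(T), N)

D_unit = (unit[:, None] == np.arange(N)).astype(float)       # N unit dummies
D_period = (period[:, None] == np.arange(T)).astype(float)   # all T period dummies
W_rows = W[unit]                                             # W_c^pre repeated for each t'

def design(periods):
    """Unit dummies + period dummies + (period x W^pre) interactions for the given periods."""
    blocks = [D_unit, D_period[:, periods]]
    blocks += [D_period[:, [t]] * W_rows for t in periods]
    return np.hstack(blocks)

all_periods = design(list(range(T)))       # includes the reference period
trimmed = design(list(range(1, T)))        # drop t0 from both lambda and gamma

rank_all = np.linalg.matrix_rank(all_periods)
rank_trimmed = np.linalg.matrix_rank(trimmed)
print(all_periods.shape[1], rank_all)
print(trimmed.shape[1], rank_trimmed)
```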

ADH: Headline Finding

Pulling everything together — what does ADH actually find?

From their preferred IV spec with $W^{\text{pre}}$ controls (Table 3, col. 6, stacked long-differences):

A \$1,000-per-worker increase in import exposure over a decade $\Rightarrow$ a 0.596 percentage-point drop in the CZ’s manufacturing employment as a share of the working-age population.

Stability across spec columns 1–6 (with vs. without $W^{\text{pre}}$ controls) is part of ADH’s robustness argument.

What’s Next: DiD as a Special Case

When treatment is binary and group-structured:

\[ D_{it} = G_i \cdot \mathbf{1}[t \geq t^*] \]

$G_i \in \{0, 1\}$: indicator that unit $i$ belongs to the treated group (treated at some point).
$t^*$: the period when treatment turns on for the treated group.

TWFE specializes to difference-in-differences, the canonical design covered in lect6b.
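A preview sketch of that specialization on a simulated balanced panel with a common adoption date (all values illustrative): with $D_{it} = G_i \cdot \mathbf{1}[t \geq t^*]$, the TWFE slope equals the difference-in-differences of group-by-(pre/post) cell means exactly.

```python
import numpy as np

def twoway_demean(a):
    """Double-demean a balanced (N, T) panel."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

rng = np.random.default_rng(8)
N, T, t_star = 200, 4, 2                                # periods 0..3; treatment on from period 2
G = (np.arange(N) < N // 2).astype(float)[:, None]      # treated-group indicator G_i
post = (np.arange(T) >= t_star).astype(float)[None, :]  # 1[t >= t*]
D = G * post                                            # D_it = G_i * 1[t >= t*]

alpha = rng.normal(size=(N, 1))
lam = rng.normal(size=(1, T))
Y = alpha + lam + 2.0 * D + rng.normal(size=(N, T))

# TWFE slope on the binary treatment
Dt, Yt = twoway_demean(D), twoway_demean(Y)
beta_twfe = (Dt * Yt).sum() / (Dt ** 2).sum()

# Difference-in-differences of group x (pre/post) cell means
treated, control = G[:, 0] == 1, G[:, 0] == 0
is_post, is_pre = post[0] == 1, post[0] == 0

def cell_mean(rows, cols):
    return Y[np.ix_(rows, cols)].mean()

did = ((cell_mean(treated, is_post) - cell_mean(treated, is_pre))
       - (cell_mean(control, is_post) - cell_mean(control, is_pre)))
print(beta_twfe, did)
```

The equality relies on a balanced panel and a single adoption date; it need not hold with staggered timing.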