Regression with Time Series Data

Natasha Kang

Xiamen University, Chow Institute

March 2026

What We Have Modeled So Far

So far in this course, we have focused on the dynamics of a single time series.

AR, MA, and ARIMA models ask questions such as:

  • How persistent is this variable?
  • How do shocks propagate over time?
  • How much of today can be explained by the past?

These models describe internal dynamics — how a variable evolves on its own.

What ARIMA Leaves Out

Univariate time series models, such as ARIMA, describe how a variable evolves based on its own past.

They do not incorporate information from other observed variables.


For example, when income and consumption are observed over time, one may ask:

How are movements in income related to movements in consumption over time?


Regression provides a simple way to summarize such relationships.

Regression with Time Series Data

We extend univariate time series models by allowing the outcome to depend on other observed variables.

In general, we consider linear regressions of the form

\[y_t = x_t' \beta + u_t , \quad t = 1, \ldots, T. \]

Here:

  • \(y_t\) is the time series of interest
  • \(x_t\) is a vector of observed regressors
  • \(u_t\) collects all other influences not explicitly modeled

Since all variables are time series, they may exhibit dependence over time.

Estimation via OLS

OLS: \[ \hat\beta = \Big(\sum_{t=1}^T x_t x_t'\Big)^{-1} \Big(\sum_{t=1}^T x_t y_t\Big). \]

Plug in \(y_t = x_t'\beta + u_t\): \[ \begin{aligned} \hat\beta - \beta &= \Big(\sum_{t=1}^T x_t x_t'\Big)^{-1} \Big(\sum_{t=1}^T x_t u_t\Big). \end{aligned} \]

Everything reduces to understanding the behavior of the sample moments \(\sum x_t u_t\) and \(\sum x_t x_t'\) when observations are dependent over time.
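The estimation-error decomposition can be verified numerically. A minimal Python sketch with simulated data (all variable names and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
beta = np.array([1.0, 2.0])

# Simulated regressors and AR(1) errors (dependence over time)
x = np.column_stack([np.ones(T), rng.normal(size=T)])
u = np.zeros(T)
e = rng.normal(size=T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + e[t]
y = x @ beta + u

# OLS via the sample moments sum(x_t x_t') and sum(x_t y_t)
Sxx = x.T @ x
Sxy = x.T @ y
beta_hat = np.linalg.solve(Sxx, Sxy)

# Identical to the decomposition:
# beta_hat - beta = (sum x_t x_t')^{-1} (sum x_t u_t)
err = np.linalg.solve(Sxx, x.T @ u)
print(np.allclose(beta_hat - beta, err))  # → True
```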

Unbiasedness: What It Would Take

A clean sufficient condition for finite-sample unbiasedness is a conditional mean restriction: \[ \mathbb{E}(u_t \mid x_1,\ldots,x_T) = 0 \quad \text{for all } t. \]


Then, conditional on the entire regressor path \(X=(x_1,\ldots,x_T)\), \[ \begin{aligned} \mathbb{E}(\hat\beta \mid X) &= \beta + \Big(\sum x_t x_t'\Big)^{-1} \sum x_t\, \mathbb{E}(u_t \mid X) = \beta. \end{aligned} \]


Is this a reasonable assumption in time series applications?


Is This Assumption Reasonable?

This restriction conditions on the entire regressor path, including future values.

In time series applications, this can be problematic:

future values of \(x_t\) may depend on current shocks: for example, if \(x_t\) contains a lagged dependent variable, then \(x_{t+1}\) includes \(y_t\), which depends directly on \(u_t\).

As a result, finite-sample unbiasedness is often too strong to serve as a practical benchmark.

Consistency

Recall the decomposition

\[ \hat\beta - \beta = \Big(\frac{1}{T}\sum_{t=1}^T x_t x_t'\Big)^{-1} \Big(\frac{1}{T}\sum_{t=1}^T x_t u_t\Big). \]

Consistency requires that each term converges:

  1. Second moments stabilize:

\[ \frac{1}{T}\sum_{t=1}^T x_t x_t' \;\xrightarrow{p}\; Q, \quad Q \text{ positive definite}. \]

  2. Sample covariances vanish:

\[ \frac{1}{T}\sum_{t=1}^T x_t u_t \;\xrightarrow{p}\; 0. \]

Under these conditions, \[ \hat\beta \xrightarrow{p} \beta. \]
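A small Monte Carlo sketch in Python illustrates this convergence (simulated stationary AR(1) regressor and error, independent of each other; all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 2.0

def ols_slope(T):
    """OLS slope from T observations of y_t = beta * x_t + u_t."""
    x = np.zeros(T)
    u = np.zeros(T)
    ex = rng.normal(size=T)
    eu = rng.normal(size=T)
    for t in range(1, T):
        x[t] = 0.7 * x[t - 1] + ex[t]   # stationary AR(1) regressor
        u[t] = 0.5 * u[t - 1] + eu[t]   # serially dependent error, independent of x
    y = beta * x + u
    return (x @ y) / (x @ x)            # (sum x_t^2)^{-1} (sum x_t y_t)

# Estimates approach beta = 2 as T grows
for T in (100, 1000, 10000):
    print(T, ols_slope(T))
```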

What Needs to Converge?

Consistency hinges on the behavior of two sample averages:

\[ \frac{1}{T}\sum_{t=1}^T x_t x_t' \quad \text{and} \quad \frac{1}{T}\sum_{t=1}^T x_t u_t . \]

What conditions ensure these quantities converge when the data are time series?


Stationarity and Ergodicity

Let \(z_t = (y_t, x_t')'\) denote the vector of observed variables.

  • \(\{z_t\}\) is weakly stationary, so the relevant second moments exist and are time-invariant
  • \(\{z_t\}\) satisfies an ergodic (or LLN-type) condition, so time averages converge to expectations

What’s Needed for Inference?

From the decomposition,

\[ \sqrt{T}(\hat\beta - \beta) = \Big(\frac{1}{T}\sum x_t x_t'\Big)^{-1} \Big(\frac{1}{\sqrt{T}}\sum x_t u_t\Big). \]

For inference, we need:

  • convergence of \(\frac{1}{T}\sum x_t x_t'\)
  • a central limit theorem for \(\frac{1}{\sqrt{T}}\sum x_t u_t\)

A CLT for \(\sum x_t u_t\)

Consider the normalized sum \[ \frac{1}{\sqrt{T}}\sum_{t=1}^T x_t u_t . \]

A central limit theorem applies if:

  • \(\{(x_t,u_t)\}\) is weakly stationary
  • \(\mathbb{E}(u_t \mid x_t)=0\) (contemporaneous exogeneity)
  • \(\{x_t u_t\}\) is weakly dependent
  • \(\mathbb{E}\|x_t u_t\|^2 < \infty\)

Under these conditions, \[ \frac{1}{\sqrt{T}}\sum_{t=1}^T x_t u_t \;\xrightarrow{d}\; N(0,\Omega), \] for some finite asymptotic variance \(\Omega\).

Long-Run Variance

Let \[ Z_t = x_t u_t . \]

The asymptotic variance, also called the long-run variance, is defined as the limiting variance of the normalized partial sum: \[ \Omega = \lim_{T\to\infty} \operatorname{Var}\!\left( \frac{1}{\sqrt{T}}\sum_{t=1}^T Z_t \right). \]

Since \(\mathbb E(Z_t)=0\), \[ \operatorname{Var}\!\left(\sum_{t=1}^T Z_t\right) = \sum_{t=1}^T\sum_{s=1}^T \mathbb E(Z_t Z_s'). \]

Under weak stationarity, define the lag-\(h\) autocovariance matrix \[ \Gamma(h)=\mathbb E(Z_t Z_{t-h}'), \quad h\in\mathbb Z. \] Then \(\mathbb E(Z_t Z_s')=\Gamma(t-s)\), so \[ \operatorname{Var}\!\left(\sum_{t=1}^T Z_t\right) = \sum_{t=1}^T\sum_{s=1}^T \Gamma(t-s). \]

Let \(h=t-s\). For a fixed \(h\in\{-(T-1),\ldots,T-1\}\), the number of pairs \((t,s)\) with \(t-s=h\) is \(T-|h|\):

  • if \(h\ge 0\), then \(s=1,\ldots,T-h\) (so \(T-h\) pairs)
  • if \(h<0\), then \(t=1,\ldots,T-(-h)\) (so \(T-(-h)=T-|h|\) pairs)

Grouping terms by \(h=t-s\) gives \[ \operatorname{Var}\!\left(\sum_{t=1}^T Z_t\right) = \sum_{h=-(T-1)}^{T-1}(T-|h|)\,\Gamma(h). \]

Therefore, \[ \operatorname{Var}\!\left(\frac{1}{\sqrt{T}}\sum_{t=1}^T Z_t\right) = \sum_{h=-(T-1)}^{T-1}\Big(1-\frac{|h|}{T}\Big)\Gamma(h). \]

If the autocovariances are summable, \[ \sum_{h=-\infty}^{\infty}\|\Gamma(h)\|<\infty, \] then letting \(T\to\infty\) yields \[ \Omega = \sum_{h=-\infty}^{\infty}\Gamma(h). \]

Equivalently, \[ \Omega = \sum_{h=-\infty}^{\infty} \mathbb{E}\!\left(x_t u_t\,x_{t-h}'u_{t-h}\right). \]
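As a concrete check, for an AR(1) process \(Z_t = \rho Z_{t-1} + e_t\) the autocovariances are \(\gamma(h) = \sigma_Z^2 \rho^{|h|}\), and the long-run variance has the closed form \(\sigma_e^2/(1-\rho)^2\). A Python sketch comparing the finite-\(T\) expression above with this limit (parameter values illustrative):

```python
import numpy as np

rho, sigma_e = 0.5, 1.0
sigma_Z2 = sigma_e**2 / (1 - rho**2)      # Var(Z_t) for the AR(1)

def finite_T_var(T):
    """Var(T^{-1/2} sum Z_t) = sum over |h| < T of (1 - |h|/T) * gamma(h)."""
    h = np.arange(-(T - 1), T)
    return np.sum((1 - np.abs(h) / T) * sigma_Z2 * rho ** np.abs(h))

# Long-run variance: sum of all gamma(h) = sigma_e^2 / (1 - rho)^2 = 4
omega = sigma_e**2 / (1 - rho) ** 2
print(finite_T_var(10), finite_T_var(1000), omega)
```

The finite-\(T\) variance approaches \(\Omega\) as the Bartlett-type factor \(1-|h|/T\) converges to 1 at every fixed lag.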

Asymptotic Distribution of OLS

From earlier results, \[ \sqrt{T}(\hat\beta - \beta) = \Big(\frac{1}{T}\sum_{t=1}^T x_t x_t'\Big)^{-1} \Big(\frac{1}{\sqrt{T}}\sum_{t=1}^T x_t u_t\Big). \]


Under the stated conditions, \[ \frac{1}{T}\sum_{t=1}^T x_t x_t' \xrightarrow{p} Q, \qquad \frac{1}{\sqrt{T}}\sum_{t=1}^T x_t u_t \xrightarrow{d} N(0,\Omega). \]


Therefore, \[ \sqrt{T}(\hat\beta - \beta) \;\xrightarrow{d}\; N\!\left(0,\; Q^{-1}\Omega Q^{-1}\right). \]

Special Case: Martingale Difference Sequence

Let \(Z_t = x_t u_t\).

If \(\{Z_t\}\) is a martingale difference sequence, i.e. \[ \mathbb E(Z_t \mid \mathscr F_{t-1}) = 0, \] then \(\Gamma(h)=0\) for all \(h\neq 0\), and the long-run variance reduces to \[ \Omega = \Gamma(0) = \mathbb E(Z_t Z_t') = \mathbb E(x_t x_t' u_t^2). \]


Further Restriction: Homoskedastic Errors

If in addition \(\mathbb E(u_t^2 \mid x_t)=\sigma^2\), then \[ \mathbb E(x_t x_t' u_t^2)=\sigma^2 Q, \] and \[ \sqrt{T}(\hat\beta-\beta) \xrightarrow{d} N\!\left(0,\;\sigma^2 Q^{-1}\right), \] the classical OLS result.

Estimating the Long-Run Variance (HAC)

In general, the long-run variance \[ \Omega = \sum_{h=-\infty}^{\infty}\Gamma(h), \qquad \Gamma(h)=\mathbb E(Z_t Z_{t-h}'), \quad Z_t=x_t u_t, \] is unknown and must be estimated.

A natural approach replaces population autocovariances by sample analogues: \[ \hat\Gamma(h) = \frac{1}{T}\sum_{t=|h|+1}^T Z_t Z_{t-|h|}' , \quad Z_t=x_t\hat u_t. \]

The HAC (Newey–West) estimator is \[ \hat\Omega = \hat\Gamma(0) + \sum_{h=1}^L w_h \big(\hat\Gamma(h)+\hat\Gamma(h)'\big), \] where:

  • \(L\) is a truncation lag,
  • \(w_h\) are kernel weights.
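A minimal Python implementation is a useful reference point. This sketch uses Bartlett weights \(w_h = 1 - h/(L+1)\), the standard Newey–West choice; the AR(1) test data are simulated and illustrative:

```python
import numpy as np

def newey_west(Z, L):
    """HAC estimate of the long-run variance of the mean-zero rows Z_t (T x k)."""
    T = Z.shape[0]
    omega = (Z.T @ Z) / T                    # Gamma_hat(0)
    for h in range(1, L + 1):
        w = 1 - h / (L + 1)                  # Bartlett kernel weight
        G = (Z[h:].T @ Z[:-h]) / T           # Gamma_hat(h)
        omega = omega + w * (G + G.T)
    return omega

# Check on simulated AR(1) data: Z_t = 0.5 Z_{t-1} + e_t, long-run variance = 4
rng = np.random.default_rng(2)
T = 5000
Z = np.zeros((T, 1))
e = rng.normal(size=T)
for t in range(1, T):
    Z[t, 0] = 0.5 * Z[t - 1, 0] + e[t]
print(newey_west(Z, L=20))
```

With \(L=0\) the estimator reduces to \(\hat\Gamma(0)\); larger \(L\) picks up more of the serial dependence at the cost of noisier terms.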

Why Truncation and Weights?

The long-run variance is an infinite sum of autocovariances: \[ \Omega = \sum_{h=-\infty}^{\infty}\Gamma(h). \]

In finite samples, sample autocovariances \(\hat\Gamma(h)\) become increasingly noisy as the lag \(h\) grows:

  • fewer observations are available at higher lags,
  • estimation error dominates any true dependence.

A first step is therefore to truncate the sum at a finite lag \(L\), ignoring very high-lag autocovariances.


Hard truncation treats all included lags equally, even though higher-lag autocovariances are estimated less precisely.

Kernel weights \(w_h\) improve finite-sample behavior by:

  • downweighting higher-lag autocovariances,
  • reducing the influence of the noisiest terms,
  • stabilizing the long-run variance estimator.

The HAC estimator smooths the truncated autocovariance sum, trading a small amount of bias for lower variance.

(With \(L\to\infty\) and \(L/T\to 0\), the HAC estimator is consistent: \(\hat\Omega \xrightarrow{p} \Omega\).)

Contemporaneous Regression

So far, we studied regressions of the form \[ y_t = x_t' \beta + u_t, \qquad t=1,\ldots,T. \]

This specification relates \(y_t\) to \(x_t\) at the same point in time.


Key points:

  • OLS estimation can be consistent under time dependence, given suitable LLN/ergodic conditions.

  • Inference depends on the long-run variance of \(\sum x_t u_t\). When \(x_t u_t\) is serially dependent, HAC-based standard errors are typically required.


But many economic relationships are not purely contemporaneous.

The effect of \(x_t\) on \(y_t\) may unfold gradually over time.

Dynamic Regression Models

Dynamic regression models extend contemporaneous regression by allowing lagged variables to enter the specification.

A general form is \[ y_t = \alpha + \sum_{i=1}^p \phi_i y_{t-i} + \sum_{j=0}^q x_{t-j}'\beta_j + u_t, \] where \(x_t\) is a vector of regressors.

Common specifications:

  • DL (distributed lag): \[ y_t = \alpha + \sum_{j=0}^q x_{t-j}'\beta_j + u_t \]

  • ARX (autoregressive with exogenous regressors): \[ y_t = \alpha + \sum_{i=1}^p \phi_i y_{t-i} + x_t'\beta_0 + u_t \]

  • ADL (autoregressive distributed lag): \[ y_t = \alpha + \sum_{i=1}^p \phi_i y_{t-i} + \sum_{j=0}^q x_{t-j}'\beta_j + u_t \]

All are estimated by OLS and use the same inference framework.

A Simple Dynamic Regression

Consider the dynamic regression \[ y_t = a_0 + a_1 y_{t-1} + c_0 z_t + \varepsilon_t, \qquad |a_1|<1. \]


This model allows:

  • persistence in \(y_t\) through \(y_{t-1}\),
  • a contemporaneous effect of \(z_t\).


Rewrite the model as \[ (1 - a_1 L) y_t = a_0 + c_0 z_t + \varepsilon_t. \]

Inverting the lag polynomial gives the infinite distributed-lag representation \[ y_t = \frac{a_0}{1-a_1} + c_0 \sum_{i=0}^\infty a_1^i z_{t-i} + \sum_{i=0}^\infty a_1^i \varepsilon_{t-i}. \]

Impact and Dynamic Responses

From the infinite distributed-lag representation, \[ y_t = \frac{a_0}{1-a_1} + c_0 \sum_{i=0}^\infty a_1^i z_{t-i} + \sum_{i=0}^\infty a_1^i \varepsilon_{t-i}, \] we can trace the response of \(y_t\) to a change in \(z_t\).

The impact effect (one-period response) is \[ \frac{\partial y_t}{\partial z_t} = c_0. \]

For \(j \ge 0\), the dynamic response is \[ \frac{\partial y_{t+j}}{\partial z_t} = c_0 a_1^{\,j}. \]

  • the response decays geometrically over time
  • persistence is governed by the autoregressive coefficient \(a_1\)

Impulse Response Function

The sequence \[ \left\{ \frac{\partial y_{t+j}}{\partial z_t} \right\}_{j=0}^\infty = \{ c_0,\, c_0 a_1,\, c_0 a_1^2,\, \ldots \} \] is the impulse response function (IRF) of \(y_t\) to \(z_t\).


Interpretation:

  • the impulse is a one-unit, one-time change in \(z_t\)
  • propagation occurs through the dynamic structure of \(y_t\)
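The IRF can be computed either from the closed form \(c_0 a_1^j\) or by propagating a unit impulse through the recursion. A Python sketch with illustrative parameter values:

```python
import numpy as np

a1, c0 = 0.6, 0.5                  # illustrative coefficients
horizon = 8

# Closed form: IRF_j = c0 * a1^j
irf = c0 * a1 ** np.arange(horizon)

# Equivalent recursion: the impulse propagates through y_t = a1*y_{t-1} + c0*z_t
irf_rec = np.zeros(horizon)
irf_rec[0] = c0                    # impact effect
for j in range(1, horizon):
    irf_rec[j] = a1 * irf_rec[j - 1]

print(np.allclose(irf, irf_rec))   # → True
```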

Long-Run Effect

The dynamic response is \[ \frac{\partial y_{t+j}}{\partial z_t} = c_0 a_1^{\,j}. \]

Now consider a permanent upward shift in the regressor path: \[ (z_t, z_{t+1}, z_{t+2}, \ldots) \;\mapsto\; (z_t+1, z_{t+1}+1, z_{t+2}+1, \ldots). \]

The effect on \(y_{t+j}\) is the sum of partial effects: \[ \sum_{k=0}^{j} \frac{\partial y_{t+j}}{\partial z_{t+k}} = c_0 \sum_{k=0}^{j} a_1^{\,k}. \]

Taking the limit as \(j \to \infty\), \[ \lim_{j\to\infty} \sum_{k=0}^{j} \frac{\partial y_{t+j}}{\partial z_{t+k}} = \frac{c_0}{1-a_1}, \qquad |a_1|<1. \]

This limit is called the long-run multiplier.


Note: The term \[ \sum_{k=0}^\infty a_1^{\,k} = \frac{1}{1-a_1} \] is the inverse AR polynomial \((1 - a_1 L)^{-1}\) evaluated at \(L=1\).
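Numerically, the cumulative responses \(c_0\sum_{k=0}^{j} a_1^{\,k}\) approach the long-run multiplier quickly. A short Python check (illustrative values):

```python
a1, c0 = 0.6, 0.5
long_run = c0 / (1 - a1)           # closed form: 1.25

partial = 0.0
for j in range(200):
    partial += c0 * a1 ** j        # cumulative effect after j + 1 periods
print(partial, long_run)           # partial sum converges to 1.25
```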

Long-Run Multiplier in General ADL Models

Consider the ADL(\(p,q\)) model \[ y_t = \alpha + \sum_{i=1}^p \phi_i y_{t-i} + \sum_{j=0}^q x_{t-j}'\beta_j + u_t. \]

Write the model in lag-polynomial form: \[ \Phi(L) y_t = \alpha + B(L) x_t + u_t, \] where \[ \Phi(L) = 1 - \sum_{i=1}^p \phi_i L^i, \qquad B(L) = \sum_{j=0}^q \beta_j L^j. \]

If \(\Phi(L)\) is stable, we can invert it and write an (infinite) distributed-lag form:

\[ y_t = \Phi(L)^{-1}\alpha + \Psi(L) x_t + \Phi(L)^{-1}u_t, \qquad \Psi(L) \equiv \Phi(L)^{-1} B(L). \]

The long-run multiplier is the sum of the distributed-lag coefficients, i.e. \[ \theta = \Psi(1) = \frac{B(1)}{\Phi(1)} = \frac{\sum_{j=0}^q \beta_j}{1 - \sum_{i=1}^p \phi_i}. \]
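The formula reduces to a ratio of coefficient sums. A Python sketch for an illustrative (and stable) ADL(2,1):

```python
import numpy as np

phi = np.array([0.5, 0.2])    # AR coefficients phi_1, phi_2 (illustrative, stable)
beta = np.array([0.3, 0.1])   # DL coefficients beta_0, beta_1 (illustrative)

# Long-run multiplier: theta = B(1) / Phi(1) = sum(beta_j) / (1 - sum(phi_i))
theta = beta.sum() / (1 - phi.sum())
print(theta)  # 0.4 / 0.3 ≈ 1.333
```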

Empirical Example: Inflation and Unemployment (ADL)

Let

  • \(\pi_t\): monthly inflation (CPI log-difference)
  • \(u_t\): unemployment rate

Estimate an ADL(1,1): \[ \pi_t=\alpha+\phi\,\pi_{t-1}+\beta_0 u_t+\beta_1 u_{t-1}+e_t. \]

# ----------------------------
# Packages
# ----------------------------

library(dplyr)     # lag(), mutate()
library(sandwich)  # NeweyWest()
library(lmtest)    # coeftest()

# ----------------------------
# Construct lags explicitly
# ----------------------------

df <- df %>%
  mutate(
    inflation_l1 = dplyr::lag(inflation, 1),
    u_l1         = dplyr::lag(UNRATE, 1)
  ) %>%
  na.omit()

# ----------------------------
# Estimate ADL(1,1)
# ----------------------------

adl_fit <- lm(
  inflation ~ inflation_l1 + UNRATE + u_l1,
  data = df
)

# HAC covariance matrix (Newey–West)
nw_vcov <- NeweyWest(adl_fit, lag = 12, prewhite = FALSE)

coeftest(adl_fit, vcov. = nw_vcov)

t test of coefficients:

                Estimate  Std. Error t value  Pr(>|t|)    
(Intercept)   0.00102700  0.00034823  2.9492  0.003281 ** 
inflation_l1  0.61786613  0.05652286 10.9313 < 2.2e-16 ***
UNRATE       -0.00055846  0.00012961 -4.3086 1.851e-05 ***
u_l1          0.00058188  0.00013802  4.2159 2.777e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Impulse Response of Inflation to a One-Unit Increase in Unemployment

What is the implied long-run effect of unemployment on inflation?


From the estimated coefficients, \[ \hat\theta = \frac{\hat\beta_0 + \hat\beta_1}{1 - \hat\phi} = \frac{-0.00055846 + 0.00058188}{1 - 0.61786613} \approx 6.13 \times 10^{-5}. \]

The long-run effect is essentially zero: the contemporaneous and lagged unemployment coefficients nearly cancel.
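The implied long-run multiplier \(\hat\theta = (\hat\beta_0+\hat\beta_1)/(1-\hat\phi)\) can be reproduced directly from the coefficient table (a Python sketch; coefficient values copied from the output above):

```python
# Coefficients from the HAC output above
phi_hat = 0.61786613    # inflation_l1
b0_hat  = -0.00055846   # UNRATE
b1_hat  = 0.00058188    # u_l1

# Long-run multiplier: theta = (beta_0 + beta_1) / (1 - phi)
theta_hat = (b0_hat + b1_hat) / (1 - phi_hat)
print(theta_hat)  # ≈ 6.13e-05, essentially no long-run effect
```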