Xiamen University, Chow Institute
March, 2026
So far in this course, we have focused on the dynamics of a single time series.
AR, MA, and ARIMA models ask questions such as: How does a variable depend on its own past? How do shocks to it propagate over time?
These models describe internal dynamics — how a variable evolves on its own.
Univariate time series models, such as ARIMA, describe how a variable evolves based on its own past.
They do not incorporate information from other observed variables.
For example, when income and consumption are observed over time, one may ask:
How are movements in income related to movements in consumption over time?
Regression provides a simple way to summarize such relationships.
We extend univariate time series models by allowing the outcome to depend on other observed variables.
In general, we consider linear regressions of the form
\[y_t = x_t' \beta + u_t , \quad t = 1, \ldots, T. \]
Here, \(y_t\) is a scalar outcome, \(x_t\) is a vector of regressors, and \(u_t\) is an unobserved error term.
Since all variables are time series, they may exhibit dependence over time.
OLS: \[ \hat\beta = \Big(\sum_{t=1}^T x_t x_t'\Big)^{-1} \Big(\sum_{t=1}^T x_t y_t\Big). \]
Plug in \(y_t = x_t'\beta + u_t\): \[ \begin{aligned} \hat\beta - \beta &= \Big(\sum_{t=1}^T x_t x_t'\Big)^{-1} \Big(\sum_{t=1}^T x_t u_t\Big). \end{aligned} \]
Everything reduces to understanding the behavior of the sample moments \(\sum x_t u_t\) and \(\sum x_t x_t'\) when observations are dependent over time.
A clean sufficient condition for finite-sample unbiasedness is a conditional mean restriction: \[ \mathbb{E}(u_t \mid x_1,\ldots,x_T) = 0 \quad \text{for all } t. \]
Then, conditional on the entire regressor path \(X=(x_1,\ldots,x_T)\), \[ \begin{aligned} \mathbb{E}(\hat\beta \mid X) &= \beta + \Big(\sum x_t x_t'\Big)^{-1} \sum x_t\, \mathbb{E}(u_t \mid X) = \beta. \end{aligned} \]
Is this a reasonable assumption in time series applications?
This restriction conditions on the entire regressor path, including future values.
In time series applications, this can be problematic: future values of \(x_t\) may depend on \(u_t\). For example, when \(x_t\) includes lagged outcomes, a shock today feeds into tomorrow's regressor.
As a result, finite-sample unbiasedness is often too strong to serve as a practical benchmark.
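A quick simulation illustrates the point. The sketch below uses a pure AR(1), where the regressor is the lagged outcome, so future regressors mechanically depend on current errors and the conditional mean restriction fails:

```r
# Finite-sample bias of OLS in an AR(1): y_t = beta * y_{t-1} + u_t.
# Here x_t = y_{t-1}, so x_{t+1} depends on u_t and
# E(u_t | x_1, ..., x_T) = 0 cannot hold.
set.seed(1)
beta <- 0.9; T <- 50; R <- 5000
est <- replicate(R, {
  u <- rnorm(T)
  y <- stats::filter(u, beta, method = "recursive")  # builds the AR(1) path
  x <- y[-T]; yy <- y[-1]
  sum(x * yy) / sum(x^2)                             # OLS slope, no intercept
})
mean(est) - beta   # negative in this simulation: OLS is biased toward zero
```

The average estimate falls short of \(\beta = 0.9\), the familiar downward (Hurwicz) bias of OLS in autoregressions, even though OLS remains consistent as \(T\) grows.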
Recall the decomposition
\[ \hat\beta - \beta = \Big(\frac{1}{T}\sum_{t=1}^T x_t x_t'\Big)^{-1} \Big(\frac{1}{T}\sum_{t=1}^T x_t u_t\Big). \]
Consistency requires that each term converges:
\[ \frac{1}{T}\sum_{t=1}^T x_t x_t' \;\xrightarrow{p}\; Q, \quad Q \text{ positive definite}. \]
\[ \frac{1}{T}\sum_{t=1}^T x_t u_t \;\xrightarrow{p}\; 0. \]
Under these conditions, \[ \hat\beta \xrightarrow{p} \beta. \]
Consistency hinges on the behavior of two sample averages:
\[ \frac{1}{T}\sum_{t=1}^T x_t x_t' \quad \text{and} \quad \frac{1}{T}\sum_{t=1}^T x_t u_t . \]
What conditions ensure these quantities converge when the data are time series?
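As an illustration (a sketch with a stationary AR(1) regressor), ergodic averages of dependent data still settle down as \(T\) grows:

```r
# Sample second moment of an AR(1) regressor x_t = 0.5 x_{t-1} + e_t.
# Its population counterpart is Var(x_t) = 1 / (1 - 0.5^2) = 4/3.
set.seed(42)
T <- 100000
x <- stats::filter(rnorm(T), 0.5, method = "recursive")
mean(x^2)   # close to 4/3 despite the serial dependence
```

Serial dependence slows convergence relative to the i.i.d. case, but under stationarity and ergodicity the law of large numbers still applies.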
Let \(z_t = (y_t, x_t')'\) denote the vector of observed variables.
From the decomposition,
\[ \sqrt{T}(\hat\beta - \beta) = \Big(\frac{1}{T}\sum x_t x_t'\Big)^{-1} \Big(\frac{1}{\sqrt{T}}\sum x_t u_t\Big). \]
For inference, we need the limiting distribution of \(\sqrt{T}(\hat\beta - \beta)\), and hence of the normalized sum on the right-hand side.
Consider the normalized sum \[ \frac{1}{\sqrt{T}}\sum_{t=1}^T x_t u_t . \]
A central limit theorem applies if \(\{x_t u_t\}\) is stationary with mean zero, finite variance, and sufficiently weak serial dependence (e.g., a martingale difference sequence, or suitable mixing conditions).
Under these conditions, \[ \frac{1}{\sqrt{T}}\sum_{t=1}^T x_t u_t \;\xrightarrow{d}\; N(0,\Omega), \] for some finite asymptotic variance \(\Omega\).
Let \[ Z_t = x_t u_t . \]
The asymptotic variance, also known as the long-run variance, is defined as the variance of the normalized partial sum: \[ \Omega = \lim_{T\to\infty} \operatorname{Var}\!\left( \frac{1}{\sqrt{T}}\sum_{t=1}^T Z_t \right). \]
Since \(\mathbb E(Z_t)=0\), \[ \operatorname{Var}\!\left(\sum_{t=1}^T Z_t\right) = \sum_{t=1}^T\sum_{s=1}^T \mathbb E(Z_t Z_s'). \]
Under weak stationarity, define the lag-\(h\) autocovariance matrix \[ \Gamma(h)=\mathbb E(Z_t Z_{t-h}'), \quad h\in\mathbb Z. \] Then \(\mathbb E(Z_t Z_s')=\Gamma(t-s)\), so \[ \operatorname{Var}\!\left(\sum_{t=1}^T Z_t\right) = \sum_{t=1}^T\sum_{s=1}^T \Gamma(t-s). \]
Let \(h=t-s\). For a fixed \(h\in\{-(T-1),\ldots,T-1\}\), the number of pairs \((t,s)\) with \(t-s=h\) is \(T-|h|\).
Grouping terms by \(h=t-s\) gives \[ \operatorname{Var}\!\left(\sum_{t=1}^T Z_t\right) = \sum_{h=-(T-1)}^{T-1}(T-|h|)\,\Gamma(h). \]
Therefore, \[ \operatorname{Var}\!\left(\frac{1}{\sqrt{T}}\sum_{t=1}^T Z_t\right) = \sum_{h=-(T-1)}^{T-1}\Big(1-\frac{|h|}{T}\Big)\Gamma(h). \]
If the autocovariances are summable, \[ \sum_{h=-\infty}^{\infty}\|\Gamma(h)\|<\infty, \] then letting \(T\to\infty\) yields \[ \Omega = \sum_{h=-\infty}^{\infty}\Gamma(h). \]
Equivalently, \[ \Omega = \sum_{h=-\infty}^{\infty} \mathbb{E}\!\left(x_t u_t\,x_{t-h}'u_{t-h}\right). \]
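As a concrete scalar example (assuming \(Z_t\) follows a stationary AR(1)), the long-run variance can far exceed \(\Gamma(0)\): \[ Z_t = \rho Z_{t-1} + e_t, \quad e_t \sim WN(0,\sigma^2) \;\Rightarrow\; \Gamma(h) = \frac{\sigma^2 \rho^{|h|}}{1-\rho^2}, \qquad \Omega = \sum_{h=-\infty}^{\infty}\Gamma(h) = \Gamma(0)\,\frac{1+\rho}{1-\rho} = \frac{\sigma^2}{(1-\rho)^2}. \] With \(\rho = 0.5\), \(\Omega = 3\,\Gamma(0)\): ignoring serial correlation would understate the variance by a factor of three.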
From earlier results, \[ \sqrt{T}(\hat\beta - \beta) = \Big(\frac{1}{T}\sum_{t=1}^T x_t x_t'\Big)^{-1} \Big(\frac{1}{\sqrt{T}}\sum_{t=1}^T x_t u_t\Big). \]
Under the stated conditions, \[ \frac{1}{T}\sum_{t=1}^T x_t x_t' \xrightarrow{p} Q, \qquad \frac{1}{\sqrt{T}}\sum_{t=1}^T x_t u_t \xrightarrow{d} N(0,\Omega). \]
Therefore, \[ \sqrt{T}(\hat\beta - \beta) \;\xrightarrow{d}\; N\!\left(0,\; Q^{-1}\Omega Q^{-1}\right). \]
Let \(Z_t = x_t u_t\).
If \(\{Z_t\}\) is a martingale difference sequence, i.e. \[ \mathbb E(Z_t \mid \mathscr F_{t-1}) = 0, \] then \(\Gamma(h)=0\) for all \(h\neq 0\), and the long-run variance reduces to \[ \Omega = \Gamma(0) = \mathbb E(Z_t Z_t') = \mathbb E(x_t x_t' u_t^2). \]
If in addition \(\mathbb E(u_t^2 \mid x_t)=\sigma^2\), then \[ \mathbb E(x_t x_t' u_t^2)=\sigma^2 Q, \] and \[ \sqrt{T}(\hat\beta-\beta) \Rightarrow N\!\left(0,\;\sigma^2 Q^{-1}\right), \] the classical OLS result.
In general, the long-run variance \[ \Omega = \sum_{h=-\infty}^{\infty}\Gamma(h), \qquad \Gamma(h)=\mathbb E(Z_t Z_{t-h}'), \quad Z_t=x_t u_t, \] is unknown and must be estimated.
A natural approach replaces population autocovariances by sample analogues: \[ \hat\Gamma(h) = \frac{1}{T}\sum_{t=|h|+1}^T Z_t Z_{t-|h|}' , \quad Z_t=x_t\hat u_t. \]
The HAC (Newey–West) estimator is \[ \hat\Omega = \hat\Gamma(0) + \sum_{h=1}^L w_h \big(\hat\Gamma(h)+\hat\Gamma(h)'\big), \] where \(L\) is the truncation lag (bandwidth) and \(w_h\) are kernel weights; the Bartlett choice \(w_h = 1 - \frac{h}{L+1}\) gives the Newey–West estimator.
The long-run variance is an infinite sum of autocovariances: \[ \Omega = \sum_{h=-\infty}^{\infty}\Gamma(h). \]
In finite samples, sample autocovariances \(\hat\Gamma(h)\) become increasingly noisy as the lag \(h\) grows: only \(T-|h|\) cross products are available to estimate \(\Gamma(h)\).
A first step is therefore to truncate the sum at a finite lag \(L\), ignoring very high-lag autocovariances.
Hard truncation treats all included lags equally, even though higher-lag autocovariances are estimated less precisely.
Kernel weights \(w_h\) improve finite-sample behavior by downweighting the imprecisely estimated high-lag autocovariances and, for suitable kernels such as the Bartlett kernel, by guaranteeing that \(\hat\Omega\) is positive semi-definite.
The HAC estimator smooths the truncated autocovariance sum, trading a small amount of bias for lower variance.
(With \(L\to\infty\) and \(L/T\to 0\), the HAC estimator is consistent: \(\hat\Omega \xrightarrow{p} \Omega\).)
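For intuition, the estimator is simple enough to roll by hand. The sketch below implements the scalar case with Bartlett weights \(w_h = 1 - h/(L+1)\); `hac_lrv` is an illustrative helper, not a library function:

```r
# Bartlett-kernel HAC estimate of the long-run variance of a scalar series Z.
hac_lrv <- function(Z, L) {
  T <- length(Z)
  gamma_hat <- function(h) sum(Z[(h + 1):T] * Z[1:(T - h)]) / T
  omega <- gamma_hat(0)
  if (L >= 1) {
    for (h in 1:L) {
      w <- 1 - h / (L + 1)               # Bartlett weight, shrinking in h
      omega <- omega + 2 * w * gamma_hat(h)
    }
  }
  omega
}

Z <- c(1, -2, 3, -1, 2, -3, 1)
hac_lrv(Z, 0)   # equals the sample second moment mean(Z^2)
hac_lrv(Z, 2)   # adds downweighted lag-1 and lag-2 terms
```

With \(L = 0\) the estimator collapses to \(\hat\Gamma(0)\), the variance estimate that would be valid under serial independence; larger \(L\) folds in more of the autocovariance structure.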
So far, we studied regressions of the form \[ y_t = x_t' \beta + u_t, \qquad t=1,\ldots,T. \]
This specification relates \(y_t\) to \(x_t\) at the same point in time.
Key points:
OLS estimation can be consistent under time dependence, given suitable LLN/ergodic conditions.
Inference depends on the long-run variance of \(\sum x_t u_t\). When \(x_t u_t\) is serially dependent, HAC-based standard errors are typically required.
But many economic relationships are not purely contemporaneous.
The effect of \(x_t\) on \(y_t\) may unfold gradually over time.
Dynamic regression models extend contemporaneous regression by allowing lagged variables to enter the specification.
A general form is \[ y_t = \alpha + \sum_{i=1}^p \phi_i y_{t-i} + \sum_{j=0}^q x_{t-j}'\beta_j + u_t, \] where \(x_t\) is a vector of regressors.
Common specifications:
DL (distributed lag): \[ y_t = \alpha + \sum_{j=0}^q x_{t-j}'\beta_j + u_t \]
ARX (autoregressive with exogenous regressors): \[ y_t = \alpha + \sum_{i=1}^p \phi_i y_{t-i} + x_t'\beta_0 + u_t \]
ADL (autoregressive distributed lag): \[ y_t = \alpha + \sum_{i=1}^p \phi_i y_{t-i} + \sum_{j=0}^q x_{t-j}'\beta_j + u_t \]
All are estimated by OLS and use the same inference framework.
Consider the dynamic regression \[ y_t = a_0 + a_1 y_{t-1} + c_0 z_t + \varepsilon_t, \qquad |a_1|<1. \]
This model allows both persistence in \(y_t\) (through \(a_1\)) and a contemporaneous effect of \(z_t\) (through \(c_0\)).
Rewrite the model as \[ (1 - a_1 L) y_t = a_0 + c_0 z_t + \varepsilon_t. \]
Inverting the lag polynomial gives the infinite distributed-lag representation \[ y_t = \frac{a_0}{1-a_1} + c_0 \sum_{i=0}^\infty a_1^i z_{t-i} + \sum_{i=0}^\infty a_1^i \varepsilon_{t-i}. \]
From the infinite distributed-lag representation, \[ y_t = \frac{a_0}{1-a_1} + c_0 \sum_{i=0}^\infty a_1^i z_{t-i} + \sum_{i=0}^\infty a_1^i \varepsilon_{t-i}, \] we can trace the response of \(y_t\) to a change in \(z_t\).
The impact effect (contemporaneous response) is \[ \frac{\partial y_t}{\partial z_t} = c_0. \]
For \(j \ge 0\), the dynamic response is \[ \frac{\partial y_{t+j}}{\partial z_t} = c_0 a_1^{\,j}. \]
The sequence \[ \left\{ \frac{\partial y_{t+j}}{\partial z_t} \right\}_{j=0}^\infty = \{ c_0,\, c_0 a_1,\, c_0 a_1^2,\, \ldots \} \] is the impulse response function (IRF) of \(y_t\) to \(z_t\).
Interpretation: a one-time unit shock to \(z_t\) raises \(y_{t+j}\) by \(c_0 a_1^{\,j}\), so the effect decays geometrically at rate \(a_1\).
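These geometric responses are trivial to compute. A sketch with illustrative values \(c_0 = 0.5\) and \(a_1 = 0.8\) (neither comes from data):

```r
# Impulse response of y to a one-time unit shock in z:
# dy_{t+j}/dz_t = c0 * a1^j, decaying geometrically in j.
c0 <- 0.5
a1 <- 0.8
j  <- 0:20
irf <- c0 * a1^j
irf[1:4]    # c0, c0*a1, c0*a1^2, c0*a1^3
sum(irf)    # partial sums approach c0 / (1 - a1) = 2.5 from below
```

The cumulated responses converge to the long-run multiplier \(c_0/(1-a_1)\), previewing the permanent-shift calculation below.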
Now consider a permanent upward shift in the regressor path: \[ (z_t, z_{t+1}, z_{t+2}, \ldots) \;\mapsto\; (z_t+1, z_{t+1}+1, z_{t+2}+1, \ldots). \]
The effect on \(y_{t+j}\) is the sum of partial effects: \[ \sum_{k=0}^{j} \frac{\partial y_{t+j}}{\partial z_{t+k}} = c_0 \sum_{k=0}^{j} a_1^{\,k}. \]
Taking the limit as \(j \to \infty\), \[ \lim_{j\to\infty} \sum_{k=0}^{j} \frac{\partial y_{t+j}}{\partial z_{t+k}} = \frac{c_0}{1-a_1}, \qquad |a_1|<1. \]
This limit is called the long-run multiplier.
Note: The term \[ \sum_{k=0}^\infty a_1^{\,k} = \frac{1}{1-a_1} \] is the inverse AR polynomial \((1 - a_1 L)^{-1}\) evaluated at \(L=1\).
Consider the ADL(\(p,q\)) model \[ y_t = \alpha + \sum_{i=1}^p \phi_i y_{t-i} + \sum_{j=0}^q x_{t-j}'\beta_j + u_t. \]
Write the model in lag-polynomial form: \[ \Phi(L) y_t = \alpha + B(L) x_t + u_t, \] where \[ \Phi(L) = 1 - \sum_{i=1}^p \phi_i L^i, \qquad B(L) = \sum_{j=0}^q \beta_j L^j. \]
If \(\Phi(L)\) is stable, we can invert it and write an (infinite) distributed-lag form:
\[ y_t = \Phi(L)^{-1}\alpha + \Psi(L) x_t + \Phi(L)^{-1}u_t, \qquad \Psi(L) \equiv \Phi(L)^{-1} B(L). \]
The long-run multiplier is the sum of the distributed-lag coefficients, i.e. \[ \theta = \Psi(1) = \frac{B(1)}{\Phi(1)} = \frac{\sum_{j=0}^q \beta_j}{1 - \sum_{i=1}^p \phi_i}. \]
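In code, the multiplier is just a ratio of coefficient sums. A sketch with hypothetical ADL(2,1) coefficients (the helper name and the numbers are illustrative):

```r
# Long-run multiplier theta = B(1) / Phi(1) for an ADL(p, q),
# given AR coefficients phi and distributed-lag coefficients beta.
long_run_multiplier <- function(phi, beta) {
  sum(beta) / (1 - sum(phi))
}

# Hypothetical ADL(2,1): phi = (0.5, 0.2), beta = (0.3, 0.1)
long_run_multiplier(c(0.5, 0.2), c(0.3, 0.1))   # 0.4 / 0.3
```

The denominator \(1 - \sum_i \phi_i = \Phi(1)\) must be nonzero (and positive under stability), otherwise the long-run multiplier is undefined.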
Let \(\pi_t\) denote inflation and \(u_t\) the unemployment rate (UNRATE).
Estimate an ADL(1,1): \[ \pi_t=\alpha+\phi\,\pi_{t-1}+\beta_0 u_t+\beta_1 u_{t-1}+e_t. \]
library(dplyr)      # lag(), mutate()
library(sandwich)   # NeweyWest()
library(lmtest)     # coeftest()

# ----------------------------
# Construct lags explicitly
# ----------------------------
df <- df %>%
  mutate(
    inflation_l1 = dplyr::lag(inflation, 1),
    u_l1         = dplyr::lag(UNRATE, 1)
  ) %>%
  na.omit()

# ----------------------------
# Estimate ADL(1,1)
# ----------------------------
adl_fit <- lm(
  inflation ~ inflation_l1 + UNRATE + u_l1,
  data = df
)

# HAC covariance matrix (Newey–West, 12 lags)
nw_vcov <- NeweyWest(adl_fit, lag = 12, prewhite = FALSE)
coeftest(adl_fit, vcov. = nw_vcov)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.00102700 0.00034823 2.9492 0.003281 **
inflation_l1 0.61786613 0.05652286 10.9313 < 2.2e-16 ***
UNRATE -0.00055846 0.00012961 -4.3086 1.851e-05 ***
u_l1 0.00058188 0.00013802 4.2159 2.777e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
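Using the point estimates in the output above, the implied long-run effect of unemployment on inflation is \(\hat\theta = (\hat\beta_0 + \hat\beta_1)/(1 - \hat\phi)\):

```r
# Long-run multiplier from the ADL(1,1) estimates reported above.
phi   <- 0.61786613    # coefficient on inflation_l1
beta0 <- -0.00055846   # coefficient on UNRATE
beta1 <- 0.00058188    # coefficient on u_l1
theta <- (beta0 + beta1) / (1 - phi)
theta   # small and positive: the two short-run effects largely offset
```

The negative contemporaneous effect of unemployment is almost exactly cancelled by the positive lagged effect, so the estimated long-run multiplier is close to zero.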