AR, MA, ARMA Representations
Xiamen University, Chow Institute
March, 2026
In time series data, order matters.
Time series data are viewed as realizations of a stochastic process, with dependence across time.
A time series model specifies the dynamic evolution of this stochastic process over time.
In general, the evolution of a time series can be written as
\[ y_t = f\!\left( y_{t-1}, y_{t-2}, \ldots; \varepsilon_t, \varepsilon_{t-1}, \ldots \right), \]
where \(f(\cdot)\) is an unknown function.
\(f(\cdot)\) is referred to as the data-generating process (DGP).
The lag (backshift) operator \(L\) is defined by \[ L^i y_t \equiv y_{t-i}. \]
Thus, applying \(L^i\) to \(y_t\) shifts the series back by \(i\) periods.
The lag operator is a linear operator:
Lag of a constant: \[ L c = c. \]
Distributive (linearity): \[ (L^i + L^j) y_t = L^i y_t + L^j y_t = y_{t-i} + y_{t-j}. \]
Associative law of multiplication: \[ L^i L^j y_t = L^{i+j} y_t = y_{t-(i+j)}, \qquad L^0 y_t = y_t. \]
Because these properties hold, lag operators can be manipulated algebraically.
A lag polynomial is written as \[ \phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p. \]
Applying it to \(y_t\): \[ \phi(L) y_t = y_t - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p}. \]
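As a small numerical illustration (a Python sketch; the series and the coefficients \(\phi_1=0.5\), \(\phi_2=0.2\) are made-up values), applying a lag polynomial with \(p=2\) amounts to shifting and differencing arrays:

```python
import numpy as np

# Illustrative values only: a short series and AR coefficients phi_1, phi_2
y = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
phi = np.array([0.5, 0.2])

# phi(L) y_t = y_t - phi_1 y_{t-1} - phi_2 y_{t-2}, defined for t >= 2
filtered = y[2:] - phi[0] * y[1:-1] - phi[1] * y[:-2]
```

Each entry of `filtered` is one application of \(\phi(L)\) at a particular \(t\).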
An autoregressive process of order \(p\), denoted AR(\(p\)), is defined by
\[ x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + w_t, \]
where \(\{w_t\} \sim wn(0,\sigma_w^2)\) is white noise and \(\phi_1, \ldots, \phi_p\) (with \(\phi_p \neq 0\)) are constants.
If \(\{x_t\}\) has nonzero mean \(\mu\), the AR(\(p\)) model can be written as
\[ x_t - \mu = \phi_1 (x_{t-1} - \mu) + \cdots + \phi_p (x_{t-p} - \mu) + w_t. \]
Equivalently,
\[ x_t = \alpha + \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t, \]
where \(\alpha = \mu(1 - \phi_1 - \cdots - \phi_p)\).
The AR(\(p\)) model can be written as
\[ (1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p)\, x_t = w_t. \]
Define the lag polynomial
\[ \phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p, \]
so that the model can be written compactly as
\[ \phi(L) x_t = w_t. \]
The operator \(\phi(L)\) is called the autoregressive operator.
Consider the AR(1) model \[ x_t = \phi x_{t-1} + w_t, \qquad w_t \sim wn(0,\sigma_w^2). \]
Iterate backward \(k\) times:
\[ \begin{align*} x_t &= \phi x_{t-1} + w_t \\ &= \phi^2 x_{t-2} + \phi w_{t-1} + w_t \\ &\;\;\vdots \\ &= \phi^k x_{t-k} + \sum_{j=0}^{k-1} \phi^j w_{t-j}. \end{align*} \]
To eliminate the dependence on the initial condition \(x_{t-k}\), we need \(\phi^k x_{t-k}\) to vanish as \(k \to \infty\).
If \(|\phi|<1\) and the initial condition is square-integrable, then
\[ \phi^k x_{t-k} \to 0 \quad \text{in mean square as } k\to\infty. \]
This gives an expression of \(x_t\) as a weighted sum of current and past shocks. \[ x_t = \sum_{j=0}^{\infty}\phi^j w_{t-j}. \]
The representation
\[ x_t = \sum_{j=0}^{\infty}\phi^j w_{t-j} \]
can be written compactly using lag operators as
\[ x_t = (1-\phi L)^{-1} w_t. \]
Here, \((1-\phi L)^{-1}\) denotes the one-sided inverse of the autoregressive operator, defined through the power-series expansion
\[ (1-\phi L)^{-1} = 1 + \phi L + \phi^2 L^2 + \cdots, \qquad |\phi|<1. \]
Thus, backward recursion provides a dynamic justification for the inverse lag-polynomial representation.
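The truncation argument can be checked numerically. A minimal sketch (the value \(\phi=0.6\), the sample size, and the standard normal noise are all illustrative choices): simulate the AR(1) recursion, then reconstruct the last observation from a truncated MA(\(\infty\)) sum.

```python
import numpy as np

rng = np.random.default_rng(0)
phi, n = 0.6, 500            # illustrative |phi| < 1
w = rng.standard_normal(n)

# Forward recursion x_t = phi x_{t-1} + w_t, starting from x_0 = 0
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + w[t]

# Truncated MA(infinity) sum: x_t ~ sum_{j=0}^{K-1} phi^j w_{t-j}
K = 50
approx = sum(phi**j * w[n - 1 - j] for j in range(K))
# Since phi**K is tiny, approx agrees with x[-1] to high precision
```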
Given the MA(\(\infty\)) representation \[ x_t = \sum_{j=0}^{\infty} \phi^j w_{t-j}, \] the moments follow directly.
\[ \begin{align*} \mathbb{E}(x_t) &= \mathbb{E}\!\left(\sum_{j=0}^{\infty} \phi^j w_{t-j}\right) = \sum_{j=0}^{\infty} \phi^j \mathbb{E}(w_{t-j}) = 0. \end{align*} \]
\[ \begin{align*} \gamma(h) &= \mathrm{Cov}(x_{t+h},x_t) = \mathbb{E}\!\left[ \left(\sum_{j=0}^{\infty} \phi^j w_{t+h-j}\right) \left(\sum_{k=0}^{\infty} \phi^k w_{t-k}\right) \right]\\ &\;\;\vdots \\ &= \frac{\sigma_w^2\,\phi^h}{1-\phi^2}, \qquad h \ge 0, \end{align*} \] and by symmetry, \[ \gamma(-h)=\gamma(h). \]
\[ \rho(h) = \frac{\gamma(h)}{\gamma(0)}=\phi^h, \qquad h\ge 0, \qquad \rho(-h)=\rho(h). \]
When \(|\phi|<1\), the AR(1) process is weakly stationary!
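The geometric decay \(\rho(h)=\phi^h\) can be verified by simulation (a sketch; \(\phi=0.7\), the seed, and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
phi, n = 0.7, 200_000        # illustrative parameter and sample size
w = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + w[t]
x = x[1000:]                 # discard burn-in

def sample_acf(z, h):
    """Sample autocorrelation at lag h."""
    z = z - z.mean()
    return float((z[h:] * z[:-h]).sum() / (z * z).sum())

# Theory: rho(1) = 0.7, rho(3) = 0.7**3 = 0.343
acf1, acf3 = sample_acf(x, 1), sample_acf(x, 3)
```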
Now suppose \(|\phi|>1\) in \[ x_t = \phi x_{t-1} + w_t. \]
Note: the AR equation describes an algebraic relation. Here, ``explosive'' refers to failure of the forward-in-time recursive construction; it does not mean that the equation itself admits no stationary solution.
\[ x_{t+1} = \phi x_t + w_{t+1}. \] Solving for \(x_t\) gives \[ x_t = \phi^{-1} x_{t+1} - \phi^{-1} w_{t+1}. \]
Iterating forward \(k\) steps yields \[ x_t = \phi^{-k} x_{t+k} - \sum_{j=1}^{k} \phi^{-j} w_{t+j}. \]
If \(|\phi|>1\), then \(|\phi^{-1}|<1\), so \[ \phi^{-k} x_{t+k} \to 0 \quad \text{in mean square as } k\to\infty, \] and we obtain the representation \[ x_t = - \sum_{j=1}^{\infty} \phi^{-j} w_{t+j}. \]
This representation depends on future shocks \(\{w_{t+1}, w_{t+2},\ldots\}\).
Hence, the explosive AR(1) process is noncausal.
A time series \(\{x_t\}\) is said to be causal if it can be written as \[ x_t = \sum_{j=0}^{\infty} \psi_j w_{t-j}, \] where \(\{w_t\}\) is white noise and the coefficients satisfy \[ \sum_{j=0}^{\infty} |\psi_j| < \infty. \]
The absolute summability condition ensures that the representation is well defined.
Note: this notion of causality has nothing to do with treatment effects or structural causality.
Consider again the explosive AR(1) \[ x_t = \phi x_{t-1} + w_t, \qquad |\phi|>1, \; w_t \sim wn(0, \sigma^2_w). \]
Define the causal AR(1) process \[ y_t = \phi^{-1} y_{t-1} + v_t, \] where \[ v_t \sim wn(0,\sigma_w^2 \phi^{-2}). \]
The processes \(\{x_t\}\) and \(\{y_t\}\) have the same autocovariance function (and, when the noise is Gaussian, the same finite-dimensional distributions).
Example: If \[ x_t = 2x_{t-1} + w_t, \qquad \sigma_w^2 = 1, \] then the equivalent causal process is \[ y_t = \tfrac12 y_{t-1} + v_t, \qquad \sigma_v^2 = \tfrac14. \]
These two processes are observationally equivalent.
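The equivalence can be checked by computing autocovariances from both representations (a sketch using the example's values \(\phi=2\), \(\sigma_w^2=1\); the truncation level \(K\) is an arbitrary choice):

```python
phi, sigw2 = 2.0, 1.0        # explosive AR(1) from the example
sigv2 = sigw2 / phi**2       # variance of the equivalent causal process

# Noncausal representation: x_t = -sum_{j>=1} phi^{-j} w_{t+j}, so
# gamma_x(h) = sigw2 * sum_{j>=1} phi^{-j} phi^{-(j+h)} (truncated at K terms)
def gamma_x(h, K=200):
    return sigw2 * sum(phi**(-j) * phi**(-(j + h)) for j in range(1, K))

# Causal AR(1) y_t = (1/phi) y_{t-1} + v_t:
# gamma_y(h) = sigv2 * (1/phi)**h / (1 - 1/phi**2)
def gamma_y(h):
    return sigv2 * (1 / phi)**h / (1 - 1 / phi**2)
```

Both functions return identical values at every lag, which is exactly the observational equivalence claimed above.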
A moving average model of order \(q\), denoted MA(\(q\)), is defined by
\[ x_t = w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q}, \] where \(\{w_t\} \sim wn(0, \sigma^2_w)\) and \(\theta_1,\theta_2, \ldots, \theta_q\) are parameters.
We can equivalently write the MA(\(q\)) process as
\[ x_t = \theta(L) w_t, \] where \(\theta(L) = 1 + \theta_1 L + \cdots + \theta_q L^q\) is the moving average operator.
Unlike AR processes, MA(\(q\)) processes are stationary for all parameter values,
since \(x_t\) is a finite linear combination of white noise.
Consider the MA(1) model \[ x_t = w_t + \theta w_{t-1}, \qquad w_t \sim wn(0,\sigma_w^2). \]
The autocovariance function is \[ \gamma(h)=\mathrm{Cov}(x_{t+h},x_t) = \begin{cases} (1+\theta^2)\sigma_w^2, & h=0,\\[4pt] \theta\sigma_w^2, & |h|=1,\\[4pt] 0, & |h|>1. \end{cases} \]
Therefore the ACF is \[ \rho(h)=\frac{\gamma(h)}{\gamma(0)} = \begin{cases} \dfrac{\theta}{1+\theta^2}, & h=\pm 1,\\[8pt] 0, & |h|>1. \end{cases} \]
Note that \(|\rho(1)|\le \frac12\) for all \(\theta\) (maximum at \(|\theta|=1\)).
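The bound \(|\rho(1)|\le 1/2\) is easy to confirm numerically (a sketch; the grid of \(\theta\) values is an arbitrary choice):

```python
import numpy as np

# rho(1) = theta / (1 + theta**2) for an MA(1); scan a grid of theta values
thetas = np.linspace(-5, 5, 1001)
rho1 = thetas / (1 + thetas**2)
max_abs = float(np.abs(rho1).max())   # attained near |theta| = 1
```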
\[ x_t = w_t + \theta w_{t-1} \]
The ACF \[ \rho(h)=\frac{\gamma(h)}{\gamma(0)} = \begin{cases} \dfrac{\theta}{1+\theta^2}, & h=\pm 1,\\[8pt] 0, & |h|>1. \end{cases} \] is the same for \(\theta\) and \(1/\theta\), since \(\dfrac{1/\theta}{1+1/\theta^2}=\dfrac{\theta}{1+\theta^2}\): the MA(1) parameter cannot be identified from the ACF alone.
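In fact the pairs \((\theta,\sigma_w^2)\) and \((1/\theta,\theta^2\sigma_w^2)\) generate identical autocovariances, not just identical autocorrelations. A quick check (sketch; \(\theta=0.4\) is an arbitrary value):

```python
theta, sig2 = 0.4, 1.0
theta_alt, sig2_alt = 1 / theta, theta**2 * sig2   # the "mirror" MA(1)

def ma1_gamma(th, s2):
    """(gamma(0), gamma(1)) of an MA(1) with parameter th and noise variance s2."""
    return ((1 + th**2) * s2, th * s2)

g, g_alt = ma1_gamma(theta, sig2), ma1_gamma(theta_alt, sig2_alt)
# g == g_alt: the two parameterizations are indistinguishable from second moments
```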
A time series \(\{x_t\}\) is said to be invertible if the shocks \(\{w_t\}\) can be written as \[ w_t = \sum_{j=0}^{\infty} \pi_j x_{t-j}, \] where the coefficients satisfy \[ \sum_{j=0}^{\infty} |\pi_j| < \infty. \]
Invertibility is an additional constraint imposed to select a unique moving-average representation, by requiring that the unobserved shocks can be recovered from current and past observations.
It is analogous to causality in AR models, which is imposed to rule out non-causal representations of the same stochastic process.
A time series \(\{x_t\}\) is said to follow an ARMA(\(p,q\)) process if it is (weakly) stationary and satisfies \[ x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q}, \] where \(\{w_t\}\) is white noise with \(w_t \sim wn(0,\sigma_w^2)\).
Equivalently, \[ \phi(L)x_t = \theta(L) w_t. \]
Multiplying both sides by an arbitrary lag polynomial \(\eta(L)\) gives \[ \eta(L)\phi(L)x_t = \eta(L)\theta(L) w_t, \] which describes the same stochastic process.
Example: Consider the white noise process \[ x_t = w_t. \]
Multiply both sides by \(\eta(L)=1-0.5L\): \[ (1-0.5L)x_t = (1-0.5L)w_t. \]
Rewriting, \[ x_t = 0.5x_{t-1} + w_t - 0.5w_{t-1}. \]
This looks like an ARMA(1,1) model! But the common factor \(1-0.5L\) cancels, so \(\{x_t\}\) is still just white noise: ARMA models should always be reduced to the form in which \(\phi(L)\) and \(\theta(L)\) have no common factors.
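A simulation confirms the cancellation (sketch; the seed and series length are arbitrary): feeding the same noise through the "ARMA(1,1)" recursion simply reproduces the noise.

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.standard_normal(100)

# Over-parameterized recursion x_t = 0.5 x_{t-1} + w_t - 0.5 w_{t-1}
x = np.empty_like(w)
x[0] = w[0]
for t in range(1, len(w)):
    x[t] = 0.5 * x[t - 1] + w[t] - 0.5 * w[t - 1]
# The common factor cancels: x is (numerically) identical to w
```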
Define the autoregressive (AR) polynomial as \[ \phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p, \qquad \phi_p \neq 0, \]
and the moving average (MA) polynomial as \[ \theta(z) = 1 + \theta_1 z + \theta_2 z^2 + \cdots + \theta_q z^q, \qquad \theta_q \neq 0, \]
where \(z\) is a complex number.
Even after ruling out common factors, ARMA representations are not unique without imposing two additional restrictions: causality (ruling out representations that depend on future shocks) and invertibility (selecting a unique moving-average representation).
These restrictions translate into simple polynomial conditions:
\[ \phi(z) \neq 0 \quad \text{for all } |z|\le 1, \]
\[ \theta(z) \neq 0 \quad \text{for all } |z|\le 1. \]
When \[ \phi(z)\neq 0 \quad \text{for all } |z|\le 1, \] the inverse \(\phi(L)^{-1}\) exists, and the process admits a causal MA(\(\infty\)) representation: \[ x_t = \sum_{j=0}^{\infty} \psi_j w_{t-j}, \qquad \psi(z)=\sum_{j=0}^{\infty}\psi_j z^j=\frac{\theta(z)}{\phi(z)}. \]
Because the coefficients \(\{\psi_j\}\) are absolutely summable (as implied by the root condition), \(\{x_t\}\) is a linear filter of white noise and is therefore weakly stationary.
When \[ \theta(z)\neq 0 \quad \text{for all } |z|\le 1, \] the inverse \(\theta(L)^{-1}\) exists, and shocks can be recovered from past data: \[ w_t = \sum_{j=0}^{\infty} \pi_j x_{t-j}, \qquad \pi(z)=\sum_{j=0}^{\infty}\pi_j z^j=\frac{\phi(z)}{\theta(z)}. \]
When both root conditions hold, the same stationary process admits both a MA(\(\infty\)) and an AR(\(\infty\)) representation.
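The \(\psi\)-weights can be computed by matching coefficients in \(\phi(z)\psi(z)=\theta(z)\), which gives the recursion \(\psi_j=\theta_j+\sum_{i=1}^{\min(j,p)}\phi_i\psi_{j-i}\) (with \(\theta_j=0\) for \(j>q\)). A sketch; the ARMA(1,1) parameter values are illustrative:

```python
def arma_psi_weights(phi, theta, n):
    """First n psi weights of theta(z)/phi(z).

    phi = [phi_1, ..., phi_p], theta = [theta_1, ..., theta_q]; theta_0 = 1.
    Recursion: psi_j = theta_j + sum_{i=1}^{min(j,p)} phi_i psi_{j-i}.
    """
    psi = [1.0]
    for j in range(1, n):
        val = theta[j - 1] if j - 1 < len(theta) else 0.0
        for i, ph in enumerate(phi, start=1):
            if j - i >= 0:
                val += ph * psi[j - i]
        psi.append(val)
    return psi

# ARMA(1,1), phi = 0.5, theta = 0.4: psi_j = (phi + theta) * phi**(j-1) for j >= 1
psi = arma_psi_weights([0.5], [0.4], 6)
```

The same recursion with the roles of \(\phi\) and \(\theta\) swapped (and signs adjusted) yields the \(\pi\)-weights of the AR(\(\infty\)) representation.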
We have shown that causal ARMA representations admit a one-sided MA(\(\infty\)) form. This turns out not to be a special feature of ARMA models, but a general property of stationary stochastic processes.
Wold decomposition: if \(\{x_t\}\) is weakly stationary and purely nondeterministic, then it admits a representation of the form \[ x_t = \sum_{j=0}^{\infty} \psi_j \, \varepsilon_{t-j}, \qquad \psi_0 = 1, \qquad \sum_{j=0}^{\infty} \psi_j^2 < \infty, \] where \(\{\varepsilon_t\}\) is a white-noise innovation sequence (mean \(0\), variance \(\sigma_\varepsilon^2\)).
Consider the MA(\(q\)) model \[ x_t = \theta(L) w_t = \sum_{j=0}^q \theta_j w_{t-j}, \qquad \theta_0 = 1. \]
Because \(x_t\) is a finite linear combination of white noise, the process is weakly stationary.
\[ \mathbb{E}(x_t) = \sum_{j=0}^q \theta_j \mathbb{E}(w_{t-j}) = 0. \]
For \(h \ge 0\), \[ \gamma(h) = \mathrm{Cov}(x_{t+h}, x_t) = \begin{cases} \sigma_w^2 \displaystyle\sum_{j=0}^{q-h} \theta_j \theta_{j+h}, & 0 \le h \le q, \\[10pt] 0, & h > q. \end{cases} \]
\[ \rho(h) = \frac{\gamma(h)}{\gamma(0)} = \begin{cases} \displaystyle \frac{\sum_{j=0}^{q-h} \theta_j \theta_{j+h}} {1 + \theta_1^2 + \cdots + \theta_q^2}, & 1 \le h \le q, \\[10pt] 0, & h > q. \end{cases} \]
The ACF cuts off after lag \(q\) — the defining signature of an MA(\(q\)) process.
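The cutoff follows directly from the formula for \(\gamma(h)\). A small sketch (MA(2) with made-up \(\theta\) values):

```python
import numpy as np

def ma_gamma(theta, sig2, h):
    """gamma(h) = sig2 * sum_{j=0}^{q-h} theta_j theta_{j+h}, theta_0 = 1; 0 for |h| > q."""
    th = np.concatenate(([1.0], np.asarray(theta, dtype=float)))
    h = abs(h)
    if h >= len(th):
        return 0.0
    return sig2 * float(th[: len(th) - h] @ th[h:])

# MA(2), theta = (0.5, 0.25), sig2 = 1 (illustrative)
g0 = ma_gamma([0.5, 0.25], 1.0, 0)   # 1 + 0.5**2 + 0.25**2 = 1.3125
g2 = ma_gamma([0.5, 0.25], 1.0, 2)   # theta_0 * theta_2 = 0.25
g3 = ma_gamma([0.5, 0.25], 1.0, 3)   # cuts off: 0 beyond lag q = 2
```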
Consider the AR(\(p\)) model \[ x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t, \qquad w_t \sim wn(0,\sigma_w^2). \]
Assume \(\{x_t\}\) is weakly stationary.
\[ \mathbb{E}(x_t) = 0. \]
Multiply the AR(\(p\)) equation by \(x_{t-h}\) and take expectations.
\[ \gamma(h) = \phi_1\gamma(h-1)+\phi_2\gamma(h-2)+\cdots+\phi_p\gamma(h-p), \qquad h = 1,\ldots,p. \]
\[ \gamma(0) = \phi_1\gamma(1)+\cdots+\phi_p\gamma(p)+\sigma_w^2. \]
These \(p+1\) equations determine \[ \gamma(0),\gamma(1),\ldots,\gamma(p). \]
Once \(\gamma(0),\ldots,\gamma(p)\) are known, all higher autocovariances follow from \[ \gamma(h) = \phi_1\gamma(h-1)+\cdots+\phi_p\gamma(h-p), \qquad h>p. \]
The ACF of an AR(\(p\)) process decays gradually and does not cut off.
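For an AR(2), the \(p+1=3\) equations form a small linear system. A sketch (the coefficients below are illustrative values satisfying the stationarity conditions):

```python
import numpy as np

phi1, phi2, sig2 = 0.5, 0.3, 1.0   # illustrative stationary AR(2)

# Unknowns (g0, g1, g2) = (gamma(0), gamma(1), gamma(2)):
#   g0 - phi1 g1 - phi2 g2  = sig2   (h = 0 equation)
#  -phi1 g0 + (1-phi2) g1   = 0      (h = 1)
#  -phi2 g0 - phi1 g1 + g2  = 0      (h = 2)
A = np.array([
    [1.0,   -phi1,       -phi2],
    [-phi1,  1.0 - phi2,  0.0],
    [-phi2, -phi1,        1.0],
])
g0, g1, g2 = np.linalg.solve(A, np.array([sig2, 0.0, 0.0]))

# Higher lags follow the pure recursion gamma(h) = phi1 gamma(h-1) + phi2 gamma(h-2)
g3 = phi1 * g2 + phi2 * g1
```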
Consider the ARMA(\(p,q\)) model \[ \phi(L)x_t = \theta(L) w_t, \qquad w_t \sim wn(0,\sigma_w^2), \] and assume \(\{x_t\}\) is weakly stationary.
Multiply the ARMA(\(p,q\)) equation by \(x_{t-h}\) and take expectations.
For \(h \ge 0\), \[ \gamma(h) = \phi_1 \gamma(h-1) + \cdots + \phi_p \gamma(h-p) + \sum_{j=0}^q \theta_j\,\mathrm{Cov}(w_{t-j}, x_{t-h}). \]
The MA terms contribute only when \(h \le q\), since \[ \mathrm{Cov}(w_{t-j},x_{t-h}) = 0 \quad \text{for } h>j. \]
Therefore, the values \[ \gamma(0),\gamma(1),\ldots,\gamma(\max(p,q)) \] must be determined by solving a finite system of equations.
Once these initial values are known, the MA terms vanish and the autocovariances satisfy the AR(\(p\)) recursion \[ \gamma(h)=\phi_1\gamma(h-1)+\cdots+\phi_p\gamma(h-p), \qquad h>\max(p,q). \]
MA terms determine short-run autocovariances; AR terms govern long-run decay of the ACF.
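This division of labor can be checked for an ARMA(1,1) (sketch; parameter values and truncation level are illustrative): compute \(\gamma(h)\) from truncated \(\psi\)-weight sums and verify that the AR recursion holds for \(h>\max(p,q)=1\) but not at \(h=1\), where the MA part still contributes.

```python
phi, theta, sig2 = 0.5, 0.4, 1.0    # illustrative ARMA(1,1)
K = 200                              # truncation level for the MA(inf) sums

# psi weights: psi_0 = 1, psi_j = (phi + theta) * phi**(j-1) for j >= 1
psi = [1.0] + [(phi + theta) * phi ** (j - 1) for j in range(1, K)]

def gamma(h):
    """gamma(h) = sig2 * sum_j psi_j psi_{j+h} (truncated at K terms)."""
    return sig2 * sum(psi[j] * psi[j + h] for j in range(K - h))

g0, g1, g2, g3 = gamma(0), gamma(1), gamma(2), gamma(3)
# AR recursion gamma(h) = phi * gamma(h-1) holds for h > 1,
# while at h = 1 the MA term adds: gamma(1) = phi*gamma(0) + theta*sig2
```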
The autocorrelation function (ACF) \(\rho(h)\) measures the total dependence between \(x_t\) and \(x_{t-h}\). However, it does not distinguish between direct dependence and dependence transmitted through the intermediate values \(x_{t-1},\ldots,x_{t-h+1}\).
The partial autocorrelation function (PACF) is the sequence \[ \{\alpha(h)\}_{h\ge1}, \] where, for each lag \(h\), the partial autocorrelation \(\alpha(h)\) is defined as the correlation between the residuals obtained after linearly projecting \[ x_t \quad\text{and}\quad x_{t-h} \] on the intermediate lags \[ x_{t-1}, x_{t-2}, \ldots, x_{t-h+1}. \]
Equivalently, \(\alpha(h)\) is the coefficient on \(x_{t-h}\) in the linear projection \[ x_t = \beta_1 x_{t-1}+\cdots+\beta_{h-1}x_{t-h+1} +\alpha(h)\,x_{t-h} +u_t. \]
FYI: this is basically the Frisch–Waugh–Lovell theorem.
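The projection definition translates directly into OLS (a sketch; the simulated AR(1), seed, and sample size are arbitrary choices). For an AR(1), theory gives \(\alpha(1)=\phi\) and \(\alpha(h)=0\) for \(h\ge2\).

```python
import numpy as np

rng = np.random.default_rng(3)
n, phi = 50_000, 0.6
w = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + w[t]

def pacf_at(x, h):
    """alpha(h): coefficient on x_{t-h} in the regression of x_t on x_{t-1},...,x_{t-h}."""
    T = len(x)
    y = x[h:]
    X = np.column_stack([x[h - k : T - k] for k in range(1, h + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[-1])

a1, a2 = pacf_at(x, 1), pacf_at(x, 2)   # expect a1 near 0.6, a2 near 0
```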
Next unit: Estimation and Inference for Stationary Time Series