AR, MA, ARMA Representations
Xiamen University, Chow Institute
March, 2026
In time series data, order matters.
Time series data are viewed as realizations of a stochastic process, with dependence across time.
A time series model specifies the dynamic evolution of this stochastic process over time.
In general, the evolution of a time series can be written as
\[ y_t = f\!\left( y_{t-1}, y_{t-2}, \ldots; \varepsilon_t, \varepsilon_{t-1}, \ldots \right), \]
where \(f(\cdot)\) is an unknown function.
\(f(\cdot)\) is referred to as the data-generating process (DGP).
The lag (backshift) operator \(L\) is defined by \[ L^i y_t \equiv y_{t-i}. \]
Thus, applying \(L^i\) to \(y_t\) shifts the series back by \(i\) periods.
The lag operator is a linear operator:
Lag of a constant: \[ L c = c. \]
Distributive (linearity): \[ (L^i + L^j) y_t = L^i y_t + L^j y_t = y_{t-i} + y_{t-j}. \]
Associative law of multiplication: \[ L^i L^j y_t = L^{i+j} y_t = y_{t-(i+j)}, \qquad L^0 y_t = y_t. \]
Because these properties hold, lag operators can be manipulated algebraically.
A lag polynomial is written as \[ \phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p. \]
Applying it to \(y_t\): \[ \phi(L) y_t = y_t - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p}. \]
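As a small numerical illustration (a Python sketch; the series and the coefficients \(\phi_1=0.5\), \(\phi_2=0.2\) are made-up values), applying a lag polynomial with \(p=2\) amounts to shifting and differencing arrays:

```python
import numpy as np

# Illustrative values only: a short series and AR coefficients phi_1, phi_2
y = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
phi = np.array([0.5, 0.2])

# phi(L) y_t = y_t - phi_1 y_{t-1} - phi_2 y_{t-2}, defined for t >= 2
filtered = y[2:] - phi[0] * y[1:-1] - phi[1] * y[:-2]
```

Each entry of `filtered` is one application of \(\phi(L)\) at a particular \(t\).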
An autoregressive process of order \(p\), denoted AR(\(p\)), is defined by
\[ x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + w_t, \]
where \(\{w_t\} \sim wn(0,\sigma_w^2)\) is white noise and \(\phi_1, \ldots, \phi_p\) (with \(\phi_p \neq 0\)) are constants.
If \(\{x_t\}\) has nonzero mean \(\mu\), the AR(\(p\)) model can be written as
\[ x_t - \mu = \phi_1 (x_{t-1} - \mu) + \cdots + \phi_p (x_{t-p} - \mu) + w_t. \]
Equivalently,
\[ x_t = \alpha + \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t, \]
where \(\alpha = \mu(1 - \phi_1 - \cdots - \phi_p)\).
The AR(\(p\)) model can be written as
\[ (1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p)\, x_t = w_t. \]
Define the lag polynomial
\[ \phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p, \]
so that the model can be written compactly as
\[ \phi(L) x_t = w_t. \]
The operator \(\phi(L)\) is called the autoregressive operator.
Consider the AR(1) model \[ x_t = \phi x_{t-1} + w_t, \qquad w_t \sim wn(0,\sigma_w^2). \]
Iterate backward \(k\) times:
\[ \begin{align*} x_t &= \phi x_{t-1} + w_t \\ &= \phi^2 x_{t-2} + \phi w_{t-1} + w_t \\ &\;\;\vdots \\ &= \phi^k x_{t-k} + \sum_{j=0}^{k-1} \phi^j w_{t-j}. \end{align*} \]
To eliminate the dependence on the initial condition \(x_{t-k}\), we need \(\phi^k x_{t-k}\) to vanish as \(k \to \infty\).
If \(|\phi|<1\) and the initial condition is square-integrable, then
\[ \phi^k x_{t-k} \to 0 \quad \text{in mean square as } k\to\infty. \]
This gives an expression of \(x_t\) as a weighted sum of current and past shocks. \[ x_t = \sum_{j=0}^{\infty}\phi^j w_{t-j}. \]
The representation
\[ x_t = \sum_{j=0}^{\infty}\phi^j w_{t-j} \]
can be written compactly using lag operators as
\[ x_t = (1-\phi L)^{-1} w_t. \]
Here, \((1-\phi L)^{-1}\) denotes the one-sided inverse of the autoregressive operator, defined through the power-series expansion
\[ (1-\phi L)^{-1} = 1 + \phi L + \phi^2 L^2 + \cdots, \qquad |\phi|<1. \]
Thus, backward recursion provides a dynamic justification for the inverse lag-polynomial representation.
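The truncation argument can be checked numerically. A minimal sketch (the value \(\phi=0.6\), the sample size, and the standard normal noise are all illustrative choices): simulate the AR(1) recursion, then reconstruct the last observation from a truncated MA(\(\infty\)) sum.

```python
import numpy as np

rng = np.random.default_rng(0)
phi, n = 0.6, 500            # illustrative |phi| < 1
w = rng.standard_normal(n)

# Forward recursion x_t = phi x_{t-1} + w_t, starting from x_0 = 0
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + w[t]

# Truncated MA(infinity) sum: x_t ~ sum_{j=0}^{K-1} phi^j w_{t-j}
K = 50
approx = sum(phi**j * w[n - 1 - j] for j in range(K))
# Since phi**K is tiny, approx agrees with x[-1] to high precision
```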
Given the MA(\(\infty\)) representation \[ x_t = \sum_{j=0}^{\infty} \phi^j w_{t-j}, \] the moments follow directly.
\[ \begin{align*} \mathbb{E}(x_t) &= \mathbb{E}\!\left(\sum_{j=0}^{\infty} \phi^j w_{t-j}\right) = \sum_{j=0}^{\infty} \phi^j \mathbb{E}(w_{t-j}) = 0. \end{align*} \]
\[ \begin{align*} \gamma(h) &= \mathrm{Cov}(x_{t+h},x_t) = \mathbb{E}\!\left[ \left(\sum_{j=0}^{\infty} \phi^j w_{t+h-j}\right) \left(\sum_{k=0}^{\infty} \phi^k w_{t-k}\right) \right]\\ &\;\;\vdots \\ &= \frac{\sigma_w^2\,\phi^h}{1-\phi^2}, \qquad h \ge 0, \end{align*} \] and by symmetry, \[ \gamma(-h)=\gamma(h). \]
\[ \rho(h) = \frac{\gamma(h)}{\gamma(0)}=\phi^h, \qquad h\ge 0, \qquad \rho(-h)=\rho(h). \]
When \(|\phi|<1\), the AR(1) process is weakly stationary!
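The geometric decay \(\rho(h)=\phi^h\) can be verified by simulation (a sketch; \(\phi=0.7\), the seed, and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
phi, n = 0.7, 200_000        # illustrative parameter and sample size
w = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + w[t]
x = x[1000:]                 # discard burn-in

def sample_acf(z, h):
    """Sample autocorrelation at lag h."""
    z = z - z.mean()
    return float((z[h:] * z[:-h]).sum() / (z * z).sum())

# Theory: rho(1) = 0.7, rho(3) = 0.7**3 = 0.343
acf1, acf3 = sample_acf(x, 1), sample_acf(x, 3)
```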
Now suppose \(|\phi|>1\) in \[ x_t = \phi x_{t-1} + w_t. \]
Note: the AR equation describes an algebraic relation. Here, ``explosive'' refers to failure of the forward-in-time recursive construction; it does not mean that the equation itself admits no stationary solution.
\[ x_{t+1} = \phi x_t + w_{t+1}. \] Solving for \(x_t\) gives \[ x_t = \phi^{-1} x_{t+1} - \phi^{-1} w_{t+1}. \]
Iterating forward \(k\) steps yields \[ x_t = \phi^{-k} x_{t+k} - \sum_{j=1}^{k} \phi^{-j} w_{t+j}. \]
If \(|\phi|>1\), then \(|\phi^{-1}|<1\), so \[ \phi^{-k} x_{t+k} \to 0 \quad \text{in mean square as } k\to\infty, \] and we obtain the representation \[ x_t = - \sum_{j=1}^{\infty} \phi^{-j} w_{t+j}. \]
This representation depends on future shocks \(\{w_{t+1}, w_{t+2},\ldots\}\).
Hence, the explosive AR(1) process is noncausal.
A time series \(\{x_t\}\) is said to be causal if it can be written as \[ x_t = \sum_{j=0}^{\infty} \psi_j w_{t-j}, \] where \(\{w_t\}\) is white noise and the coefficients satisfy \[ \sum_{j=0}^{\infty} |\psi_j| < \infty. \]
The absolute summability condition ensures that the representation is well defined.
Note: this notion of causality has nothing to do with treatment effects or structural causality.
Consider again the explosive AR(1) \[ x_t = \phi x_{t-1} + w_t, \qquad |\phi|>1, \; w_t \sim wn(0, \sigma^2_w). \]
Define the causal AR(1) process \[ y_t = \phi^{-1} y_{t-1} + v_t, \] where \[ v_t \sim wn(0,\sigma_w^2 \phi^{-2}). \]
The processes \(\{x_t\}\) and \(\{y_t\}\) have the same autocovariance function (and, when the noise is Gaussian, the same finite-dimensional distributions).
Example: If \[ x_t = 2x_{t-1} + w_t, \qquad \sigma_w^2 = 1, \] then the equivalent causal process is \[ y_t = \tfrac12 y_{t-1} + v_t, \qquad \sigma_v^2 = \tfrac14. \]
These two processes are observationally equivalent.
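The equivalence can be checked by computing autocovariances from both representations (a sketch using the example's values \(\phi=2\), \(\sigma_w^2=1\); the truncation level \(K\) is an arbitrary choice):

```python
phi, sigw2 = 2.0, 1.0        # explosive AR(1) from the example
sigv2 = sigw2 / phi**2       # variance of the equivalent causal process

# Noncausal representation: x_t = -sum_{j>=1} phi^{-j} w_{t+j}, so
# gamma_x(h) = sigw2 * sum_{j>=1} phi^{-j} phi^{-(j+h)} (truncated at K terms)
def gamma_x(h, K=200):
    return sigw2 * sum(phi**(-j) * phi**(-(j + h)) for j in range(1, K))

# Causal AR(1) y_t = (1/phi) y_{t-1} + v_t:
# gamma_y(h) = sigv2 * (1/phi)**h / (1 - 1/phi**2)
def gamma_y(h):
    return sigv2 * (1 / phi)**h / (1 - 1 / phi**2)
```

Both functions return identical values at every lag, which is exactly the observational equivalence claimed above.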
A moving average model of order \(q\), denoted MA(\(q\)), is defined by
\[ x_t = w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q}, \] where \(\{w_t\} \sim wn(0, \sigma^2_w)\) and \(\theta_1,\theta_2, \ldots, \theta_q\) are parameters.
We can equivalently write the MA(\(q\)) process as
\[ x_t = \theta(L) w_t, \] where \(\theta(L) = 1 + \theta_1 L + \cdots + \theta_q L^q\) is the moving average operator.
Unlike AR processes, MA(\(q\)) processes are stationary for all parameter values,
since \(x_t\) is a finite linear combination of white noise.
Consider the MA(1) model \[ x_t = w_t + \theta w_{t-1}, \qquad w_t \sim wn(0,\sigma_w^2). \]
The autocovariance function is \[ \gamma(h)=\mathrm{Cov}(x_{t+h},x_t) = \begin{cases} (1+\theta^2)\sigma_w^2, & h=0,\\[4pt] \theta\sigma_w^2, & |h|=1,\\[4pt] 0, & |h|>1. \end{cases} \]
Therefore the ACF is \[ \rho(h)=\frac{\gamma(h)}{\gamma(0)} = \begin{cases} \dfrac{\theta}{1+\theta^2}, & h=\pm 1,\\[8pt] 0, & |h|>1. \end{cases} \]
Note that \(|\rho(1)|\le \frac12\) for all \(\theta\) (maximum at \(|\theta|=1\)).
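The bound \(|\rho(1)|\le 1/2\) is easy to confirm numerically (a sketch; the grid of \(\theta\) values is an arbitrary choice):

```python
import numpy as np

# rho(1) = theta / (1 + theta**2) for an MA(1); scan a grid of theta values
thetas = np.linspace(-5, 5, 1001)
rho1 = thetas / (1 + thetas**2)
max_abs = float(np.abs(rho1).max())   # attained near |theta| = 1
```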
\[ x_t = w_t + \theta w_{t-1} \]
The ACF \[ \rho(h)=\frac{\gamma(h)}{\gamma(0)} = \begin{cases} \dfrac{\theta}{1+\theta^2}, & h=\pm 1,\\[8pt] 0, & |h|>1. \end{cases} \] is the same for \(\theta\) and \(1/\theta\), since \(\dfrac{1/\theta}{1+1/\theta^2}=\dfrac{\theta}{1+\theta^2}\): the MA(1) parameter cannot be identified from the ACF alone.
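In fact the pairs \((\theta,\sigma_w^2)\) and \((1/\theta,\theta^2\sigma_w^2)\) generate identical autocovariances, not just identical autocorrelations. A quick check (sketch; \(\theta=0.4\) is an arbitrary value):

```python
theta, sig2 = 0.4, 1.0
theta_alt, sig2_alt = 1 / theta, theta**2 * sig2   # the "mirror" MA(1)

def ma1_gamma(th, s2):
    """(gamma(0), gamma(1)) of an MA(1) with parameter th and noise variance s2."""
    return ((1 + th**2) * s2, th * s2)

g, g_alt = ma1_gamma(theta, sig2), ma1_gamma(theta_alt, sig2_alt)
# g == g_alt: the two parameterizations are indistinguishable from second moments
```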
A time series \(\{x_t\}\) is said to be invertible if the shocks \(\{w_t\}\) can be written as \[ w_t = \sum_{j=0}^{\infty} \pi_j x_{t-j}, \] where the coefficients satisfy \[ \sum_{j=0}^{\infty} |\pi_j| < \infty. \]
Invertibility is an additional constraint imposed to select a unique moving-average representation, by requiring that the unobserved shocks can be recovered from current and past observations.
It is analogous to causality in AR models, which is imposed to rule out non-causal representations of the same stochastic process.
A time series \(\{x_t\}\) is said to follow an ARMA(\(p,q\)) process if it is (weakly) stationary and satisfies \[ x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q}, \] where \(\{w_t\}\) is white noise with \(w_t \sim wn(0,\sigma_w^2)\).
Equivalently, \[ \phi(L)x_t = \theta(L) w_t. \]
Multiplying both sides by an arbitrary lag polynomial \(\eta(L)\) gives \[ \eta(L)\phi(L)x_t = \eta(L)\theta(L) w_t, \] which describes the same stochastic process.
Example: Consider the white noise process \[ x_t = w_t. \]
Multiply both sides by \(\eta(L)=1-0.5L\): \[ (1-0.5L)x_t = (1-0.5L)w_t. \]
Rewriting, \[ x_t = 0.5x_{t-1} + w_t - 0.5w_{t-1}. \]
This looks like an ARMA(1,1) model! But the common factor \(1-0.5L\) cancels, so \(\{x_t\}\) is still just white noise: ARMA models should always be reduced to the form in which \(\phi(L)\) and \(\theta(L)\) have no common factors.
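A simulation confirms the cancellation (sketch; the seed and series length are arbitrary): feeding the same noise through the "ARMA(1,1)" recursion simply reproduces the noise.

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.standard_normal(100)

# Over-parameterized recursion x_t = 0.5 x_{t-1} + w_t - 0.5 w_{t-1}
x = np.empty_like(w)
x[0] = w[0]
for t in range(1, len(w)):
    x[t] = 0.5 * x[t - 1] + w[t] - 0.5 * w[t - 1]
# The common factor cancels: x is (numerically) identical to w
```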
Define the autoregressive (AR) polynomial as \[ \phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p, \qquad \phi_p \neq 0, \]
and the moving average (MA) polynomial as \[ \theta(z) = 1 + \theta_1 z + \theta_2 z^2 + \cdots + \theta_q z^q, \qquad \theta_q \neq 0, \]
where \(z\) is a complex number.
Even after ruling out common factors, ARMA representations are not unique without imposing two additional restrictions: causality (ruling out representations that depend on future shocks) and invertibility (selecting a unique moving-average representation).
These restrictions translate into simple polynomial conditions:
\[ \phi(z) \neq 0 \quad \text{for all } |z|\le 1, \]
\[ \theta(z) \neq 0 \quad \text{for all } |z|\le 1. \]
When \[ \phi(z)\neq 0 \quad \text{for all } |z|\le 1, \] the inverse \(\phi(L)^{-1}\) exists, and the process admits a causal MA(\(\infty\)) representation: \[ x_t = \sum_{j=0}^{\infty} \psi_j w_{t-j}, \qquad \psi(z)=\sum_{j=0}^{\infty}\psi_j z^j=\frac{\theta(z)}{\phi(z)}. \]
Because the coefficients \(\{\psi_j\}\) are absolutely summable (as implied by the root condition), \(\{x_t\}\) is a linear filter of white noise and is therefore weakly stationary.
When \[ \theta(z)\neq 0 \quad \text{for all } |z|\le 1, \] the inverse \(\theta(L)^{-1}\) exists, and shocks can be recovered from past data: \[ w_t = \sum_{j=0}^{\infty} \pi_j x_{t-j}, \qquad \pi(z)=\sum_{j=0}^{\infty}\pi_j z^j=\frac{\phi(z)}{\theta(z)}. \]
When both root conditions hold, the same stationary process admits both a MA(\(\infty\)) and an AR(\(\infty\)) representation.
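The \(\psi\)-weights can be computed by matching coefficients in \(\phi(z)\psi(z)=\theta(z)\), which gives the recursion \(\psi_j=\theta_j+\sum_{i=1}^{\min(j,p)}\phi_i\psi_{j-i}\) (with \(\theta_j=0\) for \(j>q\)). A sketch; the ARMA(1,1) parameter values are illustrative:

```python
def arma_psi_weights(phi, theta, n):
    """First n psi weights of theta(z)/phi(z).

    phi = [phi_1, ..., phi_p], theta = [theta_1, ..., theta_q]; theta_0 = 1.
    Recursion: psi_j = theta_j + sum_{i=1}^{min(j,p)} phi_i psi_{j-i}.
    """
    psi = [1.0]
    for j in range(1, n):
        val = theta[j - 1] if j - 1 < len(theta) else 0.0
        for i, ph in enumerate(phi, start=1):
            if j - i >= 0:
                val += ph * psi[j - i]
        psi.append(val)
    return psi

# ARMA(1,1), phi = 0.5, theta = 0.4: psi_j = (phi + theta) * phi**(j-1) for j >= 1
psi = arma_psi_weights([0.5], [0.4], 6)
```

The same recursion with the roles of \(\phi\) and \(\theta\) swapped (and signs adjusted) yields the \(\pi\)-weights of the AR(\(\infty\)) representation.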
We have shown that causal ARMA representations admit a one-sided MA(\(\infty\)) form. This turns out not to be a special feature of ARMA models, but a general property of stationary stochastic processes.
Wold decomposition: if \(\{x_t\}\) is weakly stationary and purely nondeterministic, then it admits a representation of the form \[ x_t = \sum_{j=0}^{\infty} \psi_j \, \varepsilon_{t-j}, \qquad \psi_0 = 1, \qquad \sum_{j=0}^{\infty} \psi_j^2 < \infty, \] where \(\{\varepsilon_t\}\) is a white-noise innovation sequence (mean \(0\), variance \(\sigma_\varepsilon^2\)).
Consider the MA(\(q\)) model \[ x_t = \theta(L) w_t = \sum_{j=0}^q \theta_j w_{t-j}, \qquad \theta_0 = 1. \]
Because \(x_t\) is a finite linear combination of white noise, the process is weakly stationary.
\[ \mathbb{E}(x_t) = \sum_{j=0}^q \theta_j \mathbb{E}(w_{t-j}) = 0. \]
For \(h \ge 0\), \[ \gamma(h) = \mathrm{Cov}(x_{t+h}, x_t) = \begin{cases} \sigma_w^2 \displaystyle\sum_{j=0}^{q-h} \theta_j \theta_{j+h}, & 0 \le h \le q, \\[10pt] 0, & h > q. \end{cases} \]
\[ \rho(h) = \frac{\gamma(h)}{\gamma(0)} = \begin{cases} \displaystyle \frac{\sum_{j=0}^{q-h} \theta_j \theta_{j+h}} {1 + \theta_1^2 + \cdots + \theta_q^2}, & 1 \le h \le q, \\[10pt] 0, & h > q. \end{cases} \]
The ACF cuts off after lag \(q\) — the defining signature of an MA(\(q\)) process.
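The cutoff follows directly from the formula for \(\gamma(h)\). A small sketch (MA(2) with made-up \(\theta\) values):

```python
import numpy as np

def ma_gamma(theta, sig2, h):
    """gamma(h) = sig2 * sum_{j=0}^{q-h} theta_j theta_{j+h}, theta_0 = 1; 0 for |h| > q."""
    th = np.concatenate(([1.0], np.asarray(theta, dtype=float)))
    h = abs(h)
    if h >= len(th):
        return 0.0
    return sig2 * float(th[: len(th) - h] @ th[h:])

# MA(2), theta = (0.5, 0.25), sig2 = 1 (illustrative)
g0 = ma_gamma([0.5, 0.25], 1.0, 0)   # 1 + 0.5**2 + 0.25**2 = 1.3125
g2 = ma_gamma([0.5, 0.25], 1.0, 2)   # theta_0 * theta_2 = 0.25
g3 = ma_gamma([0.5, 0.25], 1.0, 3)   # cuts off: 0 beyond lag q = 2
```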
Consider the AR(\(p\)) model \[ x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t, \qquad w_t \sim wn(0,\sigma_w^2). \]
Assume \(\{x_t\}\) is weakly stationary.
\[ \mathbb{E}(x_t) = 0. \]
Multiply the AR(\(p\)) equation by \(x_{t-h}\) and take expectations.
\[ \gamma(h) = \phi_1\gamma(h-1)+\phi_2\gamma(h-2)+\cdots+\phi_p\gamma(h-p), \qquad h = 1,\ldots,p. \]
\[ \gamma(0) = \phi_1\gamma(1)+\cdots+\phi_p\gamma(p)+\sigma_w^2. \]
These \(p+1\) equations determine \[ \gamma(0),\gamma(1),\ldots,\gamma(p). \]
Once \(\gamma(0),\ldots,\gamma(p)\) are known, all higher autocovariances follow from \[ \gamma(h) = \phi_1\gamma(h-1)+\cdots+\phi_p\gamma(h-p), \qquad h>p. \]
The ACF of an AR(\(p\)) process decays gradually and does not cut off.
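For an AR(2), the \(p+1=3\) equations form a small linear system. A sketch (the coefficients below are illustrative values satisfying the stationarity conditions):

```python
import numpy as np

phi1, phi2, sig2 = 0.5, 0.3, 1.0   # illustrative stationary AR(2)

# Unknowns (g0, g1, g2) = (gamma(0), gamma(1), gamma(2)):
#   g0 - phi1 g1 - phi2 g2  = sig2   (h = 0 equation)
#  -phi1 g0 + (1-phi2) g1   = 0      (h = 1)
#  -phi2 g0 - phi1 g1 + g2  = 0      (h = 2)
A = np.array([
    [1.0,   -phi1,       -phi2],
    [-phi1,  1.0 - phi2,  0.0],
    [-phi2, -phi1,        1.0],
])
g0, g1, g2 = np.linalg.solve(A, np.array([sig2, 0.0, 0.0]))

# Higher lags follow the pure recursion gamma(h) = phi1 gamma(h-1) + phi2 gamma(h-2)
g3 = phi1 * g2 + phi2 * g1
```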
Consider the ARMA(\(p,q\)) model \[ \phi(L)x_t = \theta(L) w_t, \qquad w_t \sim wn(0,\sigma_w^2), \] and assume \(\{x_t\}\) is weakly stationary.
Multiply the ARMA(\(p,q\)) equation by \(x_{t-h}\) and take expectations.
For \(h \ge 0\), \[ \gamma(h) = \phi_1 \gamma(h-1) + \cdots + \phi_p \gamma(h-p) + \sum_{j=0}^q \theta_j\,\mathrm{Cov}(w_{t-j}, x_{t-h}). \]
The MA terms contribute only when \(h \le q\), since \[ \mathrm{Cov}(w_{t-j},x_{t-h}) = 0 \quad \text{for } h>j. \]
Therefore, the values \[ \gamma(0),\gamma(1),\ldots,\gamma(\max(p,q)) \] must be determined by solving a finite system of equations.
Once these initial values are known, the MA terms vanish and the autocovariances satisfy the AR(\(p\)) recursion \[ \gamma(h)=\phi_1\gamma(h-1)+\cdots+\phi_p\gamma(h-p), \qquad h>\max(p,q). \]
MA terms determine short-run autocovariances; AR terms govern long-run decay of the ACF.
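This division of labor can be checked for an ARMA(1,1) (sketch; parameter values and truncation level are illustrative): compute \(\gamma(h)\) from truncated \(\psi\)-weight sums and verify that the AR recursion holds for \(h>\max(p,q)=1\) but not at \(h=1\), where the MA part still contributes.

```python
phi, theta, sig2 = 0.5, 0.4, 1.0    # illustrative ARMA(1,1)
K = 200                              # truncation level for the MA(inf) sums

# psi weights: psi_0 = 1, psi_j = (phi + theta) * phi**(j-1) for j >= 1
psi = [1.0] + [(phi + theta) * phi ** (j - 1) for j in range(1, K)]

def gamma(h):
    """gamma(h) = sig2 * sum_j psi_j psi_{j+h} (truncated at K terms)."""
    return sig2 * sum(psi[j] * psi[j + h] for j in range(K - h))

g0, g1, g2, g3 = gamma(0), gamma(1), gamma(2), gamma(3)
# AR recursion gamma(h) = phi * gamma(h-1) holds for h > 1,
# while at h = 1 the MA term adds: gamma(1) = phi*gamma(0) + theta*sig2
```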
The autocorrelation function (ACF) \(\rho(h)\) measures the total dependence between \(x_t\) and \(x_{t-h}\). However, it does not distinguish between direct dependence and dependence transmitted through the intermediate values \(x_{t-1},\ldots,x_{t-h+1}\).
The partial autocorrelation function (PACF) is the sequence \[ \{\alpha(h)\}_{h\ge1}, \] where, for each lag \(h\), the partial autocorrelation \(\alpha(h)\) is defined as the correlation between the residuals obtained after linearly projecting \[ x_t \quad\text{and}\quad x_{t-h} \] on the intermediate lags \[ x_{t-1}, x_{t-2}, \ldots, x_{t-h+1}. \]
Equivalently, \(\alpha(h)\) is the coefficient on \(x_{t-h}\) in the linear projection \[ x_t = \beta_1 x_{t-1}+\cdots+\beta_{h-1}x_{t-h+1} +\alpha(h)\,x_{t-h} +u_t. \]
FYI: this is basically the Frisch–Waugh–Lovell theorem.
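The projection definition translates directly into OLS (a sketch; the simulated AR(1), seed, and sample size are arbitrary choices). For an AR(1), theory gives \(\alpha(1)=\phi\) and \(\alpha(h)=0\) for \(h\ge2\).

```python
import numpy as np

rng = np.random.default_rng(3)
n, phi = 50_000, 0.6
w = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + w[t]

def pacf_at(x, h):
    """alpha(h): coefficient on x_{t-h} in the regression of x_t on x_{t-1},...,x_{t-h}."""
    T = len(x)
    y = x[h:]
    X = np.column_stack([x[h - k : T - k] for k in range(1, h + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[-1])

a1, a2 = pacf_at(x, 1), pacf_at(x, 2)   # expect a1 near 0.6, a2 near 0
```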
Next unit: Estimation and Inference for Stationary Time Series