What Makes Time Series Special?
Xiamen University, Chow Institute
March, 2026
A time series is a sequence of observations indexed in time:
\[ x_1, x_2, \ldots, x_T \]
Examples: GDP, stock prices, daily temperatures — any measurements recorded sequentially over time.
Ordering in time matters!
Cross-Sectional Data
Cross-sectional data consist of multiple units observed at one point in time
(e.g., households, firms, individuals, countries).
A key modeling assumption is that the data come from random variables:
\[ X_1, X_2, \ldots, X_n \quad \text{ i.i.d.} \]
Why i.i.d.?
These assumptions are reasonable when the dataset is a
collection of different units: reshuffling the dataset changes nothing.
Time Series Data
Time series track the same unit over time:
\[ x_1, x_2, \ldots, x_T \]
This breaks the usual cross-sectional assumptions:
Not independent: past values typically influence future values.
Not identically distributed: levels, variance, or dynamics may evolve over time.
Order matters: reshuffling destroys patterns, dependence, trends, or cycles.
Therefore, we need a more general probabilistic framework that accommodates ordered data and systematic dependence across time.
To model ordered data with possible dependence, we treat a time series as a realization of a stochastic process.
A stochastic process is a collection of random variables:
\[ \{ X_t : t \in \mathbb{T} \}, \] where \(\mathbb{T}\) is an ordered index set (e.g., time).
The observed series is one realization:
\[ x_t = X_t(\omega), \qquad t = 1, \ldots, T. \]
Modeling the data as a stochastic process allows us to describe the joint distribution of the sequence \(\{X_t\}\), and therefore to capture dependence across time.
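To make "one realization" concrete, here is a minimal NumPy sketch (the process, seed, and length are illustrative choices, not from the slides): two draws of \(\omega\) give two different sample paths of the same stochastic process.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 100

# One stochastic process, two realizations: each call draws a new omega,
# i.e., a new sample path of the same random walk X_t = w_1 + ... + w_t.
path_1 = np.cumsum(rng.normal(size=T))
path_2 = np.cumsum(rng.normal(size=T))

# Both paths share the same joint distribution, but as observed data they
# look entirely different; real data hand us just one such path.
```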
The primary goal of time series analysis is to develop statistical models that provide plausible descriptions of data.
In practice, this means specifying a joint distribution for \((X_1, \ldots, X_T)\) through simple, structured models, such as:
White noise (no dependence across time)
Moving average models (dependence driven by past shocks)
Autoregressive models (dependence driven by past values)
Random walks and trends (persistent or nonstationary behavior)
Models with time-varying variance (changing volatility over time)
These models serve as building blocks for more general specifications.
In principle, a time series is completely described by the joint distribution of \((X_1, \ldots, X_T)\):
\[ F_{t_1,\dots,t_n}(c_1,\dots,c_n) = \Pr(X_{t_1} \le c_1,\dots,X_{t_n} \le c_n). \]
In practice, specifying and working with such joint distributions becomes increasingly difficult as the dimension \(n\) grows.
Instead, we look at the marginal distribution function \[ F_t(x) = \Pr(X_t \le x), \] or the corresponding marginal density function \[ f_t(x) = \frac{\partial F_t(x)}{\partial x}. \]
Another informative marginal descriptive measure is the mean function, defined as \[ \mu_t = \mathbb{E}[X_t] = \int_{-\infty}^{\infty} x \, f_t(x)\, dx, \] provided it exists.
Temporal dependence is described by second-order moments. The autocovariance function, provided it exists, is defined as
\[ \gamma(s,t) = \operatorname{Cov}(X_s, X_t) = \mathbb{E}\!\left[(X_s-\mu_s)(X_t-\mu_t)\right]. \]
For \(s=t\), the autocovariance reduces to the variance: \[ \gamma(t,t) = \operatorname{Var}(X_t). \]
For linear combinations, if \(U=\sum_{j=1}^m a_j X_j\) and \(V=\sum_{k=1}^r b_k Y_k\), then \[ \operatorname{Cov}(U,V) = \sum_{j=1}^m \sum_{k=1}^r a_j b_k \operatorname{Cov}(X_j,Y_k). \]
A white noise process \(\{w_t\}\) is a sequence of uncorrelated random variables with
\[ \mathbb{E}(w_t) = 0, \qquad \operatorname{Var}(w_t) = \sigma_w^2, \]
and
\[ \operatorname{Cov}(w_t, w_s) = 0 \quad \text{for } t \neq s. \]
White noise is the most basic building block in time series analysis.
Example (moving average). Smoothing white noise with a three-point average gives
\[ x_t = \tfrac{1}{3}\big(w_{t-1} + w_t + w_{t+1}\big) \]
Mean: \[ \mu_t = \mathbb{E}(x_t) = 0. \]
Autocovariance?
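Applying the covariance formula for linear combinations answers the prompt: only overlapping noise terms contribute, so
\[
\gamma(t,t) = \tfrac{1}{9}\operatorname{Var}(w_{t-1}+w_t+w_{t+1}) = \tfrac{3}{9}\sigma_w^2,
\qquad
\gamma(t+1,t) = \tfrac{1}{9}\operatorname{Cov}\big(w_t+w_{t+1}+w_{t+2},\, w_{t-1}+w_t+w_{t+1}\big) = \tfrac{2}{9}\sigma_w^2,
\]
and similarly \(\gamma(t+2,t) = \tfrac{1}{9}\sigma_w^2\), while \(\gamma(s,t)=0\) for \(|s-t|>2\).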
Example (random walk with drift).
\[ x_t = \delta t + \sum_{j=1}^{t} w_j \]
Mean: \[ \mu_t = \mathbb{E}(x_t) = \delta t. \]
Autocovariance: \[ \gamma(s,t) = \operatorname{Cov}(x_s,x_t) = \sigma_w^2 \min(s,t). \]
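To see this, note that the drift term is deterministic and drops out of covariances; for \(s \le t\),
\[
\gamma(s,t) = \operatorname{Cov}\!\left(\sum_{j=1}^{s} w_j,\ \sum_{k=1}^{t} w_k\right) = \sum_{j=1}^{s}\operatorname{Var}(w_j) = s\,\sigma_w^2,
\]
so \(\gamma(s,t)=\sigma_w^2\min(s,t)\). In particular, \(\gamma(t,t)=t\sigma_w^2\) grows with \(t\): the random walk is nonstationary.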
Example (signal plus noise).
\[ x_t = s_t + w_t, \qquad s_t \text{ deterministic}. \]
Mean: \[ \mu_t = \mathbb{E}(x_t) = s_t. \]
Autocovariance: \[ \gamma(s,t) = \operatorname{Cov}(x_s,x_t) = \sigma_w^2 \mathbf{1}\{s=t\}. \]
As in classical statistics, it is more convenient to work with scaled measures of association between \(-1\) and \(1\).
The autocorrelation function (ACF) is defined as \[ \rho(s,t) = \frac{\gamma(s,t)} {\sqrt{\gamma(s,s)\,\gamma(t,t)}}, \] provided the variances are finite.
For two series \(\{X_t\}\) and \(\{Y_t\}\), the cross-covariance function is \[ \gamma_{XY}(s,t) = \operatorname{Cov}(X_s,Y_t) = \mathbb{E}\!\left[(X_s-\mu_{Xs})(Y_t-\mu_{Yt})\right], \] with corresponding cross-correlation \[ \rho_{XY}(s,t) = \frac{\gamma_{XY}(s,t)} {\sqrt{\gamma_X(s,s)\,\gamma_Y(t,t)}}. \]
For a multivariate time series \((X_{1t},\ldots,X_{rt})\), \[ \gamma_{jk}(s,t) = \operatorname{Cov}(X_{js},X_{kt}), \qquad j,k=1,\ldots,r. \]
In general, these quantities may depend on both \(s\) and \(t\). When they depend only on the separation \(|t-s|\), this leads to the notion of stationarity.
A time series \(\{X_t\}\) is strictly stationary if, for all \(t_1,\ldots,t_n\) and all time shifts \(h\),
\[ (X_{t_1},\ldots,X_{t_n}) \overset{d}{=} (X_{t_1+h},\ldots,X_{t_n+h}), \]
that is, all finite-dimensional distributions are invariant to time shifts.
Strict stationarity has two immediate consequences (when the relevant moments exist). The mean function is constant:
\[ \mu_s=\mu_t=\mu. \]
The autocovariance is invariant to time shifts:
\[ \gamma(s,t)=\gamma(s+h,t+h) \quad \text{for all } s,t,h. \] Hence, taking \(h=-s\),
\[ \gamma(s,t)=\gamma(0,t-s) \equiv \gamma(t-s). \]
A time series \(\{X_t\}\) is weakly stationary if
\[ \mathbb{E}(X_t) = \mu; \]
\[ \gamma(s,t) = \gamma(t-s). \]
Under weak stationarity, the autocovariance function can be written as
\[ \gamma(h) = \operatorname{Cov}(X_{t+h},X_t) = \mathbb{E}\!\left[(X_{t+h}-\mu)(X_t-\mu)\right]. \]
The corresponding autocorrelation function (ACF) is
\[ \rho(h) = \frac{\gamma(t+h,t)} {\sqrt{\gamma(t+h,t+h)\,\gamma(t,t)}} = \frac{\gamma(h)}{\gamma(0)}. \]
Exercise (Concept Check). Show that the autocorrelation function satisfies \[ -1 \le \rho(h) \le 1 \quad \text{for all } h. \]
Hint: By Cauchy–Schwarz inequality,
\[ \big|\mathbb{E}[UV]\big| \;\le\; \sqrt{\mathbb{E}[U^2]\,\mathbb{E}[V^2]}, \]
applied with \(U = X_{t+h}-\mu\) and \(V = X_t-\mu\).
White noise is weakly stationary, since
\[ \mathbb{E}(w_t)=0, \qquad \operatorname{Var}(w_t)=\sigma_w^2, \qquad \operatorname{Cov}(w_t,w_s)=0 \ (t\neq s), \]
so the mean is constant and the autocovariance \(\gamma_w(h)=\sigma_w^2\mathbf{1}\{h=0\}\) depends only on the lag.
Moving average: \(x_t = \tfrac{1}{3}\big(w_{t-1} + w_t + w_{t+1}\big)\)
\[ \mu_t = 0. \]
\[ \gamma_x(h)= \begin{cases} \tfrac{3}{9}\sigma_w^2, & h=0,\\ \tfrac{2}{9}\sigma_w^2, & h=\pm1,\\ \tfrac{1}{9}\sigma_w^2, & h=\pm2,\\ 0, & |h|>2, \end{cases} \]
which depends only on \(h\), so the moving average is weakly stationary.
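A quick simulation check of these values (a sketch; the seed and sample size are illustrative, and \(\sigma_w^2 = 1\) is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
w = rng.normal(size=n + 2)                 # white noise with sigma_w^2 = 1

# Three-point moving average x_t = (w_{t-1} + w_t + w_{t+1}) / 3
x = (w[:-2] + w[1:-1] + w[2:]) / 3
xc = x - x.mean()

# Empirical autocovariances at lags 0..3; theory: 3/9, 2/9, 1/9, 0
for h in range(4):
    gamma_hat = (xc[h:] * xc[:n - h]).sum() / n
    print(h, round(gamma_hat, 4))
```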
Example (trend plus stationary series). Let \(x_t = \alpha + \beta t + y_t\), where \(\{y_t\}\) is weakly stationary with mean \(\mu_y\).
Mean: \[ \mu_{x,t} = \alpha + \beta t + \mu_{y}. \]
Autocovariance: \[ \gamma_x(h)=\gamma_y(h). \]
The autocovariance depends only on the lag, but the mean depends on \(t\), so \(x_t\) is not weakly stationary.
Let \(\{X_t\}\) and \(\{Y_t\}\) be two time series.
They are said to be jointly (weakly) stationary if both \(\{X_t\}\) and \(\{Y_t\}\) are weakly stationary, and the cross-covariance
\[ \gamma_{XY}(h) = \operatorname{Cov}(X_{t+h},Y_t) = \mathbb{E}\!\left[(X_{t+h}-\mu_X)(Y_t-\mu_Y)\right]. \] depends only on the lag \(h\).
The corresponding cross-correlation function (CCF) is \[ \rho_{XY}(h) = \frac{\gamma_{XY}(h)} {\sqrt{\gamma_X(0)\,\gamma_Y(0)}}. \]
Exercise. Show that \(\rho_{XY}(h) = \rho_{YX}(-h)\).
Consider two jointly stationary series \(\{x_t\}\) and \(\{y_t\}\) related by \[ y_t = A x_{t-\ell} + w_t, \] where \(w_t\) is noise uncorrelated with \(\{x_t\}\).
The cross-covariance satisfies \[ \begin{aligned} \gamma_{yx}(h) &= \operatorname{Cov}(y_{t+h}, x_t) \\ &= \operatorname{Cov}(A x_{t+h-\ell} + w_{t+h}, x_t) \\ &= A\,\gamma_x(h-\ell). \end{aligned} \]
Hence, \(\gamma_{yx}(h)\) peaks at \(h=\ell\) (Why?)
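A short simulation of the peak (a sketch: the lag \(\ell=3\), amplitude \(A=2\), seed, and sample size are illustrative, and \(x_t\) is taken to be white noise here so that \(\gamma_x(h-\ell)\) vanishes except at \(h=\ell\)):

```python
import numpy as np

rng = np.random.default_rng(1)
n, A, ell = 5000, 2.0, 3

u = rng.normal(size=n + ell)              # driving white noise series
x = u[ell:ell + n]                        # x_t
y = A * u[:n] + rng.normal(size=n)        # y_t = A * x_{t-ell} + w_t

yc, xc = y - y.mean(), x - x.mean()
for h in range(7):
    gamma_yx = (yc[h:] * xc[:n - h]).sum() / n   # sample Cov(y_{t+h}, x_t)
    print(h, round(gamma_yx, 3))
# Output: near zero at every lag except h = ell, where it is near A.
```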
A time series \(\{x_t\}\) is called a linear process if it can be written as
\[ x_t = \mu + \sum_{j=-\infty}^{\infty} \psi_j \, w_{t-j}, \qquad \sum_{j=-\infty}^{\infty} |\psi_j| < \infty, \] where \(\{w_t\}\) is white noise.
Its autocovariance function is given by
\[ \gamma_x(h) = \sigma_w^2 \sum_{j=-\infty}^{\infty} \psi_{j+h}\, \psi_j \] for \(h\geq 0\), with \(\gamma_x(-h)=\gamma_x(h)\).
For the existence of finite second moments, it is sufficient to assume \(\sum_{j=-\infty}^{\infty} \psi_j^2 < \infty\).
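As a consistency check, the three-point moving average from earlier is a linear process with \(\psi_{-1}=\psi_0=\psi_1=\tfrac{1}{3}\) and \(\psi_j=0\) otherwise, and the formula reproduces the autocovariances computed before:
\[
\gamma_x(0) = \sigma_w^2\big(\psi_{-1}^2+\psi_0^2+\psi_1^2\big) = \tfrac{3}{9}\sigma_w^2,
\qquad
\gamma_x(1) = \sigma_w^2\big(\psi_0\psi_{-1}+\psi_1\psi_0\big) = \tfrac{2}{9}\sigma_w^2,
\qquad
\gamma_x(2) = \sigma_w^2\,\psi_1\psi_{-1} = \tfrac{1}{9}\sigma_w^2.
\]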
A process \(\{x_t\}\) is called a Gaussian process if, for every collection of distinct time points \(t_1, t_2, \ldots, t_n\) and every \(n\), the random vector \[ (x_{t_1}, x_{t_2}, \ldots, x_{t_n}) \] has a multivariate normal distribution.
For a Gaussian process, the joint distribution is completely characterized by the mean function \(\mu = (\mu_{t_1}, \mu_{t_2}, \ldots, \mu_{t_n})\) and the covariance matrix \(\Gamma = \{\gamma(t_i,t_j); i,j = 1, \ldots, n\}\).
The multivariate normal density function can be written as
\[ f(x) = (2\pi)^{-n/2} |\Gamma|^{-1/2} \exp\!\left\{ -\tfrac12 (x-\mu)' \Gamma^{-1} (x-\mu) \right\}. \]
In time series analysis, we observe a single realization
\[ x_{t_1}, x_{t_2}, \ldots, x_{t_n}, \] of a stochastic process \[ \{X_t : t \in \mathbb{T}\}. \]
With a single path, there is only one observation per time point, so moments such as \(\mu_t\) and \(\gamma(s,t)\) cannot be estimated date by date. Hence, we impose stationarity, which lets us average over time!
Under stationarity, \(\mu_t = \mu\) is constant and we can estimate it by the sample mean
\[ \bar{x} = \frac{1}{n} \sum_{t=1}^{n} x_t. \]
The variance of the sample mean is given by
\[ \begin{aligned} \operatorname{Var}(\bar{x}) &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{t=1}^n x_t\right) \\ &= \frac{1}{n^2}\operatorname{Cov}\!\left(\sum_{t=1}^n x_t,\ \sum_{s=1}^n x_s\right) \\ &= \frac{1}{n^2} \Big( n\gamma_x(0) + (n-1)\gamma_x(1) + (n-2)\gamma_x(2) + \cdots + \gamma_x(n-1) \\ &\qquad\quad + (n-1)\gamma_x(-1) + (n-2)\gamma_x(-2) + \cdots + \gamma_x(1-n) \Big) \\ &= \frac{1}{n}\sum_{h=-n}^{n} \left(1-\frac{|h|}{n}\right)\gamma_x(h). \end{aligned} \]
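In the i.i.d. special case, \(\gamma_x(h)=0\) for \(h \neq 0\) and the expression collapses to the familiar \(\operatorname{Var}(\bar{x}) = \gamma_x(0)/n\); positive autocovariances inflate the variance of \(\bar{x}\) relative to this benchmark.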
The sample autocovariance function is defined as
\[ \hat{\gamma}(h) = \frac{1}{n} \sum_{t=1}^{n-h} (x_{t+h}-\bar{x})(x_t-\bar{x}), \qquad h=0,1,\ldots,n-1, \]
with \[ \hat{\gamma}(-h)=\hat{\gamma}(h). \]
Note. The estimator \(\hat{\gamma}(h)\) is not unbiased for \(\gamma(h)\) in finite samples. However, it shares an important structural property with \(\gamma(h)\): non-negative definiteness. For a stationary process and any constants \(a_1,\ldots,a_n\),
\[ \operatorname{Var}\!\left(\sum_{t=1}^n a_t X_t\right) = \sum_{j=1}^n \sum_{k=1}^n a_j a_k \,\gamma(j-k) \;\ge\;0. \]
The sample analogue replaces \(\gamma(\cdot)\) by \(\hat\gamma(\cdot)\) and defines
\[ Q_n(a) \;\equiv\; \sum_{j=1}^n \sum_{k=1}^n a_j a_k \,\hat{\gamma}(j-k) = a^\top \hat{\Gamma} a, \qquad \hat{\Gamma}=\big(\hat{\gamma}(j-k)\big)_{j,k}. \]
With the \(1/n\) normalization above, setting \(z_u = x_u - \bar x\) for \(1 \le u \le n\) and \(z_u = 0\) otherwise,
\[ Q_n(a) = \frac{1}{n}\sum_{s=1-n}^{n-1} \left( \sum_{t=1}^n a_t \, z_{t-s} \right)^2 \;\ge\;0, \]
so \(\hat{\Gamma}\) is non-negative definite.
This guarantee may fail if \(\hat{\gamma}(h)\) is defined using \(1/(n-h)\) instead of \(1/n\).
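A small numerical illustration of this point (a sketch; the seed and sample size are arbitrary, and `sample_acov` is a helper defined here, not a library function):

```python
import numpy as np

def sample_acov(x, h, denom="n"):
    """Sample autocovariance at lag h >= 0, with 1/n or 1/(n-h) normalization."""
    n, xc = len(x), x - x.mean()
    s = (xc[h:] * xc[:len(x) - h]).sum()
    return s / n if denom == "n" else s / (n - h)

rng = np.random.default_rng(7)
x = rng.normal(size=30)
n = len(x)

for denom in ("n", "n-h"):
    g = [sample_acov(x, h, denom) for h in range(n)]
    Gamma = np.array([[g[abs(j - k)] for k in range(n)] for j in range(n)])
    # With 1/n the smallest eigenvalue is >= 0 (up to rounding error);
    # with 1/(n-h) it can be strictly negative, so Gamma_hat may fail PSD.
    print(denom, np.linalg.eigvalsh(Gamma).min())
```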
The sample autocorrelation function is defined as
\[ \hat{\rho}(h)= \frac{\hat \gamma(h)}{\hat \gamma(0)}. \]
Even when a time series has no autocorrelation, its sample ACF will generally not be zero.
For a white noise process \(\{x_t\}\), and for any fixed lag \(h>0\),
\[ \hat\rho_x(h) \;\approx\; \mathrm N\!\left(0,\; \frac{1}{n}\right) \qquad \text{for large } n. \]
As a practical rule of thumb,
\[ \Pr\!\left(\,|\hat\rho(h)| \le \frac{2}{\sqrt{n}}\,\right) \;\approx\; 0.95. \]
The dashed lines in the ACF plot correspond to \[ \pm \frac{2}{\sqrt{n}}. \]
Sample autocorrelations outside these bounds are unlikely to arise from white noise.
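A sketch of this rule in action (the seed, \(n\), and number of lags are illustrative):

```python
import numpy as np

rng = np.random.default_rng(123)
n = 400
x = rng.normal(size=n)                     # white noise
xc = x - x.mean()

# Sample ACF at lags 1..20 and the +/- 2/sqrt(n) bands
rho = np.array([(xc[h:] * xc[:n - h]).sum() / (xc @ xc) for h in range(1, 21)])
band = 2 / np.sqrt(n)
print(np.mean(np.abs(rho) <= band))        # roughly 0.95 of lags fall inside
```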
Let \(\{x_t\}\) and \(\{y_t\}\) be two observed time series with sample means \(\bar x\) and \(\bar y\).
The sample cross-covariance function is defined as
\[ \hat\gamma_{xy}(h) = \frac{1}{n} \sum_{t=1}^{n-h} (x_{t+h}-\bar x)(y_t-\bar y), \qquad h = 0,1,\ldots,n-1. \]
For negative lags, \[ \hat\gamma_{xy}(-h) = \hat\gamma_{yx}(h). \]
The sample cross-correlation function is
\[ \hat\rho_{xy}(h) = \frac{\hat\gamma_{xy}(h)} {\sqrt{\hat\gamma_x(0)\,\hat\gamma_y(0)}}. \]
Under the white noise benchmark, \(\hat \rho_{xy}(h)\) is approximately normal with mean zero and standard deviation
\[ \sigma_{\hat \rho_{xy}} = \frac{1}{\sqrt{n}}, \]
provided at least one of the two processes is independent white noise.
Null hypothesis
\[ H_0:\quad \gamma_{xy}(h) = 0 \quad \text{for all } h \]
Alternative hypothesis
\[ H_1:\quad \gamma_{xy}(h) \neq 0 \quad \text{for some } h \]
The dashed lines \(\pm 2/\sqrt{n}\) provide a valid reference only if at least one of the series is independent white noise.
Since neither series here is white noise, this benchmark does not apply.
The apparent peaks in the sample CCF may reflect spurious dependence induced by serial correlation within each series.
The example here: two noisy cosines shifted by five time units,
\[ x_t = 2\cos\left(\frac{2\pi t}{12}\right) + \varepsilon_t, \qquad y_t = 2\cos\left(\frac{2\pi (t+5)}{12}\right) + \eta_t, \]
where \(\varepsilon_t\) and \(\eta_t\) are independent noise series.
Prewhitening: remove predictable serial structure from one series. Here, remove the seasonal component of \(y_t\), i.e., regress \(y_t\) on \(\cos(2\pi t/12)\) and \(\sin(2\pi t/12)\) and use the residuals, as in the sketch below.
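A sketch of the whole exercise on the cosine example (the noise distributions, seed, and sample size are illustrative assumptions; `ccf` is a local helper, not a library call):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
t = np.arange(n)

# The two lagged-cosine series from the example, with independent noise
x = 2 * np.cos(2 * np.pi * t / 12) + rng.normal(size=n)
y = 2 * np.cos(2 * np.pi * (t + 5) / 12) + rng.normal(size=n)

def ccf(a, b, h):
    """Sample cross-correlation rho_ab(h) between a_{t+h} and b_t, h >= 0."""
    ac, bc = a - a.mean(), b - b.mean()
    return (ac[h:] * bc[:len(b) - h]).sum() / len(a) / np.sqrt(ac.var() * bc.var())

# Prewhiten y: regress on the seasonal sinusoids and keep the residuals
Z = np.column_stack([np.ones(n), np.cos(2 * np.pi * t / 12), np.sin(2 * np.pi * t / 12)])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
y_pw = y - Z @ beta

print([round(ccf(y, x, h), 2) for h in range(12)])     # strong seasonal ripple
print([round(ccf(y_pw, x, h), 2) for h in range(12)])  # near zero after prewhitening
```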
In practice, the serial dependence structure of the data is unknown.
Prewhitening requires specifying and estimating a model to remove this dependence.
Model choice is data-dependent and introduces additional estimation uncertainty that the \(\pm 2/\sqrt{n}\) benchmark does not account for, which undermines standard CCF-based inference.
Takeaway: cross-correlation analysis is exploratory, and its results should be taken with a grain of salt.
In many applications, we are interested in the relationships between multiple time series.
Instead of a scalar process \(\{X_t\}\), we consider a vector-valued process \[ \mathbf{X}_t = \begin{pmatrix} X_{1t} \\ X_{2t} \\ \vdots \\ X_{rt} \end{pmatrix}, \qquad t \in \mathbb{T}. \]
Each component \(X_{jt}\) is itself a time series, and dependence may arise both within each component over time and across components.
The mean vector is \[ \boldsymbol{\mu}=\mathbb{E}(\mathbf{X}_t) = \begin{pmatrix} \mathbb{E}(X_{1t}) \\ \mathbb{E}(X_{2t}) \\ \vdots \\ \mathbb{E}(X_{rt}) \end{pmatrix}. \]
The cross-covariance matrix function is \[ \boldsymbol{\Gamma}(h) = \mathbb{E}\!\left[(\mathbf{X}_{t+h}-\boldsymbol{\mu})(\mathbf{X}_t-\boldsymbol{\mu})'\right]. \]
Its \((i,j)\) entry is \[ \gamma_{ij}(h)=\mathbb{E}\!\left[(X_{i,t+h}-\mu_i)(X_{j,t}-\mu_j)\right]. \]
Because \(\gamma_{ij}(h)=\gamma_{ji}(-h)\), the matrix satisfies \[ \boldsymbol{\Gamma}(-h)=\boldsymbol{\Gamma}(h)'. \]
Let \(\mathbf{x}_t = (x_{1t},\ldots,x_{rt})'\) be an observed \(r\)-dimensional time series with sample mean
\[ \bar{\mathbf{x}} = \frac{1}{n}\sum_{t=1}^n \mathbf{x}_t . \]
The sample autocovariance matrix at lag \(h\) is defined as
\[ \widehat{\boldsymbol{\Gamma}}(h) = \frac{1}{n} \sum_{t=1}^{n-h} (\mathbf{x}_{t+h}-\bar{\mathbf{x}}) (\mathbf{x}_t-\bar{\mathbf{x}})' , \qquad h = 0,1,\ldots,n-1. \]
For negative lags, \[ \widehat{\boldsymbol{\Gamma}}(-h) = \widehat{\boldsymbol{\Gamma}}(h)' . \]
The \((i,j)\) entry of \(\widehat{\boldsymbol{\Gamma}}(h)\) is the sample cross-covariance between \(x_{i,t+h}\) and \(x_{j,t}\).
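A direct implementation of this estimator (a sketch; the data here are simulated placeholders):

```python
import numpy as np

def sample_Gamma(X, h):
    """Sample autocovariance matrix Gamma_hat(h) for an (n x r) data matrix X."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    return Xc[h:].T @ Xc[:n - h] / n      # (i, j) entry pairs x_{i,t+h} with x_{j,t}

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))             # n = 500 observations of an r = 3 vector
G1 = sample_Gamma(X, 1)

# The relation Gamma_hat(-h) = Gamma_hat(h)' means negative lags need no
# separate computation: G1.T plays the role of Gamma_hat(-1).
```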
So far, we have studied time series: stochastic processes indexed by time, with scalar or vector values.
More generally, a stochastic process can be indexed by multiple dimensions.
Example: Soil surface temperature measured on a spatial grid.
Let \(x_{s_1,s_2}\) denote soil surface temperature at grid location \((s_1,s_2)\).
This defines a spatial stochastic process
\[ \{ x_s : s \in \mathbb{Z}^2 \}. \]
The multidimensional autocovariance function is defined as \[ \gamma(\mathbf{h}) = \mathbb{E}\!\left[ (x_{\mathbf{s}+\mathbf{h}}-\mu)(x_{\mathbf{s}}-\mu) \right], \] where \[ \mathbf{h} = (h_1, h_2, \ldots, h_r) \] is a vector of lags in each dimension.
For a two-dimensional process (rows and columns), \[ \gamma(h_1, h_2) = \mathbb{E}\!\left[ (x_{s_1+h_1,\, s_2+h_2}-\mu) (x_{s_1,\, s_2}-\mu) \right]. \]
The autocovariance depends on spatial displacement, not absolute location.
The sample autocovariance function is
\[ \hat{\gamma}(h_1,h_2) = \frac{1}{S_1 S_2} \sum_{s_1=1}^{S_1-h_1} \sum_{s_2=1}^{S_2-h_2} (x_{s_1+h_1,\;s_2+h_2}-\bar{x}) (x_{s_1,\;s_2}-\bar{x}), \]
where the sample mean is
\[ \bar{x} = \frac{1}{S_1 S_2} \sum_{s_1=1}^{S_1} \sum_{s_2=1}^{S_2} x_{s_1,s_2}. \]
The sample autocorrelation function is
\[ \hat{\rho}(h_1,h_2) = \frac{\hat{\gamma}(h_1,h_2)}{\hat{\gamma}(0,0)}. \]
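The two-dimensional estimator translates directly into code (a sketch on a simulated grid, for nonnegative lags as in the formula above):

```python
import numpy as np

def spatial_acov(x, h1, h2):
    """Sample autocovariance gamma_hat(h1, h2) on an S1 x S2 grid (h1, h2 >= 0)."""
    S1, S2 = x.shape
    xc = x - x.mean()
    return (xc[h1:, h2:] * xc[:S1 - h1, :S2 - h2]).sum() / (S1 * S2)

rng = np.random.default_rng(3)
grid = rng.normal(size=(64, 64))          # placeholder for, e.g., soil temperature
rho_10 = spatial_acov(grid, 1, 0) / spatial_acov(grid, 0, 0)   # rho_hat(1, 0)
```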
A symmetric moving average of the data is
\[ m_t = \sum_{j=-k}^{k} a_j x_{t-j}, \]
where \(a_j = a_{-j}\) and \(\sum_{j=-k}^{k} a_j = 1\).
Kernel smoothing uses
\[ m_t = \sum_{i=1}^{n} w_i(t)\, x_i, \quad w_i(t) = K\!\left(\frac{t - i}{b}\right) \Big/ \sum_{j=1}^{n} K\!\left(\frac{t - j}{b}\right), \]
where \(K(\cdot)\) is a kernel function, typically the normal kernel
\[ K(z) = (2\pi)^{-1/2}\,\exp\!\left(- z^2/2 \right), \]
and \(b\) is a bandwidth controlling the amount of smoothing; a sketch follows.
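A direct implementation of this smoother (a sketch; the constant \((2\pi)^{-1/2}\) cancels in the normalized weights and is omitted, and the series and bandwidth are illustrative):

```python
import numpy as np

def kernel_smooth(x, b):
    """Kernel smoother m_t = sum_i w_i(t) x_i with a normal kernel, bandwidth b."""
    n = len(x)
    t = np.arange(n)
    K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / b) ** 2)   # K((t - i) / b)
    W = K / K.sum(axis=1, keepdims=True)                      # rows sum to one
    return W @ x

rng = np.random.default_rng(4)
x = np.cumsum(rng.normal(size=200))       # a rough series to smooth
m = kernel_smooth(x, b=10.0)              # larger b gives a smoother m_t
```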
We smooth a scatterplot of mortality \((M_t)\) as a function of temperature \((T_t)\) using LOWESS (locally weighted scatterplot smoothing).
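A minimal sketch with statsmodels, on synthetic stand-ins for the mortality and temperature series (the real data are not reproduced here; the quadratic signal and noise are assumptions for illustration):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(6)
temp = rng.uniform(50, 100, size=500)                    # stand-in for T_t
mort = 0.01 * (temp - 75) ** 2 + rng.normal(size=500)    # stand-in for M_t

# LOWESS fit of mortality on temperature; returns (temp, smoothed mort)
# pairs sorted by temperature, using a 2/3 span of nearest neighbors.
fit = lowess(mort, temp, frac=2 / 3)
```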