What Makes Time Series Special?
Xiamen University, Chow Institute
March, 2026
A time series is a sequence of observations indexed in time:
\[ x_1, x_2, \ldots, x_T \]
Examples: GDP, stock prices, daily temperatures — any measurements recorded sequentially over time.
Ordering in time matters!
Cross-Sectional Data
Cross-sectional data consist of multiple units observed at one point in time
(e.g., households, firms, individuals, countries).
A key modeling assumption is that the data come from random variables:
\[ X_1, X_2, \ldots, X_n \quad \text{ i.i.d.} \]
Why i.i.d.?
These assumptions are reasonable when the dataset is a
collection of different units: reshuffling the dataset changes nothing.
Time Series Data
Time series track the same unit over time:
\[ x_1, x_2, \ldots, x_T \]
This breaks the usual cross-sectional assumptions:
Not independent: past values typically influence future values.
Not identically distributed: levels, variance, or dynamics may evolve over time.
Order matters: reshuffling destroys patterns, dependence, trends, or cycles.
Therefore, we need a more general probabilistic framework that accommodates ordered data and systematic dependence across time.
To model ordered data with possible dependence, we treat a time series as a realization of a stochastic process.
A stochastic process is a collection of random variables:
\[ \{ X_t : t \in \mathbb{T} \}, \] where \(\mathbb{T}\) is an ordered index set (e.g., time).
The observed series is one realization:
\[ x_t = X_t(\omega), \qquad t = 1, \ldots, T. \]
Modeling the data as a stochastic process allows us to describe the joint distribution of the sequence \(\{X_t\}\), and therefore to capture dependence across time.
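To make "one realization" concrete, here is a minimal NumPy sketch (the process, seed, and length are illustrative choices, not from the slides): two draws of \(\omega\) give two different sample paths of the same stochastic process.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 100

# One stochastic process, two realizations: each call draws a new omega,
# i.e., a new sample path of the same random walk X_t = w_1 + ... + w_t.
path_1 = np.cumsum(rng.normal(size=T))
path_2 = np.cumsum(rng.normal(size=T))

# Both paths share the same joint distribution, but as observed data they
# look entirely different; real data hand us just one such path.
```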
The primary goal of time series analysis is to develop statistical models that provide plausible descriptions of data.
In practice, this means specifying a joint distribution for \((X_1, \ldots, X_T)\) through simple, structured models, such as:
White noise (no dependence across time)
Moving average models (dependence driven by past shocks)
Autoregressive models (dependence driven by past values)
Random walks and trends (persistent or nonstationary behavior)
Models with time-varying variance (changing volatility over time)
These models serve as building blocks for more general specifications.
In principle, a time series is completely described by the joint distribution of \((X_1, \ldots, X_T)\):
\[ F_{t_1,\dots,t_n}(c_1,\dots,c_n) = \Pr(X_{t_1} \le c_1,\dots,X_{t_n} \le c_n). \]
In practice, specifying and working with such joint distributions becomes increasingly difficult as the dimension \(n\) grows.
Instead, we look at the marginal distribution function \[ F_t(x) = \Pr(X_t \le x), \] or the corresponding marginal density function \[ f_t(x) = \frac{\partial F_t(x)}{\partial x}. \]
Another informative marginal descriptive measure is the mean function, defined as \[ \mu_t = \mathbb{E}[X_t] = \int_{-\infty}^{\infty} x \, f_t(x)\, dx, \] provided it exists.
Temporal dependence is described by second-order moments. The autocovariance function, provided it exists, is defined as
\[ \gamma(s,t) = \operatorname{Cov}(X_s, X_t) = \mathbb{E}\!\left[(X_s-\mu_s)(X_t-\mu_t)\right]. \]
For \(s=t\), the autocovariance reduces to the variance: \[ \gamma(t,t) = \operatorname{Var}(X_t). \]
For linear combinations, if \(U=\sum_{j=1}^m a_j X_j\) and \(V=\sum_{k=1}^r b_k Y_k\), then \[ \operatorname{Cov}(U,V) = \sum_{j=1}^m \sum_{k=1}^r a_j b_k \operatorname{Cov}(X_j,Y_k). \]
A white noise process \(\{w_t\}\) is a sequence of uncorrelated random variables with
\[ \mathbb{E}(w_t) = 0, \qquad \operatorname{Var}(w_t) = \sigma_w^2, \]
and
\[ \operatorname{Cov}(w_t, w_s) = 0 \quad \text{for } t \neq s. \]
White noise is the most basic building block in time series analysis.
Example (moving average). Smoothing white noise with a three-point average gives
\[ x_t = \tfrac{1}{3}\big(w_{t-1} + w_t + w_{t+1}\big) \]
Mean: \[ \mu_t = \mathbb{E}(x_t) = 0. \]
Autocovariance?
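Applying the covariance formula for linear combinations answers the prompt: only overlapping noise terms contribute, so
\[
\gamma(t,t) = \tfrac{1}{9}\operatorname{Var}(w_{t-1}+w_t+w_{t+1}) = \tfrac{3}{9}\sigma_w^2,
\qquad
\gamma(t+1,t) = \tfrac{1}{9}\operatorname{Cov}\big(w_t+w_{t+1}+w_{t+2},\, w_{t-1}+w_t+w_{t+1}\big) = \tfrac{2}{9}\sigma_w^2,
\]
and similarly \(\gamma(t+2,t) = \tfrac{1}{9}\sigma_w^2\), while \(\gamma(s,t)=0\) for \(|s-t|>2\).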
Example (random walk with drift).
\[ x_t = \delta t + \sum_{j=1}^{t} w_j \]
Mean: \[ \mu_t = \mathbb{E}(x_t) = \delta t. \]
Autocovariance: \[ \gamma(s,t) = \operatorname{Cov}(x_s,x_t) = \sigma_w^2 \min(s,t). \]
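To see this, note that the drift term is deterministic and drops out of covariances; for \(s \le t\),
\[
\gamma(s,t) = \operatorname{Cov}\!\left(\sum_{j=1}^{s} w_j,\ \sum_{k=1}^{t} w_k\right) = \sum_{j=1}^{s}\operatorname{Var}(w_j) = s\,\sigma_w^2,
\]
so \(\gamma(s,t)=\sigma_w^2\min(s,t)\). In particular, \(\gamma(t,t)=t\sigma_w^2\) grows with \(t\): the random walk is nonstationary.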
Example (signal plus noise).
\[ x_t = s_t + w_t, \qquad s_t \text{ deterministic}. \]
Mean: \[ \mu_t = \mathbb{E}(x_t) = s_t. \]
Autocovariance: \[ \gamma(s,t) = \operatorname{Cov}(x_s,x_t) = \sigma_w^2 \mathbf{1}\{s=t\}. \]
As in classical statistics, it is more convenient to work with scaled measures of association between \(-1\) and \(1\).
The autocorrelation function (ACF) is defined as \[ \rho(s,t) = \frac{\gamma(s,t)} {\sqrt{\gamma(s,s)\,\gamma(t,t)}}, \] provided the variances are finite.
For two series \(\{X_t\}\) and \(\{Y_t\}\), the cross-covariance function is \[ \gamma_{XY}(s,t) = \operatorname{Cov}(X_s,Y_t) = \mathbb{E}\!\left[(X_s-\mu_{Xs})(Y_t-\mu_{Yt})\right], \] with corresponding cross-correlation \[ \rho_{XY}(s,t) = \frac{\gamma_{XY}(s,t)} {\sqrt{\gamma_X(s,s)\,\gamma_Y(t,t)}}. \]
For a multivariate time series \((X_{1t},\ldots,X_{rt})\), \[ \gamma_{jk}(s,t) = \operatorname{Cov}(X_{js},X_{kt}), \qquad j,k=1,\ldots,r. \]
In general, these quantities may depend on both \(s\) and \(t\). When they depend only on the separation \(|t-s|\), this leads to the notion of stationarity.
A time series \(\{X_t\}\) is strictly stationary if, for all \(t_1,\ldots,t_n\) and all time shifts \(h\),
\[ (X_{t_1},\ldots,X_{t_n}) \overset{d}{=} (X_{t_1+h},\ldots,X_{t_n+h}), \]
that is, all finite-dimensional distributions are invariant to time shifts.
Strict stationarity has two immediate consequences (when the relevant moments exist). The mean function is constant:
\[ \mu_s=\mu_t=\mu. \]
The autocovariance is invariant to time shifts:
\[ \gamma(s,t)=\gamma(s+h,t+h) \quad \text{for all } s,t,h. \] Hence, taking \(h=-s\),
\[ \gamma(s,t)=\gamma(0,t-s) \equiv \gamma(t-s). \]
A time series \(\{X_t\}\) is weakly stationary if
\[ \mathbb{E}(X_t) = \mu; \]
\[ \gamma(s,t) = \gamma(t-s). \]
Under weak stationarity, the autocovariance function can be written as
\[ \gamma(h) = \operatorname{Cov}(X_{t+h},X_t) = \mathbb{E}\!\left[(X_{t+h}-\mu)(X_t-\mu)\right]. \]
The corresponding autocorrelation function (ACF) is
\[ \rho(h) = \frac{\gamma(t+h,t)} {\sqrt{\gamma(t+h,t+h)\,\gamma(t,t)}} = \frac{\gamma(h)}{\gamma(0)}. \]
Exercise (Concept Check). Show that the autocorrelation function satisfies \[ -1 \le \rho(h) \le 1 \quad \text{for all } h. \]
Hint: By Cauchy–Schwarz inequality,
\[ \big|\mathbb{E}[UV]\big| \;\le\; \sqrt{\mathbb{E}[U^2]\,\mathbb{E}[V^2]}, \]
applied with \(U = X_{t+h}-\mu\) and \(V = X_t-\mu\).
White noise is weakly stationary, since
\[ \mathbb{E}(w_t)=0, \qquad \operatorname{Var}(w_t)=\sigma_w^2, \qquad \operatorname{Cov}(w_t,w_s)=0 \ (t\neq s), \]
so the mean is constant and the autocovariance \(\gamma_w(h)=\sigma_w^2\mathbf{1}\{h=0\}\) depends only on the lag.
Moving average: \(x_t = \tfrac{1}{3}\big(w_{t-1} + w_t + w_{t+1}\big)\)
\[ \mu_t = 0. \]
\[ \gamma_x(h)= \begin{cases} \tfrac{3}{9}\sigma_w^2, & h=0,\\ \tfrac{2}{9}\sigma_w^2, & h=\pm1,\\ \tfrac{1}{9}\sigma_w^2, & h=\pm2,\\ 0, & |h|>2, \end{cases} \]
which depends only on \(h\), so the moving average is weakly stationary.
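A quick simulation check of these values (a sketch; the seed and sample size are illustrative, and \(\sigma_w^2 = 1\) is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
w = rng.normal(size=n + 2)                 # white noise with sigma_w^2 = 1

# Three-point moving average x_t = (w_{t-1} + w_t + w_{t+1}) / 3
x = (w[:-2] + w[1:-1] + w[2:]) / 3
xc = x - x.mean()

# Empirical autocovariances at lags 0..3; theory: 3/9, 2/9, 1/9, 0
for h in range(4):
    gamma_hat = (xc[h:] * xc[:n - h]).sum() / n
    print(h, round(gamma_hat, 4))
```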
Example (trend plus stationary series). Let \(x_t = \alpha + \beta t + y_t\), where \(\{y_t\}\) is weakly stationary with mean \(\mu_y\).
Mean: \[ \mu_{x,t} = \alpha + \beta t + \mu_{y}. \]
Autocovariance: \[ \gamma_x(h)=\gamma_y(h). \]
The autocovariance depends only on the lag, but the mean depends on \(t\), so \(x_t\) is not weakly stationary.
Let \(\{X_t\}\) and \(\{Y_t\}\) be two time series.
They are said to be jointly (weakly) stationary if both \(\{X_t\}\) and \(\{Y_t\}\) are weakly stationary, and the cross-covariance
\[ \gamma_{XY}(h) = \operatorname{Cov}(X_{t+h},Y_t) = \mathbb{E}\!\left[(X_{t+h}-\mu_X)(Y_t-\mu_Y)\right]. \] depends only on the lag \(h\).
The corresponding cross-correlation function (CCF) is \[ \rho_{XY}(h) = \frac{\gamma_{XY}(h)} {\sqrt{\gamma_X(0)\,\gamma_Y(0)}}. \]
Exercise. Show that \(\rho_{XY}(h) = \rho_{YX}(-h)\).
Consider two jointly stationary series \(\{x_t\}\) and \(\{y_t\}\) related by \[ y_t = A x_{t-\ell} + w_t, \] where \(w_t\) is noise uncorrelated with \(\{x_t\}\).
The cross-covariance satisfies \[ \begin{aligned} \gamma_{yx}(h) &= \operatorname{Cov}(y_{t+h}, x_t) \\ &= \operatorname{Cov}(A x_{t+h-\ell} + w_{t+h}, x_t) \\ &= A\,\gamma_x(h-\ell). \end{aligned} \]
Hence, \(\gamma_{yx}(h)\) peaks at \(h=\ell\) (Why?)
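A short simulation of the peak (a sketch: the lag \(\ell=3\), amplitude \(A=2\), seed, and sample size are illustrative, and \(x_t\) is taken to be white noise here so that \(\gamma_x(h-\ell)\) vanishes except at \(h=\ell\)):

```python
import numpy as np

rng = np.random.default_rng(1)
n, A, ell = 5000, 2.0, 3

u = rng.normal(size=n + ell)              # driving white noise series
x = u[ell:ell + n]                        # x_t
y = A * u[:n] + rng.normal(size=n)        # y_t = A * x_{t-ell} + w_t

yc, xc = y - y.mean(), x - x.mean()
for h in range(7):
    gamma_yx = (yc[h:] * xc[:n - h]).sum() / n   # sample Cov(y_{t+h}, x_t)
    print(h, round(gamma_yx, 3))
# Output: near zero at every lag except h = ell, where it is near A.
```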
A time series \(\{x_t\}\) is called a linear process if it can be written as
\[ x_t = \mu + \sum_{j=-\infty}^{\infty} \psi_j \, w_{t-j}, \qquad \sum_{j=-\infty}^{\infty} |\psi_j| < \infty, \] where \(\{w_t\}\) is white noise.
Its autocovariance function is given by
\[ \gamma_x(h) = \sigma_w^2 \sum_{j=-\infty}^{\infty} \psi_{j+h}\, \psi_j \] for \(h\geq 0\), with \(\gamma_x(-h)=\gamma_x(h)\).
For the existence of finite second moments, it is sufficient to assume \(\sum_{j=-\infty}^{\infty} \psi_j^2 < \infty\).
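As a consistency check, the three-point moving average from earlier is a linear process with \(\psi_{-1}=\psi_0=\psi_1=\tfrac{1}{3}\) and \(\psi_j=0\) otherwise, and the formula reproduces the autocovariances computed before:
\[
\gamma_x(0) = \sigma_w^2\big(\psi_{-1}^2+\psi_0^2+\psi_1^2\big) = \tfrac{3}{9}\sigma_w^2,
\qquad
\gamma_x(1) = \sigma_w^2\big(\psi_0\psi_{-1}+\psi_1\psi_0\big) = \tfrac{2}{9}\sigma_w^2,
\qquad
\gamma_x(2) = \sigma_w^2\,\psi_1\psi_{-1} = \tfrac{1}{9}\sigma_w^2.
\]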
A process \(\{x_t\}\) is called a Gaussian process if, for every collection of distinct time points \(t_1, t_2, \ldots, t_n\) and every \(n\), the random vector \[ (x_{t_1}, x_{t_2}, \ldots, x_{t_n}) \] has a multivariate normal distribution.
For a Gaussian process, the joint distribution is completely characterized by the mean function \(\mu = (\mu_{t_1}, \mu_{t_2}, \ldots, \mu_{t_n})\) and the covariance matrix \(\Gamma = \{\gamma(t_i,t_j); i,j = 1, \ldots, n\}\).
The multivariate normal density function can be written as
\[ f(x) = (2\pi)^{-n/2} |\Gamma|^{-1/2} \exp\!\left\{ -\tfrac12 (x-\mu)' \Gamma^{-1} (x-\mu) \right\}. \]
In time series analysis, we observe a single realization
\[ x_{t_1}, x_{t_2}, \ldots, x_{t_n}, \] of a stochastic process \[ \{X_t : t \in \mathbb{T}\}. \]
With a single path, there is only one observation per time point, so moments such as \(\mu_t\) and \(\gamma(s,t)\) cannot be estimated date by date. Hence, we impose stationarity, which lets us average over time!
Under stationarity, \(\mu_t = \mu\) is constant and we can estimate it by the sample mean
\[ \bar{x} = \frac{1}{n} \sum_{t=1}^{n} x_t. \]
The variance of the sample mean is given by
\[ \begin{aligned} \operatorname{Var}(\bar{x}) &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{t=1}^n x_t\right) \\ &= \frac{1}{n^2}\operatorname{Cov}\!\left(\sum_{t=1}^n x_t,\ \sum_{s=1}^n x_s\right) \\ &= \frac{1}{n^2} \Big( n\gamma_x(0) + (n-1)\gamma_x(1) + (n-2)\gamma_x(2) + \cdots + \gamma_x(n-1) \\ &\qquad\quad + (n-1)\gamma_x(-1) + (n-2)\gamma_x(-2) + \cdots + \gamma_x(1-n) \Big) \\ &= \frac{1}{n}\sum_{h=-n}^{n} \left(1-\frac{|h|}{n}\right)\gamma_x(h). \end{aligned} \]
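In the i.i.d. special case, \(\gamma_x(h)=0\) for \(h \neq 0\) and the expression collapses to the familiar \(\operatorname{Var}(\bar{x}) = \gamma_x(0)/n\); positive autocovariances inflate the variance of \(\bar{x}\) relative to this benchmark.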
The sample autocovariance function is defined as
\[ \hat{\gamma}(h) = \frac{1}{n} \sum_{t=1}^{n-h} (x_{t+h}-\bar{x})(x_t-\bar{x}), \qquad h=0,1,\ldots,n-1, \]
with \[ \hat{\gamma}(-h)=\hat{\gamma}(h). \]
Note. The estimator \(\hat{\gamma}(h)\) is not unbiased for \(\gamma(h)\) in finite samples. However, it shares an important structural property with \(\gamma(h)\): non-negative definiteness. For a stationary process and any constants \(a_1,\ldots,a_n\),
\[ \operatorname{Var}\!\left(\sum_{t=1}^n a_t X_t\right) = \sum_{j=1}^n \sum_{k=1}^n a_j a_k \,\gamma(j-k) \;\ge\;0. \]
The sample analogue replaces \(\gamma(\cdot)\) by \(\hat\gamma(\cdot)\) and defines
\[ Q_n(a) \;\equiv\; \sum_{j=1}^n \sum_{k=1}^n a_j a_k \,\hat{\gamma}(j-k) = a^\top \hat{\Gamma} a, \qquad \hat{\Gamma}=\big(\hat{\gamma}(j-k)\big)_{j,k}. \]
With the \(1/n\) normalization above, setting \(z_u = x_u - \bar x\) for \(1 \le u \le n\) and \(z_u = 0\) otherwise,
\[ Q_n(a) = \frac{1}{n}\sum_{s=1-n}^{n-1} \left( \sum_{t=1}^n a_t \, z_{t-s} \right)^2 \;\ge\;0, \]
so \(\hat{\Gamma}\) is non-negative definite.
This guarantee may fail if \(\hat{\gamma}(h)\) is defined using \(1/(n-h)\) instead of \(1/n\).
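A small numerical illustration of this point (a sketch; the seed and sample size are arbitrary, and `sample_acov` is a helper defined here, not a library function):

```python
import numpy as np

def sample_acov(x, h, denom="n"):
    """Sample autocovariance at lag h >= 0, with 1/n or 1/(n-h) normalization."""
    n, xc = len(x), x - x.mean()
    s = (xc[h:] * xc[:len(x) - h]).sum()
    return s / n if denom == "n" else s / (n - h)

rng = np.random.default_rng(7)
x = rng.normal(size=30)
n = len(x)

for denom in ("n", "n-h"):
    g = [sample_acov(x, h, denom) for h in range(n)]
    Gamma = np.array([[g[abs(j - k)] for k in range(n)] for j in range(n)])
    # With 1/n the smallest eigenvalue is >= 0 (up to rounding error);
    # with 1/(n-h) it can be strictly negative, so Gamma_hat may fail PSD.
    print(denom, np.linalg.eigvalsh(Gamma).min())
```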
The sample autocorrelation function is defined as
\[ \hat{\rho}(h)= \frac{\hat \gamma(h)}{\hat \gamma(0)}. \]
Even when a time series has no autocorrelation, its sample ACF will generally not be zero.
For a white noise process \(\{x_t\}\), and for any fixed lag \(h>0\),
\[ \hat\rho_x(h) \;\approx\; \mathrm N\!\left(0,\; \frac{1}{n}\right) \qquad \text{for large } n. \]
As a practical rule of thumb,
\[ \Pr\!\left(\,|\hat\rho(h)| \le \frac{2}{\sqrt{n}}\,\right) \;\approx\; 0.95. \]
The dashed lines in the ACF plot correspond to \[ \pm \frac{2}{\sqrt{n}}. \]
Sample autocorrelations outside these bounds are unlikely to arise from white noise.
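A sketch of this rule in action (the seed, \(n\), and number of lags are illustrative):

```python
import numpy as np

rng = np.random.default_rng(123)
n = 400
x = rng.normal(size=n)                     # white noise
xc = x - x.mean()

# Sample ACF at lags 1..20 and the +/- 2/sqrt(n) bands
rho = np.array([(xc[h:] * xc[:n - h]).sum() / (xc @ xc) for h in range(1, 21)])
band = 2 / np.sqrt(n)
print(np.mean(np.abs(rho) <= band))        # roughly 0.95 of lags fall inside
```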
Let \(\{x_t\}\) and \(\{y_t\}\) be two observed time series with sample means \(\bar x\) and \(\bar y\).
The sample cross-covariance function is defined as
\[ \hat\gamma_{xy}(h) = \frac{1}{n} \sum_{t=1}^{n-h} (x_{t+h}-\bar x)(y_t-\bar y), \qquad h = 0,1,\ldots,n-1. \]
For negative lags, \[ \hat\gamma_{xy}(-h) = \hat\gamma_{yx}(h). \]
The sample cross-correlation function is
\[ \hat\rho_{xy}(h) = \frac{\hat\gamma_{xy}(h)} {\sqrt{\hat\gamma_x(0)\,\hat\gamma_y(0)}}. \]
Under the white noise benchmark, \(\hat \rho_{xy}(h)\) is approximately normal with mean zero and standard deviation
\[ \sigma_{\hat \rho_{xy}} = \frac{1}{\sqrt{n}}, \]
provided at least one of the two processes is independent white noise.
Null hypothesis
\[ H_0:\quad \gamma_{xy}(h) = 0 \quad \text{for all } h \]
Alternative hypothesis
\[ H_1:\quad \gamma_{xy}(h) \neq 0 \quad \text{for some } h \]
The dashed lines \(\pm 2/\sqrt{n}\) provide a valid reference only if at least one of the series is independent white noise.
Since neither series here is white noise, this benchmark does not apply.
The apparent peaks in the sample CCF may reflect spurious dependence induced by serial correlation within each series.
The example here: two noisy cosines shifted by five time units,
\[ x_t = 2\cos\left(\frac{2\pi t}{12}\right) + \varepsilon_t, \qquad y_t = 2\cos\left(\frac{2\pi (t+5)}{12}\right) + \eta_t, \]
where \(\varepsilon_t\) and \(\eta_t\) are independent noise series.
Prewhitening: remove predictable serial structure from one series. Here, remove the seasonal component of \(y_t\), i.e., regress \(y_t\) on \(\cos(2\pi t/12)\) and \(\sin(2\pi t/12)\) and use the residuals, as in the sketch below.
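A sketch of the whole exercise on the cosine example (the noise distributions, seed, and sample size are illustrative assumptions; `ccf` is a local helper, not a library call):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
t = np.arange(n)

# The two lagged-cosine series from the example, with independent noise
x = 2 * np.cos(2 * np.pi * t / 12) + rng.normal(size=n)
y = 2 * np.cos(2 * np.pi * (t + 5) / 12) + rng.normal(size=n)

def ccf(a, b, h):
    """Sample cross-correlation rho_ab(h) between a_{t+h} and b_t, h >= 0."""
    ac, bc = a - a.mean(), b - b.mean()
    return (ac[h:] * bc[:len(b) - h]).sum() / len(a) / np.sqrt(ac.var() * bc.var())

# Prewhiten y: regress on the seasonal sinusoids and keep the residuals
Z = np.column_stack([np.ones(n), np.cos(2 * np.pi * t / 12), np.sin(2 * np.pi * t / 12)])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
y_pw = y - Z @ beta

print([round(ccf(y, x, h), 2) for h in range(12)])     # strong seasonal ripple
print([round(ccf(y_pw, x, h), 2) for h in range(12)])  # near zero after prewhitening
```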
In practice, the serial dependence structure of the data is unknown.
Prewhitening requires specifying and estimating a model to remove this dependence.
Model choice is data-dependent and introduces additional estimation uncertainty that the \(\pm 2/\sqrt{n}\) benchmark does not account for, which undermines standard CCF-based inference.
Takeaway: cross-correlation analysis is exploratory, and its results should be taken with a grain of salt.
In many applications, we are interested in the relationships between multiple time series.
Instead of a scalar process \(\{X_t\}\), we consider a vector-valued process \[ \mathbf{X}_t = \begin{pmatrix} X_{1t} \\ X_{2t} \\ \vdots \\ X_{rt} \end{pmatrix}, \qquad t \in \mathbb{T}. \]
Each component \(X_{jt}\) is itself a time series, and dependence may arise both within each component over time and across components.
The mean vector is \[ \boldsymbol{\mu}=\mathbb{E}(\mathbf{X}_t) = \begin{pmatrix} \mathbb{E}(X_{1t}) \\ \mathbb{E}(X_{2t}) \\ \vdots \\ \mathbb{E}(X_{rt}) \end{pmatrix}. \]
The cross-covariance matrix function is \[ \boldsymbol{\Gamma}(h) = \mathbb{E}\!\left[(\mathbf{X}_{t+h}-\boldsymbol{\mu})(\mathbf{X}_t-\boldsymbol{\mu})'\right]. \]
Its \((i,j)\) entry is \[ \gamma_{ij}(h)=\mathbb{E}\!\left[(X_{i,t+h}-\mu_i)(X_{j,t}-\mu_j)\right]. \]
Because \(\gamma_{ij}(h)=\gamma_{ji}(-h)\), the matrix satisfies \[ \boldsymbol{\Gamma}(-h)=\boldsymbol{\Gamma}(h)'. \]
Let \(\mathbf{x}_t = (x_{1t},\ldots,x_{rt})'\) be an observed \(r\)-dimensional time series with sample mean
\[ \bar{\mathbf{x}} = \frac{1}{n}\sum_{t=1}^n \mathbf{x}_t . \]
The sample autocovariance matrix at lag \(h\) is defined as
\[ \widehat{\boldsymbol{\Gamma}}(h) = \frac{1}{n} \sum_{t=1}^{n-h} (\mathbf{x}_{t+h}-\bar{\mathbf{x}}) (\mathbf{x}_t-\bar{\mathbf{x}})' , \qquad h = 0,1,\ldots,n-1. \]
For negative lags, \[ \widehat{\boldsymbol{\Gamma}}(-h) = \widehat{\boldsymbol{\Gamma}}(h)' . \]
The \((i,j)\) entry of \(\widehat{\boldsymbol{\Gamma}}(h)\) is the sample cross-covariance between \(x_{i,t+h}\) and \(x_{j,t}\).
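A direct implementation of this estimator (a sketch; the data here are simulated placeholders):

```python
import numpy as np

def sample_Gamma(X, h):
    """Sample autocovariance matrix Gamma_hat(h) for an (n x r) data matrix X."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    return Xc[h:].T @ Xc[:n - h] / n      # (i, j) entry pairs x_{i,t+h} with x_{j,t}

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))             # n = 500 observations of an r = 3 vector
G1 = sample_Gamma(X, 1)

# The relation Gamma_hat(-h) = Gamma_hat(h)' means negative lags need no
# separate computation: G1.T plays the role of Gamma_hat(-1).
```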
So far, we have studied time series: stochastic processes indexed by time, with scalar or vector values.
More generally, a stochastic process can be indexed by multiple dimensions.
Example: Soil surface temperature measured on a spatial grid.
Let \(x_{s_1,s_2}\) denote soil surface temperature at grid location \((s_1,s_2)\).
This defines a spatial stochastic process
\[ \{ x_s : s \in \mathbb{Z}^2 \}. \]
The multidimensional autocovariance function is defined as \[ \gamma(\mathbf{h}) = \mathbb{E}\!\left[ (x_{\mathbf{s}+\mathbf{h}}-\mu)(x_{\mathbf{s}}-\mu) \right], \] where \[ \mathbf{h} = (h_1, h_2, \ldots, h_r) \] is a vector of lags in each dimension.
For a two-dimensional process (rows and columns), \[ \gamma(h_1, h_2) = \mathbb{E}\!\left[ (x_{s_1+h_1,\, s_2+h_2}-\mu) (x_{s_1,\, s_2}-\mu) \right]. \]
The autocovariance depends on spatial displacement, not absolute location.
The sample autocovariance function is
\[ \hat{\gamma}(h_1,h_2) = \frac{1}{S_1 S_2} \sum_{s_1=1}^{S_1-h_1} \sum_{s_2=1}^{S_2-h_2} (x_{s_1+h_1,\;s_2+h_2}-\bar{x}) (x_{s_1,\;s_2}-\bar{x}), \]
where the sample mean is
\[ \bar{x} = \frac{1}{S_1 S_2} \sum_{s_1=1}^{S_1} \sum_{s_2=1}^{S_2} x_{s_1,s_2}. \]
The sample autocorrelation function is
\[ \hat{\rho}(h_1,h_2) = \frac{\hat{\gamma}(h_1,h_2)}{\hat{\gamma}(0,0)}. \]
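The two-dimensional estimator translates directly into code (a sketch on a simulated grid, for nonnegative lags as in the formula above):

```python
import numpy as np

def spatial_acov(x, h1, h2):
    """Sample autocovariance gamma_hat(h1, h2) on an S1 x S2 grid (h1, h2 >= 0)."""
    S1, S2 = x.shape
    xc = x - x.mean()
    return (xc[h1:, h2:] * xc[:S1 - h1, :S2 - h2]).sum() / (S1 * S2)

rng = np.random.default_rng(3)
grid = rng.normal(size=(64, 64))          # placeholder for, e.g., soil temperature
rho_10 = spatial_acov(grid, 1, 0) / spatial_acov(grid, 0, 0)   # rho_hat(1, 0)
```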
A symmetric moving average of the data is
\[ m_t = \sum_{j=-k}^{k} a_j x_{t-j}, \]
where \(a_j = a_{-j}\) and \(\sum_{j=-k}^{k} a_j = 1\).
Kernel smoothing uses
\[ m_t = \sum_{i=1}^{n} w_i(t)\, x_i, \quad w_i(t) = K\!\left(\frac{t - i}{b}\right) \Big/ \sum_{j=1}^{n} K\!\left(\frac{t - j}{b}\right), \]
where \(K(\cdot)\) is a kernel function, typically the normal kernel
\[ K(z) = (2\pi)^{-1/2}\,\exp\!\left(- z^2/2 \right), \]
and \(b\) is a bandwidth controlling the amount of smoothing; a sketch follows.
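A direct implementation of this smoother (a sketch; the constant \((2\pi)^{-1/2}\) cancels in the normalized weights and is omitted, and the series and bandwidth are illustrative):

```python
import numpy as np

def kernel_smooth(x, b):
    """Kernel smoother m_t = sum_i w_i(t) x_i with a normal kernel, bandwidth b."""
    n = len(x)
    t = np.arange(n)
    K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / b) ** 2)   # K((t - i) / b)
    W = K / K.sum(axis=1, keepdims=True)                      # rows sum to one
    return W @ x

rng = np.random.default_rng(4)
x = np.cumsum(rng.normal(size=200))       # a rough series to smooth
m = kernel_smooth(x, b=10.0)              # larger b gives a smoother m_t
```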
We smooth a scatterplot of mortality \((M_t)\) as a function of temperature \((T_t)\) using LOWESS (locally weighted scatterplot smoothing).
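A minimal sketch with statsmodels, on synthetic stand-ins for the mortality and temperature series (the real data are not reproduced here; the quadratic signal and noise are assumptions for illustration):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(6)
temp = rng.uniform(50, 100, size=500)                    # stand-in for T_t
mort = 0.01 * (temp - 75) ** 2 + rng.normal(size=500)    # stand-in for M_t

# LOWESS fit of mortality on temperature; returns (temp, smoothed mort)
# pairs sorted by temperature, using a 2/3 span of nearest neighbors.
fit = lowess(mort, temp, frac=2 / 3)
```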