Cointegration and Error Correction Models

Natasha Kang

Xiamen University, Chow Institute

March 2026

Can we remove a stochastic trend by detrending?

Consider \[ y_t = y_{t-1} + \alpha_0 + \epsilon_t, \] where \(\epsilon_t\) is stationary.


This can be written equivalently as \[ y_t = y_0 + \alpha_0 t + \sum_{i=1}^t \epsilon_i . \]


Subtracting the deterministic component \(y_0 + \alpha_0 t\) removes the linear drift, but the accumulated shocks \(\sum_{i=1}^t \epsilon_i\) remain. Therefore, detrending cannot remove a stochastic trend.
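This is easy to verify by simulation. A minimal sketch (Python with numpy; the drift and sample sizes are illustrative): detrend simulated random walks with drift by OLS and note that the residual variance keeps growing with the sample length, which would not happen for a trend-stationary series.

```python
import numpy as np

rng = np.random.default_rng(0)

def detrended_variance(T, reps=300, alpha0=0.5):
    """Average sample variance of a random walk with drift after OLS detrending."""
    X = np.column_stack([np.ones(T), np.arange(1, T + 1)])  # intercept + linear trend
    variances = []
    for _ in range(reps):
        # y_t = y_{t-1} + alpha0 + eps_t, started at zero
        y = np.cumsum(alpha0 + rng.standard_normal(T))
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)        # OLS detrending
        variances.append((y - X @ beta).var())
    return float(np.mean(variances))

v_short, v_long = detrended_variance(100), detrended_variance(800)
# For a trend-stationary series this ratio would be near 1; here the
# accumulated shocks remain, so the detrended variance grows with T.
print(v_long / v_short)
```

The ratio is well above one because the accumulated shocks \(\sum_i \epsilon_i\) survive the detrending.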

Choosing the Right Transformation

The appropriate transformation depends on the nature of the trend.

  • If the trend is deterministic:
    • detrending is sufficient
    • differencing is inappropriate and leads to over-differencing
  • If the trend is stochastic:
    • detrending fails
    • differencing may be required to restore stationarity


In practice, the nature of the trend is unknown.

This motivates the need for tools that help distinguish deterministic from stochastic trends — namely, unit root testing.

A Critique of Differencing

Routine differencing changes the object of analysis by removing low-frequency variation that is often central to macroeconomic questions.


While differencing restores stationarity, it eliminates low-frequency components of the data.

As a result:

  • permanent movements in levels are removed
  • long-run comovement between variables is no longer visible
  • relationships in levels cannot be identified


After differencing, regression coefficients describe how changes in the dependent variable are related to changes in the regressors, not how the levels of the variables move together over time.


This concern is emphasized in Christopher A. Sims’ critique of routine differencing and pre-testing in macroeconometric practice.

What If We Do Not Difference?

Consider a regression using variables in levels:

\[ y_t = \beta x_t + u_t, \] where \(x_t\) and \(y_t\) are nonstationary.


For example, consider an OLS regression in levels of one persistent series on another.



Table: OLS regression in levels

|Term        | Estimate| Std. Error| t value| p-value|
|:-----------|--------:|----------:|-------:|-------:|
|(Intercept) |  -190.03|      7.69| -24.70|  <1e-04|
|co2         |     0.60|      0.02|  25.33|  <1e-04|

Can We Trust Statistical Association in Levels?

In the late 19th and early 20th centuries, statistics was primarily used to describe and compare social conditions over time.


Governments, churches, and public institutions began collecting long time series on:

  • church attendance, as a measure of religiosity and moral behavior
  • drunkenness or alcohol consumption, as indicators of social disorder
  • poverty, population, and mortality, as measures of social welfare


As these long time series became available, researchers naturally began using correlation to assess whether different indicators moved together over time.


Correlation was often interpreted as evidence consistent with a causal relationship, and used to inform policy or intervention.

These associations were often:

  • large in magnitude
  • stable over time
  • regarded as strong and non-negligible by the statistical standards of the time


The difficulty was not that any single relationship was obviously absurd, but that almost everything appeared related to everything else.


Yule (1926) was among the first to articulate this problem clearly.


He noted that when time series exhibit strong persistence, large correlations can arise mechanically, even when the underlying series are unrelated.


Yule referred to this phenomenon as “nonsense correlation.”

Spurious Regression

As regression methods became standard in applied work (particularly from the 1960s onward), researchers increasingly studied relationships between time series in levels.


A typical specification was:

\[ y_t = \beta x_t + u_t. \]


Regression was often viewed as more informative than correlation, and results were frequently interpreted at face value.


Granger and Newbold (1974) showed that even when \(x_t\) and \(y_t\) are unrelated, regressions in levels can exhibit:

  • large \(t\)-statistics
  • high \(R^2\)
  • persistent residuals


This phenomenon became known as spurious regression.
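The Granger and Newbold finding is easy to reproduce. A minimal sketch (numpy only; the sample size and replication count are illustrative) regresses independent random walks on each other and records how often the conventional \(t\)-test rejects at the nominal 5% level:

```python
import numpy as np

rng = np.random.default_rng(1)
T, reps = 100, 500
reject = 0
for _ in range(reps):
    x = np.cumsum(rng.standard_normal(T))     # two independent random walks
    y = np.cumsum(rng.standard_normal(T))
    X = np.column_stack([np.ones(T), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (T - 2)              # OLS error variance estimate
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    reject += abs(beta[1] / se) > 1.96        # nominal 5% two-sided t-test
print(reject / reps)  # far above 0.05 despite no true relationship
```

The rejection rate is many times the nominal level, and it grows with \(T\) rather than shrinking.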

Illustration: Independent Persistent Series



Table: Regression in levels of two independent persistent series

|            | Estimate| Std. Error| t value| p-value|
|:-----------|--------:|----------:|-------:|-------:|
|(Intercept) |   -2.790|      0.347|   -8.03|  <1e-04|
|x           |   -0.762|      0.039|  -19.44|  <1e-04|

(Figures: the regression \(R^2\) and a time plot of the regression residuals.)

Nonstationary Residuals

The problem of spurious regression lies in the residual \(\{\hat{\varepsilon}_t\}\) (ignoring the intercept):

\[ \hat{\varepsilon}_t = y_t - \hat{\beta} x_t . \]


Suppose the variables follow random walks: \[ y_t = \sum_{i=1}^t v_i, \qquad x_t = \sum_{i=1}^t u_i, \] where \(\{u_i\}\) and \(\{v_i\}\) are independent, mean-zero innovations.


Then the residual behaves like \[ \hat{\varepsilon}_t \;\approx\; \sum_{i=1}^t v_i \;-\; \hat{\beta} \sum_{i=1}^t u_i . \]


  • The residual process is nonstationary (contains a unit root component).
  • This violates the assumptions underlying standard OLS inference.

Illustration: Same Data, After Differencing



Table: Regression in first differences

|            | Estimate| Std. Error| t value| p-value|
|:-----------|--------:|----------:|-------:|-------:|
|(Intercept) |    0.063|      0.069|   0.908|   0.365|
|diff(x)     |    0.044|      0.071|   0.616|   0.539|

When Is a Levels Regression Not Spurious?

So far, we have seen that:

  • regressions in levels with nonstationary data are unreliable
  • differencing removes the problem, but also removes long-run information


There is, however, an important exception.


If a linear combination of nonstationary variables is stationary, then a regression in levels can be meaningful.

Cointegration

Suppose \(x_t\) and \(y_t\) are both nonstationary.


They are said to be cointegrated if there exists a coefficient \(\beta\) such that

\[ y_t - \beta x_t \]

is stationary.


  • the variables may wander individually
  • but they do not drift arbitrarily far apart
  • deviations from the long-run relationship are stable
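A simulated example makes this concrete (numpy; the value \(\beta = 2\) and unit innovation variances are illustrative choices): each series wanders, but the combination \(y_t - \beta x_t\) stays bounded.

```python
import numpy as np

rng = np.random.default_rng(2)
T, beta = 500, 2.0
x = np.cumsum(rng.standard_normal(T))   # common I(1) stochastic trend
z = rng.standard_normal(T)              # stationary equilibrium error
y = beta * x + z                        # y inherits the trend in x

# x and y are individually nonstationary, but y - beta*x is not:
# its sample variance stays near Var(z) = 1, while the paths of x
# and y have much larger dispersion.
print(np.var(x), np.var(y - beta * x))
```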

Cointegration: Economic Motivation

Before the statistical consequences of nonstationarity were widely understood, empirical macroeconomics routinely analyzed key variables in levels.


By the late 1970s, this practice faced a tension:

  • many macroeconomic time series appeared to be nonstationary
  • differencing restored statistical validity
  • but economic analysis often focused on long-run relationships in levels


Examples of relationships historically analyzed in levels include:

  • consumption and income, studied as long-run co-moving aggregates
  • money supply and the price level, central to monetary equilibrium analysis
  • nominal interest rates at different maturities, where yield spreads are often stable
  • exchange rates and relative prices, motivated by long-run parity conditions

Consumption and Income (Permanent Income Hypothesis)

The permanent income hypothesis (Friedman) states that households choose consumption based on long-run (permanent) income, not on short-run income fluctuations.


Income is decomposed as \[ y_t = y_t^{p} + y_t^{tr}, \] where:

  • \(y_t^{p}\) is permanent income
  • \(y_t^{tr}\) is transitory income

Consumption is decomposed analogously: \[ c_t = c_t^{p} + c_t^{tr}. \]


The key behavioral assumption of PIH is: \[ c_t^{p} = \beta \, y_t^{p}, \] with transitory consumption \(c_t^{tr}\) assumed to be stationary.


This implies \[ c_t = \beta y_t^{p} + c_t^{tr}, \] so that \(c_t\) and \(y_t^{p}\) may be nonstationary, but deviations from their long-run relationship are stable.

Monetary Equilibrium (Money, Prices, and Income)

In the long run, the money market clears: \[ \text{money supply} = \text{money demand}. \]


The behavioral side of this equilibrium is given by the liquidity preference theory of money demand: \[ \frac{M_t}{P_t} = L(Y_t, i_t), \] where demand for real money balances is increasing in real income and decreasing in the nominal interest rate.


In logs, the long-run equilibrium condition can be written as \[ m_t - p_t = \beta_0 + \beta_1 y_t + \beta_2 i_t + u_t. \]


If money, prices, income, and the interest rate are \(I(1)\), monetary equilibrium requires the disequilibrium term \[ u_t = (m_t - p_t) - \beta_1 y_t - \beta_2 i_t - \beta_0 \] to be stationary.

This implies cointegration among money, prices, income, and the interest rate.

Term Structure of Interest Rates

Let \(r_t^{(n)}\) denote the nominal interest rate on an \(n\)-period bond and \(r_t^{(1)}\) the one-period (short-term) rate.


Term structure theory implies that long and short rates are linked by a stable long-run relationship. In particular, the yield spread \[ s_t^{(n)} \equiv r_t^{(n)} - r_t^{(1)} \] reflects expectations of future short rates and term premia.


Empirically, individual interest rates are often highly persistent and are reasonably characterized as \(I(1)\) processes.


If term structure theory is correct in the long run, the spread \(s_t^{(n)}\) should be stable, implying \[ r_t^{(n)} - r_t^{(1)} \sim I(0). \]

Thus, \(r_t^{(n)}\) and \(r_t^{(1)}\) are cointegrated, with the yield spread as the equilibrium error.

Purchasing Power Parity (PPP)

Purchasing power parity posits a long-run relationship between nominal exchange rates and relative price levels.

In levels, absolute PPP implies \[ S_t = \frac{P_t}{P_t^*}, \] where \(S_t\) is the nominal exchange rate (domestic price of foreign currency).


Taking logs, \[ s_t = p_t - p_t^*. \]

Empirically, exchange rates and price levels are often highly persistent and are reasonably characterized as \(I(1)\) processes.


If PPP holds as a long-run equilibrium condition, the real exchange rate \[ q_t \equiv s_t - (p_t - p_t^*) \] should be stable, implying \[ q_t \sim I(0). \]

Thus, \(s_t\), \(p_t\), and \(p_t^*\) are cointegrated, with the real exchange rate as the equilibrium error.

Cointegration (Engle and Granger, 1987)

Let \(x_t = (x_{1t}, \ldots, x_{kt})'\) be a vector of time series.

The components of \(x_t\) are said to be cointegrated of order \((d,b)\), denoted \(x_t \sim CI(d,b)\), if:


  1. Each component of \(x_t\) is integrated of order \(d\).

  2. There exists a nonzero vector \(\beta \in \mathbb{R}^k\) such that \[ \beta' x_t \sim I(d-b), \qquad b>0. \]


The vector \(\beta\) is called a cointegrating vector. In most macroeconomic applications, we focus on the case \[ CI(1,1), \] where individual series are \(I(1)\) but the equilibrium error \(\beta' x_t\) is stationary.

Non-Uniqueness and Normalization of Cointegrating Vectors

Suppose \(x_t \sim CI(d,b)\) and there exists a cointegrating vector \(\beta \neq 0\) such that \[ \beta' x_t \sim I(d-b). \]


For any nonzero scalar \(c \neq 0\), \[ (c\beta)' x_t = c(\beta' x_t) \sim I(d-b). \]


Cointegrating vectors are not unique. Only the cointegrating space is uniquely defined.


To express the long-run restriction in a convenient form, a normalization is imposed.

A common normalization sets one coefficient equal to one. For example, if \[ \beta_1 y_t + \beta_2 x_t \sim I(0), \] we may normalize on \(y_t\): \[ y_t - \theta x_t \sim I(0), \qquad \theta = -\beta_2 / \beta_1. \]

Different normalizations represent the same equilibrium condition.

Cointegrating Rank

Let \(x_t\) be a \(k \times 1\) vector of time series, with each component integrated of order \(d\).


Suppose there exist \(r\) linearly independent vectors \[ \beta_1, \ldots, \beta_r \] such that \[ \beta_i' x_t \sim I(d-b), \qquad i = 1,\ldots,r. \]

Then \(x_t\) is said to have cointegrating rank \(r\).


When \(r = 1\), the cointegrating vector is unique up to scale (normalization).

When \(r > 1\), there are multiple linearly independent cointegrating relationships.


  • \(r = 0\): no cointegration
  • \(r = 1\): a single long-run equilibrium relationship
  • \(1 < r < k\): multiple long-run equilibrium restrictions
  • \(r = k\): all variables are stationary

Cointegrating Rank: Interpretation

Cointegrating rank is the number of linearly independent stationary relations among a set of nonstationary variables.


Rank can exceed one when multiple long-run relations hold simultaneously in the same system.


For example, in a monetary system:

  • a long-run money demand relation implies \[ m_t - \beta_0 - \beta_1 p_t - \beta_2 y_t - \beta_3 r_t \sim I(0) \]

  • a monetary policy feedback rule, where the central bank adjusts nominal money supply in response to nominal GDP, implies \[ m_t - \gamma_0 + \gamma_1 (y_t + p_t) \sim I(0) \]


Writing \[ x_t = (m_t,\; 1,\; p_t,\; y_t,\; r_t)', \] we follow the standard convention of augmenting the stochastic variables with a constant so that intercepts are included in the cointegrating relations. Cointegration itself concerns the stochastic variables \((m_t,p_t,y_t,r_t)\).


These relations correspond to the cointegrating vectors

\[ \beta^{(1)} = (1 ,\; -\beta_0,\; -\beta_1,\; -\beta_2,\; -\beta_3)', \qquad \beta^{(2)} = (1,\; -\gamma_0,\; \gamma_1,\; \gamma_1,\; 0)'. \]

Since the vectors are linearly independent, the cointegrating rank is \(r=2\).

Cointegration and Error Correction Models

Cointegration is a restriction on long-run behavior.

It implies that while variables may drift over time, they cannot drift arbitrarily far apart.

Equivalently, deviations from the long-run relation must be temporary.


For example, let \[ z_{t-1} \equiv y_{t-1} - \beta x_{t-1} \] denote the equilibrium error.

If \(\{z_t\}\) is stationary, then it is mean reverting: when the system is out of equilibrium, adjustment must occur to restore the long-run relation.


Because the levels themselves are nonstationary, adjustment toward equilibrium is naturally expressed through changes in the variables.

A dynamically coherent specification is therefore \[ \begin{aligned} \Delta y_t &= \alpha_y \, z_{t-1} + \varepsilon_{y,t}, \\ \Delta x_t &= \alpha_x \, z_{t-1} + \varepsilon_{x,t}, \end{aligned} \] which is called an error correction model (ECM).


  • \(z_{t-1}\) measures the extent of disequilibrium
  • \(\alpha_y\) and \(\alpha_x\) are speed-of-adjustment parameters, describing the direction and strength of adjustment
  • If \(\alpha_y = 0\), then \(y_t\) does not respond to disequilibrium. All error correction occurs through \(x_t\). In this case, \(y_t\) is said to be weakly exogenous: it does not participate in error correction.
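A simulated ECM illustrates the adjustment mechanism (numpy; the choices \(\beta = 1\), \(\alpha_y = -0.3\), \(\alpha_x = 0\) are illustrative). Here all error correction occurs through \(y_t\), so \(x_t\) is the weakly exogenous variable:

```python
import numpy as np

rng = np.random.default_rng(4)
T, beta, alpha_y = 1000, 1.0, -0.3
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    z = y[t - 1] - beta * x[t - 1]              # lagged equilibrium error
    x[t] = x[t - 1] + rng.standard_normal()     # alpha_x = 0: x does not adjust
    y[t] = y[t - 1] + alpha_y * z + rng.standard_normal()

# OLS of dy_t on z_{t-1} recovers the speed of adjustment
z_lag = y[:-1] - beta * x[:-1]
dy = np.diff(y)
alpha_hat = float(z_lag @ dy / (z_lag @ z_lag))
print(alpha_hat)  # should be close to -0.3
```

A negative \(\alpha_y\) is what makes the mechanism stabilizing: when \(y\) is above its equilibrium value, \(\Delta y_t\) is pushed down.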

Error Correction Model: General Form

In practice, short-run dynamics may involve lags, intercepts, and additional covariates.

A general single-equation ECM can be written as \[ \Delta y_t = \alpha\, z_{t-1} + c + \sum_{i=1}^p \phi_i \, \Delta y_{t-i} + \sum_{j=0}^q \psi_j \, \Delta x_{t-j} + \varepsilon_t, \] where \[ z_{t-1} = y_{t-1} - \beta x_{t-1}. \]


  • \(z_{t-1}\) captures long-run disequilibrium
  • differenced terms capture short-run dynamics
  • the constant allows for deterministic drift
  • the ECM combines long-run restrictions with short-run flexibility

Implementing an ECM in Practice (Engle–Granger)

An error correction model is typically implemented in two steps.


Step 1. Estimate the long-run relationship

Estimate the cointegrating relation in levels: \[ y_t = \beta x_t + u_t. \]

Obtain the estimated equilibrium error: \[ \hat z_t = y_t - \hat\beta x_t. \]


Step 2. Estimate short-run dynamics

Estimate the ECM using differenced data: \[ \Delta y_t = \alpha \hat z_{t-1} + \sum_{i=1}^p \phi_i \Delta y_{t-i} + \sum_{j=0}^q \psi_j \Delta x_{t-j} + \varepsilon_t. \]
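The two steps can be sketched in code (numpy; the data-generating process with \(\beta = 2\) and the choice \(p = q = 1\) are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 500
x = np.cumsum(rng.standard_normal(T))
y = 2.0 * x + rng.standard_normal(T)    # cointegrated, equilibrium error ~ N(0, 1)

# Step 1: estimate the long-run relation in levels
X1 = np.column_stack([np.ones(T), x])
b, *_ = np.linalg.lstsq(X1, y, rcond=None)
z_hat = y - X1 @ b                      # estimated equilibrium error

# Step 2: estimate the short-run dynamics with p = q = 1
dy, dx = np.diff(y), np.diff(x)
X2 = np.column_stack([z_hat[1:-1],      # z_{t-1}
                      dy[:-1],          # dy_{t-1}
                      dx[1:]])          # dx_t
coef, *_ = np.linalg.lstsq(X2, dy[1:], rcond=None)
alpha_hat, phi_hat, psi_hat = coef
# In this DGP the population values are alpha = -1, phi = 0, psi = 2.
print(alpha_hat, psi_hat)
```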

Cointegration Test: Engle–Granger

Spurious regression arises because the regression residual is nonstationary.


Cointegration reverses this logic.


Suppose we estimate the levels regression \[ y_t = \beta x_t + u_t, \] where \(x_t\) and \(y_t\) are nonstationary.

Define the residual \[ \hat u_t \equiv y_t - \hat\beta x_t. \]


  • If \(\hat u_t\) is nonstationary, the regression is spurious
  • If \(\hat u_t\) is stationary, \(x_t\) and \(y_t\) are cointegrated

The Engle–Granger test implements this idea by testing whether \(\hat u_t\) contains a unit root.


Because \(\hat u_t\) is constructed using an estimated coefficient, the test uses critical values different from the usual Dickey–Fuller case.

A Limitation of the Engle–Granger Approach

Cointegration is a symmetric property:
either \(x_t\) and \(y_t\) share a stationary linear combination, or they do not.

The Engle–Granger procedure is asymmetric by construction because it relies on a single OLS projection.


Consider two independent \(I(1)\) processes: \[ x_t = x_{t-1} + \eta_t, \qquad y_t = y_{t-1} + \xi_t, \] with \(\{\eta_t\}\) and \(\{\xi_t\}\) i.i.d., mean zero, and uncorrelated.

Equivalently, \[ x_t = \sum_{i=1}^t \eta_i, \qquad y_t = \sum_{i=1}^t \xi_i. \]

Estimate the levels regression \[ y_t = \beta x_t + \varepsilon_t \] by OLS.

OLS chooses \(\hat\beta\) to minimize \[ \sum_{t=1}^T (y_t - \beta x_t)^2, \] that is, it projects the entire path of \(y_t\) onto the path of \(x_t\).

The resulting residual is \[ \hat\varepsilon_t = y_t - \hat\beta x_t = \sum_{i=1}^t \xi_i - \hat\beta \sum_{i=1}^t \eta_i = \sum_{i=1}^t (\xi_i - \hat\beta \eta_i). \]


If instead we reverse the roles and estimate \[ x_t = \gamma y_t + \nu_t, \] OLS now projects the path of \(x_t\) onto the path of \(y_t\), producing \[ \hat\nu_t = \sum_{i=1}^t (\eta_i - \hat\gamma \xi_i), \] which is a different stochastic process.

Why test outcomes may differ

  • Under no cointegration, every linear combination of the variables, and hence any regression residual, is nonstationary in population.

  • The Engle–Granger test is a finite-sample procedure: it evaluates whether the estimated residual appears sufficiently mean-reverting.

  • Different OLS projections absorb different amounts of low-frequency variation, so residuals can exhibit different degrees of persistence in finite samples.

  • As a result, unit root tests applied to these residuals can yield different outcomes, even though the underlying variables are not cointegrated.


Key point

  • Cointegration itself is a symmetric property
  • The Engle–Granger procedure is asymmetric because it relies on a single OLS projection
  • Hence, the test outcome can depend on which variable is treated as dependent