Structural VARs and Identification

Natasha Kang

Xiamen University, Chow Institute

May, 2026

Where We Are

In Unit 7 we built reduced-form VARs:

\[ x_t = c + A_1 x_{t-1} + \cdots + A_p x_{t-p} + u_t, \qquad \mathbb{E}(u_t u_t') = \Sigma_u . \]

These describe the joint dynamics of \(x_t\) — how variables co-move and how the system propagates over time.

This unit asks a different kind of question:

What happens to GDP if the central bank surprises markets with a 25 bp hike today?

Answering it requires an extra layer of structure — identification — that turns reduced-form estimates into causal ones.

Two Questions a VAR Can Answer …

Question 1 — Forecasting. Given everything we know up to today, what is our best guess for next quarter?

\[ \hat x_{t+h\mid t} = \mathbb{E}(x_{t+h} \mid x_t, x_{t-1}, \ldots). \]

The reduced form is sufficient. We never have to ask why variables co-move — only that they do.

Question 2 — Granger causality. Does the past of \(X\) help predict \(Y\) beyond what the past of \(Y\) alone does? In a VAR, this is a joint \(F\)-test that all lag coefficients on \(X\) in the equation for \(Y\) are zero.

The name is misleading: this is about predictability, not causation. A barometer’s reading Granger-causes the weather, but turning its dial does not bring rain.
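Here is a minimal NumPy sketch of the \(F\)-test for the bivariate case. It assumes OLS with an intercept and \(p\) lags of both series. The helper `granger_f` and the simulated data are illustrative and do not come from the slides. The sketch returns only the \(F\) statistic; a p-value would come from the \(F(p,\,T-p-(2p+1))\) distribution, e.g. via `scipy.stats.f.sf`.

```python
import numpy as np

def granger_f(y, x, p):
    """F statistic for H0: the p lags of x add nothing to predicting y,
    given an intercept and p lags of y (hypothetical helper, OLS)."""
    T = len(y)
    Y = y[p:]                                          # y_t for t = p, ..., T-1
    lags_y = np.column_stack([y[p - i:T - i] for i in range(1, p + 1)])
    lags_x = np.column_stack([x[p - i:T - i] for i in range(1, p + 1)])
    ones = np.ones((T - p, 1))
    X_r = np.hstack([ones, lags_y])                    # restricted: own lags only
    X_u = np.hstack([ones, lags_y, lags_x])            # unrestricted: add lags of x

    def ssr(X):
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        return resid @ resid

    ssr_r, ssr_u = ssr(X_r), ssr(X_u)
    df_num, df_den = p, (T - p) - X_u.shape[1]
    return ((ssr_r - ssr_u) / df_num) / (ssr_u / df_den)

# Simulated example: past x predicts y, but past y does not predict x
rng = np.random.default_rng(0)
T = 500
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.standard_normal()

F_x_to_y = granger_f(y, x, p=2)   # large: x Granger-causes y
F_y_to_x = granger_f(x, y, p=2)   # typically near 1: the null holds
```

As the barometer example warns, a large `F_x_to_y` says only that \(x\) is useful for forecasting \(y\).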

… And One It Can’t Yet

Question 3 — Causal counterfactual.

If the central bank surprises markets with a 25 bp hike today, by how much does output fall over the next two years?

This is a question about an intervention, not a forecast.

  • Forecasting: “Given the data we observe today, what do we expect output to look like next year?”
  • Intervention: “If the central bank surprises markets with a 25 bp hike today — not as a response to inflation or output, but as a move on its own — how does output respond?”

These are different objects. The data — even an infinite amount of it — do not tell us the second from the first without extra assumptions.

The “surprise” part of the rate change — what moves on its own, not as a response — is what we’ll call a shock.

What is a “Shock,” Really?

A shock captures a primitive cause:

  • moves on its own
  • not a response to other variables in the model

Equivalently, a shock is a surprise — the part of a variable unpredictable from the information set of the agents in the model.

Two shocks are, by construction, mutually uncorrelated. If two candidates co-move systematically, neither is primitive — something behind both is the real cause.

Terminology. Mutually uncorrelated shocks are commonly called orthogonal shocks in the SVAR literature.

Analogy. A sound mixing board has independent input knobs (bass, treble, vocals). What you hear is a mixture of all knobs at once. The speaker output is observable; the knob settings are what we want to recover.

Reduced-Form Innovations Are Mixtures

The reduced-form innovation \(u_t\) is defined mechanically:

\[ u_t = x_t - \mathbb{E}(x_t \mid x_{t-1}, x_{t-2}, \ldots). \]

It is “today’s \(x_t\) minus its forecast from the past” — itself a surprise, but relative to the econometrician’s information set: the history of the observed series. Nothing about that definition makes the components of \(u_t\) causally distinct.

The relationship to structural shocks is

\[ u_t = B^{-1} \varepsilon_t . \]

Each component \(u_{it}\) is a linear mixture of the underlying primitive shocks \(\varepsilon_t\).

Why That Matters: A Concrete Example

Imagine a tiny macro VAR with two variables: inflation \(\pi_t\) and the policy rate \(i_t\).

Suppose, within a quarter:

  • the central bank reacts to inflation: \(i_t\) moves when \(\pi_t\) moves
  • inflation does not react to the policy rate within the quarter (price stickiness)

Two structural shocks drive the system:

  • \(\varepsilon_{\pi t}\): an inflation shock (e.g., an oil price spike)
  • \(\varepsilon_{it}\): a monetary surprise — the primitive policy shock

The reduced-form residuals can be written in terms of these:

\[ \begin{aligned} u_{\pi t} &= \varepsilon_{\pi t} & &\text{(no within-quarter response of $\pi$ to $i$)} \\ u_{i t} &= \varepsilon_{i t} + \alpha\,\varepsilon_{\pi t} & &\text{(policy responds to inflation, coef. }\alpha\text{)} \end{aligned} \]

So \(u_{it}\) is not the monetary surprise — it is the surprise plus the systematic policy response to the inflation shock.
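A quick simulation makes the contamination visible. This NumPy sketch assumes a hypothetical \(\alpha = 0.5\) and unit-variance shocks. The example's timing assumption makes \(u_{\pi t} = \varepsilon_{\pi t}\), so regressing \(u_{it}\) on \(u_{\pi t}\) recovers \(\alpha\), and the regression residual is the policy surprise. This previews recursive identification. It is not a general recipe.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100_000
alpha = 0.5                                # hypothetical policy-response coefficient

eps_pi = rng.standard_normal(T)            # inflation shock
eps_i = rng.standard_normal(T)             # monetary surprise (primitive policy shock)

u_pi = eps_pi                              # no within-quarter response of pi to i
u_i = eps_i + alpha * eps_pi               # policy residual: surprise + systematic response

# The policy residual is correlated with the inflation shock ...
corr_ui_epspi = np.corrcoef(u_i, eps_pi)[0, 1]   # approx alpha / sqrt(1 + alpha^2)

# ... but because u_pi = eps_pi here, a regression of u_i on u_pi recovers alpha,
# and what is left over is the monetary surprise itself.
alpha_hat = (u_i @ u_pi) / (u_pi @ u_pi)
eps_i_hat = u_i - alpha_hat * u_pi
```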

What’s at Stake

Three questions about the \((\pi_t, i_t)\) system that only the structural form can answer.

1. Policy. What is the effect of a hypothetical policy action?

2. Theory testing. Which transmission mechanism is operating?

3. Counterfactuals. How much of observed inflation came from which source?

1. Policy — The Effect of a Hypothetical Hike

If the central bank surprises with a 25 bp rate hike today — not in response to anything happening in the economy — what happens to inflation over the next two years?

The object we want. The IRF of \(\pi\) to a unit \(\varepsilon_{it}\) shock, horizon by horizon.

Why \(u_{it}\) won’t do. Recall \(u_{it} = \varepsilon_{it} + \alpha\,\varepsilon_{\pi t}\). A “unit shock to \(u_{it}\)” mixes a true policy surprise with the Fed’s mechanical response to an inflation shock. These have different effects on \(\pi\) — averaging them answers no well-defined question.

Where this shows up. Standard monetary-policy IRFs, from Sims (1980) to CEE (1999) to Gertler–Karadi (2015), report the response of inflation (or output) to \(\varepsilon_{it}\) after some identification has separated it from \(\alpha\,\varepsilon_{\pi t}\).

2. Theory Testing — Which Channel Is Operating?

Through what mechanism does a rate hike actually move inflation?

Two stories.

  • Demand channel. Higher rates cool spending; with sticky prices, this feeds into inflation only with a lag — no impact response, then a gradual decline peaking at several quarters.
  • Cost channel. Higher rates raise firms’ financing costs within the period; marginal cost rises, so inflation rises on impact, then falls back as demand effects take over.

The IRF as referee. The two stories disagree about the impact response of \(\pi\) to \(\varepsilon_{it}\):

  • demand-only: about zero on impact, negative at medium horizons
  • cost channel present: positive on impact, negative later

The shape of the IRF at short horizons distinguishes them.

3. Counterfactuals — Decomposing History

Of the inflation observed in 2021–2024, how much came from the Fed’s hiking cycle, and how much from everything else?

The object we want. Write each observed \(\pi_t\) (in deviations from its mean, ignoring initial conditions) as a sum of contributions from the two structural shocks:

\[ \pi_t \;=\; \underbrace{\sum_{j\ge 0}\psi^{(\pi i)}_j\,\varepsilon_{i,t-j}}_{\text{policy contribution}} \;+\; \underbrace{\sum_{j\ge 0}\psi^{(\pi\pi)}_j\,\varepsilon_{\pi,t-j}}_{\text{everything else}} . \]

The counterfactual “no policy shocks” zeros the first sum and recomputes \(\pi_t\) from the second alone.
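Below is a minimal sketch of the decomposition for a VAR(1). It assumes hypothetical coefficients and a known recursive \(B^{-1}\). In practice \(\Psi_j\) and \(\varepsilon_t\) come from the estimated, identified SVAR, and a deterministic and initial-condition term also appears. The sketch starts from \(x_0 = 0\) so that this term vanishes and the two contributions add up exactly to \(\pi_t\).

```python
import numpy as np

# Hypothetical VAR(1) in (pi, i), in deviations from the mean
A1 = np.array([[0.7, -0.1],
               [0.3,  0.8]])
B_inv = np.array([[1.0, 0.0],
                  [0.5, 1.0]])             # recursive impact matrix: pi ordered first

rng = np.random.default_rng(2)
T = 200
eps = rng.standard_normal((T, 2))         # columns: eps_pi, eps_i
x = np.zeros((T, 2))
prev = np.zeros(2)                        # x_0 = 0, so no initial-condition term
for t in range(T):
    x[t] = A1 @ prev + B_inv @ eps[t]
    prev = x[t]

# Structural MA coefficients Psi_j = A1^j B^{-1}
Psi = [np.linalg.matrix_power(A1, j) @ B_inv for j in range(T)]

# contrib[t, k] = sum_j Psi_j[pi, k] * eps_{k, t-j}: contribution of shock k to pi_t
contrib = np.zeros((T, 2))
for t in range(T):
    for j in range(t + 1):
        contrib[t] += Psi[j][0, :] * eps[t - j]

other_part = contrib[:, 0]                # "everything else" (inflation shocks)
policy_part = contrib[:, 1]               # "policy contribution"
counterfactual_pi = other_part            # history with the policy shocks switched off
```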

Where this shows up. Historical decompositions of inflation, output gaps, and exchange rates are staples of central-bank reports and post-mortems on specific episodes (e.g., Bernanke & Blanchard’s 2023 analysis of post-pandemic inflation).

Recall: The Symmetric Dynamic System

All three questions reduce to one problem: how do we recover the structural shocks \(\varepsilon_t\) from the reduced-form residuals \(u_t\)?

Start from the bivariate structural system of Unit 7:

\[ \begin{aligned} y_t &= b_{10} - b_{12} z_t + \gamma_{11} y_{t-1} + \gamma_{12} z_{t-1} + \varepsilon_{yt}, \\ z_t &= b_{20} - b_{21} y_t + \gamma_{21} y_{t-1} + \gamma_{22} z_{t-1} + \varepsilon_{zt}. \end{aligned} \]

The \(\varepsilon_t\) are mutually uncorrelated structural shocks with diagonal covariance \(\Omega_\varepsilon\).

Stacked:

\[ B x_t = \Gamma_0 + \Gamma_1 x_{t-1} + \varepsilon_t, \qquad B = \begin{bmatrix} 1 & b_{12}\\ b_{21} & 1 \end{bmatrix}. \]

\(B\) encodes contemporaneous interactions between variables.

Reduced Form: What We Actually Estimate

Premultiply by \(B^{-1}\):

\[ x_t = B^{-1}\Gamma_0 + B^{-1}\Gamma_1 x_{t-1} + B^{-1}\varepsilon_t . \]

Define \(c = B^{-1}\Gamma_0\), \(A_1 = B^{-1}\Gamma_1\), \(u_t = B^{-1}\varepsilon_t\):

\[ x_t = c + A_1 x_{t-1} + u_t . \]

This is the form we estimate by OLS, equation by equation.

We get:

  • \(\widehat A_1\), the reduced-form dynamics
  • \(\widehat\Sigma_u\), the residual covariance

Neither \(B\) nor \(\Omega_\varepsilon\) appears separately.
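Equation-by-equation OLS amounts to one least-squares fit with a shared regressor matrix. Here is a minimal NumPy sketch that assumes a VAR\((p)\) with an intercept. `fit_var_ols` is a hypothetical helper. The degrees-of-freedom divisor used for \(\widehat\Sigma_u\) is one of several conventions.

```python
import numpy as np

def fit_var_ols(x, p):
    """OLS for x_t = c + A_1 x_{t-1} + ... + A_p x_{t-p} + u_t, x of shape (T, K).
    Returns c (K,), [A_1, ..., A_p], Sigma_u (K, K), residuals (T-p, K)."""
    T, K = x.shape
    Y = x[p:]                                                  # left-hand side, (T-p, K)
    Z = np.hstack([np.ones((T - p, 1))] +
                  [x[p - i:T - i] for i in range(1, p + 1)])   # [1, x_{t-1}, ..., x_{t-p}]
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)               # column k = equation k
    c = coef[0]
    A = [coef[1 + (i - 1) * K:1 + i * K].T for i in range(1, p + 1)]
    U = Y - Z @ coef
    Sigma_u = U.T @ U / (T - p - Z.shape[1])                   # dof-adjusted covariance
    return c, A, Sigma_u, U

# Usage: recover a known VAR(1) from simulated data
rng = np.random.default_rng(3)
A_true = np.array([[0.5, 0.1],
                   [0.2, 0.4]])
T = 5000
x = np.zeros((T, 2))
for t in range(1, T):
    x[t] = A_true @ x[t - 1] + rng.standard_normal(2)
c_hat, A_hat, Sigma_hat, U_hat = fit_var_ols(x, p=1)
```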

Identification: Bivariate Counting

In the bivariate system from Unit 7:

  • \(B = \begin{bmatrix} 1 & b_{12}\\ b_{21} & 1 \end{bmatrix}\) has 2 unknowns
  • \(\Omega_\varepsilon = \mathrm{diag}(\sigma_y^2,\sigma_z^2)\) has 2 unknowns

So 4 structural unknowns.

The reduced form delivers \(\Sigma_u\), a symmetric \(2\times 2\) matrix with 3 distinct moments.

One additional restriction is needed to pin down \(B\), and it must come from outside the data: economic theory or institutional timing.

The Identification Problem, in General

The structural parameters split into two pieces:

  • Dynamics \(\Gamma_0, \ldots, \Gamma_p\): recovered for free once \(B\) is known, via \(\Gamma_0 = Bc\) and \(\Gamma_j = B A_j\).
  • Contemporaneous block \((B, \Omega_\varepsilon)\) — what identification has to pin down.

The data constrain the latter only through

\[ \boxed{\;\Sigma_u = B^{-1}\,\Omega_\varepsilon\,B^{-1\prime}.\;} \]

\(\Sigma_u\) has \(K(K+1)/2\) distinct entries — fewer than the free parameters in \((B, \Omega_\varepsilon)\). Infinitely many structural pairs reproduce the same \(\Sigma_u\), and the data alone cannot distinguish them.

Identification: The General Counting Argument

In general, with \(K\) variables:

  • \(B\) has \(K^2\) entries
  • \(\Omega_\varepsilon\) has \(K\) diagonal entries
  • Total: \(K^2 + K\) structural parameters

Scaling indeterminacy. For any invertible diagonal \(D\),

\[ (B, \Omega_\varepsilon) \quad \text{and} \quad (DB,\, D\Omega_\varepsilon D') \]

produce the same \(u_t\). The scale of each shock is interchangeable with its row of \(B\), so \(K\) of the \(K^2 + K\) parameters are not separately identified.

Effective free parameters: \(K^2\).

The gap. \(\Sigma_u\) has \(K(K+1)/2\) distinct moments, leaving

\[ K^2 - \frac{K(K+1)}{2} = \frac{K(K-1)}{2} \]

substantive restrictions to be supplied from outside.
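Here is a small numerical check of the counting argument. It uses a hypothetical \(B\), \(\Omega_\varepsilon\), and \(D\) and shows that rescaling by any invertible diagonal \(D\) leaves the implied \(\Sigma_u\) unchanged. `counts` is a hypothetical helper that tabulates the parameter bookkeeping for a given \(K\).

```python
import numpy as np

def counts(K):
    """(structural parameters K^2 + K, free after scale normalization K^2,
    distinct moments in Sigma_u, restrictions still needed)."""
    return K * K + K, K * K, K * (K + 1) // 2, K * (K - 1) // 2

def implied_sigma(B, Omega):
    """Sigma_u = B^{-1} Omega B^{-1}'."""
    B_inv = np.linalg.inv(B)
    return B_inv @ Omega @ B_inv.T

B = np.array([[1.0, 0.4],
              [-0.3, 1.0]])                # hypothetical structural matrix
Omega = np.diag([2.0, 0.5])
D = np.diag([3.0, -0.7])                   # any invertible diagonal D

# (B, Omega) and (D B, D Omega D') are observationally equivalent
same = np.allclose(implied_sigma(B, Omega), implied_sigma(D @ B, D @ Omega @ D.T))
```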

Normalization

To do estimation we need a single representative of each equivalence class, not the whole class. Two issues to handle.

Scale. The SVAR convention is to set \(\Omega_\varepsilon = I\) (equivalently, pick \(D = \Omega_\varepsilon^{-1/2}\)). This pins down the \(K\) scale parameters of \(\Omega_\varepsilon\), and the constraint becomes

\[\Sigma_u = B^{-1} B^{-1\prime}.\]

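Sign: purely a convention. Once scale is fixed, flipping the sign of any column of \(B^{-1}\) (replacing \(B^{-1}\) by \(B^{-1}S\), with \(S = \mathrm{diag}(\pm 1, \ldots, \pm 1)\)) leaves \(\Sigma_u\) unchanged, since \(SS' = I\). Convention: require positive diagonal entries on \(B^{-1}\).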

With scale and sign pinned down, the substantive identification problem is the \(K(K-1)/2\) restrictions.

Identification Strategies: A Preview

The \(K(K-1)/2\) restrictions can come from different sources. Four standard strategies:

1. Recursive (Cholesky). Order variables so earlier ones do not respond contemporaneously to later ones’ shocks. Restricts \(B^{-1}\) to be lower triangular.

2. Long-run (Blanchard–Quah). Restrict the cumulative response of certain variables to certain shocks at horizon \(\infty\).

3. Sign restrictions. Require IRFs to satisfy specific signs on impact (or for the first \(h\) horizons). Identifies a set of \(B^{-1}\), not a single point.

4. External instruments (Proxy SVAR). Use an outside variable correlated with one structural shock and orthogonal to the others. Identifies one column of \(B^{-1}\).

Recursive (Cholesky) Identification

Idea. Restrict \(B^{-1}\) to be lower triangular.

The \(\frac{K(K-1)}{2}\) above-diagonal zeros are exactly the right number of restrictions.

With \(\Omega_\varepsilon = I\):

\[ \Sigma_u = B^{-1} B^{-1\prime}, \qquad B^{-1}\ \text{lower triangular with positive diagonal.} \]

This is the Cholesky decomposition of \(\Sigma_u\) — unique.

Cholesky: Bivariate Worked Example

Let

\[ \Sigma_u = \begin{bmatrix} \sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{22} \end{bmatrix}. \]

Solve \(\Sigma_u = B^{-1} B^{-1\prime}\) with \(B^{-1} = \begin{bmatrix} a & 0 \\ b & c \end{bmatrix}\):

\[ \begin{aligned} a^2 &= \sigma_{11} & \Rightarrow\quad a &= \sqrt{\sigma_{11}}, \\ ab &= \sigma_{12} & \Rightarrow\quad b &= \sigma_{12}/\sqrt{\sigma_{11}}, \\ b^2 + c^2 &= \sigma_{22} & \Rightarrow\quad c &= \sqrt{\sigma_{22} - \sigma_{12}^2/\sigma_{11}}. \end{aligned} \]
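The hand formulas can be checked against a library Cholesky. The \(\Sigma_u\) below is hypothetical, chosen so the numbers come out round (\(a = 2\), \(b = 0.6\), \(c = 0.8\)). `numpy.linalg.cholesky` returns the lower-triangular factor with a positive diagonal, which matches the normalization above.

```python
import numpy as np

Sigma_u = np.array([[4.0, 1.2],
                    [1.2, 1.0]])                  # hypothetical residual covariance
s11, s12, s22 = Sigma_u[0, 0], Sigma_u[0, 1], Sigma_u[1, 1]

a = np.sqrt(s11)                                   # 2.0
b = s12 / np.sqrt(s11)                             # 0.6
c = np.sqrt(s22 - s12**2 / s11)                    # sqrt(1 - 0.36) = 0.8
B_inv_hand = np.array([[a, 0.0],
                       [b, c]])

B_inv_np = np.linalg.cholesky(Sigma_u)             # lower-triangular factor
```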

Reading the Triangular Mixing

\[ \begin{aligned} u_{1t} &= a\,\varepsilon_{1t}, \\ u_{2t} &= b\,\varepsilon_{1t} + c\,\varepsilon_{2t}. \end{aligned} \]

  • Variable 1’s residual is a clean rescaling of its own shock.
  • Variable 2’s residual mixes \(\varepsilon_{2t}\) with the within-period spillover from \(\varepsilon_{1t}\).

Causal direction. On impact, \(\varepsilon_{1t}\) moves both variables; \(\varepsilon_{2t}\) moves only variable 2. Shocks flow \(1 \to 2\) within the period — never \(2 \to 1\).

Defining the Structural IRF

The reduced-form MA representation:

\[ x_t = \sum_{j=0}^{\infty} \Phi_j\, u_{t-j}, \qquad \Phi_0 = I. \]

Substitute \(u_{t-j} = B^{-1}\varepsilon_{t-j}\):

\[ x_t = \sum_{j=0}^{\infty} \Psi_j\, \varepsilon_{t-j}, \qquad \boxed{\;\Psi_j = \Phi_j\, B^{-1}.\;} \]

\((\Psi_h)_{ik}\) = response of variable \(i\) at horizon \(h\) to a unit shock in \(\varepsilon_{kt}\). Structural impulse response function.

Computing Structural IRFs in Practice

Two ingredients we already have:

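  1. Reduced-form coefficients \(\widehat A_1, \ldots, \widehat A_p\) → the recursion \(\widehat\Phi_j = \sum_{i=1}^{\min(j,p)} \widehat A_i\, \widehat\Phi_{j-i}\), with \(\widehat\Phi_0 = I\), gives \(\widehat\Phi_j\)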
  2. Cholesky factor \(\widehat B^{-1} = \mathrm{chol}(\widehat\Sigma_u)\)

\(\widehat\Psi_j = \widehat\Phi_j \widehat B^{-1}\) for each horizon \(j\).

In R's vars package, irf(fit, ortho = TRUE, ...) does both steps and returns the IRFs with confidence bands.
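The two steps can be sketched directly in NumPy. This illustrates the formulas and is not the `vars` implementation. The coefficient matrices are hypothetical, and `ma_coefficients` and `structural_irf` are made-up names. The sketch uses the recursion \(\Phi_0 = I\), \(\Phi_j = \sum_{i=1}^{\min(j,p)} A_i \Phi_{j-i}\).

```python
import numpy as np

def ma_coefficients(A, H):
    """Reduced-form MA matrices Phi_0..Phi_H from [A_1, ..., A_p]."""
    K, p = A[0].shape[0], len(A)
    Phi = [np.eye(K)]
    for j in range(1, H + 1):
        Phi.append(sum(A[i - 1] @ Phi[j - i] for i in range(1, min(j, p) + 1)))
    return np.array(Phi)                           # shape (H+1, K, K)

def structural_irf(A, Sigma_u, H):
    """Cholesky-identified IRFs Psi_j = Phi_j B^{-1}, with B^{-1} = chol(Sigma_u).
    Psi[h, i, k] = response of variable i at horizon h to a unit shock k."""
    B_inv = np.linalg.cholesky(Sigma_u)
    return ma_coefficients(A, H) @ B_inv

A = [np.array([[0.5, 0.1], [0.2, 0.4]]),            # hypothetical A_1
     np.array([[0.1, 0.0], [0.0, 0.1]])]            # hypothetical A_2
Sigma_u = np.array([[1.0, 0.3],
                    [0.3, 0.5]])
Psi = structural_irf(A, Sigma_u, H=12)
```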

Sampling Uncertainty in IRFs

\(\widehat\Psi_j\) depends on the OLS estimates \((\widehat A_1, \ldots, \widehat A_p, \widehat\Sigma_u)\) — both subject to sampling error. We need a CI at each horizon.

  • \(\widehat\Psi_j\) is a nonlinear function of the estimates (companion-form products + Cholesky map).
  • Closed-form delta-method SEs are derivable but tedious.

The standard fix: simulate the sampling distribution of \(\widehat\Psi_j\) by bootstrap, and read CIs off the simulated quantiles.

The Residual Bootstrap

  1. Fit the VAR. Save \(\widehat A_j\), \(\widehat\Sigma_u\), residuals \(\{\widehat u_t\}_{t=1}^T\).
  2. For \(b = 1, \ldots, N_B\) (typically \(N_B \ge 1000\)):
    1. Resample \(\{u^*_t\}_{t=1}^T\) from \(\{\widehat u_t\}\) with replacement.
    2. Build \(\{x^*_t\}\) recursively from the estimated VAR, starting from the observed initial values, using \(\{u^*_t\}\).
    3. Re-estimate the VAR and re-identify; compute \(\widehat\Psi_j^{*(b)}\) for each \(j\).
  3. At each horizon \(j\): take pointwise quantiles of \(\{\widehat\Psi_j^{*(b)}\}_{b=1}^{N_B}\), e.g., the 2.5th and 97.5th for a 95% band.
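The three steps can be sketched for a VAR(1) with a constant and Cholesky identification. This is a minimal NumPy illustration under those assumptions, not a production implementation. The function names are hypothetical, and `n_boot` is the number of bootstrap replications. The sketch recentres the residuals before resampling and rebuilds each bootstrap sample from the observed first observation.

```python
import numpy as np

def fit_var1(x):
    """OLS for x_t = c + A x_{t-1} + u_t; returns c, A, residuals."""
    Z = np.hstack([np.ones((len(x) - 1, 1)), x[:-1]])
    coef, *_ = np.linalg.lstsq(Z, x[1:], rcond=None)
    U = x[1:] - Z @ coef
    return coef[0], coef[1:].T, U

def chol_irf(A, Sigma_u, H):
    """Psi_h = A^h chol(Sigma_u) for a VAR(1), h = 0..H."""
    B_inv = np.linalg.cholesky(Sigma_u)
    return np.array([np.linalg.matrix_power(A, h) @ B_inv for h in range(H + 1)])

def residual_bootstrap_irf(x, H, n_boot=1000, level=0.95, seed=0):
    rng = np.random.default_rng(seed)
    c, A, U = fit_var1(x)                              # step 1: fit, save residuals
    T = len(U)
    psi_hat = chol_irf(A, U.T @ U / T, H)
    U_c = U - U.mean(axis=0)                           # recentre residuals
    draws = np.empty((n_boot,) + psi_hat.shape)
    for b in range(n_boot):                            # step 2
        U_star = U_c[rng.integers(0, T, size=T)]       # 2.1: resample with replacement
        x_star = np.empty_like(x)
        x_star[0] = x[0]                               # 2.2: observed initial value
        for t in range(T):
            x_star[t + 1] = c + A @ x_star[t] + U_star[t]
        _, A_b, U_b = fit_var1(x_star)                 # 2.3: re-estimate, re-identify
        draws[b] = chol_irf(A_b, U_b.T @ U_b / T, H)
    tail = (1 - level) / 2                             # step 3: pointwise quantiles
    lo, hi = np.quantile(draws, [tail, 1 - tail], axis=0)
    return psi_hat, lo, hi

# Usage on simulated data
rng = np.random.default_rng(4)
A_true = np.array([[0.5, 0.1],
                   [0.2, 0.4]])
x = np.zeros((300, 2))
for t in range(1, 300):
    x[t] = A_true @ x[t - 1] + rng.standard_normal(2)
psi_hat, lo, hi = residual_bootstrap_irf(x, H=8, n_boot=200)
```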

Wild Bootstrap

The residual bootstrap resamples \(\{u^*_t\}\) i.i.d. from \(\{\widehat u_t\}\) — fine if residuals are i.i.d. across \(t\). But often they exhibit conditional heteroskedasticity: variance varies over time (e.g., volatility clustering in financial data).

Why i.i.d. resampling fails. A high-variance residual from period A might land in low-variance period B. The bootstrap world has uniform variance across time; the data world doesn’t. CIs miscalibrate.

Wild bootstrap. Don’t resample — keep each \(\widehat u_t\) in place and multiply by a random sign:

\[u^*_t = \widehat u_t \cdot \eta_t,\qquad \eta_t \in \{-1, +1\}\ \text{i.i.d.}\]

Then \(\mathbb{E}[u^*_t] = 0\) and \(\mathrm{Var}(u^*_t \mid \widehat u_t) = \widehat u_t \widehat u_t'\) — the time-varying covariance pattern is preserved, only the sign is randomized.
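The sign-flipping draw takes a single line. In this sketch, with the hypothetical helper `wild_residuals` and simulated heteroskedastic residuals, one Rademacher weight \(\eta_t\) multiplies the whole vector \(\widehat u_t\). This keeps both the cross-equation correlation within each period and its time-varying scale. Inside a bootstrap loop it replaces the resampling step, and everything else stays the same.

```python
import numpy as np

def wild_residuals(U, rng):
    """Wild bootstrap draw: keep each residual vector in place and flip its
    sign with probability 1/2 (one Rademacher weight eta_t per period)."""
    eta = rng.choice([-1.0, 1.0], size=len(U))
    return U * eta[:, None]           # same eta_t for all K components of u_t

rng = np.random.default_rng(5)
# Simulated residuals whose scale grows over time (heteroskedastic)
U = rng.standard_normal((200, 3)) * np.linspace(0.2, 3.0, 200)[:, None]
U_star = wild_residuals(U, rng)
```

Because \(\eta_t^2 = 1\), every draw satisfies \(u^*_t u^{*\prime}_t = \widehat u_t \widehat u_t'\) exactly, period by period.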

Pointwise vs Simultaneous Bands

Pointwise (default irf() output). At each horizon \(h\) separately,

\[\Pr\bigl[\widehat\Psi_h^{0.025} \le \Psi_h \le \widehat\Psi_h^{0.975}\bigr] \approx 0.95.\]

The catch. Joint coverage across all horizons is lower:

\[\Pr\bigl[\Psi_h \in \text{CI}_h\ \text{for all } h = 1, \ldots, H\bigr] \;<\; 0.95.\]

When the question is about IRF shape — “does it stay positive for all \(h \le 12\)?” — pointwise bands undercover.

Simultaneous bands. Wider \((L_h, U_h)\) with

\[\Pr\bigl[L_h \le \Psi_h \le U_h\ \forall\, h\bigr] \ge 0.95.\]

Constructions: Sims–Zha (1999), sup-\(t\) bands (Olea & Plagborg-Møller, 2019), Bonferroni (most conservative). Worth using whenever a claim is about the full IRF path.
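Below is a simplified plug-in version of the sup-\(t\) idea, assuming bootstrap draws of a single IRF path are already available. It standardizes each draw by the bootstrap standard error at each horizon, takes the largest \(|t|\) along the path, and uses the 95th percentile of that maximum as a common critical value. The helper `sup_t_band` and the simulated draws are hypothetical. The cited paper gives the exact construction and its refinements.

```python
import numpy as np

def sup_t_band(psi_hat, draws, level=0.95):
    """Simultaneous band for one IRF path psi_hat (shape (H+1,)) from bootstrap
    draws (shape (n_boot, H+1)) via the max |t| across horizons."""
    se = draws.std(axis=0, ddof=1)
    safe = np.where(se > 0, se, 1.0)
    # horizons with zero spread (e.g. restricted impact zeros) contribute t = 0
    t_abs = np.where(se > 0, np.abs(draws - psi_hat) / safe, 0.0)
    t_max = t_abs.max(axis=1)                  # largest |t| along each draw's path
    crit = np.quantile(t_max, level)           # one critical value for the whole path
    return psi_hat - crit * se, psi_hat + crit * se, crit

# Usage with hypothetical draws around a decaying IRF
rng = np.random.default_rng(6)
H = 20
psi_hat = 0.8 ** np.arange(H + 1)
draws = psi_hat + 0.1 * rng.standard_normal((2000, H + 1))
lo_sim, hi_sim, crit = sup_t_band(psi_hat, draws)
lo_pw, hi_pw = np.quantile(draws, [0.025, 0.975], axis=0)   # pointwise, for comparison
```

With 21 horizons and independent noise, the common critical value is close to 3 rather than 1.96, so the simultaneous band is wider at every horizon.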

Match the Band to the Question

For path-level claims, pointwise and simultaneous bands can give different answers.

Is the effect positive throughout \(h \le 20\)? Pointwise: the lower band stays \(> 0\), so yes. Simultaneous: the lower band dips below zero at long \(h\), so no. Same data, same point estimate, different conclusions: pointwise bands undercover for joint claims.

When Is a Recursive Ordering Plausible?

Bootstrap CIs take the identification as given. So step back: when does a recursive ordering actually hold?

A recursive ordering encodes a timing assumption: variables earlier in the order do not respond, within the same period, to shocks hitting variables later in the order.

Common conventions in macro:

  • Slow-moving variables (output, prices) ordered first — they cannot react to financial conditions within the quarter.
  • Fast-moving variables (interest rates, asset prices) ordered last — they react to everything within the period.
  • Technology shocks often ordered first; monetary policy shocks often ordered last.

The assumption is not in the data. If you reorder, you get a different \(B^{-1}\), different IRFs, different stories. Always report the ordering and justify it.

Ordering Matters: A Demonstration

Take a small simulated bivariate VAR. We Cholesky-identify it under two orderings — \((v_1, v_2)\) and \((v_2, v_1)\) — and plot the impulse response of \(v_2\) to a shock in \(v_1\).

Same data, same model, different ordering, different answer.

The choice of ordering is part of the identifying assumption — the data do not pick it.
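The demonstration can be reproduced without simulation noise by working directly with a hypothetical population \(A_1\) and \(\Sigma_u\). Reordering is a permutation \(P\): take the Cholesky factor \(L\) of \(P\Sigma_u P'\), then map it back to the original variable and shock labels with \(P' L P\). The helper name is illustrative.

```python
import numpy as np

A1 = np.array([[0.5, 0.2],
               [0.1, 0.6]])                       # hypothetical VAR(1) coefficients
Sigma_u = np.array([[1.0, 0.6],
                    [0.6, 1.0]])                  # correlated reduced-form residuals

def irf_v2_to_v1(order, H=8):
    """Response of v2 to the shock attached to v1 under a Cholesky ordering.
    `order` lists variable indices from first to last: (0, 1) or (1, 0)."""
    P = np.eye(2)[list(order)]                    # reordered variables = P @ original
    L = np.linalg.cholesky(P @ Sigma_u @ P.T)     # Cholesky in the chosen order
    B_inv = P.T @ L @ P                           # back to original variable/shock labels
    return np.array([(np.linalg.matrix_power(A1, h) @ B_inv)[1, 0] for h in range(H + 1)])

irf_12 = irf_v2_to_v1((0, 1))   # v1 first: v2 responds on impact
irf_21 = irf_v2_to_v1((1, 0))   # v2 first: v2 cannot respond on impact
```

Both choices reproduce \(\Sigma_u\) exactly, yet the impact response of \(v_2\) is \(0.6\) under one ordering and \(0\) under the other.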

A Real Application: Christiano, Eichenbaum & Evans (1999)

The most influential recursive-SVAR study of monetary policy. The question:

How do output, prices, and the rest of the economy respond to a contractionary monetary policy shock?

Data. Quarterly U.S. macro data, 1965Q1–1995Q4 (CEE’s sample).

Variables (in the recursive ordering CEE adopt):

  • Slow block: GDP, GDP deflator, commodity price index
  • Federal funds rate — the Fed’s main policy instrument over the sample
  • Fast block: total reserves, non-borrowed reserves, M2

Reserves and M2

Reserves = bank balances held at the Fed. Banks hold them to meet:

  • Reserve requirements — a mandatory fraction of demand deposits
  • Interbank payment settlement — payments between banks clear through Fed reserve accounts

Two types of reserves:

  • Non-borrowed — from the Fed buying/selling Treasuries with banks
  • Borrowed — from banks borrowing directly from the Fed at the discount rate
  • Total = non-borrowed + borrowed

The FFR is the interbank overnight reserve rate (banks lending reserves to each other). The Fed targets it by adjusting non-borrowed reserves — the main policy lever.

M2 = M1 (cash + checking) + savings + small CDs + retail money market funds — the broad money supply households and firms hold.

Transmission. Fed sells Treasuries → non-borrowed reserves contract → FFR rises → bank lending tightens → deposits and M2 fall. The fast-block IRFs trace this chain.

Why CEE Include Commodity Prices: The “Price Puzzle”

In simpler VARs (without \(p^{\text{com}}\)), a contractionary shock raises prices — implausible.

Sims (1992). Commodity prices are leading inflation indicators in the Fed’s information set. Omit \(p^{\text{com}}\) → “policy shock” absorbs the Fed’s preemptive response → shock correlates with realized inflation → wrong sign on \(p\).

CEE: The Identifying Assumption

The ordering is a timing story about within-quarter behavior:

  • Slow variables (output, prices) don’t respond to policy within the quarter — sticky prices, sluggish real activity.
  • The fed funds rate can respond to slow variables within the quarter — the Fed sees output, prices, and commodity prices and reacts.
  • Fast variables (reserves, money) respond to everything within the quarter — financial markets clear quickly.

The structural shock to the fed funds equation, isolated by this ordering, is the monetary policy shock — a quarter’s policy move that the Fed’s reaction function does not explain.

What the Restriction Buys You

Without the ordering, the fed-funds equation residual is a mixture: a true policy surprise, plus the systematic Fed response to today’s slow-block news (output, prices, commodity prices).

With the ordering — and only with it — the mixture unscrambles:

\[ u_{\text{ffr},t} \;=\; \underbrace{\alpha_y\,\varepsilon_{y t} + \alpha_p\,\varepsilon_{p t} + \alpha_{c}\,\varepsilon_{p^{\text{com}} t}}_{\text{policy reaction to slow-block shocks}} \;+\; \underbrace{\varepsilon_{\text{ffr},t}}_{\text{policy shock}} . \]

This is the identifying restriction biting — turning a residual of mixed origin into an interpretable structural object.

CEE: Output Response

Hump-shaped decline. Output falls gradually, troughs at ~4–6 quarters, and recovers within ~3 years — inverted-U, not a jump. Impact is zero by the slow-block restriction; the delayed real effect is the headline real-economy finding.

CEE: GDP Deflator Response

Sluggish decline. Bulk of the response arrives only after 1–2 years — well behind the output trough. Output falls before prices. Pointwise CIs straddle zero throughout — discussion next slide.

Reading the Weak Price Response

Calvo (sticky prices) predicts: small short-run response, gradual decline, full long-run adjustment.

What we see:

  • Short run — matches: no detectable response.
  • Long run — ambiguous: point estimates drift down; CIs span zero.

Why isn’t the long-run decline sharp?

  • Sample noise (short sample; bootstrap variance grows with \(h\)).
  • Recursive identification gives a small price effect in this sample.

Identification matters. Different rotations → different IRFs. The data don’t pick the rotation. Identification is a modeling decision.

CEE: Commodity Prices Response

Faster decline than \(p\): commodities trade in flexible-price markets that clear quickly. \(p^{\text{com}}\) sits in the slow block not because commodity prices are slow, but because the Fed reacts to them within the quarter.

CEE: Federal Funds Rate Response

The policy shock itself. A one-s.d. surprise raises FFR on impact; the rate decays back over ~6–8 quarters. By construction, this response is not part of the systematic Fed reaction — it’s the residual orthogonal to slow-block contemporaneous shocks, the SVAR’s interpretation of exogenous policy.

CEE: Non-Borrowed Reserves — the Liquidity Effect

Liquidity effect. \(\text{nbr}\) contracts sharply on impact: the Fed withdraws reserves to engineer the rate hike. (The total-reserves response is muddier: as the funds rate rises, borrowing at the discount window becomes relatively more attractive, so borrowed reserves rise and partially offset the fall in \(\text{nbr}\).)

CEE: M2 Response

Money supply contracts. \(M_2\) declines as the policy contraction propagates — the broad monetary aggregate moves with reserves and the rate, more gradually than \(\text{nbr}\).

Forecast Error Variance Decomposition

What FEVD measures:

  • IRF: per-unit effect of one shock on variable \(i\)
  • FEVD: that shock’s share of variable \(i\)’s variance, against all other shocks
  • “What does shock \(k\) do?” vs. “which shock drives \(i\)?”

Decomposition. The \(h\)-step forecast error is \(x_{t+h} - \mathbb{E}_t x_{t+h} = \sum_{j=0}^{h-1} \Psi_j\, \varepsilon_{t+h-j}\). With \(\Omega_\varepsilon = I\), variance is additive across shocks:

\[ \operatorname{Var}_h(x_{i,t+h}) = \underbrace{\textstyle\sum_{j} (\Psi_j)_{i1}^2}_{\text{shock 1}} + \cdots + \underbrace{\textstyle\sum_{j} (\Psi_j)_{iK}^2}_{\text{shock $K$}}. \]

FEVD = shock \(k\)’s piece divided by the total:

\[ \mathrm{FEVD}_{ik}(h) = \frac{\sum_{j=0}^{h-1}(\Psi_j)_{ik}^{2}}{\sum_{j=0}^{h-1}\sum_{\ell=1}^{K}(\Psi_j)_{i\ell}^{2}}. \]

The shares sum to 1 across shocks, and they inherit their meaning from the identification.
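The FEVD formula translates directly into code. The sketch assumes \(\Omega_\varepsilon = I\) and a stack of structural MA matrices \(\Psi_0, \ldots, \Psi_H\), with hypothetical VAR(1) coefficients. Both \(A_1\) and \(B^{-1}\) are lower triangular here, so variable 1 is driven entirely by shock 1 at every horizon. That gives an easy sanity check.

```python
import numpy as np

def fevd(Psi, h):
    """FEVD_{ik}(h) from structural MA matrices Psi (shape (H+1, K, K), Omega = I).
    Returns a K x K array: row i = variable, column k = shock; rows sum to 1."""
    sq = (Psi[:h] ** 2).sum(axis=0)              # sum_{j=0}^{h-1} (Psi_j)_{ik}^2
    return sq / sq.sum(axis=1, keepdims=True)

# Hypothetical VAR(1) identified by Cholesky
A1 = np.array([[0.6, 0.0],
               [0.4, 0.5]])
B_inv = np.linalg.cholesky(np.array([[1.0, 0.5],
                                     [0.5, 2.0]]))
Psi = np.array([np.linalg.matrix_power(A1, j) @ B_inv for j in range(21)])
share_h1 = fevd(Psi, 1)                          # one-step-ahead shares
share_h20 = fevd(Psi, 20)                        # 20-step-ahead shares
```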

CEE: FEVD

Share of forecast-error variance attributable to the FFR shock, by variable and horizon.

CEE: FEVD — Takeaways

Share of forecast-error variance attributable to the FFR shock:

  • Output (\(y\)): ~35–40% at medium–long horizons — the FFR shock is a substantial driver of real activity in this replication
  • GDP deflator and commodity prices: \(<10\%\) throughout — small share
  • Non-borrowed reserves: ~10% — modest contribution

Long-Run Identification (Blanchard–Quah)

A different idea: instead of restricting impact responses, restrict long-run cumulative responses.

The cumulative effect of a shock at horizon \(\infty\) is

\[ \Psi(1) = \sum_{j=0}^{\infty} \Psi_j = \Phi(1)\, B^{-1}. \]

For a stable VAR(1), iterating gives \(\Phi_j = A_1^j\), so

\[ \Phi(1) = \sum_{j=0}^{\infty} A_1^j = (I - A_1)^{-1}. \]

(For VAR\((p)\): \(\Phi(1) = (I - A_1 - \cdots - A_p)^{-1}\).)

A long-run zero restriction like

\[ \Psi(1)_{ij} = 0 \]

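says shock \(j\) has no permanent effect on the level of variable \(i\) when \(i\) enters the VAR in first differences, as output growth does in Blanchard–Quah. For a variable in levels, it restricts the sum of its responses instead.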
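For a stable VAR(1), the closed form and the truncated sum of \(\Psi_j\) agree. Here is a minimal check with hypothetical \(A_1\) and \(B^{-1}\).

```python
import numpy as np

A1 = np.array([[0.5, 0.2],
               [-0.1, 0.3]])                     # hypothetical stable VAR(1)
B_inv = np.linalg.cholesky(np.array([[1.0, 0.2],
                                     [0.2, 0.5]]))

# Closed form: Psi(1) = (I - A1)^{-1} B^{-1}
Psi1_closed = np.linalg.solve(np.eye(2) - A1, B_inv)

# Truncated sum of Psi_j = A1^j B^{-1}; converges because A1 is stable
Psi1_sum = sum(np.linalg.matrix_power(A1, j) @ B_inv for j in range(200))
```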

BQ: Bivariate Worked Example

For \(K=2\) with one long-run zero, \(\Psi(1) = CB^{-1}\) is lower triangular — an ordering assumption, the long-run analog of recursive ordering, imposed on \(\Psi(1)\) rather than \(B^{-1}\).

Square \(\Psi(1)\):

\[ \Psi(1)\Psi(1)' \;=\; C\,\Sigma_u\,C' \;\equiv\; V, \]

where \(C = (I - A_1 - \cdots - A_p)^{-1}\). \(V\) is data. Take Cholesky: \(LL' = V \Rightarrow \Psi(1) = L\), then \(B^{-1} = C^{-1} L\).

With \(V = \begin{bmatrix} v_{11} & v_{12}\\ v_{12} & v_{22} \end{bmatrix}\),

\[ L = \begin{bmatrix} \sqrt{v_{11}} & 0 \\ v_{12}/\sqrt{v_{11}} & \sqrt{v_{22} - v_{12}^2/v_{11}} \end{bmatrix}. \]

Same Cholesky bivariate formulas as the recursive case — applied to the long-run covariance \(V\) rather than \(\Sigma_u\).
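The recipe above fits in a few lines. The sketch assumes a stable VAR with the growth-rate variable ordered first. The coefficient values are hypothetical and `blanchard_quah` is an illustrative name. The sign normalization is the positive diagonal of \(\Psi(1)\) that the Cholesky factor delivers.

```python
import numpy as np

def blanchard_quah(A, Sigma_u):
    """Long-run identification: Psi(1) = C B^{-1} lower triangular,
    with C = (I - A_1 - ... - A_p)^{-1}. Returns B^{-1} and Psi(1)."""
    K = Sigma_u.shape[0]
    C = np.linalg.inv(np.eye(K) - sum(A))
    V = C @ Sigma_u @ C.T                  # long-run covariance, computed from data
    L = np.linalg.cholesky(V)              # Psi(1) = L (lower triangular)
    B_inv = np.linalg.solve(C, L)          # B^{-1} = C^{-1} L
    return B_inv, L

# Hypothetical VAR(1) in (dy, U)
A = [np.array([[0.3, -0.2],
               [-0.1, 0.8]])]
Sigma_u = np.array([[0.8, -0.2],
                    [-0.2, 0.3]])
B_inv, Psi1 = blanchard_quah(A, Sigma_u)
```

The zero in the upper-right corner of `Psi1` is the restriction that the second shock has no long-run effect on the level of the first variable.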

A Real Application: Blanchard & Quah (1989)

The founding long-run-restriction SVAR. The question:

What are the dynamic effects of aggregate demand and supply disturbances? How much of the variation in output and unemployment does each account for?

Data. U.S. quarterly, 1948Q1–1987Q4 (BQ’s sample).

Variables. Output growth \(\Delta y_t\) and unemployment \(U_t\). Bivariate VAR(8).

Supply and Demand Disturbances

Two basic kinds of disturbance hit the economy:

  • Supply disturbances — oil shocks, technology, capital accumulation, labor-force changes, weather, regulation. They change what the economy can produce — its productive capacity.
  • Demand disturbances — monetary policy, fiscal policy, household and business spending, financial conditions, “animal spirits.” They change how much is spent at given capacity.

The distinction matters for policy.

  • Demand shortfall — capacity is intact, only spending has fallen. Monetary and fiscal stimulus are designed for this case.
  • Supply contraction — capacity itself has shrunk. Stimulating into a smaller economy just raises prices.

Long-Run Neutrality

Output (\(Y_t\)). Long-run output is determined by supply-side fundamentals — capital, labor, productivity. Quantity theory:

\[MV = PY, \qquad Y \to Y^* \;\Longrightarrow\; \Delta M \to \Delta P.\]

  • Demand (movements in \(M\)): nominal effect only — \(\Delta P\), not \(\Delta Y\). Long-run neutral on output.
  • Supply: shifts \(Y^*_t\) itself (productivity, capital, labor force). Permanent on output.

Unemployment (\(U_t\)). Long-run unemployment is the natural rate \(U^*\), set by labor-market frictions, demographics, and institutions. Neither supply nor demand shocks change \(U^*\), so both shocks are long-run neutral on unemployment.

The asymmetry BQ exploits: only supply moves output’s long-run level; nothing moves unemployment’s.

BQ: The Identifying Restriction

The asymmetry — demand transient, supply permanent — gives a single zero in the long-run cumulative response matrix \(\Psi(1)\):

\[\Psi(1)_{\Delta y,\, d} = 0\]

  • The cumulative response of \(\Delta y\) to a demand shock is zero. Since the level \(y\) is the running sum of \(\Delta y\), this means demand has no permanent effect on the level of output.
  • Row \(\Delta y\) = output growth (first variable in \(x_t\)); column \(d\) = demand shock.
  • One restriction — exactly the \(K(K-1)/2 = 1\) count for the bivariate case.

BQ: Supply Shock

Permanent on output, persistent on unemployment.

  • \(\Delta y\) panel: response is positive on impact, then decays back to zero. Cumulating this IRF gives a positive permanent level shift in \(Y\) — the structural property that defines this shock.
  • \(U\) panel: unemployment trough around horizon 5–6 (~−0.5); recovery is slow, still detectable beyond 10 years.

BQ: Demand Shock

Transient on output, hump on unemployment.

  • \(\Delta y\) panel: rises on impact, then declines back to zero (possibly overshooting briefly negative). Cumulating this IRF gives zero — the imposed long-run neutrality.
  • \(U\) panel: unemployment falls in a hump, troughing around horizon 2–3; gradual recovery.

Long-Run vs. Short-Run: A Comparison

                      Cholesky                                Blanchard–Quah
  ------------------- --------------------------------------- ---------------------------------------------
  Restriction on      impact (\(B^{-1}\))                     long-run sum (\(\Psi(1)\))
  Form                zeros in \(B^{-1}\)                     zeros in \(\Psi(1)\)
  Information needed  within-period timing                    long-run neutrality
  Computation         \(B^{-1} = \mathrm{chol}(\Sigma_u)\)    \(B^{-1} = C^{-1}\,\mathrm{chol}(C\Sigma_u C')\)

The two schemes encode different economic information. Use whichever is more defensible for the question at hand.

Sign Restrictions

Both Cholesky and Blanchard–Quah are point-identifying: the restrictions pin down a single \(B^{-1}\).

Sign restrictions are weaker — they say only:

A monetary contraction raises the policy rate, lowers output, and lowers inflation, on impact (or for the first \(h\) horizons).

That is set identification: we keep every \(B^{-1}\) consistent with the data (\(B^{-1}B^{-1\prime} = \Sigma_u\)) and with the sign pattern. The IRF becomes a set of curves, not a single curve.

Identification as a Choice of \(Q\)

Structural identification requires \(\Sigma_u = B^{-1}(B^{-1})'\) (with \(\Omega_\varepsilon = I\)).

The Cholesky factor \(L\) is one solution. But for any orthogonal \(Q\) (\(QQ' = I\)):

\[(LQ)(LQ)' = L\, QQ'\, L' = LL' = \Sigma_u.\]

So \(\tilde B^{-1} = LQ\) is also a valid decomposition. \(Q\) parametrizes everything the data leave unidentified.

Each identification scheme is a rule for picking \(Q\):

  • Cholesky: \(Q = I\), so \(B^{-1} = L\) (lower triangular).
  • BQ: pick the unique \(Q\) that makes \(\Psi(1) = C \cdot LQ\) lower triangular.
  • Sign restrictions: keep the set of \(Q\)’s for which \(\tilde\Psi_h = \Phi_h \cdot LQ\) satisfies the sign pattern.
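With \(K = 2\), the unidentified part is a single angle, matching \(K(K-1)/2 = 1\). Every rotation \(Q\) reproduces \(\Sigma_u\) while changing \(B^{-1}\). Here is a minimal check with a hypothetical \(\Sigma_u\) and angle.

```python
import numpy as np

Sigma_u = np.array([[1.0, 0.3],
                    [0.3, 0.5]])                  # hypothetical
L = np.linalg.cholesky(Sigma_u)

theta = 0.7                                        # any angle gives an orthogonal Q
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B_inv_alt = L @ Q                                  # another valid impact matrix
```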

Sign Restrictions: The Algorithm

We search the identified set by Monte Carlo over \(Q \in O(K) = \{Q : QQ' = I\}\):

  1. Sample \(Q\) uniformly from \(O(K)\) — standard algorithm (Stewart 1980).
  2. Form candidate \(\tilde B^{-1} = LQ\) where \(L = \mathrm{chol}(\Sigma_u)\).
  3. Check signs. For each column \(k\), compute \(\tilde\Psi_h^{(k)} = \Phi_h \cdot (\tilde B^{-1})_{\bullet,\, k}\) at horizons \(0, \ldots, h_0\). If the sign pattern holds, keep that column’s full IRF.
  4. Repeat many times. The accepted IRFs form the identified set.
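The algorithm can be sketched as follows. One substitution to flag: instead of Stewart's (1980) algorithm, this sketch draws \(Q\) from the QR decomposition of a Gaussian matrix, with the signs of \(R\)'s diagonal absorbed into \(Q\). This is a common way to get a uniform draw from \(O(K)\). The VAR coefficients, covariance, and sign pattern (output and inflation down, rate up, on impact only) are hypothetical. The 16th to 84th percentiles summarize the accepted set; they are not a confidence band.

```python
import numpy as np

def haar_orthogonal(K, rng):
    """Uniform draw from O(K): QR of a Gaussian matrix, columns of Q
    rescaled by the signs of diag(R)."""
    Q, R = np.linalg.qr(rng.standard_normal((K, K)))
    return Q * np.sign(np.diag(R))

def sign_restricted_irfs(Phi, Sigma_u, signs, n_draws=5000, seed=0):
    """Accepted IRFs for one shock. Phi: (H+1, K, K) reduced-form MA matrices.
    signs: (h0+1, K) array of +1 / -1 / 0 (0 = unrestricted) for horizons 0..h0."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Sigma_u)
    K, h0 = Sigma_u.shape[0], signs.shape[0] - 1
    kept = []
    for _ in range(n_draws):
        B_inv = L @ haar_orthogonal(K, rng)          # candidate LQ
        for k in range(K):                           # check every column of LQ
            irf = Phi @ B_inv[:, k]                  # (H+1, K): responses to column k
            window = irf[:h0 + 1]
            if np.all((signs == 0) | (np.sign(window) == signs)):
                kept.append(irf)                     # keep the full IRF
    return np.array(kept)

# Hypothetical VAR(1) in (y, pi, i)
A1 = np.array([[0.8, 0.0, -0.1],
               [0.1, 0.7, -0.05],
               [0.2, 0.3, 0.6]])
Sigma_u = np.array([[1.0, 0.2, 0.3],
                    [0.2, 0.5, 0.2],
                    [0.3, 0.2, 0.8]])
Phi = np.array([np.linalg.matrix_power(A1, h) for h in range(13)])
signs = np.array([[-1, -1, 1]])                      # impact: y down, pi down, i up
accepted = sign_restricted_irfs(Phi, Sigma_u, signs, n_draws=2000)
set_lo, set_hi = np.quantile(accepted, [0.16, 0.84], axis=0)
```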

External Instruments (Proxy SVAR)

Suppose we have a variable \(z_t\) — outside the VAR — that is correlated with one structural shock and only that shock:

\[ \mathbb{E}[z_t\, \varepsilon_{1t}] \neq 0, \qquad \mathbb{E}[z_t\, \varepsilon_{jt}] = 0\ \ (j \neq 1). \]

This is an instrument for \(\varepsilon_{1t}\). It directly identifies the first column of \(B^{-1}\), without imposing any timing or long-run restrictions.

Proxy SVAR: Identifying \(b\) — Direction

Goal: recover \(b = (B^{-1})_{\bullet 1}\) — the impact response to shock \(\varepsilon_{1t}\).

Cross-moment isolates the column. From \(u_t = B^{-1}\varepsilon_t\) and the instrument’s properties (\(\mathbb{E}[z_t\varepsilon_{1t}] = \alpha\), others \(= 0\)): \[\mathbb{E}[u_t z_t] \;=\; B^{-1}\,\mathbb{E}[\varepsilon_t z_t] \;=\; b\cdot\alpha.\]

The cross-moment is \(b\) scaled by the unknown \(\alpha\).

OLS estimates this. Regress \(\hat u_t\) on \(z_t\) (no intercept): \[\hat\beta \;=\; \frac{\sum_t \hat u_t z_t}{\sum_t z_t^2} \;\xrightarrow{p}\; b\cdot c, \qquad c = \frac{\alpha}{\mathbb{E}[z_t^2]}.\]

OLS recovers \(b\) up to the unknown scalar \(c\) — direction yes, magnitude no.
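A small simulation makes the "direction yes, magnitude no" point concrete. It assumes a hypothetical \(B^{-1}\) and an instrument \(z_t = 0.4\,\varepsilon_{1t} + \text{noise}\), so \(\alpha = 0.4\) and \(\mathbb{E}[z_t^2] = 1.16\). Every element of \(\hat\beta\) divided by the matching element of \(b\) lands near the same scalar \(c\).

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200_000

# Hypothetical B^{-1}; its first column b is the impact response to shock 1
B_inv = np.array([[1.0, 0.0, 0.0],
                  [0.5, 0.8, 0.0],
                  [-0.3, 0.2, 0.6]])
eps = rng.standard_normal((T, 3))          # structural shocks, Omega_eps = I
u = eps @ B_inv.T                          # row t is u_t = B^{-1} eps_t

# Instrument: alpha = E[z eps_1] = 0.4, uncorrelated with eps_2 and eps_3
z = 0.4 * eps[:, 0] + rng.standard_normal(T)

beta_hat = u.T @ z / (z @ z)               # OLS of each u_i on z, no intercept
b = B_inv[:, 0]
c = 0.4 / (0.4 ** 2 + 1.0)                 # alpha / E[z^2]
print(beta_hat / b, c)                     # every ratio is close to the same c
```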

Proxy SVAR: Identifying \(b\) — Magnitude

Variance constraint pins \(|c|\). \(\Omega_\varepsilon = I \Rightarrow \Sigma_u = B^{-1}(B^{-1})'\), so \(\Sigma_u^{-1} = B'B\). Since \(b = B^{-1}e_1\), \[b'\,\Sigma_u^{-1}\,b \;=\; e_1'(B^{-1})'B'B\,B^{-1}e_1 \;=\; e_1'e_1 \;=\; 1.\] Plug in \(\hat\beta \approx c\cdot b\): \[\hat\beta'\,\Sigma_u^{-1}\,\hat\beta \;\approx\; c^2 \;\;\Rightarrow\;\; |\hat c| = \sqrt{\hat\beta'\,\widehat\Sigma_u^{-1}\,\hat\beta}.\]

Result. \(\hat b = \hat\beta / |\hat c|\), with the sign chosen by the positive-diagonal convention from earlier.

Other columns of \(B^{-1}\) remain unidentified — fine if we only need the IRF for shock 1.
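The magnitude step is a one-line rescaling. The sketch below checks it on a hypothetical \(B^{-1}\) where the true \(b\) is known, feeding in a \(\hat\beta\) with the right direction but the wrong scale and sign. The function name and the `sign_index` argument (which variable's impact response is normalized positive, e.g. the policy rate in an application) are hypothetical.

```python
import numpy as np

def normalize_proxy_column(beta_hat, Sigma_u, sign_index=0):
    """Rescale beta_hat (= c * b) so that b' Sigma_u^{-1} b = 1.

    sign_index: variable whose impact response is normalized positive
    (the positive-diagonal convention); the default 0 is a hypothetical choice.
    """
    c_abs = np.sqrt(beta_hat @ np.linalg.solve(Sigma_u, beta_hat))
    b_hat = beta_hat / c_abs
    return b_hat if b_hat[sign_index] > 0 else -b_hat

# Check on a known B^{-1} (hypothetical numbers)
B_inv = np.array([[1.0, 0.0, 0.0],
                  [0.5, 0.8, 0.0],
                  [-0.3, 0.2, 0.6]])
Sigma_u = B_inv @ B_inv.T
b_true = B_inv[:, 0]
beta_hat = -0.7 * b_true                   # direction known; scale and sign not
print(normalize_proxy_column(beta_hat, Sigma_u), b_true)
```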

Proxy SVAR: Application — The Credit Channel

We use Proxy SVAR to identify a financial transmission channel for monetary policy that recursive VARs without a credit variable can’t show.

Setup.

  • 4-variable VAR: output \(y\), GDP deflator \(p\), FFR, BAA–10y credit spread
  • Sample: 1994Q1–2007Q4 (HF series begins 1994; pre-ZLB to keep FFR informative)
  • 4 lags

Instrument. Quarterly-summed FF4 surprises from the USMPD database (Bauer–Swanson): the change in 4-month-ahead fed-funds futures in a 30-min window around FOMC events.

Spirit follows Gertler–Karadi (2015); we use the updated USMPD series in place of the original Gürkaynak–Sack–Swanson data.

Federal Funds Futures — Mechanics

FF4 is the federal funds futures contract for the 4-month-ahead month, traded on the CME.

Quote convention. The contract cash-settles at expiration to \(100 - r_{\text{realized}}\), where \(r_{\text{realized}}\) is the average realized FF rate over the contract month. By no-arbitrage, and ignoring risk premia: \[P_t \;\approx\; 100 - \mathbb{E}_t[r_{4\text{-month ahead}}],\] so the implied rate \(100 - P_t\) recovers market expectations from the price.

Trading mechanics — not a spot purchase.

  • No upfront principal. You post a refundable margin (security deposit, ~$1,000–$2,000 per contract).
  • Daily P&L = (price change) × contract multiplier (~$41.67 per basis point for the 30-day FF contract — fixed by CME contract spec, not by market).
  • At expiration, the contract cash-settles; margin is returned net of cumulative P&L.
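The quote convention and the P&L rule above reduce to simple arithmetic. This sketch assumes the standard CME 30-Day Federal Funds contract terms ($5,000,000 face value, interest on a 30/360 basis), which is where the roughly $41.67-per-basis-point multiplier comes from; the helper names and quotes are hypothetical.

```python
# Hypothetical helper names. Assumes the CME 30-Day Federal Funds contract terms:
# $5,000,000 face value, interest computed for one month on a 30-day (30/360) basis.
FACE_VALUE = 5_000_000
ONE_BP = 0.0001
DAY_FRACTION = 30 / 360

def implied_rate(price):
    """Implied average fed funds rate, in percent: 100 - P."""
    return 100.0 - price

def dollar_value_per_bp():
    """Contract multiplier: 5,000,000 * 0.0001 * 30/360, about $41.67."""
    return FACE_VALUE * ONE_BP * DAY_FRACTION

def daily_pnl_long(price_prev, price_now, n_contracts=1):
    """Mark-to-market P&L of a long position (one price point = 100 bp)."""
    change_bp = (price_now - price_prev) * 100.0
    return change_bp * dollar_value_per_bp() * n_contracts

# Hypothetical quotes: the price falls by 0.05 (5 bp), i.e. markets price a higher rate
print(implied_rate(94.75), implied_rate(94.70))
print(daily_pnl_long(94.75, 94.70))     # a long position loses about 5 * $41.67
```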

Why FF4 Surprises Identify the Monetary Shock

The “surprise” = change in the FF4 implied rate over a 30-minute window around an FOMC announcement (in basis points; positive = rate revised up).

The 30-min window argument.

  • In that window, nothing else macro-relevant happens — no GDP releases, no inflation prints, no fiscal news.
  • The only “news” is the FOMC announcement itself.
  • Change in implied rate = the part markets didn’t already expect = the unexpected component of policy = the monetary shock \(\varepsilon_{m,t}\).

So \(z_t\) is relevant (it IS the surprise component) and exogenous (other shocks don’t move in 30 min).

Quarterly aggregation. Fed meets ~2× per quarter → 2 event-time surprises per quarter. Sum within-quarter to get the quarterly instrument \(z_t\). Aggregation preserves both validity properties.
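The event-level surprise and the quarterly sum can be sketched with the standard library alone. The input format (one tuple per FOMC event with futures prices just before and just after the window), the function names, and all dates and prices are hypothetical; they are not the USMPD file layout.

```python
from collections import defaultdict
from datetime import date

def event_surprise_bp(price_before, price_after):
    """Change in the implied rate (100 - price) across the event window, in bp."""
    return ((100.0 - price_after) - (100.0 - price_before)) * 100.0

def quarterly_instrument(events):
    """Sum event-level surprises by calendar quarter.

    events: iterable of (date, price_before, price_after).
    Returns {(year, quarter): surprise_bp}; quarters with no event are absent
    and would be set to zero before merging with the VAR sample.
    """
    z = defaultdict(float)
    for day, p_before, p_after in events:
        quarter = (day.month - 1) // 3 + 1
        z[(day.year, quarter)] += event_surprise_bp(p_before, p_after)
    return dict(z)

# Hypothetical events and prices, for illustration only
events = [
    (date(2000, 1, 20), 94.80, 94.85),   # implied rate 5.20% -> 5.15%: -5 bp
    (date(2000, 3, 10), 95.00, 94.96),   # 5.00% -> 5.04%: +4 bp
    (date(2000, 5, 5), 95.50, 95.52),    # 4.50% -> 4.48%: -2 bp
]
print(quarterly_instrument(events))
```

Summing is a linear operation, which is why the relevance and exogeneity arguments made at the event level carry over to the quarterly \(z_t\).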

Proxy SVAR: Credit Spread Response

The credit channel.

  • Persistent widening of the BAA–Treasury spread, peaking ~8–10 quarters out.
  • Impact response negative (~−0.02), but CI spans zero — not significant.
  • Gertler–Karadi (2015) headline: monetary policy transmits via financial frictions — a channel a recursive VAR without a credit variable can’t show.

Proxy SVAR: Standard IV Concerns

Proxy SVAR uses an external \(z_t\) to identify one shock — but the identification rests on standard IV conditions.

Exogeneity is plausible, not testable

  • \(\mathbb{E}[z_t\,\varepsilon_{jt}] = 0\) for \(j \neq 1\) cannot be checked directly — a general IV concern.
  • Concrete worry for FF4: the Fed information effect — surprises may release the Fed’s private information about the economy, not pure policy stance (Nakamura–Steinsson, 2018).
  • If so, \(z_t\) is correlated with non-policy shocks; identification breaks.

Weak-instrument risk (relevance)

  • Relevance only requires \(\mathbb{E}[z_t\,\varepsilon_{1t}] \neq 0\) — the correlation can be arbitrarily small.
  • In practice, \(z_t\) may explain little of \(u_t\); a first-stage \(F\) below the rule-of-thumb threshold of 10 is not unusual, and the identified \(b\) then has large standard errors.
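A minimal first-stage diagnostic, assuming the relevant first stage regresses the policy-rate residual on \(z_t\) without an intercept. With a single instrument, \(F\) is the squared \(t\)-statistic. This is the textbook homoskedastic version; applied work often uses heteroskedasticity- or autocorrelation-robust variants instead. The function name and the simulated data are hypothetical.

```python
import numpy as np

def first_stage_F(u_policy, z):
    """Homoskedastic first-stage F for a single instrument (no intercept).

    With one instrument, F equals the squared t-statistic on z.
    """
    u_policy = np.asarray(u_policy, dtype=float)
    z = np.asarray(z, dtype=float)
    T = len(z)
    pi_hat = (z @ u_policy) / (z @ z)
    resid = u_policy - pi_hat * z
    sigma2 = resid @ resid / (T - 1)
    se = np.sqrt(sigma2 / (z @ z))
    return (pi_hat / se) ** 2

# Hypothetical data: 56 quarters (the length of 1994Q1 to 2007Q4)
rng = np.random.default_rng(0)
T = 56
eps1 = rng.standard_normal(T)                         # policy shock
u_policy = eps1 + 0.5 * rng.standard_normal(T)        # policy-rate residual
z_weak = 0.1 * eps1 + rng.standard_normal(T)          # instrument barely loads on eps1
z_strong = eps1 + 0.5 * rng.standard_normal(T)
print(first_stage_F(u_policy, z_weak), first_stage_F(u_policy, z_strong))
```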

Proxy SVAR: What Shock Do We Identify?

Even when exogeneity and relevance hold, the instrument identifies the response to a specific type of shock — not a generic one.

LATE-style heterogeneity

  • The identified IRF reflects only the component of \(\varepsilon_{1t}\) that is correlated with \(z_t\).
  • FF4 surprises capture FOMC-window news; slow-moving policy shifts (gradual forward guidance, off-meeting balance-sheet operations) produce no FF4 surprise and so don’t enter the identified IRF.
  • The IRF is local to the variation \(z_t\) captures — analogous to LATE in cross-sectional IV.

Summary of Identification Schemes

Scheme | Restriction | What it pins down | Cost
Cholesky | Triangular \(B^{-1}\) | All shocks, given order | Strong timing
Long-run (BQ) | Zeros in \(\Psi(1)\) | All shocks | Long-run neutrality
Sign restrictions | Sign of impact / IRFs | Set of admissible IRFs | Set, not point ID
Proxy SVAR (IV) | Instrument exogeneity | One column of \(B^{-1}\) | Need a credible \(z_t\)

None of these is “the right” scheme. Each makes its economic assumptions explicit. Choosing among them is a modeling decision, not a statistical one.

Standard SVAR and the \(C(L)\) Structure

The general relationship between reduced-form innovations and structural shocks: \[ u_t \;=\; C(L)\,\varepsilon_t. \]

Standard SVAR is the special case \(C(L) = B^{-1}\) (a constant matrix). Identify \(B^{-1}\) from \(\Sigma_u\) plus restrictions, then recover \(\varepsilon_t = B\,u_t\).

If \(C(L)\) has lags but is invertible: the VAR’s \(\Phi(L)\) (with enough lags) absorbs the lag structure, leaving \(u_t = B^{-1}\varepsilon_t\) in the limit. Standard SVAR still works.

If \(C(L)\) is non-invertible: no VAR, of finite or infinite order, can recover \(\varepsilon_t\) from current and past \(u\)’s (equivalently, current and past \(y\)’s). This is non-fundamentality.

The canonical source of non-invertible \(C(L)\): anticipation.

\(C(L)\) and the Structural MA \(M(L)\)

From \(\Phi(L)\,y_t = C(L)\,\varepsilon_t\), the structural MA of observables is \[ y_t = M(L)\,\varepsilon_t, \qquad C(L) = \Phi(L)\,M(L). \]

  • Determinant rule: \(\det C(z) = \det\Phi(z) \cdot \det M(z)\).
  • Stationarity: \(\det\Phi(z)\) has no inside-disk roots.
  • \(\Rightarrow\) inside-disk roots of \(\det C(z)\) are exactly those of \(\det M(z)\).

\(\Rightarrow\) checking \(C(L)\) for non-invertibility reduces to checking \(\det M(z)\) for inside-disk roots. The following slides do this in a GE model.

LWY (2013): Setup

Neoclassical growth model with \(q\)-period tax foresight.

Model ingredients

  • Preferences: log utility, discount factor \(\beta \in (0,1)\).
  • Production: \(A_t K_{t-1}^\alpha\) (capital share \(\alpha \in (0,1)\), full depreciation, inelastic labor).
  • Government: distortionary tax at rate \(\tau_t\) (steady-state level \(\tau\)), rebated lump-sum.

Observables

  • \(a_t\) — log TFP.
  • \(\hat K_t\) — log capital deviation from steady state.

Structural shocks

  • \(\varepsilon^A_t\) — TFP innovation.
  • \(\varepsilon^\tau_t\) — tax news at \(t\), announces \(\hat\tau_{t+q}\).

LWY (2013): Equilibrium \(M(L)\)

Solving the equilibrium (log utility, full depreciation, Cobb–Douglas production) gives, for \(q = 2\): \[ M(L) \;=\; \begin{pmatrix} 1 & 0 \\ \dfrac{1}{1 - \alpha L} & -\dfrac{\kappa(L + \theta)}{1 - \alpha L} \end{pmatrix}, \] with \(\theta = \alpha\beta(1-\tau) \in (0,1)\) and \(\kappa = (1-\theta)\tau/(1-\tau)\).

  • Row 1 (TFP): exogenous to taxes, responds 1-to-1 to its own shock.
  • Row 2 (capital):
    • TFP: AR dynamics \((1-\alpha L)^{-1}\).
    • tax news: numerator \(-\kappa(L+\theta)\) (impact \(-\kappa\theta\), \(L^1\) coefficient \(-\kappa\)), AR filter \((1-\alpha L)^{-1}\).

LWY (2013): Non-Invertibility

Determinant: \[ \det M(z) \;=\; -\frac{\kappa(z+\theta)}{1-\alpha z}, \qquad \text{root at } z = -\theta. \]

Since \(\theta = \alpha\beta(1-\tau) \in (0,1)\), the root sits inside the unit disk. \(M(L)\) non-invertible \(\Rightarrow C(L)\) non-invertible \(\Rightarrow\) standard SVAR fails.

Economic intuition.

  • Coefficient on current news in today’s capital is \(-\kappa\theta\) (row 2 of \(M_0\)); on last period’s news it is \(-\kappa\). Their ratio is \(\theta\).
  • Larger \(\theta\) \(\Rightarrow\) capital responds more to current news relative to lagged news \(\Rightarrow\) news more visible in today’s data \(\Rightarrow\) closer to invertibility.
  • Smaller \(\theta\) \(\Rightarrow\) news barely affects today’s capital and shows up mainly one period later, just before the tax change at \(t+q\) \(\Rightarrow\) severe non-invertibility.
  • Standard parameters (\(\alpha \approx 0.3\), \(\beta \approx 0.99\), \(\tau \approx 0.25\)): \(\theta \approx 0.22\), well inside the unit disk.
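The root check is quick to reproduce numerically. This sketch plugs the standard parameters into the definitions of \(\theta\) and \(\kappa\) from the \(M(L)\) slide and finds the root of the numerator \(-\kappa(z + \theta)\) of \(\det M(z)\); the denominator \(1 - \alpha z\) vanishes only at \(z = 1/\alpha\), outside the disk. The function name is hypothetical.

```python
import numpy as np

def lwy_theta_kappa(alpha, beta, tau):
    """theta and kappa as defined for the q = 2 LWY M(L)."""
    theta = alpha * beta * (1 - tau)
    kappa = (1 - theta) * tau / (1 - tau)
    return theta, kappa

alpha, beta, tau = 0.3, 0.99, 0.25
theta, kappa = lwy_theta_kappa(alpha, beta, tau)

# Numerator of det M(z) = -kappa (z + theta); np.roots wants highest power first
num_roots = np.roots([-kappa, -kappa * theta])
den_root = 1 / alpha                        # 1 - alpha z = 0, outside the unit disk
print(theta, kappa, num_roots, den_root)
print("non-invertible:", bool(np.any(np.abs(num_roots) < 1)))
```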

Proxy SVAR: Impact Response

Suppose we have an external instrument \(z_t\) with

  • Relevance: \(\mathrm{Cov}(z_t,\,\varepsilon^\tau_t) \neq 0\).
  • Exogeneity to other current shocks: \(\mathrm{Cov}(z_t,\,\varepsilon^A_t) = 0\).
  • Exogeneity to past shocks: \(\mathrm{Cov}(z_t,\,\varepsilon_{s}) = 0\) for all \(s < t\).

Impact response. Under non-invertibility, \(u_t = C(L)\,\varepsilon_t = \sum_{j \geq 0} C_j\,\varepsilon_{t-j}\). Then \[\begin{align*} \mathrm{Cov}(u_t,\, z_t) &= \sum_{j \geq 0} C_j\,\mathrm{Cov}(\varepsilon_{t-j},\, z_t) \\ &= C_0\,\mathrm{Cov}(\varepsilon_t,\, z_t) \quad\text{(past-shock terms zero)} \\ &= (C_0)_{\bullet,\tau} \cdot \mathrm{Cov}(z_t,\,\varepsilon^\tau_t). \end{align*}\] Since \(C_0 = \Phi(0)\,M_0 = M_0\) (because \(\Phi(0) = I\)), this recovers \((M_0)_{\bullet,\tau}\) up to scale: the impact column (\(h = 0\) of the IRF).

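Full IRF. For later horizons, use local-projection IV (LP-IV), which projects \(y_{t+h}\) directly on the instrumented variable at each horizon \(h\). Iterating the VAR forward from the impact column would be invalid here, because \(u_t\) also loads on past shocks (\(C_j \neq 0\) for \(j \geq 1\)).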
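A Monte Carlo sketch of the LWY example ties the last few slides together. It assumes unit-variance shocks, standard parameters, and a hypothetical proxy \(z_t = \varepsilon^\tau_t + \text{noise}\), and checks three things: (1) the best linear combination of VAR residuals explains only about \(\theta^2\) of the tax-news shock, so no choice of \(B^{-1}\) recovers it; (2) \(\mathrm{Cov}(u_t, z_t)\) lines up with the impact column \((0, -\kappa\theta)'\); (3) regressing \(k_{t+h}\) on \(z_t\) traces out the IRF, and dividing by the \(h = 0\) coefficient gives the LP-IV response normalized to capital's impact response. The \(\theta^2\) benchmark comes from rewriting the non-invertible \((\theta + L)\varepsilon^\tau_t\) in invertible form \((1 + \theta L)\eta_t\), whose innovation \(\eta_t\) has correlation \(\theta\) with \(\varepsilon^\tau_t\). All function names are hypothetical.

```python
import numpy as np

def simulate_lwy(T, alpha=0.3, beta=0.99, tau=0.25, seed=0):
    """Simulate (a_t, k_t) from the q = 2 LWY M(L) with unit-variance shocks."""
    rng = np.random.default_rng(seed)
    theta = alpha * beta * (1 - tau)
    kappa = (1 - theta) * tau / (1 - tau)
    e_a, e_tau = rng.standard_normal(T), rng.standard_normal(T)
    k = np.zeros(T)
    for t in range(1, T):
        # (1 - alpha L) k_t = a_t - kappa (theta + L) eps^tau_t, with a_t = eps^A_t
        k[t] = alpha * k[t - 1] + e_a[t] - kappa * (theta * e_tau[t] + e_tau[t - 1])
    return np.column_stack([e_a, k]), e_tau, theta, kappa

def var_residuals(y, p):
    """OLS residuals of a VAR(p) without intercept (the simulated data have mean zero)."""
    T = len(y)
    X = np.column_stack([y[p - j:T - j] for j in range(1, p + 1)])   # lags 1..p
    Y = y[p:]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ coef

alpha, p, H = 0.3, 12, 8
y, e_tau, theta, kappa = simulate_lwy(200_000, alpha=alpha)
u = var_residuals(y, p)                     # u_t for t = p, ..., T-1
e_now = e_tau[p:]                           # eps^tau_t aligned with u_t
rng = np.random.default_rng(1)
z = e_now + rng.standard_normal(len(e_now))   # hypothetical proxy for tax news

# (1) Best linear combination of u_t explains only about theta^2 of eps^tau_t
g, *_ = np.linalg.lstsq(u, e_now, rcond=None)
r2 = 1 - np.var(e_now - u @ g) / np.var(e_now)

# (2) Proxy: Cov(u_t, z_t) is proportional to the impact column (0, -kappa*theta)
cov_uz = u.T @ z / len(z)

# (3) LP: regress k_{t+h} on z_t; dividing by the h = 0 coefficient gives LP-IV
k = y[p:, 1]
n = len(z)
lp = np.array([k[h:] @ z[:n - h] / (z[:n - h] @ z[:n - h]) for h in range(H + 1)])
lp_iv = lp / lp[0]

# True response of k to eps^tau: -kappa*theta at h = 0, -kappa*alpha^(h-1)*(1 + alpha*theta) after
m = np.array([-kappa * theta] + [-kappa * alpha ** (h - 1) * (1 + alpha * theta)
                                 for h in range(1, H + 1)])
print(round(r2, 3), round(theta ** 2, 3))
print(cov_uz, -kappa * theta)
print(np.round(lp_iv, 2), np.round(m / m[0], 2))
```

No controls are needed in the local projections here because the simulated \(z_t\) is i.i.d. and unrelated to shocks at other dates; in applications, lagged controls are typically added to reduce variance.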

Practical Guidance

  • Identification is the key assumption — the data do not choose it for you.
  • Always report the ordering / restrictions used, with a one-line economic justification.
  • Check robustness across schemes. If a result holds under Cholesky with two orderings, plus sign restrictions, plus a Proxy SVAR — that is meaningful. If it flips, that is also meaningful.
  • Inspect the IRFs critically. Implausible signs or magnitudes are a warning that the identification doesn’t fit the data.
  • Confidence bands matter. An IRF whose 95% band contains zero at every horizon except horizon 3 is not a robust finding, even if that one horizon looks “significant”.

What SVARs Do — and Don’t

Do (given an identification):

  • IRFs — effects of structural shocks on each variable
  • FEVDs — share of each variable’s variance attributable to each shock
  • Historical decomposition — which shocks drove observed paths
  • Counterfactuals — paths under hypothetical shock sequences

Don’t:

  • Adjudicate which identification is right — data don’t choose
  • Tell you which shocks matter economically — theory does
  • Handle structural breaks or regime instability
  • Capture non-linear effects (asymmetric responses, threshold dynamics)

Beyond This Unit

Standard extensions worth knowing the names of:

Estimation:

  • Local projections (Jordà, 2005) — IRFs by direct \(h\)-step regressions; no MA inversion
  • FAVAR (Bernanke, Boivin, Eliasz, 2005) — augment the VAR with factors from a large panel
  • Bayesian VARs — shrinkage priors for high-dimensional systems

Identification:

  • Heteroskedasticity (Rigobon, 2003; Lewis, 2021) — exploit changes in variance regimes to identify \(B^{-1}\)
  • Non-Gaussian identification (Lanne, Meitz, Saikkonen, 2017) — independent non-Gaussian components uniquely identified up to scale and permutation