Xiamen University, Chow Institute
May, 2026
In Unit 7 we built reduced-form VARs:
\[ x_t = c + A_1 x_{t-1} + \cdots + A_p x_{t-p} + u_t, \qquad \mathbb{E}(u_t u_t') = \Sigma_u . \]
These describe the joint dynamics of \(x_t\) — how variables co-move and how the system propagates over time.
This unit asks a different kind of question:
What happens to GDP if the central bank surprises markets with a 25 bp hike today?
Answering it requires an extra layer of structure — identification — that turns reduced-form estimates into causal ones.
Question 1 — Forecasting. Given everything we know up to today, what is our best guess for next quarter?
\[ \hat x_{t+h\mid t} = \mathbb{E}(x_{t+h} \mid x_t, x_{t-1}, \ldots). \]
The reduced form is sufficient. We never have to ask why variables co-move — only that they do.
Question 2 — Granger causality. Does the past of \(X\) help predict \(Y\) beyond what the past of \(Y\) alone does? In a VAR, a joint \(F\)-test on the lag coefficients of \(X\) in the equation for \(Y\).
The name is misleading: this is about predictability, not causation. A barometer’s reading Granger-causes the weather, but turning its dial does not bring rain.
Question 3 — Causal counterfactual.
If the central bank surprises markets with a 25 bp hike today, by how much does output fall over the next two years?
This is a question about an intervention, not a forecast.
These are different objects. The data — even an infinite amount of it — do not tell us the second from the first without extra assumptions.
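The Granger test in Question 2 can be sketched as a comparison of two regressions. This is a minimal illustration, assuming a simulated bivariate process and a hand-rolled helper `granger_f_stat` (a hypothetical name, not a library function); it returns only the \(F\)-statistic, not a p-value.

```python
import numpy as np

def granger_f_stat(y, x, p):
    """F-statistic for H0: the p lags of x have zero coefficients in the equation for y."""
    T = len(y)
    target = y[p:]
    lags_y = np.column_stack([y[p - k:T - k] for k in range(1, p + 1)])
    lags_x = np.column_stack([x[p - k:T - k] for k in range(1, p + 1)])
    const = np.ones((T - p, 1))
    X_restricted = np.hstack([const, lags_y])        # own lags only
    X_full = np.hstack([const, lags_y, lags_x])      # own lags plus lags of x

    def rss(X):
        coef = np.linalg.lstsq(X, target, rcond=None)[0]
        return np.sum((target - X @ coef) ** 2)

    rss_r, rss_u = rss(X_restricted), rss(X_full)
    dof = (T - p) - X_full.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / dof)

# Simulated example (assumed DGP): past x predicts y, but past y does not predict x.
rng = np.random.default_rng(0)
T = 2000
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.standard_normal()

f_x_to_y = granger_f_stat(y, x, p=2)   # large
f_y_to_x = granger_f_stat(x, y, p=2)   # small
```

A large `f_x_to_y` says only that lags of \(x\) improve the forecast of \(y\); it says nothing about what an intervention on \(x\) would do.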
The “surprise” part of the rate change — what moves on its own, not as a response — is what we’ll call a shock.
A shock captures a primitive cause:
Equivalently, a shock is a surprise — the part of a variable unpredictable from the information set of the agents in the model.
Two shocks are, by construction, mutually uncorrelated. If two candidates co-move systematically, neither is primitive — something behind both is the real cause.
Terminology. Mutually uncorrelated shocks are commonly called orthogonal shocks in the SVAR literature.
Analogy. A sound mixing board has independent input knobs (bass, treble, vocals). What you hear is a mixture of all knobs at once. The speaker output is observable; the knob settings are what we want to recover.
The reduced-form innovation \(u_t\) is defined mechanically:
\[ u_t = x_t - \mathbb{E}(x_t \mid x_{t-1}, x_{t-2}, \ldots). \]
It is “today’s \(x_t\) minus its forecast from the past” — itself a surprise, but relative to the econometrician’s information set: the history of the observed series. Nothing about that definition makes the components of \(u_t\) causally distinct.
The relationship to structural shocks is
\[ u_t = B^{-1} \varepsilon_t . \]
Each component \(u_{it}\) is a linear mixture of the underlying primitive shocks \(\varepsilon_t\).
Imagine a tiny macro VAR with two variables: inflation \(\pi_t\) and the policy rate \(i_t\).
Suppose, within a quarter:
Two structural shocks drive the system:
The reduced-form residuals can be written in terms of these:
\[ \begin{aligned} u_{\pi t} &= \varepsilon_{\pi t} & &\text{(no within-quarter response of $\pi$ to $i$)} \\ u_{i t} &= \varepsilon_{i t} + \alpha\,\varepsilon_{\pi t} & &\text{(policy responds to inflation, coef. }\alpha\text{)} \end{aligned} \]
So \(u_{it}\) is not the monetary surprise — it is the surprise plus the systematic policy response to the inflation shock.
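A short simulation makes the mixing concrete. This is a sketch under assumed values (\(\alpha = 0.5\), unit-variance Gaussian shocks); none of the numbers come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100_000
alpha = 0.5  # assumed within-quarter policy response to the inflation shock

# Structural shocks: mutually uncorrelated, unit variance (assumed).
eps_pi = rng.standard_normal(T)
eps_i = rng.standard_normal(T)

# Reduced-form residuals, following the two equations above.
u_pi = eps_pi
u_i = eps_i + alpha * eps_pi

# The residuals are correlated even though the shocks are not.
cov_u = np.cov(u_pi, u_i)[0, 1]          # close to alpha * Var(eps_pi)
cov_eps = np.cov(eps_pi, eps_i)[0, 1]    # close to 0

# With alpha known, the policy shock is the residual net of the systematic response.
eps_i_recovered = u_i - alpha * u_pi
```

The last line only works because \(\alpha\) is known here; in practice \(\alpha\) is exactly what identification has to deliver.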
Three questions about the \((\pi_t, i_t)\) system that only the structural form can answer.
1. Policy. What is the effect of a hypothetical policy action?
2. Theory testing. Which transmission mechanism is operating?
3. Counterfactuals. How much of observed inflation came from which source?
If the central bank surprises with a 25 bp rate hike today — not in response to anything happening in the economy — what happens to inflation over the next two years?
The object we want. The IRF of \(\pi\) to a unit \(\varepsilon_{it}\) shock, horizon by horizon.
Why \(u_{it}\) won’t do. Recall \(u_{it} = \varepsilon_{it} + \alpha\,\varepsilon_{\pi t}\). A “unit shock to \(u_{it}\)” mixes a true policy surprise with the Fed’s mechanical response to an inflation shock. These have different effects on \(\pi\) — averaging them answers no well-defined question.
Where this shows up. Every published monetary-policy IRF — from Sims (1980) to CEE (1999) to Gertler–Karadi (2015) — is the response of inflation (or output) to \(\varepsilon_{it}\), after some identification has separated it from \(\alpha\,\varepsilon_{\pi t}\).
Through what mechanism does a rate hike actually move inflation?
Two stories.
The IRF as referee. The two stories disagree about the impact response of \(\pi\) to \(\varepsilon_{it}\):
The shape of the IRF at short horizons distinguishes them.
Of the inflation observed in 2021–2024, how much came from the Fed’s hiking cycle, and how much from everything else?
The object we want. Write each observed \(\pi_t\) as a sum of contributions from the two structural shocks:
\[ \pi_t \;=\; \underbrace{\sum_{j\ge 0}\psi^{(\pi i)}_j\,\varepsilon_{i,t-j}}_{\text{policy contribution}} \;+\; \underbrace{\sum_{j\ge 0}\psi^{(\pi\pi)}_j\,\varepsilon_{\pi,t-j}}_{\text{everything else}} . \]
The counterfactual “no policy shocks” zeros the first sum and recomputes \(\pi_t\) from the second alone.
Where this shows up. Historical decompositions of inflation, output gaps, and exchange rates are staples of central-bank reports and post-mortems on specific episodes (e.g., Bernanke & Blanchard’s 2023 analysis of post-pandemic inflation).
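The decomposition above can be sketched by feeding one structural shock at a time through the model. The example assumes a VAR(1) with no intercept, a zero initial condition, and made-up values for \(A_1\) and \(B^{-1}\); in applied work both would be estimates.

```python
import numpy as np

rng = np.random.default_rng(4)
T, K = 300, 2
A1 = np.array([[0.6, -0.1],
               [0.2, 0.7]])        # assumed VAR(1) dynamics for (pi, i)
B_inv = np.array([[1.0, 0.0],
                  [0.5, 1.0]])     # assumed impact matrix: policy reacts to pi within the quarter
eps = rng.standard_normal((T, K))  # columns: eps_pi, eps_i

def simulate(shocks):
    """Simulate x_t = A1 x_{t-1} + B^{-1} eps_t from a zero initial condition."""
    x = np.zeros((T, K))
    prev = np.zeros(K)
    for t in range(T):
        prev = A1 @ prev + B_inv @ shocks[t]
        x[t] = prev
    return x

x_full = simulate(eps)

# Feed one shock at a time; by linearity the pieces add up to the full series.
contrib_pi_shock = simulate(eps * np.array([1.0, 0.0]))
contrib_policy_shock = simulate(eps * np.array([0.0, 1.0]))

# "No policy shocks" counterfactual path for inflation (first column).
pi_counterfactual = contrib_pi_shock[:, 0]
```

Because the model is linear, the counterfactual is the observed path minus the policy contribution.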
All three questions reduce to one problem: how do we recover the structural shocks \(\varepsilon_t\) from the reduced-form residuals \(u_t\)?
Start from the bivariate structural system of Unit 7:
\[ \begin{aligned} y_t &= b_{10} - b_{12} z_t + \gamma_{11} y_{t-1} + \gamma_{12} z_{t-1} + \varepsilon_{yt}, \\ z_t &= b_{20} - b_{21} y_t + \gamma_{21} y_{t-1} + \gamma_{22} z_{t-1} + \varepsilon_{zt}. \end{aligned} \]
The \(\varepsilon_t\) are mutually uncorrelated structural shocks with diagonal covariance \(\Omega_\varepsilon\).
Stacked:
\[ B x_t = \Gamma_0 + \Gamma_1 x_{t-1} + \varepsilon_t, \qquad B = \begin{bmatrix} 1 & b_{12}\\ b_{21} & 1 \end{bmatrix}. \]
\(B\) encodes contemporaneous interactions between variables.
Premultiply by \(B^{-1}\):
\[ x_t = B^{-1}\Gamma_0 + B^{-1}\Gamma_1 x_{t-1} + B^{-1}\varepsilon_t . \]
Define \(c = B^{-1}\Gamma_0\), \(A_1 = B^{-1}\Gamma_1\), \(u_t = B^{-1}\varepsilon_t\):
\[ x_t = c + A_1 x_{t-1} + u_t . \]
This is the form we estimate by OLS, equation by equation.
We get:
Neither \(B\) nor \(\Omega_\varepsilon\) appears separately.
In the bivariate system from Unit 7:
So 4 structural unknowns.
The reduced form delivers \(\Sigma_u\), a symmetric \(2\times 2\) matrix with 3 distinct moments.
One additional restriction is needed to pin down \(B\). It must come from outside the data: economic theory or institutional timing.
The structural parameters split into two pieces:
The data constrain the latter only through
\[ \boxed{\;\Sigma_u = B^{-1}\,\Omega_\varepsilon\,B^{-1\prime}.\;} \]
\(\Sigma_u\) has \(K(K+1)/2\) distinct entries — fewer than the free parameters in \((B, \Omega_\varepsilon)\). Infinitely many structural pairs reproduce the same \(\Sigma_u\), and the data alone cannot distinguish them.
In general, with \(K\) variables:
Scaling indeterminacy. For any invertible diagonal \(D\),
\[ (B, \Omega_\varepsilon) \quad \text{and} \quad (DB,\, D\Omega_\varepsilon D') \]
produce the same \(u_t\). The scale of each shock is interchangeable with its row of \(B\), so \(K\) of the \(K^2 + K\) parameters are not separately identified.
Effective free parameters: \(K^2\).
The gap. \(\Sigma_u\) has \(K(K+1)/2\) distinct moments, leaving
\[ K^2 - \frac{K(K+1)}{2} = \frac{K(K-1)}{2} \]
substantive restrictions to be supplied from outside.
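The scaling argument and the restriction count can be checked numerically. This is a sketch with assumed values for \(B\), \(\Omega_\varepsilon\), and \(D\); the helper names are hypothetical.

```python
import numpy as np

# Illustrative structural matrices (assumed values, K = 2).
B = np.array([[1.0, 0.4],
              [0.3, 1.0]])
Omega = np.diag([2.0, 0.5])     # diagonal shock covariance
D = np.diag([3.0, -0.5])        # any invertible diagonal rescaling

def sigma_u(B, Omega):
    """Reduced-form covariance implied by (B, Omega): B^{-1} Omega B^{-1}'."""
    B_inv = np.linalg.inv(B)
    return B_inv @ Omega @ B_inv.T

Sigma_original = sigma_u(B, Omega)
Sigma_rescaled = sigma_u(D @ B, D @ Omega @ D.T)   # same matrix up to rounding

def restrictions_needed(K):
    """K^2 effective parameters minus K(K+1)/2 moments."""
    return K * K - K * (K + 1) // 2
```

For the seven-variable system discussed later, `restrictions_needed(7)` gives 21.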
To do estimation we need a single representative of each equivalence class, not the whole class. Two issues to handle.
Scale. The SVAR convention is to set \(\Omega_\varepsilon = I\) (equivalently, pick \(D = \Omega_\varepsilon^{-1/2}\)). This pins down the \(K\) scale parameters of \(\Omega_\varepsilon\), and the constraint becomes
\[\Sigma_u = B^{-1} B^{-1\prime}.\]
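Sign: a pure convention. Once scale is fixed, flipping the sign of any column of \(B^{-1}\) (equivalently, of the corresponding shock) still produces the same \(\Sigma_u\); in particular, \(B^{-1}\) and \(-B^{-1}\) do. Convention: require positive diagonal entries on \(B^{-1}\).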
With scale and sign pinned down, the substantive identification problem is the \(K(K-1)/2\) restrictions.
The \(K(K-1)/2\) restrictions can come from different sources. Four standard strategies:
1. Recursive (Cholesky). Order variables so earlier ones do not respond contemporaneously to later ones’ shocks. Restricts \(B^{-1}\) to be lower triangular.
2. Long-run (Blanchard–Quah). Restrict the cumulative response of certain variables to certain shocks at horizon \(\infty\).
3. Sign restrictions. Require IRFs to satisfy specific signs on impact (or for the first \(h\) horizons). Identifies a set of \(B^{-1}\), not a single point.
4. External instruments (Proxy SVAR). Use an outside variable correlated with one structural shock and orthogonal to the others. Identifies one column of \(B^{-1}\).
Idea. Restrict \(B^{-1}\) to be lower triangular.
The \(\frac{K(K-1)}{2}\) above-diagonal zeros are exactly the right number of restrictions.
With \(\Omega_\varepsilon = I\):
\[ \Sigma_u = B^{-1} B^{-1\prime}, \qquad B^{-1}\ \text{lower triangular with positive diagonal.} \]
This is the Cholesky decomposition of \(\Sigma_u\) — unique.
Let
\[ \Sigma_u = \begin{bmatrix} \sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{22} \end{bmatrix}. \]
Solve \(\Sigma_u = B^{-1} B^{-1\prime}\) with \(B^{-1} = \begin{bmatrix} a & 0 \\ b & c \end{bmatrix}\):
\[ \begin{aligned} a^2 &= \sigma_{11} & \Rightarrow\quad a &= \sqrt{\sigma_{11}}, \\ ab &= \sigma_{12} & \Rightarrow\quad b &= \sigma_{12}/\sqrt{\sigma_{11}}, \\ b^2 + c^2 &= \sigma_{22} & \Rightarrow\quad c &= \sqrt{\sigma_{22} - \sigma_{12}^2/\sigma_{11}}. \end{aligned} \]
\[ \begin{aligned} u_{1t} &= a\,\varepsilon_{1t}, \\ u_{2t} &= b\,\varepsilon_{1t} + c\,\varepsilon_{2t}. \end{aligned} \]
Causal direction. On impact, \(\varepsilon_{1t}\) moves both variables; \(\varepsilon_{2t}\) moves only variable 2. Shocks flow \(1 \to 2\) within the period — never \(2 \to 1\).
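The closed-form solution can be checked against a numerical Cholesky factorization. The covariance matrix below is an assumed example.

```python
import numpy as np

Sigma_u = np.array([[1.0, 0.6],
                    [0.6, 2.0]])   # assumed illustrative covariance

s11, s12, s22 = Sigma_u[0, 0], Sigma_u[0, 1], Sigma_u[1, 1]
a = np.sqrt(s11)
b = s12 / np.sqrt(s11)
c = np.sqrt(s22 - s12**2 / s11)
B_inv_manual = np.array([[a, 0.0],
                         [b, c]])

# np.linalg.cholesky returns the lower-triangular factor L with L @ L.T == Sigma_u.
B_inv_chol = np.linalg.cholesky(Sigma_u)
```

The two matrices agree, and the zero in the upper-right corner is the statement that \(\varepsilon_{2t}\) does not move variable 1 on impact.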
The reduced-form MA representation:
\[ x_t = \sum_{j=0}^{\infty} \Phi_j\, u_{t-j}, \qquad \Phi_0 = I. \]
Substitute \(u_{t-j} = B^{-1}\varepsilon_{t-j}\):
\[ x_t = \sum_{j=0}^{\infty} \Psi_j\, \varepsilon_{t-j}, \qquad \boxed{\;\Psi_j = \Phi_j\, B^{-1}.\;} \]
\((\Psi_h)_{ik}\) = response of variable \(i\) at horizon \(h\) to a unit shock in \(\varepsilon_{kt}\). Structural impulse response function.
Two ingredients we already have:
\(\widehat\Psi_j = \widehat\Phi_j \widehat B^{-1}\) for each horizon \(j\).
In the vars package, irf(fit, ortho = TRUE, ...) does both steps and returns the IRFs with confidence bands.
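For intuition, the two steps can be written out by hand. This NumPy sketch is not the vars implementation; it assumes a VAR(1), so that \(\Phi_j = A_1^j\), and uses made-up values for \(A_1\) and \(\Sigma_u\).

```python
import numpy as np

def structural_irf(A1, Sigma_u, horizons):
    """Return Psi with shape (horizons + 1, K, K), where Psi[j] = A1^j @ chol(Sigma_u)."""
    K = A1.shape[0]
    B_inv = np.linalg.cholesky(Sigma_u)   # recursive identification
    Phi = np.eye(K)                        # Phi_0 = I
    Psi = np.empty((horizons + 1, K, K))
    for j in range(horizons + 1):
        Psi[j] = Phi @ B_inv
        Phi = A1 @ Phi                     # Phi_{j+1} = A1 Phi_j for a VAR(1)
    return Psi

A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])                # assumed coefficient matrix
Sigma_u = np.array([[1.0, 0.3],
                    [0.3, 0.5]])           # assumed residual covariance
Psi = structural_irf(A1, Sigma_u, horizons=12)
```

`Psi[h][i, k]` is then the response of variable \(i\) at horizon \(h\) to a one-standard-deviation shock \(k\).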
\(\widehat\Psi_j\) depends on the OLS estimates \((\widehat A_1, \ldots, \widehat A_p, \widehat\Sigma_u)\) — both subject to sampling error. We need a CI at each horizon.
The standard fix: simulate the sampling distribution of \(\widehat\Psi_j\) by bootstrap, and read CIs off the simulated quantiles.
The residual bootstrap resamples \(\{u^*_t\}\) i.i.d. from \(\{\widehat u_t\}\) — fine if residuals are i.i.d. across \(t\). But often they exhibit conditional heteroskedasticity: variance varies over time (e.g., volatility clustering in financial data).
Why i.i.d. resampling fails. A high-variance residual from period A might land in low-variance period B. The bootstrap world has uniform variance across time; the data world doesn’t. CIs miscalibrate.
Wild bootstrap. Don’t resample — keep each \(\widehat u_t\) in place and multiply by a random sign:
\[u^*_t = \widehat u_t \cdot \eta_t,\qquad \eta_t \in \{-1, +1\}\ \text{i.i.d.}\]
Then \(\mathbb{E}[u^*_t] = 0\) and \(\mathrm{Var}(u^*_t \mid \widehat u_t) = \widehat u_t \widehat u_t'\) — the time-varying covariance pattern is preserved, only the sign is randomized.
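One wild-bootstrap draw might look as follows. The sketch assumes a VAR(1) and uses stand-in residuals with rising variance; a full procedure would re-estimate the VAR and the IRFs on each simulated path and repeat many times.

```python
import numpy as np

def wild_bootstrap_sample(x0, c, A1, u_hat, rng):
    """One wild-bootstrap path for a VAR(1): residuals stay in place, signs are randomized."""
    T, K = u_hat.shape
    eta = rng.choice([-1.0, 1.0], size=T)    # one Rademacher draw per period
    u_star = u_hat * eta[:, None]             # same sign flip for the whole vector u_t
    x_star = np.empty((T + 1, K))
    x_star[0] = x0
    for t in range(T):
        x_star[t + 1] = c + A1 @ x_star[t] + u_star[t]
    return x_star, u_star

rng = np.random.default_rng(1)
# Stand-in residuals (assumed): standard deviation rises from 0.5 to 2 over the sample.
u_hat = rng.standard_normal((200, 2)) * np.linspace(0.5, 2.0, 200)[:, None]
A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])
x_star, u_star = wild_bootstrap_sample(np.zeros(2), np.zeros(2), A1, u_hat, rng)
```

Each \(u^*_t\) has the same magnitude as \(\widehat u_t\), so the rising-variance pattern survives in the bootstrap sample.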
Pointwise (default irf() output). At each horizon \(h\) separately,
\[\Pr\bigl[\widehat\Psi_h^{0.025} \le \Psi_h \le \widehat\Psi_h^{0.975}\bigr] \approx 0.95.\]
The catch. Joint coverage across all horizons is lower:
\[\Pr\bigl[\Psi_h \in \text{CI}_h\ \text{for all } h = 1, \ldots, H\bigr] \;<\; 0.95.\]
When the question is about IRF shape — “does it stay positive for all \(h \le 12\)?” — pointwise bands undercover.
Simultaneous bands. Wider \((L_h, U_h)\) with
\[\Pr\bigl[L_h \le \Psi_h \le U_h\ \forall\, h\bigr] \ge 0.95.\]
Constructions: Sims–Zha (1999), sup-\(t\) bands (Montiel Olea & Plagborg-Møller, 2019), Bonferroni (most conservative). Worth using whenever a claim is about the full IRF path.
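The simplest of these constructions, Bonferroni, can be sketched from bootstrap draws by tightening the tail probability at each horizon from \(\alpha/2\) to \(\alpha/(2H)\). The percentile-based version below is an assumption made for illustration, and the draws are random stand-ins rather than real bootstrap output.

```python
import numpy as np

def percentile_bands(draws, level=0.95):
    """draws: array (n_boot, H) of bootstrap IRF draws for one response."""
    n_boot, H = draws.shape
    a = 1.0 - level
    pointwise = np.quantile(draws, [a / 2, 1 - a / 2], axis=0)
    bonferroni = np.quantile(draws, [a / (2 * H), 1 - a / (2 * H)], axis=0)
    return pointwise, bonferroni

rng = np.random.default_rng(2)
fake_draws = rng.standard_normal((5000, 12)).cumsum(axis=1)  # stand-in for bootstrap output
pointwise, bonferroni = percentile_bands(fake_draws)
```

The Bonferroni band contains the pointwise band at every horizon, which is the price of a joint coverage guarantee.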
For path-level claims, pointwise and simultaneous bands can give different answers.
Is the effect positive throughout \(h \le 20\)? The pointwise lower band stays \(> 0\): yes. The simultaneous lower band dips below zero at long \(h\): no. Same data, same point estimate, different conclusions: pointwise bands undercover for joint claims.
Bootstrap CIs presume the identification is correct. Step back to that: when does a recursive ordering actually hold?
A recursive ordering encodes a timing assumption: variables earlier in the order do not respond to variables later in the order within the same period.
Common conventions in macro:
The assumption is not in the data. If you reorder, you get a different \(B^{-1}\), different IRFs, different stories. Always report the ordering and justify it.
Take a small simulated bivariate VAR. We Cholesky-identify it under two orderings — \((v_1, v_2)\) and \((v_2, v_1)\) — and plot the impulse response of \(v_2\) to a shock in \(v_1\).
Same data, same model, different ordering, different answer.
The choice of ordering is part of the identifying assumption — the data do not pick it.
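The effect of reordering is visible already in the impact matrix. This sketch uses an assumed \(\Sigma_u\) and a permutation matrix to express both orderings in the original \((v_1, v_2)\) labeling.

```python
import numpy as np

Sigma_u = np.array([[1.0, 0.6],
                    [0.6, 2.0]])   # assumed residual covariance for (v1, v2)

# Ordering (v1, v2): v1 comes first.
impact_12 = np.linalg.cholesky(Sigma_u)

# Ordering (v2, v1): permute, factor, then permute back to the (v1, v2) labeling.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
impact_21 = P @ np.linalg.cholesky(P @ Sigma_u @ P.T) @ P.T

response_v2_to_v1_first = impact_12[1, 0]    # nonzero: v2 reacts to the v1 shock on impact
response_v2_to_v1_second = impact_21[1, 0]   # zero: v2 is ordered first, so it cannot react
```

Both impact matrices reproduce \(\Sigma_u\) exactly, so the fit to the data cannot tell them apart.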
The most influential recursive-SVAR study of monetary policy. The question:
How do output, prices, and the rest of the economy respond to a contractionary monetary policy shock?
Data. Quarterly U.S. macro data, 1965Q1–1995Q4 (CEE’s sample).
Variables (in the recursive ordering CEE adopt):
Reserves = bank balances held at the Fed. Banks hold them to meet:
Two types of reserves:
The FFR is the interbank overnight reserve rate (banks lending reserves to each other). The Fed targets it by adjusting non-borrowed reserves — the main policy lever.
M2 = M1 (cash + checking) + savings + small CDs + retail money market funds — the broad money supply households and firms hold.
Transmission. Fed sells Treasuries → non-borrowed reserves contract → FFR rises → bank lending tightens → deposits and M2 fall. The fast-block IRFs trace this chain.
In simpler VARs (without \(p^{\text{com}}\)), a contractionary shock raises prices — implausible.
Sims (1992). Commodity prices are leading inflation indicators in the Fed’s information set. Omit \(p^{\text{com}}\) → “policy shock” absorbs the Fed’s preemptive response → shock correlates with realized inflation → wrong sign on \(p\).
The ordering is a timing story about within-quarter behavior:
The structural shock to the fed funds equation, isolated by this ordering, is the monetary policy shock — a quarter’s policy move that the Fed’s reaction function does not explain.
Without the ordering, the fed-funds equation residual is a mixture: a true policy surprise, plus the systematic Fed response to today’s slow-block news (output, prices, commodity prices).
With the ordering — and only with it — the mixture unscrambles:
\[ u_{\text{ffr},t} \;=\; \underbrace{\alpha_y\,\varepsilon_{y t} + \alpha_p\,\varepsilon_{p t} + \alpha_{c}\,\varepsilon_{p^{\text{com}} t}}_{\text{policy reaction to slow-block shocks}} \;+\; \underbrace{\varepsilon_{\text{ffr},t}}_{\text{policy shock}} . \]
This is the identifying restriction biting — turning a residual of mixed origin into an interpretable structural object.
Hump-shaped decline. Output falls gradually, troughs at ~4–6 quarters, and recovers within ~3 years: a U-shaped path, not a jump. Impact is zero by the slow-block restriction; the delayed real effect is the headline real-economy finding.
Sluggish decline. Bulk of the response arrives only after 1–2 years — well behind the output trough. Output falls before prices. Pointwise CIs straddle zero throughout — discussion next slide.
Calvo (sticky prices) predicts: small short-run response, gradual decline, full long-run adjustment.
What we see:
Why isn’t the long-run decline sharp?
Identification matters. Different rotations → different IRFs. The data don’t pick the rotation. Identification is a modeling decision.
Faster decline than \(p\): flexible-price markets clear quickly. \(p^{\text{com}}\) is still in the slow block because the Fed reacts to it within-quarter, not because \(p^{\text{com}}\) itself is slow.
The policy shock itself. A one-s.d. surprise raises FFR on impact; the rate decays back over ~6–8 quarters. By construction, this response is not part of the systematic Fed reaction — it’s the residual orthogonal to slow-block contemporaneous shocks, the SVAR’s interpretation of exogenous policy.
Liquidity effect. \(\text{nbr}\) contracts sharply on impact — the Fed withdraws reserves to engineer the rate hike. (Total reserves’ response is muddier: when reserves are tight, banks under stress pay the higher discount rate to borrow from the Fed; borrowed reserves rise and partially offset the fall in \(\text{nbr}\).)
Money supply contracts. \(M_2\) declines as the policy contraction propagates — the broad monetary aggregate moves with reserves and the rate, more gradually than \(\text{nbr}\).
What FEVD measures:
Decomposition. The \(h\)-step forecast error is \(x_{t+h} - \mathbb{E}_t x_{t+h} = \sum_{j=0}^{h-1} \Psi_j\, \varepsilon_{t+h-j}\). With \(\Omega_\varepsilon = I\), variance is additive across shocks:
\[ \operatorname{Var}_h(x_{i,t+h}) = \underbrace{\textstyle\sum_{j=0}^{h-1} (\Psi_j)_{i1}^2}_{\text{shock 1}} + \cdots + \underbrace{\textstyle\sum_{j=0}^{h-1} (\Psi_j)_{iK}^2}_{\text{shock $K$}}. \]
FEVD = shock \(k\)’s piece divided by the total:
\[ \mathrm{FEVD}_{ik}(h) = \frac{\sum_{j=0}^{h-1}(\Psi_j)_{ik}^{2}}{\sum_{j=0}^{h-1}\sum_{\ell=1}^{K}(\Psi_j)_{i\ell}^{2}}. \]
Shares sum to 1; inherits meaning from the identification.
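The FEVD formula translates almost line by line into code. The sketch assumes unit-variance shocks and builds the structural IRFs from a made-up VAR(1) with Cholesky identification; `fevd` is a hypothetical helper name.

```python
import numpy as np

def fevd(Psi, h):
    """Shares of the h-step forecast-error variance.

    Psi has shape (J, K, K) with J >= h and unit-variance shocks;
    entry [i, k] of the result is the share of variable i due to shock k.
    """
    contrib = (Psi[:h] ** 2).sum(axis=0)                  # sum over j < h of (Psi_j)_{ik}^2
    return contrib / contrib.sum(axis=1, keepdims=True)   # divide by the row total

# Structural IRFs for an assumed VAR(1) with Cholesky identification.
A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])
Sigma_u = np.array([[1.0, 0.3],
                    [0.3, 0.5]])
B_inv = np.linalg.cholesky(Sigma_u)
Psi = np.array([np.linalg.matrix_power(A1, j) @ B_inv for j in range(20)])

shares_h1 = fevd(Psi, 1)
shares_h8 = fevd(Psi, 8)
```

At \(h = 1\) the first variable's forecast error is attributed entirely to shock 1, which is the recursive ordering showing through.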
Share of forecast-error variance attributable to the FFR shock, by variable and horizon.
A different idea: instead of restricting impact responses, restrict long-run cumulative responses.
The cumulative effect of a shock at horizon \(\infty\) is
\[ \Psi(1) = \sum_{j=0}^{\infty} \Psi_j = \Phi(1)\, B^{-1}. \]
For a stable VAR(1), iterating gives \(\Phi_j = A_1^j\), so
\[ \Phi(1) = \sum_{j=0}^{\infty} A_1^j = (I - A_1)^{-1}. \]
(For VAR\((p)\): \(\Phi(1) = (I - A_1 - \cdots - A_p)^{-1}\).)
A long-run zero restriction like
\[ \Psi(1)_{ij} = 0 \]
says shock \(j\) has no permanent effect on the level of variable \(i\).
For \(K=2\) with one long-run zero, \(\Psi(1) = \Phi(1)B^{-1}\) is lower triangular: an ordering assumption, the long-run analog of recursive ordering, imposed on \(\Psi(1)\) rather than \(B^{-1}\).
Square \(\Psi(1)\):
\[ \Psi(1)\Psi(1)' \;=\; C\,\Sigma_u\,C' \;\equiv\; V, \]
where \(C = \Phi(1) = (I - A_1 - \cdots - A_p)^{-1}\). \(V\) is computable from the reduced-form estimates. Take the Cholesky factor: \(LL' = V \Rightarrow \Psi(1) = L\), then \(B^{-1} = C^{-1} L\).
With \(V = \begin{bmatrix} v_{11} & v_{12}\\ v_{12} & v_{22} \end{bmatrix}\),
\[ L = \begin{bmatrix} \sqrt{v_{11}} & 0 \\ v_{12}/\sqrt{v_{11}} & \sqrt{v_{22} - v_{12}^2/v_{11}} \end{bmatrix}. \]
Same bivariate Cholesky formulas as the recursive case, applied to the long-run covariance \(V\) rather than \(\Sigma_u\).
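The long-run recipe can be sketched in a few lines. The example assumes a VAR(1), so \(C = (I - A_1)^{-1}\), with made-up values for \(A_1\) and \(\Sigma_u\).

```python
import numpy as np

A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])               # assumed VAR(1) coefficients
Sigma_u = np.array([[1.0, 0.3],
                    [0.3, 0.5]])          # assumed residual covariance

C = np.linalg.inv(np.eye(2) - A1)         # Phi(1) for a VAR(1)
V = C @ Sigma_u @ C.T                     # long-run covariance
L = np.linalg.cholesky(V)                 # Psi(1), lower triangular
B_inv = np.linalg.solve(C, L)             # B^{-1} = C^{-1} L

long_run = C @ B_inv                      # should reproduce L
```

The resulting `B_inv` is generally not triangular: the zero sits in the long-run matrix, not in the impact matrix.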
The founding long-run-restriction SVAR. The question:
What are the dynamic effects of aggregate demand and supply disturbances? How much of the variation in output and unemployment does each account for?
Data. U.S. quarterly, 1948Q1–1987Q4 (BQ’s sample).
Variables. Output growth \(\Delta y_t\) and unemployment \(U_t\). Bivariate VAR(8).
Two basic kinds of disturbance hit the economy:
The distinction matters for policy.
Output (\(Y_t\)). Long-run output is determined by supply-side fundamentals — capital, labor, productivity. Quantity theory:
\[MV = PY, \qquad Y \to Y^* \;\Longrightarrow\; \Delta M \to \Delta P.\]
Unemployment (\(U_t\)). Long-run unemployment is the natural rate \(U^*\), set by labor-market frictions, demographics, and institutions. Neither supply nor demand shocks change \(U^*\): both are long-run neutral on unemployment.
The asymmetry BQ exploits: only supply moves output’s long-run level; nothing moves unemployment’s.
The asymmetry — demand transient, supply permanent — gives a single zero in the long-run cumulative response matrix \(\Psi(1)\):
\[\Psi(1)_{\Delta y,\, d} = 0\]
Supply shock: permanent on output, persistent on unemployment.
Demand shock: transient on output, hump on unemployment.
| | Cholesky | Blanchard–Quah |
|---|---|---|
| Restriction on: | impact (\(B^{-1}\)) | long-run sum (\(\Psi(1)\)) |
| Form: | zeros in \(B^{-1}\) | zeros in \(\Psi(1)\) |
| Information needed: | within-period timing | long-run neutrality |
| Computation: | \(\mathrm{chol}(\Sigma_u)\) | \(\mathrm{chol}(C\Sigma_u C')\) |
The two schemes encode different economic information. Use whichever is more defensible for the question at hand.
Both Cholesky and Blanchard–Quah are point-identifying: the restrictions pin down a single \(B^{-1}\).
Sign restrictions are weaker — they say only:
A monetary contraction raises the policy rate, lowers output, and lowers inflation, on impact (or for the first \(h\) horizons).
That is set identification: we keep every \(B^{-1}\) consistent with the data (\(B^{-1}B^{-1\prime} = \Sigma_u\)) and with the sign pattern. The IRF becomes a set of curves, not a single curve.
Structural identification requires \(\Sigma_u = B^{-1}(B^{-1})'\) (with \(\Omega_\varepsilon = I\)).
The Cholesky factor \(L\) is one solution. But for any orthogonal \(Q\) (\(QQ' = I\)):
\[(LQ)(LQ)' = L\, QQ'\, L' = LL' = \Sigma_u.\]
So \(\tilde B^{-1} = LQ\) is also a valid decomposition. \(Q\) parametrizes everything the data leave unidentified.
Each identification scheme is a rule for picking \(Q\):
We search the identified set by Monte Carlo over \(Q \in O(K) = \{Q : QQ' = I\}\):
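One common way to run this search, sketched here as an assumption rather than as the exact algorithm used in the lecture, draws \(Q\) from the QR decomposition of a Gaussian matrix and keeps the candidates \(LQ\) whose impact column has the required signs. The covariance matrix and the sign pattern (rate up, output down, inflation down) are illustrative.

```python
import numpy as np

def draw_orthogonal(K, rng):
    """Draw Q uniformly on O(K) via QR of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((K, K)))
    return Q * np.sign(np.diag(R))        # column sign fix so the draw is uniform

def sign_restricted_impacts(Sigma_u, signs, n_draws, rng):
    """Keep candidate impact matrices L @ Q whose first column matches `signs`."""
    L = np.linalg.cholesky(Sigma_u)
    kept = []
    for _ in range(n_draws):
        cand = L @ draw_orthogonal(Sigma_u.shape[0], rng)
        col = cand[:, 0]
        if np.all(np.sign(col) == signs):
            kept.append(cand)
        elif np.all(np.sign(-col) == signs):   # the same shock with its sign flipped
            cand[:, 0] = -col
            kept.append(cand)
    return kept

rng = np.random.default_rng(5)
Sigma_u = np.array([[1.0, -0.2, -0.1],
                    [-0.2, 1.0, 0.3],
                    [-0.1, 0.3, 1.0]])    # assumed covariance for (rate, output, inflation)
signs = np.array([1.0, -1.0, -1.0])       # contraction: rate up, output down, inflation down
kept = sign_restricted_impacts(Sigma_u, signs, n_draws=2000, rng=rng)
```

Every matrix in `kept` reproduces \(\Sigma_u\) exactly; the spread of IRFs across them is the identified set.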
Suppose we have a variable \(z_t\) — outside the VAR — that is correlated with one structural shock and only that shock:
\[ \mathbb{E}[z_t\, \varepsilon_{1t}] \neq 0, \qquad \mathbb{E}[z_t\, \varepsilon_{jt}] = 0\ \ (j \neq 1). \]
This is an instrument for \(\varepsilon_{1t}\). It directly identifies the first column of \(B^{-1}\), without imposing any timing or long-run restrictions.
Goal: recover \(b = (B^{-1})_{\bullet 1}\) — the impact response to shock \(\varepsilon_{1t}\).
Cross-moment isolates the column. From \(u_t = B^{-1}\varepsilon_t\) and the instrument’s properties (\(\mathbb{E}[z_t\varepsilon_{1t}] = \alpha\), others \(= 0\)): \[\mathbb{E}[u_t z_t] \;=\; B^{-1}\,\mathbb{E}[\varepsilon_t z_t] \;=\; b\cdot\alpha.\]
The cross-moment is \(b\) scaled by the unknown \(\alpha\).
OLS estimates this. Regress \(\hat u_t\) on \(z_t\) (no intercept): \[\hat\beta \;=\; \frac{\sum_t \hat u_t z_t}{\sum_t z_t^2} \;\xrightarrow{p}\; b\cdot c, \qquad c = \frac{\alpha}{\mathbb{E}[z_t^2]}.\]
OLS recovers \(b\) up to the unknown scalar \(c\) — direction yes, magnitude no.
Variance constraint pins \(|c|\). \(\Omega_\varepsilon = I \Rightarrow \Sigma_u = B^{-1}(B^{-1})'\), hence \(b'\,\Sigma_u^{-1}\,b = 1\). Plug in \(\hat\beta = c\cdot b\): \[\hat\beta'\,\Sigma_u^{-1}\,\hat\beta \;=\; c^2 \;\;\Rightarrow\;\; |c| = \sqrt{\hat\beta'\,\widehat\Sigma_u^{-1}\,\hat\beta}.\]
Result. \(\hat b = \hat\beta / |c|\), with sign chosen by the positive-diagonal convention from earlier.
Other columns of \(B^{-1}\) remain unidentified — fine if we only need the IRF for shock 1.
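The three steps can be checked on simulated data where the true column is known. The impact matrix, the instrument strength, and the noise level below are all assumptions made for the illustration; the true residuals stand in for estimated ones.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 200_000
B_inv = np.array([[1.0, 0.3],
                  [0.5, 0.8]])                   # assumed true impact matrix
eps = rng.standard_normal((T, 2))                # unit-variance structural shocks
u = eps @ B_inv.T                                # u_t = B^{-1} eps_t
z = 0.7 * eps[:, 0] + rng.standard_normal(T)     # instrument: loads on shock 1 plus noise

# Step 1: OLS of each residual on z (no intercept) gives b up to scale.
beta_hat = (u * z[:, None]).sum(axis=0) / (z ** 2).sum()

# Step 2: the variance constraint b' Sigma_u^{-1} b = 1 pins down |c|.
Sigma_u_hat = u.T @ u / T
c_abs = np.sqrt(beta_hat @ np.linalg.solve(Sigma_u_hat, beta_hat))

# Step 3: rescale and apply the positive-diagonal sign convention.
b_hat = beta_hat / c_abs
if b_hat[0] < 0:
    b_hat = -b_hat
```

With this sample size `b_hat` lands close to the first column of `B_inv`, and nothing is learned about the second column.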
We use Proxy SVAR to identify a financial transmission channel for monetary policy that recursive VARs without a credit variable can’t show.
Setup.
Instrument. Quarterly-summed FF4 surprises from the USMPD database (Bauer-Swanson) — change in 4-month-ahead fed-funds futures in a 30-min window around FOMC events.
Spirit follows Gertler-Karadi (2015); we use the updated USMPD series in place of the original Gürkaynak-Sack-Swanson data.
FF4 is the federal funds futures contract for the 4-month-ahead month, traded on the CME.
Quote convention. The contract cash-settles at expiration to \(100 - r_{\text{realized}}\) (where \(r\) is the average realized FF rate over the contract month). By no-arbitrage: \[P_t \;\approx\; 100 - \mathbb{E}_t[r_{4\text{-month ahead}}],\] so the implied rate \(= 100 - P_t\) recovers market expectations from the price.
Trading mechanics — not a spot purchase.
The “surprise” = change in the FF4 implied rate over a 30-minute window around an FOMC announcement (in basis points; positive = rate revised up).
The 30-min window argument.
So \(z_t\) is relevant (it IS the surprise component) and exogenous (other shocks don’t move in 30 min).
Quarterly aggregation. Fed meets ~2× per quarter → 2 event-time surprises per quarter. Sum within-quarter to get the quarterly instrument \(z_t\). Aggregation preserves both validity properties.
The credit channel.
Proxy SVAR uses an external \(z_t\) to identify one shock — but the identification rests on standard IV conditions.
Exogeneity is plausible, not testable
Weak-instrument risk (relevance)
Even when exogeneity and relevance hold, the instrument identifies the response to a specific type of shock — not a generic one.
LATE-style heterogeneity
| Scheme | Restriction | What it pins down | Cost |
|---|---|---|---|
| Cholesky | Triangular \(B^{-1}\) | All shocks, given order | Strong timing |
| Long-run (BQ) | Zeros in \(\Psi(1)\) | All shocks | Long-run neutrality |
| Sign restrictions | Sign of impact / IRFs | Set of admissible IRFs | Set, not point ID |
| Proxy SVAR (IV) | Instrument exogeneity | One column of \(B^{-1}\) | Need a credible \(z_t\) |
None of these is “the right” scheme. Each makes economic content explicit. Choosing among them is a modeling decision, not a statistical one.
The general relationship between reduced-form innovations and structural shocks: \[ u_t \;=\; C(L)\,\varepsilon_t. \]
Standard SVAR is the special case \(C(L) = B^{-1}\) (constant matrix). Identify \(B^{-1}\) from \(\Sigma_u\) \(+\) restrictions, recover \(\varepsilon_t = B\,u_t\).
If \(C(L)\) has lags but is invertible: the VAR’s \(\Phi(L)\) (with enough lags) absorbs the lag structure, leaving \(u_t = B^{-1}\,\varepsilon_t\) in the limit. Standard SVAR still works.
If \(C(L)\) is non-invertible: no VAR — finite or infinite order — can recover \(\varepsilon_t\) from past \(u\)’s. This is non-fundamentality.
The canonical source of non-invertible \(C(L)\): anticipation.
From \(\Phi(L)\,y_t = C(L)\,\varepsilon_t\), the structural MA of observables is \[ y_t = M(L)\,\varepsilon_t, \qquad C(L) = \Phi(L)\,M(L). \]
\(\Rightarrow\) checking \(C(L)\) non-invertibility reduces to checking \(\det M(z)\) for inside-disk roots. The next slide does this in a GE model.
Neoclassical growth model with \(q\)-period tax foresight.
Model ingredients
Observables
Structural shocks
Solving the equilibrium (log utility, full depreciation, Cobb–Douglas production) gives, for \(q = 2\): \[ M(L) \;=\; \begin{pmatrix} 1 & 0 \\ \dfrac{1}{1 - \alpha L} & -\dfrac{\kappa(L + \theta)}{1 - \alpha L} \end{pmatrix}, \] with \(\theta = \alpha\beta(1-\tau) \in (0,1)\) and \(\kappa = (1-\theta)\tau/(1-\tau)\).
Determinant: \[ \det M(z) \;=\; -\frac{\kappa(z+\theta)}{1-\alpha z}, \qquad \text{root at } z = -\theta. \]
Since \(\theta = \alpha\beta(1-\tau) \in (0,1)\), the root sits inside the unit disk. \(M(L)\) non-invertible \(\Rightarrow C(L)\) non-invertible \(\Rightarrow\) standard SVAR fails.
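A quick numerical check of the root location, under an assumed calibration (\(\alpha = 0.36\), \(\beta = 0.99\), \(\tau = 0.25\)); these values are illustrative and not taken from the text.

```python
import numpy as np

alpha, beta, tau = 0.36, 0.99, 0.25          # assumed calibration
theta = alpha * beta * (1 - tau)
kappa = (1 - theta) * tau / (1 - tau)

# Numerator of det M(z) is -kappa * (z + theta) = -kappa * z - kappa * theta.
root = np.roots([-kappa, -kappa * theta])[0]
inside_unit_disk = abs(root) < 1
```

Any calibration with \(\alpha, \beta \in (0,1)\) and \(\tau \in (0,1)\) gives the same qualitative answer, since \(\theta\) is then a product of numbers in \((0,1)\).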
Economic intuition.
Suppose we have an external instrument \(z_t\) with
Impact response. Under non-invertibility, \(u_t = C(L)\,\varepsilon_t = \sum_{j \geq 0} C_j\,\varepsilon_{t-j}\). Then \[\begin{align*} \mathrm{Cov}(u_t,\, z_t) &= \sum_{j \geq 0} C_j\,\mathrm{Cov}(\varepsilon_{t-j},\, z_t) \\ &= C_0\,\mathrm{Cov}(\varepsilon_t,\, z_t) \quad\text{(past-shock terms zero)} \\ &= (C_0)_{\bullet,\tau} \cdot \mathrm{Cov}(z_t,\,\varepsilon^\tau_t). \end{align*}\] Since \(C_0 = \Phi(0)\,M_0 = M_0\), this recovers \((M_0)_{\bullet,\tau}\) — the impact column (\(h = 0\) of the IRF).
Full IRF. For later horizons, use local-projection IV (LP-IV).
Do (given an identification):
Don’t:
Standard extensions worth knowing the names of:
Estimation:
Identification: