Xiamen University, Chow Institute
April, 2026
Example: the Cobb-Douglas production function with a multiplicative shock
\[ Y_i = A K_i^{\alpha} L_i^{\gamma} e^{u_i} \]
is nonlinear in the parameters. But taking logs:
\[ \log(Y_i) = \log(A) + \alpha \log(K_i) + \gamma \log(L_i) + u_i \]
This is now linear and can be estimated by OLS.
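A minimal simulation sketch of this linearization, assuming NumPy is available. The parameter values (\(A = 2\), \(\alpha = 0.3\), \(\gamma = 0.6\)), the sample size, and the input distributions are illustrative assumptions, not taken from any data set.

```python
import numpy as np

# Assumed parameter values, chosen only for illustration.
A, alpha, gamma = 2.0, 0.3, 0.6
n = 500

rng = np.random.default_rng(0)
K = rng.uniform(1.0, 10.0, size=n)   # capital
L = rng.uniform(1.0, 10.0, size=n)   # labor
u = rng.normal(0.0, 0.1, size=n)     # shock, enters the level equation as e^u

# Cobb-Douglas output with a multiplicative shock.
Y = A * K**alpha * L**gamma * np.exp(u)

# OLS on the log-linearized model:
# log Y = log A + alpha * log K + gamma * log L + u
X = np.column_stack([np.ones(n), np.log(K), np.log(L)])
coef, *_ = np.linalg.lstsq(X, np.log(Y), rcond=None)
log_A_hat, alpha_hat, gamma_hat = coef

print(f"alpha_hat = {alpha_hat:.3f}, gamma_hat = {gamma_hat:.3f}")
```

The estimates should land close to the assumed \(\alpha\) and \(\gamma\); the intercept estimates \(\log(A)\), not \(A\) itself.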
Beyond linearizing multiplicative models:
We can apply the log transformation to \(y\), to \(x\), to both, or to neither. Each choice changes how \(\beta_1\) is interpreted:
| Model | Specification | Interpretation of \(\beta_1\) |
|---|---|---|
| Level-level | \(y = \beta_0 + \beta_1 x + u\) | \(\Delta y = \beta_1 \Delta x\) |
| Level-log | \(y = \beta_0 + \beta_1 \log(x) + u\) | \(\Delta y \approx (\beta_1/100)\, \%\Delta x\) |
| Log-level | \(\log(y) = \beta_0 + \beta_1 x + u\) | \(\%\Delta y \approx (100\beta_1)\, \Delta x\) |
| Log-log | \(\log(y) = \beta_0 + \beta_1 \log(x) + u\) | \(\%\Delta y \approx \beta_1\, \%\Delta x\) |
In the model \(\log(y) = \beta_0 + \beta_1 x + u\), consider a change \(\Delta x\):
\[ \log(y + \Delta y) - \log(y) = \beta_1 \Delta x \]
The left side is \(\log(1 + \Delta y / y) \approx \Delta y / y\) for small \(\Delta y / y\).
So:
\[ \frac{\Delta y}{y} \approx \beta_1 \Delta x \quad\Longrightarrow\quad \%\Delta y \approx (100\beta_1)\, \Delta x \]
This approximation relies on \(\log(1+r) \approx r\) for small \(r\).
In the model \(y = \beta_0 + \beta_1 \log(x) + u\), consider a change \(\Delta x\):
\[ \Delta y = \beta_1 [\log(x + \Delta x) - \log(x)] = \beta_1 \log\!\left(1 + \frac{\Delta x}{x}\right) \]
For small \(\Delta x / x\), using \(\log(1+r) \approx r\):
\[ \Delta y \approx \beta_1 \cdot \frac{\Delta x}{x} = \frac{\beta_1}{100} \cdot \%\Delta x \]
In the model \(\log(y) = \beta_0 + \beta_1 \log(x) + u\), consider a change \(\Delta x\):
\[ \log(y + \Delta y) - \log(y) = \beta_1 [\log(x + \Delta x) - \log(x)] \]
Applying \(\log(1+r) \approx r\) to both sides:
\[ \frac{\Delta y}{y} \approx \beta_1 \cdot \frac{\Delta x}{x} \quad\Longrightarrow\quad \%\Delta y \approx \beta_1\, \%\Delta x \]
\(\beta_1\) is the elasticity of \(y\) with respect to \(x\): the percent change in \(y\) for a one percent change in \(x\).
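When \(\beta_1 \Delta x\) is not small, the approximation becomes inaccurate. In the log-level model, the exact percent change follows by exponentiating both sides:
\[ \log(y + \Delta y) - \log(y) = \beta_1 \Delta x \quad\Longrightarrow\quad \frac{y + \Delta y}{y} = e^{\beta_1 \Delta x} \]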
\[ \frac{\Delta y}{y} = e^{\beta_1 \Delta x} - 1 \quad\Longrightarrow\quad \%\Delta y = 100 \cdot \big[\exp(\beta_1 \Delta x) - 1\big] \]
\[ \widehat{\log(\text{wage})} = 0.584 + 0.083\; \text{educ} \]
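Reading this fitted equation with the log-level formulas above: one more year of education is associated with roughly \(100 \times 0.083 = 8.3\%\) higher wages by the approximation, and about 8.65% by the exact formula. The short sketch below compares the two; the five-year change is an illustrative choice, and the function names are hypothetical.

```python
import math

beta_educ = 0.083  # coefficient on educ from the fitted wage equation

def approx_pct_change(beta, dx):
    """Approximate percent change in y: 100 * beta * dx."""
    return 100.0 * beta * dx

def exact_pct_change(beta, dx):
    """Exact percent change in y: 100 * (exp(beta * dx) - 1)."""
    return 100.0 * (math.exp(beta * dx) - 1.0)

for years in (1, 5):
    print(years,
          round(approx_pct_change(beta_educ, years), 2),
          round(exact_pct_change(beta_educ, years), 2))
```

The gap between the two columns widens as \(\beta_1 \Delta x\) grows, which is why the approximation is reserved for small changes.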
\[ \widehat{\log(\text{salary})} = 4.82 + 0.257\; \log(\text{sales}) \]
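Here the coefficient 0.257 is an elasticity: a 1% increase in sales is associated with roughly a 0.257% increase in salary. As a hedged sketch, the exact change implied by the log-log form is \(100 \cdot [(1 + \%\Delta x/100)^{\beta_1} - 1]\); the percent changes evaluated below are illustrative choices.

```python
elasticity = 0.257  # coefficient on log(sales) from the fitted salary equation

def exact_pct_change_salary(pct_change_sales):
    """Exact percent change in predicted salary for a percent change in sales."""
    return 100.0 * ((1.0 + pct_change_sales / 100.0) ** elasticity - 1.0)

for pct in (1.0, 10.0, 50.0):
    approx = elasticity * pct  # elasticity approximation
    print(pct, round(approx, 3), round(exact_pct_change_salary(pct), 3))
```

For a 1% change the two numbers nearly coincide; for large changes the approximation overstates the effect when the elasticity is between 0 and 1.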
If we change the units of sales, does the estimated elasticity change?
In the CEO salary regression, sales are measured in millions of dollars:
\[ \widehat{\log(\text{salary})} = 4.82 + 0.257\; \log(\text{sales}) \]
Suppose we re-measure sales in thousands of dollars. Then \(\text{sales}_{\text{new}} = 1000 \cdot \text{sales}\), and:
\[ \log(\text{sales}_{\text{new}}) = \log(1000) + \log(\text{sales}) \]
The slope \(\hat\beta_1 = 0.257\) is unchanged — only the intercept shifts.
In log-log models, the elasticity is invariant to units of measurement.
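The invariance can be checked numerically. The sketch below uses a small hypothetical data set (the sales and salary figures are made up for illustration) and a hand-written simple-regression helper, `simple_ols`, which is not part of any library.

```python
import math

def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical firms: sales in millions of dollars, salary in thousands.
sales_millions = [120.0, 450.0, 980.0, 2300.0, 5100.0, 8700.0, 15000.0]
salary = [410.0, 560.0, 640.0, 900.0, 1010.0, 1150.0, 1480.0]

log_salary = [math.log(s) for s in salary]
b0_m, b1_m = simple_ols([math.log(s) for s in sales_millions], log_salary)
b0_k, b1_k = simple_ols([math.log(1000.0 * s) for s in sales_millions], log_salary)

print(b1_m, b1_k)                            # slopes agree up to rounding error
print(b0_k, b0_m - b1_m * math.log(1000.0))  # intercept shifts by -slope * log(1000)
```

The slope is identical in both fits, while the intercept falls by \(\hat\beta_1 \log(1000)\).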
Consider the specification:
\[ y = \beta_0 + \beta_1 x + \beta_2 x^2 + u \]
The marginal effect of \(x\) on \(y\):
\[ \frac{\partial E[y \mid x]}{\partial x} = \beta_1 + 2\beta_2 x \]
Data: wage1
\[ \widehat{\text{wage}} = \underset{(0.35)}{3.73} + \underset{(0.04)}{0.298}\; \text{exper} \underset{(0.0009)}{- 0.0061}\; \text{exper}^2 \]
Marginal effect:
\[ \frac{\partial \widehat{\text{wage}}}{\partial \text{exper}} = 0.298 - 2(0.0061) \cdot \text{exper} = 0.298 - 0.0122 \cdot \text{exper} \]
Does this mean returns to experience become negative after about 24 years (\(0.298 / 0.0122 \approx 24.4\))? Not necessarily: the pattern could reflect omitted variable bias (e.g., omitting education) or misspecification (e.g., \(\log(\text{wage})\) may be the more appropriate dependent variable).
A quadratic is a local approximation — should we take it literally everywhere?
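A small sketch that evaluates the estimated marginal effect at a few experience levels and locates the turning point \(-\hat\beta_1 / (2\hat\beta_2)\). The experience levels are illustrative choices; the coefficients are those reported in the fitted wage equation.

```python
b_exper, b_exper_sq = 0.298, -0.0061  # coefficients on exper and exper^2

def marginal_effect(exper):
    """Estimated change in wage per extra year of experience, at a given exper."""
    return b_exper + 2.0 * b_exper_sq * exper

# Turning point: the experience level where the marginal effect equals zero.
turning_point = -b_exper / (2.0 * b_exper_sq)

for exper in (1, 10, 20, 30):
    print(exper, round(marginal_effect(exper), 4))
print("turning point:", round(turning_point, 2))
```

The marginal effect shrinks with experience and changes sign at roughly 24.4 years; whether that sign change is economically real is a separate question.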
\[ \begin{aligned} \widehat{\log(\text{price})} = \underset{(0.57)}{13.39} &\underset{(0.11)}{- 0.902}\; \log(\text{nox}) \underset{(0.04)}{- 0.087}\; \log(\text{dist}) \\ &\underset{(0.17)}{- 0.545}\; \text{rooms} + \underset{(0.01)}{0.062}\; \text{rooms}^2 \underset{(0.006)}{- 0.048}\; \text{stratio} \end{aligned} \]
Marginal effect of rooms:
\[ \frac{\partial \widehat{\log(\text{price})}}{\partial \text{rooms}} = -0.545 + 0.124 \cdot \text{rooms} \]
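As a worked reading of this expression, using only the reported coefficients, the marginal effect is zero at
\[ \text{rooms}^* = \frac{0.545}{0.124} \approx 4.4 \]
so the fitted effect of an extra room is negative below roughly 4.4 rooms and positive above it. At \(\text{rooms} = 6\) (an illustrative value):
\[ -0.545 + 0.124 \times 6 = 0.199 \]
which, by the log-level approximation, corresponds to a price roughly 19.9% higher per additional room.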
Quadratic terms capture curvature in a single variable. But the second-order Taylor expansion of \(f(x_1, x_2)\) also introduces cross-terms:
\[ \begin{aligned} y &= f(x_1, x_2) \\ &\approx \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2 + \cdots \end{aligned} \]
The interaction term \(x_1 x_2\) allows the effect of \(x_1\) to depend on \(x_2\):
\[ \frac{\partial E[y \mid x_1, x_2]}{\partial x_1} = \beta_1 + \beta_3 x_2 \]
Data: attend
\[ \begin{aligned} \widehat{\text{stndfnl}} = \underset{(1.36)}{2.05} &\underset{(0.01)}{- 0.0067}\; \text{atndrte} \underset{(0.48)}{- 1.63}\; \text{priGPA} \\ &\underset{(0.10)}{- 0.128}\; \text{ACT} + \underset{(0.10)}{0.296}\; \text{priGPA}^2 \\ &+ \underset{(0.002)}{0.0045}\; \text{ACT}^2 + \underset{(0.004)}{0.0056}\; \text{priGPA} \cdot \text{atndrte} \end{aligned} \]
Partial effect of attendance (only \(\text{atndrte}\) and its interaction appear):
\[ \frac{\partial \widehat{\text{stndfnl}}}{\partial \text{atndrte}} = -0.0067 + 0.0056 \cdot \text{priGPA} \]
Evaluating at \(\overline{\text{priGPA}} = 2.59\):
\[ \widehat{APE}_{\text{atndrte}} = -0.0067 + 0.0056 \times 2.59 = 0.0078 \]
Attendance matters more for students with higher prior GPA.
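To see how the estimated effect of attendance varies with prior GPA, the sketch below evaluates the partial effect at a few values. The values 2.0 and 3.5 are illustrative choices; 2.59 is the sample mean quoted above.

```python
b_atndrte = -0.0067      # coefficient on atndrte
b_interaction = 0.0056   # coefficient on priGPA * atndrte

def attendance_effect(pri_gpa):
    """Estimated partial effect of atndrte on stndfnl at a given priGPA."""
    return b_atndrte + b_interaction * pri_gpa

mean_pri_gpa = 2.59
for gpa in (2.0, mean_pri_gpa, 3.5):
    print(gpa, round(attendance_effect(gpa), 4))

# priGPA at which the estimated effect changes sign.
zero_effect_gpa = -b_atndrte / b_interaction
print("effect is zero at priGPA =", round(zero_effect_gpa, 2))
```

The estimated effect is positive for any priGPA above about 1.2 and grows linearly with priGPA.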
In the model \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + u\):
\(\beta_1\) gives the effect of \(x_1\) when \(x_2 = 0\).
In the attendance example, \(\hat\beta_1 = -0.0067\) is the estimated effect of attendance for a student with \(\text{priGPA} = 0\), which is not a meaningful value.
We want the effect at a meaningful value of \(x_2\), such as its sample mean.
Solution: replace \(x_1 x_2\) with \((x_1 - \bar{x}_1)(x_2 - \bar{x}_2)\) in the interaction term only:
\[ y = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \beta_3 (x_1 - \bar{x}_1)(x_2 - \bar{x}_2) + u \]
In the attendance example: replace \(\text{priGPA} \cdot \text{atndrte}\) with \((\text{priGPA} - \overline{\text{priGPA}}) \cdot \text{atndrte}\); centering \(\text{priGPA}\) alone is enough to change what the coefficient on \(\text{atndrte}\) measures. That coefficient then directly estimates the APE of attendance, and OLS reports its standard error, so we get inference for free.
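A simulation sketch of the reparameterization, assuming NumPy is available. The data-generating process and the helper `ols` are hypothetical; the point is only that the coefficient on \(x_1\) in the centered model equals \(\hat\beta_1 + \hat\beta_3 \bar{x}_2\) from the uncentered model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.uniform(0.0, 10.0, size=n)
x2 = rng.uniform(0.0, 4.0, size=n)
# Assumed data-generating process, for illustration only.
y = 1.0 + 0.5 * x1 - 0.3 * x2 + 0.2 * x1 * x2 + rng.normal(0.0, 1.0, size=n)

def ols(columns, y):
    """OLS coefficients (intercept first) for the given regressor columns."""
    X = np.column_stack([np.ones(len(y))] + columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Uncentered interaction: coefficient on x1 is the effect at x2 = 0.
b0, b1, b2, b3 = ols([x1, x2, x1 * x2], y)

# Centered interaction: coefficient on x1 is the effect at x2 = mean(x2).
x1_bar, x2_bar = x1.mean(), x2.mean()
a0, a1, a2, a3 = ols([x1, x2, (x1 - x1_bar) * (x2 - x2_bar)], y)

print(a1, b1 + b3 * x2_bar)  # the two numbers coincide up to rounding error
print(a3, b3)                # the interaction coefficient is unchanged
```

Both fits span the same column space, so fitted values and the interaction coefficient are identical; only the meaning of the main-effect coefficients changes.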
We’ve seen how transformations change the model. But even without transforming, the choice of units affects the numbers we report.
Consider \(y_i = \hat\beta_0 + \hat\beta_1 x_i + \hat{u}_i\).
Question: If we change the units of \(x\) or \(y\) (e.g., dollars \(\to\) thousands of dollars), what happens to the intercept, the slope, and \(R^2\)?
Adding a constant: \(x_i^* = x_i + c\)
\[ y_i = \hat\beta_0^* + \hat\beta_1^* x_i^* + \hat{u}_i^* = (\hat\beta_0^* + \hat\beta_1^* c) + \hat\beta_1^* x_i + \hat{u}_i^* \]
Matching coefficients: \(\hat\beta_1 = \hat\beta_1^*\) (unchanged), \(\hat\beta_0 = \hat\beta_0^* + \hat\beta_1^* c\).
Multiplying by a constant: \(x_i^* = a\, x_i\)
\[ \hat\beta_1 = a\, \hat\beta_1^* \quad\Longrightarrow\quad \hat\beta_1^* = \hat\beta_1 / a \]
The slope rescales inversely with the unit change.
Adding a constant: \(y_i^* = y_i + c\)
\[ \hat\beta_0^* = \hat\beta_0 + c, \quad \hat\beta_1^* = \hat\beta_1 \]
Multiplying by a constant: \(y_i^* = a\, y_i\)
\[ \hat\beta_0^* = a\, \hat\beta_0, \quad \hat\beta_1^* = a\, \hat\beta_1 \]
For the model \(y_i = \beta_0 + \beta_1 x_i + u_i\):
| Transformation | Intercept | Slope | \(R^2\) |
|---|---|---|---|
| Independent variable | | | |
| \(x + c\) | \(\beta_0 - c\beta_1\) | \(\beta_1\) | unchanged |
| \(a \cdot x\) | \(\beta_0\) | \(\beta_1 / a\) | unchanged |
| Dependent variable | | | |
| \(y + c\) | \(\beta_0 + c\) | \(\beta_1\) | unchanged |
| \(a \cdot y\) | \(a\beta_0\) | \(a\beta_1\) | unchanged |
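The rows of this table can be verified numerically. The sketch below uses a small hypothetical data set and a hand-written helper `simple_ols` (not a library function); the constants \(a = 1000\) and \(c = 3\) are arbitrary illustrative choices.

```python
def simple_ols(x, y):
    """Return (intercept, slope, r_squared) from regressing y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    # In simple regression, R^2 is the squared sample correlation.
    return y_bar - slope * x_bar, slope, sxy ** 2 / (sxx * syy)

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 4.0, 5.0, 7.0, 9.0]
y = [2.1, 2.9, 5.2, 5.8, 8.1, 9.7]
a, c = 1000.0, 3.0

b0, b1, r2 = simple_ols(x, y)
results = {
    "x + c": simple_ols([xi + c for xi in x], y),
    "a * x": simple_ols([a * xi for xi in x], y),
    "y + c": simple_ols(x, [yi + c for yi in y]),
    "a * y": simple_ols(x, [a * yi for yi in y]),
}
for name, (intercept, slope, r_squared) in results.items():
    print(f"{name}: intercept={intercept:.4f}, slope={slope:.6f}, R2={r_squared:.4f}")
```

Each printed row can be compared against the corresponding table entry computed from `b0`, `b1`, and `r2`.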
Problem: How do we compare the relative importance of regressors measured in different units?
Solution: Standardize all variables to have mean zero and standard deviation one.
Original model:
\[ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + u_i \]
Standardized model:
\[ \frac{y_i - \bar{y}}{\hat\sigma_y} = \beta_1^* \frac{x_{i1} - \bar{x}_1}{\hat\sigma_1} + \beta_2^* \frac{x_{i2} - \bar{x}_2}{\hat\sigma_2} + \cdots + \beta_k^* \frac{x_{ik} - \bar{x}_k}{\hat\sigma_k} + u_i^* \]
The standardized coefficient:
\[ \hat\beta_j^* = \frac{\hat\sigma_j}{\hat\sigma_y} \hat\beta_j \]
Interpretation: a one standard deviation increase in \(x_j\) is associated with a change of \(\hat\beta_j^*\) standard deviations in \(y\), holding the other regressors fixed.
Regression in original units:
\[ \begin{aligned} \widehat{\text{price}} = \underset{(5055)}{20{,}871} &\underset{(354)}{- 2{,}706}\; \text{nox} \underset{(33)}{- 154}\; \text{crime} \\ &+ \underset{(394)}{6{,}726}\; \text{rooms} \underset{(188)}{- 1{,}027}\; \text{dist} \underset{(127)}{- 1{,}148}\; \text{stratio} \end{aligned} \]
Standardized regression:
\[ \begin{aligned} \widehat{z_{\text{price}}} = &\underset{(0.04)}{-0.340}\; z_{\text{nox}} \underset{(0.03)}{- 0.143}\; z_{\text{crime}} \\ &+ \underset{(0.03)}{0.514}\; z_{\text{rooms}} \underset{(0.04)}{- 0.235}\; z_{\text{dist}} \underset{(0.03)}{- 0.270}\; z_{\text{stratio}} \end{aligned} \]
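In the standardized regression, the coefficients are directly comparable in magnitude, unlike in the original units. The sketch below, assuming NumPy is available and using simulated (hypothetical) data, checks that the formula \(\hat\beta_j \hat\sigma_j / \hat\sigma_y\) gives the same numbers as regressing z-scores on z-scores.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
# Hypothetical regressors on very different scales.
x1 = rng.normal(50.0, 10.0, size=n)
x2 = rng.normal(0.5, 0.1, size=n)
y = 3.0 + 0.2 * x1 + 15.0 * x2 + rng.normal(0.0, 2.0, size=n)

def zscore(v):
    """Center to mean zero and scale to sample standard deviation one."""
    return (v - v.mean()) / v.std(ddof=1)

# Regression in original units (with intercept).
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Standardized coefficients via the formula beta_j * sd(x_j) / sd(y).
sd_y = y.std(ddof=1)
formula = np.array([beta[1] * x1.std(ddof=1) / sd_y,
                    beta[2] * x2.std(ddof=1) / sd_y])

# Same coefficients from regressing z-scores on z-scores; no intercept is
# needed because every standardized variable has mean zero.
Z = np.column_stack([zscore(x1), zscore(x2)])
direct, *_ = np.linalg.lstsq(Z, zscore(y), rcond=None)

print(formula, direct)
```

The raw coefficients (about 0.2 and 15 by construction) differ by orders of magnitude, while the standardized ones are on a common scale.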
Lecture 4b — Dummy Variables: