Today
- Conclude Instrumental Variables
- Simultaneous Equations
- Measurement error
Instrumental Variables: Model and Estimator
- Interested in coefficients of linear model \(Y_{1i}=X_i^{\prime}\beta+u_i\) \[=\beta_0+\beta_1Y_{2i} + \beta_2Y_{3i} + \ldots + \beta_{\ell}Y_{\ell+1,i} + \beta_{\ell+1}Z_{1i} + \ldots + \beta_{k+\ell}Z_{ki} + u_{i}\]
- But regressors are endogenous \[E[Y_{-1i}u_i]\neq 0\]
- Solution: Use instrumental variables \(Z_j\), \(j=0,\ldots,m\) with \(m\geq k+\ell\) (at least one excluded instrument per endogenous regressor), that are
- Exogenous: \(E[Z_{ji}u_i]=0\ \forall j=0,\ldots,m\)
- Relevant: no exact linear relationships among variables in \(\hat{X}\)
- Apply Two Stage Least Squares (2SLS) estimator
- Regress \(X\) on \(Z\) to get predicted \(\hat{X}\)
- Regress \(Y_1\) on \(\hat{X}\) to get \(\hat{\beta}^{2SLS}\)
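The two stages can be sketched by hand in base R. This is a minimal illustration on simulated data with one endogenous regressor and one instrument. The variable names and all parameter values (true coefficient 2, first-stage coefficient 0.8) are assumptions made for the example, not taken from the slides.

```r
# Simulated data; every number here is an illustrative assumption
set.seed(1)
n  <- 10000
z  <- rnorm(n)                 # instrument: exogenous and relevant
v  <- rnorm(n)                 # unobserved confounder
y2 <- 0.8 * z + v + rnorm(n)   # endogenous regressor
u  <- v + rnorm(n)             # structural error, correlated with y2 through v
y1 <- 1 + 2 * y2 + u           # true coefficient on y2 is 2

# OLS is inconsistent because Cov(y2, u) != 0
b_ols <- unname(coef(lm(y1 ~ y2))["y2"])

# Stage 1: regress the endogenous regressor on the instrument
y2_hat <- fitted(lm(y2 ~ z))
# Stage 2: regress the outcome on the first-stage fitted values
b_2sls <- unname(coef(lm(y1 ~ y2_hat))["y2_hat"])

print(c(OLS = b_ols, TSLS = b_2sls))
```

The point estimate from the manual second stage matches 2SLS, but the standard errors reported by the second `lm` call are not valid, since they ignore that \(\hat{X}\) is estimated. A dedicated IV routine should be used for inference.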
Sources of endogeneity: Why Use 2SLS?
\[E[Y_{-1i}u_i]\neq 0\]
- One reason (Last Week): Omitted Variables
- Another: Simultaneity
- Causality runs in “wrong” direction
- \(Y_2\) causes \(Y_1\), but \(Y_1\) also causes \(Y_2\)
- Causal graph of the relationship contains a cycle, so it is not a DAG
- Economic equilibria: system of variables mutually determined
- Ex. 1: Supply and demand
- Ex. 2: Game theory
Simultaneous equations systems (Code)
library(dagitty) # Library to create causal graphs
library(ggdag) # Library to plot causal graphs
# Create graph: Y1 and Y2 cause each other, each with its own exogenous shifter
simeqgraph <- dagify(Y1 ~ Y2 + Z1, Y2 ~ Y1 + Z2)
# Set position of nodes
coords <- list(x = c(Y2 = 1, Y1 = 0, Z1 = -1, Z2 = 2),
               y = c(Y2 = 0, Y1 = 0, Z1 = 0, Z2 = 0))
coords_df <- coords2df(coords)
coordinates(simeqgraph) <- coords2list(coords_df)
# Plot causal graph
ggdag(simeqgraph) + theme_dag_blank() +
  labs(title = "Simultaneous System with Instruments")
Simultaneous equations systems
- Represent by system of structural equations \[Y_1=\beta_{11}Y_2+\beta_{12}Z_1+u_{1}\] \[Y_2=\beta_{21}Y_1+\beta_{22}Z_2+u_{2}\]
- E.g. Supply and demand
- \(Y_1\) is price, \(Y_2\) is quantity
- Equation 1 is supply curve
- As quantity sold increases, sellers ask a higher price
- Equation 2 is demand curve
- As price goes up, buyers demand less quantity
- \(Z_1\) are forces shifting supply curve: e.g., price of inputs
- \(Z_2\) are forces shifting demand curve: e.g. prices of substitutes
- \(u_1,u_2\) are unobserved supply/demand shifters
Simultaneity Bias
- Substitute equation 1 in equation 2 \[Y_2=\beta_{21}(\beta_{11}Y_2+\beta_{12}Z_1+u_{1})+\beta_{22}Z_2+u_{2}\]
- Solve for reduced form for \(Y_2\) in terms of \(Z\) \[Y_2=\frac{1}{1-\beta_{21}\beta_{11}}(\beta_{21}\beta_{12}Z_1+\beta_{22}Z_2+\beta_{21}u_{1}+u_{2})\]
- Valid if \(\beta_{21}\beta_{11}\neq 1\)
- Now consider estimating equation 1 by OLS
\[Y_1=\beta_{11}Y_2+\beta_{12}Z_1+u_{1}\]
- Reduced form for \(Y_2\) contains \(u_1\) (whenever \(\beta_{21}\neq 0\)), so \(Y_2\) is correlated with \(u_1\) and OLS is inconsistent
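To make the last step explicit, the covariance between \(Y_2\) and \(u_1\) can be read off the reduced form. The sketch below adds two assumptions that are not stated above: \(Z_1\) and \(Z_2\) are uncorrelated with \(u_1\), and \(Cov(u_1,u_2)=0\).

\[Cov(Y_2,u_1)=\frac{\beta_{21}Var(u_1)+Cov(u_2,u_1)}{1-\beta_{21}\beta_{11}}=\frac{\beta_{21}Var(u_1)}{1-\beta_{21}\beta_{11}}\]

Under these assumptions the covariance is nonzero whenever \(\beta_{21}\neq 0\) and \(Var(u_1)>0\), which is exactly the case where \(Y_1\) feeds back into \(Y_2\).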
Handling simultaneity bias
- Similar to omitted variables
- Unobserved supply shifts \(u_1\) directly affect price and, through the demand response, also quantity
- Difference is that simultaneity means any term that shifts one curve moves both price and quantity in equilibrium
- Control strategy not possible even if all variables observed
- If there is any variation in \(u_1\) at all, it will be correlated with \(Y_2\)
- But we CAN isolate variation in \(Y_2\) due only to demand shifts \(Z_2\)
- Estimate equation 1 by 2SLS with \(Z_2\) as excluded instrument
- Assumes \(Z_2\) only affects demand and does not enter the supply equation directly
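A small simulation may help show the size of the problem and the fix. It generates data from the two-equation system through its reduced form, then estimates equation 1 (the supply curve) by OLS and by manual 2SLS, with the demand shifter \(Z_2\) as the excluded instrument. The coefficient values (\(\beta_{11}=0.5\), \(\beta_{12}=1\), \(\beta_{21}=-1\), \(\beta_{22}=1\)) and unit-variance independent shocks are assumptions chosen for illustration.

```r
# Illustrative parameter values (assumptions, not from the slides)
set.seed(42)
n   <- 10000
b11 <- 0.5; b12 <- 1    # supply: price rises with quantity
b21 <- -1;  b22 <- 1    # demand: quantity falls with price

z1 <- rnorm(n); z2 <- rnorm(n)   # observed supply and demand shifters
u1 <- rnorm(n); u2 <- rnorm(n)   # unobserved supply and demand shifters

# Solve the system: reduced form for quantity, then price from equation 1
y2 <- (b21 * b12 * z1 + b22 * z2 + b21 * u1 + u2) / (1 - b21 * b11)
y1 <- b11 * y2 + b12 * z1 + u1

# OLS on equation 1 is inconsistent: y2 depends on u1
ols_b11 <- unname(coef(lm(y1 ~ y2 + z1))["y2"])

# 2SLS: first stage uses all exogenous variables, z2 is the excluded instrument
y2_hat   <- fitted(lm(y2 ~ z1 + z2))
tsls_b11 <- unname(coef(lm(y1 ~ y2_hat + z1))["y2_hat"])

print(c(true = b11, OLS = ols_b11, TSLS = tsls_b11))
```

With these values OLS is pulled well below the true slope of the supply curve, while 2SLS stays close to it. As before, the manual second stage gives the right point estimate but not valid standard errors.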
General simultaneous systems
- Can have a system with \(\ell\) equations and \(\ell\) endogenous variables \[Y_1=\beta_{11}Y_2+\ldots+\beta_{1,\ell-1}Y_{\ell}+Z_1^\prime\beta_{1\ell}+u_{1}\] \[Y_2=\beta_{21}Y_1+\beta_{22}Y_3+\ldots+\beta_{2,\ell-1}Y_{\ell}+Z_2^\prime\beta_{2\ell}+u_{2}\] \[\ldots\] \[Y_{\ell}=\beta_{\ell 1}Y_1 + \beta_{\ell 2}Y_2 +\ldots+ \beta_{\ell,\ell-1}Y_{\ell-1}+Z_{\ell}^\prime\beta_{\ell \ell} +u_{\ell}\]
- Represent full set of causal relationships between \((Y_1,\ldots,Y_\ell)\) and also effect of exogenous variables \(Z\) not determined inside the system
- \(Z_1\) through \(Z_{\ell}\) may be vectors containing shared elements if one exogenous variable affects several endogenous variables
Estimation of simultaneous equations models
- Can estimate one equation in system by 2SLS estimation
- If reduced form exists and
- There are terms \(Z\) that enter into other equations but not directly into this one
- Need at least as many excluded \(Z\) terms as included endogenous variables \(Y\) to estimate any one equation consistently
- May only be able to estimate some equations in system
- Execute in R equation by equation using ivreg
- Command to do all at once is systemfit in systemfit library
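The two routes can be sketched as follows on simulated supply and demand data. The data-generating values are illustrative assumptions. The calls assume the `ivreg` function from the AER package and `systemfit` from the systemfit package, and are guarded so the script still runs if either package is not installed.

```r
# Simulated two-equation system (coefficients 0.5, 1, -1, 1 are assumptions)
set.seed(7)
n  <- 5000
z1 <- rnorm(n); z2 <- rnorm(n)
u1 <- rnorm(n); u2 <- rnorm(n)
y2 <- (-z1 + z2 - u1 + u2) / 1.5   # reduced form for quantity
y1 <- 0.5 * y2 + z1 + u1           # supply equation
d  <- data.frame(y1, y2, z1, z2)

# Equation by equation: after "|" list ALL exogenous variables,
# included regressors as well as excluded instruments
if (requireNamespace("AER", quietly = TRUE)) {
  supply <- AER::ivreg(y1 ~ y2 + z1 | z1 + z2, data = d)
  demand <- AER::ivreg(y2 ~ y1 + z2 | z1 + z2, data = d)
  print(coef(supply))
  print(coef(demand))
}

# Whole system at once, 2SLS applied to each equation
if (requireNamespace("systemfit", quietly = TRUE)) {
  eqs <- list(supply = y1 ~ y2 + z1, demand = y2 ~ y1 + z2)
  sys <- systemfit::systemfit(eqs, method = "2SLS",
                              inst = ~ z1 + z2, data = d)
  print(coef(sys))
}
```

Each equation here has one included endogenous variable and one excluded exogenous variable, so both are just identified.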
Measurement Error
- One more source of endogeneity
- \(X^{*}\) is assigned randomly, but we don’t observe it directly
- Instead observe noisy proxy \(X=X^{*}+\epsilon\)
- \(\epsilon\perp X^{*}\) is nonsystematic error in measurement process
- Maybe survey taker sometimes writes numbers slightly off
- Maybe tool used to measure sometimes picks up unrelated noise
- We care about coefficient of \(Y=\beta_0+\beta_1X^{*} + e\)
- How bad is it to estimate instead regression of \(Y\) on \(X\)?
Measurement-error bias
- OLS estimate is \[\beta_{OLS}=\frac{Cov(X,Y)}{Var(X)}\]
- True object of interest is \[\beta=\frac{Cov(X^*,Y)}{Var(X^*)}\]
- Substituting \(X=X^{*}+\epsilon\) and using \(Cov(X^{*},\epsilon)=0\) \[\beta_{OLS}=\frac{Cov(X^{*}+\epsilon,Y)}{Var(X^{*}+\epsilon)}\] \[=\frac{Cov(X^{*},Y)+Cov(\epsilon,Y)}{Var(X^{*})+Var(\epsilon)}\]
- By randomness of error (\(\epsilon\) uncorrelated with \(X^{*}\) and \(e\)), \(Cov(\epsilon,Y)=0\) \[\beta_{OLS}=\frac{Cov(X^{*},Y)}{Var(X^{*})+Var(\epsilon)}\]
- \(\left|\beta_{OLS}\right|<\left|\beta\right|\) whenever \(Var(\epsilon)>0\) and \(\beta\neq 0\)
- Magnitude smaller since divided by larger number
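The result can also be written as a rescaling of the true coefficient. The symbol \(\lambda\) (often called the reliability ratio) and the numbers in the example are introduced here for illustration and do not appear in the slides.

\[\beta_{OLS}=\lambda\beta,\qquad \lambda=\frac{Var(X^{*})}{Var(X^{*})+Var(\epsilon)}\in(0,1]\]

For example, if \(Var(X^{*})=1\) and \(Var(\epsilon)=1\), then \(\lambda=1/2\) and OLS recovers only half of the true coefficient. With \(Var(\epsilon)=0\) there is no measurement error and \(\lambda=1\).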
Intuition
- Attenuation bias
- Measurement error makes OLS estimate smaller in magnitude than true effect (biased toward 0)
- Limiting case: \(X\) is pure noise: no information about \(X^{*}\) at all
- Then \(Y\) is unrelated to \(X\)
- Coefficient becomes 0
- Equivalent to a form of endogeneity
- \(Y= \beta_0 +\beta_1 X^{*} + e = \beta_0 + \beta_1 (X-\epsilon) + e = \beta_0 + \beta_1 X-\beta_1\epsilon + e = \beta_0 + \beta_1 X+u\)
- Here \(u=e-\beta_1\epsilon\), so \(E[Xu]=E[(X^{*} +\epsilon)(e-\beta_1\epsilon)]=-\beta_1 E[\epsilon^2]\neq 0\) whenever \(\beta_1\neq 0\)
Solving Measurement error bias
- If we have an IV correlated with \(X^*\) but uncorrelated with the measurement error in it, we can estimate effect by IV
- Special case: 2 noisy measurements \(X_1=X^{*}+ \epsilon_1\), \(X_2=X^{*}+\epsilon_2\), with \(\epsilon_1\perp\epsilon_2\)
- \(Cov(X_1,X_2)=Var(X^{*})\neq 0\) and \(Cov(X_2,Y-X_1\beta_1)=0\), since measurement errors are independent
- Example: if survey-takers sometimes misrecord a response, survey same households twice, use one survey as IV for other
- Compare alternate strategy
- Take 2 noisy measurements and average
- Variance of measurement error decreases, but doesn’t disappear
- Unlike IV, which gives a consistent estimator even if error variance is large
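The comparison can be checked in a short simulation. It uses two independent noisy measurements of the same \(X^{*}\) and computes the slope three ways: OLS on one measurement, OLS on the average of the two, and IV using the second measurement as an instrument for the first. The true slope of 2 and the unit variances are assumptions chosen for the example.

```r
# Illustrative values: Var(X*) = 1, Var(eps1) = Var(eps2) = 1, true slope = 2
set.seed(123)
n      <- 20000
x_star <- rnorm(n)               # true regressor, not observed in practice
y      <- 1 + 2 * x_star + rnorm(n)
x1     <- x_star + rnorm(n)      # first noisy measurement
x2     <- x_star + rnorm(n)      # second, independent noisy measurement

# OLS on one measurement: attenuated by 1 / (1 + 1) = 1/2
b_ols <- unname(coef(lm(y ~ x1))["x1"])

# OLS on the average: error variance halves, attenuation 1 / (1 + 0.5) = 2/3
x_bar <- (x1 + x2) / 2
b_avg <- unname(coef(lm(y ~ x_bar))["x_bar"])

# IV with x2 as instrument for x1: Cov(x2, y) / Cov(x2, x1)
b_iv <- cov(x2, y) / cov(x2, x1)

print(c(OLS = b_ols, average = b_avg, IV = b_iv))
```

Under these assumptions the OLS slope should be near 1, the averaged version near 4/3, and the IV estimate near the true value of 2.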
Instrumental Variables: Overview
- Endogeneity causes bias in OLS estimates of linear model
- Omitted variables
- Simultaneity
- Measurement Error
- IV estimator handles all three sources
- Instrument \(Z\) must be exogenous and relevant
- Exclusion: no direct effect of (some) \(Z\) on \(Y_1\)
- Relevance: \(Z\) correlated with endogenous regressor \(Y_{-1}\)