Two Stage Least Squares and IV Inference

Today

Multivariate IV: Two Stage Least Squares
Testing IV Assumptions

Multiple Instruments

Is there a multivariate version of IV like there is for regression?
Yes, but now 2 types of multiple variables
- Multiple instruments \(Z\)
- Multiple endogenous regressors
To start: multiple instruments
Allows us to weaken random assignment of instrument to conditional random assignment
- Any case where control strategy will let us estimate causal effect of instrument, can do modified IV
Deal with constant effects linear model case only

IV model with multiple instruments

Let \(Z\) be an \((m+1)\times 1\) vector of instruments
Include constant term \(Z_0=1\)
Regressor of interest, call it \(Y_2\), still endogenous scalar (correlated with residual)
Allow first \(k+1<m+1\) instruments to affect \(Y_1\) directly
- These act as control variables
Still need at least one excluded instrument with no direct effect
Model becomes \[Y_{1i}=\beta_0+\beta_1Y_{2i} + \beta_2Z_{1i} + \beta_3Z_{2i} + ... + \beta_{k+1}Z_{ki} + u_{i}\]
Endogeneity means \[E[Y_2u]\neq 0 \]
For convenience call \(X=(1,Y_2,Z_1,\ldots,Z_k)^\prime\) \[Y_{1i}=X_i^{\prime}\beta+u_i\]

Interpretation: Causal Graph (code)

#Library to create and analyze causal graphs
library(dagitty) 
library(ggdag) #library to plot causal graphs
#create graph
iv2graph<-dagify(Y1~Y2+Z1+W,Y2~Z1+Z2+W,Z2~Z1) 
#Set position of nodes 
coords<-list(x=c(Y2 = 0, W = 1, Y1 = 2, Z1=-1, Z2=-1),
             y=c(Y2 = 0, W = 0.1, Y1 = 0, Z1=0.1,Z2=0)) 
coords_df<-coords2df(coords)
coordinates(iv2graph)<-coords2list(coords_df)
#Plot causal graph
ggdag(iv2graph)+theme_dag_blank()+labs(title="Multivariate IV")

Interpretation: Causal Graph

Want effect of endogenous regressor \(Y2\) on outcome \(Y1\)
- Unobserved confounder \(W\) prevents using regression
- Even when adjusting for observed confounders \(Z1\)
Excluded instruments \(Z2\) directly affect \(Y2\) but not \(Y1\)
By adjusting for included instruments \(Z1\), effect of excluded instrument \(Z2\) on \(Y2\) can be estimated
Effect of \(Z2\) on \(Y1\) can also be estimated by controlling for \(Z1\)
Like univariate IV, but now conditional random assignment
Can have multiple included instruments \(Z1\) or excluded instruments \(Z2\)

IV assumptions

Exclusion restriction is now
- \(E[Z_ju]= 0\) for all instruments \(j=0\ldots m\)
Let’s look at the special case where there is exactly one instrument not included in the structural equation, \(m=k+1\)
Model and exogeneity give us system of k+2 equations in k+2 unknowns \[E[Z_{ji}(Y_{1i}-\beta_0-\beta_1Y_{2i}-\beta_2Z_{1i} - ... - \beta_{k+1}Z_{ki})]=0\ \forall j=0\ldots m\]
Estimate by sample means in place of expectations
Multivariate IV estimator
Exactly standard IV in case where k=0 (no controls)
Relevance condition now says that this system has a unique solution
- Requires \(Z\) to be related to \(Y_2\)
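The just-identified estimator can be sketched directly by solving the sample moment equations. This is a minimal illustration on simulated data, not part of the lecture code: the data-generating process, the variable names, and the parameter values (one control, so \(k=1\), and true \(\beta_1=1.5\)) are all assumptions made for the example.

```r
# Simulated data; all names and parameter values are hypothetical
set.seed(1)
n  <- 5000
z1 <- rnorm(n)                                  # included instrument (control)
z2 <- rnorm(n)                                  # excluded instrument
w  <- rnorm(n)                                  # unobserved confounder
y2 <- 1 + 0.5 * z1 + 0.8 * z2 + w + rnorm(n)    # endogenous regressor
y1 <- 2 + 1.5 * y2 - 1.0 * z1 + w + rnorm(n)    # outcome, true beta_1 = 1.5

X <- cbind(1, y2, z1)   # regressors in the structural equation
Z <- cbind(1, z1, z2)   # instruments: constant, included, excluded

# Solve the k+2 sample moment equations Z'(y1 - X b) = 0 for b
beta_iv <- drop(solve(crossprod(Z, X), crossprod(Z, y1)))

# OLS for comparison: inconsistent here because w enters both equations
beta_ols <- coef(lm(y1 ~ y2 + z1))

# Sample moments evaluated at the IV solution (zero up to rounding error)
moments <- drop(crossprod(Z, y1 - X %*% beta_iv)) / n
```

The matrix `crossprod(Z, X)` must be invertible for `solve` to succeed, which is the sample version of the relevance condition.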

Many excluded instruments

If \(m>k+1\), exclusion restrictions give more equations than unknowns
Model said to be overidentified
Can drop any subset of Zs to just use \(k+2\) of them
- So long as relevance still holds
Or use any \(k+2\) linear combinations of \(Z\)
Let \(\tilde{Z}\) be vector of \(k+2\) linear combinations of elements of \(Z\)
- Use subsets or weighted averages of instruments
Then \[E[\tilde{Z}_i(Y_{1i}-X_i^{\prime}\beta)]=0\] holds
Solving for \(\beta\) and replacing mean with sample average gives \(\hat{\beta}^{IV}\)
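One way to write this out, as a sketch: let \(A\) be a \((k+2)\times(m+1)\) matrix of weights (notation introduced here for illustration, not from the slides), so that \(\tilde{Z}_i=AZ_i\). Solving the sample moment equations then gives
\[\hat{\beta}^{IV}=\left(\frac{1}{n}\sum_{i=1}^n\tilde{Z}_iX_i^{\prime}\right)^{-1}\frac{1}{n}\sum_{i=1}^n\tilde{Z}_iY_{1i}\]
Dropping instruments corresponds to rows of \(A\) that each pick out a single element of \(Z\). Relevance requires the matrix being inverted to be nonsingular.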

Example: Incarceration and crime

Suppose we have \(m+1\) judges, and an indicator for each
\(Y_1\) is recidivism, \(Y_2\) is sentence length
With no controls, multivariate IV is just univariate IV with a particular pair of judges as IV
Can use linear combinations to get more precise estimates: compare average of judges who give lots of time to average of those who give little
With controls, get effect of incarceration on crime conditioning on (exogenous) characteristics
Adding controls can also help ensure instrument uncorrelated with residual
- Include any omitted variable which is correlated with instrument and affects outcome
- E.g., if judges assigned randomly conditional on district, and district correlated with crime, include district as an included instrument

Finding the “right” linear combination

Intuitively, by combining the multiple valid IV estimators, should get better estimate (at least if model right)
Do this by choosing “right” linear combination of instruments
Under some assumptions, there is a choice which gives smallest variance
Called Two Stage Least Squares
First, regress \(Y_2\) on \(Z\) to get predicted value \(\hat{Y}_2=\phi_0 +\phi_1Z_1+\phi_2 Z_2+\ldots+\phi_mZ_m\)
Use this and \(Z_0\ldots Z_k\) as k+2 elements of \(\tilde{Z}\)

Implementing 2SLS

Equivalent to regressing \(Y_2\) on \(Z\) to get predicted value \(\hat{Y}_2=\phi_0 +\phi_1Z_1+\phi_2 Z_2+\ldots+\phi_mZ_m\)
Then replacing \(Y_2\) with \(\hat{Y}_2\) in structural function and running regression \[Y_{1i}=\beta_0+\beta_1\hat{Y}_{2i} + \beta_2Z_{1i} + \beta_3Z_{2i} + ... + \beta_{k+1}Z_{ki} + u_{i}\]
Denote matrix of regressors here \(\hat{X}_i\)
Coefficient on \(\hat{Y}_{2i}\) is \(\hat{\beta}_1^{2SLS}\)
Need at least one excluded instrument with non-zero first stage coefficient for \(\hat{Y}_{2i}\) to not be a linear combination of other regressors
- This is relevance or no multicollinearity condition
Caveat: Don’t do this in R by running OLS twice!
- Why? Standard errors will be wrong
- Don’t take into account first stage uncertainty
- Use \(ivreg\) command in package \(AER\)
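To see why running OLS twice gets the coefficients right but the standard errors wrong, here is a minimal sketch on simulated data. The data-generating process and all variable names are assumptions for illustration. The corrected standard errors use the homoskedastic formula with residuals computed from the actual \(Y_2\), which is one standard way to account for the first stage, not necessarily what any particular package does internally.

```r
# Simulated data; all names and parameter values are hypothetical
set.seed(2)
n  <- 2000
z1 <- rnorm(n)                       # included instrument (control)
z2 <- rnorm(n); z3 <- rnorm(n)       # two excluded instruments
w  <- rnorm(n)                       # unobserved confounder
y2 <- 1 + 0.5 * z1 + 0.8 * z2 + 0.4 * z3 + w + rnorm(n)
y1 <- 2 + 1.5 * y2 - 1.0 * z1 + w + rnorm(n)

# Stage 1: predicted value of the endogenous regressor
y2_hat <- fitted(lm(y2 ~ z1 + z2 + z3))
# Stage 2: replace y2 by its prediction
stage2    <- lm(y1 ~ y2_hat + z1)
beta_2sls <- coef(stage2)

# Same coefficients from the IV formula with (1, y2_hat, z1) as instruments
X     <- cbind(1, y2, z1)
X_hat <- cbind(1, y2_hat, z1)
beta_closed <- drop(solve(crossprod(X_hat, X), crossprod(X_hat, y1)))

# Naive SEs: based on second stage residuals y1 - X_hat b
se_naive <- sqrt(diag(vcov(stage2)))
# Homoskedastic 2SLS SEs: residuals must use the actual y2
u_hat   <- drop(y1 - X %*% beta_2sls)
sigma2  <- sum(u_hat^2) / (n - ncol(X))
se_2sls <- sqrt(diag(sigma2 * solve(crossprod(X_hat))))
```

Comparing `se_naive` with `se_2sls` shows the two can differ substantially even though the point estimates are identical.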

2SLS assumptions

(2SLS1) Linear Model \[Y_{1i}=\beta_0+\beta_1Y_{2i} + \beta_2Z_{1i} + \beta_3Z_{2i} + ... + \beta_{k+1}Z_{ki} + u_{i}\]
(2SLS2) Random sampling: \((Y_{1i},Y_{2i},Z_i)\) drawn i.i.d. from population satisfying linear model assumptions
(2SLS3) Relevance
- Need at least as many instruments correlated with \(X\) as parameters
(2SLS4) Exogeneity \[E[Z_{ji}u_i]=0\ \forall j=0\ldots m\]
To perform inference, sometimes assume (2SLS5) Homoskedasticity \[E[u^2|Z]=\sigma^2\] (\(\sigma^2\) a finite nonzero constant)

Results

Under (1-4), 2SLS consistent
Relevance condition more complicated
Necessary conditions:
- At least one instrument is not included as a control: \(m>k\)
- First stage regression of \(Y_2\) on \(Z\) has nonzero coefficient on an excluded instrument
- Otherwise \(\hat{Y}_2\) is a linear combination of included \(Z\), and so there is multicollinearity
Under (1-4), asymptotically normal inference is possible
Under (1-5), 2SLS is also choice of linear combination of instruments with smallest asymptotic variance
As usual, can use Robust SEs if (5) fails

Example: Cigarette Demand with controls (Code)

#Load library containing IV command 'ivreg' 
library(AER)
# Load data on cigarette prices and quantities
data("CigarettesSW")
#Use real prices as Y2 (the endogenous regressor)
CigarettesSW$rprice <- with(CigarettesSW, 
                            price/cpi)
#Use real sales tax per pack (taxs - tax, deflated by cpi)
# as supply curve shifting instrument
CigarettesSW$tdiff <- with(CigarettesSW, 
                           (taxs - tax)/cpi)
#data from different states in 1995
c1995 <- subset(CigarettesSW, 
                year == "1995")

Example: Cigarette Demand with controls

Predict cigarette demand controlling for income
- Income may affect sales, but also state tax policy
Use same data as last class
\(Y_1\) log Quantity Demanded
\(Y_2\) log Price
\(Z_1\) (included) income level
\(Z_2\) (excluded) state cigarette tax rates \[Y_1=\beta_0+\beta_1Y_2+\beta_2Z_1 + u\]
To run 2SLS, use \(ivreg\) in \(AER\) library, with syntax
ivreg(y1 ~ y2 + z1 + … + zk | z1 + … + zm)

Results (Code 1)

#To get 2SLS estimate of effect of y2 on y1 using
#  z1..zk as controls and z1..zm as instruments, syntax is 
# ivreg(y1 ~ y2 + z1 + ... + zk | z1 + ... + zm)
# Effect of log(price) on log(quantity) 
# controlling for income
# Elasticity
fm_ivreg <- ivreg(log(packs) ~ log(rprice) + income 
                  | tdiff + income, data = c1995)
#Obtain (robust) standard errors
ivresults<-coeftest(fm_ivreg, 
         vcov = vcovHC(fm_ivreg, type = "HC0"))

Results (Code 2)

#Compare to simple IV (no controls), 
#first stage, and reduced form
fm_ols<-lm(log(packs) ~ log(rprice) + income,
           data=c1995)
fm_simpleIV<-ivreg(log(packs) ~ log(rprice) 
                  | tdiff, data = c1995)
fm_firststage<-lm(log(rprice)~tdiff + income,
                  data = c1995)
fm_reducedform<-lm(log(packs)~tdiff + income,
                   data = c1995)

Results (Code 3)

#Obtain robust standard errors for each
fm_ols.coef<-coeftest(fm_ols, 
    vcov = vcovHC(fm_ols, type = "HC0"))
fm_simpleIV.coef<-coeftest(fm_simpleIV, 
    vcov = vcovHC(fm_simpleIV, type = "HC0"))
fm_firststage.coef<-coeftest(fm_firststage, 
    vcov = vcovHC(fm_firststage, type = "HC0"))
fm_reducedform.coef<-coeftest(fm_reducedform, 
    vcov = vcovHC(fm_reducedform, type = "HC0"))

Results (Code 4)

library(stargazer)
#Column order matches the labels: 2SLS, OLS, Simple IV
stargazer(ivresults,fm_ols.coef,fm_simpleIV.coef,
    type="html",header=FALSE,no.space=TRUE,
    title="2SLS, OLS, Simple IV",
    column.labels=c("2SLS","OLS","Simple IV"),
    omit.table.layout="nl")

Results

**2SLS, OLS, Simple IV**


	2SLS	OLS	Simple IV
	(1)	(2)	(3)

log(rprice)	-0.919^***	-1.107^***	-1.084^***
	(0.312)	(0.187)	(0.312)
income	-0.000^*	-0.000
	(0.000)	(0.000)
Constant	8.974^***	9.865^***	9.720^***
	(1.487)	(0.894)	(1.496)

2SLS vs First Stage and Reduced Form (Code)

stargazer(ivresults,fm_firststage.coef,fm_reducedform.coef,
    type="html",header=FALSE,no.space=TRUE,
    title="2SLS, First Stage, and Reduced Form",
    column.labels=c("log(packs)",
                    "log(rprice)",
                    "log(packs)"), 
    omit.table.layout="n")

2SLS vs First Stage and Reduced Form

**2SLS, First Stage, and Reduced Form**

	Dependent variable:


	log(packs)	log(rprice)	log(packs)
	(1)	(2)	(3)

log(rprice)	-0.919^***
	(0.312)
tdiff		0.029^***	-0.026^***
		(0.005)	(0.010)
income	-0.000^*	0.000	-0.000^***
	(0.000)	(0.000)	(0.000)
Constant	8.974^***	4.610^***	4.738^***
	(1.487)	(0.028)	(0.063)
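With exactly one excluded instrument, the 2SLS coefficient on the endogenous regressor equals the ratio of the reduced form coefficient to the first stage coefficient on that instrument. A rough check using the rounded values in the table:
\[\hat{\beta}_1^{2SLS}=\frac{\text{reduced form coefficient on tdiff}}{\text{first stage coefficient on tdiff}}\approx\frac{-0.026}{0.029}\approx -0.90\]
This is close to the reported \(-0.919\); the gap is consistent with rounding of the two coefficients to three decimals.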

Multiple endogenous variables

Linear model with endogeneity \[Y_{1i}=\beta_0+\beta_1Y_{2i} + \beta_2Y_{3i} + \ldots + \beta_{\ell}Y_{\ell i} + \beta_{\ell+1}Z_{1i} + ... + \beta_{k+\ell}Z_{ki} + u_{i}\]
\(Y_{-1}=(Y_2\ldots Y_\ell)\) are endogenous regressors

\[E[Y_{-1i}u_i]\neq 0\]

More compact notation
- \(X_i=(1,Y_{2i},\ldots,Y_{\ell i},Z_{1i},\ldots,Z_{ki})^{\prime}\)
- \(Y_{1i}=X_i^{\prime}\beta + u_{i}\)
Instruments \(Z=(Z_0,Z_1,\ldots,Z_k,Z_{k+1},\ldots,Z_m)^{\prime}\) \[E[Z_{i}u_i]=0\]
\((Z_0,Z_1,\ldots,Z_k)\) are exogenous regressors, or included instruments
\((Z_{k+1},\ldots,Z_m)\) are excluded instruments

Alternate Representation of 2SLS

Consider one endogenous regressor case
Can interpret first stage as regressing ALL variables in \(X\) on \(Z\)
- \(Y_{2i}= Z_i^\prime\phi_{Y_2}+e_i\)
- \(Z_{0i}= Z_i^\prime\phi_{Z_0}+e_i\)
- \(\ldots\)
- \(Z_{ki}= Z_i^\prime\phi_{Z_k}+e_i\)
Then in second stage \(\hat{X}_i=(\hat{Z}_{0i},\hat{Y}_{2i},\hat{Z}_{1i},\ldots,\hat{Z}_{ki})\)
Note that \(\hat{Z}_{ji}=Z_{ji}\) since \(Z\) predicts itself perfectly
So this yields exactly the same predictor
We can do this also in multiple endogenous variables case

Full 2SLS

First stage
- Regress each element of \(X_i\) on \(Z_i\) by OLS
- Compute predicted values \(\hat{X}_i=(\hat{Z}_{0i},\hat{Y}_{2i},\ldots,\hat{Y}_{\ell i},\hat{Z}_{1i},\ldots,\hat{Z}_{ki})^\prime\)
Second stage
- Regress \(Y_{1i}\) on \(\hat{X}_i\) by OLS
- Need at least \(\ell-1\) excluded instruments with nonzero coefficients in at least some first stage regressions to avoid multicollinearity
- Second stage coefficients are Two Stage Least Squares estimator
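The two stages can be sketched in matrix form for the case of two endogenous regressors. This is a minimal illustration on simulated data: the data-generating process, names, and parameter values (true coefficients 2 on `y2` and -1 on `y3`) are assumptions for the example, not taken from the lecture.

```r
# Simulated data; all names and parameter values are hypothetical
set.seed(3)
n  <- 5000
z1 <- rnorm(n)                       # included instrument (control)
z2 <- rnorm(n); z3 <- rnorm(n)       # excluded instruments
w  <- rnorm(n)                       # unobserved confounder
y2 <-  0.5 * z1 + 1.0 * z2 + 0.3 * z3 + w + rnorm(n)   # endogenous
y3 <- -0.5 * z1 + 0.3 * z2 + 1.0 * z3 - w + rnorm(n)   # endogenous
y1 <- 1 + 2 * y2 - 1 * y3 + 0.5 * z1 + w + rnorm(n)

X <- cbind(const = 1, y2, y3, z1)    # structural regressors
Z <- cbind(const = 1, z1, z2, z3)    # all instruments

# First stage: regress every column of X on Z, keep fitted values
X_hat <- Z %*% solve(crossprod(Z), crossprod(Z, X))

# Second stage: OLS of y1 on the fitted values
beta_2sls <- drop(solve(crossprod(X_hat), crossprod(X_hat, y1)))

# Included instruments are predicted perfectly by the first stage
max_gap_z1 <- max(abs(X_hat[, "z1"] - z1))
```

If the excluded instruments did not move `y2` and `y3` in linearly independent ways, `crossprod(X_hat)` would be singular and the second stage would fail, which is the multicollinearity problem described above.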

2SLS assumptions

(2SLS1) Linear Model \[Y_{1i}=\beta_0+\beta_1Y_{2i} + \beta_2Y_{3i} + \ldots + \beta_{\ell}Y_{\ell i} + \beta_{\ell+1}Z_{1i} + ... + \beta_{k+\ell}Z_{ki} + u_{i}\]
(2SLS2) Random sampling: \((Y_{1i},\ldots,Y_{\ell i},Z_i)\) drawn i.i.d. from population satisfying linear model assumptions
(2SLS3) Relevance
- There are no exact linear relationships among the variables in \(\hat{X}_i=(1,\hat{Y}_{2i},\ldots,\hat{Y}_{\ell i},Z_{1i},\ldots,Z_{ki})\)
(2SLS4) Exogeneity \[E[Z_{ji}u_i]=0\ \forall j=0\ldots m\]
Sometimes also assume (2SLS5) Homoskedasticity \[E[u^2|Z]=\sigma^2\] (\(\sigma^2\) a finite nonzero constant)

Results

Under (1-4), 2SLS consistent
Relevance condition more complicated
Necessary conditions:
- At least \(\ell-1\) instruments not included as controls: \(m\geq k+\ell-1\)
- First stage regressions of \(Y_{-1}\) on \(Z\) have at least \(\ell-1\) nonzero coefficients on excluded instruments
- Otherwise some \(\hat{Y}_j\) is a linear combination of included \(Z\), and so there is multicollinearity in second stage
Under (1-4), asymptotically normal inference is possible
Under (1-5), 2SLS is also choice of linear combination of instruments with smallest asymptotic variance
As usual, inference possible under heteroskedasticity (5 false)
Unlike OLS, 2SLS not necessarily unbiased

2SLS inference

Test values of individual coefficients using a Z test
Test multiple coefficients with a Wald test
Can also test some model assumptions (command is \(diagnostics=TRUE\) option in \(summary\) after \(ivreg\))
- Relevance
- Endogeneity
- Exclusion restrictions
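As a worked example of the Z test, take the 2SLS price elasticity reported in the table above (estimate -0.919, robust standard error 0.312). The null value of -1 (unit elasticity) is chosen here purely for illustration and is not a hypothesis from the lecture.

```r
# Reported 2SLS estimate and robust SE for log(rprice)
est <- -0.919
se  <-  0.312
null_value <- -1          # hypothetical null: unit elastic demand

# Z statistic and two-sided p-value from the normal approximation
z_stat  <- (est - null_value) / se
p_value <- 2 * pnorm(-abs(z_stat))

# 95% confidence interval
ci_95 <- est + c(-1, 1) * qnorm(0.975) * se
```

Because the interval `ci_95` contains -1, the test does not reject unit elasticity at the 5% level.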

Failure of relevance

What happens if relevance, i.e. (IV3) \(Cov(Z,X)\neq 0\) or (2SLS3), fails or is close to failing?
\(\hat{\beta}_1^{IV}\) gives division by 0, or close to it
2SLS estimator can’t solve system of equations
Irrelevant instrument gives undefined limit
- \(Z\) just doesn’t have any effect
If \(Z\) is irrelevant or nearly so, IV divides by a quantity that is close to 0 and variable, so estimates are huge and sometimes positive, sometimes negative
Results in large standard errors and bias, finite-sample distribution with many outliers (infinite variance)
If only one endogenous regressor, can test relevance using F test of the excluded instruments in first stage regression of \(Y_2\) on \(Z\)
Rule of thumb (Homoskedastic 1 variable case): if F<10, estimate may be unreliable, even if n large
- Instruments are “weak”
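The first stage F test can be sketched as a comparison of the first stage with and without the excluded instrument. This is a minimal illustration on simulated data with a strong and a nearly irrelevant instrument. All names and parameter values, and the helper `first_stage_F`, are hypothetical.

```r
# Simulated data; all names and parameter values are hypothetical
set.seed(4)
n  <- 500
z1 <- rnorm(n)            # included instrument (control)
z2 <- rnorm(n)            # excluded instrument
w  <- rnorm(n)            # unobserved confounder
y2_strong <- 0.5 * z1 + 0.80 * z2 + w + rnorm(n)   # z2 clearly relevant
y2_weak   <- 0.5 * z1 + 0.02 * z2 + w + rnorm(n)   # z2 nearly irrelevant

# F statistic for the excluded instrument(s) in the first stage
first_stage_F <- function(y2) {
  restricted   <- lm(y2 ~ z1)         # included instruments only
  unrestricted <- lm(y2 ~ z1 + z2)    # adds the excluded instrument
  anova(restricted, unrestricted)$F[2]
}

F_strong <- first_stage_F(y2_strong)
F_weak   <- first_stage_F(y2_weak)
```

Under the rule of thumb, `F_strong` should sit far above 10, while a value of `F_weak` below 10 would flag the instrument as weak.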

Testing endogeneity

When the number of excluded instruments equals the number of endogenous regressors, exclusion restriction not testable
- Regardless of true joint distribution of \((Y,Z)\), can always find \(\beta\) satisfying the moment conditions
If we believe IV assumptions (exclusion and relevance), we can at least test whether IV is doing any good
If \(E[Y_{-1}u]=0\), so \(Y_{-1}\) exogenous, IV and OLS will both give consistent estimates
But IV estimator will have larger variance, since instrument uses only part of variation in \(Y_{-1}\) to find \(\beta\)
(Durbin-Wu-)Hausman test uses difference between IV and OLS to test null that IV not needed against alternative that it is.
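One common way to carry out this test is the regression form: add the first stage residual to the structural equation and test whether its coefficient is zero. The sketch below uses simulated data in which \(Y_2\) is endogenous by construction. All names and parameter values are assumptions for illustration.

```r
# Simulated data; all names and parameter values are hypothetical
set.seed(5)
n  <- 2000
z1 <- rnorm(n)            # included instrument (control)
z2 <- rnorm(n)            # excluded instrument
w  <- rnorm(n)            # unobserved confounder, makes y2 endogenous
y2 <- 0.5 * z1 + 0.8 * z2 + w + rnorm(n)
y1 <- 1 + 1.5 * y2 - z1 + w + rnorm(n)

# First stage residual: the part of y2 not explained by the instruments
v_hat <- resid(lm(y2 ~ z1 + z2))

# Augmented regression: under the null of exogeneity, coefficient on v_hat is 0
aug <- lm(y1 ~ y2 + z1 + v_hat)
hausman_t <- summary(aug)$coefficients["v_hat", "t value"]
hausman_p <- summary(aug)$coefficients["v_hat", "Pr(>|t|)"]

# The coefficient on y2 in the augmented regression equals the 2SLS estimate
beta_aug  <- coef(aug)["y2"]
beta_2sls <- coef(lm(y1 ~ fitted(lm(y2 ~ z1 + z2)) + z1))[2]
```

A small `hausman_p` rejects the null that OLS and IV estimate the same thing, which is evidence that IV is needed, provided the instruments themselves are valid.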

Testing IV Assumptions: Exclusion

With multiple excluded instruments, can compare IV estimate computed using different subsets of \(Z\)
If they differ more than expected due to sampling variation, something in assumptions is wrong
Either (at least one) exclusion restriction \(E[Z_j u]=0\) is wrong
Or constant effects assumption is wrong
In LATE setting, different IVs will result in different groups of compliers
2SLS assumptions valid for linear model only if treatment effect on all subgroups identical
Can test by Sargan (or J) test: essentially a Wald test of the exclusion restrictions
Not rejecting does NOT mean model assumptions all valid
Use \(diagnostics=TRUE\) option in \(summary\) after ivreg: gives Sargan test, Hausman test, and F test for weak instruments
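The Sargan statistic can be computed by hand as \(n\) times the \(R^2\) from regressing the 2SLS residuals on all instruments, compared to a chi-squared distribution with degrees of freedom equal to the number of excluded instruments minus the number of endogenous regressors. The sketch below assumes homoskedasticity and uses simulated data with two excluded instruments. The names, parameter values, and the helper `sargan` are hypothetical.

```r
# Simulated data; all names and parameter values are hypothetical
set.seed(6)
n  <- 2000
z1 <- rnorm(n)                     # included instrument (control)
z2 <- rnorm(n); z3 <- rnorm(n)     # two excluded instruments
w  <- rnorm(n)                     # unobserved confounder
y2 <- 0.5 * z1 + 0.8 * z2 + 0.8 * z3 + w + rnorm(n)
y1_valid   <- 1 + 1.5 * y2 - z1 + w + rnorm(n)
y1_invalid <- y1_valid + 0.5 * z3  # z3 affects outcome directly: exclusion fails

sargan <- function(y1) {
  # 2SLS by two stages
  y2_hat <- fitted(lm(y2 ~ z1 + z2 + z3))
  b <- unname(coef(lm(y1 ~ y2_hat + z1)))
  # 2SLS residuals use the actual y2, not its prediction
  u_hat <- y1 - (b[1] + b[2] * y2 + b[3] * z1)
  # n * R^2 from regressing residuals on all instruments
  r2   <- summary(lm(u_hat ~ z1 + z2 + z3))$r.squared
  stat <- n * r2
  # df = 2 excluded instruments - 1 endogenous regressor = 1
  c(statistic = stat, p.value = pchisq(stat, df = 1, lower.tail = FALSE))
}

res_valid   <- sargan(y1_valid)
res_invalid <- sargan(y1_invalid)
```

In `res_invalid` the statistic should be large and the p-value small, since one instrument has a direct effect on the outcome. A large p-value in `res_valid` is consistent with valid instruments but does not prove validity.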

Conclusions

Multivariate IV model permits use of conditionally random instrument to generate conditionally random variation in endogenous regressor
2SLS can be used to estimate
- First stage OLS of \(X\) on \(Z\)
- Second stage OLS of \(Y1\) on \(\hat{X}\)
Requires exclusion and relevance, which can (sometimes) be tested
- Relevance condition testable by first stage F test
- Validity testable if more excluded instruments than endogenous variables
- Endogeneity testable if IV valid
Next class
- More use cases for IV
- Begin Panel Data

73-374 Econometrics II