Causality
- Want to know causal effect of \(X\) on \(Y\)
- How does education affect wages?
- How does changing the price you charge affect how much your customers buy?
- How does sentencing affect crime?
Causal Models
- Data comes from a world where some variables cause other variables
- We describe these relationships by a structural model
- Potential outcomes, structural equations, or causal graphs
- In some cases, observed data allows us to learn information about (parameters of) causal model
- Last classes: 2 cases where regression can be used to find causal effects
- Today
- When can we use control?
- What to do when control not possible?
Control: Review
- In situations where control is useful, \(X\) determined by \(W\), with all other variation in \(X\) random (\(Y^x\perp X|W\))
- Causal effect of \(X\) on \(Y\) given by adjustment formula \[E(Y|do(X))=\int E(Y|X,W)P(W)dW\]
- Linear model for \(E[Y|X,W]\) \[Y =\beta_0 +\beta_{1}X +\beta_{2}W + U\]
- Correct specification of \(E[Y|X,W]\) means \(E[U|X,W]=0\)
- (Mean) independence of residual
- \(\beta_1\) gives average treatment effect (ATE) of \(X\)
d-separation in general graphs
- Want rules for arbitrary graphs describing when control is sufficient to recover causal effect
- First, define rule describing what conditioning implies for independence
- When does conditioning on set of nodes \(W\) imply for two disjoint sets of nodes \(X\) \(Y\) that \(X\perp Y|W\)?
- We say a (non-directed) path from \(X_i\) to \(Y_j\) is blocked by \(W\) if it contains either
- A collider which is not in \(W\) and of which \(W\) is not a descendant
- A non-collider which is in \(W\)
- We say \(X\) and \(Y\) are d-separated (by \(W\)) if all paths between \(X\) and \(Y\) are blocked
- Theorem (Pearl, 2009)
- \(X\perp Y|W\) if and only if \(X\) and \(Y\) are d-separated by \(W\)
Backdoor Criterion
- Using d-separation, can define a complete criterion for when control recovers \(P(Y|do(X=x))\)
- A set of variables \(W\) satisfies Backdoor Criterion between \(X\) and \(Y\) if
- No node in \(W\) is a descendant of \(X\)
- \(W\) blocks every path between \(X\) and \(Y\) that contains an edge directed into \(X\)
- Theorem (Pearl, 2009): If \(W\) satisfies the backdoor criterion between \(X\) and \(Y\), the adjustment formula recovers the causal effect of \(X\) on \(Y\) \[P(Y|do(X=x))=\int Pr(Y|W,X=x)Pr(W)dw\]
Backdoor criterion, intuition (Code)
library(dagitty) #Library to create and analyze causal graphs
library(ggdag) #library to plot causal graphs
#Check if W satisfies criterion
isAdjustmentSet(graphname,"W",exposure="X",outcome="Y")
#Find variables that satisfy criterion, if they exist
adjustmentSets(graphname,exposure="X",outcome="Y")
Backdoor criterion, intuition
- Blocking path component ensures that adjustment variables account for confounding by causes other than the cause of interest and do not introduce additional bias by inducing new correlations through colliders
- Non-descendant component avoids “bad controls” which are themselves affected by treatment
- I will not force you to recall exact definition of d-separation or backdoor criterion
- Point is that given a causal story, systematic method can recover whether control is sufficient
- In
dagitty
, check backdoor criterion between \(X\) and \(Y\) by
#Check if W satisfies criterion
isAdjustmentSet(graphname,"W",exposure="X",outcome="Y")
#Find variables that satisfy criterion, if they exist
adjustmentSets(graphname,exposure="X",outcome="Y")
Example: finding adjustment sets (Code)
complicatedgraph<-dagify(Y~A+C,B~X+Y,A~X,X~C,D~B) #create graph
#Set position of nodes
coords<-list(x=c(X = 0, A = 1, B = 1, C=1, Y=2, D=2),
y=c(X = 0, A = 0.1, B=0, C=-0.1, Y = 0, D=0.1))
coords_df<-coords2df(coords)
coordinates(complicatedgraph)<-coords2list(coords_df)
#Plot causal graph
ggdag(complicatedgraph)+theme_dag_blank()
+labs(title="Complicated Graph")
Example: finding adjustment sets
Measuring effect of X on Y
- Go back to education \(X\) versus wages \(Y\) to get intuition
- \(A\) caused by \(X\), causes \(Y\): e.g., occupation, experience
- Mediators: descendant of \(X\), so do not adjust for it
- \(B\) caused by both \(X\) and \(Y\): e.g., current wealth or lifestyle
- Colliders: descendant of \(X\), so do not adjust for it
- \(D\) caused by \(B\) only: e.g. consequences of wealth
- Descendants of collider: still causes bias when adjusted for
- \(C\), causes \(X\) and \(Y\): e.g. ability, background
- Confounder: must condition on it
- Backdoor criterion calculates this automatically
adjustmentSets(complicatedgraph,exposure="X",outcome="Y")
## { C }
Imperfect control (Code)
confoundgraph<-dagify(Y~X+W,X~W) #create graph
#Set position of nodes
coords<-list(x=c(X = 0, W = 1, Y = 2),
y=c(X = 0, W = -0.1, Y = 0))
coords_df<-coords2df(coords)
coordinates(confoundgraph)<-coords2list(coords_df)
#Plot causal graph
ggdag(confoundgraph)+theme_dag_blank()
Imperfect control
- When backdoor criterion fails, OLS will not generally give valid estimates of causal effect
- Why not?
- May have unobserved causal factors influencing both Y and X
- Can see this in confounding setting
- Can recover ATE by estimating \[Y =\beta_0 +\beta_{1}X +\beta_{2}W + U\] \[E[U|X,W]=0\]
Imperfect Control, ctd
- Suppose we don’t include \(W\)
- Maybe we just forgot about it
- Maybe it’s not something we can get data about
- If we omit \(W\), regression becomes \[Y= \beta_0+\beta_1X +\epsilon\] \[\epsilon = \beta_2W+U\]
- Result is \(Cov(X,\epsilon)\neq 0\)
- Estimate of \(\beta_1\) is biased: omitted variables bias
Endogeneity
- General case where OLS gives inconsistent estimate of parameters \[Y=\mathbf{x}^\prime\beta + u\] \[E[\mathbf{x}u]\neq 0 \]
- This is saying OLS assumption (4’) \(E[\mathbf{x}u] = 0\) violated for linear model with causal coefficients
- Here \(\beta\) contains true structural parameter of interest
- \(u\) NOT just difference between \(Y\) and best linear predictor of \(Y\)
- Omitted variables are common cause of endogeneity
- OLS estimate not consistent for \(\beta\)
An alternative: Instrumental variables (IV)
- Consider a simple regression with endogeneity \[Y=\beta_0+\beta_1X + u\] \[E[Xu]\neq 0 \]
- We don’t have controls to add to regression to solve this
- We might have some variables \(Z\) that don’t matter for \(Y\), or at least not directly
- In particular, \(Z\) affects \(X\) but not \(Y\)
- All influence of \(Z\) on \(Y\) comes via effect on \(X\)
- None through whatever is in \(u\)
- Then we may be able to find effect of \(X\) by looking only at variation in \(X\) which is coming through \(Z\)
Goal of IV estimation
- \(Z\) is randomly assigned, but \(X\) isn’t
- AND \(Z\) only has direct effect on \(X\), not \(Y\)
- The “Part of \(X\) determined by \(Z\)” is then effectively random
- Example: Effect of going to a particular school \(X\) on wages \(Y\)
- Admission to school determined by lottery \(Z\)
- Not all admitted students attend: some go to other schools
- Reasons determined by unobserved personal characteristics \(W\)
- \(X\) not random, due to missing controls
- \(Z\) is random, uncorrelated with \(W\)
- We could get effect of \(Z\) on \(Y\), but really want effect of \(X\)
- IV lets us use the randomness of \(Z\) to estimate effect of \(X\)
Causal graph of IV model (Code)
ivgraph<-dagify(Y~X+W,X~W, X~Z) #create graph
#Set position of nodes
coords<-list(x=c(X = 0, W = 1, Y = 2, Z=-1),
y=c(X = 0, W = 0.1, Y = 0, Z=0))
coords_df<-coords2df(coords)
coordinates(ivgraph)<-coords2list(coords_df)
#Plot causal graph
ggdag(ivgraph)+theme_dag_blank()
+labs(title="Instrumental Variables")
Causal graph of IV model
- No arrows into \(Z\)
- Arrow from \(Z\) to \(X\), but not to \(Y\)
- Z has direct effect on X but not on Y
- If \(W\) were observed, could use control to find effect of \(X\) on \(Y\)
- If \(W\) not observed, but \(Z\) is, can use instrumental variables estimator
Finding an instrument
- Instruments offen the result of a natural experiment
- \(X\) maybe determined by many things,
- But one of them acts effectively like an experiment
- Something like a random number generator somehow influences \(X\), among other influences
- Maybe this is an actual random number generator
- E.g. a lottery, or a coin flip
- We don’t care about effect of coin flip itself on outcome
- Want to use it estimate effect of \(X\)
- E.g., Admission lottery for school allows us to estimate effect of Enrollment on outcome (like wages)
- Even if enrollment not random
- Determined in part by unobserved preference for school
- Lottery has no direct effect on wages, but does affect enrollment
Estimation in IV Model
Let \(\beta_1\) be true causal effect. Then regression equation \[Y=\beta_0+\beta_1X + u\] has \[E[Xu]\neq 0 \]
- \(X\) is called the endogenous regressor
- We have another variable \(Z\) called the instrument
It satisfies 2 conditions: Exogeneity and Relevance
Exogeneity: \[Cov(Z,u)=0\]
Relevance: \[Cov(Z,X)\neq 0\]
- Let’s see how these let us solve the endogeneity problem
Then go back to see what they mean
One more assumption
- Model \[Y=\beta_0+\beta_1X + u\]
- Add 1 more assumption \[E[u]=0\]
- Assumption is weak
- Can always change \(\beta_0\) to equal \(\beta_0+E[u]\) if not true
- If we don’t care about \(\beta_0\), this is not a problem
- IV will still let us estimate \(\beta_1\)
Finding \(\beta_1\) using IV
- Model \[Y=\beta_0+\beta_1X + u\]
- Exogeneity \[Cov(Z,u)=0\]
- Relevance \[Cov(Z,X)\neq 0\]
- Mean 0 errors \[E[u]=0\]
Finding \(\beta_1\)
- Rewrite Assumptions using model
- Mean 0 errors \[E[Y-\beta_0-\beta_1X]=0\]
- Exogeneity \[Cov(Z,u)=E[Zu]-E[Z]E[u]=E[Zu]=0\]
- This becomes \[E[Z(Y-\beta_0-\beta_1X)]=0\]
- System of 2 linear equations in 2 unknowns
Solving the system
\[E[Y-\beta_0-\beta_1X]=0\] \[E[Z(Y-\beta_0-\beta_1X)]=0\]
This becomes \[\beta_0=E[Y]-\beta_1E[X]\]
Plug in to second equation \[E[Z((Y-E[Y])-\beta_1(X-E[X]))]=0\] \[\beta_1=\frac{E[Z(Y-E[Y])]}{E[Z(X-E[X])]}\] \[\beta_1=\frac{Cov(Z,Y)}{Cov(Z,X)}\]
IV Estimator
- We just identified the parameter in terms of features of observable distribution \[\beta_1=\frac{Cov(Z,Y)}{Cov(Z,X)}\] \[\beta_0=E[Y]-\beta_1E[X]\]
- Suppose we have random sample of n draws of \((X_i,Y_i,Z_i)\)
- We can estimate by replacing expectations with sample averages
\[\hat{\beta}_1^{IV}=\frac{\frac{1}{n}\sum_{i=1}^{n}(Z_i-\bar{Z})(Y_i-\bar{Y})}{\frac{1}{n}\sum_{i=1}^{n}(Z_i-\bar{Z})(X_i-\bar{X})}\] \[\hat{\beta}_0^{IV}=\frac{1}{n}\sum_{i=1}^{n}Y_i-\hat{\beta}_1^{IV}\frac{1}{n}\sum_{i=1}^{n}X_i\]
Relevance Condition
- We assumed relevance: \(Cov(Z,X)\neq 0\)
- Need this for estimator to have a solution, to define \[\beta_1=\frac{Cov(Z,Y)}{Cov(Z,X)}\]
- In words, instrument needs to be correlated with endogenous regressor
- This allows us to use \(Z\) say something about effect of \(X\)
Exclusion restriction
- We assumed exogeneity of \(Z\): \(E[Zu]=0\)
- Sometimes this is called Exclusion Restriction
- In words, \(Z\) cannot be correlated with the residual
- Even though, by assumption, \(X\) is
- Usually because \(Z\) has no direct causal effect on \(Y\)
- In case of omitted variable bias, means \(Z\) is uncorrelated with omitted variable
- Note difference from strategy of control:
- To control, we want to find the omitted variable and include it
- For IV, want something totally unrelated to omitted variable
- Idea: want to find a source of variation in \(X\) which is effectively random, even if \(X\) in general is determined nonrandomly
IV interpretation
- IV adjusts reduced form (quasi)experimental estimate of effect of instrument \(Z\) on outcome \(Y\) by amount experimental treatment actually affected \(X\)
- Consider 2 cases
- First stage \(\pi\) small, but total effect \(\rho\) of instrument on outcome big
- Then effect must have come from very large effect of \(X\)
- \(X\) didn’t move much even though outcome did
- First stage \(\pi\) big but total effect \(\rho\) of instrument on outcome small
- \(X\) must have small effect
- \(X\) moved a lot with the experiment but outcome \(Y\) mostly stayed the same
Example: Effect of longer prison sentences on recidivism
- Example drawn from [Green and Winik (2010, Criminology)]
- Policy question: how long should prison sentences be?
- Incarceration has many potential costs and benefits, need to know what they are to evaluate.
- To measure benefits, need measure of outcomes given longer or shorter sentences
- Outcome variable: \(Y\)
- Measure of recidivism for people on trial for a particular crime
- Variable of interest \(X\)
- Length of prison sentence given to offender
- Ideal experiment
- Randomly assign prison sentences to criminals
- Measure recidivism and regress on sentence length
- But this hasn’t been done
Endogeneity problem
- Instead, look at prison sentences assigned the usual way
- A judge using sentencing guidelines and discretion
- Endogeneity problem
- People assigned longer prison sentences likely differ from those given shorter sentences
- May be more likely to be guilty,
- Or have characteristics that make judge think they are
- Or likely to be at risk of recidivism
- If we could include all variables influencing judge’s decision and find functional form replicating their judgment, could control for them
- This is hard:
- Decisions include component based on perceive probability of reoffending
- Likely to be correlated with actual reoffending
- Instead, look for instrument
- Something affecting sentences which is random
IV solution
- In this case, can use one random component of trials:
- Judges assigned randomly to defendants
- Z is random judge assignment (say, to judge 1 or judge 2)
- Don’t care about effect of judge 1 vs judge 2 per se
- Given by reduced form regression of \(Y\) on \(Z\)
- Clear that \(Z\) satisfies exogeneity: \(E[Zu]=0\)
- Random choice of judge is unrelated to defendant characteristics
- \(Z\) satisfies relevance \(Cov[ZX]\neq0\) if 1 judge tends to give harsher sentences than the other on average for any given defendant
- Maybe judge 2 interprets sentencing guidelines in different way, or just has different temperament or judgment
- IV finds the difference in recidivism associated with variation in sentence length due to having been randomly assigned a different judge
IV Results
- Green and Winik find IV estimates close to 0
- Mechanically: effect of which judge one assigned to on recidivism rate is extremely small
- Dividing this effect by the difference in sentences due to the difference in judges, which was non-negligible, suggests that different sentences can’t have affected recidivism that much
Conclusions
- Causal graph determines conditional independence relationships
- Backdoor criterion describes when control strategy can be used to estimate treatment effect
- When proper controls not observed or included, regression coefficients have endogeneity bias
- Might still be able to estimate effect if data contains an instrument
- An instrument is a variable which leads to variation in a treatment \(X\) without directly affecting the outcome \(Y\)
- With an instrument, can use instrumental variables estimator to measure treatment effect
Next 2 classes
- More about IV
- Read Ch 15 in Wooldridge
- Read Ch 3 in Angrist and Pischke