Causality

Causality again

Last class, discussed causal models and experiments
Potential outcomes $Y^x$ describe how $Y$ would be if $X$ changed to $x$
Can always observe $Y=Y^X$, what happened given observed $X$
With experiment, can learn distribution of $Y^x$, $P(Y|do(X=x))$
- Assigning $X$ at random implies $P(Y|do(X=x))=P(Y|X=x)$
Today:
- Inference in experiments
- What to do with non-experimental data

Experimental Data

Increasingly, substantial amount of economic data from experiments

Field experiments: assign social program at random
- JPAL runs hundreds of economic experiments all around the world
Businesses experiment with design, marketing, customer interactions
- Called “A/B tests” by tech companies
- Microsoft, Amazon, Facebook, Google conduct over 10,000 each/year ¹

Problems with nonexperimental data

Consider comparing average outcomes between two groups \[E[Y_i|X_i=1]-E[Y_i|X_i=0]=E[Y_i^1|X_i=1]-E[Y_{i}^{0}|X_i=0]\] Add and subtract $E[Y_i^0|X_i=1]$ \[=\stackrel{\text{ATT}}{E[Y_i^1-Y_i^0|X_i=1]}+\stackrel{\text{"selection bias"}}{(E[Y_i^0|X_i=1]-E[Y_i^0|X_i=0])}\]
First term is ATT “average treatment effect on the treated”
- For those assigned to treatment group, causal effect of the treatment
- May differ from ATE if treatment assigned to groups for whom efficacy differs
Second term is selection bias
- Difference in baseline outcome levels between group selected for treatment and group not selected
- Nonzero if treatment and control group systematically differ in ways relevant to the outcome
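The decomposition above can be checked numerically. The following is a minimal sketch on simulated data, where every number (means, effect size, selection rule) is an assumption made for illustration. Because both potential outcomes are simulated, the ATT and the selection bias can be computed directly, which is never possible with real data.

```r
# Simulated potential outcomes; all parameter values are illustrative assumptions
set.seed(1)
n  <- 100000
y0 <- rnorm(n, mean = 10, sd = 2)          # outcome if untreated
y1 <- y0 + rnorm(n, mean = 1, sd = 1)      # outcome if treated
# Non-random assignment: units with low y0 are more likely to be treated
x  <- rbinom(n, 1, plogis(-(y0 - 10)))
y  <- ifelse(x == 1, y1, y0)               # observed outcome Y = Y^X

naive     <- mean(y[x == 1]) - mean(y[x == 0])     # difference in group means
att       <- mean((y1 - y0)[x == 1])               # effect on the treated
selection <- mean(y0[x == 1]) - mean(y0[x == 0])   # baseline difference

c(naive = naive, att = att, selection = selection, att_plus_selection = att + selection)
```

In this simulated design the selection term is negative, so the naive comparison understates the effect on the treated.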

Example: Job Training and Earnings

US runs many job training programs for low skill workers
Goal is to help them recover from a bad economic situation and find higher-paying work
Is training effective? Hard to tell \[E[\text{earnings|training}]-E[\text{earnings|no training}]=\] \[E[\text{earnings change|do(training),trained}]+\] \[(E[\text{untrained earnings|trained}]-E[\text{untrained earnings|not trained}])\]
Want to know effectiveness of program
First term gives this, at least for participants
Problem is, second term is probably very negative
- People get job training because they have a bad job, or no job at all
- On average, those who get training are those with lower untrained earnings

Experiments and treatment effects

Suppose treatment assigned randomly: $(Y_i^0,Y_i^1) \perp X_i$
Independence means conditional distributions same as unconditional
Selection bias now \[E[Y_i^0|X_i=1]-E[Y_i^0|X_i=0]=E[Y_i^0]-E[Y_i^0]=0\]
ATT now \[E[Y_i^1-Y_i^0|X_i=1]=E[Y_i^1-Y_i^0]=\text{ATE}\]
By the law of large numbers (L.L.N.), difference in means consistently estimates ATE for randomized experiments
In small samples, estimate not exact
- May have drawn sample where unobserved variables differ between treatment and control groups
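A small simulation can illustrate both points: under random assignment the difference in means centers on the ATE, but it is noisy in small samples. This is a sketch under assumed values (true ATE of 1, half of each sample treated); the helper `diff_in_means` is hypothetical and not part of the course code.

```r
set.seed(2)
# Hypothetical helper: simulate one randomized experiment of size n
# and return the difference in means
diff_in_means <- function(n) {
  y0 <- rnorm(n, mean = 10, sd = 2)
  y1 <- y0 + rnorm(n, mean = 1, sd = 1)      # true ATE is 1 by construction
  x  <- sample(rep(0:1, length.out = n))     # random assignment, half treated
  y  <- ifelse(x == 1, y1, y0)
  mean(y[x == 1]) - mean(y[x == 0])
}

small <- replicate(500, diff_in_means(20))    # many small experiments
large <- replicate(500, diff_in_means(2000))  # many large experiments

c(mean_small = mean(small), sd_small = sd(small),
  mean_large = mean(large), sd_large = sd(large))
```

The spread across repeated small experiments reflects chance imbalance between groups; it shrinks as the sample grows.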

Random coefficients

Write potential outcomes model in more familiar form \[Y_i=Y_i^0+(Y_i^1-Y_i^0)X_i\]
Define $\beta_{0,i}=Y_i^0$, $\beta_{1,i}=Y_i^1-Y_i^0$, then \[Y_i=\beta_{0,i}+\beta_{1,i}X_i\]
Slope is treatment effect, intercept is value if not treated
Result is a linear model with random coefficients
Like linear model, but slope terms no longer constant

Relating to standard linear model

Taking averages, can write as
- $\beta_{0,i}:=\bar{\beta}_0+e_{0i}$
- $\beta_{1,i}:=\bar{\beta}_1+e_{1i}$
- $E[e_{0i}]=E[e_{1i}]=0$
Random coefficients model becomes \[Y_i=\bar{\beta}_0+\bar{\beta}_1X_i+e_{0i}+X_ie_{1i}\]
A standard linear model with heteroskedastic errors
Slope coefficient $\bar{\beta}_1$ is ATE
Endogeneity: under nonrandom assignment, residual may be correlated with $X_i$

Estimation

If X assigned randomly, $X_i\perp e_{0i}$ (no selection bias) and $X_i\perp e_{1i}$ (treatment effect independent of treatment assignment)
Model becomes standard linear model satisfying Assumptions (1-4)
$\hat{\beta}_1$ OLS estimator same as difference in means
Heteroskedasticity has meaningful interpretation
- Residual $e_{0i}+X_ie_{1i}$ has variance which depends on $X$ so long as $e_{1i}\neq 0$
- “Heterogeneous treatment effects”
OLS with robust standard errors gives valid inference on ATE for experimental data
Equivalent in large samples to two-sample t-test on difference in means with unequal variances
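The claims on this slide can be verified on simulated data using base R only. The sketch below assumes a made-up random coefficients design and computes the HC0 variance by hand rather than through `vcovHC`, so that the link to group-wise residual variances is visible.

```r
set.seed(3)
n  <- 400
x  <- rbinom(n, 1, 0.5)              # randomly assigned binary treatment
e0 <- rnorm(n)                       # baseline heterogeneity
e1 <- rnorm(n, sd = 2)               # treatment effect heterogeneity
y  <- 5 + 2 * x + e0 + x * e1        # random coefficients model

fit <- lm(y ~ x)
b1  <- unname(coef(fit)["x"])                    # OLS slope
dm  <- mean(y[x == 1]) - mean(y[x == 0])         # difference in means

# HC0 sandwich variance computed by hand: (X'X)^{-1} X' diag(u^2) X (X'X)^{-1}
X      <- cbind(1, x)
u      <- resid(fit)
bread  <- solve(crossprod(X))
meat   <- crossprod(X * u)
se_hc0 <- sqrt((bread %*% meat %*% bread)[2, 2])

# Same quantity from group-wise sums of squared residuals
n1 <- sum(x == 1); n0 <- sum(x == 0)
se_groups <- sqrt(sum(u[x == 1]^2) / n1^2 + sum(u[x == 0]^2) / n0^2)

# Welch (unequal variance) two-sample standard error, for comparison
se_welch <- t.test(y[x == 1], y[x == 0])$stderr

c(b1 = b1, dm = dm, se_hc0 = se_hc0, se_groups = se_groups, se_welch = se_welch)
```

The slope and the difference in means coincide exactly. The HC0 and Welch standard errors differ only by a degrees-of-freedom factor that vanishes as group sizes grow.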

Example: National Supported Work Program Experiment (Code 1)

#Load package containing data set 
#see sekhon.berkeley.edu/matching/ for info
#Install the package only if it is not already available
if (!requireNamespace("Matching", quietly = TRUE)) {
  install.packages("Matching",
   repos = "http://cran.us.r-project.org", 
   dependencies=TRUE)
}
library(Matching)
data(lalonde) #Load data set
library(sandwich) #Robust SEs
library(lmtest) #testing with robust SEs
    
# Show treatment unrelated to covariates
balancereg<-lm(formula = treat ~ age + educ + black + 
          hisp + married + nodegr + re74 + re75 +
          u74 + u75, data = lalonde)

Example: National Supported Work Program Experiment (Code 2)

#Build robust standard errors
balancereg.vcov<-vcovHC(balancereg, type="HC0")
#Conduct Wald test that coefs jointly 0
#using asymptotic Chi-squared distribution
jointtest<-waldtest(balancereg, 
  vcov=vcovHC(balancereg,type="HC0"), test = "Chisq")

Example: National Supported Work Program Experiment

National Supported Work Program (analyzed in Lalonde 1986, Dehejia & Wahba 1998)
Take group of low earnings workers, randomly assign to on the job training ($X=1$) or no intervention ($X=0$)
Compare post program earnings for treated and untreated workers
Can check that treatment is uncorrelated with observable characteristics by regressing treatment on pre-treatment covariates
A Wald test that all coefficients 0 tests a null which is implied by independence

jointtest<-waldtest(balancereg, 
  vcov=vcovHC(balancereg,type="HC0"), test = "Chisq")

Results

## Wald test
## 
## Model 1: treat ~ age + educ + black + hisp + married + nodegr + re74 + 
##     re75 + u74 + u75
## Model 2: treat ~ 1
##   Res.Df  Df  Chisq Pr(>Chisq)  
## 1    434                        
## 2    444 -10 21.564    0.01749 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Non-rejection at 1% level reassuring, though note test does reject at 5% level ($p\approx 0.017$)
Note that point of experiment is that treatment independent of all characteristics, observed and unobserved
Balance tests on observed covariates can’t verify the experimental assumption, since it also covers unobserved characteristics

Estimating treatment effect (Code)

#Run simple regression of outcome (Real earnings in 78) 
# on treatment in program
experiment<-lm(formula = re78 ~ treat, data = lalonde)
#Construct Robust SEs, test significance of causal effect
experiment.vcov<-vcovHC(experiment, type="HC0")
experiment.output<-coeftest(experiment,
      df=Inf,vcov. = experiment.vcov)
library(stargazer) #Display table of results
stargazer(experiment.output,type="html",style="all",
      header=FALSE,omit.table.layout="ldn",
      title="Job Training and 1978 Real Wages")

Estimating treatment effect

Setting is experimental, with binary treatment
Run OLS regression of earnings on training, with robust SEs
Coefficient has interpretation as Average Treatment Effect

**Job Training and 1978 Real Wages**

| | Estimate | Robust SE | t | p |
| --- | --- | --- | --- | --- |
| treat | 1,794.343*** | (669.316) | 2.681 | 0.008 |
| Constant | 4,554.802*** | (339.438) | 13.419 | 0.000 |

Interpretation

Strong and significant coefficient on treatment
- Treated earn $1800 more annually on average
Treatment effect $\bar{\beta}_1$ large relative to SE
- Unlikely difference due to sampling error
Also large relative to intercept
- $\bar{\beta}_0$: average earnings of workers conditional on no training
Experimental design suggests this difference is causal
If you took more workers from same population and put them in same program, they would also earn about $1800/year more on average
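The interpretation can be made concrete with arithmetic on the reported table. The sketch below uses only the estimates and robust standard errors shown in the results; the 95% interval is built with the normal approximation, which is an assumption in line with the asymptotic inference used above.

```r
# Values copied from the results table
est_treat <- 1794.343   # coefficient on treat
se_treat  <- 669.316    # robust standard error
est_const <- 4554.802   # intercept: average earnings of untreated

t_stat <- est_treat / se_treat                            # ratio of effect to SE
ci_95  <- est_treat + c(-1, 1) * qnorm(0.975) * se_treat  # normal-approximation interval
rel    <- est_treat / est_const                           # effect relative to control mean

list(t_stat = round(t_stat, 3), ci_95 = round(ci_95, 1), relative_effect = round(rel, 3))
```

The interval excludes zero, which is the sense in which the difference is unlikely to be due to sampling error alone.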

Observational Data and Natural Experiments

For many questions, no experiment run
- Too costly or unethical, or just not run
Instead have observational data from observed economic process
Key to experimental setup is random assignment
- Someone or something set $X$ independently of other determinants of $Y$
Doesn’t have to be conscious
- Could be random natural event or capricious policymaker
If this happens, have a natural experiment
- Can again interpret effect causally
Rare, but useful to exploit when it does happen

Control

In practice, rare in observational data that relevant variable affecting $Y$ randomly assigned
Instead, have many $X_{ij}$ determined by a variety of mechanisms with causal links between them
Today: convenient special cases where regression suffices
- Next class: some more general theory

Controlling for covariates

New setup
- $Y$ is outcome of interest
- $X$ is treatment
- $W$ are pre-treatment covariates
$W$ may cause both X and Y, but are not caused by them
If we observe all the $W$ that determine assignment of treatment and might themselves affect $Y$, can still learn about effects of $X$
Instead of random assignment $Y^x\perp X$, we can weaken to conditionally random assignment \[Y^x\perp X|W\]

Example: Consumer Credit

Effect of obtaining mortgage loan $X$ on consumer spending $Y$
People who can get credit would likely have very different spending habits than those who can’t, even in world where they didn’t get loan
- “Selection Bias”: $E[Y^0|X=1]-E[Y^0|X=0]\neq 0$
But, if we have electronic loan application data, info on form, $W$, contains all characteristics that determine treatment $X$
Any variation in loan approval comes from bank decisions, made without knowledge of other individual characteristics and so independent of them
Result: $Y^x\perp X|W$
- Conditional on loan data, there should be no relationship between consumer characteristics and what spending would be if given or not given loan
Given $W$, approval is “effectively” randomly assigned

Identifying Treatment Effects $P(Y|do(x))$

Assume $Y^x\perp X|W$, and go through same identification argument as in experiment case \[P(Y|X=x,W=w)=P(Y^X|X=x,W=w)\] \[=P(Y^x|X=x,W=w)\] \[=P(Y^x|W=w)=P(Y|do(X=x),observe(W=w))\]
By causal consistency, conditioning, and conditional independence
Result: recover conditional distribution of potential outcome $Y^x$ given $W$ from observed data, in situations where $W$ is observed
To get unconditional effect $P(Y^x)$, integrate over $w$ \[P(Y^x)=\int P(Y^x|W=w) P(w)dw=\int P(Y|X=x,W=w)P(w)dw\]
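With a discrete $W$ the integral becomes a weighted sum, which is easy to compute directly. The sketch below is an illustration on simulated data with an assumed binary covariate and an assumed true effect of 2; the helper `adj_mean` is hypothetical. It compares the naive difference in means with the adjusted difference $\sum_w E[Y|X=x,W=w]P(W=w)$.

```r
set.seed(4)
n  <- 200000
w  <- rbinom(n, 1, 0.4)                        # binary pre-treatment covariate
x  <- rbinom(n, 1, ifelse(w == 1, 0.8, 0.2))   # treatment depends on w only
y0 <- 1 + 3 * w + rnorm(n)                     # w also shifts the baseline outcome
y1 <- y0 + 2                                   # true effect is 2 by construction
y  <- ifelse(x == 1, y1, y0)

naive <- mean(y[x == 1]) - mean(y[x == 0])

# Hypothetical helper: average E[Y | X = xval, W = wv] over the marginal of W
adj_mean <- function(xval) {
  total <- 0
  for (wv in c(0, 1)) {
    total <- total + mean(y[x == xval & w == wv]) * mean(w == wv)
  }
  total
}
adjusted <- adj_mean(1) - adj_mean(0)

c(naive = naive, adjusted = adjusted)
```

The weights are the marginal shares of $W$, not the shares within each treatment group; using within-group shares would reproduce the naive comparison.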

Estimating Treatment Effects

Result says we can recover causal effects if we know $P(Y|X,W)$
We have tool for estimating conditional expectations
- Multivariate regression!
Impose linear model on P(Y|X,W) \[E[Y|X,W]=X\beta_1+W^\prime\beta_2\]
If c.e.f. correctly specified, $\beta_1$ is causal effect of $X$ on $Y$, given $W$ observed
Need to regress $Y$ on $X$ AND $W$
- Get unbiased estimate of causal effect
In this linear additive case, treatment effect is slope $\beta_1$
In this case, omitting $W$, or some component of it, from regression would cause bias in estimate of $\beta_1$
This is called “confounding” or “omitted variable bias”

Omitted variables again

Unbiased estimation requires correct specification of conditional expectation function
Suggests it is important to include appropriate functional forms and all possible elements of $W$
We now have causal interpretation of omitted variables formula
Suppose $E[Y|X,W]=\beta_0+\beta_1X+\beta_2W$
Regression of $Y$ on $X$ alone gives $\tilde{\beta}_0+\tilde{\beta}_1X$
Regression of $W$ on $X$ gives $\delta_0+\delta_1X$
Omitted variables formula says $\tilde{\beta}_1=\beta_1+\beta_2\delta_1$
Omitted variables bias occurs when both
1. confounder correlated with outcome ($\beta_2\neq 0$)
2. confounder correlated with treatment ($\delta_1\neq 0$)
Error precisely corresponds to selection bias: causal effect of treatment is to take unit with same characteristics and change status, but if units who get or don’t get treatment differ in some way that affects the outcome, comparison reflects both the treatment effect and the difference in the units selected
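The omitted variables formula holds exactly for OLS estimates in any sample, which can be checked directly. The sketch below uses simulated data with assumed coefficients ($\beta_1=2$, $\beta_2=3$) and a confounder that also drives treatment.

```r
set.seed(5)
n <- 5000
w <- rnorm(n)                        # confounder
x <- 0.7 * w + rnorm(n)              # treatment partly determined by w
y <- 1 + 2 * x + 3 * w + rnorm(n)    # assumed: beta_1 = 2, beta_2 = 3

long  <- coef(lm(y ~ x + w))         # regression with the control
short <- coef(lm(y ~ x))             # regression omitting w
aux   <- coef(lm(w ~ x))             # regression of w on x gives delta_1

# Omitted variables formula: short slope = long slope + beta_2 * delta_1
implied <- long[["x"]] + long[["w"]] * aux[["x"]]

c(long = long[["x"]], short = short[["x"]], implied = implied)
```

Here both conditions for bias hold, so the short regression overstates the effect, while the long regression recovers a value close to the assumed coefficient.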

Covariates in binary treatment framework

Random assignment $(Y_i^0,Y_i^1) \perp X_i$ relaxed to \[(Y_i^0,Y_i^1) \perp X_i | W_i\]
Outcome is $Y_i=Y^0_i+(Y^1_i-Y^0_i)X_i$, with expectation \[E[Y_i|X_i,W_i]=E[Y^0_{i}|W_i]+E[Y^1_i-Y^0_i|W_i]X_i\]
Assume a linear CEF for $Y^0_i$: $Y^0_{i}=\bar{\beta}_0+\gamma W_i +e_{0i}$ with $E[e_{0i}|W_i]=0$
If treatment effect independent of controls, have \[Y^1_i-Y^0_i=\bar{\beta}_1+e_{1i}\]
Result is random coefficients linear regression, with ATE=$\bar{\beta}_1$ \[Y_i=\bar{\beta}_0+\bar{\beta}_1X_i+\gamma W_i+e_{0i}+X_ie_{1i}\]
Under these conditions, OLS with robust SEs recovers average treatment effect

Heterogeneous treatment effects

In general, causal effect may vary with $W$
Loan increases spending more for consumers with some characteristics than others
In structural representation, this means $W_i$ correlated with $e_{1i}$
When $W$ added to regression, it is now correlated with residual
Assume linear CEF \[Y^1_i-Y^0_i=\bar{\beta}_1+\delta W_i+e_{1i}\] structural formula now \[Y_i=\bar{\beta}_0+\bar{\beta}_1X_i+\gamma W_i+\delta W_i\times X_i+e_{0i}+X_ie_{1i}\]
Use linear regression again, with interaction terms (and robust SEs)
Average treatment effect is $\bar{\beta}_1+\delta E[W_i]$
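The interaction specification can be sketched on simulated data. All values below are assumptions for illustration ($\bar{\beta}_1=1$, $\delta=1.5$, $E[W_i]=2$), and treatment depends on $W$ only, so conditional independence holds by construction.

```r
set.seed(6)
n  <- 50000
w  <- rnorm(n, mean = 2)                  # covariate with E[W] = 2
x  <- rbinom(n, 1, plogis(w - 2))         # assignment depends on w only
y0 <- 1 + 0.5 * w + rnorm(n)              # baseline outcome, linear in w
effect <- 1 + 1.5 * w + rnorm(n)          # unit-level effect varies with w
y  <- y0 + effect * x

# Regression with interaction term: y ~ x + w + x:w
fit <- lm(y ~ x * w)
b   <- coef(fit)

# ATE estimate: slope on x plus interaction coefficient times the mean of w
ate_hat  <- b[["x"]] + b[["x:w"]] * mean(w)
ate_true <- mean(effect)                  # known only because data are simulated

c(ate_hat = ate_hat, ate_true = ate_true)
```

The coefficient on `x` alone is the effect at $W_i=0$, not the ATE, which is why the interaction term has to be averaged over the distribution of $W$.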

Lessons

Causality defined in terms of model of what things could be like
Experiments can recover causal effects on average
Regression with appropriate controls can recover causal effects
Regression with missing controls causes omitted variable bias

Next class

More on causality
- Structural models
- When regression does and doesn’t work

¹ Kohavi and Thomke (2017), Harvard Business Review

Causality

73-374 Econometrics II
