Problem Setup

library(dagitty) #Library to create and analyze causal graphs
library(ggplot2) #Plotting
library(npcausal) #Not on CRAN; install from https://github.com/ehkennedy/npcausal, e.g. remotes::install_github("ehkennedy/npcausal")
suppressWarnings(suppressMessages(library(ggdag))) #library to plot causal graphs
yxzdag<-dagify(Y~X+Z,X~Z) #Create graph with arrows Z -> X, Z -> Y, and X -> Y
#Set node positions: X and Y on a horizontal line, Z above and between them
coords<-list(x=c(X = 0, Y = 2, Z=1),y=c(X = 0, Y = 0, Z=1))
coords_df<-coords2df(coords)
coordinates(yxzdag)<-coords2list(coords_df)
ggdag(yxzdag)+theme_dag_blank()+labs(title="Z confounds relationship of X to Y") #Plot causal graph
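A small base-R simulation makes the confounding concrete. It assumes a hypothetical linear model consistent with the graph. The coefficients 0.8, 1 and 2 are illustrative choices, not from the original. Regressing Y on X alone absorbs part of Z's effect into the X coefficient. Adjusting for Z recovers the true coefficient of 1.

```r
# Hypothetical linear model consistent with the DAG above:
#   Z -> X, Z -> Y, X -> Y (coefficients are illustrative assumptions)
set.seed(1)
n <- 5000
z <- rnorm(n)
x <- 0.8 * z + rnorm(n)          # Z shifts X
y <- 1 * x + 2 * z + rnorm(n)    # true causal effect of X on Y is 1

naive    <- unname(coef(lm(y ~ x))["x"])      # ignores the confounder Z
adjusted <- unname(coef(lm(y ~ x + z))["x"])  # adjusts for Z

# Population value of the naive slope: 1 + 2*Cov(X,Z)/Var(X) = 1 + 1.6/1.64
round(c(naive = naive, adjusted = adjusted), 2)
```

Linear adjustment suffices here only because the simulated model is linear and Z is the sole confounder. The estimators discussed later do not rely on that functional form.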

Identification: Assumptions

Identification: Derivation

When might we have ignorability?

The bad old days

Estimators for \(E[Y^x]\)

Inverse Propensity Lemma

Double robustness of AIPW

Choice of Estimators for \(\pi\), \(\mu\)

Handling nuisance error: Regularity or Sample Splitting

AIPW limit distribution

Interpreting conditions

Proof that \(\sqrt{n_1}(\widehat{\gamma}^{n_1}_x-\widehat{\gamma}^{exact})=o_p(1)\)

Communicating results

Software

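set.seed(42) #Reproducible numbers
n <- 100
z <- matrix(runif(n*5),nrow=n) #5 covariates, each uniform on [0,1]
b <- as.vector(c(1,1,-1,1,-1))
g1 <- rnorm(5); g2 <- rnorm(5)
ps <- exp(z%*%b)/(1+exp(z%*%b)) #Propensity score P(X=1|Z); named ps to avoid masking the constant pi
x <- rbinom(n,1,ps)
y <- rnorm(n,mean=x+z%*%g1+sin(z%*%g2)) #True ATE is 1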
#ATE by cross-fit AIPW, with weighted average of ML algorithms
ate.res <- ate(y,x,z, sl.lib=c("SL.mean","SL.gam","SL.ranger","SL.glm"))
##      parameter       est        se      ci.ll       ci.ul  pval
## 1      E{Y(0)} -1.388809 0.1867523 -1.7548431 -1.02277420 0.000
## 2      E{Y(1)} -0.255938 0.1434104 -0.5370224  0.02514642 0.074
## 3 E{Y(1)-Y(0)}  1.132871 0.2008997  0.7391072  1.52663408 0.000
# Compare pure regression and IPW estimates from same data
#Regression: average the cross-fit outcome predictions for X=0 and X=1 (columns 3-4 of ate.res$nuis)
(EY0reg<-mean(ate.res$nuis[,3]))
## [1] -1.425214
(EY1reg<-mean(ate.res$nuis[,4]))
## [1] -0.1762365
(ATEreg<-EY1reg-EY0reg)
## [1] 1.248978
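#IPW: weight observed outcomes by the inverse cross-fit propensity of the arm received (columns 1-2 of ate.res$nuis)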
(EY0ipw<-mean(y*(1-x)/ate.res$nuis[,1]))
## [1] -1.445764
(EY1ipw<-mean(y*x/ate.res$nuis[,2]))
## [1] -0.2120509
(ATEipw<-EY1ipw-EY0ipw)
## [1] 1.233713
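The cross-fit AIPW estimate combines the two pieces above. It equals the regression estimate plus an inverse-propensity-weighted average of the outcome residuals. The base-R sketch below makes this explicit on a fresh, larger draw from the same data-generating process, so its numbers will not match the table. For illustration, it makes several simplifying assumptions:

- A logistic `glm` replaces the SuperLearner ensemble for the propensity score.
- Per-arm linear `lm` fits replace it for the outcome regressions.
- The data are split into two folds.
- Standard errors use the sample standard deviation of the estimated influence-function values, with normal-approximation intervals.

This is the usual construction, but it may differ in details from `ate()`.

```r
# Sketch: two-fold cross-fit AIPW in base R (glm/lm stand in for SuperLearner)
set.seed(42)
n <- 2000
z <- matrix(runif(n * 5), nrow = n)
b <- c(1, 1, -1, 1, -1)
g1 <- rnorm(5); g2 <- rnorm(5)
ps <- plogis(drop(z %*% b))                       # true propensity score
x <- rbinom(n, 1, ps)
y <- rnorm(n, mean = x + z %*% g1 + sin(z %*% g2)) # true ATE is 1
dat <- data.frame(y = y, x = x, z)                # covariates become X1..X5

folds <- sample(rep(1:2, length.out = n))
pihat <- mu0 <- mu1 <- numeric(n)
for (k in 1:2) {
  idx <- folds == k
  train <- dat[!idx, ]; test <- dat[idx, ]
  # Nuisance models are fit on one fold and evaluated on the other
  pfit  <- glm(x ~ . - y, family = binomial, data = train)
  ofit0 <- lm(y ~ . - x, data = train[train$x == 0, ])
  ofit1 <- lm(y ~ . - x, data = train[train$x == 1, ])
  pihat[idx] <- predict(pfit, newdata = test, type = "response")
  mu0[idx]   <- predict(ofit0, newdata = test)
  mu1[idx]   <- predict(ofit1, newdata = test)
}

# AIPW pseudo-outcomes: regression prediction + weighted residual correction
phi1 <- mu1 + x * (y - mu1) / pihat
phi0 <- mu0 + (1 - x) * (y - mu0) / (1 - pihat)
ifate <- phi1 - phi0

est <- c(EY0 = mean(phi0), EY1 = mean(phi1), ATE = mean(ifate))
se  <- c(sd(phi0), sd(phi1), sd(ifate)) / sqrt(n)
res <- data.frame(est = est, se = se,
                  ci.ll = est - qnorm(0.975) * se,
                  ci.ul = est + qnorm(0.975) * se,
                  pval  = 2 * pnorm(-abs(est / se)))
round(res, 3)

# Decomposition: AIPW = regression estimate + IPW-weighted residuals
c(regression = mean(mu1) - mean(mu0),
  correction = mean(x * (y - mu1) / pihat) - mean((1 - x) * (y - mu0) / (1 - pihat)))
```

The linear outcome models omit the `sin()` term, so they are misspecified. The propensity model is correctly specified, however, which is the situation in which double robustness keeps the AIPW estimate close to the true ATE of 1.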

Which to use?

What to do about non-ignorability

What else can go wrong? Overlap

When is overlap condition a problem?

set.seed(1234)
n<-10000
shift <- 2
za<-rnorm(n/2,0,1)
zb<-rnorm(n/2,shift,1)
z<-c(za,zb)
xa<-rep(0,n/2)
xb<-rep(1,n/2)
X<-factor(c(xa,xb))

#Apply Bayes' rule: P(X=1|Z)=P(Z|X=1)P(X=1)/(P(Z|X=1)P(X=1)+P(Z|X=0)P(X=0))
#With P(X=1)=P(X=0)=0.5, Z|X=0 ~ N(0,1) and Z|X=1 ~ N(shift,1), this simplifies to
#P(X=1|Z)=1/(1+exp(shift^2/2-shift*Z))
probXgivenZ <- 1/(1+exp(0.5*shift^2-shift*z))

dataf<-data.frame(z,X,probXgivenZ)

ggplot(data=dataf)+geom_density(aes(x=z,fill=X,color=X),alpha=0.5)+
    geom_line(aes(x=z,y=probXgivenZ))+
    ylab("Density; P(X=1|Z)")+
    ggtitle("P(Z|X=1), P(Z|X=0), and P(X=1|Z)", subtitle = paste("Normal Distributions shifted by",shift))
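The closed form for P(X=1|Z) can be checked against Bayes' rule evaluated directly with `dnorm()`. The same design also shows how overlap deteriorates as `shift` grows. The hypothetical helper `overlap_stats()` below reports three quantities:

- the range of the true propensity score,
- the largest inverse-propensity weight,
- the Kish effective sample size of the weights in each arm, as a share of the arm size.

Under this design, \(E[1/e(Z)\mid X=1]=2\) and \(E[1/e(Z)^2\mid X=1]=3+\exp(\text{shift}^2)\). The population value of that share is therefore \(4/(3+\exp(\text{shift}^2))\): about 0.93 at shift 0.5, 0.70 at shift 1, and about 0.07 at shift 2.

```r
# Check the closed form against Bayes' rule computed with dnorm()
shift <- 2
zgrid <- seq(-4, 6, by = 0.5)
num <- dnorm(zgrid, mean = shift)                # density of Z given X=1 (equal priors cancel)
bayes <- num / (num + dnorm(zgrid, mean = 0))    # P(X=1|Z) by Bayes' rule
closed <- 1 / (1 + exp(0.5 * shift^2 - shift * zgrid))
max_gap <- max(abs(bayes - closed))              # numerically zero

# Hypothetical helper: overlap diagnostics for one value of shift
overlap_stats <- function(shift, n = 10000) {
  z <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, shift, 1))
  x <- rep(c(0, 1), each = n / 2)
  e <- 1 / (1 + exp(0.5 * shift^2 - shift * z))  # true propensity score
  w1 <- 1 / e[x == 1]                            # IPW weights for E[Y^1]
  w0 <- 1 / (1 - e[x == 0])                      # IPW weights for E[Y^0]
  kish <- function(w) sum(w)^2 / sum(w^2)        # Kish effective sample size
  c(shift = shift, min_e = min(e), max_e = max(e),
    max_weight = max(w1, w0),
    ess1_share = kish(w1) / (n / 2),
    ess0_share = kish(w0) / (n / 2))
}

set.seed(1234)
ov_tab <- t(sapply(c(0.5, 1, 2, 4), overlap_stats))
round(ov_tab, 3)
```

Because the effective-sample-size share collapses once the two covariate distributions separate, inverse weighting then rests on a handful of observations even with n = 10000.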

What to do about overlap

Finite sample overlap issues

Extensions

Conclusions

References

Colangelo, Kyle, and Ying-Ying Lee. 2020. “Double Debiased Machine Learning Nonparametric Inference with Continuous Treatments.” arXiv Preprint arXiv:2004.03036.
D’Amour, Alexander, Peng Ding, Avi Feller, Lihua Lei, and Jasjeet Sekhon. 2021. “Overlap in Observational Studies with High-Dimensional Covariates.” Journal of Econometrics 221 (2): 644–54.
Heiler, Phillip, and Ekaterina Kazak. 2021. “Valid Inference for Treatment Effect Parameters Under Irregular Identification and Many Extreme Propensity Scores.” Journal of Econometrics 222 (2): 1083–1108.
Jiang, Wei, Ashlyn Aiko Nelson, and Edward Vytlacil. 2014. “Liar’s Loan? Effects of Origination Channel and Information Falsification on Mortgage Delinquency.” Review of Economics and Statistics 96 (1): 1–18.
Kennedy, Edward. 2021. “Npcausal: R Library.” https://github.com/ehkennedy/npcausal.
Kennedy, Edward H, Zongming Ma, Matthew D McHugh, and Dylan S Small. 2017. “Non-Parametric Methods for Doubly Robust Estimation of Continuous Treatment Effects.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79 (4): 1229–45.
Khan, Shakeeb, and Elie Tamer. 2010. “Irregular Identification, Support Conditions, and Inverse Weight Estimation.” Econometrica 78 (6): 2021–42.
Kleinberg, Jon, Jens Ludwig, Sendhil Mullainathan, and Ziad Obermeyer. 2015. “Prediction Policy Problems.” American Economic Review 105 (5): 491–95.
Mohri, Mehryar, Afshin Rostamizadeh, and Ameet Talwalkar. 2018. Foundations of Machine Learning. MIT Press.
Narita, Yusuke, and Kohei Yata. 2021. “Algorithm Is Experiment: Machine Learning, Market Design, and Policy Eligibility Rules.” arXiv Preprint arXiv:2104.12909.
Robins, James M, Miguel A Hernán, and Larry Wasserman. 2015. “On Bayesian Estimation of Marginal Structural Models.” Biometrics 71 (2): 296.
Sasaki, Yuya, and Takuya Ura. 2018. “Estimation and Inference for Moments of Ratios with Robustness Against Large Trimming Bias.” Econometric Theory, 1–47.
Van Der Vaart, Aad W, and Jon Wellner. 1996. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Science & Business Media.
Wager, Stefan. 2020. “Class Notes: STATS 361: Causal Inference.” Stanford University. https://web.stanford.edu/~swager/stats361.pdf.