- Causal effect of interest is often single feature of much more complicated model
- When identified, can write it as a *functional* of the distribution: \(\psi=\Psi(P):\ \mathcal{P}\to\mathbb{R}^d\)
- \(\mathcal{P} = \{\mathbb{P}_\theta:\ \theta\in\Theta\}\) is class of distributions, indexed by parameters, which may be infinite dimensional (expectation functions, densities, etc)

- What is reasonable estimator for \(\psi\) given data \(\{x_i\}_{i=1}^{n}\) with empirical distribution \(\mathbb{P}_n\)?

- \(\widehat{\theta}_{MLE}=\underset{\theta\in\Theta}{\arg\max}\log \ell(\{x_i\}_{i=1}^{n},\theta)\)
- \(\widehat{\psi}_{MLE}=\Psi(\mathbb{P}_{\widehat{\theta}_{MLE}})\)
- If \(\Theta\) is finite dimensional, under some regularity conditions it is asymptotically optimal to maximize the likelihood and plug the MLE into the functional
- Limit theory review (we’ll use this form): Take FOC and Taylor expand

\[0=\frac{1}{n}\sum_{i=1}^{n}\frac{d}{d\theta}\log\ell(x_i,\widehat{\theta})\approx \frac{1}{n}\sum_{i=1}^{n}\frac{d}{d\theta}\log\ell(x_i,\theta_0)+\frac{1}{n}\sum_{i=1}^{n}\frac{d^2}{d\theta^2}\log\ell(x_i,\bar{\theta})(\widehat{\theta}-\theta_0)\]

- Rearrange, apply LLN and Slutsky
\[\sqrt{n}(\widehat{\theta}^{MLE}-\theta_0)= -\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(\mathbb{E}\left[\frac{d^2}{d\theta^2}\log\ell(x_i,\bar{\theta})\right]\right)^{-1}\frac{d}{d\theta}\log\ell(x_i,\theta_0)+o_p(1)\]
- Supposing \(\Psi\) differentiable in \(\theta\), apply a first order Taylor expansion again (“delta method”)
\[\sqrt{n}(\widehat{\psi}^{MLE}-\psi_0)= -\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{d\psi}{d\theta}\left(\mathbb{E}\left[\frac{d^2}{d\theta^2}\log\ell(x_i,\bar{\theta})\right]\right)^{-1}\frac{d}{d\theta}\log\ell(x_i,\theta_0)+o_p(1)\]

- We say \(g_\psi(x_i)=-\frac{d\psi}{d\theta}\left(\mathbb{E}\left[\frac{d^2}{d\theta^2}\log\ell(x_i,\bar{\theta})\right]\right)^{-1}\frac{d}{d\theta}\log\ell(x_i,\theta_0)\) is the *influence function* of \(\widehat{\psi}\)
- By the CLT, \(\sqrt{n}(\widehat{\psi}^{MLE}-\psi_0)\overset{d}{\to}N(0,E[g_\psi g_\psi^{\prime}])\)
- By the Cramér-Rao lower bound, this is smallest asymptotic variance of any regular estimator of \(\psi_0\)
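As a quick numeric check of the delta method (a sketch with a hypothetical Bernoulli example, not part of the derivation above), the asymptotic standard error can be compared to the Monte Carlo spread of the plug-in MLE \(\widehat{\psi}=\widehat{\theta}^2\):

```python
import numpy as np

# Monte Carlo check of the delta method: x ~ Bernoulli(theta0), psi = theta^2.
# Delta method: Avar(psi_hat) = (dpsi/dtheta)^2 * theta0*(1 - theta0) / n.
rng = np.random.default_rng(0)
theta0, n, reps = 0.3, 2000, 2000
psi_hats = np.empty(reps)
for r in range(reps):
    x = rng.binomial(1, theta0, size=n)
    psi_hats[r] = x.mean() ** 2          # plug-in MLE of psi

se_delta = 2 * theta0 * np.sqrt(theta0 * (1 - theta0) / n)
se_mc = psi_hats.std()
print(f"delta-method SE {se_delta:.5f} vs Monte Carlo SE {se_mc:.5f}")
```

The two standard errors agree to within simulation noise, as the expansion above predicts.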

Problem: we typically don’t have a correctly specified parametric model of the full data distribution

Partition \(\Theta=(\theta,\beta)\) where \(\beta\) is a *nuisance* parameter, which may be complicated: \(\beta\) may be, e.g., a conditional expectation function, a density, etc., that we don’t want to restrict

Can we avoid depending on \(\beta\) if we just care about \(\theta\)?

If \(\beta\) indexes (conditionally) independent part of model, often yes

- Conditional likelihood: \(\ell(y,x,\theta,\beta)=\ell(y|x,\theta)\ell(x,\beta)\) can just optimize first part

In more general case, can’t avoid some dependence on \(\beta\)

- Simple case: \(\beta\) estimable without knowing \(\theta\)
- Suppose we have an identifying formula \(E[\phi(x_i,\beta)]=\theta\)
- Suppose we have an estimator \(\widehat{\beta}\) of \(\beta\)
- Estimate \(\widehat{\theta}^{\text{plugin}}=\frac{1}{n}\sum_{i=1}^{n}\phi(x_i,\widehat{\beta})\)
- Ex: adjustment formula under conditional ignorability \(\theta=\int E[y|x,z]dP(z)\)
- \(\beta:=E[y|x,z]\) is an unknown function, estimated by \(\widehat{\beta}(x,z)\)

- Suppose \(\Vert \widehat{\beta}-\beta_0\Vert=O_p(g(n))\) for some rate \(g(n)\to 0\), and \(\widehat{\beta}\in\mathcal{B}\) w.p.a. 1
- Let \(\mathcal{F}=\{\phi(.,\beta):\ \beta\in\mathcal{B}\}\) satisfy a ULLN
- Define \(\partial_\beta f(\beta)|_{\beta_0}[\beta-\beta_0]=\frac{\partial}{\partial t}f(\beta_0+t(\beta-\beta_0))\big|_{t=0}\) to be the directional (Gateaux) derivative of a function \(f\)
- Suppose \(\partial_{r}E[\phi(x_i,\beta_0+r(\beta-\beta_0))][\beta-\beta_0]\leq C\Vert\beta-\beta_0\Vert\): the derivative is uniformly bounded
- Then \[|\widehat{\theta}^{\text{plugin}}-\theta_0|\leq \underset{\beta\in\mathcal{B}}{\sup}\left|\frac{1}{n}\sum_{i=1}^{n}\phi(x_i,\beta)-E[\phi(x_i,\beta)]\right|+\left|E[\phi(x_i,\widehat{\beta})]-\theta_0\right|\]
- By the ULLN, the first term is \(o_p(1)\); by Taylor’s theorem with integral remainder, the second is \[\left|E[\phi(x_i,\beta_0)]+\int_0^1\partial_{r}E[\phi(x_i,\beta_0+r(\widehat{\beta}-\beta_0))]dr-\theta_0\right|\]
- Since \(E[\phi(x_i,\beta_0)]=\theta_0\), by the bounded-derivative condition \[|\widehat{\theta}^{\text{plugin}}-\theta_0|\leq C\Vert\widehat{\beta}-\beta_0\Vert+o_p(1)=O_p(g(n))\]
- Result: convergence rate is same as that of \(\widehat{\beta}\)
- If estimate is nonparametric, convergence rate of \(\theta\) is nonparametric
- Even though \(\theta\) is finite dimensional, \(\beta\) may be function
- Typically not possible to get \(\sqrt{n}\) asymptotically normal inference
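A minimal numpy sketch of this plug-in estimator (hypothetical DGP; k-nearest-neighbor regression standing in for an arbitrary nonparametric first stage):

```python
import numpy as np

# Plug-in g-formula sketch: theta = E[mu(1, Z)], with mu(1, z) = E[y | x=1, z]
# estimated by k-NN on the treated subsample (hypothetical DGP).
rng = np.random.default_rng(1)
n, k = 2000, 100
z = rng.uniform(0, 1, n)
x = rng.binomial(1, 0.3 + 0.4 * z)              # treatment, confounded by z
y = 1 + 2 * z + x + 0.5 * rng.standard_normal(n)
theta0 = 3.0                                     # true E[Y^1] = 1 + 2*E[z] + 1

zt, yt = z[x == 1], y[x == 1]                    # treated subsample
dists = np.abs(z[:, None] - zt[None, :])         # each z_i vs treated z's
idx = np.argpartition(dists, k, axis=1)[:, :k]   # k nearest treated neighbors
mu1_hat = yt[idx].mean(axis=1)                   # muhat(1, z_i)

theta_plugin = mu1_hat.mean()
naive = y[x == 1].mean()                         # confounded comparison
print(f"plug-in {theta_plugin:.3f}, naive treated mean {naive:.3f}, truth {theta0}")
```

The plug-in estimate is close to the truth while the naive treated mean is biased upward, but the plug-in inherits the k-NN convergence rate rather than \(\sqrt{n}\).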

- Take half of data: observations \(i=1,\ldots,N_1\) vs \(i=N_1+1,\ldots,n\)
- Estimate \(\widehat{\beta}^{N_1}\) on the first half; define \(\widehat{\theta}^{N_2}=\frac{1}{n-N_1}\sum_{i=N_1+1}^{n}\phi(x_i,\widehat{\beta}^{N_1})\)
- Above proof then goes through with ULLN replaced by standard LLN by independence
- ULLNs for nonparametric function classes a pain to verify, and rule out some estimators

- Lose half of sample size, but easy fix: switch split and estimate \(\widehat{\beta}^{N_2}\) and \(\widehat{\theta}^{N_1}\)
- Define \(\widehat{\theta}^{\text{crossfit}}=\frac{1}{2}(\widehat{\theta}^{N_1}+\widehat{\theta}^{N_2})\)
- Still get same slow rates, bias, etc

- Can estimate c.e.f. \(E[y|x]\) or density \(p(x)\) without functional form restrictions
- Nonparametric methods: use local information around \(x\)
- Minimal assumptions: smoothness or other structure
- Cost: convergence at slower rates than \(\sqrt{n}\)
- Google “scikit-learn tutorial” or cross street to ML department to learn about these
- Typical result \(E[(\widehat{f}(x)-E[y|x])^2]=O_p(n^{-\alpha})\) for some \(\alpha<1\) depending on smoothness, dimension, etc
- Note: \(MSE = Bias(\widehat{f})^2 + Var(\widehat{f})\) means bias not asymptotically negligible, and total error larger than CLT error
- Plug in generally converges at same, slow rate, and does not have estimable asymptotic distribution
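The decomposition in the bullets above can be seen in simulation (a sketch with a hypothetical DGP and bandwidth): the Monte Carlo MSE of a Nadaraya–Watson smoother at a point splits exactly into squared bias plus variance, with the smoothing bias far from negligible.

```python
import numpy as np

# Bias-variance decomposition of a kernel (Nadaraya-Watson) smoother at x0:
# MSE = Bias^2 + Var, and here the bias term dominates the variance.
rng = np.random.default_rng(2)
n, h, x0, reps = 500, 0.2, 0.5, 3000
f = lambda u: np.sin(3 * u)                 # true regression function
fhats = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0, 1, n)
    y = f(x) + 0.3 * rng.standard_normal(n)
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)  # Gaussian kernel weights
    fhats[r] = (w * y).sum() / w.sum()      # NW estimate at x0

bias = fhats.mean() - f(x0)
var = fhats.var()
mse = ((fhats - f(x0)) ** 2).mean()
print(f"bias^2 {bias**2:.5f} + var {var:.5f} = MSE {mse:.5f}")
```

Unlike a correctly specified parametric estimator, the squared bias here does not shrink relative to the variance at this bandwidth, which is why plug-in confidence intervals based on the CLT alone would be invalid.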

- For conditional expectations \(E[y|x]\)
- \(x\) is moderate dimensional: random forests, xgboost, BART, factor models, Highly adaptive Lasso
- \(x\) is low dimensional but function strongly nonlinear: MARS, trend filtering, Lasso over wavelets
- \(x\) is image or video data: (convolutional) neural networks
- \(x\) is text data: Recurrent or transformer neural nets, word embeddings, Latent Dirichlet Allocation

- Many people like Lasso on tabular data because it “looks like” OLS and gives you coefficients
- Don’t listen to them: nobody knows how to interpret OLS coefficients and the predictions are terrible

- A small number of people like SVMs and RKHS methods
- These people are theorists who have never touched a real data set in their life

- Many people assume neural networks are best choice on everything
- Don’t listen to them: “vanilla” neural nets often not great on small data, shine on much bigger data sets

- For conditional distributions \(p(y|x)\): \(y\) discrete:
- Any estimator of \(E[y|x]\), plus regularized GLMs
- If goal is *inverse* probability, consider estimating it directly, via regularized GMM (Chernozhukov, Newey, and Singh (2021))
- Benefit is likely better robustness to weak overlap; disadvantage is that you can’t use generic ML

- For conditional densities \(p(y|x)\): \(y\) continuous:
- \(x\) low dimensional: kernel density, series
- \(x\) moderate dimensional: Distributional random forests? This area is pretty open

- Actual advice: try a bunch of models, and create an “ensemble” by taking linear combination of them
- Fit ensemble weights by cross validation: “super-learner”
- Include diverse models, including simple parametric models, in list
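A minimal sketch of the ensemble idea (hypothetical DGP; two toy candidates and a grid over convex weights stand in for a full super-learner library):

```python
import numpy as np

# "Super-learner" sketch: out-of-fold predictions from two candidate models,
# then choose the convex combination minimizing cross-validated MSE.
rng = np.random.default_rng(3)
n = 1000
x = rng.uniform(-1, 1, n)
y = 1 + 2 * x + 0.5 * rng.standard_normal(n)

folds = np.arange(n) % 5
pred_mean = np.empty(n)   # candidate 1: constant (sample mean)
pred_lin = np.empty(n)    # candidate 2: simple linear regression
for f in range(5):
    tr, te = folds != f, folds == f
    pred_mean[te] = y[tr].mean()
    b1, b0 = np.polyfit(x[tr], y[tr], 1)
    pred_lin[te] = b0 + b1 * x[te]

grid = np.linspace(0, 1, 101)  # weight on the linear model
cv_mse = [((y - (w * pred_lin + (1 - w) * pred_mean)) ** 2).mean() for w in grid]
w_star = grid[int(np.argmin(cv_mse))]
print(f"ensemble weight on linear model: {w_star:.2f}")
```

On this linear DGP the cross-validated weight lands on the linear candidate; on less friendly data the weights hedge across the library, which is the point of including diverse models.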

*Neyman orthogonality*: dependence of influence function on \(\beta\) is *locally* close to 0

- May be achievable by defining a new influence function \(\phi\)
- Projects derivative wrt \(\theta\) on orthogonal complement of span of derivative wrt \(\beta\)
- Projection may require estimation of additional nuisance parameter, \(\mu\)

- Letting \(\eta=(\mu,\beta)\), a score function \(\phi(x_i,\theta,\eta)\) is Neyman orthogonal if \(\partial_\eta E[\phi(x_i,\theta_0,\eta_0)][\eta-\eta_0]=0\)
- where \(\partial_\eta\) is the Gateaux (directional) derivative \(\partial_\eta f(\eta)|_{\eta_0}[\eta-\eta_0]=\frac{\partial}{\partial t}f(\eta_0+t(\eta-\eta_0))\big|_{t=0}\)
- Implies that to first order, estimation error in \(\eta\) has no effect on moment
- Condition “debiases” plug-in by reducing dependence on unknown functions
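The definition can be checked by finite differences in a simulated example (hypothetical DGP; true nuisances are used so only the derivative is at issue): perturbing the outcome regression \(\mu(1,z)\) moves the naive plug-in moment at first order but leaves the AIPW moment for \(E[Y^1]\) essentially unchanged.

```python
import numpy as np

# Finite-difference check of Neyman orthogonality (hypothetical DGP).
# Perturb the outcome regression mu(1,z) in direction delta(z) and compare
# derivatives of the plug-in moment vs the AIPW moment at the true nuisances.
rng = np.random.default_rng(4)
n = 400_000
z = rng.uniform(0, 1, n)
pi = 0.3 + 0.4 * z                      # true propensity P(X=1|z)
x = rng.binomial(1, pi)
y = 1 + 2 * z + x + rng.standard_normal(n)
mu1 = 2 + 2 * z                         # true mu(1,z) = E[y | x=1, z]
delta = np.cos(2 * z)                   # arbitrary perturbation direction

def plugin_moment(t):                   # E_n[mu + t*delta]
    return (mu1 + t * delta).mean()

def aipw_moment(t):                     # E_n[m + x*(y - m)/pi], m = mu + t*delta
    m = mu1 + t * delta
    return (m + x * (y - m) / pi).mean()

t = 0.1
d_plugin = (plugin_moment(t) - plugin_moment(-t)) / (2 * t)
d_aipw = (aipw_moment(t) - aipw_moment(-t)) / (2 * t)
print(f"plug-in derivative {d_plugin:.4f}, AIPW derivative {d_aipw:.4f}")
```

The correction term \(x(y-m)/\pi\) cancels the first-order effect of the perturbation in expectation, which is exactly the "debiasing" described above.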

- Finding a Neyman orthogonal score can be challenging, and we usually won’t be deriving it ourselves, so we’ll come back to how to find one after first describing what we can do with it
- See Chernozhukov et al. (2018), Hines et al. (2021), Chernozhukov et al. (2017) for elaboration and examples

- Focus on case where functional has “plug in” form and for which estimators of \(\eta\) exist.
- Avoids some ugly algebra and some even uglier technicalities
- Upshot is, either estimate nuisances at each fixed \(\theta\) value (Chernozhukov et al. (2018), Belloni et al. (2017)), or further modify influence function to avoid this (Kallus, Mao, and Uehara (2020)).

- Let \(\phi(x_i,\theta,\eta)=q(x_i,\eta)- \theta\) be an influence function for our target functional \(\theta\) with nuisance parameters \(\eta\) including \(\beta\).
- Split sample into \(I_1\) and \(I_2\). On \(I_1\), estimate \(\hat{\eta}^{(1)}\) using some method (maybe machine learning based). On \(I_2\), estimate \(\hat{\theta}^{(2)}=E_n q(x_i,\widehat{\eta}^{(1)})\)
- Switch \(I_1\) and \(I_2\) and repeat previous step, and average \(\hat{\theta}^{DML}=\frac{1}{2}\hat{\theta}^{(1)}+\frac{1}{2}\hat{\theta}^{(2)}\)
- Alternately, pool both and set \(\hat{\theta}\) to solve sample moment condition
- Variants: split \(>2\) times, take median rather than average, etc.
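A numpy-only sketch of this recipe (hypothetical DGP with a binary confounder, so the nuisance-estimation step reduces to cell means computed on the other fold):

```python
import numpy as np

# Two-fold cross-fit AIPW sketch: binary confounder z, so the nuisances
# mu(x, z) and pi(z) are just cell means estimated on the other fold.
rng = np.random.default_rng(5)
n = 20_000
z = rng.binomial(1, 0.5, n)
x = rng.binomial(1, 0.3 + 0.4 * z)            # propensity depends on z
y = z + x + rng.standard_normal(n)            # true ATE = 1

def nuisances(z_tr, x_tr, y_tr):
    mu = {(xv, zv): y_tr[(x_tr == xv) & (z_tr == zv)].mean()
          for xv in (0, 1) for zv in (0, 1)}
    pi = {zv: x_tr[z_tr == zv].mean() for zv in (0, 1)}
    return mu, pi

def aipw_ate(z_ev, x_ev, y_ev, mu, pi):
    m1 = np.array([mu[(1, zi)] for zi in z_ev])
    m0 = np.array([mu[(0, zi)] for zi in z_ev])
    p = np.array([pi[zi] for zi in z_ev])
    phi = m1 - m0 + x_ev * (y_ev - m1) / p - (1 - x_ev) * (y_ev - m0) / (1 - p)
    return phi.mean()

half = n // 2
idx1, idx2 = np.arange(half), np.arange(half, n)
ests = []
for tr, ev in [(idx1, idx2), (idx2, idx1)]:   # fit on tr, average score on ev
    mu, pi = nuisances(z[tr], x[tr], y[tr])
    ests.append(aipw_ate(z[ev], x[ev], y[ev], mu, pi))
theta_dml = float(np.mean(ests))
print(f"cross-fit AIPW ATE estimate: {theta_dml:.3f} (truth 1)")
```

In practice the cell means would be replaced by a machine learning regression and classifier fit on the training fold; the split/swap/average structure is unchanged.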

**Theorem** (Chernozhukov et al. (2018), mostly): If we have that

1. \(Eq(x_i,\eta_0)=\theta_0\) (moment condition)
2. \(\partial_{\eta}Eq(x_i,\eta_0)[\eta-\eta_0]=0\) (our influence function is “Neyman orthogonal”)
3. \(\int_0^1\partial^2_{r}E[\phi(x_i,\theta,\eta_0+r(\widehat{\eta}-\eta_0))]dr\leq C\,E[\Vert\widehat{\eta}-\eta_0\Vert^2]\) (second order remainder)
4. \(E[\Vert\widehat{\eta}-\eta_0\Vert^2]=o_p(n^{-1/2})\) (MSE rate condition)
5. \(\phi(x_i,\theta,\eta^{\prime})-\phi(x_i,\theta,\eta)\leq L(x_i)\Vert\eta^{\prime}-\eta\Vert\) (moment is \(L\)-Lipschitz w.r.t. \(\eta\))

- Then \(\sqrt{n}(\widehat{\theta}-\theta_0)\overset{d}{\to}N(0,Var(\phi(x_i,\theta_0,\eta_0)))\)

- Sample splitting reduces overfitting by ensuring that the noise in \(\hat{\eta}\) is independent of the \(\hat{\theta}\) estimation step
- The moment is evaluated at a fixed value of \(\hat{\eta}\), rather than fitting both jointly on the same data

- Second order remainder satisfied exactly if moment is quadratic form, as in some common examples, usually reasonable otherwise
- Lipschitz condition ensures that convergence of nuisance ensures similarly rapid convergence of moment
- Neyman orthogonality requires choice of moment to identify correct \(\theta_0\) but avoid influence of \(\eta\)
- Rate condition: satisfied if nuisance functions \(\widehat{\eta}\) converge fast enough
- Needs the true function to be simple enough to admit a fast estimator
- Also needs an estimator achieving those rates: usual nonparametric methods still too slow

- Depending on form of remainder, rate may involve combination of rates for different subcomponents of \(\eta\)
- E.g. AIPW “double robustness”: the remainder is bounded by the product of the \(\mu\) and \(\pi\) convergence rates

- Moment form can be relaxed to \(E[\psi_a(x,y,z,\eta)\theta_0+\psi_b(x,y,z,\eta)]=0\)
- \(\theta_0=-\frac{E[\psi_b(x,y,z,\eta)]}{E[\psi_a(x,y,z,\eta)]}\)
- Still allows estimating \(\eta\) without first knowing \(\theta\), so can treat ML part as black box

- \(\sqrt{n}(\widehat{\theta}-\theta_0)=\sqrt{n}E_n\phi(x_i,\theta_0,\widehat{\eta})\)
- \(=\sqrt{n}E_n\phi(x_i,\theta_0,\eta_0)+\)
- \(\left(\sqrt{n}E_n[\phi(x_i,\theta_0,\widehat{\eta})-\phi(x_i,\theta_0,\eta_0)]-\sqrt{n}E[\phi(x_i,\theta_0,\widehat{\eta})-\phi(x_i,\theta_0,\eta_0)]\right)+\)
- \(\sqrt{n}E\phi(x_i,\theta_0,\widehat{\eta})\)

- \(=(a)+(b)+(c)\)
- Call these terms the *influence function* term, the *empirical process* term, and the *remainder* term
- \((a)\overset{d}{\to}N(0,Var(\phi(x_i,\theta_0,\eta_0)))\) by the CLT and condition (1)
- This will be dominant term: limit as if we knew \(\eta_0\)

- \((c)= \sqrt{n}E\phi(x_i,\theta_0,\eta_0)+\sqrt{n}\partial_{\eta}E\phi(x_i,\theta_0,\eta_0)[\widehat{\eta}-\eta_0]+\sqrt{n}\frac{1}{2}\int_0^1\partial^2_{r}E[\phi(x_i,\theta_0,\eta_0+r(\widehat{\eta}-\eta_0))]dr\)
- \(=0+0+\sqrt{n}\frac{1}{2}\int_0^1\partial^2_{r}E[\phi(x_i,\theta_0,\eta_0+r(\widehat{\eta}-\eta_0))]dr\)
- First term by moment definition (1), second by Neyman orthogonality (2)
- \(\leq \sqrt{n}*o_p(n^{-1/2})=o_p(1)\)
- By remainder condition (3) and rate condition (4)

- Term (b) \(=\sqrt{n}(E_n-E)[\phi(x_i,\theta_0,\widehat{\eta})-\phi(x_i,\theta_0,\eta_0)]\) is where sample splitting is used
- Without sample splitting, converges to 0 iff \(\mathcal{F}:=\{\phi(x_i,\theta_0,\eta)-\phi(x_i,\theta_0,\eta_0):\ \eta\in B_{\epsilon}(\eta_0)\}\) is *Donsker*
- See Van Der Vaart and Wellner (1996) for the gory details
- Rough summary: holds if function class is “not too complex.” Category includes most models with finite number of parameters, nonparametric models of sufficient smoothness (>d/2 derivatives), well-behaved compositions.
- May not hold if function class is sample size dependent: doesn’t play well with adaptive estimators.

- With sample splitting, just need rate results
- Prove for \(\widehat{\theta}^{(2)}\), \(n=n_2\), rest follows by symmetry

**Lemma** (Kennedy, Balakrishnan, and G’Sell (2020), Lemma 2): writing \(\Delta_i:=\phi(x_i,\theta_0,\widehat{\eta})-\phi(x_i,\theta_0,\eta_0)\), we have \((b)=\sqrt{n}(E_n-E)[\Delta_i]=O_P(E[\Delta_i^2]^{1/2})\)

**Proof**:

- \(E[E_n[\Delta_i]|\mathcal{I}_1]=E[\Delta_i|\mathcal{I}_1]\) since \(\widehat{\eta}\) is fixed given \(\mathcal{I}_1\) and the folds are independent, so \(E[(b)|\mathcal{I}_1]=0\)
- \(Var[(E_n-E)[\Delta_i]|\mathcal{I}_1]=Var[E_n[\Delta_i]|\mathcal{I}_1]=\frac{1}{n}Var[\Delta_i|\mathcal{I}_1]\leq\frac{1}{n}E[\Delta_i^2|\mathcal{I}_1]\)
- By Chebyshev, \(\forall t>0\), \(P\left(\frac{|(E_n-E)[\Delta_i]|}{\frac{1}{\sqrt{n}}E[\Delta_i^2|\mathcal{I}_1]^{1/2}}\geq t\ \middle|\ \mathcal{I}_1\right)\leq\frac{1}{t^2}\); taking expectations over \(\mathcal{I}_1\) gives the same unconditional bound
- For any \(\epsilon>0\), set \(t=\frac{1}{\sqrt{\epsilon}}\) to bound the probability by \(\epsilon\)

**Corollary**: By the Lipschitz condition and the rate result, \((b)=o_p(1)\)

- How do we get moments that satisfy above conditions?
- Typically occurs when \(\phi(x_i,\theta,\eta)\) is the *efficient influence function* of the functional \(\theta\)
- Recall that for parametric models with well-behaved likelihoods, the smallest possible asymptotic variance is the *Cramér-Rao* lower bound
- Roughly, if \(\sqrt{n}(\hat{\psi}-\psi)\overset{d}{\to}N(0,V)\), then \(V\geq E[g_\psi g_\psi^{\prime}]=\frac{d\psi}{d\theta^\prime} E[\frac{d}{d\theta}\log\ell(x_i,\theta_0)\frac{d}{d\theta^\prime}\log\ell(x_i,\theta_0)]^{-1}\frac{d\psi}{d\theta}\)
- Under some regularity conditions on \(\hat{\psi}\)

- For semiparametric models, with nuisance parameter \(\beta\in\mathcal{B}\), if \(\{\beta(t):\ t\in T\}\subset\mathcal{B}\) is a regular parametric submodel, Cramér-Rao lower bound should still hold
- Thus, the supremum over all parametric submodels of Cramér-Rao bounds should form a lower bound on the worst case variance of the semiparametric model
- A function \(\phi(x_i,\eta)\) is an *efficient influence function* for \(\psi\) if \(E[\phi(x_i,\eta_0)^2]\) equals this worst case lower bound
- In general, properties of such functions are beyond the scope of this class: see Van der Vaart (2000)
- In parametric case, it is just influence function of MLE
- Nonparametric case: \(\mathcal{P}\) is the space of all possible distributions \(P\) of data and \(\psi=\Psi(P_0)\)
- When it exists, the influence function is \(\frac{d\Psi((1-r)P_0+r\delta_x)}{dr}\Big|_{r=0}\), the Gateaux derivative of the functional with respect to a point mass at a data point \(x\)
- Ichimura and Newey (2021), Hines et al. (2021) discuss how to perform these calculations and use to derive orthogonal moment conditions: generally nontrivial calculation

- Case with both infinite and finite components generally requires even more delicate methods (eg Chen and Santos (2018))
- Today: just go over a few that others have derived, and point you to the literature

- AIPW formula for \(E[Y^x]\) by adjustment: \(\eta=(\mu,\pi)\), \(\phi()=\mu(x,Z_i)+\frac{(Y_i-\mu(x,Z_i))1\{X_i=x\}}{\pi(x|Z_i)}\)
- Shown to be Neyman orthogonal in adjustment lecture

- Functionals of regression \(\beta=E[Y|X,Z]\), \(\theta=E[m(Y,X,Z,\beta)]\) for some function \(m\) linear in \(\beta\)
- \(\eta=(\beta,\mu)\) for \(\mu\) s.t. \(E[m(Y,X,Z,s(X,Z))-\mu_0(X,Z)s(X,Z)]=0\) for any \(s(X,Z)\)
- Orthogonal moment is \(E[m(Y,X,Z,\beta)+\mu(X,Z)(Y-\beta(X,Z))]-\theta=0\)
- Chernozhukov, Newey, and Singh (2021) suggest (regularized) GMM over set of test functions \(\{s_j()\}_{j=1}^J\) to estimate \(\mu\), ML regression to estimate \(\beta\)
- Special cases of \(m\): \(m=\beta(x,Z)\) returns the ATE; \(m=w(X)\frac{d}{dX}\beta(X,Z)\) gives a weighted average derivative
- In ATE case, above is just AIPW with sieve GMM for the inverse propensity

- Farrell, Liang, and Misra (2021) give more general formula for initial nuisance function defined by M-estimator
- Let \(\beta_0(z):=\underset{\beta\in\mathcal{B}}{\arg\min}E[\ell(Y,X,\beta(Z))]\) for a loss function \(\ell\),
- Let \(\psi_0=E[H(Z,\beta_0(Z),x^*)]\) be target functional for known \(H\)
- \(\mu(z)=E[\ell_{\beta\beta}(y,x,\beta(z))|Z=z]\) is an additional function to estimate
- Then \(H(z,\beta_0(z),x^*)-H_\beta(z,\beta_0(z),x^*)\mu(z)^{-1}\ell_\beta(y,x,\beta(z))-\psi_0\) is a Neyman-orthogonal score
- Show cases for logit, Tobit, etc and suggest neural network first stage

- “Sufficient statistics approach” (Chetty (2009)) uses the envelope theorem to note that policy effects near an optimum have derivative 0 w.r.t. structural parameters
- Optimal policy estimates are effectively Neyman orthogonal semiparametric estimates
- Debiasing approach may obtain similar robustness to nuisance features in more general economic settings

- \(y = \theta x + \beta(z) + \epsilon\)
- \(E[\epsilon|x,z]=0\), \(\theta\in\mathbb{R}\), \(\beta(z)\) unrestricted function
- \(E[y|z] = \theta E[x|z] + \beta(z)\)
- \(y-E[y|z] = \theta(x - E[x|z]) +\epsilon\)

- \(\eta = (\ell(z),m(z)):=(E[y|z],E[x|z])\) are *nuisance functions*: can learn without knowing \(\theta\)
- Score function is \(\psi(y,x,z,\eta,\theta)=(y-\ell(z)-\theta(x-m(z)))(x-m(z))\)
- Perturbing \(\ell\) in a direction \(\delta_\ell(z)\) gives \(\partial_\ell E[\psi]=-E[\delta_\ell(z)(x-m_0(z))]=0\); perturbing \(m\) in a direction \(\delta_m(z)\) gives \(\partial_m E[\psi]=\theta E[\delta_m(z)(x-m_0(z))]-E[\delta_m(z)(y-\ell_0(z)-\theta(x-m_0(z)))]=0\), both by iterated expectations, so the score is Neyman orthogonal
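This orthogonality can be confirmed numerically by finite differences (hypothetical DGP; the perturbation directions are arbitrary functions of \(z\)):

```python
import numpy as np

# Finite-difference check (hypothetical DGP) that the partialling-out score
# psi = (y - l(z) - theta*(x - m(z))) * (x - m(z)) has zero Gateaux
# derivative at the true nuisances l0(z) = E[y|z], m0(z) = E[x|z].
rng = np.random.default_rng(6)
n = 400_000
theta0 = 2.0
z = rng.standard_normal(n)
x = 0.5 * z + rng.standard_normal(n)          # m0(z) = 0.5 z
beta = np.sin(z)                              # nonparametric part beta(z)
y = theta0 * x + beta + rng.standard_normal(n)
l0 = theta0 * 0.5 * z + beta                  # l0(z) = E[y|z]
m0 = 0.5 * z
dl, dm = np.cos(z), z ** 2                    # arbitrary perturbation directions

def score_mean(t):
    l, m = l0 + t * dl, m0 + t * dm
    return ((y - l - theta0 * (x - m)) * (x - m)).mean()

t = 0.1
deriv = (score_mean(t) - score_mean(-t)) / (2 * t)
print(f"directional derivative at truth: {deriv:.4f}")
```

The derivative is zero up to sampling noise, so first-order errors in \(\widehat{\ell}\) and \(\widehat{m}\) do not propagate into the moment.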

- \(\theta= \frac{E[(y-E[y|z])(x-E[x|z])]}{E[(x-E[x|z])^2]}\)
- Interpretation: regress \(y\) and \(x\) on \(z\), then regress the \(y\) residuals on the \(x\) residuals
- If \((\ell(z),m(z))\) are linear functions of \(z\), this is exactly the Frisch-Waugh-Lovell decomposition
- \(\widehat{\theta}^{OLS}=((M_zX)^{\prime}(M_zX))^{-1}(M_zX)^{\prime}M_zY\)
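The Frisch-Waugh-Lovell equivalence is easy to verify numerically (simulated data; names hypothetical):

```python
import numpy as np

# FWL check: the OLS coefficient on x from the full regression equals the
# slope from regressing y-residuals on x-residuals after partialling out z.
rng = np.random.default_rng(7)
n = 500
z = rng.standard_normal((n, 3))
x = z @ np.array([1.0, -0.5, 0.3]) + rng.standard_normal(n)
y = 2.0 * x + z @ np.array([0.7, 0.2, -1.0]) + rng.standard_normal(n)

# Full OLS: y on [x, z, intercept]
X_full = np.column_stack([x, z, np.ones(n)])
coef_full = np.linalg.lstsq(X_full, y, rcond=None)[0][0]

# Partialling out: residualize y and x on [z, intercept], regress residuals
Zc = np.column_stack([z, np.ones(n)])
ry = y - Zc @ np.linalg.lstsq(Zc, y, rcond=None)[0]
rx = x - Zc @ np.linalg.lstsq(Zc, x, rcond=None)[0]
coef_fwl = (rx @ ry) / (rx @ rx)
print(f"full OLS {coef_full:.6f} vs FWL {coef_fwl:.6f}")
```

The two coefficients agree to machine precision; double ML replaces the linear projections on \(z\) with cross-fit machine learning predictions.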

- Visualization: plot \(y_i-\widehat{\ell}(z_i)\) against \(x_i-\widehat{m}(z_i)\)
- Allows evaluation of linearity of \(x\) effect, visually displays uncertainty

- Implementation in `DoubleML` in R/Python
- Nuisances by any machine learning algorithm in `mlr3` / `scikit-learn`
- Lasso version in `hdm`; kernel version (no crossfitting) in `npplreg` in `np`

- Simulated data
- \(d_i = m_0(x_i) + s_1 v_i\), \(y_i = \alpha d_i + g_0(x_i) + s_2 \zeta_i\)
- \(m_0(x_i) = a_0 x_{i,1} + a_1 \frac{\exp(x_{i,3})}{1+\exp(x_{i,3})}\), \(g_0(x_i) = b_0 \frac{\exp(x_{i,1})}{1+\exp(x_{i,1})} + b_1 x_{i,3}\)
- \(a_0=1, a_1=0.25, s_1=1, b_0=1, b_1=0.25, s_2=1\)

```
library(DoubleML) #Cross-fitting
library(mlr3) #ML algorithms and fitting framework
library(mlr3learners) #Extra ML algorithms
library(data.table) #data format this needs for some reason
library(ggplot2) #Plotting software
lgr::get_logger("mlr3")$set_threshold("warn")
set.seed(1111)
```

```
#Choose Random Forest for nuisance function estimates
learner = lrn("regr.ranger", num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5)
ml_g = learner$clone() #Use random forest for prediction of outcome
ml_m = learner$clone() #use random forest for prediction of treatment
#Simulate data from partially linear model with effect 0.5, 20 nuisance predictors
data = make_plr_CCDDHNR2018(alpha=0.5, n_obs=500, dim_x=20, return_type='data.table')
obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")
#Partially linear regression with cross-fitting
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_g, ml_m)
dml_plr_obj$fit(store_predictions=TRUE)
print(dml_plr_obj)
```


```
## ================= DoubleMLPLR Object ==================
##
##
## ------------------ Data summary ------------------
## Outcome variable: y
## Treatment variable(s): d
## Covariates: X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X18, X19, X20
## Instrument(s):
## No. Observations: 500
##
## ------------------ Score & algorithm ------------------
## Score function: partialling out
## DML algorithm: dml2
##
## ------------------ Machine learner ------------------
## ml_g: regr.ranger
## ml_m: regr.ranger
##
## ------------------ Resampling ------------------
## No. folds: 5
## No. repeated sample splits: 1
## Apply cross-fitting: TRUE
##
## ------------------ Fit summary ------------------
## Estimates and significance testing of the effect of target variables
## Estimate. Std. Error t value Pr(>|t|)
## d 0.48562 0.04127 11.77 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
Vhat<-data$d-dml_plr_obj$predictions$ml_m #Treatment residual: d - mhat(z)
Zetahat<-data$y-dml_plr_obj$predictions$ml_g #Outcome residual: y - lhat(z)
residuals<-data.frame(Vhat,Zetahat)
ggplot(data=residuals,aes(x=Vhat,y=Zetahat))+
geom_point(alpha=0.5)+
geom_smooth(method="lm",formula="y~x")+
ggtitle("Y Residuals vs X residuals",subtitle = "Line fit by OLS on full sample")
```

- Case where main effect nonparametric but still additive is a *Generalized Additive Model*: \(y=f(x)+\sum_j g_j(z_j)+u\)
- `mgcv`, `gam` use splines for each nonparametric component: see https://noamross.github.io/gams-in-r-course/
- Gain flexibility, visualization of components, but still restricts functional form
- Lose \(\sqrt{n}\)-asymptotic normality

- Full generality requires interactions: \(y=g(x,z)+\epsilon\)
- For discrete \(x\), use AIPW (`DoubleMLIRM` (for “Interactive Regression Model”) with `score="ATE"` in `DoubleML`)
- For interactions and continuous \(x\), plugin possible but rates suffer curse of dimensionality
- Even worse, standard nonparametric and ML methods don’t let you single out \(x\)
- Result: your smoother or regularizer might create large bias in the estimated effect of \(x\)

- One possibility: pick out low dimensional summary of \(x\) effect
- e.g. Weighted average: \(\int\int w(x)f(x,z)dxdz\), weighted average derivative: \(\int\int w(x)\partial_xf(x,z)dxdz\)
- Semiparametric estimator with robust moments can remain consistent

- Projection onto a parametric model: take the estimand to be the population minimizer of the criterion function that \(\hat{\theta}\) minimizes in sample
- Derive efficient influence function for minimizer under nonparametric model (eg via Gateaux derivative method)
- Allows arbitrary misspecification, interpretation as best (partially) linear predictor

- In between: find semiparametric estimate of average kernel-weighted function \(\int\int K(\frac{x-u}{h})f(x,z)dxdz\)
- Then take bandwidth to 0 to get semiparametric kernel estimator
- Approach taken in Colangelo and Lee (2020) and Kennedy et al. (2017)
- Lose \(\sqrt{n}-\) asymptotic normality, but consistent and faster than plugin

```
library(dplyr)
xvars<-dplyr::select(data,starts_with("X"))
```

- Applying the Kennedy et al. (2017) kernel-based estimator to data from the above experiment, we find a noisier but still reasonable estimate of the average potential outcome

```
library(npcausal)
kernreg<-ctseff(data$y,data$d,xvars,bw.seq = seq(.5, 2, length.out = 10),
sl.lib=c("SL.mean","SL.ranger"))
plot.ctseff(kernreg)+title("Kernel Estimate of Average Potential Outcome ")
```


- Papers using debiased influence function + sample splitting approach now come out at too fast a pace to follow
- Take your favorite model with unknown function as a component
- Replace that by nonparametric function
- Derive efficient influence function for low-dimensional target

- Still no fully automated way to do this, which is why every application gets a new paper
- Sieve GMM approach of Ai and Chen (2003) (and subsequent works by Chen et al) probably most general framework, if you can express your model as conditional GMM
- Original version requires joint GMM estimation of nuisance functions and parameters, so imposes specific (basis-function-based) nonparametric estimators
- Follow-ups allow multi-step estimates, with other estimators for nuisance functions
- Verifying regularity conditions and deriving variance formulas may be nearly as challenging as deriving influence functions from scratch…

- Example double ML applications
- IV, DiD, missing data, mediation, selection, (dynamic) discrete choice, quantile regression, density estimation, survival models, (policy and value functions in) reinforcement learning, online estimation, (almost?) any effect in NPSEM-IE models, etc
- If model widely used, somebody has done it or is working on it

- Semiparametric methods allow precise estimation and inference for low-dimensional objects with minimal assumptions on shape of high dimensional objects
- In causal inference, usually summary of treatment, with nuisance given by shape of other relationships
- Parametric estimators are generally inconsistent if incorrectly specified
- Plug-in nonparametric estimators pass on slow rates and bias to low-dimensional summaries
- Use of orthogonal moments reduces sensitivity to estimation error
- Usually obtain \(\sqrt{n}-\) asymptotic normality with \(o_p(n^{-1/4})\) rates for nuisances

- Wide variety of machine learning estimators can estimate conditional means, etc at these rates

- Sample splitting can reduce overfitting and dependence on exact structure of nuisance estimator beyond rate
- Obtain orthogonal moments by calculating influence functions

Ai, Chunrong, and Xiaohong Chen. 2003. “Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions.” *Econometrica* 71 (6): 1795–1843.

Belloni, Alexandre, Victor Chernozhukov, Iván Fernández-Val, and Christian Hansen. 2017. “Program Evaluation and Causal Inference with High-Dimensional Data.” *Econometrica* 85 (1): 233–98.

Chen, Xiaohong, and Andres Santos. 2018. “Overidentification in Regular Models.” *Econometrica* 86 (5): 1771–1817.

Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, and Whitney Newey. 2017. “Double/Debiased/Neyman Machine Learning of Treatment Effects.” *American Economic Review* 107 (5): 261–65. https://doi.org/10.1257/aer.p20171038.

Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2018. “Double/debiased machine learning for treatment and structural parameters.” *The Econometrics Journal* 21 (1): C1–68. https://doi.org/10.1111/ectj.12097.

Chernozhukov, Victor, Whitney K Newey, and Rahul Singh. 2021. “Automatic Debiased Machine Learning of Causal and Structural Effects.” http://arxiv.org/abs/1809.05224.

Chetty, Raj. 2009. “Sufficient Statistics for Welfare Analysis: A Bridge Between Structural and Reduced-Form Methods.” *Annu. Rev. Econ.* 1 (1): 451–88.

Colangelo, Kyle, and Ying-Ying Lee. 2020. “Double Debiased Machine Learning Nonparametric Inference with Continuous Treatments.” *arXiv Preprint arXiv:2004.03036*.

Farrell, Max H, Tengyuan Liang, and Sanjog Misra. 2021. “Deep Learning for Individual Heterogeneity.” *arXiv Preprint arXiv:2010.14694*.

Hines, Oliver, Oliver Dukes, Karla Diaz-Ordaz, and Stijn Vansteelandt. 2021. “Demystifying Statistical Learning Based on Efficient Influence Functions.” *arXiv Preprint arXiv:2107.00681*.

Ichimura, Hidehiko, and Whitney Newey. 2021. “The Influence Function of Semiparametric Estimators.” *Quantitative Economics*.

Kallus, Nathan, Xiaojie Mao, and Masatoshi Uehara. 2020. “Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond.” http://arxiv.org/abs/1912.12945.

Kennedy, Edward H, Sivaraman Balakrishnan, and Max G’Sell. 2020. “Sharp Instruments for Classifying Compliers and Generalizing Causal Effects.” *The Annals of Statistics* 48 (4): 2008–30.

Kennedy, Edward H, Zongming Ma, Matthew D McHugh, and Dylan S Small. 2017. “Non-Parametric Methods for Doubly Robust Estimation of Continuous Treatment Effects.” *Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 79 (4): 1229–45.

Van der Vaart, Aad W. 2000. *Asymptotic Statistics*. Vol. 3. Cambridge university press.

Van Der Vaart, Aad W, and Jon Wellner. 1996. *Weak Convergence and Empirical Processes: With Applications to Statistics*. Springer Science & Business Media.