\[0=\frac{1}{n}\sum_{i=1}^{n}\frac{d}{d\theta}\log\ell(x_i,\widehat{\theta})\approx \frac{1}{n}\sum_{i=1}^{n}\frac{d}{d\theta}\log\ell(x_i,\theta_0)+\frac{1}{n}\sum_{i=1}^{n}\frac{d^2}{d\theta^2}\log\ell(x_i,\bar{\theta})(\widehat{\theta}-\theta_0)\]
- Rearrange, then apply the LLN and Slutsky's theorem (the average Hessian at \(\bar{\theta}\) converges to its expectation at \(\theta_0\))
\[\sqrt{n}(\widehat{\theta}^{MLE}-\theta_0)= -\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(\mathbb{E}\left[\frac{d^2}{d\theta^2}\log\ell(x_i,\theta_0)\right]\right)^{-1}\frac{d}{d\theta}\log\ell(x_i,\theta_0)+o_p(1)\]
- Supposing \(\Psi\) is differentiable in \(\theta\), apply a first-order Taylor expansion again (the “Delta Method”)
\[\sqrt{n}(\widehat{\psi}^{MLE}-\psi_0)= -\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{d\psi}{d\theta}\left(\mathbb{E}\left[\frac{d^2}{d\theta^2}\log\ell(x_i,\theta_0)\right]\right)^{-1}\frac{d}{d\theta}\log\ell(x_i,\theta_0)+o_p(1)\]
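- Under standard regularity conditions (a step not spelled out above), the summands are i.i.d. and mean zero, so the CLT plus the information equality \(\mathrm{Var}\left(\frac{d}{d\theta}\log\ell(x_i,\theta_0)\right)=I(\theta_0)\) give the usual limit
\[\sqrt{n}(\widehat{\psi}^{MLE}-\psi_0)\overset{d}{\rightarrow}N\left(0,\;\frac{d\psi}{d\theta}\,I(\theta_0)^{-1}\,\frac{d\psi}{d\theta}'\right),\qquad I(\theta_0)=-\mathbb{E}\left[\frac{d^2}{d\theta^2}\log\ell(x_i,\theta_0)\right]\]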
- Problem: we don’t have a correctly specified parametric model of the full data distribution
- Partition \(\Theta=(\theta,\beta)\), where \(\beta\) is a nuisance parameter that may be complicated
- \(\beta\) may be, e.g., a conditional expectation function, a density, etc., that we don’t want to restrict
- Can we avoid depending on \(\beta\) if we just care about \(\theta\)?
- If \(\beta\) indexes a (conditionally) independent part of the model, often yes
- In the more general case, we can’t avoid some dependence on \(\beta\) (see the partially linear example below)
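A canonical case, and the design simulated in the DoubleML code below, is the partially linear model, in which the nuisance parameters are unrestricted regression functions:
\[Y=\theta_0 D+g_0(X)+\zeta,\quad \mathbb{E}[\zeta\mid D,X]=0,\qquad D=m_0(X)+V,\quad \mathbb{E}[V\mid X]=0\]
Here \(\theta_0\) is the low-dimensional target and \(\beta=(g_0,m_0)\) collects the infinite-dimensional nuisance functions.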
- DoubleML in R/Python (machine learners supplied via mlr3 / scikit-learn)
- hdm
- Kernel version (no cross-fitting) in npplreg in the np package
library(DoubleML) #Cross-fitting
library(mlr3) #ML algorithms and fitting framework
library(mlr3learners) #Extra ML algorithms
library(data.table) #data.table format used by DoubleMLData below
library(ggplot2) #Plotting software
lgr::get_logger("mlr3")$set_threshold("warn") #Suppress mlr3 logging output
set.seed(1111)
#Choose Random Forest for nuisance function estimates
learner = lrn("regr.ranger", num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5)
ml_g = learner$clone() #Use random forest for prediction of outcome
ml_m = learner$clone() #use random forest for prediction of treatment
#Simulate data from partially linear model with effect 0.5, 20 nuisance predictors
data = make_plr_CCDDHNR2018(alpha=0.5, n_obs=500, dim_x=20, return_type='data.table')
obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")
#Partially linear regression with cross-fitting
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_g, ml_m)
dml_plr_obj$fit(store_predictions=TRUE)
print(dml_plr_obj)
## ================= DoubleMLPLR Object ==================
##
##
## ------------------ Data summary ------------------
## Outcome variable: y
## Treatment variable(s): d
## Covariates: X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X18, X19, X20
## Instrument(s):
## No. Observations: 500
##
## ------------------ Score & algorithm ------------------
## Score function: partialling out
## DML algorithm: dml2
##
## ------------------ Machine learner ------------------
## ml_g: regr.ranger
## ml_m: regr.ranger
##
## ------------------ Resampling ------------------
## No. folds: 5
## No. repeated sample splits: 1
## Apply cross-fitting: TRUE
##
## ------------------ Fit summary ------------------
## Estimates and significance testing of the effect of target variables
## Estimate. Std. Error t value Pr(>|t|)
## d 0.48562 0.04127 11.77 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
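The fit summary can also be queried programmatically. A minimal sketch (accessor names assumed from the DoubleML R interface; check against the installed version):
dml_plr_obj$coef #Point estimate of the effect of d
dml_plr_obj$se #Standard error
dml_plr_obj$confint() #95% confidence interval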
Vhat<-data$d-dml_plr_obj$predictions$ml_m #Treatment residual: d minus cross-fit prediction of d
Zetahat<-data$y-dml_plr_obj$predictions$ml_g #Outcome residual: y minus cross-fit prediction of y
residuals<-data.frame(Vhat,Zetahat)
ggplot(data=residuals,aes(x=Vhat,y=Zetahat))+
  geom_point(alpha=0.5)+
  geom_smooth(method="lm",formula=y~x)+
  ggtitle("Y residuals vs D residuals",subtitle = "Line fit by OLS on full sample")
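As a rough consistency check (a sketch, not part of the original code: the DML coefficient is built from the same cross-fit residuals, so this full-sample OLS slope should be close to, though not exactly equal to, the estimate reported above):
#Slope of outcome residuals on treatment residuals
coef(lm(Zetahat ~ Vhat, data = residuals))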
- mgcv, gam: use splines for each nonparametric component; see https://noamross.github.io/gams-in-r-course/ (sketch below)
- DoubleMLIRM (for “Interactive Regression Model”) with score="ATE" in DoubleML
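A minimal sketch of the spline route with mgcv, reusing the simulated data from above (which covariates receive smooth terms here is an illustrative choice, not part of the original example):
library(mgcv) #GAM fitting with penalized splines
#Partially linear specification: treatment d enters linearly, selected covariates enter as smooths
gamfit <- gam(y ~ d + s(X1) + s(X2) + s(X3), data = data)
summary(gamfit) #Coefficient on d is the effect estimate under this specification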
library(dplyr) #Column selection helpers
xvars<-dplyr::select(data,starts_with("X")) #Keep only the covariate columns
library(npcausal)
#Estimate the average potential outcome curve over treatment levels by kernel smoothing
kernreg<-ctseff(data$y,data$d,xvars,bw.seq = seq(.5, 2, length.out = 10),
                sl.lib=c("SL.mean","SL.ranger"))
plot.ctseff(kernreg)
title("Kernel Estimate of Average Potential Outcome")