Going beyond the data you have

Generalization

Improving generalization

External validity

Transportability

Selection Diagrams

Example: Covariate shift

Selection diagram: source and target differ in distribution of observed confounder Z
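
When only the distribution of Z differs between populations, a standard transport formula reweights stratum-specific source effects by the target distribution of Z: P*(y | do(x)) = sum over z of P(y | do(x), z) P*(z). The sketch below illustrates that reweighting for a discrete Z. The function name and inputs are hypothetical, and it assumes the effect within each stratum of Z is the same in source and target.

```python
import numpy as np

def transport_effect(effect_by_z, target_z_probs):
    """Average stratum-specific source effects over the target distribution of Z.

    effect_by_z[k]    : effect estimated in the source population for Z = k
    target_z_probs[k] : P*(Z = k) in the target population
    Assumes the stratum-specific effects are invariant across populations.
    """
    effect_by_z = np.asarray(effect_by_z, dtype=float)
    target_z_probs = np.asarray(target_z_probs, dtype=float)
    if effect_by_z.shape != target_z_probs.shape:
        raise ValueError("one effect and one probability are needed per stratum")
    if not np.isclose(target_z_probs.sum(), 1.0):
        raise ValueError("target_z_probs must sum to 1")
    return float(effect_by_z @ target_z_probs)

# Two strata with source effects 1.0 and 3.0; the target population
# puts 25% of its mass on the first stratum and 75% on the second.
print(transport_effect([1.0, 3.0], [0.25, 0.75]))
```

With equal source shares of the two strata the source average effect would be 2.0, so the shift in Z alone moves the transported effect.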

Modified Example: Experiment to Observation

Selection diagram: source and target differ in distribution of observed confounder Z

Example: Label Shift

Selection diagram: source and target differ in distribution of outcome Y

Anti-causal Label Shift

Selection diagram: source and target differ in distribution of outcome Y
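
In the anti-causal setting, where the distribution of features given Y is assumed to be the same in source and target, the class weights P*(y) / P(y) can be estimated from a fixed classifier's predictions by solving a linear system built from its source confusion matrix (an approach often called black box shift estimation). The following is a minimal sketch under that assumption. The function name and arguments are illustrative, and it additionally assumes integer labels 0..n_classes-1 and an invertible confusion matrix.

```python
import numpy as np

def estimate_label_weights(y_true_src, y_pred_src, y_pred_tgt, n_classes):
    """Estimate w[j] = P_target(Y = j) / P_source(Y = j).

    Assumes P(X | Y) is unchanged between source and target, so that
    P_target(pred = i) = sum_j P_source(pred = i, Y = j) * w[j].
    """
    y_true_src = np.asarray(y_true_src, dtype=int)
    y_pred_src = np.asarray(y_pred_src, dtype=int)
    y_pred_tgt = np.asarray(y_pred_tgt, dtype=int)

    # Joint confusion matrix on source data: C[i, j] = P(pred = i, Y = j).
    C = np.zeros((n_classes, n_classes))
    np.add.at(C, (y_pred_src, y_true_src), 1.0)
    C /= len(y_true_src)

    # Distribution of predicted labels on the unlabeled target data.
    mu = np.bincount(y_pred_tgt, minlength=n_classes) / len(y_pred_tgt)

    # Solve C w = mu; this fails if C is singular.
    return np.linalg.solve(C, mu)

# Toy case with a perfect classifier: source classes are balanced,
# while the target predictions are 25% class 0 and 75% class 1.
w = estimate_label_weights([0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1], 2)
print(w)
```

The resulting weights can be used to reweight source examples by class when estimating target quantities. With few samples the solved weights can be noisy or negative, so some clipping or regularization is usually needed in practice.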

Application: Meta-analysis

Selection diagram for hierarchical meta-analysis: covariates and outcomes may differ by study

Sample Selection

Modeling Sample Selection

S linked to no other variables

Examples

S linked to X but not Y

S linked to Y but not X

Selection based on covariates

S linked to Z

Estimation of selection adjustment
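
When selection S depends only on an observed covariate Z, one common adjustment is inverse probability weighting: each selected unit is weighted by 1 / P(S = 1 | Z). The sketch below does this for a discrete Z, estimating the selection probability by its frequency within each stratum. The function name is hypothetical, and it assumes S is independent of Y given Z and that every stratum has at least one selected unit.

```python
import numpy as np

def ipw_mean(y, z, selected):
    """Estimate the population mean of Y when Y is observed only if selected.

    y        : outcomes (entries for unselected units are ignored, may be NaN)
    z        : discrete covariate observed for every unit
    selected : boolean indicator S, True where the outcome is observed
    Assumes S is independent of Y given Z and P(S = 1 | Z = z) > 0 for all z.
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z)
    selected = np.asarray(selected, dtype=bool)

    weights = np.zeros(len(z))
    for value in np.unique(z):
        in_stratum = z == value
        p_selected = selected[in_stratum].mean()  # estimate of P(S=1 | Z=value)
        if p_selected == 0:
            raise ValueError(f"no selected units with Z = {value}")
        weights[in_stratum & selected] = 1.0 / p_selected

    # Weighted sum over selected units, normalized by the full sample size.
    return float(np.sum(weights[selected] * y[selected]) / len(z))

# Stratum Z=0 is always selected; stratum Z=1 is selected a quarter of the time.
z = [0, 0, 0, 0, 1, 1, 1, 1]
s = [True, True, True, True, True, False, False, False]
y = [1.0, 1.0, 1.0, 1.0, 3.0, np.nan, np.nan, np.nan]
print(ipw_mean(y, z, s))
```

In this toy sample the unweighted mean of the selected outcomes is 1.4, because the low-outcome stratum is overrepresented among selected units; the weighting restores each stratum to its full-sample share.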

Outcome dependent sampling

Y associated with Z

Alternate estimates in selection model

Final application: more meta-analysis

Conclusions
