# Top Papers 2020

The following is a look back at my reading for 2020, identifying a totally subjective set of the top 10 papers I read this year. My reading patterns, as usual, have not been so systematic, so if your brilliant work is missing, it either slipped past my attention or is living in an ever-expanding set of folders and browser tabs on my to-read list. I’ll exclude papers I refereed, for privacy purposes (a fair amount, if you include conferences, which also cuts a lot of the macroeconomics from my list). Themes I focused on were Bayesian computation, the optimal policy estimation/dynamic treatment regime/offline reinforcement learning space, and survival/point process models, all more-or-less project-related, and in all of which I’m sure I’m missing some foundational understanding. I spent a brief time in March mostly reading about basic epidemiology, which I am led to believe many others did as well, but didn’t take it anywhere.

Papers, in alphabetical order:

• Adusumilli, Geiecke, Schilter. Dynamically optimal treatment allocation using reinforcement learning
• Approximation methods for estimating viscosity solutions of HJB equations, and the resulting optimal policies, from data. These methods will form a key step in taking continuous-time dynamic macro models (see Moll’s lecture notes) to data.
• Andrews & Mikusheva Optimal Decision Rules for Weak GMM
• The Generalized Method of Moments defines a semiparametric estimator implicitly, so it is often unclear what form the ignored nuisance parameter actually takes, especially in cases of irregular identification. This paper takes a middle ground between the fully Bayesian semiparametric approach, which puts a (usually Dirichlet process) prior over the infinite-dimensional nuisance space, and the standard frequentist approach, which ignores it entirely. It shows weak convergence to a Gaussian process, which is tractable enough to characterize and apply to obtain approximate Bayesian tests and decision rules without strong identification.
• Cevid, Michel, Bühlmann, & Meinshausen Distributional Random Forests: Heterogeneity Adjustment and Multivariate Distributional Regression
• Conditional density estimation by random forests with splits chosen by (approximate) kernel MMD distribution tests. Produces a set of conditional weights that can be used to represent and visualize possibly multivariate conditional distributions. An R package is available, and this quickly became one of my go-to data exploration tools.
• See also: Lee and Pospisil have a related method splitting by sieve $L^2$ distance tests, broadly similar though more tailored to low-dimensional outputs.
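The split criterion at the heart of the method can be sketched with a plain (biased, quadratic-time) kernel MMD statistic; the package itself uses fast approximations, and all names below are my own illustration rather than the paper’s implementation.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between rows of a and b.
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

def mmd2(y_left, y_right, bandwidth=1.0):
    # Biased estimate of squared kernel MMD between two samples;
    # the paper uses fast random-feature approximations instead.
    k_ll = gaussian_kernel(y_left, y_left, bandwidth).mean()
    k_rr = gaussian_kernel(y_right, y_right, bandwidth).mean()
    k_lr = gaussian_kernel(y_left, y_right, bandwidth).mean()
    return k_ll + k_rr - 2 * k_lr

# Toy check: a split separating two different conditional distributions
# should score higher than a split mixing them.
rng = np.random.default_rng(0)
y_a = rng.normal(0.0, 1.0, size=(200, 2))
y_b = rng.normal(2.0, 1.0, size=(200, 2))
good_split = mmd2(y_a, y_b)
mixed_l = np.concatenate([y_a[:100], y_b[:100]])
mixed_r = np.concatenate([y_a[100:], y_b[100:]])
bad_split = mmd2(mixed_l, mixed_r)
print(good_split > bad_split)  # the informative split scores higher
```

A forest then greedily picks, at each node, the covariate split maximizing this statistic on the node’s outcome samples.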
• Gelman, Vehtari, Simpson, Margossian, Carpenter, Yao, Kennedy, Gabry, Bürkner, Modrák. Bayesian Workflow
• A comprehensive overview of what Bayesian statisticians actually do when analyzing data, as opposed to the mythology in our intro textbooks (roughly, the likelihood is given to you by God, you think real hard and come up with a prior, then you apply Bayes rule and are done). It includes all the bits of sequential model expansion and checking and computational diagnostics and compromise between simplicity, convention, and domain expertise you actually go through to build a Bayesian model from scratch. The contrarian in me would love to see more frequentist analysis of this paradigm. A lot of the checks are there to make sure you’re not fooling yourself; how well do they work in practice?
• See also Michael Betancourt’s notebooks for worked examples of this process.
• Giannone, Lenza, Primiceri Priors for the Long Run
• Exact rank constraints for cointegration are often uncertain, making pure VECM modeling a bit fraught; but standard priors on the VAR form do not strongly constrain long-run relationships, and improper treatment of initial conditions can lead to spurious inference on trends. This paper proposes a simple class of priors that allows “soft” versions of these constraints.
• Kallus and Uehara Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes
• Characterizes the semiparametric efficiency bound for the value of a dynamic policy and provides a doubly robust estimator combining the appropriate variants of a regression statistic and a (sequential) probability weighting statistic. This allows the use of nonparametric and (with sample splitting) machine learning estimates in reinforcement learning while retaining parametric convergence rates.
• See also companion papers on estimating the policy and policy gradient and extending to the case of deterministic policies (which require smoothing) among others, or watch the talk for an overview.
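The full MDP machinery is involved, but the one-step (contextual bandit) special case already shows the doubly robust structure. The simulation below is my own sketch, not the paper’s estimator: the data-generating process, logging policy, and deliberately crude outcome model are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Simulated logged data: context x, binary action a drawn from a known
# logging policy, and noisy reward r.
x = rng.normal(size=n)
logging_prob = 0.5 * np.ones(n)        # logging policy P(a = 1 | x)
a = rng.binomial(1, logging_prob)
true_q = lambda x, a: x * a            # true mean reward
r = true_q(x, a) + rng.normal(scale=0.5, size=n)

# Target policy: take a = 1 iff x > 0; its true value is E[x * 1{x>0}].
pi = (x > 0).astype(float)

# A deliberately biased outcome model, to show the correction at work.
q_hat = lambda x, a: 0.8 * x * a

# Doubly robust estimate: model prediction under the target policy,
# plus an importance-weighted residual correction.
prop = np.where(pi == 1, logging_prob, 1 - logging_prob)
weight = np.where(a == pi, 1.0 / prop, 0.0)
dr = q_hat(x, pi) + weight * (r - q_hat(x, a))
print(dr.mean())  # close to E[x * 1{x>0}] = 1/sqrt(2*pi) ≈ 0.399
```

The estimate stays consistent despite the biased outcome model because the propensities are correct; the MDP version of the paper chains this correction backward through time.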
• Sawhney & Crane Monte Carlo Geometry Processing: A Grid-Free Approach to PDE-Based Methods on Volumetric Domains
• I don’t usually read papers in computer graphics, but I do care a lot about computing Laplacians and this paper offers a clever new Monte Carlo based method that allows computation on much more complicated domains. It’s not yet obvious to me whether the method generalizes to the PDE classes I and other macroeconomists tend to work with, but even if not it should still be handy for many applications.
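The workhorse behind the paper is the classical walk-on-spheres estimator. Here is a minimal sketch for the Laplace equation on the unit disk, where the distance to the boundary has a closed form; the paper handles far more general domains and PDEs, and everything below is my own toy version.

```python
import numpy as np

def walk_on_spheres(p, boundary_g, n_walks=20_000, eps=1e-3, seed=0):
    """Estimate u(p) for Laplace's equation on the unit disk with
    Dirichlet boundary data boundary_g, via walk-on-spheres: repeatedly
    jump to a uniform point on the largest circle around the current
    point that fits inside the domain, then read off the boundary value
    once within eps of the boundary."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_walks):
        x = np.array(p, dtype=float)
        while True:
            radius = 1.0 - np.linalg.norm(x)  # distance to the unit circle
            if radius < eps:
                break
            theta = rng.uniform(0.0, 2.0 * np.pi)
            x = x + radius * np.array([np.cos(theta), np.sin(theta)])
        total += boundary_g(x / np.linalg.norm(x))  # project onto boundary
    return total / n_walks

# g(x, y) = x is harmonic, so the interior solution should reproduce it.
est = walk_on_spheres([0.3, 0.2], lambda b: b[0])
print(est)  # ≈ 0.3
```

No mesh or grid is ever built; the only geometric query is the distance to the boundary, which is what lets the method scale to complicated domains.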
• Schmelzing Eight Centuries of Global Real Interest Rates, R-G, and the ‘Suprasecular’ Decline, 1311–2018
• An enormous data collection effort and public good that will inform research on interest rates for years to come. As with any attempt to turn messy historical data into aggregate series, many contestable choices go into data selection, standardization, and normalization, and I don’t think the author’s simple trend estimates of a several-hundred-year decline will be the last word on the statistical properties or future implications of this data. But now that it’s out there, we have a basis for discussion and testing.
• See also: lots of useful historical macro data collection (going not quite so far back) by the folks at the Bonn Macrohistory Lab.
• Wolf SVAR (Mis-)Identification and the Real Effects of Monetary Policy
• A nice practical application of Bayesian model checking, applying SVAR methods to simulated macro data when the (usually somewhat suspect) identifying restrictions need not hold exactly. It finds that early sign-restricted BVARs with uniform (Haar) priors tend to be biased toward finding monetary neutrality, and so do not in fact provide noteworthy evidence contradicting the implied shock responses of typical central bank monetary DSGEs. Of course, such models have many other problems, and not being contradicted by one test is not dispositive, but macro debates would be elevated if people checked that their contradictory (or supportive) evidence is in fact contradictory (or supportive).
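The “uniform (Haar) prior” here is the standard device for drawing rotation matrices in sign-restricted SVARs: QR-decompose a Gaussian matrix with a sign normalization. A minimal sketch (the accept/reject step against the sign restrictions is only indicated in a comment):

```python
import numpy as np

def haar_orthogonal(n, rng):
    # Draw Q uniformly (Haar measure) on the orthogonal group O(n):
    # QR-decompose a standard Gaussian matrix and flip column signs so
    # that R has a positive diagonal, which makes the draw exactly Haar.
    z = rng.normal(size=(n, n))
    q, r = np.linalg.qr(z)
    return q * np.sign(np.diag(r))

rng = np.random.default_rng(0)
q = haar_orthogonal(3, rng)
print(np.allclose(q @ q.T, np.eye(3)))  # True: Q is orthogonal

# In a sign-restricted SVAR one would draw Q repeatedly, form candidate
# impact matrices chol(Sigma) @ Q, and keep only the draws whose implied
# impulse responses satisfy the sign restrictions.
```

The sign normalization matters: without it, the QR output is not uniformly distributed, which is exactly the kind of subtle prior choice the paper’s model-checking exercise probes.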
• Wang and Blei Variational Bayes under Model Misspecification
• Describes what (mean field) variational Bayes ends up targeting, at least in cases where a Bernstein von Mises approximation works well enough. Also covers the much more nontrivial case with latent variables.
• I will judiciously refrain from commenting on other recent works by this pair (discussion 1, 2), except to say that dimensionality reduction in causal inference deserves more study, and this manifold learning approach to creating a nonparametric version of interactive fixed effects estimation looks like a useful supplement to the standard panel data toolbox.
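For intuition on what mean-field approximations do to a posterior, here is the textbook example (my illustration, not taken from the paper): the KL(q‖p)-optimal mean-field Gaussian fit to a correlated Gaussian matches the means but understates the marginal variances.

```python
import numpy as np

# Minimizing KL(q || p) over mean-field (independent) Gaussians q when
# p = N(0, Sigma) gives component variances 1 / (Sigma^{-1})_ii, which
# understate the true marginals whenever the components are correlated.
sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
precision = np.linalg.inv(sigma)
vb_var = 1.0 / np.diag(precision)  # mean-field optimum: 0.36 each
true_var = np.diag(sigma)          # true marginal variances: 1.0 each
print(vb_var, true_var)
```

This deterministic shrinkage is the kind of systematic target mismatch that makes characterizing what VB actually converges to, especially under misspecification, worthwhile.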