# Papers I Liked 2021

A list of 10 papers I read and liked in 2021. As in previous years, inclusion is by date read rather than date released or published, and the papers are listed in no particular order. Overall, the list reflects my interests this year, prompted by research and teaching, in online learning, micro-founded macro, and causal inference, and, where possible, the intersections of these areas. As usual, I’m likely to have missed a lot of great work even in areas I focus on, so absence probably means I didn’t see a paper or it’s still on my ever-expanding to-read list, so ping me with your recommendations!

• Block, Dagan, and Rakhlin Majorizing Measures, Sequential Complexities, and Online Learning
• Sequential versions of the metric-entropy-type conditions that are the bread and butter of i.i.d. data analysis, extended to the setting of online estimation.
• See also: This builds on Rakhlin, Sridharan, and Tewari (2015)’s essential earlier work on uniform martingale laws of large numbers via sequential versions of Rademacher complexity. More generally, there were many advances this year in inference and estimation for online-collected data: see the papers at the NeurIPS Causal Inference Challenges in Sequential Decision Making workshop for a few.
• Klus, Schuster, and Muandet Eigendecompositions of Transfer Operators in Reproducing Kernel Hilbert Spaces
• A Koopman operator, $(\mathcal{K}^{\tau} f)(x) = E[f(X_{t+\tau}) \mid X_t = x]$, summarizes a possibly nonlinear and high-dimensional dynamical system with a linear operator, which allows analysis and computation using linear-algebra tools. Since this is effectively an evaluation operator, it pairs nicely with kernel mean embeddings and RKHS theory, which give precisely the properties needed to make these objects well behaved, allowing analysis in arbitrarily high dimension at no extra cost.
• See also: Budišić, Mohr, and Mezić (2012) Applied Koopmanism for an intro to Koopman analysis of dynamical systems.
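To make the linear-operator idea concrete, here is a minimal sketch of extended dynamic mode decomposition (EDMD), a standard finite-dimensional approximation of a Koopman operator. The logistic-map dynamics and the monomial dictionary are illustrative assumptions of mine; the paper itself works with kernel (RKHS) features rather than a fixed dictionary.

```python
# Minimal EDMD sketch: approximate the Koopman operator of a nonlinear map
# by a matrix acting on a small dictionary of observables.
import numpy as np

# Trajectory of a simple nonlinear system: the logistic map.
n = 2000
x = np.empty(n)
x[0] = 0.3
for t in range(n - 1):
    x[t + 1] = 3.7 * x[t] * (1 - x[t])

# Dictionary of observables: monomials 1, x, x^2, x^3.
def psi(v):
    return np.vander(v, 4, increasing=True)  # shape (len(v), 4)

Psi_now, Psi_next = psi(x[:-1]), psi(x[1:])

# Least-squares Koopman matrix K: Psi_next ≈ Psi_now @ K, i.e., each column
# of K maps current observables to the next value of one dictionary function.
K, *_ = np.linalg.lstsq(Psi_now, Psi_next, rcond=None)

# Koopman eigenvalues summarize the dynamics as seen through this dictionary.
eigvals = np.linalg.eigvals(K)
print(np.round(np.sort(np.abs(eigvals))[::-1], 3))
```

Since the constant function is in the dictionary and is preserved by conditional expectation, the estimated matrix has an eigenvalue at (numerically) exactly 1; the kernel version of this construction lets the dictionary be implicit and infinite-dimensional.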
• Antolín-Díaz, Drechsel, and Petrella Advances in Nowcasting Economic Activity: Secular Trends, Large Shocks and New Data
• Classic linear time series models used in forecasting and in causal and structural macroeconomics have taken a beating over the past two years of huge pandemic-driven fluctuations. But a dirty secret known to forecasters is that black-box ML models designed to be much more flexible have, if anything, an even more dismal track record. This work adds carefully specified and empirically validated mechanisms for shifts, outliers, and changes in mean and volatility to the kind of dynamic factor models that have substantially outperformed, offering a chance to improve fit and the handling of big shifts while retaining that performance. This attention to the distributional properties of macro data is surprisingly rare, and should encourage more work on understanding the sources of these features.
• Karadi, Schoenle, and Wursten Measuring Price Selection in Microdata: It’s Not There
• A venerable result in sticky-price models, going back to Golosov and Lucas, is that “menu costs” of price changes ought to result in very limited real responses of output to monetary impulses: even though menu costs keep prices fixed most of the time, any product that is seriously mispriced will be selected to have its price changed, so real effects should be minimal. This paper tests that theory directly using price microdata and shows that, in response to identified monetary shocks, the prices that change do not appear to be those that are out of line, suggesting a much smaller selection effect than in baseline menu-cost models. Beyond the importance of its empirical results, I liked this paper as a model for combining micro and macro data: to claim that a microeconomic mechanism responds to an aggregate shock, your results are much more credible if you actually measure variation in that shock and the micro response to it, rather than using only macro or only micro variation.
• See also: Wolf The Missing Intercept: A Demand Equivalence Approach, describing how causal variation at both the micro (cross-sectional) level and the macro (time series) level is necessary to identify aggregate responses. This kind of hybrid approach is a welcome change: it reconciles the value of “identified moments” from microeconomic causal inference tools in macro with the reality that, if you want to credibly measure aggregate causal effects, you also need random variation at the aggregate level.
• Hall, Payne, Sargent, and Szöke Hicks-Arrow Prices for US Federal Debt 1791-1930
• A time series of risk- and term-structure-adjusted U.S. interest rates going way back, estimated using a Bayesian hierarchical term structure model, which allows handling the variety of bond issuance terms and missingness patterns that make comparisons over time using models for modern yield curves quite difficult.
• See also: Turing, the probabilistic programming language used for these results. It combines modern MCMC sampling algorithms with the full power of Julia’s automatic differentiation stack, which allows fitting even complicated structural models with elements not standard in more statistics-specialized programming languages, while benefiting from the ability of Bayesian methods to handle inference with complicated missingness and dependence structures that become extremely challenging without them, even for simulation-based estimators.
• Callaway and Sant’Anna Difference-in-Differences with multiple time periods
• The diff-in-differdämmerung struck hard this year, with methods for handling DiD (particularly but not only with variation in treatment timing) up in the air and new papers coming out at an increasing pace. In trying to summarize at least a bit of this literature for a new class, I found this paper, and others by Pedro Sant’Anna and collaborators, crystal clear about the sources of the issues and how to resolve them, with the bonus of well-documented software and extensive examples.
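For intuition on what these estimators target, here is a toy version of the group-time building block $ATT(g,t)$: the outcome change from period $g-1$ to $t$ for the cohort first treated at $g$, minus the same change for never-treated units. The simulated panel and effect size below are made up for illustration; the paper’s actual estimators add covariate conditioning and doubly robust weighting.

```python
# Toy group-time ATT(g, t) in the spirit of Callaway and Sant'Anna:
# compare outcome changes from period g-1 to t, cohort g vs. never-treated.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_units, periods = 400, 5
# Cohort = first treated period (0 means never treated).
gvec = rng.choice([0, 2, 3], size=n_units)

rows = []
for i in range(n_units):
    alpha = rng.normal()                       # unit fixed effect
    for t in range(periods):
        treated = gvec[i] > 0 and t >= gvec[i]
        # Common trend 0.5 per period; true treatment effect is 2.0.
        y = alpha + 0.5 * t + 2.0 * treated + rng.normal(scale=0.5)
        rows.append((i, t, gvec[i], y))
df = pd.DataFrame(rows, columns=["unit", "t", "g", "y"])

def att_gt(df, g, t):
    """ATT(g, t): change from period g-1 to t, cohort g minus never-treated."""
    wide = df.pivot(index="unit", columns="t", values="y")
    cohort = df.groupby("unit")["g"].first()
    dy = wide[t] - wide[g - 1]
    return dy[cohort == g].mean() - dy[cohort == 0].mean()

print(round(att_gt(df, g=2, t=3), 2))  # by construction the true effect is 2.0
```

Because each $ATT(g,t)$ uses cohort $g$’s own pre-period as the baseline, the familiar problems with already-treated units serving as controls never arise; aggregation across $(g,t)$ cells then gives event-study or overall effects.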
• Farrell, Liang, and Misra Deep Learning for Individual Heterogeneity: An Automatic Inference Framework
• Derives influence functions and doubly robust estimators for conditional loss-based estimation allowing, e.g., nonparametric dependence of coefficients on high-dimensional inputs in Generalized Linear Models. Results are flexible enough to be widely applicable, and simple enough to be easy to implement and interpret.
• See also: Hines, Dukes, Diaz-Ordaz, and Vansteelandt, Demystifying statistical learning based on efficient influence functions, for an overview of this increasingly essential but always-confusing topic.
• Foster and Syrgkanis Orthogonal Statistical Learning
• A very general theory extending the “Double Machine Learning” approach to loss-function-based estimation, where instead of a root-n-estimable regular parameter you may have a more complex object like a function (e.g., a conditional treatment effect or a policy) which you want to make robust to high-dimensional nuisance parameters.
• See also: I went back and reread the published version of the original “Double ML” paper to write up teaching notes, which was helpful for really thinking through the results.
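To make the orthogonalization idea concrete, here is a toy cross-fitted partialling-out estimator for a partially linear model $Y = \theta D + g(X) + \varepsilon$, in the spirit of the double ML literature. The data-generating process and the random-forest nuisance learners are my own illustrative choices, not prescribed by these papers.

```python
# Cross-fitted "partialling out" sketch for a partially linear model:
# residualize Y and D on X with ML learners fit on held-out folds, then
# regress residuals on residuals to recover theta.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p, theta = 2000, 5, 1.5
X = rng.normal(size=(n, p))
g = np.sin(X[:, 0]) + X[:, 1] ** 2          # nonlinear confounding
D = 0.8 * X[:, 0] + rng.normal(size=n)      # treatment depends on X
Y = theta * D + g + rng.normal(size=n)

res_Y, res_D = np.empty(n), np.empty(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    mY = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], Y[train])
    mD = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], D[train])
    res_Y[test] = Y[test] - mY.predict(X[test])   # partial X out of Y
    res_D[test] = D[test] - mD.predict(X[test])   # partial X out of D

theta_hat = res_D @ res_Y / (res_D @ res_D)       # residual-on-residual OLS
print(round(theta_hat, 2))
```

The point of the Neyman-orthogonal moment is visible here: first-order errors in the two nuisance fits cancel out of the final regression, so slow-converging ML learners still yield a well-behaved estimate of $\theta$; the Foster–Syrgkanis results extend this logic from a single scalar parameter to function-valued targets.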
• Rambachan and Shephard When do common time series estimands have nonparametric causal meaning?
• Potential outcomes for time series are a lot harder than you would think at first, because repeated intervention necessarily vastly expands the space of possible relationships between treatments, and between treatments and outcomes. This paper lays out the issues and proposes some solutions.
• See also: I based my time series causal inference teaching notes mostly on this paper.
• Breza, Kaur, and Shamdasani Labor Rationing
• How do economies respond to labor supply shocks? Breza, Kaur, and Shamdasani just go out there and run the experiment, setting up a bunch of factories and hiring away a quarter of eligible workers in half of 60 villages in Odisha. In peak season wages rise, as textbook theory predicts, but in lean season wages do not move at all, as most workers appear to be effectively unemployed.
• See also: The authors’ other work in the same setting testing theories of wage rigidity. For example, they find strong experimental support for Bewley’s morale theory for why employers don’t just cut wages. By running experiments at the market level, they have been able to provide a lot of compelling evidence on issues that have previously been relegated to much more theoretical debate.