I felt like I barely did any serious reading this year, and maybe that’s even true, but my read folder contains 168 papers for 2023, so even subtracting the ones that are in there by mistake, that’s enough to pick a few highlights. As usual, I hesitate to call these favorites, but I learned something from each of them. They are in no particular order other than roughly chronological, by when I read them. Themes are kind of all over the place because this year has been one of topical whiplash for me: broadly, early in the year I was reading a lot more economics, and later in the year more machine learning. Computational econ was a focus because I taught that class again after a two-year hiatus and added Python. Learning Python was a bigger focus: I can now say that I am quite middling at it, which was an uphill battle. I spent the middle of the year trying to catch up with the whole language modeling thing that is apparently hot right now. A lot of the learning on each of these topics came from books and classes, so I will add a section on those too.
Classes and Books
- Python, introductory
- I quite liked the QuantEcon materials for the basics, though that preference is idiosyncratic: they are targeted at numerical methods in economics, and I had already used the Julia materials.
- Python, advanced
- Please help me, I’m dying. Send recs. Part of it is that I still need a deeper foundation in the basics of computation (like, command line utils, not CS theory). Part of it is that the one good thing about Python, its huge community and rich library ecosystem, is also the terrible thing about it: the whole thing is a huge and ever-shifting set of incompatible hacks and patches fixing basic flaws in older patches fixing basic flaws in older patches, etc., ad infinitum.
- General deep learning
- Melissa Dell’s Harvard class is the only one I’m aware of aimed at economists that explains modern practical deep learning, including contemporary vision, text, and generative architectures, with a focus on transformers. Use this if you want to do research with text, images, or documents. It is taught by an economic historian, but it is orders of magnitude more up to date than anything by an econometrician or computational economist, including what gets published in top econ journals (which are great, but not for ML).
- Natural Language Processing
- Jurafsky and Martin, Speech and Language Processing, 3rd ed.: Learn the history of NLP up to the modern era. A lot of the old jargon remains; the methods mostly don’t. But this will explain the tasks and how we got to modern methods.
- HuggingFace Transformers is the library people actually use for text processing. This is mostly a software how-to, but then again modern NLP is pretty much nothing but software, so you may as well get it directly. (A minimal usage sketch follows this list.)
- Grimmer, Roberts, and Stewart, Text as Data: Fantastic on research methods, and how to learn systematically from document corpora. Technical methods are from the Latent Dirichlet Allocation era, now charmingly dated, though their stm software will get you quite far very quickly in the exploratory phase of a project.
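To give a sense of how library-centric the modern workflow is, here is the Transformers sketch promised above, using the pipeline API. The checkpoint name and example text are purely illustrative; any compatible model would do.

```python
# Minimal sketch of the HuggingFace Transformers workflow: the pipeline API
# wraps tokenization, model inference, and post-processing in a single call.
# The checkpoint and example text here are illustrative; swap in whatever fits.
from transformers import pipeline

# Zero-shot classification: label documents without any task-specific training.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")
result = classifier(
    "The central bank raised rates by 50 basis points.",
    candidate_labels=["monetary policy", "fiscal policy", "trade"],
)
print(result["labels"][0], result["scores"][0])  # best label and its score

# Sentiment analysis with the library's default model for the task.
sentiment = pipeline("sentiment-analysis")
print(sentiment("This paper was a pleasure to read."))
```

The same one-liner pattern covers summarization, named entity recognition, translation, and so on, which is a lot of what applied text-as-data work needs before any fine-tuning.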
Papers I liked
- Russo and van Roy (2013): “Eluder Dimension and the Sample Complexity of Optimistic Exploration”
- Recommended to me as “well-written”. Foundational for interesting modern work in bandits and RL.
- García-Trillos, Hosseini, Sanz-Alonso “From Optimization to Sampling Through Gradient Flows”
- A quick and readable explanation of how Langevin-based sampling algorithms are just gradient descent in the right space. Over the past two years I’ve caved in to the optimal transport bandwagon. For a comprehensive overview, see the monograph by Sinho Chewi or the Simons Institute program, especially the bootcamp lectures by Eberle.
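To make the slogan concrete, here is a toy sketch of the unadjusted Langevin algorithm on a standard 2-D Gaussian (my own illustration, not from the paper): each update is a gradient step on the log density plus injected Gaussian noise, and the resulting chain approximately samples the target.

```python
# Toy sketch of the unadjusted Langevin algorithm (ULA) targeting a standard
# 2-D Gaussian. Each update is a gradient step on log p(x) plus Gaussian noise
# scaled so the chain's stationary distribution approximates the target.
import numpy as np

rng = np.random.default_rng(0)

def grad_log_p(x):
    # log p(x) = -||x||^2 / 2 up to a constant, so the gradient is -x.
    return -x

step = 0.01
x = np.zeros(2)
samples = []
for _ in range(50_000):
    x = x + step * grad_log_p(x) + np.sqrt(2 * step) * rng.standard_normal(2)
    samples.append(x)

samples = np.array(samples)
print(samples.mean(axis=0), samples.var(axis=0))  # roughly (0, 0) and (1, 1)
```

The gradient-flow view is that this chain discretizes, roughly, the gradient flow of the KL divergence to the target in Wasserstein space, which is the perspective the paper unpacks.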
- Bouscasse, Nakamura, Steinsson “When Did Growth Begin? New Estimates of Productivity Growth in England from 1250 to 1870”
- Structural Bayesian estimation of a neo-Malthusian model of English population and wage history. The modeling here allows both transparent interpretation of the data and explicit expression of many sources of uncertainty in historical series that often go unacknowledged. On these issues, as my favorite paper title of the year put it, “We Do Not Know the Population of Every Country in the World for the Past Two Thousand Years.”
- Kovachki, Li, Liu, Azizzadenesheli, Bhattacharya, Stuart, Anandkumar, JMLR (2023) “Neural Operator: Learning Maps Between Function Spaces With Applications to PDEs”
- Learning of nonlinear operators (maps with functions as input and output), as opposed to linear ones, has been a weak spot of functional data analysis. Neural operator architectures are part of a class of methods that are usable in this setting. Applications include speeding up massive scientific models, generative models of functions, etc.
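As a rough illustration of the core idea (not the authors’ implementation), here is a toy, untrained 1-D “spectral convolution”: transform the input function to Fourier space, act on a fixed number of low-frequency modes with weights that would be learned in practice, and transform back, so the layer maps a function on a grid to another function on the same grid.

```python
# Toy sketch of the spectral-convolution idea behind Fourier neural operators:
# go to frequency space, act on a fixed number of low modes with (here random,
# in practice learned) complex weights, and transform back. Untrained and 1-D,
# purely to show the structure of a map between functions.
import numpy as np

rng = np.random.default_rng(0)

def spectral_conv_1d(u, weights, n_modes):
    """Apply per-mode complex weights to the lowest n_modes frequencies of u."""
    u_hat = np.fft.rfft(u)                     # function -> Fourier coefficients
    out_hat = np.zeros_like(u_hat)
    out_hat[:n_modes] = weights * u_hat[:n_modes]
    return np.fft.irfft(out_hat, n=len(u))     # back to a function on the grid

n_grid, n_modes = 256, 16
x = np.linspace(0, 2 * np.pi, n_grid, endpoint=False)
u = np.sin(x) + 0.5 * np.sin(3 * x)            # an input "function" on a grid
weights = rng.standard_normal(n_modes) + 1j * rng.standard_normal(n_modes)

v = spectral_conv_1d(u, weights, n_modes)
print(u.shape, v.shape)  # same grid in and out: a map between functions
```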
- Mikhail Belkin, Acta Numerica (2021) “Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation”
- Since Bartlett (1997), and as re-emphasized by Zhang et al. (2016), we’ve known that classical learning theory doesn’t quite work for neural networks in the modern regime. They are overparameterized, interpolate (“overfit”) the training data, do not converge uniformly, and bounds based on theories like VC or Rademacher complexity are typically vacuous. But they seem to generalize fine. We’re still assembling the story here, and I don’t think it’s completely stitched up, but this gives a good overview of the problems and of elements of the solutions (data-dependent bounds, selecting good global minima among the many that exist by some aspect of the training dynamics), along with some precise results in the NTK regime.
- See also work on PAC-Bayes bounds by people in Andrew Gordon Wilson’s lab, which takes a different and more promising data-dependent approach: see e.g. “PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization” or “The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning”.
- Hu and Laurière “Recent Developments in Machine Learning Methods for Stochastic Control and Games”
- Survey of the neural-PDE literature for optimal control and mean field games. The applications where neural networks improve upon classical numerical methods are still being scoped out, but they seem useful in certain high-dimensional settings that have eluded traditional techniques (specifically, models of inequality with portfolio choice, aggregate risk, and aging).
- Egami, Hinck, Stewart, Wei “Using Large Language Model Annotations for Valid Downstream Statistical Inference in Social Science: Design-Based Semi-Supervised Learning”
- You can and should use classical semiparametric techniques with sample splitting to get confidence intervals when using large language model annotations. The methods are old and well established, but LLM users need to hear it. See also Zrnic and Candès, and Mozer and Miratrix, who suggested exactly the same estimator (literally the same formula in all three papers); but who cares, any good idea should be published multiple times.
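For the mean-estimation case, the shared recipe is roughly the following debiasing step (my own numpy sketch in the prediction-powered-inference style, not code from any of the three papers): annotate everything with the model, then correct the bias using a small random subsample that also has gold labels.

```python
# Minimal sketch of the shared design-based / prediction-powered idea for
# estimating a population mean: annotate everything with the model (LLM),
# then debias using a small random subsample that also has gold labels.
# My own numpy rendering, not code from any of the three papers.
import numpy as np

def debiased_mean(pred_unlabeled, pred_labeled, y_labeled):
    """pred_unlabeled: model annotations on the big unlabeled corpus (size N).
    pred_labeled, y_labeled: annotations and gold labels on a small random
    sample (size n). Returns the point estimate and an approximate std. error."""
    N, n = len(pred_unlabeled), len(y_labeled)
    correction = y_labeled - pred_labeled
    est = pred_unlabeled.mean() + correction.mean()
    se = np.sqrt(pred_unlabeled.var(ddof=1) / N + correction.var(ddof=1) / n)
    return est, se

# Illustrative fake data: a binary outcome and a noisy, biased "annotator".
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.3, size=10_500)                          # true labels
pred = np.clip(y + rng.normal(0.05, 0.2, size=y.size), 0, 1)   # biased annotations
y_lab, pred_lab, pred_unlab = y[:500], pred[:500], pred[500:]  # 500 gold labels

est, se = debiased_mean(pred_unlab, pred_lab, y_lab)
print(f"{est:.3f} +/- {1.96 * se:.3f}")  # interval should cover the true 0.3
```

The standard error combines the small variance of the cheap annotations over the big corpus with the larger variance of the correction term over the small labeled sample, which is where the hand-labeling budget actually binds.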
- Lew, Tan, Grand, Mansinghka: “Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs”
- Language models like LLaMA are autoregressive probability models of sequences. You should be able to run all kinds of sampling and inference algorithms on those sequence distributions, not just the typical beam search with some penalties. Full Bayesian inference by filtering is just one example: see also work like “The Consensus Game: Language Model Generation via Equilibrium Search”, which computes a Nash equilibrium over language output. All of this is greatly facilitated by having the actual probabilities output by the model, and it requires many samples, so own a lot of GPUs or use a small model; but it is a promising sign that future inference may look very different from current practice.
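Here is a toy sketch of the sequential Monte Carlo idea, stripped of everything language-model-specific: next_token_probs is a made-up stand-in for a real model’s next-token distribution, and the potential function is a trivially simple constraint, so this only illustrates the propose/weight/resample loop, not the paper’s probabilistic-programming machinery.

```python
# Toy sketch of SMC-style steering of an autoregressive sequence model.
# `next_token_probs` is a made-up stand-in for a language model's next-token
# distribution, and `potential` is a toy constraint that upweights sequences
# currently ending in "b". Particles are partial sequences that get extended,
# reweighted, and resampled at each step.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["a", "b", "<eos>"]

def next_token_probs(seq):
    # Stand-in for an LM: mildly prefers to repeat the last token.
    if seq and seq[-1] == "a":
        return np.array([0.6, 0.3, 0.1])
    return np.array([0.3, 0.6, 0.1])

def potential(seq):
    # Toy steering signal: reward sequences whose last real token is "b".
    toks = [t for t in seq if t != "<eos>"]
    return 2.0 if toks and toks[-1] == "b" else 1.0

n_particles, max_len = 100, 8
particles = [[] for _ in range(n_particles)]
weights = np.ones(n_particles) / n_particles

for _ in range(max_len):
    for i, seq in enumerate(particles):
        if seq and seq[-1] == "<eos>":
            continue  # finished sequences keep their weight
        tok = VOCAB[rng.choice(len(VOCAB), p=next_token_probs(seq))]
        seq.append(tok)
        weights[i] *= potential(seq)
    weights /= weights.sum()
    # Resample particles in proportion to their weights, then reset to uniform.
    idx = rng.choice(n_particles, size=n_particles, p=weights)
    particles = [list(particles[j]) for j in idx]
    weights = np.ones(n_particles) / n_particles

print("steered sample:", "".join(particles[0]))
```

With a real model, the proposal would be the LM itself and the potential would encode the constraint or posterior you care about; the point is that once you can read off token probabilities, the rest is standard Monte Carlo plumbing.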
- David Donoho “Data Science at the Singularity”
- Old man yells at cloud computing. Kind of an opinion piece: one of the top scientists of the previous generation of ML on how the real secret to modern ML success is nothing about theory or methods but a research paradigm of “frictionless reproducibility” and ceaseless competition. See also Ben Recht’s running commentary on his ML class from a related perspective.
- Bengs, Busa-Fekete, El Mesaoudi-Paul, Hüllermeier JMLR (2021) “Preference-based Online Learning with Dueling Bandits: A Survey”
- Learning from comparisons, rather than numerical values, leads to a field that combines bandits, sorting algorithms, voting theory, and preference estimation, with a dazzling array of algorithms based on each of these perspectives. This work touches on the issues that arise when trying to figure out, for example, what it is that “Reinforcement Learning from Human Feedback” is optimizing language model output for.
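To show how little machinery the basic setting requires, here is a toy sketch of learning from pairwise comparisons only, in the spirit of Thompson-sampling approaches to dueling bandits (my own illustration, not a specific algorithm from the survey): keep Beta posteriors over every pairwise win probability, sample a preference matrix, and duel the two arms the sample likes best.

```python
# Toy sketch of learning from comparisons only: keep Beta posteriors over
# pairwise win probabilities, Thompson-sample a full preference matrix, and
# duel the two arms the sampled matrix ranks highest (by a Borda-style score).
# In the spirit of Thompson-sampling dueling bandits, not a specific algorithm.
import numpy as np

rng = np.random.default_rng(0)
K = 4
quality = np.array([0.1, 0.4, 0.6, 0.9])  # latent arm qualities, unknown to learner

def duel(i, j):
    # Environment: arm i beats arm j with probability tied to the quality gap.
    return rng.random() < 0.5 + 0.5 * (quality[i] - quality[j])

wins = np.ones((K, K))    # Beta(1, 1) prior on P(row beats column)
losses = np.ones((K, K))

for t in range(5_000):
    sampled = rng.beta(wins, losses)   # Thompson sample every pairwise probability
    np.fill_diagonal(sampled, 0.5)
    borda = sampled.mean(axis=1)       # score each arm by its sampled average win rate
    i, j = np.argsort(borda)[-2:]      # duel the two best-looking arms
    if duel(i, j):
        wins[i, j] += 1; losses[j, i] += 1
    else:
        wins[j, i] += 1; losses[i, j] += 1

posterior_mean = wins / (wins + losses)
print("estimated best arm:", np.argmax(posterior_mean.mean(axis=1)))  # likely arm 3
```

Note that the learner never sees a numeric reward, only which arm won each duel, which is exactly the information structure that makes the RLHF question above so slippery.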