Aggregate shocks in cross-sectional data, or the alternative to a macroeconomic model isn't no macroeconomic model, it's a bad macroeconomic model

Inspired by the release of a new and quite clear explainer on the topic by Hahn, Kuersteiner, and Mazzocco (HKM), and by the growing trend of using microeconomic data to learn about macroeconomic or aggregate effects, I believe it’s a good time to write something about what microeconometricians and applied microeconomists ought to know about dealing with aggregate effects. Broadly, this refers to any time-dependent variability in a data-generating process that can’t be modeled as independent across individual observations. In economic data, it usually comes about from changes in prices, preferences, technologies, information, or institutions shared by all units of observation. Depending on the setting and the statistical methods used, this variability can affect the identification and estimation of micro-level parameters even if one’s primary interest is in purely individual-level variation.

The primary way such effects are handled in microeconomic research, especially research based on cross-sectional data, is to ignore them completely. The second most common way to handle variability in aggregates, when using panel or repeated cross-section data, is to include a set of time dummies or a time trend among the regressors. An even more sophisticated approach is to also allow for time-varying standard errors in the context of heteroskedasticity-robust inference. While each of these approaches is often reasonable, they implicitly embed assumptions about the aggregate environment which require justification if the estimates are to have a meaningful interpretation.

Consider the most common option, simply not incorporating any aggregate effects. If the sample is drawn from a specific population at a specific time, so that all units face the same aggregate environment, and a parameter is estimated from the variability within that population, a common interpretation is that the estimate is valid ‘conditional on’ the environment. Whether the estimate will hold outside of this environment is then purely a question of external validity. Formally, using the notation of HKM, suppose units \(i=1,\ldots,n\) are all observed at time \(t\) and the observations \(y_{it}\) are drawn from a distribution \(f(y_{it}|\nu_t)\), where \(\nu_t\) represents the characteristics of the time and environment in which the sample is drawn. Such a cross-sectional study can conceivably identify ‘parameters’ \(\gamma(f)\) which are functionals of the distribution \(f(y_{it}|\nu_t)\). These functionals will in general depend on \(\nu_t\). If they are estimated by a procedure with standard limiting properties, Andrews (2005) formalizes the argument that the estimates are consistent ‘conditional on’ \(\nu_t\) by applying the machinery of stable convergence, which provides a variant of weak convergence in which the limit distribution, instead of being, say, normal, is normal conditional on the information in \(\nu_t\).1 If the functional of interest does not depend on \(\nu_t\), or depends only on a subvector of \(\nu_t\) which is shared with the target environment about which one wants to make a prediction, then the estimate can be considered to have external validity; otherwise it may not, which may or may not be a problem. For example, it may be that we are specifically interested in recovering some aspect of \(\nu_t\), like the return to education, which is determined by the structure of the labor market.

The above, however, is the most optimistic case, for two reasons. The first is that even if the true parameter is not a function of \(\nu_t\), its estimator might be, and so inference may be affected. Andrews provides limit theory for this case for linear regression and several similar estimators. More worryingly, one is often interested in parameters whose identification is affected by the presence of aggregate uncertainty. HKM offer a simple example about educational choices, archetypal for a broad class of problems in which one models decision-making. The basic idea is that we are interested in some variable over which individuals have a choice, like college major or corporate fixed capital expenditure, either because we are interested in how the decision is made or because we want to study the effects of this variable and do not have (quasi-)experimental variation. Standard practice then is to assume that the people making the decision are reasonably well-informed and make the decision with potential costs and benefits in mind, and to try to infer those (potentially subjective) costs and benefits by observing the decisions and their outcomes. Letting the decision variable be \(y^1_{it}\), other individual-level outcomes and characteristics be \(y^2_{it}\), and the marginal benefits and costs be functions of these variables, up to unknown parameters \(\theta\), optimal decisions are characterized by the first order condition \(E[MB(y^1_{it},y^2_{it},\theta)]=E[MC(y^1_{it},y^2_{it},\theta)]\). The assumption that agents are reasonably well informed is often translated into assuming that the distribution of uncertainty in the outcomes of the choice matches the true distribution of outcomes (up to some distortions parameterized by the subjective cost and benefit functions). If this is the case, the parameters can be estimated by replacing the expectations with the empirical measure and finding the parameters which minimize the deviation from equality. This is the generalized method of moments, and this application is its raison d’être, for better or worse the original source of econometricians’ interest in the method.2
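
To make the mechanics concrete, here is a minimal sketch of this GMM step in Python, under a hypothetical parameterization in which the marginal benefit is the realized outcome \(y^2_{it}\) (its coefficient normalized to one) and the marginal cost is linear in the choice. The functional forms, instruments, and simulated data are all illustrative assumptions, not HKM’s.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical cross-section: y1 is the choice variable (say, years of schooling),
# y2 is a realized outcome of the choice (say, log earnings).
n = 5_000
y1 = rng.normal(2.0, 1.0, n)
y2 = 1.0 + 0.5 * y1 + rng.normal(0.0, 1.0, n)

def moment_conditions(theta, y1, y2):
    # Illustrative parameterization: marginal benefit is the realized outcome y2
    # (coefficient normalized to one), marginal cost is theta[0] + theta[1] * y1.
    # The first-order condition E[MB - MC] = 0 is interacted with a constant and
    # with y1, giving two moments for two parameters (just-identified GMM).
    g = y2 - theta[0] - theta[1] * y1          # MB - MC, individual by individual
    return np.array([g.mean(), (g * y1).mean()])

def gmm_objective(theta, y1, y2):
    gbar = moment_conditions(theta, y1, y2)
    return gbar @ gbar                          # identity weighting matrix

theta_hat = minimize(gmm_objective, x0=np.zeros(2),
                     args=(y1, y2), method="Nelder-Mead").x
print("GMM estimate of (theta0, theta1):", theta_hat)
```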

So, what does including an aggregate shock \(\nu_t\) do to this application? That depends on what it affects. Obviously, if it enters directly into the cost and benefit functions and can’t be represented by \(\theta\), this is a case of misspecification and causes bias, but there’s nothing about that pitfall which is particular to aggregate variables. The more interesting case is when the moments are correctly specified but the expectation is taken with respect to a set of outcomes which are affected by the aggregate uncertainty. The idea being: students decide to major in computer science, perceiving the expected wages to be high, but by the time they graduate, the industry slows down and the actual wages are not so high; or companies cut investment, perceiving that future sales will be low, but then demand picks up and the product is highly profitable. This could lead the naive researcher to two possible conclusions. If the cost and benefit functions are flexibly parameterized, the observation that most people made a decision with low observed returns could lead to estimates of subjective or unobserved benefits which are large, or costs which are low. The GMM estimates will say ‘students just really love computer science, so they take it even though the extrinsic rewards aren’t great.’ Alternatively, if fewer free parameters are provided, this could lead to the model specification being rejected, with the possible conclusion that individuals are not making informed decisions: students just don’t have a clue what their pay will be when they graduate and so choose majors which don’t pay. In both cases, the conclusion would be changed if the expectation were taken over a measure incorporating the uncertainty in \(\nu_t\): the sample moments in the cross section converge to \(E[MB(y^1_{it},y^2_{it},\theta)-MC(y^1_{it},y^2_{it},\theta)|\nu_t]\), which need not equal 0 at the true parameters, even though \(E_\nu [E[MB(y^1_{it},y^2_{it},\theta)-MC(y^1_{it},y^2_{it},\theta)|\nu_t]]=0\).
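
A toy simulation of this point, under an assumed data-generating process in which realized returns equal expected returns plus a single aggregate shock \(\nu_t\) shared by the whole cross section: the cross-sectional average of \(MB-MC\), evaluated at the true parameters, converges to \(\nu_t\) rather than to zero, even though it is zero on average across environments.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000  # a large cross section, observed in a single time period

# Made-up DGP: each student correctly expects return mu; the realized return is
# mu + nu_t + idiosyncratic noise, where nu_t is one aggregate shock shared by all.
mu = 1.0
nu_t = rng.normal(0.0, 0.5)                  # the environment this sample happens to sit in
realized = mu + nu_t + rng.normal(0.0, 1.0, n)

# "MB - MC" at the truth: realized return minus the (correct) expected return.
g = realized - mu
print("cross-sectional mean of MB - MC:", g.mean())   # ~ nu_t, not ~ 0
print("aggregate shock nu_t:           ", nu_t)

# The unconditional moment averages the conditional one over environments: E_nu[nu_t] = 0.
many_nu = rng.normal(0.0, 0.5, 10_000)
print("average over many environments: ", many_nu.mean())  # ~ 0
```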

In the above example, as well as in more serious cases where the variation in aggregates must be incorporated explicitly for the model to be correctly specified, the solution is to use information on variation in the aggregates.3 A simple way to do this is with long panels or repeated cross sections. A few words are in order regarding the asymptotics in these cases. An important point, emphasized by HKM, is that parameters identified by variation in aggregate variables generally have a distribution theory which is dominated by the aggregates. Consider the GMM example above, where aggregate variables do not enter explicitly into the formulation of the estimator at all. If the conditional distribution of the idiosyncratic variables is allowed to be affected by the aggregates in an arbitrary manner, the convergence of the empirical measure \(\frac{1}{NT}\sum_{t=1}^T\sum_{i=1}^N g(y_{it},\theta)\) to the joint measure is at a rate which depends only on T! That is, if the aggregate variables are stationary and, say, mixing, and both N and T approach infinity, the error in the approximation is \(O_p(1/\sqrt{T})\). N doesn’t enter at all, so long as it grows to infinity.4 The reason for this is pretty simple: the observations are not independent across i, since all variables at a given time are drawn from a distribution which depends on \(\nu_t\); effectively, each cross section counts as a single observation of the aggregate.
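
A quick Monte Carlo of the rate, under the simplest assumed structure \(y_{it}=\nu_t+\varepsilon_{it}\) with everything i.i.d.: multiplying N by fifty barely moves the standard deviation of the pooled mean, while increasing T does.

```python
import numpy as np

rng = np.random.default_rng(2)

def pooled_mean_sd(N, T, reps=2_000, sigma_nu=1.0, sigma_eps=1.0):
    """Monte Carlo s.d. of the pooled mean of y_it = nu_t + eps_it."""
    draws = np.empty(reps)
    for r in range(reps):
        nu = rng.normal(0.0, sigma_nu, T)            # aggregate shocks, iid over t here
        eps = rng.normal(0.0, sigma_eps, (T, N))     # idiosyncratic noise
        draws[r] = (nu[:, None] + eps).mean()
    return draws.std()

# N=100 and N=5000 give nearly the same s.d.; raising T from 10 to 40 roughly halves it.
print("N=100,  T=10:", pooled_mean_sd(100, 10))
print("N=5000, T=10:", pooled_mean_sd(5_000, 10))
print("N=100,  T=40:", pooled_mean_sd(100, 40))
```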

The case of completely arbitrary dependence creates difficulty for empirical researchers, given the relative rarity of data sets with long duration. While some effects may only be identified by this kind of variation (especially those which are “really” macroeconomic, like responses to aggregate variables, even if measured at the individual level), one can learn a lot from cross-sectional variation even in the presence of aggregate shocks, so long as there exists some kind of structure to the relationship. “Independence” is a very strong structure, but it can reasonably be conjectured for some objects which have no reason to be affected by aggregate variables, or at least not by aggregates that vary at the time scales one is interested in. More generally, additive effects constant across units are often imposed in estimation: in the case of panel data, this allows aggregate effects to be purged by the use of time dummies. Dropping the assumption of a constant effect across units, time dummies and heteroskedasticity-robust variance estimates together allow purging of purely additive effects. This assumption is quite powerful, allowing rates of convergence to go from depending only on T to depending only on N (in certain cases: in nonlinear models, purely idiosyncratic variability may induce dependence of the estimates on T). Still, it is a strong assumption, since it rules out any interaction of aggregates with other variables. Structured interactions (multiplication with a time dummy) can also be handled with little cost in rates of convergence.
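
A sketch of the additive case, with a made-up linear panel DGP in which the aggregate shock also shifts the regressor: pooled OLS ignoring the shock is biased, while demeaning within each time period (what a full set of time dummies does) recovers the coefficient.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 500, 8
beta = 2.0

# Hypothetical panel DGP: an additive aggregate shock nu_t enters y and also shifts x,
# so that omitting it biases pooled OLS.
nu = rng.normal(0.0, 1.0, T)
x = 0.8 * nu[:, None] + rng.normal(0.0, 1.0, (T, N))
y = beta * x + nu[:, None] + rng.normal(0.0, 1.0, (T, N))

# Pooled OLS ignoring the aggregate shock: biased, since x is correlated with nu_t.
b_pooled = (x * y).sum() / (x * x).sum()

# Demean x and y within each time period (the transformation a full set of time
# dummies performs), then run OLS on the demeaned data.
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_within = (xd * yd).sum() / (xd * xd).sum()

print("pooled OLS (no time dummies):", b_pooled)
print("OLS with time demeaning:     ", b_within)
```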

If more general relationships, such as a linear factor structure, are desired, identification generally requires growth in both N and T. Here aggregate shocks are allowed to have systematically different effects on different units, but the effects are linear and the dimensionality of the heterogeneity is restricted. This particular structure and its variants have spawned a large literature: the case in which the factors and their loadings are nuisance parameters and the object of interest is a regression coefficient on an observable covariate is referred to as the ‘interactive fixed effects’ model; see the many contributions of H.R. Moon on this topic. If one thinks of unit fixed effects as a type of permanent temporal dependence, unchanged at all times for an individual, and time effects as a type of cross-sectional dependence, unchanged across all individuals at a given time, interactive fixed effects allow for combinations of these two extremes. If one is willing to assume all microeconomic and macroeconomic effects can be subsumed within a linear model, this offers very substantial generality.
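
As a rough illustration of the idea (a toy version, not the exact estimators in that literature), the code below alternates between OLS for the regression coefficient and a principal-components extraction of a rank-one factor structure from the residuals. The DGP, the single factor, and the fixed iteration count are all assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, r = 200, 30, 1
beta_true = 1.5

# Hypothetical DGP with a one-factor interactive effect lambda_i * f_t that is
# correlated with the regressor, so time dummies alone would not suffice.
f = rng.normal(0.0, 1.0, T)
lam = rng.normal(1.0, 1.0, N)
x = 0.5 * np.outer(f, lam) + rng.normal(0.0, 1.0, (T, N))
y = beta_true * x + np.outer(f, lam) + rng.normal(0.0, 1.0, (T, N))

def interactive_fe(y, x, r, iters=200):
    """Toy alternating scheme: OLS for beta given the factor structure, then a
    rank-r principal-components fit of the residuals; not a production estimator."""
    beta = (x * y).sum() / (x * x).sum()          # start from pooled OLS
    for _ in range(iters):
        resid = y - beta * x                      # T x N residual matrix
        u, s, vt = np.linalg.svd(resid, full_matrices=False)
        common = (u[:, :r] * s[:r]) @ vt[:r, :]   # rank-r approximation of the common component
        beta = (x * (y - common)).sum() / (x * x).sum()
    return beta

print("pooled OLS:               ", (x * y).sum() / (x * x).sum())
print("interactive fixed effects:", interactive_fe(y, x, r))
```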

It is also popular to overcome these issues by imposing a structure which allows sequential identification of parameters regarding the cross-section which can then be used in a time series context, or vice versa. These methods are particularly popular in the fields of finance and accounting, where reasonably long panel data sets are available for assets or firms, and a number of heuristic approaches to multi-step inference have been developed. These are often called ‘Fama-MacBeth regressions’ after the approach used by Fama and MacBeth in tests of the CAPM, which involved running time-series regressions for each stock to find its beta and then using the estimated betas as data in a cross-section regression of returns on beta.
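
A compact sketch of the two-pass procedure on simulated data; the single-factor DGP and parameter values are made up, and the standard errors are the plain Fama-MacBeth ones, with no adjustment for the estimation error in the first-pass betas.

```python
import numpy as np

rng = np.random.default_rng(5)
n_stocks, T = 100, 240

# Hypothetical DGP: excess returns r_it = gamma0 + beta_i * f_t + eps_it, where the
# factor's mean is the price of risk the second pass should recover on average.
gamma0, price_of_risk = 0.01, 0.05
f = price_of_risk + rng.normal(0.0, 0.15, T)
beta_true = rng.uniform(0.5, 1.5, n_stocks)
r = gamma0 + np.outer(f, beta_true) + rng.normal(0.0, 0.10, (T, n_stocks))

# First pass: a time-series regression per stock of returns on the factor -> beta_hat_i.
F = np.column_stack([np.ones(T), f])
beta_hat = np.linalg.lstsq(F, r, rcond=None)[0][1, :]        # slope on f for each stock

# Second pass: a cross-sectional regression per period of returns on beta_hat -> gamma_t.
X = np.column_stack([np.ones(n_stocks), beta_hat])
gammas = np.array([np.linalg.lstsq(X, r[t], rcond=None)[0] for t in range(T)])

# Fama-MacBeth estimates and standard errors: time-series mean and s.e. of the gamma_t.
est = gammas.mean(axis=0)
se = gammas.std(axis=0, ddof=1) / np.sqrt(T)
print("intercept, slope:", est)
print("standard errors: ", se)
```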

HKM cover a similar but more general class of structures, with the advantage of allowing any kind of nonlinearity in the cross-sectional and aggregate effects, and of properly accounting for the aggregate uncertainty when constructing standard errors, which standard approaches often ignore. To be precise, what they impose is a kind of separability assumption: given one data set with a large cross section but short T and one data set with a long span but only aggregate variables, one can identify the full set of model parameters. The examples they give have the same kind of ‘triangular’ identification structure as Fama-MacBeth, in which a subset of parameters is identifiable from one data set and, once those parameters are known, the remaining parameters are identifiable from the other; but the high-level conditions they impose don’t require this, and the methods allow cases in which, for example, the cross section identifies the sum of two parameters and the time series their difference.

The idea is simple. Consider again the education choice example. While in principle the decision could depend on quite a variety of cultural, institutional, and macroeconomic factors, the most directly relevant one is the wages of the occupations available to different majors. If we can model how these evolve and parameterize the decision rule as a function of wages, we may then hope to have a measure of the reasonably expected costs and benefits of the choice at any fixed time, one which does not require observing a long time series of decisions in different environments: we just observe the environment in the cross section we do have and plug a reasonable forecast, given that environment, into the decision rule. A nice feature of this method is that the forecast need not come from the same data set. If we have a separate data set with a long time series of wages by occupation, we can estimate a forecasting rule from that data and use the estimates to adjust the measures of expected costs and benefits in a data set with a cross-sectional sample of individuals. With this information, one can either back out the subjective benefits of the decision or more accurately assess whether the decisions were ex ante reasonable and well-informed. The downside of methods of this type, relative to using a long panel under the assumption of arbitrary dependence, is the need to specify a forecasting rule which can be estimated using only a time series of aggregate information (or possibly such a time series plus a small set of parameters which can be inferred from a cross section). This requires taking a stand on how to model aggregate variables and their effects on individuals, but then so does assuming no effect and proceeding with only cross-sectional data.
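
Schematically, and with every functional form invented for the purpose, the two-data-set logic might look like this: fit an AR(1) forecasting rule to a long aggregate wage series, carry the implied forecast into a single cross section observed at the end of that history, and use it to back out a subjective cost parameter from the observed choice share.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)

# Data set 1 (aggregate only): a long time series of the occupation's mean log wage,
# generated here from an AR(1) purely for illustration.
T = 200
c_true, rho_true = 0.5, 0.8
w = np.empty(T)
w[0] = c_true / (1 - rho_true)
for t in range(1, T):
    w[t] = c_true + rho_true * w[t - 1] + rng.normal(0.0, 0.1)

# Step 1: estimate the forecasting rule w_{t+1} = c + rho * w_t from the time series.
X = np.column_stack([np.ones(T - 1), w[:-1]])
c_hat, rho_hat = np.linalg.lstsq(X, w[1:], rcond=None)[0]

# Data set 2 (cross section): individuals observed once, at the end of the series,
# choosing the major if the forecast wage exceeds their individual subjective cost.
n = 5_000
forecast_wage = c_hat + rho_hat * w[-1]        # model-implied expected wage given the environment
cost = rng.normal(2.4, 0.5, n)                 # hypothetical subjective costs, mean unknown to us
chose = (forecast_wage - cost > 0)

# Step 2: back out the mean subjective cost from the observed choice share, treating
# the cost dispersion (0.5) as known: share = Phi((forecast_wage - mu_c) / 0.5).
mu_c_hat = forecast_wage - 0.5 * norm.ppf(chose.mean())
print("estimated forecasting rule: c = %.3f, rho = %.3f" % (c_hat, rho_hat))
print("implied mean subjective cost: %.3f (true value 2.4)" % mu_c_hat)
```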

Inference in this setup takes a form very similar to standard two-step estimation, with an adjustment to account for the estimation error in the forecasting rule when estimating the cross-section parameters. The paper also expends quite a bit of effort on the case in which the forecasting rule involves a unit root or near-unit-root process. In addition to the usual complications from unit root limit theory,5 there is a conceptual issue in this case: the prediction from a unit root process is history dependent, with initial conditions never washing out in the limit. Since the cross-sectional data is observed at some point in the history of the process, the limiting uncertainty about the process is affected by the point in history at which the cross section is observed.
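
Heuristically, and setting aside the stable-convergence machinery and the near-unit-root complications that are HKM’s real contribution, the flavor of the correction is the familiar one for a just-identified second step with cross-sectional moments \(E[g(y_{it},\theta,\gamma)]=0\) and a first-step \(\hat\gamma\) estimated from an independent aggregate time series:

\[
\widehat{\mathrm{Var}}(\hat\theta)\;\approx\;\hat G_\theta^{-1}\left[\frac{\hat\Omega_g}{n}\;+\;\hat G_\gamma\,\widehat{\mathrm{Var}}(\hat\gamma)\,\hat G_\gamma'\right]\hat G_\theta^{-1\prime},
\qquad
\hat G_\theta=\frac{\partial \bar g}{\partial\theta'},\quad
\hat G_\gamma=\frac{\partial \bar g}{\partial\gamma'},
\]

where \(\hat\Omega_g\) is the variance of the cross-sectional moments and \(\widehat{\mathrm{Var}}(\hat\gamma)\) is of order \(1/T\), so the relative sizes of the two samples determine which source of uncertainty dominates; the exact form, and the sense in which it holds conditionally on the aggregate history, is what the paper’s limit theory delivers.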

The overall message of this line of inquiry is that ‘individual’ and ‘aggregate’ effects cannot always be cleanly separated, and that when one depends on the other, our knowledge of both may be limited by the one on which information is most scarce. Information can be economized on by imposing structure, but since generally speaking the information on aggregates is most limited, this may require explicit modeling of aggregates. In other words, seeking to answer microeconomic questions does not offer an escape from the need to think about macroeconomics. Dreary, right?


  1. Not to be confused with convergence to a stable distribution. Stable limit theorems strictly speaking differ from convergence in distribution, as the limiting distribution in convergence in distribution need not live on the same probability space as the variables approaching the limit and so does not necessarily have any particular relationship with the conditioning variable. Stable convergence is, however, still weaker than the almost sure convergence ensured by, say, the Skorokhod representation theorem, which ensures the existence of a representation of the data which is measurable with respect to the same sigma-field as the limiting variable, or strong coupling results like the KMT strong approximation, which find a sequence of approximating variables which live on the same space as the data. Instead, the limit variable lives on a separate sigma-field which contains the sub-sigma-field generated by the conditioning variable but not the full sigma-field on which all the data live. (See Definition 1 in HKM for a formal statement.) The advantage of threading the needle in this way is that one can retain the influence of the ‘aggregate’ variables while relying only on conditions much closer to those used to ensure weak convergence, like a standard (martingale) central limit theorem. As most limit theory in econometrics relies on weak convergence results, partly for robustness and partly for historical reasons, this approach can be applied to most standard and many nonstandard estimators.

  2. It was later found that GMM offers an organizing framework for a wide variety of estimation procedures and nowadays it finds many applications divorced from this specific economic context. But Lars Hansen’s motivating examples were in asset pricing, where the consumption Euler equation takes precisely the form of equating subjective marginal cost and benefit.

  3. As I understand it, this is indeed being done in modern labor economics research, with substantial focus on how decisions are influenced by the economic environment, making the example a bit of a straw man; but properly accounting for the uncertainty induced by this variation is still an important issue.

  4. This result is similar to but not quite what HKM show in Section 3.1 where they discuss the issue informally, or in their stable limit theorem, which applies to a different case. In a class paper I wrote a few years ago there is a formal proof of a CLT for long panels with aggregate shocks by triangular array asymptotics which gives a precise statement. Interestingly, the proof is not my own: I explained the problem to Don Andrews during office hours and he derived the result within about 15 minutes.

  5. Basically, establishing convergence to stochastic integrals by showing weak convergence by tightness in the Skorokhod topology. While highly involved and nontrivial, the unit root case avoids one technical complication common to other models built on empirical processes, as the empirical process in the estimator for the unit root model is measurable in the sigma-field over the Skorokhod topology, while empirical processes in general need not be measurable, something of a complication for stable convergence. This complication shows up in a variety of non-smooth or semiparametric estimators such as simulated method of moments or quantile regression which one might want to use for either the cross-section or time-series component of the model. In the standard case this is resolved by the machinery of weak convergence in outer measure; I suspect that it could be extended to stable convergence in a straightforward fashion, but it remains to be done.
