Outline
- Moving Average Probability Model
- Properties
- Relation to Autoregression
- Extensions
- ARIMA model
- Seasonal ARIMA Model
- Multivariate Models
- Application: Macro Forecasting
Probability Models and Hidden Variables
- With additive and autoregressive models, series is sum of many fixed components and one residual or error term
- \(y_t=s(t)+h(t)+g(t)+\sum_{j=1}^{p}b_jy_{t-j}+\epsilon_t\)
- Given past data \(\mathcal{Y}_{t-1}\), all randomness in predicted distribution comes from \(\epsilon_t\)
- This is convenient for producing a predictor and writing down a likelihood
- If \(\epsilon_t\sim f(x)\), conditional likelihood of observation \(t\) is \(\ell(y_t|\mathcal{Y}_{t-1})=f(y_t-s(t)-h(t)-g(t)-\sum_{j=1}^{p}b_jy_{t-j})\) (a short code sketch follows at the end of this section)
- Interpretation is that all predictive information is encoded in additive terms
- In principle, there can be multiple sources of variation in \(y_t\) which are not seen
- “Hidden”, or latent variables, which may evolve over time, may affect \(y_t\)
- For forecasting, we don’t care about the hidden variables themselves: don’t need to figure out what they are
- But they can create detectable patterns in the series we do care about
- Allowing for hidden variables can allow simple way to account for these patterns
- But also create challenges due to indirect and complicated relationship between model and predictor
- The Moving Average (MA) Model is one of the most popular and useful hidden variables models
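A minimal sketch of the conditional likelihood construction above, for an AR(1)-plus-constant model with normal errors; the helper function name and parameter values are illustrative, not part of the lecture code.
#Sketch: conditional Gaussian log-likelihood of an AR(1) model with intercept mu (hypothetical helper)
cond_loglik<-function(y,mu,b1,sigma){
  resid<-y[-1]-mu-b1*y[-length(y)] #epsilon_t = y_t - mu - b1*y_{t-1}, for t = 2..T
  sum(dnorm(resid,mean=0,sd=sigma,log=TRUE)) #Sum of log f(epsilon_t), conditioning on past data
}
#Example call: cond_loglik(rnorm(100),mu=0,b1=0.5,sigma=1)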
The Moving Average Model: Origin
- Russian economist Eugen Slutsky investigated how regular economic patterns could arise from pure randomness
- He took a long string of random numbers from the lottery, and took the average of the first 10
- Then he shifted the window over by 1, taking an average of the 2nd through 11th numbers
- He repeated this through the full string, producing a sequence of overlapping averages, a moving average
- In this process, he produced a sequence that looked remarkably like the fluctuations observed in economic series
- He theorized that such a mechanism, where sequences of purely random shocks are combined to produce a single number, might describe the origins of economic cycles
- Today, moving average refers to a process generated by overlapping weighted combinations of random variables
The Moving Average Model
- Consider a sequence of mean 0 uncorrelated shocks \(e_t\) \(t=1\ldots T\) which are not observed
- \(E[e_t]=0\), \(E[e_te_{t-h}]=0\) for all t, for all \(h\neq0\)
- The observed data is still \(\{y_t\}_{t=1}^T\)
- The Moving Average Model (of order q) says that \(y_t\) is a weighted sum of present and past shocks
- For all \(t\), \(y_t=e_t+\sum_{j=1}^{q}\theta_{j}e_{t-j}\)
- Coefficients \(\theta=\{\theta_j\}_{j=1}^{q}\) determine relationship between observations over time
- In lag polynomial notation, let \(\theta(L)=1+\sum_{j=1}^{q}\theta_jL^j\), then \(y_t=\theta(L)e_t\)
- To produce a full likelihood, strengthen assumption to \(e_t\overset{iid}{\sim}f()\), usually \(N(0,\sigma_e^2)\)
- Compare to the Autoregression model: \(y_t\) is a weighted sum of a present shock and past observations
- For all \(t\), \(y_t=e_t+\sum_{j=1}^{p}b_{j}y_{t-j}\)
- Lag polynomial representation \(b(L)y_t=e_t\) has lags on the left side instead of the right
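As a quick illustration of the MA(q) definition, the following sketch simulates an MA(2) series in R; the coefficients and sample size are arbitrary choices, not taken from the application below.
set.seed(123) #For reproducibility
masim<-arima.sim(model=list(ma=c(0.6,0.3)),n=200) #y_t = e_t + 0.6*e_{t-1} + 0.3*e_{t-2}, e_t ~ N(0,1)
autoplot(masim)+labs(title="Simulated MA(2) series") #Plotting requires fpp2, loaded in the application below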
Properties
- The appeal of the MA model comes from the fact that although \(e_t\) never directly seen, the properties of the data are characterized explicitly by the model
- In particular, the Autocovariance function is determined by parameters \(\theta\)
- \(\gamma(h):=Cov(y_{t},y_{t-h})=Cov(e_t+\sum_{j=1}^{q}\theta_{j}e_{t-j},e_{t-h}+\sum_{k=1}^{q}\theta_{k}e_{t-k-h})\) \(=\sigma_e^2\sum_{j=0}^{q}\sum_{k=0}^q\theta_{j}\theta_{k}\delta_{j=k+h}\), where \(\theta_0:=1\)
- In words, covariances are determined by the moving average components shared between times
- Consider, e.g. MA(1), \(y_t=e_t+\theta_1e_{t-1}\)
- \(\gamma(0)=Var(y_t)=\sigma_e^2(1+\theta_1^2)\), \(\gamma(1)=Cov(y_t,y_{t-1})=\sigma_e^2\theta_1\), \(\gamma(j)=0\), for all \(j\geq 2\)
- Last property, that autocovariance function drops to 0 at \(q+1\), is true for any \(MA(q)\)
- Beyond the window of length q, the moving averages do not overlap, and observations are uncorrelated
- MA(q) model allows modeling short term correlations up to length \(q\) horizon: predicted mean goes to 0 in finite time
- By allowing long enough \(q\), can, with many coefficients, describe very general patterns of relationships
- As the MA model is linear, like the AR model it does not allow for general patterns in higher-order properties like conditional variance, skewness, etc.
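A quick numerical check of the cutoff property, using the theoretical ACF from ARMAacf() for an MA(2) with illustrative coefficients.
ARMAacf(ma=c(0.6,0.3),lag.max=5) #Theoretical autocorrelations: nonzero at lags 1-2, exactly 0 beyond q=2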
Challenge: Identifiability
- Difficulty of models with latent variables is that one cannot learn about the \(e_t\) directly, but only their properties
- With two or more random components determining one observation, might not be able to distinguish which one was the source of any change
- This is called an identification problem in econometrics
- Consider MA(1) processes \(y_t=e_t+\frac{1}{2}e_{t-1}\), \(e_t\overset{iid}{\sim}N(0,\sigma_{e}^2)\) and \(y_t=u_t+2u_{t-1}\), \(u_t\overset{iid}{\sim}N(0,\frac{\sigma_{e}^2}{4})\)
- We see ACF of first is \(\gamma(0)=\frac{5}{4}\sigma_e^2\), \(\gamma(1)=\frac{\sigma_e^2}{2}\)
- ACF of second process is exactly the same, so by normality, distribution is also exactly the same
- This is an example of a general phenomenon: factorizing \(\theta(L)=\Pi_{j=1}^{q}(1+t_jL)\) with inverse roots \(t_j\), there is an equivalent representation \(\tilde{\theta}(L)=\Pi_{j=1}^{q}(1+\frac{1}{t_j}L)\) with flipped inverse roots and a different \(\sigma^2_e\)
- Does this make a difference? Not for forecasting: properties of series the same either way
- Can simply restrict attention to the invertible representation, with all inverse roots inside the unit circle, for a Bayesian or statistical approach
- General lesson is that when building model out of unobserved parts, might not be able to learn all about them
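The identification problem in the example above can be verified numerically: the two MA(1) parameterizations imply identical autocorrelations (the shock variances differ, but autocorrelations do not depend on them).
ARMAacf(ma=0.5,lag.max=2) #rho(1)=0.4, rho(2)=0
ARMAacf(ma=2,lag.max=2)   #Identical: rho(1)=0.4, rho(2)=0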
Relationship to Autoregression Model
- Properties of MA model can be seen by repeated substitution
- Consider MA(1) \(y_t=e_t+\theta_1e_{t-1}\) rearrange as \(e_t=y_{t}-\theta_1e_{t-1}\)
- Can substitute in past values \(e_{t-1}\) into this formula repeatedly
- \(e_t=y_{t}-\theta_1(y_{t-1}-\theta_1e_{t-2})=y_{t}-\theta_1y_{t-1}+\theta^2_1e_{t-2}\)
- \(=y_{t}-\theta_1y_{t-1}+\theta^2_1(y_{t-2}-\theta_1e_{t-3})=y_{t}-\theta_1y_{t-1}+\theta^2_1y_{t-2}-\theta^3_1e_{t-3}=\ldots\)
- Continuing indefinitely, have, in lag polynomial notation, \((1+\sum_{j=1}^{\infty}(-\theta_1)^jL^j)y_t=e_t\)
- This is exactly an (infinite order) autoregression model
- This equivalence holds in general: a finite order MA model is an infinite order AR model
- Intuition: because \(e_t\) never seen exactly, must use all past information in \(\mathcal{Y}_t\) to predict it
- Observable implication: PACF will decay to 0 smoothly, rather than cutting off at lag \(p\) as it does for an AR(p)
- Equivalence also holds in reverse: a finite order AR model is an infinite order MA model
- Repeatedly substituting, AR(1) \(y_t=b_1y_{t-1}+e_t\) becomes \(y_t=e_t+\sum_{j=1}^{\infty}b^j_1e_{t-j}\)
- Can use either, but one representation may be much less complex
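A small sketch of the observable implication: on simulated data, the MA(1) ACF cuts off after lag 1 while its PACF decays smoothly, and the AR(1) shows the reverse pattern. The coefficients are arbitrary; ggAcf/ggPacf come with fpp2 and grid.arrange with gridExtra, both loaded in the application below.
set.seed(42)
ma1sim<-arima.sim(model=list(ma=0.7),n=500) #Simulated MA(1)
ar1sim<-arima.sim(model=list(ar=0.7),n=500) #Simulated AR(1)
grid.arrange(ggAcf(ma1sim),ggPacf(ma1sim),ggAcf(ar1sim),ggPacf(ar1sim),nrow=2) #Compare ACF/PACF signatures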
Estimation
- Because the shocks are not observed, likelihood formula for MA model surprisingly complicated
- In general, no closed form formula exists for conditional likelihood
- Reason is that \(y_t\) and \(y_{t-1}\) both depend on \(e_{t-1}\), but to know what part of \(y_{t-1}\) comes from \(e_{t-1}\) need to know what comes from \(e_{t-2}\), which affects \(y_{t-2}\) which requires… etc
- A variety of solutions exist which nevertheless allow valid estimates
- Likelihood can be constructed by recursive algorithm, step by step
- Requires conditioning each period, which requires using Bayes rule, which requires integration
- But if \(e_t\sim N(0,\sigma_e^2)\), there is a fast and exact recursive algorithm
- Typical approach: use likelihood from normal case even if you don’t think distribution is normal
- This is called a quasi- or pseudo-likelihood, and \(\theta\) can be estimated by maximizing it: this is the default method in R's arima command
- Or use penalized estimation, or do Bayesian inference with it
- Alternative 1: Choose parameters \(\widehat{\theta}\) to match estimated ACF to model-implied ACF
- Utilizes fact that only covariances are modeled by process, doesn’t need normality
- Alternative 2: Convert to infinite AR form and estimate by least squares, truncating at some finite order
- Not exact, but since decay usually exponential, truncated part is close to negligible and prediction becomes very easy: just use AR formulas
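A minimal sketch of two of these estimation approaches on a simulated MA(1); the true coefficient 0.7 is an arbitrary choice.
set.seed(7)
y_ma<-arima.sim(model=list(ma=0.7),n=300) #Simulated MA(1) data
Arima(y_ma,order=c(0,0,1)) #Exact Gaussian (quasi-)likelihood via a recursive algorithm (forecast package)
ar(y_ma,order.max=10,aic=FALSE,method="ols") #Alternative 2: truncated AR approximation fit by least squares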
Combinations: ARMA models
- To efficiently match all features of correlation patterns, can also use both AR and MA
- An Autoregressive Moving Average Model of Orders p and q or ARMA(p,q) model has form, for all t, \[y_{t}=\sum_{j=1}^{p}b_jy_{t-j}+e_{t}+\sum_{k=1}^{q}\theta_{k}e_{t-k}\]
- Where \(e_t\) is still a mean 0 white noise sequence
- In lag polynomial notation \((1-\sum_{j=1}^{p}b_jL^j)y_t=(1+\sum_{k=1}^{q}\theta_kL^k)e_t\)
- By same equivalency results, an ARMA model is equivalent to an infinite AR model or an infinite MA model
- Can denote infinite MA representation as \(y_t=\psi(L)e_t:=\frac{\theta(L)}{b(L)}e_t\)
- ARMA model allows finite representation for very general patterns
- Estimation again usually by normal (quasi-)likelihood, with recursive algorithm
- In addition to invertibility condition for MA roots, have one more equivalency
- If factorizations of \(b(L)\) and \(\theta(L)\) share a root, can factor out from both sides and get model with exact same predictions
- Not a problem for prediction: just present in factorized form
- Estimation can behave weirdly when roots are “close” (worse approximation, etc)
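A short sketch: simulate an ARMA(1,1) with arbitrary coefficients and fit it by the normal (quasi-)likelihood.
set.seed(10)
y_arma<-arima.sim(model=list(ar=0.5,ma=0.4),n=300) #Simulated ARMA(1,1)
Arima(y_arma,order=c(1,0,1)) #Fit ARMA(1,1) by Gaussian quasi-likelihood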
ARIMA models
- Condition for stationarity of an ARMA model is that the AR part satisfies the conditions for stationarity of an AR model
- All roots of the lag polynomial \(b(z)=1-\sum_{j=1}^{p}b_jz^j\) are outside the unit circle
- In the case of d unit roots, differencing \(y_t\) d times can restore stationarity
- If \(\Delta^d y_t\) is an ARMA(p,q) process, \(y_t\) is called an ARIMA(p,d,q) process
- ARIMA model allows long run random trend, plus very general short run patterns
- Exactly the same unit root tests as in AR case apply: run Phillips-Perron, KPSS, or ADF to determine d
auto.arima executes the following steps
- Test \(y_t\) for unit roots by KPSS test, differencing if found to be nonstationary, repeating until stationarity
- Represent ARMA (quasi)likelihood by recursive algorithm at different orders \((p,q)\)
- Use AICc to choose orders, then estimate \(b,\theta\) by maximizing (quasi)likelihood
- This takes statistical approach to prediction: looks for best model of data
- Performs well for mean forecasting over models close to ARIMA class
- Bayesian approach requires priors over all AR and MA coefficients
- Often use infinite AR representation for simplicity: use priors to discipline large number of coefficients
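A sketch of the selection procedure on simulated data; the true process here is an assumed ARIMA(1,1,1), and auto.arima will not always recover the exact orders in a finite sample.
set.seed(11)
y_int<-arima.sim(model=list(order=c(1,1,1),ar=0.5,ma=0.4),n=300) #Simulated ARIMA(1,1,1)
auto.arima(y_int,seasonal=FALSE) #KPSS test chooses d, AICc chooses (p,q), coefficients by quasi-ML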
Seasonal Models
- Use ARIMA around a trend model to account for deterministic growth, seasonality, etc.
- Can also extend model to add non-deterministic seasonal patterns
- For a series of frequency \(m\), may have particular relationships across intervals of length \(m\) or regular subsets
- Can create seasonal versions of ARIMA models by allowing relationships across \(m\) lags
- Useful for modeling commonly seen spikes in ACF at seasonal intervals
- Eg. strong December sales one year may be followed by strong December sales next year, on average
- Seasonal AR(1) is \(y_t=B_1y_{t-m}+e_t\), seasonal MA(1) is \(y_t=e_t+\Theta_1e_{t-m}\), seasonal first difference is \(y_t-y_{t-m}\)
- Can combine and extend to higher orders to create Seasonal ARIMA (P,D,Q)
- \((1-\sum_{j=1}^{P}B_jL^{mj})(1-L^m)^Dy_t=(1+\sum_{k=1}^{Q}\Theta_kL^{mk})e_t\)
- Can add on top of standard ARIMA to match seasonal and cyclical patterns by multiplying lag polynomials
- Seasonal ARIMA(p,d,q)(P,D,Q) takes form \((1-\sum_{n=1}^{p}b_nL^{n})(1-\sum_{j=1}^{P}B_jL^{mj})(1-L)^d(1-L^m)^Dy_t=(1+\sum_{k=1}^{q}\theta_kL^{k})(1+\sum_{l=1}^{Q}\Theta_lL^{ml})e_t\)
- Estimation similar to standard ARIMA: test for integration order, use quasi-maximum likelihood to fit coefficients
- Seasonal components permitted by default in auto.arima: can exclude if series known to be deseasonalized
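A minimal seasonal example: fit a seasonal ARIMA to a monthly series that ships with R; the orders below are illustrative choices, not selected by any test.
Arima(log(AirPassengers),order=c(1,0,0),seasonal=c(0,1,1)) #Seasonal ARIMA(1,0,0)(0,1,1)[12] on log monthly airline passengers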
Application Continued: Macroeconomic Forecasts
- Let’s take the 7 series from last class forecasting exercise, following Litterman (1986)
- GNP Growth, Inflation, Unemployment, M1 Money Stock, Private Fixed Investment, Commercial Paper Interest Rates, and Inventory Growth quarterly from 1971-2018
- Choose level of differencing using tests, then compare (by AICc) AR only, MA only, and ARMA choices for each series
- Implement by auto.arima with restrictions to AR order p or MA order q
- Series restricted to AR or MA only appear to need more parameters than if allowed to use both
- Unrestricted model for inflation is ARIMA(0,1,1), but if forced to MA(0), select AR(4): need many AR coefficients to approximate MA property
- Unemployment has reverse: choose ARIMA(2,0,0), but need MA(3) to match if AR order restricted to 0
- Resulting forecasts of most series similar across specifications due to rapid mean reversion
- MA reverts to mean (or trend if series integrated or trending) in finite time, AR reverts over infinite time, but distance decays exponentially fast
- Unemployment is only series where AR vs MA difference notable: slower mean reversion than other series
#Libraries
library(fredr) # Data from FRED API
library(fpp2) #Forecasting and Plotting tools
library(vars) #Vector Autoregressions
library(knitr) #Use knitr to make tables
library(kableExtra) #Extra options for tables
library(dplyr) #Data Manipulation
library(tseries) #Time series functions including stationarity tests
library(gridExtra) #Graph Display
# Package "BMR" for BVAR estimation is not on CRAN, but is instead maintained by an individual
# It must be installed directly from the Github repo: uncomment the following code to do so
# library(devtools) #Library to allow downloading packages from Github
# install_github("kthohr/BMR")
# Note that if running this code on Kaggle, internet access must be enabled to download and install the package
# If installed locally, there may be difficulties due to differences in your local environment (in particular, versions of C++)
# For this reason, relying on local installation is not recommended unless you have a spare afternoon to dig through help files
library(BMR) #Bayesian Macroeconometrics in R
##Obtain and transform NIPA Data (cf Lecture 08)
fredr_set_key("8782f247febb41f291821950cf9118b6") #Key I obtained for this class
## Load Series: Series choices and names as in Litterman (1986)
RGNP<-fredr(series_id = "GNPC96",
observation_start = as.Date("1971-04-01"),
observation_end = as.Date("2018-07-01"),
units="cch") #Real Gross National Product, log change
INFLA<-fredr(series_id = "GNPDEF",
observation_start = as.Date("1971-04-01"),
observation_end = as.Date("2018-07-01"),
units="cch") #GNP Deflator, log change
UNEMP<-fredr(series_id = "UNRATE",
observation_start = as.Date("1971-04-01"),
observation_end = as.Date("2018-07-01"),
frequency="q") #Unemployment Rate, quarterly
M1<-fredr(series_id = "M1SL",
observation_start = as.Date("1971-04-01"),
observation_end = as.Date("2018-07-01"),
frequency="q",
units="log") #Log M1 Money Stock, quarterly
INVEST<-fredr(series_id = "GPDI",
observation_start = as.Date("1971-04-01"),
observation_end = as.Date("2018-07-01"),
units="log") #Log Gross Domestic Private Investment
# The 4-6 month commercial paper rate series used in Litterman (1986) has been discontinued:
# For sample continuity, we merge the series for 3 month commercial paper rates from 1971-1997 with the 3 month non-financial commercial paper rate series
# This series also has the latest start date, so it dictates the start date for all series
CPRATE1<-fredr(series_id = "WCP3M",
observation_start = as.Date("1971-04-01"),
observation_end = as.Date("1996-10-01"),
frequency="q") #3 Month commercial paper rate, quarterly, 1971-1997
CPRATE2<-fredr(series_id = "CPN3M",
observation_start = as.Date("1997-01-01"),
observation_end = as.Date("2018-07-01"),
frequency="q") #3 Month AA nonfinancial commercial paper rate, quarterly, 1997-2018
CPRATE<-full_join(CPRATE1,CPRATE2) #Merge 2 series to create continuous 3 month commercial paper rate series from 1971-2018
CBI<-fredr(series_id = "CBI",
observation_start = as.Date("1971-04-01"),
observation_end = as.Date("2018-07-01")) #Change in Private Inventories
#Format the series as quarterly time series objects, starting at the first date
rgnp<-ts(RGNP$value,frequency = 4,start=c(1971,2),names="Real Gross National Product")
infla<-ts(INFLA$value,frequency = 4,start=c(1971,2),names="Inflation")
unemp<-ts(UNEMP$value,frequency = 4,start=c(1971,2),names="Unemployment")
m1<-ts(M1$value,frequency = 4,start=c(1971,2),names="Money Stock")
invest<-ts(INVEST$value,frequency = 4,start=c(1971,2),names="Private Investment")
cprate<-ts(CPRATE$value,frequency = 4,start=c(1971,2),names="Commercial Paper Rate")
cbi<-ts(CBI$value,frequency = 4,start=c(1971,2),names="Change in Inventories")
#Express as a data frame
macrodata<-data.frame(rgnp,infla,unemp,m1,invest,cprate,cbi)
nlags<-6 # Number of lags to use
nseries<-length(macrodata[1,]) #Number of series used
Series<-c("Real GNP Growth","Inflation","Unemployment","Money Stock","Private Investment","Commercial Paper Rate","Change in Inventories")
#Use auto.arima to choose AR order after KPSS test without trend
#Do this also for MA, and for ARMA
ARIstatmodels<-list()
IMAstatmodels<-list()
ARIMAstatmodels<-list()
Integrationorder<-list()
ARorder<-list()
MAorder<-list()
ARorder2<-list()
MAorder2<-list()
for (i in 1:nseries){
  ARIstatmodels[[i]]<-auto.arima(macrodata[,i],max.q=0,seasonal=FALSE) #Apply auto.arima set to (nonseasonal) ARI only
  IMAstatmodels[[i]]<-auto.arima(macrodata[,i],max.p=0,seasonal=FALSE) #Apply auto.arima set to (nonseasonal) IMA only
  ARIMAstatmodels[[i]]<-auto.arima(macrodata[,i],seasonal=FALSE) #Apply auto.arima set to (nonseasonal) ARIMA
  Integrationorder[i]<-ARIMAstatmodels[[i]]$arma[6] #Integration order chosen (uses KPSS Test)
  ARorder[i]<-ARIstatmodels[[i]]$arma[1] #AR order chosen in AR only (uses AICc)
  MAorder[i]<-IMAstatmodels[[i]]$arma[2] #MA order chosen in MA only (uses AICc)
  ARorder2[i]<-ARIMAstatmodels[[i]]$arma[1] #AR order chosen in ARMA (uses AICc)
  MAorder2[i]<-ARIMAstatmodels[[i]]$arma[2] #MA order chosen in ARMA (uses AICc)
}
Estimated AR, MA, and ARMA orders for Macro Series
armamodels<-data.frame(as.numeric(Integrationorder),as.numeric(ARorder),
as.numeric(MAorder),as.numeric(ARorder2),as.numeric(MAorder2))
rownames(armamodels)<-Series
colnames(armamodels)<-c("d","p (AR only)","q (MA only)","p (ARMA)","q (ARMA)")
armamodels %>%
kable(caption="Autoregression, Moving Average, and ARMA Models") %>%
kable_styling(bootstrap_options = "striped")
Autoregression, Moving Average, and ARMA Models

| Series | d | p (AR only) | q (MA only) | p (ARMA) | q (ARMA) |
|---|---|---|---|---|---|
| Real GNP Growth | 0 | 2 | 3 | 1 | 1 |
| Inflation | 1 | 4 | 1 | 0 | 1 |
| Unemployment | 0 | 2 | 3 | 2 | 0 |
| Money Stock | 1 | 2 | 3 | 1 | 1 |
| Private Investment | 1 | 1 | 1 | 1 | 0 |
| Commercial Paper Rate | 1 | 3 | 2 | 0 | 2 |
| Change in Inventories | 1 | 1 | 1 | 1 | 1 |
Forecasts from ARI, IMA, and ARIMA
#Construct Forecasts of Each Series by Univariate ARI, IMA, ARIMA models, with 95% confidence intervals
ARIfcsts<-list()
ARIMAfcsts<-list()
IMAfcsts<-list()
for (i in 1:nseries) {
  ARIfcsts[[i]]<-forecast::forecast(ARIstatmodels[[i]],h=20,level=95)
  ARIMAfcsts[[i]]<-forecast::forecast(ARIMAstatmodels[[i]],h=20,level=95)
  IMAfcsts[[i]]<-forecast::forecast(IMAstatmodels[[i]],h=20,level=95)
}
forecastplots<-list()
for (i in 1:nseries){
  pastwindow<-window(macrodata[,i],start=c(2000,1))
  #Plot all forecasts
  forecastplots[[i]]<-autoplot(pastwindow)+
    autolayer(ARIMAfcsts[[i]],alpha=0.4,series="ARIMA")+
    autolayer(ARIfcsts[[i]],alpha=0.4,series="ARI")+
    autolayer(IMAfcsts[[i]],alpha=0.4,series="IMA")+
    labs(x="Date",y=colnames(macrodata)[i],title=Series[i])
}
grid.arrange(grobs=forecastplots,nrow=4,ncol=2)
The Multivariate Case
- Vector Autoregression (VAR), Vector Moving Average (VMA), and Vector ARMA (VARMA) models similar to 1 variable case, but with more coefficients
- Vector Autoregression much more commonly used than VMA or VARMA cases, so discuss only this
- By equivalence relationships, can represent all as infinite order VAR
- With \(m\) variables, \(y_t=(y_{1,t},\ldots,y_{m,t})\in\mathbb{R}^m\)
- For all \(t=p+1\ldots T\), \(i=1\ldots m\), the VAR(p) model is \(y_{i,t}=b_{i,0}+\sum_{k=1}^{m}\sum_{j=1}^{p}b_{i,jk}y_{k,t-j}+\epsilon_{i,t}\)
- Where for all t, k, j, \(h\neq0\), \(E[\epsilon_{k,t}]=0\), \(E[\epsilon_{k,t}\epsilon_{j,t+h}]=0\)
- Collecting coefficients in matrices, can write as \(y_t=B_0+\sum_{j=1}^{p}B_jy_{t-j}+\epsilon_t\)
- Typical to assume \(\epsilon_t\overset{iid}{\sim} N(0,\Sigma_\epsilon)\), where \(\Sigma_\epsilon\in\mathbb{R}^{m\times m}\) gives present covariance
- (log) Likelihood again takes (multivariate) normal form: weighted least squares
- Stationarity conditions analogous to m=1 case: roots (of particular polynomial) inside unit circle
- The roots function in the vars library can display these
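A minimal sketch using the vars package and the macrodata frame constructed in the application above; the lag order 2 is an arbitrary choice for illustration.
varfit<-VAR(macrodata,p=2,type="const") #Estimate VAR(2) with a constant by equation-by-equation least squares
roots(varfit) #Moduli of companion-matrix eigenvalues: all less than 1 indicates stationarity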
Multivariate VAR Priors
- Can construct multivariate version of normal-gamma priors for VARs which is also conjugate
- Multivariate normal \(N(\mu_B,\Sigma_B)\) prior on \(B_j\), \(j=1\ldots p\) with covariances \(\Sigma_B\)
- With \(m\) variables and \(p\) lags, each equation has \(mp+1\) coefficients, for \(m(mp+1)\) in total, so an unrestricted prior covariance matrix has \((m(mp+1))^2\) entries to choose just here
- Inverse Wishart distribution \(W(\Psi,\nu)\) on \(\Sigma_\epsilon\): the multivariate analogue of the inverse gamma distribution
- \(\Psi\in\mathcal{R}^{m\times m}\) is prior guess of \(\Sigma_\epsilon\), \(\nu\in\mathbb{R}\) sets concentration
- Minnesota priors due to Doan, Litterman, Sims (1984), at FRB Minneapolis can help simplify choices
- Goal is to express typical properties of macroeconomic series for business cycle forecasting
- Prior mean for \(b_{i,1i}=1\), all other coefs 0 (equivalently, \(B_1=I_m\) identity matrix)
- Each series centered on random walk in just itself, with lags and cross-equation effects 0
- Variances \(\sigma^2_{\epsilon,i}\) \(i=1\ldots m\) in \(\Sigma_\epsilon\) estimated by sample variance of residuals from 1 dimensional AR(p)
- All prior covariances across coefficients set to 0: independent normal guesses for each
- Prior variances are \(H_1h(j)\) for own lags, \(H_2h(j)\sigma^2_{\epsilon,i}/\sigma^2_{\epsilon,k}\) for cross-variable (\(k\neq i\)) lag coefficients, and \(H_3\sigma^2_{\epsilon,i}\) for the constant
- Parameters \(H_1,H_2,H_3\) set tightness of prior around own lags, cross-variable effects, and constants, respectively
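A hypothetical helper sketching how the Minnesota prior standard deviations above could be computed, following the slide's notation with \(h(j)=j^{-2}\); this is an illustration of the formulas, not the BMR implementation.
minnesota_sd<-function(i,k,j,H1,H2,sig2){ #Prior sd for the coefficient on variable k, lag j, in equation i
  if(i==k) sqrt(H1*j^(-2)) else sqrt(H2*j^(-2)*sig2[i]/sig2[k]) #Own lags vs cross-variable lags
}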
Application: Litterman (1986) Macroeconomic Forecasts
- Minnesota priors developed for macro forecasting applications at the Federal Reserve
- Alternative to large, complicated and somewhat untrustworthy explicit economic models used at the time
- Allows many variables to enter in an unrestricted way, but minimizes overfitting through priors
- Litterman went on to be chief economist at Goldman Sachs, and method spread to private sector
- Example from Litterman (1986): BVAR with Minnesota priors over 7 variables from before
- 6 lags in each variable, with a constant
- Choose prior parameters \(H_1=1\), \(H_2=0.2\), \(H_3=1\), \(h(j)=j^{-2}\)
- Noting that scale is proportional to variance estimates, this says we expect the constant to deviate from 0 by about 1 standard deviation of the series
- Own first-lag coefficient expected to deviate from 1 by about \(\sqrt{H_1}=1\); later lags have prior variance decaying harmonically
- Cross-equation effects expected to vary 20% as much as own-lag effects from 0
- Implement using
bvarm
function in BMR library for Bayesian VARs and related models used in macroeconomics
- Forecasts and corresponding intervals appear reasonable even though some series (M1, Investment) are clearly trending
Forecasts from ARIMA (Red) and BVAR (Blue)
#Convert to a data matrix
bvarmacrodata <- data.matrix(macrodata)
#Set up Minnesota-prior BVAR object, and sample from posterior by MCMC
# See https://www.kthohr.com/bmr_docs_vars_bvarm.html for syntax documentation: manual is out of date
bvar_obj <- new(bvarm)
#Construct BVAR with nlags lags and a constant
bvar_obj$build(data_endog=bvarmacrodata,cons_term=TRUE,p=nlags)
#Set random walk prior mean for all variables
coef_prior=c(1,1,1,1,1,1,1)
# Set prior parameters (1,0.2,1) with harmonic decay
bvar_obj$prior(coef_prior=coef_prior,var_type=1,decay_type=1,HP_1=1,HP_2=0.2,HP_3=1,HP_4=2)
#Sample from BVAR with 10000 draws of Gibbs Sampler
bvar_obj$gibbs(10000)
#Construct BVAR Forecasts
bvarfcst<-forecast(bvar_obj,periods=20,shocks=TRUE,plot=FALSE,varnames=colnames(macrodata),percentiles=c(.05,.50,.95),
use_mean=FALSE,back_data=0,save=FALSE,height=13,width=11)
#Warning: command is incredibly slow if plot=TRUE is on, fast otherwise
# Appears to be issue with plotting code in BMR package, which slows down plotting to visualize
# With too many lags and series, have to wait through hundreds of forced pauses
# ADD VAR and ARIMA forecasts to plot
forecastseriesplots<-list()
for (i in 1:nseries){
  BVAR<-ts(bvarfcst$forecast_mean[,i],start=c(2018,4),frequency=4,names=Series[i]) #Mean
  lcband<-ts(bvarfcst$plot_vals[,1,i],start=c(2018,4),frequency=4,names=Series[i]) #5% Lower confidence band
  ucband<-ts(bvarfcst$plot_vals[,3,i],start=c(2018,4),frequency=4,names=Series[i]) #95% Upper confidence band
  fdate<-time(lcband) #Extract date so geom_ribbon() knows what x value is
  bands<-data.frame(fdate,lcband,ucband) #Collect in data frame
  pastwindow<-window(macrodata[,i],start=c(2000,1))
  #Plot ARIMA model forecast along with BVAR forecasts, plus respective 95% intervals
  forecastseriesplots[[i]]<-autoplot(pastwindow)+
    autolayer(ARIMAfcsts[[i]],series="ARIMA",alpha=0.4)+
    autolayer(BVAR,series="BVAR",color="blue")+
    geom_line(aes(x=fdate,y=ucband),data=bands,color="blue",alpha=0.4)+
    geom_line(aes(x=fdate,y=lcband),data=bands,color="blue",alpha=0.4)+
    labs(x="Date",y=colnames(macrodata)[i],title=Series[i])
}
grid.arrange(grobs=forecastseriesplots,nrow=4,ncol=2)
Interpretation
- Point forecasts differ somewhat, but within respective uncertainty intervals
- Note: Bayesian predictive intervals are not confidence intervals and vice versa
- Predictive intervals give best average-case prediction of distribution given data over the model
- Confidence intervals construct region containing data 95% of the time, if model specification true
- BVAR uses longer lags, prior centered at unit root to account for persistent dynamics
- Order of AR lags must be longer both to account for integration and for possibility of MA dynamics
- Prior over higher lags reduces variability of estimates, reduces overfitting by regularization
- Test-based approach attempts to detect unit roots and difference them out, then find best stationary ARMA model
- Model selection by AICc acts against overfitting, but leaves remaining coefficients unrestricted
Conclusions
- Latent variable models like moving averages can be used to concisely describe data features
- Since not seen, representations are not unique
- AR and MA equivalent, but one may be much more efficient, taking finite rather than infinite order
- Latent models can be combined with models of observables like autoregressions (with or without integration) to extend descriptive capability
- Multivariate case results in many more parameters, especially if using high order VAR
- But Bayesian approach allows imposing discipline using priors
References
- Thomas Doan, Robert Litterman & Christopher Sims, "Forecasting and conditional projection using realistic prior distributions," Econometric Reviews, Vol. 3, No. 1 (1984)
- Introduced the Minnesota prior
- Robert B. Litterman, “Forecasting with Bayesian Vector Autoregressions: Five Years of Experience” Journal of Business & Economic Statistics, Vol. 4, No. 1 (Jan., 1986), pp. 25-38
- Joe Mahon & Phil Davies “The Meaning of Slutsky” The Region December 2009 (https://www.minneapolisfed.org/publications/the-region/the-meaning-of-slutsky)
- History of Eugen Slutsky and the “shocks” approach to modeling economic data
- Keith O’Hara, “Bayesian Macroeconometrics in R” (2015) (https://www.kthohr.com/bmr.html)
- Library for Bayesian VARs and related models