Probability Models for Cycles
- Previously, saw additive probability models
- Accounts for time-dependent components like deterministic seasonality, holidays, trends, etc
- \(y_t=s(t)+h(t)+g(t)+\epsilon(t)\)
- Cyclical component \(\epsilon(t)\) soaks up any remaining, irregular variation
- In many cases, irregular, “cyclical” variation still has a predictable pattern
- Many economic series are detrended and deseasonalized, but still have a lot of structure
- Forecasts should exploit this probabilistic variation
- Today, look at probability models for this random component of variation
- For simplicity, we will ignore the time-dependent components
- Can treat as models for detrended and deseasonalized data, or as one part of model containing others
- Today, go in depth into Autoregressive Probability Model
- Structure and properties, including stationarity
- Challenges for Bayesian and statistical approaches
- Multivariate case: Vector Autoregression Model
Autoregression Model
- For all \(t=p+1\ldots T\), the AR(p) model is \[y_t=\beta_0+\sum_{j=1}^{p}\beta_jy_{t-j}+\epsilon_t\]
- Where for all \(t\) and all \(h\neq0\), \(E[\epsilon_t]=0\), \(E[\epsilon_t\epsilon_{t+h}]=0\)
- To construct likelihood, need to also assume distribution for \(\epsilon_t\)
- Most common is \(\epsilon_t\overset{iid}{\sim}N(0,\sigma^2)\), \(Pr(\epsilon<m)=\Phi(\frac{m}{\sigma})=\int_{-\infty}^{\frac{m}{\sigma}}\frac{1}{\sqrt{2\pi}}\exp(\frac{-x^2}{2})dx\)
- Likelihood is \(\Pi_{t=p+1}^{T}\frac{1}{\sigma}\phi(\frac{y_t-\beta_0-\sum_{j=1}^{p}\beta_jy_{t-j}}{\sigma})\), where \(\phi(x)=\frac{1}{\sqrt{2\pi}}\exp(\frac{-x^2}{2})\) is the standard normal pdf
- Log Likelihood is \(-\frac{T-p}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_{t=p+1}^{T}(y_t-\beta_0-\sum_{j=1}^{p}\beta_jy_{t-j})^2\)
- Parameters \(\theta=(\{\beta_j\}_{j=0}^{p},\sigma^2)\in \Theta\subseteq \mathbb{R}^{p+1}\times\mathbb{R}_{+}\)
- Interpretation: value today depends linearly on the last \(p\) periods, up to unpredictable noise (see the simulation sketch below)
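As a quick illustration (a minimal sketch with made-up coefficients), an AR(2) can be simulated and its parameters recovered by Gaussian maximum likelihood using base R's `arima`:

```r
# Simulate an AR(2) with beta_1 = 0.6, beta_2 = 0.2 (illustrative, stationary values)
set.seed(123)
ysim <- arima.sim(model = list(ar = c(0.6, 0.2)), n = 500)
# Estimate the coefficients and innovation variance by (Gaussian) maximum likelihood
arfit <- arima(ysim, order = c(2, 0, 0))
arfit$coef    # ar1, ar2, and "intercept" (note: arima reports the process mean, not beta_0)
arfit$sigma2  # estimated innovation variance
```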
Properties of Autoregression Models
- By adding enough lags, an autoregression model can match just about any autocorrelation pattern
- Provides an essentially universal model for autocorrelation
- Linearity means that features other than means and covariances are fixed
- Given \(\mathcal{Y}_{T}\), variance and all other features of conditional distribution (eg skewness, quantiles) are those of \(\epsilon_t\), fixed over time
- Useful for mean forecasting, less so for other features
- Specific ACF implied by coefficients derivable by solving system of equations
- Eg for AR(1), \(Cor(y_t,y_{t-h})=\beta_1^h\)
- Since the system is solved recursively, the ACF eventually decays exponentially beyond any finite set of initial lags
- To visually detect patterns, helpful to use the Partial Autocorrelation Function, PACF (`pacf` in R); see the sketch after this list
- For each order \(p=1\ldots p_{max}\), plots 1st order autocorrelation \(Cor(e^p_t,e^p_{t-1})\) of residuals from AR(p) forecast of \(y_t\)
- For an exact AR(m) model, PACF drops to 0 sharply at \(m+1\), because residuals are white noise
- For approximate AR(m), PACF may decline more smoothly
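A small sketch of the PACF cutoff, using a simulated AR(2) (coefficients are illustrative):

```r
# For an exact AR(2), the PACF has spikes at lags 1 and 2 and is near zero afterwards,
# while the ACF decays gradually
set.seed(42)
y2 <- arima.sim(model = list(ar = c(0.6, 0.2)), n = 500)
pacf(y2, lag.max = 12)
acf(y2, lag.max = 12)
```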
Lag Polynomials
- Building block of autoregressive model is lag operator \(L\): \(L(z_t)=z_{t-1}\) for any \(t\)
- If \(\overrightarrow{z}\) is a vector, with entries ordered by \(t\), \(L\overrightarrow{z}\) is shifted down by 1
- First entry of \(L\overrightarrow{z}\) corresponding to period 0 is missing (usually NA or . in R)
- Lags of multiple periods created by applying lag operator multiple times
- Denote \(L^j(z_t)=L(L(\ldots L(z_t)))=z_{t-j}\) for all \(t\)
- Composing lags of different order allows expressing relations over time between variables
- Equivalent notation for AR(p) model is in terms of the lag polynomial
- \((1-\beta_1L-\beta_2L^2-\beta_3L^3-\ldots-\beta_pL^p)y_t=\beta_0+\epsilon_t\)
- Righthand side is pure white noise, left is a transformation applied to data
- Properties of data can be derived using the properties of the lag polynomial \(B(L)=1-\sum_{j=1}^{p}\beta_jL^j\)
- \(B(x)\) for an \(AR(p)\) can be factorized into components \(B(x)=\Pi_{j=1}^{p}(1-r_jx)\)
- Where each coefficient \(r_j\) satisfies \(B(\frac{1}{r_j})=0\), so \(\frac{1}{r_j}\) is a root of the lag polynomial
- AR(p) acts like \(p\) repeated AR(1) transformations (see the root-finding sketch below)
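A minimal sketch of the factorization: `polyroot` returns the roots \(1/r_j\) of \(B(x)\), from which the \(r_j\) follow (coefficients below are illustrative):

```r
# Lag polynomial B(x) = 1 - 0.6x - 0.2x^2, coefficients in increasing order of power
bcoefs <- c(1, -0.6, -0.2)
lagroots <- polyroot(bcoefs)   # the 1/r_j, i.e. solutions of B(x) = 0
1 / lagroots                   # the r_j in the factorization B(x) = prod_j (1 - r_j x)
```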
Stationarity Conditions
- For AR(1): \(B(L)=(1-\beta_1L)\), long run behavior depends on size of \(\beta_1\)
- \(|\beta_1|<1\): Stationary: on average, series returns to its mean over time
- Describes series which have a “natural” level returned to over time
- \(\beta_1=1\): Random Walk: on average, after a change, series does not revert to previous value
- Series is nonstationary: initial condition determines mean, and variance grows over time
- Series like financial asset prices, where growth is unpredictable
- \(|\beta_1|>1\): Explosive: on average, after a change, series goes even further away from initial condition
- Describes series, like asset prices during a bubble, which evolve erratically
- These conditions can be generalized to AR(p) case where additional lags are included
- Check each root \(r_j\) of the lag polynomial \(B(x)=\Pi_{j=1}^{p}(1-r_jx)\) individually
- If one of the roots \(\frac{1}{r_j}=1\), the series is said to have a unit root
- Series is nonstationary and in long run does not revert to any particular level
- An \(AR(p)\) series is stationary only if all inverse roots \(r_j\) are inside the complex unit circle (equivalently, all roots \(1/r_j\) lie outside it)
- `plot.Arima` or `autoplot.Arima` display the inverse roots of the lag polynomial of an Arima object (see the sketch below)
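A brief sketch of the root check, reusing the simulated series `ysim` from the earlier sketch:

```r
library(forecast)
fit <- Arima(ysim, order = c(2, 0, 0))   # refit with forecast::Arima for autoplot support
autoplot(fit)                            # plots the inverse roots inside the unit circle
Mod(1 / polyroot(c(1, -fit$model$phi)))  # moduli of inverse roots r_j: all < 1 means stationary
```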
Integrated Processes
- Differencing a series applies transformation \((1-L)\), so \(\Delta y_t\) has lag polynomial \(\frac{B(L)}{(1-L)}\)
- If the resulting series is stationary, series is called integrated (of order 1)
- If differencing \(d\) times results in stationarity, series is integrated of order \(d\), denoted as \(I(d)\)
- To estimate coefficients of an \(I(d)\) series, difference \(d\) times, estimate \(AR\) model corresponding to all remaining factors
- Resulting series is ARIMA(p,d,0) (will describe MA next class)
- An ARIMA(p,d,0) series is an order \(p+d\) AR series, but because it is nonstationary, ERM-type estimate has nonstandard properties
- Distribution not approximately normal, and usual ERM guarantees fail
- Two predominant approaches when \(d\) not known
- Statistical approach: apply unit root test to determine \(d\), then apply AR(p) ERM to stationary series \(\Delta^d y\)
- Bayesian approach: put prior over \(p+d\) coefficients, compute posterior as usual
- Unlike in “regular” case, normal approximation is not true, and results not equivalent
- Posterior is not even approximately equal to the asymptotic sampling distribution: need to choose between approaches
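A small sketch of the two equivalent estimation routes for an \(I(1)\) series (simulated data, illustrative orders):

```r
# Simulate an I(1) series: the cumulative sum of a stationary AR(1)
set.seed(7)
x_stat <- arima.sim(model = list(ar = 0.5), n = 300)
x_int  <- cumsum(x_stat)
# Route 1: difference once, then fit a stationary AR(1) to the differences
ar_diff <- arima(diff(x_int), order = c(1, 0, 0))
# Route 2: fit an ARIMA(1,1,0) directly, letting the model handle the differencing
ari_fit <- arima(x_int, order = c(1, 1, 0))
```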
Testing Approach
- Valid test statistics for presence of unit root based on regression of \(\Delta y_t\) on \(y_{t-1}\)
- If coefficient near 0, can’t reject null of unit root, if \(< 0\), can reject
- Need to account for drift, deterministic trends in null hypothesis
- Let \(e_t\) be a stationary I(0) series (not necessarily white noise): unit root series can behave as
- \(\Delta y_t=e_t\): (No constant) Series has only unit root, no deterministic growth
- \(\Delta y_t=\beta_0+e_t\): (Drift) Series grows by \(\beta_0\) each period along with unit root
- \(\Delta y_t=\beta_0+ct+e_t\): (Linear Trend) Series grows by \(\beta_0+ct\) each period, along with unit root
- Test by Phillips-Perron (`PP.test`) or Augmented Dickey-Fuller (`adf.test` in `tseries`) tests
- Note that if unsure, can always use differences anyway
- If unit root, differencing restores stationarity
- If not a unit root, still stationary, prediction valid but may lose predictive power
- Can also test starting from null of stationarity (`kpss.test` in `tseries`)
- Test approach is implemented in `auto.arima` (see the testing sketch after this list)
- First uses KPSS test (with null of stationarity) to determine differencing, then AICc to choose lag order \(p\)
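A minimal sketch of the testing workflow, applied to the integrated series `x_int` from the previous sketch:

```r
library(tseries)   # adf.test, kpss.test
library(forecast)  # ndiffs, auto.arima
PP.test(x_int)                    # Phillips-Perron: null hypothesis of a unit root
adf.test(x_int)                   # Augmented Dickey-Fuller: null of a unit root
kpss.test(x_int, null = "Level")  # KPSS: null of (level) stationarity
ndiffs(x_int, test = "kpss")      # number of differences suggested by repeated KPSS tests
auto.arima(x_int, max.q = 0, seasonal = FALSE)  # KPSS to pick d, then AICc to pick p
```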
The Stakes: The “Unit Root Wars”
- Whether series has unit root makes large difference to long term forecasts and medium-to-long term policy choices
- Integrated series may deviate arbitrarily far from past mean, resulting in ever-wider prediction intervals
- After a change, stationary AR model predicts return to mean at exponential rate
- Speed depends on closeness of maximum root to 1, but exponential means that in medium term mostly move back
- Nonstationary GDP/Employment means losses from recession never made up
- Even if the largest root is 0.98, losses are only made up over 1-2 decades: matters a lot for long-term government budget plans
- Exchange rate deviations from “fundamental” values appear close to unit root
- If exactly unit root, countries cannot rely on returns to past values: sustainability of exchange rate policy may not be possible in this case
- Economists spent 80s-90s fighting over which series do and don’t have unit roots
- Hard to tell exactly, because tests have low power to detect small deviations from a unit root
- Can also have unit root, but “most” short term deviations driven by other roots, with short to medium term behavior “nearly” mean reverting
- For short-medium term forecasts, exact difference doesn’t matter so much
- For long run forecasts, use theory, like boundedness, etc to help decide
Application: Macroeconomic Forecasts
- Consider GNP forecasting application: follow Litterman (1986) in using following series
- GNP Growth, Inflation, Unemployment, M1 Money Stock, Private Fixed Investment, Commercial Paper Interest Rates, and Inventory Growth
- Use quarterly data 1971-2020Q1, instead of original 1948-1979
- Statistical approach tests each for stationarity, differences if found to be nonstationary, then estimates AR model
- As different tests have slightly different properties, one may reject stationarity, another may support it
- May rely on single test, recognizing chance of error, or be overly cautious and difference if unsure
- Run PP test with null of unit root with constant and trend, KPSS with null of stationarity plus linear trend, and (by auto.arima), KPSS with null of stationarity with no trend
- Use AICc after differencing according to KPSS results to get ideal AR order for each
- Corresponds to auto.arima (with certain options for more complicated models turned off)
- Here, inflation (GNP deflator) measured to be nonstationary by KPSS test, stationary by PP test at 5% threshold
- Macroeconomic theorists likewise fight about whether inflation stationary
- Inventory growth unit root results depend on whether trend included
- Can be hard to distinguish unit root from slight trend
Stationarity and AR order choice results
#Libraries
library(fredr) # Data from FRED API
library(fpp2) #Forecasting and Plotting tools
library(vars) #Vector Autoregressions
library(knitr) #Use knitr to make tables
library(kableExtra) #Extra options for tables
library(dplyr) #Data Manipulation
library(tseries) #Time series functions including stationarity tests
library(gridExtra) #Graph Display
# Package "BMR" for BVAR estimation is not on CRAN, but is instead maintained by an individual
# It must be installed directly from the Github repo: uncomment the following code to do so
# library(devtools) #Library to allow downloading packages from Github
# install_github("kthohr/BMR")
# library(BMR) #Bayesian Macroeconometrics in R
# Note that if running this code on Kaggle, internet access must be enabled to download and install the package
# If installed locally, there may be difficulties due to differences in your local environment (in particular, versions of C++)
# For this reason, relying on local installation is not recommended unless you have a spare afternoon to dig through help files
#An alternative, similar library, is BVAR: it is on CRAN
library(BVAR) #Bayesian Vector Autoregressions
##Obtain and transform NIPA Data (cf Lecture 08)
fredr_set_key("8782f247febb41f291821950cf9118b6") #Key I obtained for this class
## Load Series: Series choices and names as in Litterman (1986)
RGNP<-fredr(series_id = "GNPC96",
observation_start = as.Date("1971-04-01"),
observation_end = as.Date("2020-01-01"),
vintage_dates = as.Date("2021-03-29"),
units="cch") #Real Gross National Product, log change
INFLA<-fredr(series_id = "GNPDEF",
observation_start = as.Date("1971-04-01"),
observation_end = as.Date("2020-01-01"),
vintage_dates = as.Date("2021-03-29"),
units="cch") #GNP Deflator, log change
UNEMP<-fredr(series_id = "UNRATE",
observation_start = as.Date("1971-04-01"),
observation_end = as.Date("2020-01-01"),
vintage_dates = as.Date("2021-03-29"),
frequency="q") #Unemployment Rate, quarterly
M1<-fredr(series_id = "M1SL",
observation_start = as.Date("1971-04-01"),
observation_end = as.Date("2020-01-01"),
vintage_dates = as.Date("2021-03-29"),
frequency="q",
units="log") #Log M1 Money Stock, quarterly
INVEST<-fredr(series_id = "GPDI",
observation_start = as.Date("1971-04-01"),
observation_end = as.Date("2020-01-01"),
vintage_dates = as.Date("2021-03-29"),
units="log") #Log Gross Domestic Private Investment
# The 4-6 month commercial paper rate series used in Litterman (1986) has been discontinued:
# For sample continuity, we merge the series for 3 month commercial paper rates from 1971-1997 with the 3 month non-financial commercial paper rate series
# This series also has the latest start date, so it dictates the start date for the sample
CPRATE1<-fredr(series_id = "WCP3M",
observation_start = as.Date("1971-04-01"),
observation_end = as.Date("1996-10-01"),
vintage_dates = as.Date("2021-03-29"),
frequency="q") #3 Month commercial paper rate, quarterly, 1971-1997
CPRATE2<-fredr(series_id = "CPN3M",
observation_start = as.Date("1997-01-01"),
observation_end = as.Date("2020-01-01"),
vintage_dates = as.Date("2021-03-29"),
frequency="q") #3 Month AA nonfinancial commercial paper rate, quarterly, 1997-2018
CPRATE<-full_join(CPRATE1,CPRATE2) #Merge 2 series to create continuous 3 month commercial paper rate series from 1971-2018
CBI<-fredr(series_id = "CBI",
observation_start = as.Date("1971-04-01"),
observation_end = as.Date("2020-01-01"),
vintage_dates = as.Date("2021-03-29")) #Change in Private Inventories
#Format the series as quarterly time series objects, starting at the first date
rgnp<-ts(RGNP$value,frequency = 4,start=c(1971,2),names="Real Gross National Product")
infla<-ts(INFLA$value,frequency = 4,start=c(1971,2),names="Inflation")
unemp<-ts(UNEMP$value,frequency = 4,start=c(1971,2),names="Unemployment")
m1<-ts(M1$value,frequency = 4,start=c(1971,2),names="Money Stock")
invest<-ts(INVEST$value,frequency = 4,start=c(1971,2),names="Private Investment")
cprate<-ts(CPRATE$value,frequency = 4,start=c(1971,2),names="Commercial Paper Rate")
cbi<-ts(CBI$value,frequency = 4,start=c(1971,2),names="Change in Inventories")
#Express as a data frame
macrodata<-data.frame(rgnp,infla,unemp,m1,invest,cprate,cbi)
nlags<-6 # Number of lags to use
nseries<-length(macrodata[1,]) #Number of series used
#Evaluate Stationarity of series by Phillips-Perron Tests, as well as KPSS tests
# accounting for constants and possible linear trend
#Use auto.arima to choose AR order after KPSS test without trend
stationaritytests<-list()
PPtests<-list()
KPSStests<-list()
ARIstatmodels<-list()
Integrationorder<-list()
ARorder<-list()
for (i in 1:nseries){
PPtests[i]<-PP.test(macrodata[,i])$p.value
KPSStests[i]<-kpss.test(macrodata[,i],null="Trend")$p.value
ARIstatmodels[[i]]<-auto.arima(macrodata[,i],max.q=0,seasonal=FALSE) #Apply auto.arima set to (nonseasonal) ARI only
Integrationorder[i]<-ARIstatmodels[[i]]$arma[6] #Integration order chosen (uses KPSS Test)
ARorder[i]<-ARIstatmodels[[i]]$arma[1] #AR order chosen (uses AIC)
}
Series<-c("Real GNP Growth","Inflation","Unemployment","Money Stock","Private Investment","Commercial Paper Rate","Change in Inventories")
stationaritytests<-data.frame(Series,as.numeric(PPtests),as.numeric(KPSStests),as.numeric(Integrationorder),as.numeric(ARorder))
rownames(stationaritytests)<-Series
colnames(stationaritytests)<-c("Series","PP p-value","KPSS p-value","Integration Order","AR Order")
stationaritytests %>%
mutate(
Series = row.names(.),
`PP p-value` = cell_spec(`PP p-value`,color = ifelse(`PP p-value`<0.05, "black", "red")),
`KPSS p-value` = cell_spec(`KPSS p-value`,color = ifelse(`KPSS p-value`<0.05, "red", "black")),
`Integration Order`= cell_spec(`Integration Order`,color = ifelse(`Integration Order`==1, "red", "black")),
`AR Order`=`AR Order`
) %>%
kable(escape=F,
caption="Chosen Models and P-Values for Stationarity Tests: Red indicates Unit Root") %>%
kable_styling(bootstrap_options = "striped")
Chosen Models and P-Values for Stationarity Tests: Red indicates Unit Root

| Series | PP p-value | KPSS p-value | Integration Order | AR Order |
|---|---|---|---|---|
| Real GNP Growth | 0.01 | 0.1 | 0 | 2 |
| Inflation | 0.01 | 0.01 | 1 | 4 |
| Unemployment | 0.395 | 0.051 | 1 | 1 |
| Money Stock | 0.918 | 0.01 | 1 | 2 |
| Private Investment | 0.470 | 0.01 | 1 | 1 |
| Commercial Paper Rate | 0.077 | 0.018 | 1 | 3 |
| Change in Inventories | 0.01 | 0.1 | 1 | 0 |
Real GNP Growth: Series, ACF, PACF, Estimated AR(2) Roots
rgnpplots<-list()
rgnpplots[[1]]<-autoplot(rgnp)+labs(x="Date",y="Percent Growth",title="Real GNP Growth")
rgnpplots[[2]]<-ggAcf(rgnp)+labs(title="Autocorrelation Function")
rgnpplots[[3]]<-ggPacf(rgnp)+labs(title="Partial Autocorrelation Function")
rgnpplots[[4]]<-autoplot(ARIstatmodels[[1]])
grid.arrange(grobs=rgnpplots,nrow=2,ncol=2)
Priors for Autoregression Models
- Common choice for a prior distribution over \(\beta\) is the \(N(b,\Sigma_b)\) prior
- \(b\) is prior mean of coefficients, biasing estimates towards a given value, \(\Sigma_b\) is prior variance and covariance over coefficient values
- \((\Sigma_b)_{(i,j)}\) is covariance between \((i,j)\) coefficients,
- As in the Bayesian regression model, if \(b=0\) and \((\Sigma_b)_{(i,j)}=0\) for \(i\neq j\), \(=\xi\sigma^2\) for \(i=j\), the MAP estimate is ridge regression (see the sketch after this list)
- Full posterior is also Normal, with mean given by MAP estimate
- \(\xi\) determines the size of the penalty (weight \(1/\xi\) on the ridge term): a larger prior variance means weaker shrinkage
- Having constant prior covariance across regressors makes sense if all are on the same scale
- For autoregression, this is guaranteed because lags are same variable, shifted in time
- For case where other covariates added, common to normalize first: divide regressor by its standard deviation
- This is not exact Bayesian approach, since data is used to construct standard deviation
- Called “Empirical Bayes”, to contrast with “Hierarchical Bayes” approach of putting prior over scale of prior on coefficient
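A tiny sketch of the ridge equivalence under these assumptions (hypothetical design matrix `X` of lagged values and response `yy`; penalty \(1/\xi\)):

```r
# MAP estimate under beta ~ N(0, xi * sigma^2 * I) equals ridge regression with penalty 1/xi
ridge_map <- function(X, yy, xi) {
  solve(crossprod(X) + diag(1 / xi, ncol(X)), crossprod(X, yy))
}
```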
Variance Priors
- Usually, variance \(\sigma^2\) is also unknown, so need a prior for it
- Inverse Gamma distribution \(f(\sigma^2|\psi,d)\propto(\frac{1}{\sigma^2})^{\frac{d}{2}+1}\exp(\frac{-\psi}{2\sigma^2})\) common choice
- Leads to closed form posterior if coefficients known
- If both unknown, conditional posteriors have closed form, leading to fast MCMC algorithm (Gibbs sampler)
- Normal priors on \(\beta\), inverse Gamma on \(\sigma^2\) allow for fast simple computation and some choice
- Set mean of prior on \(\beta\) to describe expected persistence
- Variances of \(\beta\) prior, like penalty in ridge, describe how much data needed to lead to deviations from the mean
- Scale and shape parameters for \(\sigma^2\) determine variance predictions: less important for point forecasts
- Inference can be implemented by adding carefully chosen “bonus observations” to data before running OLS estimates
- Prior information equivalent to having seen additional data
- `stan_sarima` in `bayesforecast` implements easy Bayesian ARIMA estimation in Stan
- \(\beta_j\overset{iid}{\sim}N(0,0.5)\) default strongly presumes stationarity
- \(\beta_0\sim t(0,1,7)\) and Variance prior default is half-t (just positive part)
- t distribution has thick tail, no closed form posterior: allows potentially large scale
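A minimal sketch of the Gibbs sampler implied by a Normal prior on the coefficients and an inverse-Gamma prior on \(\sigma^2\), for an AR(p) with constant, conditioning on the first \(p\) observations (the function name and default hyperparameters are hypothetical):

```r
gibbs_ar <- function(y, p = 2, b0 = rep(0, p + 1), V0 = diag(10, p + 1),
                     d0 = 2, psi0 = 1, ndraws = 2000) {
  Tobs <- length(y)
  X <- cbind(1, sapply(1:p, function(j) y[(p + 1 - j):(Tobs - j)]))  # constant + p lags
  yy <- y[(p + 1):Tobs]
  n <- length(yy)
  V0inv <- solve(V0)
  beta <- rep(0, p + 1); sigma2 <- var(yy)
  draws <- matrix(NA, ndraws, p + 2)
  for (s in 1:ndraws) {
    # Conditional posterior of beta given sigma2: Normal with precision V0inv + X'X/sigma2
    Vn <- solve(V0inv + crossprod(X) / sigma2)
    bn <- Vn %*% (V0inv %*% b0 + crossprod(X, yy) / sigma2)
    beta <- as.vector(bn + t(chol(Vn)) %*% rnorm(p + 1))
    # Conditional posterior of sigma2 given beta: inverse Gamma with updated shape and scale
    resid <- yy - X %*% beta
    sigma2 <- 1 / rgamma(1, shape = (d0 + n) / 2, rate = (psi0 + sum(resid^2)) / 2)
    draws[s, ] <- c(beta, sigma2)
  }
  colnames(draws) <- c(paste0("beta", 0:p), "sigma2")
  draws
}
# Example (hypothetical hyperparameters): posterior for an AR(2) on real GNP growth
# post <- gibbs_ar(as.numeric(rgnp), p = 2)
# colMeans(post[-(1:500), ])  # posterior means after discarding burn-in draws
```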
Minnesota Priors
- As many economic series near to unit root, may be sensible to incorporate this knowledge in prior
- For Normal priors, this can be done by choosing \(b_1=1\) (and all other means to 0)
- Promotes nonstationarity but does not impose it: no need to test, per se
- But! Long run predictions depend a lot on exactly vs approximately one, and prior puts all weight on “approximately”
- Fine for short-to-medium term predictions, long term predictions may need to carefully think through implications
- Minnesota prior also simplifies choice of prior variances
- For constant term \(\beta_0\), choose large prior variance \(var\), as scale varies and depends on units of series
- For higher order lags, coefficients usually smaller, so want smaller variances around 0 mean to reflect this
- Choose \(\alpha\) to set the rate of decrease: prior variance shrinks in proportion to \(1/s^{\alpha}\) at lag \(s\)
- Choose \(\lambda\) as scale for coefficients as a whole to reflect how close to prior mean to set them
- Prior variance of \(\beta_s\), \(s\geq 1\) set to \(\frac{\lambda^2}{s^\alpha}\), and covariances set to 0 to give independent priors over coefficients
- Prior allows including lots of lag coefficients, using just 3 prior parameter choices \(var,\lambda\) and decay rate \(\alpha\)
- Since prior variance for high lag coefficients small, forced to be close to 0 unless data strongly suggests otherwise
- Avoids problems of overfitting without performing strict variable selection
- Use the unit-root prior mean when the series is a stock (in levels); if already differenced, maybe center at 0 (the sketch below illustrates the implied variances)
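A small sketch of the implied prior variances (hypothetical helper; \(\lambda\), \(\alpha\), and the constant-term variance follow the notation above):

```r
# Minnesota-style prior variances for the constant and the p lag coefficients
minnesota_prior_var <- function(p, lambda = 0.2, alpha = 2, var_const = 1e7) {
  lag_vars <- lambda^2 / (1:p)^alpha   # variance lambda^2 / s^alpha at lag s
  c(var_const, lag_vars)               # large, essentially unrestricted variance for beta_0
}
minnesota_prior_var(p = 6)
```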
The Multivariate Case: Vector Autoregression Model
- Vector autoregression case almost identical to 1 variable case, but with more coefficients
- With \(m\) variables, \(y_t=(y_{1,t},\ldots,y_{m,t})\in\mathbb{R}^m\)
- For all \(t=p+1\ldots T\), \(i=1\ldots m\), the VAR(p) model is \(y_{i,t}=\beta_{0,i}+\sum_{s=1}^{p}\sum_{j=1}^{m}\beta_{s,ij}y_{j,t-s}+\epsilon_{i,t}\)
- Where for all t, i, j, \(h\neq0\), \(E[\epsilon_{i,t}]=0\), \(E[\epsilon_{i,t}\epsilon_{j,t+h}]=0\)
- Collecting coefficients in matrices, can write as \(y_t=A_0+\sum_{s=1}^{p}A_sy_{t-s}+\epsilon_t\)
- Typical to assume \(\epsilon_t\overset{iid}{\sim} N(0,\Sigma)\), where \(\Sigma\in\mathbb{R}^{m\times m}\) gives present covariance
- (log) Likelihood again takes (multivariate) normal form: penalized least squares
- Stationarity conditions analogous to m=1 case: roots (of particular polynomial) inside unit circle
- `roots` in library `vars` can display these (see the sketch below)
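A brief sketch of a least-squares VAR fit and the root check, using `vars` on the macro data assembled earlier (the lag length here is illustrative):

```r
library(vars)
varfit <- VAR(macrodata, p = 2, type = "const")  # VAR(2) in all 7 series, with a constant
roots(varfit)   # moduli of the inverse roots: all less than 1 suggests stationarity
```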
Application: Litterman (1986) Macroeconomic Forecasts
- Minnesota priors developed for macro forecasting applications at the Federal Reserve
- Alternative to large, complicated and somewhat untrustworthy explicitly economic models used at the time
- Allows many variables to enter in unrestricted way, but minimizes overfitting due to priors
- Litterman went on to be chief economist at Goldman Sachs, and method spread to private sector
- Example from Litterman (1986): BVAR with Minnesota priors over 7 variables from before
- 6 lags in each variable, with a constant
- Fix prior parameter \(\alpha=2\) and set \(\psi\) scale by empirical Bayes as variance of AR residuals
- Set prior variance of \(\beta_0\) to be \(10^7\): essentially unrestricted scale
- Estimate \(\lambda\) “hierarchically”: put a prior on it
- \(\sim\) Gamma with mean 0.2, sd 0.4: posterior \(mean\approx0.23\), \(sd\approx 0.03\)
- Implement using `bv_minnesota` function in `BVAR` library
- Forecasts and corresponding intervals appear reasonable even though some series (M1, Investment) are clearly trending
Forecasts from ARI (Blue) and BVAR (Red)
#Label observations with dates
rownames(macrodata)<-M1$date
#Set priors: use hierarchical approach, with default scales, to account for uncertainty
#Note that defaults are for series in levels, while some here are growth rates
mn <- bv_minnesota(lambda = bv_lambda(mode = 0.2, sd = 0.4, min = 0.0001, max = 5),
alpha = bv_alpha(mode = 2),
var = 1e07)
#Ghost samples to set priors on initial conditions:
# SOC shifts towards independent unit roots
# SUR shifts toward shared unit root
# Hierarchical approach allows strength to be determined by data
#soc <- bv_soc(mode = 1, sd = 1, min = 1e-04, max = 50)
#sur <- bv_sur(mode = 1, sd = 1, min = 1e-04, max = 50)
priors <- bv_priors(hyper = c("lambda"), #Choose lambda by hierarchical method, leave others fixed at preset
mn = mn) #Set Minnesota prior as defined above
#Add options soc = soc, sur = sur to set initial observations priors
# For this data, hierarchical estimation sets fairly small values for these, and resulting forecasts are about the same
# except with slightly narrower intervals for far future forecasts
#Set up MCMC parameters: use default initialization except that scaling is adjusted automatically in burnin
#If this is not enough, may need to adjust manually: check diagnostics
mh <- bv_metropolis(adjust_acc = TRUE, acc_lower = 0.25, acc_upper = 0.45)
# Usually automatic adjustment will work okay so long as burnin and n_draws both long enough
#Estimate BVAR with nlags lags by MCMC, using the Minnesota priors and Metropolis-Hastings settings above
run <- bvar(macrodata, lags = nlags, priors = priors, mh = mh, verbose = FALSE)
#Sample from posterior predictive distribution
predict(run) <- predict(run, horizon = 20, conf_bands = c(0.05, 0.16))
## Can plot using following commands, but I will use ggplot for compatibility with forecast libraries
#plot(predict(run),area=TRUE,t_back = 190)
## With library(BVARverse) also have plotting option
#bv_ggplot(predict(run),t_back = 190)
#Alternative approach: BMR library
# Available at https://www.kthohr.com/bmr.html
#Uses slightly different formulation of and notation for Minnesota prior
#There are some useful options available in this library,
# but it is not on CRAN and is not compatible with Kaggle's backend, so not run by default
#Convert to a numeric matrix
bvarmacrodata <- data.matrix(macrodata)
#Set up Minnesota-prior BVAR object, and sample from posterior by MCMC
# See https://www.kthohr.com/bmr_docs_vars_bvarm.html for syntax documentation: manual is out of date
bvar_obj <- new(bvarm)
#Construct BVAR with nlags lags and a constant
bvar_obj$build(data_endog=bvarmacrodata,cons_term=TRUE,p=nlags)
#Set random walk prior mean for all variables
coef_prior=c(1,1,1,1,1,1,1)
# Set prior parameters (1,0.2,1) with harmonic decay
bvar_obj$prior(coef_prior=coef_prior,var_type=1,decay_type=1,HP_1=1,HP_2=0.2,HP_3=1,HP_4=2)
#Sample from BVAR with 10000 draws of Gibbs Sampler
bvar_obj$gibbs(10000)
#Construct BVAR Forecasts
bvarfcst<-BMR::forecast(bvar_obj,periods=20,shocks=TRUE,plot=FALSE,varnames=colnames(macrodata),percentiles=c(.05,.50,.95),
use_mean=FALSE,back_data=0,save=FALSE,height=13,width=11)
#Warning: command is incredibly slow if plot=TRUE is on, fast otherwise
# Appears to be issue with plotting code in BMR package, which slows down plotting to visualize
# With too many lags and series, have to wait through hundreds of forced pauses
for (i in 1:nseries){
BVAR<-ts(bvarfcst$forecast_mean[,i],start=c(2020,2),frequency=4,names=Series[i]) #Mean
lcband<-ts(bvarfcst$plot_vals[,1,i],start=c(2020,2),frequency=4,names=Series[i]) #5% Lower confidence band
ucband<-ts(bvarfcst$plot_vals[,3,i],start=c(2020,2),frequency=4,names=Series[i]) #95% Upper confidence band
fdate<-time(lcband) #Extract date so geom_ribbon()knows what x value is
bands<-data.frame(fdate,lcband,ucband) #Collect in data frame
}
#Construct Forecasts of Each Series by Univariate AR models, with 95% confidence intervals
ARIfcsts<-list()
for (i in 1:nseries) {
ARIfcsts[[i]]<-forecast::forecast(ARIstatmodels[[i]],h=20,level=95)
}
# ADD VAR and ARI forecasts to plot
forecastseriesplots<-list()
#Plot ARI model forecast (95% interval) along with BVAR forecast (5%-95% posterior predictive band)
#Commented out alternative prior method
for (i in 1:nseries){
BVAR<-ts(run$fcast$quants[3,,i],start=c(2020,2),frequency=4,names=Series[i]) #Posterior median (50% quantile)
lcband<-ts(run$fcast$quants[1,,i],start=c(2020,2),frequency=4,names=Series[i]) #5% Lower confidence band
ucband<-ts(run$fcast$quants[5,,i],start=c(2020,2),frequency=4,names=Series[i]) #95% Upper confidence band
fdate<-time(lcband) #Extract date so geom_ribbon() knows what x value is
bands<-data.frame(fdate,lcband,ucband) #Collect in data frame
#BMR_BVAR<-ts(bvarfcst$forecast_mean[,i],start=c(2020,2),frequency=4,names=Series[i]) #Mean
#BMR_lcband<-ts(bvarfcst$plot_vals[,1,i],start=c(2020,2),frequency=4,names=Series[i]) #5% Lower confidence band
#BMR_ucband<-ts(bvarfcst$plot_vals[,3,i],start=c(2020,2),frequency=4,names=Series[i]) #95% Upper confidence band
#BMR_bands<-data.frame(fdate,BMR_lcband,BMR_ucband) #Collect in data frame
forecastseriesplots[[i]]<-autoplot(ARIfcsts[[i]])+autolayer(BVAR)+
geom_ribbon(aes(x=fdate,ymin=lcband,ymax=ucband),data=bands,alpha=0.2,colour="red")+
#autolayer(BMR_BVAR)+
#geom_ribbon(aes(x=fdate,ymin=BMR_lcband,ymax=BMR_ucband),data=BMR_bands,alpha=0.2,colour="green")+
labs(x="Date",y=colnames(macrodata)[i],title=Series[i])
}
grid.arrange(grobs=forecastseriesplots,nrow=4,ncol=2)
Interpretation
- Point forecasts differ somewhat, but within respective uncertainty intervals
- Note: Bayesian predictive intervals are not confidence intervals and vice versa
- Predictive intervals give best average-case prediction of distribution given data over the model
- Confidence intervals construct a region containing the realized value 95% of the time in repeated samples, if model specification is true
- BVAR intervals wider for persistent series: account for, e.g., uncertainty over whether to difference
- BVAR uses longer lags and prior centered at unit root to account for persistent dynamics
- Prior over higher lags reduces variability of estimates, reduces overfitting by regularization
- Test-based approach attempts to detect unit roots and difference them out, then find best stationary AR model
- Model selection by AICc acts against overfitting, but leaves remaining coefficients unrestricted
Conclusions
- Autoregression model can describe autocorrelation properties of series
- Encompasses stationary and nonstationary processes, depending on roots of lag polynomial
- Unit root series exhibit long run unpredictable deviations from mean or any fixed trend
- Can use unit root tests to detect and restore stationarity by differencing
- Remaining series follows a stationary AR model, so estimate by usual ERM
- Alternately, can apply Bayesian approach to estimate directly without transformation
- Prior centered on unit root process, like Minnesota prior, can allow full range of behavior
- But results need not be similar to test-based approach
- Long-run behavior strongly influenced by model and prior, so use knowledge and common sense
- Vector autoregression extends these properties to multivariate case
- Regularization crucial for large models, which makes Bayesian approach especially useful
References
- Thomas Doan, Robert Litterman & Christopher Sims “Forecasting and conditional projection using realistic prior distributions” Econometric Reviews Vol 3. No 1 (1984)
- Introduced the Minnesota prior
- Domenico Giannone, Michael Lenza, & Giorgio Primiceri “Prior Selection for Vector Autoregressions” Review of Economics and Statistics, Vol 97 No 2 (May 2015) 436–451
- How to pick parameters of Minnesota prior
- Basis for notation and defaults in BVAR R library
- Michael Lenza & Giorgio Primiceri “How to Estimate a VAR after March 2020” (2020)
- Notes that 2020 outliers make normal likelihood & prior perform badly
- Suggests restoring stationarity by dividing \(y_t/c_t\) for \(c_t\) scale of outlier
- Robert B. Litterman, “Forecasting with Bayesian Vector Autoregressions: Five Years of Experience” Journal of Business & Economic Statistics, Vol. 4, No. 1 (Jan., 1986), pp. 25-38
- Keith O’Hara, “Bayesian Macroeconometrics in R” (2015) (https://www.kthohr.com/bmr.html)
- Alternate library for Bayesian VARs and related models