Multivariate Regression

Wages vs Education Again (Code)

# Obtain access to data sets used in our textbook
library(foreign) 
#Load library to make pretty table
suppressWarnings(suppressMessages(library(stargazer))) 
# Import data set of education and wages
wage1<-read.dta(
  "http://fmwww.bc.edu/ec-p/data/wooldridge/wage1.dta")
# Regress log wage on years of education 
wageregoutput <- lm(formula = lwage ~ educ, data = wage1)

# Scatter plot with regression line
plot(wage1$educ,wage1$lwage, xlab = "Years of Education",
      ylab = "Log Wage", main = "Wage vs Education")
abline(wageregoutput,col="red")

Wages vs Education Again

Results Table (Code)

stargazer(wageregoutput,header=FALSE,
    type="html",
    omit.stat=c("adj.rsq"),
    font.size="tiny", 
    title="Regression of Log Wage on Years of Education")

Results Table

Regression of Log Wage on Years of Education
Dependent variable: lwage
educ       0.083*** (0.008)
Constant   0.584*** (0.097)
Observations          526
R2                    0.186
Residual Std. Error   0.480 (df = 524)
F Statistic           119.582*** (df = 1; 524)
Note: *p<0.1; **p<0.05; ***p<0.01

Additional predictors

Wages vs Education and Experience (Code)

# Install 3d plot package if not installed yet
if (!requireNamespace("scatterplot3d", quietly = TRUE))
  install.packages("scatterplot3d",
        repos = "http://cran.us.r-project.org")
library(scatterplot3d)
#Plot 3d Scatter
scatterplot3d(wage1$educ,wage1$exper,wage1$lwage,
    color="red", 
    main="Log Wages vs Education and Experience",
    xlab="Years of Education",
    ylab="Years of Work Experience",
    zlab="Log Wage")

Wages vs Education and Experience

Data

What is multivariate regression for?

Ordinary Least Squares Estimator (OLS)

Regression in Wage Example

wageregression2 <- lm(formula = 
                lwage ~ educ + exper, data = wage1)

Wages vs education & experience: Regression Results (code)

# Regress log wage on years of education and experience
wageregression2 <- lm(formula = lwage ~ educ + exper, 
                    data = wage1)
#Load library to make pretty table
library(stargazer) 
stargazer(wageregression2,header=FALSE,report="vc",
      type="html",
      omit.stat=c("all"),omit.table.layout="n",
      font.size="small",
      title="Log Wage vs Years of Education, Years of Experience")

Wages vs education & experience: Regression Results

Log Wage vs Years of Education, Years of Experience
Dependent variable: lwage
educ     0.098
exper    0.010
Constant 0.217

Wages vs education & experience: Regression Visualization (Code)

# Plot data on 3D Scatterplot 
s3d<-scatterplot3d(wage1$educ,wage1$exper,wage1$lwage,
    color="red", 
    main="Log Wages vs Education and Experience, with Best Fit Plane",
    xlab="Years of Education",
    ylab="Years of Work Experience",
    zlab="Log Wage")
# Add regression plane to plot
s3d$plane3d(wageregression2, 
            lty.box = "solid")

Wages vs education & experience: Regression Visualization

Ways to derive an estimator

  1. Empirical Risk Minimization
  2. Method of Moments
  3. Maximum Likelihood Estimation

Interpretations of OLS, 1: Empirical risk minimizer
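
On this view, OLS is the coefficient vector that minimizes the average squared prediction error in the sample; a standard statement of the minimization problem:

```latex
\[
\hat{\beta} = \arg\min_{b_0,\ldots,b_k} \frac{1}{n}\sum_{i=1}^{n}
\left(y_i - b_0 - b_1 x_{1i} - \cdots - b_k x_{ki}\right)^2
\]
```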

Interpretations of OLS, 2: Method of moments

  1. In population, \(y=\beta_0+\beta_{1}x_{1}+\beta_{2}x_{2}+\ldots+\beta_{k}x_{k}+u\)
  2. \(\{(y_i,\mathbf{x}_i^\prime):i=1 \ldots n\}\) are an independent random sample of observations following (1)
  3. \(E(u_{i}x_{ji})=0\) for \(j=0\ldots k\)
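
Replacing each population moment condition \(E(u_{i}x_{ji})=0\) with its sample analogue (with \(x_{0i}=1\)) gives \(k+1\) equations in \(k+1\) unknowns, whose solution is the OLS estimator:

```latex
\[
\frac{1}{n}\sum_{i=1}^{n} x_{ji}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{1i} - \cdots - \hat{\beta}_k x_{ki}\right) = 0,
\qquad j = 0,\ldots,k
\]
```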

Formula for Estimator
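
In the usual matrix form, with \(\mathbf{X}\) the \(n \times (k+1)\) matrix whose rows are \(\mathbf{x}_i^\prime\) (first column all ones) and \(\mathbf{y}\) the vector of outcomes, the estimator can be written compactly as:

```latex
\[
\hat{\beta} = (\mathbf{X}^\prime\mathbf{X})^{-1}\mathbf{X}^\prime\mathbf{y}
\]
```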

Method of Moments

Interpretations of OLS, 3: Maximum likelihood estimator

MLE view
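
Under the additional assumption \(u \sim N(0,\sigma^2)\), the log-likelihood of the sample is, up to terms not involving \(\beta\), the negative sum of squared residuals, so maximizing it over \(\beta\) reproduces OLS:

```latex
\[
\ell(\beta,\sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2)
- \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \mathbf{x}_i^\prime\beta\right)^2
\]
```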

Assumptions used for Linear Models

  1. In population, \(y=\beta_0+\beta_{1}x_{1}+\beta_{2}x_{2}+\ldots+\beta_{k}x_{k}+u\)
  2. \(\{(y_i,\mathbf{x}_i^\prime):i=1 \ldots n\}\) are an independent random sample of observations following (1)
  3. There are no exact linear relationships among the variables \(x_1 \ldots x_k\)
  4. \(E(u|\mathbf{x})=0\)
  5. \(Var(u|\mathbf{x})=\sigma^2\), a constant \(>0\)
  6. \(u \sim N(0,\sigma^2)\)

Estimator Properties

Asymptotic Properties and Distribution

Inference: single parameter
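
For a single coefficient, the standard test statistic divides the deviation of the estimate from its hypothesized value \(\beta_j^{0}\) by its estimated standard error; under assumptions (1)-(6) it has an exact \(t\) distribution:

```latex
\[
t = \frac{\hat{\beta}_j - \beta_j^{0}}{\widehat{SE}(\hat{\beta}_j)}
\;\sim\; t_{n-k-1} \quad \text{under } H_0
\]
```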

Inference: multiple parameters
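
A joint hypothesis imposing \(q\) restrictions is tested by comparing the sum of squared residuals from the restricted and unrestricted fits; the usual F statistic is:

```latex
\[
F = \frac{(SSR_{r} - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}
\;\sim\; F_{q,\,n-k-1} \quad \text{under } H_0
\]
```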

Performing tests in R

summary(wageregression2)
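
As a minimal sketch on simulated data (so the block is self-contained), the t statistics that summary() reports can be recovered by dividing each estimate by its standard error, and a joint F test can be run by comparing a restricted fit against the full fit with anova():

```r
set.seed(123)
n <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 0.5 * x1 + 0.2 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

# t statistic = estimate / standard error,
# matching summary()'s "t value" column
coefs <- summary(fit)$coefficients
tstats <- coefs[, "Estimate"] / coefs[, "Std. Error"]
all.equal(unname(tstats), unname(coefs[, "t value"]))

# Joint F test of H0: both slope coefficients are zero,
# via restricted (intercept-only) vs unrestricted comparison
anova(lm(y ~ 1), fit)
```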

Output (Code)

#Display Table
stargazer(wageregression2,header=FALSE,
          type="html",
    font.size="tiny", 
    title="Log Wage vs Years of Education, Years of Experience")

Output (Cleaned Up)

Log Wage vs Years of Education, Years of Experience
Dependent variable: lwage
educ       0.098*** (0.008)
exper      0.010*** (0.002)
Constant   0.217**  (0.109)
Observations          526
R2                    0.249
Adjusted R2           0.246
Residual Std. Error   0.461 (df = 523)
F Statistic           86.862*** (df = 2; 523)
Note: *p<0.1; **p<0.05; ***p<0.01

Variable choice

How does adding experience change education coefficient? (Code 1)

#Run short regression without experience directly
wageregression1<-lm(formula = lwage ~ educ, data = wage1)
betatilde1<-wageregression1$coefficients[2]

#Run regression of omitted variable on included variable
deltareg<-lm(formula = exper ~ educ, data = wage1)

##Display Table with all results
stargazer(wageregression1,wageregression2,deltareg,type="html",
        header=FALSE,report="vc",omit.stat=c("all"),
        omit.table.layout="n",font.size="small", 
        title="Included and Excluded Experience")

How does adding experience change education coefficient? (Code 2)

#Construct short regression coefficient 
#from formula on next slide
delta1<-deltareg$coefficients[2]
betahat1<-wageregression2$coefficients[2] 
betahat2<-wageregression2$coefficients[3] 
omittedformula<-betahat1+betahat2*delta1

How does adding experience change education coefficient?

Included and Excluded Experience
             Dependent variable:
          lwage (1)  lwage (2)  exper (3)
educ        0.083      0.098     -1.468
exper                  0.010
Constant    0.584      0.217     35.461

Omitted variables formula
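
This is the relationship the code constructs: with \(\hat{\delta}_1\) the slope from regressing the omitted variable (exper) on the included one (educ), the short-regression coefficient on educ satisfies

```latex
\[
\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \hat{\delta}_1
\]
```

Checking against the table: \(0.098 + 0.010 \times (-1.468) \approx 0.083\), the short-regression coefficient.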

Bias?

Interpretation

More on (3): Multicollinearity

Interpreting multicollinearity

Handling multicollinearity in practice
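
One common diagnostic is the variance inflation factor, \(VIF_j = 1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing predictor \(j\) on the other predictors. A minimal sketch on simulated near-collinear data (the variable names and the rule-of-thumb cutoff of 10 are illustrative conventions, not from the slides):

```r
set.seed(7)
n <- 100
x1 <- rnorm(n)
x2 <- 0.95 * x1 + 0.05 * rnorm(n)  # nearly collinear with x1
y  <- 1 + x1 + x2 + rnorm(n)

# VIF = 1 / (1 - R^2) from regressing each predictor on the others
vif1 <- 1 / (1 - summary(lm(x1 ~ x2))$r.squared)
vif2 <- 1 / (1 - summary(lm(x2 ~ x1))$r.squared)
c(vif1, vif2)  # both far above the common cutoff of 10
```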

Output (Code)

#Initialize random number generator
set.seed(42)
#Draw 100 standard normal random variables
xa<-rnorm(100) 
xb<-rnorm(100) #Draw 100 more
#Define 3rd variable as linear combination of first 2
xc<-3*xa-2*xb  
#define y as linear function in all variables + noise
y<-1*xa+2*xb+3*xc+rnorm(100) 
#Regress y on our 3 redundant variables
(multireg <-lm(y~xa+xb+xc)) 

Output

## 
## Call:
## lm(formula = y ~ xa + xb + xc)
## 
## Coefficients:
## (Intercept)           xa           xb           xc  
##    0.001766     9.856291    -3.914707           NA
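
R reports NA for xc because xc = 3*xa - 2*xb is an exact linear combination of the other regressors, so lm() drops it; the remaining coefficients absorb its effect (xa picks up roughly \(1 + 3 \cdot 3 = 10\), xb roughly \(2 - 3 \cdot 2 = -4\), matching the output). The base-R alias() function can be used to surface the exact dependency:

```r
set.seed(42)
xa <- rnorm(100); xb <- rnorm(100)
xc <- 3 * xa - 2 * xb              # exact linear combination
y  <- xa + 2 * xb + 3 * xc + rnorm(100)
fit <- lm(y ~ xa + xb + xc)
alias(fit)   # reports xc as a linear combination of xa and xb
```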

Next Time

(Advanced) Explicit formula for \(\hat{\beta}\)

(Advanced) Proof of unbiasedness

(Advanced) Exact and Asymptotic Variance Formulas