Department of Economics
The University of Melbourne
ECOM30001/ECOM90001: Basic Econometrics
Semester 1, 2022
Assignment 2
Introduction
Researchers and policy makers are interested in the (statistical) relationship between a
worker’s level of education and the (hourly) wage that they receive. Consider the following
econometric model:
$$\ln wage_i = \beta_0 + \beta_1 educ_i + \delta X_i + \varepsilon_i \qquad (1)$$
where $X_i$ represents a full set of control variables that are important determinants of wages
and $\varepsilon_i$ is a random error which is (approximately) normally distributed with
$\varepsilon_i \sim N(0, \sigma_\varepsilon^2)$.
While it is expected that $\beta_1 > 0$, such that more educated workers generally earn higher
wages (conditional upon other determinants of wages), you are ultimately interested in
the magnitude of $\beta_1$.
The data file assignment2.csv contains 2,008 observations on individuals aged
24-34 at the time of interview, which may be used to estimate the econometric model
(1). This data file contains the following variables:
lnwage=Natural logarithm of hourly wage
educ=Years of completed education
exper=Years of labour market experience
expersq=Years of labour market experience squared
disadv=1 if live in a disadvantaged region, 0 otherwise
city=1 if live in a major city, 0 otherwise
The data file also provides some variables for these individuals ten years prior to interview:
city10=1 if lived in a major city 10 years ago, 0 otherwise
regionj=1 if lived in region j 10 years ago, 0 otherwise, j = 1, 2, 3, 4
The data file also contains some information on the education of the individual’s parents:
meduc=Completed years of education of mother
feduc=Completed years of education of father
You will need to use the following packages to complete this assignment:
stargazer : for easily generating regression output
car : for easily conducting hypothesis tests in R
sandwich : for calculating robust standard errors in R
AER : for estimating linear models using the Instrumental Variable (IV) estimator in R
These can be installed directly in RStudio from the Packages tab or by using the
command install.packages() with the name of the package, in quotes, inside the brackets.
Note: You are required to complete this assignment using the R statistical software.
Please include a copy of your R script file with your assignment submission (for an
additional five (5) marks).
Please insert your full R script file as an Appendix attached to the end of your submitted
assignment. You only need to submit a single file for your assignment that contains both
your assignment answers and your R script file. Do not submit your R script file as
a separate file in Canvas.
a) [10 marks] Consider the following econometric model:
$$\ln wage_i = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 expersq100_i + \beta_4 disadv_i + \beta_5 city_i + \beta_6 city10_i + \sum_{j=2}^{4} \beta_{7j}\, regionj_i + \varepsilon_i \qquad (2)$$
where region1 (Region 1) is the omitted category.
Note that:
$$expersq100_i = \frac{expersq_i}{100} = \frac{exper_i^2}{100}$$
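The rescaling in the note above can be illustrated with a minimal R sketch. The data frame toy and its values are made up purely for illustration and are not taken from assignment2.csv. Dividing a regressor by 100 multiplies its estimated coefficient by 100 and leaves the fit of the model unchanged, which typically makes a small coefficient on squared experience easier to read.

```r
# Toy data standing in for the real file; the values are made up for illustration
toy <- data.frame(exper = c(2, 5, 10))

# Squared experience, as provided in the data file under the name expersq
toy$expersq <- toy$exper^2

# Rescaled version used in model (2): expersq100 = expersq / 100
toy$expersq100 <- toy$expersq / 100

print(toy)
```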
i) [2 marks] What is the interpretation of the population parameter β1 in model
(2)?
ii) [2 marks] What is the interpretation of the population parameter β4 in model
(2)?
iii) [6 marks] Estimate the econometric model (2) by the method of Ordinary
Least Squares (OLS) with robust (Huber-White) standard errors. Using
the car package and a 5% level of significance, test the hypothesis that
all the variables relating to the individual’s geographic location 10 years ago
are important determinants of wages. Your answer should clearly state the
null and alternative hypotheses, the distribution of the test statistic, and your
conclusion.
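The mechanics of a joint test with robust standard errors can be sketched in R as follows. This is a minimal illustration on simulated data, not a solution to the question: the data frame sim and the variables y, x, z1 and z2 are hypothetical, and the choice of the HC1 variant of the Huber-White covariance estimator is an assumption, since the assignment does not name a specific variant.

```r
library(sandwich)  # vcovHC(): heteroskedasticity-robust covariance matrices
library(car)       # linearHypothesis(): tests of linear restrictions

set.seed(1)
n <- 500

# Simulated stand-in data; all variable names and values are hypothetical
sim <- data.frame(
  x  = rnorm(n),
  z1 = rbinom(n, 1, 0.5),
  z2 = rbinom(n, 1, 0.3)
)

# The error standard deviation depends on x, so errors are heteroskedastic
sim$y <- 1 + 0.5 * sim$x + 0.2 * sim$z1 + rnorm(n, sd = exp(0.3 * sim$x))

# OLS point estimates
fit <- lm(y ~ x + z1 + z2, data = sim)

# Robust covariance matrix; type = "HC1" is an assumed choice of variant
robust_vcov <- vcovHC(fit, type = "HC1")

# Robust standard errors are the square roots of the diagonal elements
robust_se <- sqrt(diag(robust_vcov))
print(robust_se)

# Joint test of the two restrictions: coefficient on z1 = 0 and on z2 = 0
joint_test <- linearHypothesis(fit, c("z1 = 0", "z2 = 0"),
                               vcov. = robust_vcov, test = "F")
print(joint_test)
```

The second row of the printed table reports the number of restrictions, the test statistic and its p-value, which can be compared with the chosen significance level.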
b) [6 marks] Consider an extended version of model (2) that includes the two (2)
variables for parental education (meduc and feduc):
$$\ln wage_i = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 expersq100_i + \beta_4 disadv_i + \beta_5 city_i + \beta_6 city10_i + \sum_{j=2}^{4} \beta_{7j}\, regionj_i + \beta_8 meduc_i + \beta_9 feduc_i + \varepsilon_i \qquad (3)$$
where region1 (Region 1) is the omitted category.
Estimate the econometric model (3) by the method of Ordinary Least Squares
(OLS) with robust (Huber-White) standard errors. Using the car package
and a 5% level of significance, test the hypothesis that the parental education
variables are jointly important determinants of wages. Your answer should clearly
state the null and alternative hypotheses, the distribution of the test statistic, and
your conclusion. Note that model (2) is now the restricted model.
c) [4 marks] Consider the econometric model (3). Do you think the condition
$$\mathrm{Cov}(educ_i, \varepsilon_i \mid X_i) = 0$$
is likely to be satisfied? Outline at least one possible reason why this condition
might not be satisfied. Clearly explain the consequences for the OLS estimator if
this condition is not satisfied.
d) [6 marks] The sample is derived from a large country with a large number of
universities, geographically dispersed across the country. Consider the following
indicator variable:
near