PUBL0062 Data Analysis for Public Policy

Instructions:
This assignment should be done with the help of statistical software (Stata) and the dataset
labeled ‘homework2.dta’ available under week 9 in Moodle. For each question, students should
make sure to demonstrate an understanding of the results in a practical/real world sense (not
just statistical interpretation). Where applicable, students should copy/paste output
(regression/charts) into their submission. Please also make sure to include your student
number on the submitted work.
Questions 1-5 (below) follow the sequence of the course up to week 6 and should be answered
in a separate file. Final marks are computed out of a total of 63 marks. Note that students who
compute the wrong number but provide correct interpretations and demonstrate a good
understanding will receive higher marks than students who compute the correct number but
provide incorrect interpretations.
All of the required formulas and explanations to complete this assignment can be found in
Lecture slides from weeks 2, 5, 6, 7, 8. Completed assignments should be submitted by 2:00pm
on Monday the 13th of December, 2021.
In the ‘homework2.dta’ dataset, students will find a series of dynamic cross-country data for
the years 1990-2016. All variables are labelled so that you can interpret/understand their
meaning and coding. (Note that the GDP per capita variable is in thousands of US dollars and should
be interpreted as such.) Pay attention to the subscripts in the regressions below (‘i’, ‘t’, ‘it’)
when interpreting the results. Lastly, note that 2e, 3e, 4e test your ability to run and
understand multivariate regression from lecture 6.
Question 1: Practical Data Issues (8 marks)
1a) What type of dataset is this (recall from lecture 2)? (1 mark)
1b) Looking at summary statistics, the series for grant revenue (as % GDP) has a relatively large
maximum. Which country/year is this for? From a quick internet search, does this number seem
justified? If not, what would be the best way of dealing with this outlier? (3 marks)
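One possible way to locate the observation behind the maximum is sketched below. The variable name grants_gdp is a placeholder and is not taken from the dataset; lookfor should reveal the real name. The names country and year are the ones used in the regression commands later in this assignment.

```stata
* Sketch only: grants_gdp is a hypothetical variable name, replace it
* with the name that lookfor reports for the grant revenue series.
use homework2.dta, clear
lookfor grant

* Summary statistics, including the maximum
summarize grants_gdp, detail

* Sort from largest to smallest (missing values go last by default)
* and list the top few observations with their country and year
gsort -grants_gdp
list country year grants_gdp in 1/5
```

Listing several of the largest values, rather than only the maximum, shows how far the outlier sits from the next observations.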
1c) Looking at our regime durability variable, which are the four most durable regimes in our sample?
(2 marks)
Inference Testing
For questions 2, 3 and 4, suppose that you are hired to consult on increasing a country’s tax capacity (i.e.
the country wants to collect more tax revenue). Fortunately, we have a dataset (homework2) that already has
our dependent variable (c_totx) and some potential explanatory variables.
1d) Plot a histogram of total tax revenue and paste it into your homework with a record of the
mean, standard deviation, min and max (2 marks)
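A minimal sketch of the commands for this step, using the dependent variable c_totx named above. The output file name tax_hist.png and the chart title are arbitrary choices made for illustration.

```stata
* Histogram of total tax revenue, with counts on the vertical axis
use homework2.dta, clear
histogram c_totx, frequency title("Total tax revenue")

* Save the chart so it can be pasted into the submission
graph export tax_hist.png, replace

* Reports observations, mean, standard deviation, min and max
summarize c_totx
```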
Question 2: Static Cross-Country Inference Testing (15 marks)
2a) Beginning with a simple cross-country model where we hypothesize that total tax revenue will
be partly determined by how wealthy a country is (GDP per capita), we could run a bivariate
regression where:
$\mathit{Tax}_i = f(\mathit{GDPpc}_i) = \alpha + \beta\,\mathit{GDPpc}_i$
Note that we are only interested in cross-country differences here (across countries i). To look
only at the pooled cross-country effect of GDP per capita on tax revenue in the year 2000, we
could run a regression (reg y x if year==2000). Run this regression and paste the output
into your homework. (3 marks)
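As a rough sketch, the generic command above could be filled in as follows. Only c_totx is named in this assignment; gdppc is a hypothetical stand-in for the GDP per capita variable, whose real name lookfor should show.

```stata
* Sketch only: gdppc is a hypothetical variable name.
use homework2.dta, clear

* Search variable names and labels for the GDP per capita series
lookfor gdp

* Cross-country regression restricted to the year 2000
regress c_totx gdppc if year == 2000
```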
2b) What is the relationship between GDP per capita and tax revenue in 2000? Is this relationship
significant? (3 marks)
2c) For a country that increases its GDP per capita by 2 thousand US Dollars, how much additional
tax revenue should that government expect to collect? (3 marks)
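The arithmetic behind this kind of question can be sketched as follows, where $\hat{\beta}$ stands for the slope estimated in 2a and $\mathit{Tax}$ and $\mathit{GDPpc}$ are shorthand labels used here for total tax revenue and GDP per capita. The intercept drops out because only the change in GDP per capita matters.

```latex
\begin{align*}
\Delta\widehat{\mathit{Tax}} &= \hat{\beta}\,\Delta\mathit{GDPpc} \\
\Delta\mathit{GDPpc} = 2 \text{ (thousand US dollars)}
  \;&\Rightarrow\; \Delta\widehat{\mathit{Tax}} = 2\,\hat{\beta}
\end{align*}
```

The result is expressed in the units of c_totx, which can be read from the variable label. A fall in GDP per capita enters as a negative $\Delta\mathit{GDPpc}$.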
2d) What can we say about the ‘goodness of fit’ in this model? (2 marks)
2e) Given the choice of other variables in the dataset, use your intuition to select two (or more)
additional variables for the cross-country regression (for the year 2000). What is the effect of these
additional variables? Are these significant? Has our ‘goodness of fit’ improved? (4 marks)
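One way to compare the bivariate and multivariate fits side by side is sketched below. The names gdppc, x2 and x3 are placeholders for the GDP per capita variable and whichever additional variables are chosen; none of them is taken from the dataset.

```stata
* Sketch only: gdppc, x2 and x3 are hypothetical variable names.
use homework2.dta, clear

* Bivariate model for the year 2000
regress c_totx gdppc if year == 2000
estimates store bivariate

* Multivariate model with two additional explanatory variables
regress c_totx gdppc x2 x3 if year == 2000
estimates store multivariate

* Coefficients with significance stars, plus N, R-squared and
* adjusted R-squared for each model
estimates table bivariate multivariate, stats(N r2 r2_a) star
```

If the added variables have missing values, the two models are estimated on different samples, so it is worth checking N before comparing the fit.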
Question 3: Dynamic Within-Country Inference Testing (15 marks)
3a) Beginning with a simple within-country model where we hypothesize that total tax revenue will
be partly determined by how wealthy a country is (GDP per capita), we could run a bivariate
regression where:
$\mathit{Tax}_t = f(\mathit{GDPpc}_t) = \alpha + \beta\,\mathit{GDPpc}_t$
Note that we are only interested in tax revenue from a single country over time (across years t).
To look only at the within-country effect of GDP per capita on tax revenue in the
United Kingdom, we could run a regression (reg y x if country=="United Kingdom"). Run this
regression and paste the output into your homework. (3 marks)
3b) What is the relationship between GDP per capita and tax revenue in the UK over time? Is this
relationship significant? For how many years did you have data? (3 marks)
3c) Suppose that GDP per capita in the UK decreases by 2 thousand US Dollars. How much less tax
revenue should that government expect to collect? (3 marks)
3d) What can we say about the ‘goodness of fit’ in this model? (2 marks)
3e) Given the choice of other variables in the dataset, use your intuition to select two (or more)
additional variables for this within-country regression (for the UK over time). What is the effect of
these additional variables? Are these significant? Has our ‘goodness of fit’ improved? (4 marks)
Question 4: Time Series Cross-Country Inference Testing (15 marks)
4a) Our data are both dynamic (within countries) and comparative (across countries). We can take
advantage of all of this information by testing the hypothesis that total tax revenue will be
partly determined by how wealthy a country is (GDP per capita), using a bivariate regression
where:
$\mathit{Tax}_{it} = f(\mathit{GDPpc}_{it}) = \alpha_i + \beta\,\mathit{GDPpc}_{it}$
To take full advantage of our rich dataset for predicting the effects of GDP per capita on tax
revenue across countries over time, we could run a ‘fixed effects’ regression (xtreg y x, fe).
Run this regression and paste the output into your homework. (3 marks)
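Before xtreg will run, Stata needs to know the panel structure of the data. The sketch below assumes that country is stored as a string (as the comparison with "United Kingdom" in question 3 suggests) and that the data have not already been declared as a panel. The names country_id and gdppc are hypothetical.

```stata
* Sketch only: country_id and gdppc are hypothetical variable names.
use homework2.dta, clear

* xtset needs a numeric panel identifier, so convert the string
* country names into a labelled numeric variable
encode country, generate(country_id)

* Declare the panel: countries observed over years
xtset country_id year

* Fixed effects (within) regression of tax revenue on GDP per capita
xtreg c_totx gdppc, fe
```

If the dataset already contains a numeric country identifier, the encode step can be skipped and that identifier passed to xtset instead.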
4b) What is the relationship between GDP per capita and tax revenue in this specification? Is this
relationship significant? For how many countries do you have data? What is the average number of
years covered? (3 marks)
4c) Suppose that GDP per capita increases by 1 thousand US Dollars in any country. How much less
tax revenue should that government expect to collect? (3 marks)
4d) What can we say about the ‘goodness of fit’ in this model? (2 marks)
4e) Given the choice of other variables in the dataset, use your intuition to select two (or more)
additional variables for this fixed effects regression. What is the effect of these additional variables?
Are these significant? Has our ‘goodness of fit’ improved? (4 marks)
Question 5: Interpreting these Results for Your Client (10 marks)
5a) From 2e, 3e, 4e, choose your ‘best’ model in terms of statistical validity and being able to ‘sell’ the
results to your non-quant client (this is up to you to pick). Write out the theoretical equation and fitted
equation (with results). Also, justify why you chose this specification. (5 marks)
5b) Write a short summary of your analysis (from 5a), including what policy measures would be
necessary to increase tax revenue capacity and what your client should expect in terms of results if your
policy measures are implemented. (5 marks)
Total Marks = 63