ETC3410 Practice Exam Answer Key

Note that this is a very brief version of the answer key.

INSTRUCTIONS TO STUDENTS

Answer ALL questions. Statistical tables are located at the end of this exam paper. (For the practice exam, statistical tables are available on Moodle as a separate PDF file.)

Question 1 (30 marks)

1. A researcher studied the effects of potential experience on being a union member with the 1988 Current Population Survey data (“union.csv”). Variables in the data are as follows.

| Variable | Description |
| --- | --- |
| $\text{union}_i$ | 1 if the ith individual is a union member |
| $\text{potex}_i$ | years of potential experience of individual i: age - years of schooling - 5 |
| $\text{educ}_i$ | years of schooling of individual i |
| $\text{married}_i$ | 1 if the ith individual is married |
| $\text{high}_i$ | 1 if the ith individual is in a highly unionized industry |

All the relevant variables are included in “union.csv” under the same variable names as described above.

(a) What is the linear probability model? Briefly explain in words. (2 marks)

Answer: The linear probability model is a model that can be used to estimate response probabilities with a binary dependent variable. The response probability, that is, the conditional expectation of the binary dependent variable given all the explanatory variables, is represented by a linear function of the parameters, which can be estimated by ordinary least squares (OLS).

(b) The researcher wanted to estimate the logit model

$$\text{union}_i = \Lambda(\beta_0 + \beta_1 \text{potex}_i + \beta_2 \text{potex}_i^2 + \beta_3 \text{educ}_i + \beta_4 \text{married}_i + \beta_5 \text{high}_i) + u_i. \quad (1)$$

How would you run logit estimation for the above model with R? Write down the required R commands. (You do not need to use robust standard errors.) In addition, write down the expression of the marginal effect of having one more year of potential experience for a non-married individual who works in a highly unionized industry with 10 years of education and two years of potential experience. Consider potex a continuous regressor. (5 marks)

Answer: For the required R commands,

    library(readr)  # provides read_csv()
    union_data <- read_csv("union.csv")
    union_logit <- glm(union ~ potex + I(potex^2) + educ + married + high,
                       data = union_data, family = binomial(link = "logit"))
    summary(union_logit)

For the expression of the marginal effect, evaluate at potex = 2, educ = 10, married = 0 and high = 1:

$$ME_{\text{potex}} = \frac{\partial P(\text{union}_i = 1 \mid x_i)}{\partial \text{potex}_i} = \lambda(\beta_0 + 2\beta_1 + 4\beta_2 + 10\beta_3 + \beta_5)\,(\beta_1 + 4\beta_2),$$

where $\lambda(z) = \Lambda'(z) = e^z / (1 + e^z)^2$ is the logistic density.

(c) The researcher obtained the following output for the logit model in (1):

$$\widehat{\text{union}}_i = \Lambda\big(\underset{(0.296)}{-1.47} + \underset{(0.016)}{0.081}\,\text{potex}_i - \underset{(0.0003)}{0.002}\,\text{potex}_i^2 - \underset{(0.019)}{0.042}\,\text{educ}_i - \underset{(0.113)}{0.062}\,\text{married}_i - \underset{(0.099)}{0.561}\,\text{high}_i\big), \quad l = -475.25, \quad n = 1000, \quad (2)$$

where $l$ is the value of the log-likelihood, and standard errors are in parentheses. Use the information in (2) to answer the following questions.

i) Calculate the value of the marginal effect in part (b). Use e = 2.7. (3 marks)

Answer:

$$ME_{\text{potex}} = \lambda(-1.47 + 2 \times 0.081 - 4 \times 0.002 - 10 \times 0.042 - 0.561) \times (0.081 - 4 \times 0.002) = \lambda(-2.297) \times 0.073 \approx 0.084 \times 0.073 \approx 0.0061$$

ii) Test the following hypotheses at the 5% level of significance: $H_0: \beta_1 = 0.05$ vs. $H_1: \beta_1 \neq 0.05$. Clearly specify the test statistic, its distribution under the null hypothesis, the critical value of the test statistic, the rejection rule for the test and your conclusion. (4 marks)

Answer:

$H_0: \beta_1 = 0.05$
$H_1: \beta_1 \neq 0.05$
$z = \dfrac{\hat{\beta}_1 - \beta_1}{se(\hat{\beta}_1)} \overset{asy}{\sim} N(0, 1)$ under $H_0$
Reject $H_0$ if $|z_{calc}| > z_{crit}$
$z_{calc} = \dfrac{0.081 - 0.05}{0.016} = 1.9375$
$z_{crit} = 1.96$ (two-sided test at the 5% level)
$|z_{calc}| = 1.9375 < 1.96$, so we fail to reject $H_0$ at the 5% significance level.
Hence there is not enough evidence to conclude that $\beta_1$ differs from 0.05 at the 5% significance level.

(d) The researcher wished to test the following hypotheses at the same time.

$$H_0: \quad \beta_2 = 1, \qquad \beta_1 = \beta_3 - \beta_4 + 2\beta_5, \qquad \beta_2 + \beta_3 = \beta_5 + 0.5$$

i) Write down the above hypotheses in the form of $H_0: R\beta = r$. Clearly specify $R$, $\beta$, and $r$. (3 marks)

Answer:

$$R = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -1 & 1 & -2 \\ 0 & 0 & 1 & 1 & 0 & -1 \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \beta_5 \end{pmatrix}, \qquad r = \begin{pmatrix} 1 \\ 0 \\ 0.5 \end{pmatrix}$$

ii) Briefly explain the “Wald Test” in words. (3 marks)

Answer: The Wald test is a way of testing multiple hypotheses jointly in a variety of econometric settings. It uses only the unrestricted estimates and measures how far $R\hat{\beta}$ is from $r$ relative to its estimated variance. Under the null hypothesis it typically has an asymptotic chi-square distribution with degrees of freedom equal to the number of restrictions.
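The Wald statistic for the restrictions in part (d) can be sketched in R as follows. This is a minimal illustration only: the data frame `sim` is simulated as a stand-in for “union.csv” (the coefficient values used to generate it are arbitrary assumptions), and the names `sim`, `fit`, `W`, `q`, `p_value` and `crit` are hypothetical. The statistic computed is $W = (R\hat{\beta} - r)'[R\hat{V}R']^{-1}(R\hat{\beta} - r)$, compared with a $\chi^2(q)$ critical value.

```r
# Simulated stand-in for union.csv (assumption: the real data are not used here)
set.seed(3410)
n <- 1000
sim <- data.frame(
  potex   = runif(n, 0, 40),
  educ    = sample(8:18, n, replace = TRUE),
  married = rbinom(n, 1, 0.6),
  high    = rbinom(n, 1, 0.3)
)
# Arbitrary illustrative index used only to generate a binary outcome
index <- -1.5 + 0.08 * sim$potex - 0.002 * sim$potex^2 -
  0.04 * sim$educ - 0.06 * sim$married - 0.5 * sim$high
sim$union <- rbinom(n, 1, plogis(index))

# Unrestricted logit model, same specification as equation (1)
fit <- glm(union ~ potex + I(potex^2) + educ + married + high,
           data = sim, family = binomial(link = "logit"))

# Restrictions H0: R beta = r from part (d)
R <- rbind(c(0, 0, 1,  0, 0,  0),
           c(0, 1, 0, -1, 1, -2),
           c(0, 0, 1,  1, 0, -1))
r <- c(1, 0, 0.5)

b <- coef(fit)        # unrestricted estimates
V <- vcov(fit)        # estimated asymptotic variance of the estimates
d <- R %*% b - r      # distance of R b from r

# Wald statistic and its asymptotic chi-square reference distribution
W <- as.numeric(t(d) %*% solve(R %*% V %*% t(R)) %*% d)
q <- nrow(R)                                  # number of restrictions
p_value <- pchisq(W, df = q, lower.tail = FALSE)
crit <- qchisq(0.95, df = q)                  # 5% critical value

cat("W =", W, " critical value =", crit, " p-value =", p_value, "\n")
```

With the real data, `sim` would be replaced by the data frame read from “union.csv”; the null hypothesis is rejected at the 5% level when `W` exceeds `crit`.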
(e) The researcher also wanted to test the following hypotheses using the likelihood ratio test.

$$H_0: \beta_1 = \beta_2 = 0$$

i) Write down the restricted (logit) model to implement the LR test for the above hypotheses. (3 marks)

Answer:

$$\text{union}_i = \Lambda(\beta_0 + \beta_3 \text{educ}_i + \beta_4 \text{married}_i + \beta_5 \text{high}_i) + u_i$$

ii) What is the distribution (and the degrees of freedom, if applicable) of the LR test statistic when the researcher tests the above hypotheses? (2 marks)

Answer: $LR \overset{asy}{\sim} \chi^2(2)$, because the null hypothesis imposes two restrictions.

2. State if each of the following statements is true or false, and BRIEFLY give your reasoning.

(a) The maximized value of the log-likelihood for the restricted model is less than or equal to that for the unrestricted model. (2 marks)

Answer: True. The log-likelihood of the unrestricted model equals that of the restricted model when the parameters are set to values that satisfy the restrictions, and it can only get higher when the parameters are allowed to take values freely.

(b) The Wald test statistic W is equivalent to W = q × LR, where q is the number of restrictions in the null hypotheses and LR is the corresponding likelihood-ratio test statistic for the same test. (3 marks)

Answer: False. W = q × F, where q is the number of restrictions in the null hypotheses and F is the corresponding F-test statistic for the same test. Since F and LR give different test statistics in general, the statement is false.

Question 2 (40 marks)

1. An econometrician wants to investigate the effects of smoking on exam scores of university students.
The researcher collected a dataset (“univ.csv”) including the following variables.

| Variable | Description |
| --- | --- |
| $\text{lscore}_i$ | individual i's log average exam score |
| $\text{cig}_i$ | number of cigarettes the ith individual smokes per day |
| $\text{gender}_i$ | 1 if the individual is male |
| $\text{age}_i$ | age of the ith individual in years |
| $\text{fcig}_i$ | number of cigarettes individual i's father smokes |
| $\text{mcig}_i$ | number of cigarettes individual i's mother smokes |
| $\text{othcig}_i$ | number of cigarettes individual i's other family members smoke |

All the relevant variables are included in “univ.csv” under the same variable names as described above.

He estimated the following equation,

$$\text{lscore}_i = \beta_0 + \beta_1 \text{cig}_i + \beta_2 \text{gender}_i + \beta_3 \text{age}_i + u_i, \quad (5)$$

by using OLS. He is advised that students who have low exam scores may smoke more due to stress about grades. So, he also estimated equation (5) using 2SLS with $\text{fcig}_i$ as an instrumental variable for $\text{cig}_i$, and using 2SLS with $\text{fcig}_i$, $\text{mcig}_i$, $\text{othcig}_i$ as a set of instrumental variables for $\text{cig}_i$. He also estimated the following equation using OLS:

$$\text{cig}_i = \pi_0 + \pi_1 \text{fcig}_i + \pi_2 \text{mcig}_i + \pi_3 \text{othcig}_i + \pi_4 \text{gender}_i + \pi_5 \text{age}_i + v_i. \quad (6)$$

Finally, he estimated two extra equations using OLS:

$$\text{lscore}_i = \beta_0 + \beta_1 \text{cig}_i + \beta_2 \text{gender}_i + \beta_3 \text{age}_i + \alpha \hat{v}_i + \varepsilon_i, \quad (7)$$

and

$$\hat{u}_i = \delta_0 + \delta_1 \text{fcig}_i + \delta_2 \text{mcig}_i + \delta_3 \text{othcig}_i + \delta_4 \text{gender}_i + \delta_5 \text{age}_i + \eta_i, \quad (8)$$

where $\hat{v}_i$ is the residual from OLS estimation of (6), and $\hat{u}_i$ is the residual from 2SLS estimation of (5) using $\text{fcig}_i$, $\text{mcig}_i$, $\text{othcig}_i$ as a set of instrumental variables. The estimation results are as follows (standard errors in parentheses).

| Relevant eqn | (5) | (5) | (5) | (6) | (7) | (8) |
| --- | --- | --- | --- | --- | --- | --- |
| Dependent var | $\text{lscore}_i$ | $\text{lscore}_i$ | $\text{lscore}_i$ | $\text{cig}_i$ | $\text{lscore}_i$ | $\hat{u}_i$ |
| Method | OLS | 2SLS | 2SLS | OLS | OLS | OLS |
| $\text{cig}_i$ | -0.323 (0.057) | -0.244 (0.109) | -0.237 (0.108) | | -0.322 (0.056) | |
| $\text{gender}_i$ | 0.021 (0.005) | 0.020 (0.005) | 0.019 (0.005) | 0.012 (0.041) | 0.021 (0.005) | 0.005 (0.012) |
| $\text{age}_i$ | 1.258 (0.561) | 1.254 (0.581) | 1.255 (0.580) | 0.029 (0.033) | 1.254 (0.544) | 0.002 (0.004) |
| $\hat{v}_i$ | | | | | 0.014 (0.009) | |
| $\text{fcig}_i$ | | | | 2.114 (0.305) | | 0.002 (0.003) |
| $\text{mcig}_i$ | | | | 3.239 (0.517) | | 0.004 (0.004) |
| $\text{othcig}_i$ | | | | 3.019 (0.812) | | 0.003 (0.005) |
| Instruments | | $\text{fcig}_i$ | $\text{fcig}_i$, $\text{mcig}_i$, $\text{othcig}_i$ | | | |
| $R^2$ | 0.1713 | 0.1145 | 0.1362 | 0.1514 | 0.1931 | 0.0093 |
| N | 817 | 817 | 817 | 817 | 817 | 817 |

Using the information provided above, answer the following questions.

(a) Write down all required R commands that carry out the 2SLS estimation together with its first-stage result using $\text{fcig}_i$, $\text{mcig}_i$, $\text{othcig}_i$ as a set of instrumental variables for $\text{cig}_i$. (2 marks)

Answer:

    library(readr)  # provides read_csv()
    univ_data <- read_csv("univ.csv")
    # First stage: regress cig on all instruments and exogenous regressors
    s1 <- lm(cig ~ fcig + mcig + othcig + gender + age, data = univ_data)
    summary(s1)
    cighat <- fitted(s1)
    # Second stage: replace cig with its first-stage fitted values
    s2 <- lm(lscore ~ cighat + gender + age, data = univ_data)
    summary(s2)

Note: 2SLS can be conducted via

    library(readr)  # provides read_csv()
    library(AER)    # provides ivreg()
    univ_data <- read_csv("univ.csv")
    s3 <- ivreg(lscore ~ cig + gender + age | fcig + mcig + othcig + gender + age,
                data = univ_data)
    summary(s3)

However, this does not yield the first-stage result, so it is not the requested answer. (The point estimates from s2 and s3 coincide, but the standard errors reported for the manual second stage s2 are not valid 2SLS standard errors.)

(b) From equation (5), answer the following question. What happens to the individual's test score if an individual smokes two more cigarettes per day, ceteris paribus? (2 marks)

Answer: The individual's test score will change by approximately $2 \times 100\beta_1\%$ if the individual smokes two more cigarettes per day, ceteris paribus.

(c) By using the 2SLS regression results where $\text{fcig}_i$ is the only instrumental variable outside of equation (5), test the following hypotheses at the 5% level of significance: $H_0: \beta_1 = 0.1$ vs. $H_1: \beta_1 \neq 0.1$. Clearly specify the test statistic, its distribution under the null hypothesis, the critical value of the test statistic, the rejection rule for the test and your conclusion. (4 marks)

Answer:

$H_0: \beta_1 = 0.1$
$H_1: \beta_1 \neq 0.1$
$z = \dfrac{\hat{\beta}_1 - \beta_1}{se(\hat{\beta}_1)} \overset{asy}{\sim} N(0, 1)$ under $H_0$
Reject $H_0$ if $|z_{calc}| > z_{crit}$
$z_{calc} = \dfrac{-0.244 - 0.1}{0.109} \approx -3.156$
$z_{crit} = 1.96$ (two-sided test at the 5% level)
$|z_{calc}| = 3.156 > 1.96$, so we reject $H_0$ at the 5% significance level.
Hence $\beta_1 \neq 0.1$ at the 5% significance level.

(d) Test if the three instrumental variables $\text{fcig}_i$, $\text{mcig}_i$, and $\text{othcig}_i$ are jointly exogenous at the 1% level of significance.
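Clearly specify the test statistic, its distribution under the null hypothesis, the critical value of the test statistic, the rejection rule for the test and your conclusion. (4 marks)

Answer:

$H_0$: $\text{fcig}_i$, $\text{mcig}_i$, $\text{othcig}_i$ are jointly exogenous
$H_1$: at least one of $\text{fcig}_i$, $\text{mcig}_i$, $\text{othcig}_i$ is exogenous, but not all of them are
$H = nR^2 \overset{asy}{\sim} \chi^2(2)$ under $H_0$, where $R^2$ is from regression (8) and the degrees of freedom equal the number of overidentifying restrictions (three instruments minus one endogenous regressor)
Reject $H_0$ if $H_{calc} > H_{crit}$
$H_{calc} = 817 \times 0.0093 \approx 7.598$
$H_{crit} = \chi^2_{0.99}(2) = 9.21$
$H_{calc} < H_{crit}$, so we fail to reject $H_0$ at the 1% significance level.
Hence there is no evidence against the joint exogeneity of the three IVs at the 1% significance level.

(e) Test if $\text{cig}_i$ is endogenous at the 10% level of significance. Clearly specify the test statistic, its distribution under the null hypothesis, the critical value of the test statistic, the rejection rule for the test and your conclusion. (4 marks)

Answer:

$H_0: \alpha = 0$
$H_1: \alpha \neq 0$
$t = \dfrac{\hat{\alpha}}{se(\hat{\alpha})} \overset{asy}{\sim} N(0, 1)$ under $H_0$, where $\hat{\alpha}$ is the coefficient on $\hat{v}_i$ in regression (7)
Reject $H_0$ if $|t_{calc}| > t_{crit}$
$t_{calc} = \dfrac{0.014}{0.009} \approx 1.5556$
$t_{crit} = 1.64$ (two-sided test at the 10% level)
$|t_{calc}| < t_{crit}$, so we fail to reject $H_0$ at the 10% significance level.
Hence there is no evidence that $\text{cig}_i$ is endogenous at the 10% significance level.

2. Let the structural equations be

$$y_{1i} = \beta_0 + \beta_1 y_{2i} + \beta_2 x_{1i} + u_i,$$
$$y_{2i} = \gamma_0 + \gamma_1 y_{1i} + \gamma_2 x_{1i} + v_i,$$

where $u$ and $v$ are independent.

(a) Write down the reduced form equation for $y_{2i}$. (3 marks)

Answer: Substituting the first equation into the second and solving for $y_{2i}$ gives

$$y_{2i} = \frac{\gamma_0 + \gamma_1\beta_0 + (\gamma_1\beta_2 + \gamma_2)x_{1i} + \gamma_1 u_i + v_i}{1 - \gamma_1\beta_1}$$

(b) Suppose $\beta_1\gamma_1 \neq 1$, $\beta_2 \neq 0$ and $\gamma_2 \neq 0$. What is the identification status of the first equation ($y_1$ equation)? Explain. How about the second equation ($y_2$ equation)? Explain. (3 marks)

Answer: The first equation is unidentified. The endogenous variable in the first equation is $y_2$, and the only exogenous variable, $x_1$, already appears in the first equation, so there is no excluded instrument for $y_2$. The second equation is also unidentified. The endogenous variable in the second equation is $y_1$, and for the same reason there is no excluded instrument for $y_1$.

(c) Suppose $\beta_1 = 0$, $\gamma_1 \neq 0$, $\beta_2 \neq 0$ and $\gamma_2 \neq 0$. What is the identification status of the first equation ($y_1$ equation)? Explain.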
How about the second equation ($y_2$ equation)? Explain. (3 marks)

Answer: The first equation is exactly identified because its only regressor, $x_1$, is exogenous. The second equation is exactly identified because $x_1$ and $y_1$ are both exogenous: with $\beta_1 = 0$, $y_1$ depends only on $x_1$ and $u$, and $u$ is independent of $v$.

(d) Suppose $\beta_1 = 1$, $\gamma_1 \neq 1$, $\beta_2 \neq 0$ and $\gamma_2 \neq 0$. What is the identification status of the first equation ($y_1$ equation)? Explain. How about the second equation ($y_2$ equation)? Explain. (3 marks)

Answer: The first equation is identified: because we know the true coefficient on $y_2$, we can move $y_2$ to the left-hand side, and no endogenous regressors remain. The second equation is unidentified. The reason is the same as in (b).

3. Let

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + v_i, \quad (9)$$

where $E(v \mid x_1, x_2) = 0$. However, we do not observe $x_2$, and $cov(x_1, x_2) \neq 0$. We have a variable $w$ outside of equation (9).

(a) Write down conditions under which the variable $w$ is a valid instrumental variable so that you can consistently estimate $\beta_1$. (3 marks)

Answer:

$w \neq x_2$
$cov(w, x_1) \neq 0$
$cov(w, \beta_2 x_2 + v) = 0$

(b) Write down conditions under which the variable $w$ is a valid proxy variable so that you can consistently estimate $\beta_1$. (3 marks)

Answer:

$E[y \mid x_1, x_2, w] = E[y \mid x_1, x_2]$
In the regression $x_2 = \pi_0 + \pi_1 w + e$: $cov(x_1, e) = 0$ and $\pi_1 \neq 0$

4. Consider a simultaneous equations model of supply and demand. You observe the equilibrium price and the equilibrium quantity for each observation (a product). You have a variable “aggregate real income” for each observation.

(a) Suppose “aggregate real income” is a demand shifter but not a supply shifter. Is either the supply or the demand equation identified? If yes, which one? Explain. (3 marks)

Answer: The supply equation will be identified, because “aggregate real income” is excluded from the supply equation but appears in the reduced form equation for price. It can be used as the instrument for price in the supply equation.

(b) Suppose “aggregate real income” affects both the demand and the supply equation. Is either the supply or the demand equation identified? If yes, which one? Explain.
(3 marks)

Answer: Both equations will be unidentified. There is no excluded instrument for either of them.

Question 3 (30 marks)

1. There is a great deal of discussion and disagreement about the most effective way to reduce crime. In particular, participants in the debate disagree about the relative importance of detection, punishment, and social and economic conditions in influencing crime rates. We use a panel data set to investigate this issue. For the purpose of this investigation, we interpret the variables lpa and lpc as measures of the probability of detection, lpp and las as measures of the severity of punishment, and lwage as a crude measure of economic conditions. Of course, there are many other unobserved factors that influence crime rates. Consider an unobserved effects model of the crime rate given by

$$\text{lcrime}_{it} = \beta_0 + \beta_1 \text{lpa}_{it} + \beta_2 \text{lpc}_{it} + \beta_3 \text{lpol}_{it} + \beta_4 \text{lpden}_{it} + \beta_5 \text{lpp}_{it} + \beta_6 \text{las}_{it} + \beta_7 \text{lwage}_{it} + \sum_t \alpha_t d_t + h_i + u_{it}, \quad t = 1, \dots, T,$$

where $T = 3$.

(a) Under what condition(s) on $h_i$ and/or $u_{it}$ is FE estimation more appropriate than RE estimation? Explain. (3 marks)

Answer: $cov(X_i, h_i) \neq 0$, where $X_i$ includes $\text{lpa}_{it}$, $\text{lpc}_{it}$, $\text{lpol}_{it}$, $\text{lpden}_{it}$, $\text{lpp}_{it}$, $\text{las}_{it}$, $\text{lwage}_{it}$. When this condition is true, FE can be consistent and efficient, but RE is inconsistent.

(b) Suppose RE is more appropriate than FE. That is, the condition(s) in part (a) do not hold. Is the RE estimator unbiased for $\beta$? Explain. How about POLS? Explain. (4 marks)

Answer: No, RE is biased. RE is a feasible GLS estimator that relies on estimated variance components, so it is always biased in finite samples, although it is consistent. Yes, POLS is unbiased. When $cov(X_i, h_i) = 0$ holds, there is no endogeneity problem in POLS.

(c) Now, suppose the fixed effects model is more appropriate. If you use FE estimation, can you estimate coefficients on the time dummies $d_t$ for all $t$? Explain. (4 marks)

Answer: No, the maximum number of time dummies we can include in FE is $T - 1$ (provided there are no regressors that change by a constant amount over time). With $T$ time dummies, time demeaning causes perfect multicollinearity, because the $T$ time-demeaned dummies sum to zero for every observation.
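The collinearity behind the $T - 1$ limit in part (c) can be illustrated with a small R sketch. It is a hedged illustration on simulated data, not the crime data set: the panel, the single regressor `x` and all object names (`panel`, `dm`, `fe_all`, `fe_ok`) are hypothetical, and the within (time-demeaning) transformation is done by hand with base R rather than a panel package.

```r
# Simulated balanced panel with N units and T = 3 periods (assumption: toy data)
set.seed(1)
N <- 50
T_periods <- 3
panel <- data.frame(id = rep(1:N, each = T_periods),
                    t  = rep(1:T_periods, times = N))
h <- rnorm(N)                                   # unobserved effect h_i
panel$x <- rnorm(N * T_periods) + h[panel$id]   # regressor correlated with h_i
panel$y <- 1 + 0.5 * panel$x + 0.2 * (panel$t == 2) + 0.4 * (panel$t == 3) +
  h[panel$id] + rnorm(N * T_periods)

# Full set of T time dummies d1, d2, d3
for (s in 1:T_periods) panel[[paste0("d", s)]] <- as.numeric(panel$t == s)

# Within transformation: subtract each unit's time average
demean <- function(v) v - ave(v, panel$id)
dm <- as.data.frame(lapply(panel[c("y", "x", "d1", "d2", "d3")], demean))

# The demeaned dummies add up to zero in every row, so they are collinear
max_row_sum <- max(abs(rowSums(dm[c("d1", "d2", "d3")])))

# With all T dummies, lm() cannot estimate the last one and reports NA
fe_all <- lm(y ~ 0 + x + d1 + d2 + d3, data = dm)
# With T - 1 dummies (period 1 as the base), every coefficient is estimable
fe_ok <- lm(y ~ 0 + x + d2 + d3, data = dm)

print(coef(fe_all))
print(coef(fe_ok))
```

In `fe_all` one dummy coefficient is not estimable, while `fe_ok` estimates the period 2 and period 3 effects relative to period 1, matching the answer above.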
Now, suppose that lpa does not change over time.

(d) Can you estimate $\beta_1$ with RE? How about with POLS? Explain. (3 marks)

Answer: Yes, we can estimate $\beta_1$ with both RE and POLS. Neither estimator fully time-demeans the data, so time-invariant variables can be kept in RE and POLS regressions.

(e) Write down the modified equation that allows you to estimate annual changes in the marginal effects of lpa from year 1 to years 2 and 3. (4 marks)

Answer:

$$\text{lcrime}_{it} = \beta_0 + \beta_1 \text{lpa}_{it} + \beta_2 \text{lpc}_{it} + \beta_3 \text{lpol}_{it} + \beta_4 \text{lpden}_{it} + \beta_5 \text{lpp}_{it} + \beta_6 \text{las}_{it} + \beta_7 \text{lwage}_{it} + \alpha_2 d_2 + \alpha_3 d_3 + \alpha_4 \text{lpa}_{it} d_2 + \alpha_5 \text{lpa}_{it} d_3 + h_i + u_{it}$$

Here $\alpha_4$ and $\alpha_5$ measure how the marginal effect of lpa in years 2 and 3 differs from its effect in year 1.

2. State if each of the following statements is true or false, and BRIEFLY give your reasoning.

(a) Suppose that strict exogeneity and cross-sectional independence hold, and $Cov(X_i, h_i) = 0$. If $E(u_i u_i') = \sigma_u^2 I_T$, then both POLS and RE are efficient. (3 marks)

Answer: False. Only RE is efficient, because the composite error term $v_{it} = h_i + u_{it}$ is serially correlated.

(b) One can use the Hausman test to choose between the RE and the POLS estimators for estimating the unobserved effects model. (3 marks)

Answer: False. Under the alternative, one estimator needs to be consistent while the other is inconsistent. This cannot happen with RE and POLS.

(c) The standard errors for the FE estimator and the FD estimator are exactly the same if T, the time dimension of the corresponding panel dataset, is two. (3 marks)

Answer: True. When T = 2, FE and FD are equivalent.

(d) If the unobserved effects model includes a regressor $\text{age}^2$, or age-squared, the coefficient on $\text{age}^2$ cannot be estimated in first difference estimation. (3 marks)

Answer: False, because $\text{age}^2$ is not time-invariant and does not increase over time by a constant amount for every individual in the sample.

END OF EXAMINATION