
ECON 113, Fall 2021
Problem Set #4: Statistical Inference in Multiple Regression

Note: Questions #1 and #2 both use the dataset rental.dta. Question #3 uses houseprices.dta.

(#1) (Data exercise). You should use Stata for parts (j) and (k) of this problem.

Are rental rates influenced by the student population in a college town? Let rent be the average monthly rent paid on rental units in a college town in the United States. Let pop denote the total city population, avginc the average city income, and pctstu the student population as a percentage of the total population.

One model to test for a relationship is

    ln(rent) = β0 + β1 ln(pop) + β2 ln(avginc) + β3 pctstu + u

a) State the null hypothesis that the size of the student body relative to the population has no ceteris paribus effect on monthly rents. State the alternative that there is an effect.

b) What signs do you expect for β1 and β2?

The equation estimated using 1990 data from rental.dta for 64 college towns is

    predicted ln(rent) = .043 + .066 lnpop + .507 lnavginc + .0056 pctstu
                        (.844)  (.039)       (.081)          (.0017)
    n = 64, R² = .458

Here are the summary statistics and correlation matrix for the four variables:

    Variable |  Obs       Mean   Std. Dev.        Min        Max
    ---------+---------------------------------------------------
      lnrent |   64   6.026034     .200436   5.717028   6.829794
       lnpop |   64   11.16897    .6245325   10.16119   13.35808
    lnavginc |   64   10.04073    .2556954   9.133676   10.93857
      pctstu |   64   27.84786    13.61892   11.45658   71.20982

             |   lnrent     lnpop  lnavginc    pctstu
    ---------+----------------------------------------
      lnrent |   1.0000
       lnpop |   0.2195    1.0000
    lnavginc |   0.6029    0.3692    1.0000
      pctstu |   0.0598   -0.5869   -0.3127    1.0000

c) Interpret the slope coefficient of 0.0056 on pctstu. Be careful when stating the units.

d) If lnavginc were omitted from the regression, what do you think would happen to the coefficient on pctstu? Explain briefly.

e) Using the estimates from above and one of the following critical values, test the hypothesis stated in part (a) at the 5% level. Explain why you chose the critical value that you did.
[Hint: you will need to calculate the t-statistic for β̂3.]

Stata note: the expression invt(df,p) gives the value t of a t-distributed random variable T with df degrees of freedom for which Pr(T < t) = p. So, it gives the value that cuts off probability p in the lower tail of a t-distribution with df degrees of freedom.

    . display invt(60,.005)
    -2.660283
    . display invt(60,.01)
    -2.3901195
    . display invt(60,.025)
    -2.0002978
    . display invt(60,.05)
    -1.6706489

f) Now focus on the slope coefficient on lnpop (β1). Construct a (two-sided) 90% confidence interval for β1. (Continue to use the estimates from part (b) and choose the appropriate critical values from part (e).) Write a statement that explains this 90% confidence interval.

g) Construct a (two-sided) 95% confidence interval for β1. (Again, choose the appropriate critical value from part (e).)

h) Based on your answers to (f) and (g), what can you conclude about the (two-sided) p-value for β̂1:
    A. p > .10
    B. p = .10
    C. p = .05
    D. .05 < p < .10
    E. p < .05

i) Use the output in (b) to calculate the t-statistic for β̂1. It is: __________.

Now use the output from the Stata command:

    . display ttail(60,1.70)
    .04715491

Stata note: the expression ttail(df,t) gives the probability p = Pr(T > t) for a t-distributed random variable T with df degrees of freedom. So, it gives us the probability in one tail of a t-distribution with df degrees of freedom when the value is t.

Use this information (including knowledge of the t-statistic you calculated) to obtain the (two-sided) p-value associated with β̂1. Your answer here should be consistent with your answer to part (h).

j) Now estimate the above regression yourself, and where possible, use the output to verify your answers thus far. Some Stata notes:

The data set rental.dta contains data from two years, 1980 and 1990. The variable “year” takes on one of two values (80 or 90).
To estimate the regression for the 1990 data only, type “if” after the last variable in the regression statement and then the expression “year==90”. Note that there is no comma before “if”:

    regress y x1 x2 x3 if year==90

You should run the command once with the default significance level of 5% (and 95% confidence intervals) and then again with the option (typed after a comma):

    , level(90)

to get output corresponding to the 10% significance level and 90% confidence intervals.

k) Based on your regression output from part (j), which of the three variables (lnpop, lnavginc, pctstu) is statistically significant at:
    a. a 10% level
    b. a 5% level
    c. a 1% level
    d. a 0.1% level

(#2) (Data exercise). This problem also uses the data set rental.dta. In question #1, you estimated the following model for rental rates in college towns, using all 64 observations:

    predicted log(rent) = .043 + .066 log(pop) + .507 log(avginc) + .0056 pctstu
                         (.844)  (.039)          (.081)             (.0017)
    n = 64, R² = .458

Now suppose that you accidentally deleted a few observations from the data for year 1990 before running the regression. Let’s assume that the observations were deleted at random, so the sample you have left is still a random sample.

a) Do you think your estimate of β1 would change? Why or why not? If so, can you predict whether β̂1 will get bigger or smaller?

b) Explain why you might expect the t-stat for β̂1 to get smaller.

c) Is it possible that the t-stat will actually get bigger? Explain.

Now, try estimating the regression after dropping 4 observations. Do this as follows.

First, “keep” only the observations corresponding to year 1990 by typing:

    keep if year==90

(Type describe to verify that you now have a data set with only 64 observations, and tab year to verify that year is equal to 90 for all of them.)

Next, re-estimate the regression using all 64 observations (in #1, you did this by adding “if year==90” to the end of the command.
Now, you no longer need this “if” condition).

Now, to estimate the regression using only the first 60 observations (i.e., ignoring the last 4), type

    regress y x … in 1/60

Try it again using only the last 60 observations:

    regress y x … in 5/64

d) What happened to β̂1 and to the t-statistic in each case?

(#3) (Data exercise). Use the data on housing prices (houseprices.dta) to estimate a simple linear regression of price on number of bedrooms (bdrms).

a) Is the coefficient on bdrms statistically significant at a:
    10% significance level   YES / NO
    5% level                 YES / NO
    1% level                 YES / NO

b) Now control for both the size of the house and the size of the lot in your regression. Conditional on the size of the house and the size of the lot, is the predicted effect of an additional bedroom on the sale price of a house significant at the:
    10% significance level   YES / NO
    5% level                 YES / NO
    1% level                 YES / NO

c) A realtor tells you that you should expect to pay an extra $150 on average ($.15K) for each additional square foot in this housing market, holding constant the size of the lot and the number of bedrooms. Suppose that you decide you will trust this realtor’s advice unless you are 95% confident that the statement is wrong based on your own regression analysis. Do you trust the realtor? [Hint: test the null hypothesis that the statement is true.]

d) The p-value of .128 on bdrms in your estimated multiple regression model implies that…

… an additional bedroom does not have a statistically significant effect on the home price (using conventional thresholds for significance) once we control for the size of the house and the lot. TRUE / FALSE

… an additional bedroom does not have an economically meaningful effect on the home price (once we control for the size of the house and the lot). TRUE / FALSE

… the true coefficient on bdrms in the price regression is zero (once we control for the size of the house and the lot).
TRUE / FALSE

e) Suppose you were able to collect additional data on housing prices and quadruple the size of your random sample (i.e., you now have 88 × 4 = 352 observations). If you re-estimate the multiple regression model using the new sample, would you expect the new coefficient on bdrms to be statistically significant at the 5% level? Explain.
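
Stata note (illustrative sketch): the mechanics that Questions #1 to #3 rely on (t-statistics, two-sided p-values, confidence intervals, and the if / in / level() pieces of regress) can be tried out in a short do-file before working with the course data. This is only a sketch under stated assumptions, not a solution: the coefficient 0.25 and standard error 0.10 are made-up numbers rather than estimates from rental.dta, the second half uses Stata's built-in auto example data (loaded with sysuse auto) as a stand-in for the course datasets, and the hypothesized value 2 in the test command is arbitrary.

```stata
* --- Part 1: inference arithmetic with display -------------------------
* Hypothetical estimate, standard error, and degrees of freedom
* (made-up numbers, NOT taken from rental.dta or houseprices.dta)
scalar bhat  = 0.25
scalar sehat = 0.10
scalar dof   = 60

* t-statistic for H0: coefficient = 0
scalar tstat = bhat/sehat
display "t-statistic: " tstat

* Two-sided p-value: twice the upper-tail probability beyond |t|
scalar pval = 2*ttail(dof, abs(tstat))
display "two-sided p-value: " pval

* 95% CI uses the value with 2.5% in each tail; by symmetry,
* invt(dof, .975) equals -invt(dof, .025)
scalar c95  = invt(dof, 0.975)
scalar lo95 = bhat - c95*sehat
scalar hi95 = bhat + c95*sehat
display "95% CI: [" lo95 ", " hi95 "]"

* 90% CI uses the value with 5% in each tail
scalar c90  = invt(dof, 0.95)
scalar lo90 = bhat - c90*sehat
scalar hi90 = bhat + c90*sehat
display "90% CI: [" lo90 ", " hi90 "]"

* --- Part 2: regress with if, in, and level() --------------------------
* Built-in example data (74 cars), used here only as a stand-in
sysuse auto, clear

* Restrict the sample with an if-qualifier (no comma before if)
regress price mpg weight if foreign==0

* Same model on the full sample, reporting 90% confidence intervals
regress price mpg weight, level(90)

* Use only the first 60 observations, then only the last 60
regress price mpg weight in 1/60
regress price mpg weight in 15/74

* After a regression, test H0: coefficient on weight equals 2
* (the value 2 is arbitrary and chosen only for illustration)
regress price mpg weight
test weight = 2
```

Comparing the hand-computed interval from Part 1 with the interval regress reports is a quick way to confirm which critical value goes with which confidence level. With the course data, the same pattern applies after replacing the made-up scalars with the reported coefficient, standard error, and degrees of freedom.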