Linear Regression - STAT3008

STAT3008: Applied Regression Analysis
2019/20 Term 2
PAST Mid-Term Examination and quick answers

Date: 7th April 2020 (Tuesday)
Time: 9:30am – 12:15pm (165 minutes)
Total Score: 100 points

Please present your answers in 4 significant figures.

Submission Requirement:
(1) Name and SID on the 1st page of your work
(2) Only a single file in .pdf or .doc* format (size < 10MB) will be accepted
(3) Filename in the format of “LAST NAME First Name – SID.pdf/doc*”

How to submit your exam work: A dropbox button is now available on Blackboard.

Problem 1 [27 points]: Suppose the following regression model is fitted to a data set with observations $\{(x_{i1}, x_{i2}, y_i),\ i = 1, 2, \ldots, n\}$:

$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + e_i, \qquad e_i \overset{iid}{\sim} N(0, \sigma^2)$$

Assume that $\sum_{i=1}^{n} x_{i1} x_{i2} = 0$.
(a) [8 points] Derive the OLS estimates $\hat{\beta}_1$ and $\hat{\beta}_2$.
(b) [6 points] Set up the log-likelihood function $l(\beta_1, \beta_2, \sigma^2)$.
(c) [4 points] Do you expect the MLEs $\tilde{\beta}_1$ and $\tilde{\beta}_2$ to be the same as their corresponding OLS estimates $\hat{\beta}_1$ and $\hat{\beta}_2$ in part (a)? Explain. (No computation required)
(d) [5 points] Is $\hat{\beta}_1$ an unbiased estimator for $\beta_1$? Verify.
(e) [4 points] Does the regression line based on the OLS estimates pass through the point
$$(x_1, x_2, y) = \left(\overline{x_1^2},\ \overline{x_2^2},\ \overline{x_1 y} + \overline{x_2 y}\right) = \left(\frac{1}{n}\sum x_{i1}^2,\ \frac{1}{n}\sum x_{i2}^2,\ \frac{1}{n}\sum x_{i1} y_i + \frac{1}{n}\sum x_{i2} y_i\right)?$$
Verify.

Problem 2 [16 points]: Consider multiple linear regression $Y_{n \times 1} = X_{n \times (p+1)}\, \beta_{(p+1) \times 1} + e_{n \times 1}$ with $E(e) = 0_{n \times 1}$ and $Var(e) = \sigma^2 I_n$. Let $A = X(X'X)^{-1}X'$ and $B = I_n - X(X'X)^{-1}X'$.
(a) [4 points] Prove or disprove the following: $ABA = A$.
(b) [4 points] Prove or disprove the following: $A^5 = I_n - B^7$.
(c) [8 points] Simplify the following in terms of $\sigma^2$, $n$ and $p$: $E\left[e' X(X'X)^{-1}X' Y\right]$.

Problem 3 [24 points]: A simple linear regression is fitted to the data $\{(x_1, y_1), \ldots, (x_{48}, y_{48})\}$, with

$$E(Y \mid X = x) = \beta_0 + \beta_1 x, \qquad Var(Y \mid X = x) = \sigma^2$$

The coefficient table and ANOVA table below show some of the regression results:
It’s known that $R^2 = 15\%$.
(a) [16 points] Replicate the two tables above and fill in ALL the missing values (in 4 significant figures).
(b) [8 points] Based on the results in part (a), test whether β0 is greater than -12.0 at α = 0.05. You should set up the 4 steps of hypothesis testing as on Ch2 page 64.
Note: R functions like “pf”, “pt”, “qf” and “qt” could be useful in this problem.

Problem 4 [19 points]: Consider multiple linear regression with 3 explanatory variables (EVs) x1, x2 and x3. Two hypothesis tests were performed on models with selected EVs, and the results are summarized by the two ANOVA tables below:

H0: $E(Y \mid \mathbf{X} = \mathbf{x}) = \beta_0$ vs H1: $E(Y \mid \mathbf{X} = \mathbf{x}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

H0: $E(Y \mid \mathbf{X} = \mathbf{x}) = \beta_0 + \beta_1 x_1 + \beta_3 x_3$ vs H1: $E(Y \mid \mathbf{X} = \mathbf{x}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$

It’s known that the sample correlations between y and x1, x2 and x3 are 91.118%, -44.260% and 99.556% respectively. That is,
$$\hat{\rho}(y, x_1) = 91.118\%, \quad \hat{\rho}(y, x_2) = -44.260\% \quad \text{and} \quad \hat{\rho}(y, x_3) = 99.556\%$$
(a) [11 points] Replicate the table below, and fill in ALL the missing values (in 4 significant figures). (The df and RSS of Model 7: $E(Y \mid \mathbf{X} = \mathbf{x}) = \beta_0 + \beta_2 x_2 + \beta_3 x_3$ have already been included in the table.)
(b) [4 points] Do you think multicollinearity exists in Model 8: $E(Y \mid \mathbf{X} = \mathbf{x}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$? Explain.
(c) [4 points] Do you think the sample correlation between x1 and x2 (i.e. $\hat{\rho}(x_1, x_2)$) is close to 0? Explain.

Problem 5 [14 points]: Suppose we are interested in explaining the sale price of a house by 4 variables relating to its size and age (grey columns below). The table below shows the data of the first 6 houses in the data set:
A multiple linear regression was fitted to y = ln(SalePrice) based on the 4 EVs. The table below shows the parameter estimates:

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  8.947e-01  4.175e-01   2.143   0.0323 *
Year         5.231e-03  2.151e-04  24.323  < 2e-16 ***
FirstFloor   3.378e-05  3.102e-05   1.089   0.2764
Basement    -2.274e-04  2.948e-05  -7.714  2.6e-14 ***
Total        3.954e-04  1.446e-05  27.335  < 2e-16 ***

Residual standard error: 0.212 on 1169 degrees of freedom
Multiple R-squared: 0.7353, Adjusted R-squared: 0.7344
F-statistic: 811.8 on 4 and 1169 DF, p-value: < 2.2e-16

Note that most of the parameter estimates are intuitive. For example, $\hat{\beta}_{Year} = 0.005231 > 0$ is consistent with the expectation that a newer house (larger Year) sells at a higher price.
(a) [12 points] Based on the parameter estimates above, comment on whether each of the following is consistent with your intuition:
(I) $\hat{\beta}_{Basement} = -0.0002274 < 0$
(II) $\hat{\beta}_{Total} = 0.0003954 > \hat{\beta}_{FirstFloor} = 0.00003378 > 0$
(b) [2 points] What is the sample size n of the data set?

– End of the Exam –

Quick Answers

In the actual exam, you are required to show all the details of your work instead of just the final answers below.

Problem 1:
(a) $\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} x_{i1} y_i}{\sum_{i=1}^{n} x_{i1}^2}, \quad \hat{\beta}_2 = \dfrac{\sum_{i=1}^{n} x_{i2} y_i}{\sum_{i=1}^{n} x_{i2}^2}$
(b) $l(\beta_1, \beta_2, \sigma^2) = -\dfrac{n}{2}\ln(2\pi\sigma^2) - \dfrac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \beta_1 x_{i1} - \beta_2 x_{i2}\right)^2$
(c) Yes. Explain.
(d) Yes. Verify.
(e) Yes. Verify.

Problem 2:
(a) $ABA \neq A$
(b) $A^5 = \ldots = I_n - B^7$
(c) $(p+1)\sigma^2$

Problem 3:
(a)
(b)
(Step 1) H0: β0 = -12.0 vs H1: β0 > -12.0
(Step 2) $t_0$ = (-9.9081 - (-12))/5.3871 = 0.3883
(Step 3) Since p-value = $\Pr(t_{46} > t_0)$ = 0.3498 > 0.05, we do not reject H0 at α = 0.05.
(Step 4) We do not have sufficient evidence that β0 is greater than -12.0.

Problem 4:
(a)
(b) Yes. Explain… (material from Ch4, not in the upcoming midterm)
(c) Yes. Explain… (material from Ch4, not in the upcoming midterm)

Problem 5:
(a) From Ch4 (not in the upcoming midterm)
(b) n = 1174
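
The arithmetic behind the quick answers to Problem 3(b) and Problem 5(b) can be checked with a short script. The sketch below is illustrative only and is not part of the exam: it uses Python rather than the R functions mentioned in the exam, and the helper names t_density and t_upper_tail are hypothetical. The upper-tail probability is approximated by Simpson's rule instead of a library routine, so it agrees with R's pt only up to numerical integration error.

```python
import math


def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate Pr(T > t) for t >= 0 with composite Simpson's rule.

    By symmetry Pr(T > t) = 0.5 - integral of the density over [0, t].
    `steps` must be even.
    """
    if t < 0 or steps % 2:
        raise ValueError("need t >= 0 and an even number of steps")
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    return 0.5 - total * h / 3


# Problem 3(b): values taken from the quick answers above.
n_obs = 48                      # sample size of the simple linear regression
estimate = -9.9081              # estimated intercept
std_error = 5.3871              # its standard error
null_value = -12.0              # value of beta_0 under H0
df = n_obs - 2                  # residual df in simple linear regression

t0 = (estimate - null_value) / std_error
p_value = t_upper_tail(t0, df)  # one-sided test, H1: beta_0 > -12.0
print(f"t0 = {t0:.4f}, df = {df}, p-value = {p_value:.4f}")

# Problem 5(b): residual df = n - (number of EVs + 1 for the intercept).
residual_df = 1169
n_evs = 4
n_houses = residual_df + n_evs + 1
print(f"n = {n_houses}")
```

The printed statistic, p-value and sample size should agree with the quick answers to four significant figures. In R, the same tail probability would normally come from pt with lower.tail = FALSE.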
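
For Problem 2(c), the quick answer $(p+1)\sigma^2$ can be traced through the following sketch. It assumes that X is non-random and has full column rank p+1, so that $(X'X)^{-1}$ exists and $AX = X$. Neither assumption is stated explicitly in the question, and this outline is not a substitute for the full working required in the exam.

```latex
\begin{align*}
E\left[e' A Y\right]
  &= E\left[e' A (X\beta + e)\right]
     && \text{since } Y = X\beta + e \\
  &= E(e)'\, A X \beta + E\left[e' A e\right]
     && \text{linearity, } X \text{ fixed} \\
  &= 0 + E\left[\operatorname{tr}(A\, e e')\right]
     && E(e) = 0,\ e'Ae \text{ is a scalar} \\
  &= \operatorname{tr}\left(A\, \sigma^2 I_n\right)
     && E(ee') = Var(e) = \sigma^2 I_n \\
  &= \sigma^2 \operatorname{tr}\left((X'X)^{-1} X'X\right)
     && \operatorname{tr}(CD) = \operatorname{tr}(DC) \\
  &= \sigma^2 \operatorname{tr}(I_{p+1}) = (p+1)\sigma^2 .
\end{align*}
```

The result does not depend on n, because the trace of the projection matrix A equals the number of columns of X.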