STAT3008: Applied Regression Analysis
2020/21 Term 2 Mid-term Review (March 2nd 2021)

Chapter 1 Introduction
- Terminology: Response Variable (RV), Explanatory Variable (EV)
- Mean Function and Variance Function
- Separated Points: Outlier and Leverage Point
- Scatterplot and Scatterplot Matrix
- Null Plot: Constant mean and variance functions, no separated point

Chapter 2 Simple Linear Regression
- OLS Estimates: Derivation
- Terminology: Fitted Value, Residuals, RSS
- Properties of the OLS Estimates: Bias and Variance
- Maximum Likelihood Estimates: Derivation and Comparison against OLS Estimates
- Analysis of Variance: Decomposition of Sum of Squares (SSreg, RSS, SStotal), Construction of ANOVA table
- ANOVA Table vs Coefficient Table
- Coefficient of Determination $R^2$
- Test for Betas, Confidence Interval for Fitted Value
- Prediction Interval for $y_*$ based on new $x_*$

Chapter 3 Multiple Linear Regression
- Random Vector, Model Setup
- OLS Estimates: Derivation, Vector/Matrix Differentiation and Trace Operation
- Properties of the OLS Estimates: Bias and Variance, Asymptotic Distribution of the OLS Estimates
- Expected Value and Variance of Random Matrix
- Maximum Likelihood Estimates
- Analysis of Variance: Decomposition of Sum of Squares, Construction of ANOVA table
- Coefficient of Determination $R^2$
- Test for Betas, Confidence Interval for Fitted Value
- Prediction Interval for $y_*$ based on new $x_*$

Questions from previous STAT3008 students
1. Is the p-value computed based on one tail or two tails of the distribution?
2. How do we compute SSreg if only one regression model is specified in the problem?

Practice Exercises

Problem 1: Consider the multiple linear regression model $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$ with sample size n = 9.
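The coefficient table on the left shows the OLS estimates. The ANOVA table on the right tests the hypotheses
H0: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1$ vs H1: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
[Coefficient table (left) and ANOVA table (right)]
(a) Based on a T-statistic and a p-value from the coefficient table, construct the ANOVA table for the hypotheses
H0: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ vs H1: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
(b) Based on the given ANOVA table and the results from part (a), construct the ANOVA table for the hypotheses
H0: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1$ vs H1: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
(c) What conclusion can you draw from the ANOVA table in part (b)?

Problem 2: Suppose $x_1, x_2, \ldots, x_n$ are known constants. Let $y_1, y_2, \ldots, y_n$ be independent random variables with mean 0 and variance 1. Let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.
(a) Simplify $\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})$ and $\sum_{i=1}^{n} (y_i - \bar{y})^2 - \sum_{i=1}^{n} y_i^2 + n\bar{y}^2$.
(b) Find $\mathrm{Var}\left(\sum_{i=1}^{n} \frac{x_i - \bar{x}}{SXX}\, y_i\right)$.

Problem 3: Suppose
$M = \begin{pmatrix} 1 & 3 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix}$, $a = \begin{pmatrix} 0 \\ 2 \\ 4 \end{pmatrix}$, $\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}$.
(a) Let $f(\beta) = \beta' M \beta - a'\beta$. Express $\frac{\partial f(\beta)}{\partial \beta}$ in terms of $\beta_1$, $\beta_2$ and $\beta_3$.
(b) What are the values of $(\beta_1, \beta_2, \beta_3)$ minimizing $f(\beta) = \beta' M \beta - a'\beta$ in part (a)?
(c) Let $g(\beta) = (1 \;\; 1 \;\; 1)\, M^{-1} \beta$. Express $\frac{\partial g(\beta)}{\partial \beta}$ in terms of $\beta_1$, $\beta_2$ and $\beta_3$.

Problem 4: Consider an n×(p+1) matrix X. Let $H = X(X'X)^{-1}X'$.
(a) Simplify $(I - H)X$, where I is an identity matrix.
(b) Is H a symmetric matrix?
(c) Compute tr(H).
(d) Show that $H^{3008} = H$.

Problem 5: Which of the following is true about multiple linear regression?
(i) When p = 1, the OLS estimate is $\hat{\beta} = \begin{pmatrix} \bar{y} - \bar{x}\, SXY/SXX \\ SXY/SXX \end{pmatrix}$.
(ii) When p = 0, the OLS estimate is $\hat{\beta} = \bar{y}$.
(iii) $\mathrm{tr}(\mathrm{Var}(e)) = p\sigma^2$, where tr(A) is the trace of a square matrix A.
(a) (i) and (ii) only (b) (i) and (iii) only (c) (ii) and (iii) only (d) All of the above

Problem 6: Consider a multiple linear regression with response $\{y_i,\ i = 1, 2, \ldots, 30\}$ and 3 explanatory variables $\{(x_{i1}, x_{i2}, x_{i3}),\ i = 1, 2, \ldots, 30\}$.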
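Suppose we want to use the Analysis of Variance (ANOVA) to test the two models below:
$E(Y \mid X) = \beta_0 + \beta_1 x_1 + \beta_3 x_3$ vs $E(Y \mid X) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
What are the degrees of freedom of the corresponding F-statistic?
(a) 1 and 26 (b) 1 and 27 (c) 2 and 26 (d) 2 and 27

SOLUTIONS/QUICK ANSWERS to the Practice Exercises
During the mid-term exam, you need to show your work in full detail, as in Problem 1 below.

Problem 1:
(a) The T-statistic = 2.636 and p-value = 0.0462 from the coefficient table allow us to test the hypotheses
H0: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ vs H1: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
Because the test concerns a single coefficient, $F = T^2 = 2.636^2 \approx 6.95$ on (1, 5) degrees of freedom, with the same p-value of 0.0462. The corresponding ANOVA table is therefore given by:
[ANOVA table for part (a)]
(b) For the hypotheses
H0: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1$ vs H1: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
note that RSS1 = 55,423 (from the given ANOVA table) and RSS2 = 48,762 (from the ANOVA table in part (a)), with 7 and 6 degrees of freedom respectively. Hence $F = \frac{(55{,}423 - 48{,}762)/1}{48{,}762/6} \approx 0.82$ on (1, 6) degrees of freedom. The corresponding ANOVA table is therefore given by:
[ANOVA table for part (b)]
(c) Since the p-value = 0.40 > 0.05, we do not reject H0 at α = 0.05. We do not have sufficient evidence to reject the mean function $E(Y \mid X = x) = \beta_0 + \beta_1 x_1$ at α = 0.05.

Problem 2: (a) 0 and 0 (b) 1/SXX

Problem 3:
(a) $\frac{\partial f(\beta)}{\partial \beta} = \begin{pmatrix} 2\beta_1 + 4\beta_2 \\ 4\beta_1 + 4\beta_2 - 2 \\ 8\beta_3 - 4 \end{pmatrix}$
(b) Setting $\frac{\partial f(\beta)}{\partial \beta} = 0$ gives $(\beta_1, \beta_2, \beta_3) = (1, -0.5, 0.5)$.
(c) $M^{-1} = \begin{pmatrix} -2 & 3 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & 1/4 \end{pmatrix}$, $\frac{\partial g(\beta)}{\partial \beta} = \begin{pmatrix} -1 \\ 2 \\ 1/4 \end{pmatrix}$

Problem 4:
(a) $0_{n \times (p+1)}$
(b) H is symmetric, since $H' = (X(X'X)^{-1}X')' = (X')'\,((X'X)^{-1})'\,X' = X\,((X'X)')^{-1}X' = X(X'X)^{-1}X' = H$.
(c) p + 1
(d) Note that $H^2 = X(X'X)^{-1}X'X(X'X)^{-1}X' = X(X'X)^{-1}X' = H$. Hence $H^{3008} = H^{3006}(H^2) = H^{3006}(H)$ (since $H^2 = H$) $= H^{3007} = H^{3006} = \cdots = H$, by repeating the same argument.

Problem 5: (a)

Problem 6: (a)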
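
The arithmetic behind Problem 1(b) and (c) can be checked with a short Python sketch. It uses only the numbers quoted in the solution (RSS1 = 55,423, RSS2 = 48,762, n = 9). The helper names `t_two_sided_p` and `nested_f_test` are made up for this illustration and are not part of the course material. As an assumption made to stay within the standard library, the p-value is obtained from the identity that an F-statistic on (1, d) degrees of freedom is the square of a T-statistic on d degrees of freedom, with the t density integrated numerically by Simpson's rule.

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T_df| > |t|), via Simpson's rule on the t density."""
    # Normalising constant of the Student-t density with df degrees of freedom.
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    central = total * h / 3  # approximates P(0 < T < |t|)
    return 1 - 2 * central


def nested_f_test(rss_null, df_null, rss_alt, df_alt):
    """Return (SSreg, df of SSreg, F) for comparing two nested mean functions."""
    ss_reg = rss_null - rss_alt
    df_reg = df_null - df_alt
    f_stat = (ss_reg / df_reg) / (rss_alt / df_alt)
    return ss_reg, df_reg, f_stat


n = 9
rss1, df1 = 55423, n - 2  # H0: intercept and x1 (2 parameters)
rss2, df2 = 48762, n - 3  # H1: intercept, x1 and x2 (3 parameters)

ss_reg, df_reg, f_stat = nested_f_test(rss1, df1, rss2, df2)
# With 1 numerator degree of freedom, F = T^2, so reuse the t-based p-value.
p_value = t_two_sided_p(math.sqrt(f_stat), df2)

print(f"SSreg = {ss_reg}, df = ({df_reg}, {df2})")
print(f"F = {f_stat:.4f}, p-value = {p_value:.2f}")
```

The resulting F is about 0.82 with a p-value of about 0.40, in line with part (c). Calling `t_two_sided_p(2.636, 5)` gives approximately 0.0462, the p-value quoted from the coefficient table in part (a).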