R-MATH1312-Assignment 1

MATH1312 Week 6 – Indicator Variables

Assignment 1 Feedback
Generally you did very well, and I am very happy about it.
Mistakes on your assignment:
- Wrong calculations.
- Include what you are asked for.
- Include comments and conclusions.
- Use the correct values from the tables.
- Carry out a test before drawing conclusions.

Assignment 1 Feedback (continued)
Mistakes on your assignment:
- Analyse the residuals to test the assumptions behind the regression method (plots and tests).
- Provide the R code.
- Provide the outputs.
- State your results or comments.
- Justify what you are saying.

Confidence Intervals – Coefficients
For each $\beta_j$:
  $\hat{\beta}_j \pm t_{\alpha/2,\,n-p} \sqrt{\hat{\sigma}^2 C_{jj}}$
Remember that $\sqrt{\hat{\sigma}^2 C_{jj}}$ is the standard error of $\hat{\beta}_j$, where $C_{jj}$ is the jth diagonal element of $(X'X)^{-1}$ and $\hat{\sigma}^2 = MS_{Res}$.

Confidence Intervals – Mean Response
The expected value of y given a particular set of values $x_{01}, x_{02}, \ldots, x_{0k}$:
  $\hat{y}_0 = x_0'\hat{\beta}$, which estimates $E(y \mid x_0)$
  $\operatorname{Var}(\hat{y}_0) = \sigma^2 x_0'(X'X)^{-1}x_0$
  $\hat{y}_0 \pm t_{\alpha/2,\,n-p} \sqrt{\hat{\sigma}^2 x_0'(X'X)^{-1}x_0}$
- Variation in the length of this CI can be useful in estimating the precision of the model.
- The variation is more pronounced further from the centroid.

Bonferroni method
In general, a joint CI has the form
  $\hat{\beta}_j \pm \Delta\, se(\hat{\beta}_j)$
Bonferroni is a simple multiple-comparison correction:
  $\hat{\beta}_j \pm t_{\alpha/(2p),\,n-p}\, se(\hat{\beta}_j)$

Indicator Variables
- A qualitative (categorical) variable used in regression, e.g. gender or status.
- No scale of measurement.
- A level must be set for each category: the indicator variable.
- Also known as a "dummy variable".

Indicator Variable – Example
Relate the effective life of a cutting tool (y) used on a lathe to the lathe speed in revolutions per minute ($x_1$) and the type of cutting tool used.
Tool type is qualitative and can be represented as
  $x_2 = \begin{cases} 0 & \text{Tool A} \\ 1 & \text{Tool B} \end{cases}$
  $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
For Tool type A this model becomes:
  $y = \beta_0 + \beta_1 x_1 + \varepsilon$
For Tool type B this model becomes:
  $y = \beta_0 + \beta_1 x_1 + \beta_2 + \varepsilon = (\beta_0 + \beta_2) + \beta_1 x_1 + \varepsilon$
Changing from A to B induces a change in the intercept (the slope is unchanged and identical).
We assume that the variance is equal for all levels of the qualitative variable.

Example (Linear Regression Analysis 5E, Montgomery, Peck & Vining)
The model to be fit is
  $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
where $x_2 = 0$ indicates Tool type A and $x_2 = 1$ indicates Tool type B.
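To make the single-indicator model concrete, here is a minimal Python sketch (standard library only, rather than the course's R) that codes tool type as $x_2$ and fits $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ by least squares through the normal equations. The data are synthetic and noise-free, generated from assumed coefficients 40, -0.02 and 15; they are not the textbook tool-life data. The helper names `indicator`, `solve` and `fit_ols` are illustrative only.

```python
# Hypothetical, noise-free data (NOT the textbook tool-life data):
# generated from y = 40 - 0.02*x1 + 15*x2 so the fit is easy to check.
speeds = [500, 600, 700, 800, 900, 1000, 500, 600, 700, 800, 900, 1000]
tools = ["A"] * 6 + ["B"] * 6


def indicator(tool):
    """Code the qualitative tool type: 0 for Tool A, 1 for Tool B."""
    return 1.0 if tool == "B" else 0.0


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least squares via the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Design matrix columns: intercept, lathe speed x1, tool indicator x2.
X = [[1.0, float(s), indicator(t)] for s, t in zip(speeds, tools)]
y = [40.0 - 0.02 * s + 15.0 * indicator(t) for s, t in zip(speeds, tools)]

b0, b1, b2 = fit_ols(X, y)
print(f"Tool A: y = {b0:.3f} + ({b1:.4f}) x1")
print(f"Tool B: y = {b0 + b2:.3f} + ({b1:.4f}) x1")
```

Because the synthetic data lie exactly on two parallel lines, the fit recovers the generating coefficients, and the two printed equations differ only in their intercepts, by `b2`.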
The least squares fit is: [equation missing from source]

Multiple levels
For a qualitative variable with a levels, we would need a-1 indicator variables. For example, say there were three tool types, A, B, and C. Then two indicator variables (called $x_2$ and $x_3$) will be needed: [coding table missing from source]

Benefit of Using Indicator Variables
It is possible to model the same situation with two separate regressions. Combining them using an indicator:
- Simplifies: only one equation.
- Assuming the same slope: more observations to estimate $\beta_1$.
- One variance estimate: more degrees of freedom.

When Slope is Expected to Differ
Indicator variables can still be used, but the model must now incorporate an additional term:
  $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$
E.g. for tool type A:
  $y = \beta_0 + \beta_1 x_1 + \beta_2(0) + \beta_3 x_1(0) + \varepsilon = \beta_0 + \beta_1 x_1 + \varepsilon$
For tool type B:
  $y = \beta_0 + \beta_1 x_1 + \beta_2(1) + \beta_3 x_1(1) + \varepsilon = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_1 + \varepsilon$
- $\beta_2$: change in intercept from A to B
- $\beta_3$: change in slope from A to B

Testing
Effectively, there are now two separate regression equations. The extra sum of squares method will test the hypotheses
  $H_0: \beta_2 = \beta_3 = 0$
  $H_a: \beta_2 \neq 0$ and/or $\beta_3 \neq 0$
Are they the same?
  $SS_R(\beta_2, \beta_3 \mid \beta_1, \beta_0) = SS_R(\beta_1, \beta_2, \beta_3 \mid \beta_0) - SS_R(\beta_1 \mid \beta_0) = 1434.112 - 293.005 = 1141.107$
  $F_0 = \frac{SS_R(\beta_2, \beta_3 \mid \beta_1, \beta_0)/2}{MS_{Res}} = \frac{1141.107/2}{8.811} = 64.75$
Is the slope the same?
  $F_0 = \frac{SS_R(\beta_3 \mid \beta_2, \beta_1, \beta_0)/1}{MS_{Res}} = \frac{16.078/1}{8.811} = 1.82$

More Than Two Levels
An electric utility is investigating the effect of the size of a single-family house and the type of air conditioning used in the house on the total electricity consumption during warm weather months.

We would expect the mean electricity consumption to increase with the size of the house, but the rate of increase should be different for a central air conditioning system (more efficient) than for window units in larger houses. There should be an interaction between the size of the house and the type of air conditioning system.
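As a check on the extra-sum-of-squares tests for the tool-life model with differing slopes, the arithmetic can be reproduced in a few lines of Python. This is only a sketch of the calculation: the sums of squares and the residual mean square are the values quoted in these notes, while the function name `partial_f` is an illustrative assumption. No p-values are computed, since that would need an F distribution from a statistics package.

```python
# Values quoted in the notes for the tool-life model with interaction.
ss_r_full = 1434.112     # SS_R(beta1, beta2, beta3 | beta0)
ss_r_reduced = 293.005   # SS_R(beta1 | beta0)
ss_r_slope = 16.078      # SS_R(beta3 | beta2, beta1, beta0)
ms_res = 8.811           # residual mean square of the full model


def partial_f(extra_ss, df, ms_res):
    """Partial F statistic: (extra sum of squares / its df) / MS_Res."""
    return (extra_ss / df) / ms_res


# H0: beta2 = beta3 = 0 (are the two regression lines the same?)
extra_ss = ss_r_full - ss_r_reduced
f_same_line = partial_f(extra_ss, 2, ms_res)

# H0: beta3 = 0 (is the slope the same for both tool types?)
f_same_slope = partial_f(ss_r_slope, 1, ms_res)

print(f"SS_R(beta2, beta3 | beta1, beta0) = {extra_ss:.3f}")
print(f"F0 (same line)  = {f_same_line:.2f}")
print(f"F0 (same slope) = {f_same_slope:.2f}")
```

The script reproduces the quoted figures 1141.107, 64.75 and 1.82. Each statistic would then be compared with the appropriate F critical value, with 2 or 1 numerator degrees of freedom and the residual degrees of freedom of the full model.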
More Than Two Levels (continued)
The four regression models corresponding to the four types of air conditioning systems are as follows: [equations missing from source]

Using Indicator Variables
- When there are more than two levels, use additional dummy variables.
- Using the codes 1, 2, 3, 4 would define a set increase for each level.
  - Unrealistic (the variable is qualitative, so it has no scale).
  - $R^2$ is improved with dummy variables over allocated codes.
- Indicator variables can also be used to classify quantitative variables.

Analysis of Variance
ANOVA can be considered as a regression where all regressors are indicators.
Model for ANOVA:
  $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
where
- $\mu$ is the grand mean
- j indexes each observation
- i corresponds to each factor level
- $\tau_i$ is the effect of each treatment
With k = 3 treatments, we need k-1 = 2 indicator variables.

Lab Questions – Week 6

Reading: Chapter 8 of Montgomery et al. "Linear Regression"
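The claim that ANOVA is a regression on indicators can be illustrated with a small Python sketch (standard library only). It codes k = 3 treatments with k-1 = 2 indicators, using the first treatment as the baseline. It then checks that coefficients built from the treatment means satisfy the normal equations $X'(y - X\beta) = 0$. The responses and the treatment labels `T1`, `T2`, `T3` are made up for illustration. The baseline coding used here is one common parameterisation and differs from the grand-mean form $\mu + \tau_i$ only in how the same fitted means are expressed.

```python
from statistics import mean

# Hypothetical responses for k = 3 treatments (made-up numbers).
data = {
    "T1": [10.0, 12.0, 11.0],
    "T2": [14.0, 15.0, 16.0],
    "T3": [20.0, 19.0, 21.0],
}
levels = list(data)      # insertion order; the first level is the baseline
baseline = levels[0]


def dummy_row(level):
    """Intercept plus k-1 indicators; the baseline level is coded all zeros."""
    return [1.0] + [1.0 if level == other else 0.0 for other in levels[1:]]


# Stack one design row and one response per observation.
X = [dummy_row(level) for level in levels for _ in data[level]]
y = [value for level in levels for value in data[level]]

# Candidate coefficients: baseline mean, then differences from the baseline.
means = {level: mean(values) for level, values in data.items()}
beta = [means[baseline]] + [means[lv] - means[baseline] for lv in levels[1:]]

# Least squares requires X'(y - X beta) = 0 (the normal equations).
residuals = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
normal_eq = [sum(row[j] * e for row, e in zip(X, residuals)) for j in range(len(beta))]

print("beta:", beta)
print("X'e :", normal_eq)
```

Every entry of `X'e` comes out as zero, so the treatment-mean coefficients are the least squares solution. The intercept is the baseline treatment mean, and each indicator coefficient is the difference between that treatment's mean and the baseline.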