STAT5002: Final Exam

Exam structure:
Q1: a–d
Q2: a–b
Q3: a–d
Q4: a–c
Q5: a–d
Q6: a–f

Topics:

Inferences about a population proportion: estimation (point and interval) and hypothesis testing
  If $np \ge 5$ and $nq \ge 5$ (with $q = 1 - p$), the sample size is large enough to invoke the CLT.
  By the CLT, $\hat{p}$ is approximately normal with $E(\hat{p}) = p$ and $se(\hat{p}) = \sqrt{pq/n}$.
  Interval estimate of $p$: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$
  One-sample Z test about $p$: $H_0: p = p_0$ against $H_1: p\ (>, <, \ne)\ p_0$
  $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} \approx N(0, 1)$ under $H_0$, where $q_0 = 1 - p_0$

Chi-square test of independence (how to conduct the test, and its assumptions)

Paired difference t test

Sign test: $H_0: p_+ = 0.5$ versus $H_1: p_+\ (>, <, \ne)\ 0.5$
  $X$ = number of + signs, $X \sim \mathrm{Bin}(n, p_+)$, so $X \sim \mathrm{Bin}(n, 0.5)$ under $H_0$
  $x$ = observed number of + signs
  The p-value must be calculated:
  $H_1: p_+ < 0.5$: p-value $= P(X \le x)$
  $H_1: p_+ > 0.5$: p-value $= P(X \ge x)$
  $H_1: p_+ \ne 0.5$: p-value $= 2P(X \ge x)$ for $x > n/2$, and p-value $= 2P(X \le x)$ for $x < n/2$

Test for a proportion (exact binomial): $X$ = number of successes, $X \sim \mathrm{Bin}(n, p_0)$ under $H_0$
  $H_0: p = p_0$ against $H_1: p < p_0$: p-value $= P(X \le x)$
  $H_0: p = p_0$ against $H_1: p > p_0$: p-value $= P(X \ge x)$
  $H_0: p = p_0$ against $H_1: p \ne p_0$: p-value $= P(|X - np_0| \ge |x - np_0|)$
  If $p_0 = 0.5$: p-value $= 2P(X \ge x)$ for $x > n/2$, and p-value $= 2P(X \le x)$ for $x < n/2$

Four graphs: identify 3 issues and give a remedy

Logistic regression (issues with the linear probability model)
  $\ln(\text{odds}) \Rightarrow \exp\{\ln(\text{odds})\} \Rightarrow \text{odds} = \dfrac{P}{1 - P} \Rightarrow P = \dfrac{\text{odds}}{1 + \text{odds}}$

Log-linear regression:
  $\ln(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$
  $Y = \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon)$
  $Y = \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k)\exp(\varepsilon)$
  $E(Y \mid X) = \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k)\,E[\exp(\varepsilon) \mid X]$
  If $\varepsilon \sim N(0, \sigma^2)$, then $\exp(\varepsilon)$ follows a log-normal distribution. Hence $E[\exp(\varepsilon)] = \exp(\sigma^2/2)$ and
  $E(Y \mid X) = \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k)\exp(\sigma^2/2)$
  $\hat{\sigma}^2 = \mathrm{MSE} = \dfrac{\mathrm{RSS}}{n - p}$, where $p$ here is the number of estimated regression coefficients (including the intercept), not the population proportion.

Example: $\widehat{\ln(y)} = 1.592 + 0.044X$, RSS = 0.079374, $n = 50$. If $X = 10$, $\hat{Y}$ = ?
  $\hat{\sigma}^2 = \mathrm{MSE} = \dfrac{\mathrm{RSS}}{n - p} = \dfrac{0.079374}{50 - 2} = 0.001653625$
  $\widehat{\ln(y)} = 1.592 + 0.044(10) = 2.032$
  $\hat{Y} = \exp(2.032)\exp(\hat{\sigma}^2/2) = \exp(2.032)\exp(0.001653625/2) = 7.6356$
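The arithmetic in the log-linear example can be checked with a short script. This is a minimal sketch using only the figures quoted in the example (intercept 1.592, slope 0.044, RSS = 0.079374, n = 50, two estimated coefficients). The function name `predict_log_linear` and its argument names are hypothetical, chosen for illustration, and the code assumes normal errors so that the correction factor exp(σ²/2) applies.

```python
import math

def predict_log_linear(intercept, slope, x, rss, n, n_params):
    """Predict E(Y | X = x) from a fitted model ln(Y) = intercept + slope * x.

    Hypothetical helper: assumes normally distributed errors, so the
    retransformed prediction is multiplied by exp(sigma^2 / 2).
    """
    sigma2_hat = rss / (n - n_params)   # MSE = RSS / (n - p)
    log_pred = intercept + slope * x    # fitted value on the log scale
    # Back-transform, then apply the log-normal correction factor
    y_hat = math.exp(log_pred) * math.exp(sigma2_hat / 2)
    return sigma2_hat, log_pred, y_hat

# Figures taken from the worked example
sigma2_hat, log_pred, y_hat = predict_log_linear(
    intercept=1.592, slope=0.044, x=10, rss=0.079374, n=50, n_params=2
)
print(round(sigma2_hat, 9), round(log_pred, 3), round(y_hat, 4))
```

The correction matters little here because the estimated error variance is so small: the naive back-transform exp(2.032) is about 7.629, so the factor exp(σ̂²/2) changes the prediction by less than 0.1%.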