STATS 1000 / STATS 1004 / STATS 1504 Statistical Practice 1
Lecture notes, Week 7
Stephen Crotty
School of Mathematical Sciences, University of Adelaide
Semester 1 2022

Two-sample t-test

Two-sample problems
What if we want to compare the mean of some quantitative variable for the individuals in two populations, Population 1 and Population 2?

Population   Parameter   Statistic     Sample size
1            $\mu_1$     $\bar{x}_1$   $n_1$
2            $\mu_2$     $\bar{x}_2$   $n_2$

We are interested in $\mu_1 - \mu_2$.

Sampling distribution of $\bar{X}_1 - \bar{X}_2$
- What is the mean of $\bar{X}_1 - \bar{X}_2$?
- What is the standard error of $\bar{X}_1 - \bar{X}_2$?

Example: Rats and ozone
To measure the effects of ozone on weight, one group of 70-day-old rats was kept in an environment containing ozone for 7 days. A second group of rats of the same age (the control group) was kept in an ozone-free environment for the same time. The weight gains (to the nearest gram) were as follows.

Boxplots
Compare the distributions.
[Figure: boxplots of weight (y-axis) by group (x-axis: Control, Ozone).]

Null and alternative hypotheses
Write down appropriate null and alternative hypotheses to test if the population means for each group are the same.

Calculate the value of the test statistic
If we have two samples and we are interested in comparing the population means, then we use

$$T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}.$$

Example

group     mean    SD      n
Control   22.35   10.79   23
Ozone     11.00   19.09   22

- Calculate the value of $t$ for the rats data.

The degrees of freedom for the two-sample t-test
- By hand, calculate using $\min(n_1 - 1, n_2 - 1)$.
- R calculates a more accurate version.

Calculate, by hand, the degrees of freedom for the rats dataset.

What is the P-value for the rats data?
Do you reject or retain the null hypothesis at the 5% significance level?

Confidence interval for two-sample t-test
The formula for calculating the C% confidence interval is

$$(\bar{X}_1 - \bar{X}_2) \pm t^* \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$

where $t^*$ is the appropriate critical value to give a confidence level of C.

Note: $t^*$ is not the same as the test statistic that you calculated in the hypothesis test.
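The hand calculations described above (test statistic, conservative degrees of freedom, P-value and confidence interval) can be sketched in R from the summary statistics in the rats table. This is only an illustrative sketch: the variable names are made up for this example, and the inputs are the rounded means and SDs from the table, so the results will differ slightly from anything computed on the raw data.

```r
# Summary statistics copied from the rats table (rounded values).
# All variable names here are hypothetical, chosen for this sketch.
mean_control <- 22.35; sd_control <- 10.79; n_control <- 23
mean_ozone   <- 11.00; sd_ozone   <- 19.09; n_ozone   <- 22

# Standard error of the difference in sample means
se <- sqrt(sd_control^2 / n_control + sd_ozone^2 / n_ozone)

# Two-sample t statistic
t_stat <- (mean_control - mean_ozone) / se

# Conservative "by hand" degrees of freedom: min(n1 - 1, n2 - 1)
df_hand <- min(n_control - 1, n_ozone - 1)

# Two-sided P-value from the t-distribution with df_hand degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = df_hand)

# 95% confidence interval: the critical value t* comes from qt(),
# it is not the test statistic
t_crit <- qt(0.975, df = df_hand)
ci <- (mean_control - mean_ozone) + c(-1, 1) * t_crit * se

print(c(t = t_stat, df = df_hand, p = p_value))
print(ci)
```

With these inputs the statistic comes out at about 2.44 on 21 degrees of freedom, and the two-sided P-value falls between 0.02 and 0.05. R's `t.test()` uses the Welch approximation for the degrees of freedom rather than the minimum rule, so its P-value and interval will not match this conservative version exactly.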
What is the 95% confidence interval for the mean difference in the weight gain for the ozone group compared to the control group?

Two-sample t-test in R

    t.test(weight~group, data=rats)
    ##
    ##  Welch Two Sample t-test
    ##
    ## data:  weight by group
    ## t = 2.4403, df = 32.877, p-value = 0.02023
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##   1.885639 20.810013
    ## sample estimates:
    ## mean in group Control   mean in group Ozone
    ##              22.34783              11.00000

Check the assumptions: Normality
[Figure: two normal QQ-plots, sample quantiles (y-axis) against theoretical quantiles (x-axis).]

Check the assumptions: Independence
- Within each group (i.e., random sampling).
- Between each group (i.e., random allocation).

Summary
Hypothesis testing for two means from independent normal distributions
- Hypotheses: $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 \neq 0$.
- Test statistic: $T = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$
- P-value: t-distribution with $\min(n_1 - 1, n_2 - 1)$ degrees of freedom, or read the p-value from the R output.
- Confidence interval: $(\bar{X}_1 - \bar{X}_2) \pm t^* \sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$.

Example: Wood dataset
[Figure: plot of Loss (y-axis) by Preservative (x-axis: high, low).]

R output

    t.test(Loss~Preservative, data=wood)
    ##
    ##  Welch Two Sample t-test
    ##
    ## data:  Loss by Preservative
    ## t = -7.5472, df = 30.269, p-value = 1.935e-08
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##  -13.04549  -7.49051
    ## sample estimates:
    ## mean in group high  mean in group low
    ##              6.016             16.284

QQ-plots
[Figure: two normal QQ-plots, sample quantiles (y-axis) against theoretical quantiles (x-axis).]

Matched Pairs t-test

Matched pairs
So far in the two-sample t-tests, we have assumed the two groups are independent. Consider the following experiment: each student in SP1 tests how long they can keep their hands in ice water before the cold becomes too painful, once while swearing and once while not swearing.
Matched pairs
In the matched pairs experiment, illustrated in the swearing example, we are still comparing two treatments, but now each subject receives both treatments. Therefore, the two groups are no longer independent.

Model
In the matched pairs design, each subject has two measurements, and we call these $X$ and $Y$. To convert this into a problem we can already do, we look at the differences

$$D = X - Y.$$

So for each subject, we have one difference. Then we can test if the mean of $D$ is different from 0, i.e.,

$$H_0: \mu_D = 0 \quad \text{vs} \quad H_a: \mu_D \neq 0.$$

We can do this with a one-sample t-test.

Example: moon data
To assess if the moon phase has an effect on dementia, 15 patients had the average number of disruptive events measured on moon days (three days either side of full moon) and on the other days.

Moon data

patient   moon   other
1         3.33   0.27
2         3.67   0.59
3         2.67   0.32
4         3.33   0.19
5         3.33   1.26
6         3.67   0.11
7         4.67   0.30
8         2.67   0.40
9         6.00   1.59
10        4.33   0.60
11        3.33   0.65
12        0.67   0.69
13        1.33   1.26
14        0.33   0.23
15        2.00   0.38

Moon data
[Figure: number of incidents (y-axis) by day (x-axis: moon, other).]

Null and alternative hypotheses
Write down appropriate null and alternative hypotheses to test if there is a difference in the moon data.

Moon data

patient   moon   other   D
1         3.33   0.27    3.06
2         3.67   0.59    3.08
3         2.67   0.32    2.35
4         3.33   0.19    3.14
5         3.33   1.26    2.07
6         3.67   0.11    3.56
7         4.67   0.30    4.37
8         2.67   0.40    2.27
9         6.00   1.59    4.41
10        4.33   0.60    3.73
11        3.33   0.65    2.68
12        0.67   0.69    -0.02
13        1.33   1.26    0.07
14        0.33   0.23    0.10
15        2.00   0.38    1.62

Moon data
The summary statistics of the differences are

mean   SD     n
2.43   1.46   15

Calculate the value of the test statistic
If we have matched pairs data and we are interested in testing if there was a difference for one treatment compared to the other, then we use

$$T = \frac{\bar{D}}{S_D / \sqrt{n}}.$$

Calculate the P-value
This is done using a t-distribution with $n - 1$ degrees of freedom, where $n$ is the number of subjects.
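As a rough illustration of the steps above, the following R sketch forms the differences from the moon data table, then computes the test statistic and two-sided P-value directly from the formula. The vector names are hypothetical; the values are typed in from the table in these notes.

```r
# Moon data typed in from the table (patients 1 to 15).
# Vector names are hypothetical, chosen for this sketch.
moon_days  <- c(3.33, 3.67, 2.67, 3.33, 3.33, 3.67, 4.67, 2.67,
                6.00, 4.33, 3.33, 0.67, 1.33, 0.33, 2.00)
other_days <- c(0.27, 0.59, 0.32, 0.19, 1.26, 0.11, 0.30, 0.40,
                1.59, 0.60, 0.65, 0.69, 1.26, 0.23, 0.38)

# One difference per patient: D = X - Y
d <- moon_days - other_days
n <- length(d)

# Test statistic T = D-bar / (S_D / sqrt(n))
t_stat <- mean(d) / (sd(d) / sqrt(n))

# Two-sided P-value from a t-distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

print(c(mean_d = mean(d), sd_d = sd(d), t = t_stat, df = n - 1, p = p_value))
```

The mean and SD of `d` should agree with the summary statistics quoted above (2.43 and 1.46 after rounding).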
Confidence interval for the matched-pairs t-test
The formula for calculating the C% confidence interval is

$$\bar{D} \pm t^* \frac{S_D}{\sqrt{n}}$$

where $t^*$ is the appropriate critical value to give a confidence level of C.

Calculate the 95% confidence interval for the mean of the differences.

R output for moon data

    t.test(moon$D)
    ##
    ##  One Sample t-test
    ##
    ## data:  moon$D
    ## t = 6.4518, df = 14, p-value = 1.518e-05
    ## alternative hypothesis: true mean is not equal to 0
    ## 95 percent confidence interval:
    ##  1.623968 3.241365
    ## sample estimates:
    ## mean of x
    ##  2.432667

Check the assumptions: Normality
[Figure: normal QQ-plot of the differences, sample quantiles (y-axis) against theoretical quantiles (x-axis).]

Check the assumptions: Independence
Would the behaviour of one patient affect the behaviour of another patient?

Summary
Hypothesis testing for mean for matched pairs data
- Hypotheses: $H_0: \mu_D = 0$, $H_a: \mu_D \neq 0$.
- Test statistic: $T = \dfrac{\bar{D} - 0}{S_D / \sqrt{n}}$
- P-value: Calculate using a t-distribution with $n - 1$ degrees of freedom.
- Confidence interval: $\bar{D} \pm t^* \dfrac{S_D}{\sqrt{n}}$.
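Since the matched pairs test is a one-sample t-test on the differences, the same result can also be obtained by passing both measurement vectors to `t.test()` with `paired = TRUE`. The sketch below is not taken from the lecture; it assumes the moon data are held in two hypothetical vectors typed in from the table, and it checks that the two routes agree.

```r
# Moon data typed in from the table; vector names are hypothetical.
moon_days  <- c(3.33, 3.67, 2.67, 3.33, 3.33, 3.67, 4.67, 2.67,
                6.00, 4.33, 3.33, 0.67, 1.33, 0.33, 2.00)
other_days <- c(0.27, 0.59, 0.32, 0.19, 1.26, 0.11, 0.30, 0.40,
                1.59, 0.60, 0.65, 0.69, 1.26, 0.23, 0.38)

# Route 1: one-sample t-test on the differences D = X - Y
diff_fit <- t.test(moon_days - other_days)

# Route 2: paired t-test on the two measurement vectors
paired_fit <- t.test(moon_days, other_days, paired = TRUE)

# Both routes give the same statistic, degrees of freedom and interval
print(c(diff_t = unname(diff_fit$statistic),
        paired_t = unname(paired_fit$statistic)))
print(paired_fit$conf.int)

# The same 95% interval from the formula D-bar +/- t* S_D / sqrt(n)
d <- moon_days - other_days
ci_hand <- mean(d) + c(-1, 1) * qt(0.975, df = length(d) - 1) * sd(d) / sqrt(length(d))
print(ci_hand)
```

Both calls should reproduce the statistic and interval shown in the R output above (t = 6.4518 on 14 degrees of freedom), up to rounding of the tabulated data.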