STAT 431: Introduction to Biostatistics
Winter 2022, Hua Shen
Assignment 2, two sets of questions, 70 points
Open: 2:00PM on Tuesday, March 1, 2022
Due: 11:59PM on Friday, March 18, 2022
Late assignments will be penalized by 50% per day up to 100%.
Well-presented assignments will receive a 1% bonus.
Note:
Please clearly print your name and UCID on your assignment.
This is an open-book assignment; calculators and statistical software such as R may be used.
If you use software, please include the code, the output, and your own interpretation of and
conclusions from the output directly after the relevant question. Please do not collect all code
in an appendix at the end, because this makes marking more difficult.
For numerical answers, please round to three decimal places when needed.
Question 1: RMS Titanic
The file “data-titanic.txt” (available on D2L) contains the survival data for passengers on the
RMS Titanic including the sex and survival status of the passengers in addition to the ticket classes.
The column names are as follows:
ID: identification number of the passenger
class: passenger’s ticket class on the Titanic, determined by both the ticket price and the passenger’s wealth and social class (1: first class; 2: second class; 3: third class)
death: survival status (1: dead; 0: alive)
gender: gender of the passenger (female or male)
1. (6 points) Please create
(a) (3 points) a two-way table for passengers on the Titanic based on the gender and survival
status of the passengers, and
(b) (3 points) three two-way tables for passengers on the Titanic based on the gender and
survival status of the passengers, stratified by the three ticket classes.
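As an illustration of the tabulation asked for here, the following is a minimal sketch in plain Python. The records are hypothetical and are not the contents of “data-titanic.txt”; the helper names `two_way` and `show` are illustrative only. It builds one pooled gender-by-death table and one table per ticket class.

```python
from collections import Counter

# Hypothetical records (gender, class, death); NOT the real data-titanic.txt contents.
records = [
    ("female", 1, 0), ("female", 1, 0), ("male", 1, 1), ("male", 1, 0),
    ("female", 2, 0), ("male", 2, 1), ("male", 2, 1), ("female", 2, 1),
    ("female", 3, 1), ("female", 3, 0), ("male", 3, 1), ("male", 3, 1),
]

def two_way(rows):
    """Count (gender, death) pairs and return a nested dict table[gender][death]."""
    counts = Counter((g, d) for g, _, d in rows)
    return {g: {d: counts[(g, d)] for d in (1, 0)} for g in ("male", "female")}

# Pooled table over all classes, then one table per ticket class.
pooled = two_way(records)
by_class = {k: two_way([r for r in records if r[1] == k]) for k in (1, 2, 3)}

def show(title, table):
    """Print a table with gender as rows and survival status as columns."""
    print(title)
    print(f"{'':8}{'dead':>6}{'alive':>6}")
    for g, row in table.items():
        print(f"{g:8}{row[1]:>6}{row[0]:>6}")

show("All classes", pooled)
for k, t in by_class.items():
    show(f"Class {k}", t)
```

With the real file, the list of records would be read from “data-titanic.txt” instead; the file's delimiter is not stated here, so the reading step is left out.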
2. (8 points) Please use the Mantel-Haenszel procedure to estimate the Relative Risk of dying
associated with sex, controlling for the potential confounding effects of ticket class. Please
provide an appropriate 95% confidence interval to supplement your point estimate and interpret
the results. Note that here we assume there is no interaction between sex and ticket class in
terms of RR.
Note: As an exercise, you may repeat this question for the odds ratio and the excess risk, though
these are not required in the assignment.
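The Mantel-Haenszel calculation can be sketched as follows. This is a sketch under stated assumptions: the stratum counts are hypothetical, "exposed" is taken to mean male, and the confidence interval uses the Greenland-Robins variance of the log of the summary RR, which may differ from the variance formula given in the lectures.

```python
import math

# Hypothetical strata, NOT the real Titanic counts.
# Each tuple is (a, b, c, d): exposed dead, exposed alive, unexposed dead,
# unexposed alive, with "exposed" assumed to mean male.
strata = [(20, 80, 10, 90), (40, 60, 20, 80), (60, 40, 30, 70)]

def mantel_haenszel_rr(strata, z=1.96):
    """Return (RR_MH, lower, upper) using the Greenland-Robins variance of log RR_MH."""
    num = den = var_num = 0.0
    for a, b, c, d in strata:
        n1, n0 = a + b, c + d            # exposed and unexposed totals
        n = n1 + n0                      # stratum size
        num += a * n0 / n                # numerator of RR_MH
        den += c * n1 / n                # denominator of RR_MH
        var_num += (n1 * n0 * (a + c) - a * c * n) / n**2
    rr = num / den
    se = math.sqrt(var_num / (num * den))  # standard error of log RR_MH
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)

rr_mh, lower, upper = mantel_haenszel_rr(strata)
print(f"RR_MH = {rr_mh:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The interval is built on the log scale and then exponentiated, which keeps both limits positive.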
3. (8 points) Please use the Woolf method to estimate the Relative Risk of dying associated with
sex, controlling for the potential confounding effects of ticket class. Please provide an appropriate
95% confidence interval to supplement your point estimate. Again here we assume there is no
interaction between sex and ticket class in terms of RR.
Note: As an exercise, you may repeat this question for the odds ratio and the excess risk, though
these are not required in the assignment.
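A minimal sketch of the Woolf (inverse-variance weighted) estimate is given below. The counts are hypothetical, "exposed" is assumed to mean male, and the stratum variance of log RR is taken to be 1/a - 1/(a+b) + 1/c - 1/(c+d); the function name `woolf_rr` is illustrative.

```python
import math

# Hypothetical strata, NOT the real Titanic counts.
# Each tuple is (a, b, c, d): exposed dead, exposed alive, unexposed dead, unexposed alive.
strata = [(20, 80, 10, 90), (40, 60, 20, 80), (60, 40, 30, 70)]

def woolf_rr(strata, z=1.96):
    """Return (RR_W, lower, upper) from an inverse-variance weighted mean of log RRs."""
    weights, log_rrs = [], []
    for a, b, c, d in strata:
        n1, n0 = a + b, c + d
        log_rrs.append(math.log((a / n1) / (c / n0)))
        var = 1 / a - 1 / n1 + 1 / c - 1 / n0   # assumed variance of the stratum log RR
        weights.append(1 / var)
    total = sum(weights)
    log_rr = sum(w * l for w, l in zip(weights, log_rrs)) / total
    se = math.sqrt(1 / total)                   # standard error of the pooled log RR
    return math.exp(log_rr), math.exp(log_rr - z * se), math.exp(log_rr + z * se)

rr_w, lower, upper = woolf_rr(strata)
print(f"RR_Woolf = {rr_w:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Strata with a zero cell would make the logarithm or the variance undefined, so this sketch assumes all four counts are positive in every stratum.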
4. (8 points) Assuming there is no important interaction between sex and ticket class, please use the
Cochran-Mantel-Haenszel (CMH) method to test the association of death and sex, controlling
for the potential confounding effects of ticket class, with significance level α = 0.05. Please
follow the typical steps of hypothesis testing: state the hypotheses, calculate the test statistic,
find the p-value or rejection region, and state your conclusion about the research question.
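The computational part of the CMH test can be sketched as below. The counts are hypothetical, no continuity correction is applied, and the stratum variance uses the hypergeometric form with n²(n - 1) in the denominator; these are assumptions and the version taught in the lectures may differ. The p-value for one degree of freedom is obtained from the complementary error function.

```python
import math

# Hypothetical strata, NOT the real Titanic counts.
# Each tuple is (a, b, c, d): exposed dead, exposed alive, unexposed dead, unexposed alive.
strata = [(20, 80, 10, 90), (40, 60, 20, 80), (60, 40, 30, 70)]

def cmh_test(strata):
    """Return (statistic, p_value) for the CMH test without continuity correction."""
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n1, n0 = a + b, c + d            # exposed and unexposed totals
        m1, m0 = a + c, b + d            # dead and alive totals
        n = n1 + n0
        observed += a
        expected += n1 * m1 / n          # E(a) under no association
        variance += n1 * n0 * m1 * m0 / (n**2 * (n - 1))
    stat = (observed - expected) ** 2 / variance
    p_value = math.erfc(math.sqrt(stat / 2))   # upper tail of chi-square with 1 df
    return stat, p_value

stat, p_value = cmh_test(strata)
print(f"CMH statistic = {stat:.3f}, p-value = {p_value:.3g}")
```

The null hypothesis is rejected at α = 0.05 when the p-value is below 0.05, or equivalently when the statistic exceeds 3.841.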
5. (8 points) Now we formally test the assumption of no interaction between sex and ticket
class across strata. Please use the Woolf method to test whether there is an interaction effect of
sex and ticket class on dying (that is, whether the effect of sex on dying is modified by ticket
class), with significance level α = 0.05, using Relative Risk as the effect measure.
Note: As an exercise, you may
- use the odds ratio as the measure of association of interest, and test the consistency across
strata, i.e., absence of multiplicative interaction,
- use the excess risk as the measure of association of interest, and test the consistency across
strata, i.e., absence of additive interaction,
though these are not required in the assignment.
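A sketch of the Woolf test of homogeneity for the RR follows. It assumes hypothetical counts, three strata (so two degrees of freedom), and the statistic formed as the weighted sum of squared deviations of the stratum log RRs from their inverse-variance weighted mean. The closed-form p-value exp(-x/2) holds only for two degrees of freedom.

```python
import math

# Hypothetical strata with deliberately different RRs; NOT the real Titanic counts.
# Each tuple is (a, b, c, d): exposed dead, exposed alive, unexposed dead, unexposed alive.
strata = [(20, 80, 10, 90), (40, 60, 10, 90), (60, 40, 50, 50)]

def woolf_homogeneity(strata):
    """Return (statistic, p_value) for exactly three strata (2 degrees of freedom)."""
    if len(strata) != 3:
        raise ValueError("this sketch only handles three strata")
    weights, log_rrs = [], []
    for a, b, c, d in strata:
        n1, n0 = a + b, c + d
        log_rrs.append(math.log((a / n1) / (c / n0)))
        weights.append(1 / (1 / a - 1 / n1 + 1 / c - 1 / n0))
    pooled = sum(w * l for w, l in zip(weights, log_rrs)) / sum(weights)
    stat = sum(w * (l - pooled) ** 2 for w, l in zip(weights, log_rrs))
    p_value = math.exp(-stat / 2)        # upper tail of chi-square with 2 df
    return stat, p_value

stat, p_value = woolf_homogeneity(strata)
print(f"Homogeneity statistic = {stat:.3f}, p-value = {p_value:.3g}")
```

A small p-value suggests that the RR differs across ticket classes, in which case a single summary RR would be hard to interpret.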
6. (8 points) Please use the data in the file to construct a pooled 2×3 table, with death as D and
ticket class as E. Please use the overall χ² test to test whether D and E are independent, with
significance level α = 0.05, and interpret your finding.
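The overall test on a 2×3 table can be sketched as follows, using hypothetical pooled counts and the statistic n²/(n_D n_D̄) Σ (a_k - n_D m_k/n)²/m_k, which is the usual Pearson statistic written for a table with two rows. With K = 3 columns there are two degrees of freedom, for which the upper-tail probability is exp(-x/2).

```python
import math

# Hypothetical pooled counts; columns ordered class 1, class 2, class 3.
# These are NOT the real Titanic counts.
dead = [20, 70, 90]
alive = [180, 130, 110]

def overall_chi2(dead, alive):
    """Overall chi-square statistic for a 2 x K table of dead/alive counts."""
    m = [a + b for a, b in zip(dead, alive)]     # column totals m_k
    n_d, n_dbar = sum(dead), sum(alive)          # row totals
    n = n_d + n_dbar
    total = sum((a - n_d * mk / n) ** 2 / mk for a, mk in zip(dead, m))
    return n**2 / (n_d * n_dbar) * total

stat = overall_chi2(dead, alive)
df = len(dead) - 1
p_value = math.exp(-stat / 2)                    # valid because df == 2 here
print(f"Overall chi-square = {stat:.3f} on {df} df, p-value = {p_value:.3g}")
```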
7. (8 points) Now consider the ticket class as an ordered variable, and assign x = 0 for the third
class, x = 1 for the second class, and x = 2 for the first class. Please perform the trend test on
the pooled 2×3 table with significance level α = 0.05 and interpret your finding.
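A sketch of a trend test with the scores described here is given below. It assumes hypothetical counts and a Cochran-Armitage style statistic whose variance is (n_D n_D̄/n²)(Σ m_k x_k² - (Σ m_k x_k)²/n); some texts use n - 1 in place of one factor of n, so the lecture version may differ slightly. The statistic is referred to a chi-square distribution with one degree of freedom.

```python
import math

# Hypothetical pooled counts; columns ordered class 1, class 2, class 3.
# These are NOT the real Titanic counts.
dead = [20, 70, 90]
alive = [180, 130, 110]
scores = [2, 1, 0]   # x = 2 for first class, 1 for second class, 0 for third class

def trend_chi2(dead, alive, scores):
    """Chi-square statistic (1 df) for a linear trend in the risk across scores."""
    m = [a + b for a, b in zip(dead, alive)]
    n_d, n_dbar = sum(dead), sum(alive)
    n = n_d + n_dbar
    # Score-weighted sum of observed minus expected deaths.
    t = sum(x * (a - n_d * mk / n) for x, a, mk in zip(scores, dead, m))
    sxx = sum(mk * x**2 for x, mk in zip(scores, m)) - sum(mk * x for x, mk in zip(scores, m)) ** 2 / n
    return t**2 / (n_d * n_dbar / n**2 * sxx)

stat = trend_chi2(dead, alive, scores)
p_value = math.erfc(math.sqrt(stat / 2))   # upper tail of chi-square with 1 df
print(f"Trend chi-square = {stat:.3f}, p-value = {p_value:.3g}")
```

Reversing the scores (0, 1, 2 for first, second, third class) leaves the statistic unchanged, since only the spacing between scores matters.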
8. (8 points) Please use the goodness-of-fit test to test whether a straight line adequately explains
the trend in the data, with significance level α = 0.05, and interpret your finding.
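One common way to carry out this goodness-of-fit test is to subtract the trend statistic from the overall statistic and refer the difference to a chi-square distribution with K - 2 degrees of freedom. The sketch below assumes that approach, hypothetical counts, and the same variance convention for both statistics; the lecture definition may differ.

```python
import math

# Hypothetical pooled counts; columns ordered class 1, class 2, class 3.
# These are NOT the real Titanic counts.
dead = [20, 70, 90]
alive = [180, 130, 110]
scores = [2, 1, 0]   # first, second, third class

m = [a + b for a, b in zip(dead, alive)]         # column totals m_k
n_d, n_dbar = sum(dead), sum(alive)
n = n_d + n_dbar
dev = [a - n_d * mk / n for a, mk in zip(dead, m)]   # observed minus expected deaths

# Overall statistic (K - 1 = 2 df).
overall = n**2 / (n_d * n_dbar) * sum(e**2 / mk for e, mk in zip(dev, m))

# Trend statistic (1 df).
sxx = sum(mk * x**2 for x, mk in zip(scores, m)) - sum(mk * x for x, mk in zip(scores, m)) ** 2 / n
trend = sum(x * e for x, e in zip(scores, dev)) ** 2 / (n_d * n_dbar / n**2 * sxx)

# Departure from linearity (K - 2 = 1 df).
gof = overall - trend
p_value = math.erfc(math.sqrt(gof / 2))          # upper tail of chi-square with 1 df
print(f"Goodness-of-fit chi-square = {gof:.3f}, p-value = {p_value:.3f}")
```

A large p-value indicates no evidence against the straight-line description of the trend.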
Question 2: Association between D and E
(8 points) When investigating the association between disease and exposure, we may encounter
scenarios in which there are several different levels of exposure. Ignoring the fact that these levels may
be ordered, we consider an exposure variable E that has K natural levels, labeled for convenience
by 1, 2, ..., K. Using the notation introduced in the lectures, the test statistic applicable to data
generated by a population-based, cohort, or case-control design is
$$\chi^2_{\text{overall}} = \frac{n^2}{n_D n_{\bar{D}}} \sum_{k=1}^{K} \frac{\left(a_k - \frac{n_D m_k}{n}\right)^2}{m_k}$$
which follows a $\chi^2_{K-1}$ distribution under the null hypothesis that D and E are independent. Please show that,
when K = 2, this statistic is equivalent to the one we have for the 2×2 table,
$$\frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)},$$
where a, b, c and d are the typical notation we have been using.
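Before attempting the algebra, the claimed equivalence can be checked numerically. The sketch below is not a proof. It assumes the layout in which a and b are the diseased counts in the two exposure levels and c and d are the corresponding non-diseased counts, so that n_D = a + b, m_1 = a + c and m_2 = b + d; the counts themselves are hypothetical.

```python
def overall_chi2(diseased, not_diseased):
    """Overall chi-square statistic for a 2 x K table."""
    m = [x + y for x, y in zip(diseased, not_diseased)]   # column totals m_k
    n_d, n_dbar = sum(diseased), sum(not_diseased)
    n = n_d + n_dbar
    total = sum((x - n_d * mk / n) ** 2 / mk for x, mk in zip(diseased, m))
    return n**2 / (n_d * n_dbar) * total

def chi2_2x2(a, b, c, d):
    """Usual chi-square statistic for a 2 x 2 table, without continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts: row D is (a, b), row not-D is (c, d).
a, b, c, d = 30, 20, 10, 40
general = overall_chi2([a, b], [c, d])
special = chi2_2x2(a, b, c, d)
print(f"K = 2 general form: {general:.6f}, 2x2 form: {special:.6f}")
```

Agreement on a few tables like this is only a sanity check; the question still asks for the algebraic argument.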