STAT7055 Introductory Statistics for Business and Finance

Research School of Finance, Actuarial Studies and Statistics
PAST FINAL EXAMINATION 3
STAT7055 Introductory Statistics for Business and Finance
Writing Time: 180 minutes
Reading Time: 15 minutes
Exam Conditions:
Central examination.
Students must return the examination paper at the end of the examination.
This examination paper is not available to the ANU Library archives.
Materials Permitted in the Exam Venue:
(No electronic aids are permitted, e.g., laptops, phones).
Calculator (non-programmable).
Two A4 pages with notes on both sides.
Unannotated paper-based dictionary (no approval required).
Materials to be Supplied to Students:
Script book.
Scribble paper.
Instructions to Students:
Please write your student number in the space provided on the front of the script book.
Attempt all 6 questions.
Start your solution to each question on a new page and clearly label each solution with the corresponding
question number.
To ensure full marks show all the steps in working out your solutions. Marks may be deducted for failure
to show working or formulae.
Selected statistical tables are attached to the back of the examination paper.
If a required degree of freedom is not listed in a statistical table, please use the closest degree of freedom.
Unless otherwise stated, use a significance level of α = 5%.
Round all numeric answers to 4 decimal places.
Question:    1    2    3    4    5    6   Total
Marks:      22   13   22    7   16   26     106
Question 1 [22 marks]
It is widely accepted that regularly flossing your teeth will decrease the chance of getting a cavity. Also, people who are generally more health conscious tend to floss more
regularly. A study was conducted to investigate how often people flossed, how often
people exercised and the relationship between flossing and exercising. A sample of 500
people were surveyed about their flossing and exercising habits and the resulting data
is summarised in the following table (which lists the number of people falling into each
category of flossing frequency and exercise frequency):
                                   Flossing frequency (days/week)
Exercise frequency (hrs/week)     0    1    2    3    4    5    6    7
Less than 1                      23   32   22   21   17   15   14   10
Between 1 and 5                  21   21   28   32   34   19   15   11
More than 5                      12   11   16   24   33   31   21   17
(a) [4 marks] Test whether the proportion of people who floss 3 days a week is the
same among people who exercise less than 1 hour a week and among people who
exercise more than 5 hours a week. Clearly state your hypotheses and use a significance level of α = 5%.
(b) [4 marks] Test whether the population proportions of people who exercise less
than 1 hour a week, between 1 and 5 hours a week, and more than 5 hours a week
are the same. Clearly state your hypotheses and use a significance level of α = 10%.
(c) [4 marks] For people who exercise more than 5 hours a week, test whether there
are 3 times as many people who floss at least 4 days a week as people who don’t.
Clearly state your hypotheses and use a significance level of α = 5%.
The more you floss, the less likely you are to see a dentist, which means that you will
probably spend less on dentist fees each year. Ideally, we would like to go back and ask
each of the 500 people how much they spent on dentist fees each year. Unfortunately,
the study only had enough funding to go back and ask 2 people from each cell of the
first table about their yearly dentist fees. The overall sample variance of the dentist fees
of the 48 people was $s^2 = 1592.212$. In addition, a two-way ANOVA was performed on
the data and the ANOVA table is displayed below:
Source        Sum of squares   Degrees of freedom   Mean squares   F
Flossing      27230.81
Exercise       4766.30
Interaction   17883.32
Error
Total
(d) [4 marks] Test whether there is an interaction between flossing frequency and
exercise frequency. Clearly state your hypotheses and use a significance level of
α = 5%.
(e) [3 marks] Test whether there is a difference in the mean yearly dentist fees between
the different levels of flossing frequency. Clearly state your hypotheses and use a
significance level of α = 5%.
(f) [3 marks] Test whether there is a difference in the mean yearly dentist fees between
the different levels of exercise frequency. Clearly state your hypotheses and use a
significance level of α = 5%.
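As a reminder of the structure behind the two-way ANOVA table in this question, the standard identities for a balanced design are sketched below. The symbols $a$, $b$, $r$ and $N$ are notation assumed here and are not used in the paper: $a = 8$ flossing levels, $b = 3$ exercise levels, $r = 2$ people per cell and $N = abr = 48$. The sketch also assumes that the overall sample variance $s^2$ was computed with the usual $N - 1$ divisor.

```latex
\begin{align*}
SS_{\text{Total}} &= (N-1)\,s^2, \\
SS_{\text{Error}} &= SS_{\text{Total}} - SS_{\text{Flossing}} - SS_{\text{Exercise}} - SS_{\text{Interaction}}, \\
df_{\text{Flossing}} &= a-1, \qquad df_{\text{Exercise}} = b-1, \qquad df_{\text{Interaction}} = (a-1)(b-1), \\
df_{\text{Error}} &= ab(r-1), \qquad df_{\text{Total}} = N-1, \\
MS &= \frac{SS}{df}, \qquad F = \frac{MS_{\text{effect}}}{MS_{\text{Error}}}.
\end{align*}
```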
Question 2 [13 marks]
Suppose I have 5 fair coins, two of which have a head on both sides, one of which has a tail
on both sides, and two of which are normal (a head on one side and a tail on the other
side).
(a) [3 marks] I shut my eyes, pick one of the 5 coins at random, and flip it. Find the
probability that the lower face of the coin is a head.
(b) [3 marks] I open my eyes and see that the coin is showing heads (on the upper
face). Find the probability that the lower face is a head.
(c) [4 marks] I shut my eyes again and flip the same coin again. Find the probability
that the lower face is a head. Remember that I already saw a heads on the upper
face on the first flip.
(d) [3 marks] I open my eyes and see that the coin is showing heads (on the upper
face). Find the probability that the lower face is a head. Remember that I already
saw a heads on the upper face on the first flip.
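The setup in Question 2 can be checked by exact enumeration. The following is a minimal sketch, not an official solution: the names `COINS`, `outcomes` and `prob` are hypothetical helpers introduced for illustration, and the sketch assumes that each coin is equally likely to be picked and that each side is equally likely to land face up on every flip.

```python
from fractions import Fraction
from itertools import product

# The five coins from Question 2, each written as its pair of faces.
COINS = [("H", "H"), ("H", "H"), ("T", "T"), ("H", "T"), ("H", "T")]


def outcomes(n_flips):
    """Yield (weight, flips) for every coin and every sequence of n_flips flips.

    Each flip is recorded as (upper_face, lower_face). All outcomes are equally
    likely: 1/5 for the choice of coin times 1/2 for each flip.
    """
    weight = Fraction(1, len(COINS)) * Fraction(1, 2) ** n_flips
    for coin in COINS:
        for ups in product((0, 1), repeat=n_flips):
            yield weight, [(coin[u], coin[1 - u]) for u in ups]


def prob(event, given=lambda flips: True, n_flips=1):
    """Exact conditional probability P(event | given) over n_flips flips of one coin."""
    kept = [(w, f) for w, f in outcomes(n_flips) if given(f)]
    total = sum(w for w, _ in kept)
    return sum(w for w, f in kept if event(f)) / total


# Events expressed on the list of flips of the chosen coin.
lower_is_head = lambda flips: flips[-1][1] == "H"      # lower face on the latest flip
first_upper_head = lambda flips: flips[0][0] == "H"    # heads seen on the first flip
all_upper_heads = lambda flips: all(up == "H" for up, _ in flips)

if __name__ == "__main__":
    print(prob(lower_is_head))                                       # setting of part (a)
    print(prob(lower_is_head, given=first_upper_head))               # setting of part (b)
    print(prob(lower_is_head, given=first_upper_head, n_flips=2))    # setting of part (c)
    print(prob(lower_is_head, given=all_upper_heads, n_flips=2))     # setting of part (d)
```

Using `Fraction` keeps every probability exact, so the printed values can be compared directly with a hand calculation based on the law of total probability and Bayes' rule.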
Question 3 [22 marks]
We have a coin which we suspect may not be fair. Suppose the coin is flipped 40 times
and the number of times it comes up heads is 15.
(a) [4 marks] Test whether the probability that the coin comes up heads is not less
than 0.5. Clearly state your hypotheses and use a significance level of α = 10%.
(b) [4 marks] Calculate the probability of making a type II error for the test in part
(a) at a significance level of α = 10%, if the true probability of coming up heads is
0.45.
(c) [3 marks] Suppose the true probability that the coin comes up heads was indeed
0.45. How many coin flips are required to estimate the true probability to within
0.1, with 99% certainty?
Suppose we have 10 of these unfair coins, where the probability that each coin comes up
heads is 0.45. Just because we can, we decide to perform some more fun experiments
with these coins. The first experiment is that we flip each of the 10 coins once. Each
coin that comes up heads is flipped a second time. Let X be the number of heads that
appears in the second round of flips.
(d) [4 marks] Calculate E(X) and V(X).
For the second experiment, we take two of these unfair coins and flip them. For the $i$th
coin, define a new random variable $Y_i$ in the following way: if the coin comes up heads,
$Y_i$ is equal to 1; if the coin comes up tails, we roll a fair six-sided die and $Y_i$ is equal to
the number that comes up. Let Z denote the sum of the numbers we get for each coin,
that is, $Z = Y_1 + Y_2$.
(e) [4 marks] Calculate E(Z).
(f) [3 marks] Find P(Z < 4).
Question 4 [7 marks]
Suppose X has a binomial distribution with parameters n and p. Recall that the variance
of the sample proportion, $\hat{p} = X/n$, is equal to $V(\hat{p}) = \frac{p(1-p)}{n}$. If we were trying to
estimate the variance of the sample proportion, a reasonable estimate would seem to be
$\frac{\hat{p}(1-\hat{p})}{n}$.
(a) [5 marks] Show that $\frac{\hat{p}(1-\hat{p})}{n}$ is not an unbiased estimator of $\frac{p(1-p)}{n}$.
(b) [2 marks] Modify $\frac{\hat{p}(1-\hat{p})}{n}$ slightly to form an unbiased estimator of $\frac{p(1-p)}{n}$.
Question 5 [16 marks]
The effect of alcohol on the body appears to be much greater at higher altitudes. To test
this theory, an investigator randomly sampled 12 people and randomly divided them into
two groups of six. The first group was given 100cc of alcohol (in the form of free beer) at
sea level. The second group was transported halfway up Mount Everest and given the
same amount of alcohol. The blood alcohol level (×100) of each person in each group was
measured and the results summarised in the following table:
Sea level   15 000 feet
    7           13
   10           17
    9           15
   12           14
    9           10
   13           14
(a) [5 marks] Test whether the population variances of the alcohol levels in the two
groups are equal. Clearly state your hypotheses and use a significance level of
α = 5%.
(b) [5 marks] Test whether the population mean alcohol level from group 2 (15 000
feet) exceeds the mean from group 1 (sea level) by more than 1. Clearly state your
hypotheses and use a significance level of α = 5%.
The investigator doesn’t fully trust the results from the test in part (a) and decides he
needs to collect more data. Suppose that he repeats the entire experiment 5 times. That
is, he takes a new sample of 12 people each time, gives them free beer in the same manner
described above and measures their blood alcohol levels. Each time he repeated the
experiment, he calculated the sample variances of the blood alcohol levels in the two groups.
The sample variances are listed in the table below:
Sea level   15 000 feet
4.168       4.857
5.445       5.851
4.843       4.466
5.770       5.963
4.445       5.355
(c) [6 marks] Based on this new data, test whether the population variances of the
alcohol levels in the two groups are equal. You can assume that the population
variances of these sample variances are the same in each group. Clearly state your
hypotheses and use a significance level of α = 5%.
Question 6 [26 marks]
A study was conducted to determine what variables might be important in predicting
the height of a professional basketball player. The Height (cm), Weight (kg), Wingspan
(cm), Salary (million dollars) and Vertical Leap (inches) were recorded for 9 professional
basketball players. The data are summarised in the table below, along with some
summary statistics.
Height ($Y$)   Weight ($X_1$)   Wingspan ($X_2$)   Salary ($X_3$)   Vertical Leap ($X_4$)
195.05         132.90           197.94             1.45             33.00
197.62         133.93           201.32             3.39             33.55
201.18         136.92           204.44             4.04             29.82
201.09         136.46           205.93             5.48             29.92
202.54         136.20           205.90             7.20             35.60
201.72         139.60           203.67             4.73             44.54
205.83         145.28           209.45             4.45             38.40
208.84         147.70           211.99             4.95             44.69
211.53         147.90           213.07             4.29             32.19
$s_Y^2 = 26.943544$, $s_{X_1}^2 = 34.047078$, $s_{X_2}^2 = 24.062694$, $s_{X_3}^2 = 2.408569$, $s_{X_4}^2 = 32.307803$
A multiple regression model was fitted to all 4 independent X variables and some output
from the regression analysis is displayed below.
Predictor       Coef      SE Coef   T      p-value
Intercept       63.4562   68.5409   0.93   0.4069
Weight          0.7063    0.5794    1.22   0.2898
Wingspan        0.2078    0.7092    0.29   0.7841
Salary          0.5964    0.7591    0.79   0.4760
Vertical Leap   0.1322    0.1677    0.79   0.4745
Analysis of Variance
Source           DF   SS         MS   F          p-value
Regression                            52.19429
Residual Error
Total                 215.5484
(a) [2 marks] Test the overall significance of the model. Clearly state your hypotheses
and use a significance level of α = 5%.
(b) [3 marks] We are told that the adjusted $R^2$ is equal to 0.962402.
Based on this information, calculate the unadjusted $R^2$.
We know that including independent variables that are unrelated to Y is not a very good
idea. A suggestion was made to remove Salary and Vertical Leap from the model.
(c) [4 marks] Test whether the population correlation coefficient between Weight and
Wingspan is not equal to 0. Clearly state your hypotheses and use a significance
level of α = 5%. The following summary statistic has also been provided:
$s_{X_1 X_2} = 26.51125$.
(d) [2 marks] Based on your answer to part (c), do you think it would be a good
idea to fit a multiple regression model with Weight and Wingspan as independent
variables? Why or why not?
A simple linear regression model with only Wingspan as an independent variable was
fitted to the data. Unfortunately, the person who performed the regression lost all the
output except for the proportion of variation explained by the model, $R^2 = 0.9664913$.
They also remembered that the estimated slope of the regression model was positive.
(e) [5 marks] Fit the regression model $Y = \beta_0 + \beta_1 X_2 + \varepsilon$. That is, calculate the
estimates $\hat{\beta}_0$ and $\hat{\beta}_1$.
After the results of this study were published, one of the main criticisms was that the
sample size of 9 was much too small to produce any reliable results. A follow-up study
was performed with a much larger sample of players. In addition to the player’s Height
(Y) and Wingspan (X), the following information was also recorded: an indicator
variable (M) that equals 1 if the player was male; and an indicator variable (P) that
equals 1 if at least one of the player’s parents also played professional basketball.
The following model was fitted to the data in the follow-up study:
$Y = \beta_0 + \beta_1 X + \beta_2 M + \beta_3 P + \beta_4 (X \times M) + \beta_5 (X \times P) + \varepsilon$
Some output from the regression analysis, including plots of the residuals, is displayed
below:
[Figure: Residual Plot (Residuals against Fitted Values) and Histogram of Residuals (Frequency against Residuals)]
Predictor   Coef       SE Coef   T      p-value
Intercept   117.6208   33.8876   3.47   0.0008
X           0.3834     0.1737    2.21   0.0298
M           80.1757    35.7644   2.24   0.0273
P           17.3537    19.1019   0.91   0.3659
X × M       0.4186     0.1827    2.29   0.0242
X × P       0.0835     0.0941    0.89   0.3774
(f) [2 marks] What assumptions are we testing in each of these plots? Comment briefly
on what these plots tell us about our model.
(g) [2 marks] What do you conclude regarding the relationship between Height and
Wingspan? Clearly state your hypotheses and use a significance level of α = 5%.
(h) [2 marks] Test whether a different intercept is needed for players who have at
least one parent who played professional basketball. Clearly state your hypotheses
and use a significance level of α = 5%.
(i) [2 marks] Test whether a different coefficient parameter for Wingspan is needed
for male players. Clearly state your hypotheses and use a significance level of
α = 5%.
(j) [2 marks] What additional variable would we have to include in our model in
order to determine whether a different coefficient parameter is needed for the male
indicator variable (M) for players with at least one parent who played professional
basketball? Clearly state the hypotheses we would test.
END OF EXAMINATION