Mathematical Statistics in R | EBC2107 Mathematical Statistics Assignment

$$ s_i = \frac{1}{24} Y_{i-6} + \frac{1}{12} \sum_{j=-5}^{5} Y_{i+j} + \frac{1}{24} Y_{i+6}. $$
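The smoothing step can be sketched in R as a centred 13-term moving average with weight 1/24 on the two end points (Y at i-6 and i+6) and weight 1/12 on the 11 observations in between. This is only an illustration: the series `y` is simulated, and all variable names are assumptions rather than part of the course material.

```r
# Simulated monthly series (hypothetical data, for illustration only):
# a seasonal cycle plus noise over 10 years.
set.seed(1)
n <- 120
month <- rep(1:12, length.out = n)
y <- 10 + 8 * sin(2 * pi * (month - 4) / 12) + rnorm(n)

# Weights of the centred 13-term moving average:
# 1/24 on the two end points, 1/12 on the 11 points in between.
w <- c(1 / 24, rep(1 / 12, 11), 1 / 24)

# sides = 2 centres the filter on observation i; the first and last
# 6 values are NA because the window does not fit there.
s <- stats::filter(y, filter = w, sides = 2)

# Check one value against the formula written out by hand.
i <- 50
by_hand <- y[i - 6] / 24 + sum(y[(i - 5):(i + 5)]) / 12 + y[i + 6] / 24
all.equal(as.numeric(s[i]), by_hand)
```

Because the weights sum to one and the window covers exactly one year, a stable seasonal pattern is averaged out while slower movements remain.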
Figure 1 plots the raw and the smoothed monthly data side by side. Note the huge
difference.
[Figure 1: Monthly Dutch temperatures 1907-2019. Panels: (a) raw monthly temperatures,
(b) smoothed monthly temperatures. Axes: Date (horizontal), °C (vertical).
Series: De.Bilt, Eelde, Maastricht.]
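Annual data. The annual data are simply calculated as the average over all days in a
year. As these data no longer contain any seasonal pattern, they can be used directly
for trend analysis. These data form the main input for the assignment. Figure
2 plots the annual data.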
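As a small illustration of the averaging step, the sketch below computes annual means from daily values. The data frame `daily` and its columns `date` and `temp` are hypothetical and simulated here; the actual course data may be organised differently.

```r
# Simulated daily temperatures for three years (hypothetical data).
set.seed(2)
dates <- seq(as.Date("2000-01-01"), as.Date("2002-12-31"), by = "day")
day_of_year <- as.numeric(format(dates, "%j"))
daily <- data.frame(
  date = dates,
  temp = 10 + 8 * sin(2 * pi * (day_of_year - 110) / 365) + rnorm(length(dates))
)

# Extract the calendar year and average all days within each year.
daily$year <- as.integer(format(daily$date, "%Y"))
annual <- aggregate(temp ~ year, data = daily, FUN = mean)
annual
```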
3 Programming in R
For the assignment you have to programme the techniques we learn in the course in the
statistical software package R. R is available for (free) download on http://www.r-project.org. More
information about R can be found on the course page.
4 Assignment
For your assignment you write a paper in which you try to answer the question whether there
is statistical evidence of an upward trend in the temperature series. The main focus should
be on the annual data. Choose one series as your main series of interest, but check whether your
conclusions change depending on which series you use.
[Figure 2: Annual Dutch temperatures 1907-2019. Axes: Date (horizontal), °C (vertical).
Series: De.Bilt, Eelde, Maastricht.]
To guide you in the analysis, below you can find a list with specific questions to consider
in your analysis. Remember though that in the end you should provide one coherent analysis
in the paper, and not a point by point answering of the questions.
Compare average temperatures in different parts of the sample
Start by analyzing average temperatures over different parts of the sample. Split the sample
into a number of subsamples, and compare average temperatures across the subsamples. You
can vary how you split your sample. Make sure estimation uncertainty is taken into
account, e.g. by constructing confidence intervals. You could also consider overlapping versus
non-overlapping subsamples.
You can also consider a formal test for equality across subsamples. For example, you
could split the sample in two and test whether the mean temperatures in both parts are equal.
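One possible way to set up this comparison in R is sketched below. The annual series is simulated, and the number of subsamples (four) and the split year (1963) are arbitrary assumptions made for the example. The intervals and the test are the standard t-based ones, so they rest on the usual assumptions that the paper should discuss.

```r
# Simulated annual series (hypothetical data, for illustration only).
set.seed(3)
year <- 1907:2019
temp <- 9 + 0.01 * (year - 1907) + rnorm(length(year), sd = 0.7)

# Split the sample into four non-overlapping blocks of roughly equal length.
block <- cut(year, breaks = 4, dig.lab = 4)

# Mean and 95% t-based confidence interval for each block.
ci_by_block <- t(sapply(split(temp, block), function(x) {
  tt <- t.test(x)
  c(mean = unname(tt$estimate), lower = tt$conf.int[1], upper = tt$conf.int[2])
}))
ci_by_block

# Formal test: are the mean temperatures in the two halves equal?
first_half <- temp[year <= 1963]
second_half <- temp[year > 1963]
two_sample <- t.test(first_half, second_half)  # Welch two-sample t-test
two_sample
```

Overlapping subsamples could be handled in the same way by replacing `cut` with a loop over moving windows, keeping in mind that overlapping means are then dependent by construction.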
Investigate the presence of a linear upward trend
Next we fit a linear regression model to the data. That is, if $Y_1, \ldots, Y_n$ are the temperature
data, we fit the regression model
$$ Y_i = \alpha + \beta x_i + \varepsilon_i, \qquad (1) $$
and take $x_1, \ldots, x_n$ to represent a linear trend. Estimate the model, and provide measures of
estimation uncertainty such as confidence intervals. Also perform a hypothesis test to test whether
an upward trend is visible in the data.
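A minimal sketch of this step, assuming the trend regressor is simply x_i = i and using simulated data in place of the real series: the model is estimated with `lm`, confidence intervals come from `confint`, and the upward trend is tested with a one-sided t-test of beta = 0 against beta > 0.

```r
# Simulated annual series (hypothetical data, for illustration only).
set.seed(4)
year <- 1907:2019
temp <- 9 + 0.01 * (year - 1907) + rnorm(length(year), sd = 0.7)

# Linear trend regressor x_i = 1, 2, ..., n.
x <- year - min(year) + 1
fit <- lm(temp ~ x)

coef(summary(fit))                  # estimates, standard errors, t statistics
ci <- confint(fit, level = 0.95)    # confidence intervals for alpha and beta
ci

# One-sided test of H0: beta = 0 against H1: beta > 0.
t_beta <- coef(summary(fit))["x", "t value"]
p_one_sided <- pt(t_beta, df = df.residual(fit), lower.tail = FALSE)
p_one_sided
```

Note that `summary(fit)` reports a two-sided p-value, which is why the one-sided p-value is computed separately from the t statistic.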
Investigate the presence of a linear upward trend in part of the sample
While we have data from 1907 onwards, some people claim that temperatures only started to rise
significantly from the seventies onwards. Here we investigate that claim. We still consider model
(1), but now we take $x_1, \ldots, x_n$ such that the linear trend only kicks in from a starting point
late in the sample. Find a reasonable point at which to start the trend (e.g. somewhere in the
seventies) and motivate your choice. Motivation can come either from outside sources or
from the data. If you motivate the choice from the data, explain how this could be
misleading.
Analyse the model in the same way as for the overall linear trend and contrast your
findings with that case.
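One way to construct such a regressor, offered here as an assumption rather than the prescribed method, is to set x_i to zero up to the starting year and let it grow linearly afterwards. The starting year 1975 and the simulated data below are purely illustrative.

```r
# Simulated annual series with a trend starting late (hypothetical data).
set.seed(5)
year <- 1907:2019
start <- 1975  # hypothetical starting year; the choice must be motivated
temp <- 9 + 0.03 * pmax(0, year - start) + rnorm(length(year), sd = 0.7)

# Regressor that is 0 up to the starting year and increases by 1 per year after.
x_late <- pmax(0, year - start)

# Same model (1) as before, with the late-starting trend as regressor.
fit_late <- lm(temp ~ x_late)
coef(summary(fit_late))
confint(fit_late)
```

With this construction alpha is the mean level before the starting year and beta is the annual increase afterwards, which makes the comparison with the full-sample trend straightforward.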
Implement the bootstrap
For the most interesting parts of your analysis above, construct the relevant hypothesis tests
and confidence intervals using the bootstrap rather than the standard approach. Focus especially
on implementing the bootstrap for the regression model. Discuss how this changes
your results and how to interpret any changes.
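There are several bootstrap schemes for regression. The sketch below uses a residual bootstrap, which is one option among others (a pairs bootstrap would be another) and assumes i.i.d. errors. It produces a percentile confidence interval for beta and a bootstrap p-value for the one-sided test based on recentred t statistics. The data are simulated and the number of replications `B` is an arbitrary choice.

```r
# Simulated annual series (hypothetical data, for illustration only).
set.seed(6)
n <- 113
x <- seq_len(n)
temp <- 9 + 0.01 * x + rnorm(n, sd = 0.7)

fit <- lm(temp ~ x)
beta_hat <- unname(coef(fit)["x"])
t_obs <- coef(summary(fit))["x", "t value"]
fitted_vals <- fitted(fit)
res <- residuals(fit)  # these sum to zero because the model has an intercept

B <- 999
beta_star <- numeric(B)
t_star <- numeric(B)
for (b in seq_len(B)) {
  # Rebuild a bootstrap sample from fitted values plus resampled residuals.
  y_star <- fitted_vals + sample(res, size = n, replace = TRUE)
  cf <- coef(summary(lm(y_star ~ x)))
  beta_star[b] <- cf["x", "Estimate"]
  # Recentre at beta_hat so that t_star mimics the t statistic under H0.
  t_star[b] <- (cf["x", "Estimate"] - beta_hat) / cf["x", "Std. Error"]
}

# Percentile confidence interval for beta.
ci_percentile <- quantile(beta_star, c(0.025, 0.975))
ci_percentile

# Bootstrap p-value for H0: beta = 0 against H1: beta > 0.
p_boot <- mean(t_star >= t_obs)
p_boot
```

Comparing `ci_percentile` with the interval from `confint(fit)` shows how much the conclusions depend on the normality-based standard approach.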
Discuss the assumptions you need in the analysis
For all parts of your analysis, discuss carefully which assumptions you needed to make. For
example, do you need to assume normality? When do you need to assume independent and/or
identically distributed random variables? Can you give an asymptotic justification of your
methods that avoids some of the assumptions?
Once you have set up the assumptions you need, discuss how likely it is that they are satisfied
for your data. You can also think about ways to check whether your assumptions are satisfied, either
formally or informally. For example, can we check whether normality, or independence, is satisfied
by the data?
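As an illustration of formal checks, the sketch below applies a Shapiro-Wilk test for normality and a Ljung-Box test for autocorrelation to the regression residuals. These particular tests and the lag length of 10 are choices made for the example, not requirements of the assignment, and the data are simulated.

```r
# Simulated annual series (hypothetical data, for illustration only).
set.seed(7)
n <- 113
x <- seq_len(n)
temp <- 9 + 0.01 * x + rnorm(n, sd = 0.7)

fit <- lm(temp ~ x)
res <- residuals(fit)

# Formal check of normality of the residuals.
sw <- shapiro.test(res)
sw$p.value

# Formal check of independence: Ljung-Box test for autocorrelation
# in the residuals up to lag 10.
lb <- Box.test(res, lag = 10, type = "Ljung-Box")
lb$p.value

# Sample autocorrelations, printed rather than plotted.
acf_res <- acf(res, lag.max = 10, plot = FALSE)
acf_res
```

Informal counterparts are a normal Q-Q plot via `qqnorm(res); qqline(res)` and the plotted autocorrelation function via `acf(res)`. Since residuals are estimated rather than observed errors, such checks are only approximate.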
Extension to monthly data
So far you could use just the annual data for the analysis. A final issue that you could
consider is how using monthly data changes the picture. A straightforward extension is just
to apply the same techniques to the smoothed monthly data. You can take some of the most
interesting aspects of your previous analysis and repeat them using those data. If you do
so, carefully discuss how much benefit the smoothed monthly data add
relative to the annual data, especially in light of the assumptions made.
Another option would be to use the monthly data in a different way. For example, you
could think of alternative ways to remove the seasonal effect. Alternatively, you could use the
information in the monthly data differently; rather than looking at overall trends you could
consider just summer or winter, or maybe look at the variation in temperatures within a year.
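The alternatives mentioned here could be set up along the following lines. This is a sketch on simulated data: the data frame `monthly`, its column names, and the definition of summer as June to August are assumptions made for the example.

```r
# Simulated monthly series 1907-2019 (hypothetical data).
set.seed(8)
year <- rep(1907:2019, each = 12)
month <- rep(1:12, times = 113)
temp <- 10 + 8 * sin(2 * pi * (month - 4) / 12) +
  0.01 * (year - 1907) + rnorm(length(year))
monthly <- data.frame(year, month, temp)

# Alternative way to remove the seasonal effect:
# subtract the overall mean of each calendar month.
monthly$anomaly <- monthly$temp - ave(monthly$temp, monthly$month)

# Season-specific annual series: mean of June, July and August per year.
summer <- aggregate(temp ~ year,
                    data = monthly[monthly$month %in% 6:8, ],
                    FUN = mean)

# Variation within a year: standard deviation of the 12 monthly values.
within_sd <- aggregate(temp ~ year, data = monthly, FUN = sd)

head(summer)
head(within_sd)
```

Each of the resulting annual series (`summer$temp`, `within_sd$temp`) can be analysed with the same trend regression as the annual means.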
Turn your analysis into a paper and conclude
Having performed all your analyses, you need to draw meaningful conclusions from them. Make
sure your paper is coherent – writing a paper is more than just ticking boxes and doing
exercises. In an academic paper you tell a story that has a logical flow from start to end.
Formulate research questions in the introduction, address these in the analyses, and provide
answers to the questions in the conclusion.
An important aspect of writing the story is to translate your statistical findings into
societally meaningful conclusions. What is the meaning of your findings? Are there limitations
to the methods used, or assumptions they require, that might affect how you draw conclusions?
Try to address these issues in your paper.