STAT 153 Project
Due: 5 pm, November 23, 2021
The main goal of this class is to be able to analyze and forecast a time series. Thus, your project is
to analyze and forecast a time series of your choice! Broadly speaking, your task is to
1. Form groups of 3-5 students to work on the project.
2. Find a dataset in an interesting field to analyze.
3. Complete a detailed analysis of the dataset.
   - Exploratory data analysis.
   - Pursue stationarity in two different ways (two signal models).
   - Model the plausibly stationary noise from the two signal models, in two different ways each, resulting in four total models.
   - Select the best model appropriately (and discuss how you define “best”).
   - Using this best model, forecast the next 10 time points.
4. Write up your results in a report (5-10 pages). Include the names and SIDs of all group members. Each
group will submit a single PDF file.
5. This document is not intended to give you step-by-step instructions on how to do the analysis. It
only helps you with LaTeX and shows you some general aspects you should take care of in your
report.
Summary:
In the beginning of your report, write a short summary. You should summarize in a few
sentences which data set you analyzed, which model you used to make predictions, and
briefly describe the public impact of your analysis and prediction.
1 Introduction
Here you can discuss the background of the question you will explore and describe your data sources.
2 Exploratory Data Analysis
Here you will explore the data. Naturally, the first plot you should make is the data itself. You should
point out any visible features, e.g. heteroscedasticity, seasonality, trend.
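Alongside the plot, a quick numerical check can support what you see. The sketch below is only an illustration: the helper `block_stats` and the toy series are hypothetical, not part of the assignment. It splits a series into consecutive blocks and compares their means and variances. Block means that drift suggest a trend, and block variances that grow suggest heteroscedasticity.

```python
from statistics import mean, pvariance

def block_stats(series, window):
    """Means and variances over consecutive, non-overlapping blocks."""
    means, variances = [], []
    for start in range(0, len(series) - window + 1, window):
        block = series[start:start + window]
        means.append(mean(block))
        variances.append(pvariance(block))
    return means, variances

# Hypothetical toy series: an upward trend whose fluctuations grow with time.
toy = [t * (1.1 if t % 2 else 0.9) for t in range(1, 41)]

block_means, block_vars = block_stats(toy, window=10)
print("block means:    ", [round(m, 2) for m in block_means])
print("block variances:", [round(v, 2) for v in block_vars])
```

For your own data, choose the block length to match a natural unit of the series (for example, one year of monthly observations).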
3 Models Considered
Here I would put a brief introductory statement. First, include details on your pursuit of stationarity:
removing trends and seasonalities, stabilizing the variance, and any other operation that makes sense
for your dataset. You should do this in two different ways (e.g., model A and model B). Then, second,
describe your ARMA model selections: provide convincing justification why a particular ARMA (or
SARIMA, etc.) is suitable. You should do this in two different ways for each stationary series, resulting
in four different models (e.g. models A1, A2, B1, B2). For example, the models could be:
polynomial trend + ARMA(1,1)
polynomial trend + SARIMA(p = 0, q = 0, P = 1, Q = 2, S = 12)
ARIMA(p = 1, d = 2, q = 3) (second differences with ARMA(1,3))
SARIMA(p = 0, d = 2, q = 0, P = 1, S = 12).
How to begin this section is up to you, but I suggest individual model details be broken up into subsections
below.
3.1 Signal Model 1 (but you won’t call it “signal model 1”, perhaps “Parametric model: quadratic polynomial of time with monthly indicators”)
Here I describe the first method of pursuing stationarity. It’d probably be good to show the fitted values
and the stationary-looking residuals. I would also describe why I made these modeling decisions, but it
should NOT be a travel log of all the different models I tried before landing on this one.
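As a rough illustration of a parametric signal model, the sketch below fits a linear trend by ordinary least squares and then estimates a seasonal effect as the mean of the detrended values at each position in the cycle. This is a simplified two-step stand-in, not the joint regression on a quadratic polynomial and monthly indicators mentioned in the heading. All function names and the toy data are hypothetical.

```python
from statistics import mean

def fit_linear_trend(y):
    """Least-squares intercept a and slope b for y_t = a + b*t, t = 0..n-1."""
    n = len(y)
    t_bar = (n - 1) / 2
    y_bar = mean(y)
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y))
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    b = sxy / sxx
    return y_bar - b * t_bar, b

def seasonal_means(x, period):
    """Average of x at each position within the seasonal cycle."""
    return [mean(x[k::period]) for k in range(period)]

def signal_and_residuals(y, period):
    """Fitted signal (trend plus seasonal means) and residuals y - signal."""
    a, b = fit_linear_trend(y)
    detrended = [v - (a + b * t) for t, v in enumerate(y)]
    season = seasonal_means(detrended, period)
    fitted = [a + b * t + season[t % period] for t in range(len(y))]
    residuals = [v - f for v, f in zip(y, fitted)]
    return fitted, residuals

# Hypothetical toy data: linear trend plus a period-4 seasonal pattern.
pattern = [1.0, -1.0, 0.5, -0.5]
toy = [2 + 0.5 * t + pattern[t % 4] for t in range(24)]

fitted, residuals = signal_and_residuals(toy, period=4)
print("largest absolute residual:", round(max(abs(r) for r in residuals), 4))
```

The residuals are the series you would go on to model with ARMA. In the report, plot both the fitted values over the data and the residuals.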
3.1.1 Signal Model 1 + ARMA version 1
Describe what the ARMA choice is and why it was made. ACF and PACF plots should show up
somewhere, and perhaps the full SARIMA diagnostics if that makes sense.
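For reference, the sample autocorrelation function behind an ACF plot can be computed directly. The sketch below is a minimal version with hypothetical function names. It uses the usual estimator that divides by n at every lag, and the approximate 95% band of ±1.96/√n that applies to white noise.

```python
from math import sqrt
from statistics import mean

def sample_acf(x, max_lag):
    """Sample autocorrelations r(0), ..., r(max_lag) of the series x."""
    n = len(x)
    x_bar = mean(x)
    dev = [v - x_bar for v in x]
    c0 = sum(d * d for d in dev) / n
    acf = []
    for h in range(max_lag + 1):
        # Sample autocovariance at lag h, dividing by n (not n - h).
        ch = sum(dev[t] * dev[t + h] for t in range(n - h)) / n
        acf.append(ch / c0)
    return acf

def white_noise_band(n):
    """Half-width of the approximate 95% band for white-noise autocorrelations."""
    return 1.96 / sqrt(n)

# Hypothetical toy residuals: a strongly alternating series.
toy = [1, -1] * 50
acf = sample_acf(toy, max_lag=3)
band = white_noise_band(len(toy))
for h, r in enumerate(acf):
    flag = "outside band" if h > 0 and abs(r) > band else ""
    print(f"lag {h}: {r:+.3f} {flag}")
```

Lags whose autocorrelations fall well outside the band are the ones your ARMA choice should account for.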
3.1.2 Signal Model 1 + ARMA version 2
Describe your second ARMA choice. No need to show the ACF of the residuals again; simply cite the
figure/plot from the previous sub-sub-section.
3.2 Signal Model 2
Describe the second method of pursuing stationarity. See the instructions in the previous subsection.
3.2.1 Signal Model 2 + ARMA version 1 (but you won’t call it that, perhaps “Second-order differencing with ARMA(2,3)”)
Same as above. But note, in the coming weeks we’ll see that “Second-order differencing with ARMA(2,3)”
can be called ARIMA(2,2,3).
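Differencing itself is simple to write down. The sketch below uses a hypothetical helper `difference` to apply lag-`lag` differencing `order` times. Second-order differencing removes a quadratic trend, and lag-S differencing removes an exactly repeating pattern of period S.

```python
def difference(x, lag=1, order=1):
    """Apply the lag-`lag` difference x_t - x_{t-lag} to x, `order` times."""
    out = list(x)
    for _ in range(order):
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out

# Hypothetical toy series with a quadratic trend: second differences are constant.
quadratic = [t * t for t in range(10)]
second_diff = difference(quadratic, lag=1, order=2)
print(second_diff)

# Hypothetical toy series with period 4: lag-4 differencing removes the pattern.
seasonal = [3, 1, 4, 1] * 5
seasonal_diff = difference(seasonal, lag=4)
print(seasonal_diff)
```

Each pass shortens the series by `lag` points, so the differenced series has fewer observations than the original.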
3.2.2 Signal Model 2 + ARMA version 2
Insert needed text and figures here.
4 Model Comparison and Selection
Compare your models as appropriate. This may involve AIC, BIC, etc., but must include time series
cross-validation. State what your criterion is for the best model, and state which of these models is
best. You’ll likely need a table, perhaps more complicated than this one. For example, you may want to
compare different models by how well they predict future values (as your predictions will also be evaluated
in this final project).
Model Name                      MSE
Linear trend and ARMA(4,1)      68.05
Linear trend and ARMA(1,2)      99.07
Second differences and MA(2)    29.11
SARIMA(1,2,3,4,5,6)             31.92
Table 1: These are the out-of-sample MSEs for our models of interest. There are many ways to do cross-validation
and there are many diagnostics you can look at. MSE is one example.
At the end of this section you should decide on one single model you want to work with for your
analysis.
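One common form of time series cross-validation is a rolling-origin scheme: fit on the first part of the series, forecast the next block, move the origin forward, and repeat. The sketch below is a minimal version under stated assumptions. The function names are hypothetical, and the two forecasters are simple stand-ins for your four candidate models, which you would plug in with the same `(train, horizon)` interface.

```python
def rolling_origin_mse(series, forecaster, initial, horizon):
    """Out-of-sample MSE, averaged over successive forecast origins.

    forecaster(train, horizon) must return `horizon` predictions made
    using only the training values that precede the forecast origin.
    """
    sq_errors = []
    origin = initial
    while origin + horizon <= len(series):
        train = series[:origin]
        actual = series[origin:origin + horizon]
        preds = forecaster(train, horizon)
        sq_errors.extend((p - a) ** 2 for p, a in zip(preds, actual))
        origin += horizon  # move the origin forward by one block
    if not sq_errors:
        raise ValueError("series too short for the given initial and horizon")
    return sum(sq_errors) / len(sq_errors)

def naive_forecast(train, horizon):
    """Stand-in model: repeat the last observed value."""
    return [train[-1]] * horizon

def drift_forecast(train, horizon):
    """Stand-in model: extend the line through the first and last points."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * h for h in range(1, horizon + 1)]

# Hypothetical toy series: an exact linear trend.
toy = [2 * t + 1 for t in range(40)]
mse_naive = rolling_origin_mse(toy, naive_forecast, initial=20, horizon=10)
mse_drift = rolling_origin_mse(toy, drift_forecast, initial=20, horizon=10)
print("naive:", mse_naive, "drift:", mse_drift)
```

The key design point is that each forecast uses only data before its origin, so the errors are genuinely out-of-sample. A horizon of 10 matches the forecasting task in this project.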
5 Results
For your chosen model, do three things. First, write out the model mathematically. Second, estimate
the parameters of your chosen model, probably in a table. Third, forecast appropriately and include a
plot of your forecasted values appended to the end of your time series. Here, you will discuss the results
of fitting the model chosen in Section 4. For example, the mathematical model could be defined as in equation (1).
$$(1 - \phi_1 B - \phi_2 B^2)\,\nabla^2 X_t = (1 - \theta_1 B)\,W_t \tag{1}$$
5.1 Estimation of model parameters
You could use a table to show the parameter estimates.
Parameter    Estimate (s.e.)
φ1           -0.293 (0.06)
φ2            0.068 (0.05)
θ1           -0.990 (0.07)
Table 2: These are our parameter estimates and corresponding standard errors for the ARIMA model
in equation (1).
5.2 Prediction
Use your best model to forecast the next 10 points. Also, show them as new values on the time series
plot of the actual data. It’s helpful here to address uncertainty about your estimates too.
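To illustrate how forecast uncertainty grows with the horizon, the sketch below works through the simplest case, a stationary AR(1) model with mean μ. The h-step forecast is μ + φ^h (x_n − μ) and its error variance is σ² (1 + φ² + ... + φ^{2(h−1)}). This is only an illustration under stated assumptions. The function name and parameter values are hypothetical, the parameters are treated as known (so estimation error is ignored), and the intervals assume Gaussian noise. Your chosen model will generally need the corresponding ARIMA or SARIMA forecasts from your software.

```python
from math import sqrt

def ar1_forecast(last_value, mu, phi, sigma2, horizon, z=1.96):
    """Point forecasts and approximate 95% intervals for a mean-mu AR(1).

    Returns a list of (point, lower, upper) tuples for h = 1, ..., horizon.
    """
    out = []
    var = 0.0
    for h in range(1, horizon + 1):
        point = mu + phi ** h * (last_value - mu)
        var += sigma2 * phi ** (2 * (h - 1))  # adds the phi^(2(h-1)) term
        half_width = z * sqrt(var)
        out.append((point, point - half_width, point + half_width))
    return out

# Hypothetical parameter values, chosen only for illustration.
forecasts = ar1_forecast(last_value=8.0, mu=0.0, phi=0.5, sigma2=1.0, horizon=10)
for h, (point, lower, upper) in enumerate(forecasts, start=1):
    print(f"h={h:2d}: {point:7.4f}  [{lower:7.4f}, {upper:7.4f}]")
```

The point forecasts decay toward the mean while the intervals widen toward a limiting width. Showing such intervals on the forecast plot is one way to address uncertainty.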
6 Conclusion
Coming back to your scientific problem, what did you learn from this time series analysis?