ABERDEEN 2040
BU5565 Empirical Methods in Finance Research
Workshop 1 (week 28)
Dr Filipa Da Silva Fernandes
Academic Year: 2021-22

1. Define the concept of non-stationarity. Discuss the different types of non-stationary data.

A non-stationary time series does not have a constant mean and variance over time. In the lectures we distinguish between a random walk and a random walk with drift (a constant term). Non-stationary data generally need to be transformed (for example by differencing or detrending) to obtain stationarity.

There are different types of stationarity:

Strict stationarity: the joint distribution of the process, and therefore its moments of any order (e.g. expected values, variances), never depends on time.
First-order stationarity: the series has a mean that never changes with time. Any other moment, such as the variance, can change.
Trend-stationary models: the series fluctuates around a deterministic trend (the series mean). The deterministic trend can be linear or quadratic, but the amplitude (height of one oscillation) of the fluctuations neither increases nor decreases across the series.
Difference-stationary models: the series needs one or more differences to become stationary. Financial data such as stock market data are a typical example.

2. In describing time series, we can use words such as “trend” and “seasonal”. Describe these concepts.

Trend: a long-term increase or decrease in the data. It does not have to be linear. Some authors also describe a trend as “changing direction” when it goes from an increasing trend to a decreasing trend.
Seasonal pattern: occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week.

3. Define differenced variable and its purpose.

The dth differencing operator applied to a time series x creates a new series z whose value at time t is the difference between x(t + d) and x(t), i.e. z(t) = x(t + d) - x(t). This method works very well in removing trends and cycles. For example, first differencing applied to a series with a linear trend eliminates the trend, while if cycles of length d exist in a series, a dth difference will remove them.

4. Define the concept of Autoregressive Process (AR). If I decide to model profits as an AR(4) process, how would I write the underlying function?

An AR process is one in which a value from a time series is regressed on previous values from that same time series:

$y_t = \beta_0 + \beta_1 y_{t-1} + \varepsilon_t$   (1)

The order of an autoregression is the number of immediately preceding values in the series that are used to predict the value at the present time. In this case, the above model is an AR with what order? Answer: a first-order autoregression, written as AR(1).

So, with the same logic, a profit model with an AR(4) process would be:

$\text{profit}_t = \beta_0 + \beta_1 \text{profit}_{t-1} + \beta_2 \text{profit}_{t-2} + \beta_3 \text{profit}_{t-3} + \beta_4 \text{profit}_{t-4} + \varepsilon_t$   (2)

5. Discuss how you might use the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF) to tell the type of underlying process.

By looking at the ACF and the PACF plots of a differenced series we can try to identify the AR and/or MA terms that are needed. The main issue is that determining the order of an AR/MA process from its autocorrelation function alone is difficult. To resolve this issue, the Partial Autocorrelation Function (PACF) is introduced.

The ACF plot: a bar chart of the coefficients of correlation between a time series and lags of itself.
The PACF plot: a bar chart of the partial correlation coefficients between the series and lags of itself (the correlation at a given lag after removing the effect of the shorter lags).

6. A researcher plots the Autocorrelation Function and Partial Autocorrelation Function of different univariate time series.

6.1) Define the below process.

ACF: Coefficients decrease with the lag: AR process.
PACF: Lag 1 is the only statistically significant coefficient.
Hence we have an AR(1) process.

6.2) Define the below process.

ACF: A single coefficient different from zero is observed.
PACF: A geometric decay is observed.
Hence we have an MA(1) process.

7. Describe the steps behind the construction of an ARMA model using the Box-Jenkins Methodology.

Box and Jenkins (1976) were the first to consider ARMA (and thereafter ARIMA) modelling in this logical and coherent fashion. Their methodology consists of 3 steps:

Identification: determine the specific order of the model using graphical procedures (i.e. the ACF and PACF).
Estimation: estimate the parameters of the model whose order was chosen in the first step. This can be done using OLS or maximum likelihood, depending on the model being estimated.
Diagnostic checking: check the fitted model for inadequacies by testing the residual series for autocorrelation and by overfitting the model (fitting a larger model and checking that the extra terms are not significant).

8. Define the following univariate time-series models:

8.1) $y_t = \theta_1 \varepsilon_{t-1} + \varepsilon_t$: MA(1)
8.2) $y_t = \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \varepsilon_t$: MA(2)
8.3) $y_t = \beta_1 y_{t-1} + \varepsilon_t$: AR(1)
8.4) $y_t = \beta_1 y_{t-1} + \theta_1 \varepsilon_{t-1} + \varepsilon_t$: ARMA(1,1)

9. Open the E-views file “GNPDEF.WF1”. The file contains quarterly observations on U.S. GNP and the implicit price deflator for GNP for 1947 through 2019.

9.1) Plot the graph of the U.S. GNP series. What can you conclude?

The series does not seem to have a constant mean and variance, so it appears that we are in the presence of a non-stationary series.

9.2) In E-views run the Augmented Dickey-Fuller Test (ADF) for the above series. What can you conclude?

Note: ADF definition
The ADF test is fundamentally a statistical significance test. Hence hypothesis testing is involved, with a null and an alternative hypothesis; a test statistic is computed and p-values are reported.
From the test statistic and the p-value, you can infer whether a given series is stationary or not.

So, how exactly does the ADF test work? The ADF test belongs to a category of tests called ‘Unit Root Tests’, which are the proper method for testing the stationarity of a time series.

So what does a ‘Unit Root’ mean? A unit root is a characteristic of a time series that makes it non-stationary. Technically speaking, a unit root is said to exist in a time series if the value of alpha = 1 in the equation below:

$Y_t = \alpha Y_{t-1} + \beta X_e + \varepsilon$

where Y_t is the value of the time series at time ‘t’ and X_e is an exogenous variable (a separate explanatory variable, which is also a time series).

What does this mean to us? The presence of a unit root means the time series is non-stationary. Besides, the number of unit roots contained in the series corresponds to the number of differencing operations required to make the series stationary.

Hypotheses:
H0: A unit root is present in the model (it implies that the series is non-stationary).
Ha: The series is stationary.

ADF p-value = 0.9990 (equivalently, the t-stat of 1.40 falls in the non-rejection area, because it lies above every critical value). We cannot reject the null hypothesis, suggesting the presence of a unit root, i.e. non-stationarity.

The second part of the output: the t-value of the lagged GNP series is 1.400662 with a p-value of 0.1624. This suggests that the coefficient is not different from zero, thus supporting the hypothesis that the U.S. GNP indicator is a random walk, or in other words, that the series is non-stationary.
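The left-tailed decision rule behind this answer can be sketched in Python. This is only an illustration and not part of the E-views workflow: the function name `adf_decision` is hypothetical, the critical values are copied from the E-views ADF output for this series (constant, no trend), and the second test statistic is the one reported later in this workshop for the first difference of the log series.

```python
# Critical values copied from the E-views ADF output (exogenous: constant).
CRITICAL_VALUES = {"1%": -3.452911, "5%": -2.871367, "10%": -2.572078}


def adf_decision(t_stat, critical_values=CRITICAL_VALUES):
    """For each significance level, report whether the unit-root null is rejected.

    The test is left-tailed: H0 (unit root) is rejected only when the
    test statistic is MORE NEGATIVE than the critical value.
    """
    return {level: t_stat < cv for level, cv in critical_values.items()}


# GNPDEF in levels: the statistic is positive, so H0 is never rejected.
print("levels:", adf_decision(1.400662))

# First difference of log GNP: the statistic is below every critical value.
print("first differences:", adf_decision(-4.128201))
```

A positive or mildly negative statistic therefore never counts as evidence against a unit root, however large it is in absolute value.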
Null Hypothesis: GNPDEF has a unit root
Exogenous: Constant
Lag Length: 3 (Automatic - based on SIC, maxlag=15)

                                              t-Statistic    Prob.*
Augmented Dickey-Fuller test statistic           1.400662    0.9990
Test critical values:    1% level               -3.452911
                         5% level               -2.871367
                         10% level              -2.572078

*MacKinnon (1996) one-sided p-values.

Augmented Dickey-Fuller Test Equation
Dependent Variable: D(GNPDEF)
Method: Least Squares
Date: 01/11/21   Time: 13:14
Sample (adjusted): 1948Q1 2019Q4
Included observations: 288 after adjustments

Variable              Coefficient   Std. Error   t-Statistic    Prob.
GNPDEF(-1)               0.000395     0.000282      1.400662   0.1624
D(GNPDEF(-1))            0.508514     0.058486      8.694607   0.0000
D(GNPDEF(-2))            0.173378     0.065249      2.657187   0.0083
D(GNPDEF(-3))            0.181805     0.059114      3.075523   0.0023
C                        0.027744     0.016600      1.671256   0.0958

R-squared                0.698956    Mean dependent var         0.348257
Adjusted R-squared       0.694701    S.D. dependent var         0.247114
S.E. of regression       0.136540    Akaike info criterion     -1.127191
Sum squared resid        5.276012    Schwarz criterion         -1.063598
Log likelihood           167.3154    Hannan-Quinn criter.      -1.101706
F-statistic              164.2652    Durbin-Watson stat         2.033520
Prob(F-statistic)        0.000000

9.3) In E-views, generate a new variable log_GNP (In E-views: Menu Quick, Generate series, log_GNP=log(GNPDEF)).

Note: A simple but often effective way to stabilize the variance across time is to apply a power transformation (square root, cube root, log) to the time series. This is a sensible first step before using more sophisticated methods.

9.4) Plot the graph of the log_GNP series. What can you conclude?

Once again, the series seems to have an upward trend, and therefore no constant mean and variance, suggesting that it is non-stationary.

[Figure: LOG_GNP plotted over time]

9.5) In E-views run the Augmented Dickey-Fuller Test (ADF) for the above series. What can you conclude?

Once again the series is non-stationary: the ADF statistic of -0.570340 (p-value 0.8735) lies above all the critical values, so the unit root null cannot be rejected.
Null Hypothesis: LOG_GNP has a unit root
Exogenous: Constant
Lag Length: 3 (Automatic - based on SIC, maxlag=15)

                                              t-Statistic    Prob.*
Augmented Dickey-Fuller test statistic          -0.570340    0.8735
Test critical values:    1% level               -3.452911
                         5% level               -2.871367
                         10% level              -2.572078

*MacKinnon (1996) one-sided p-values.

Augmented Dickey-Fuller Test Equation
Dependent Variable: D(LOG_GNP)
Method: Least Squares
Date: 01/11/21   Time: 13:48
Sample (adjusted): 1948Q1 2019Q4
Included observations: 288 after adjustments

Variable              Coefficient   Std. Error   t-Statistic    Prob.
LOG_GNP(-1)             -0.000171     0.000300     -0.570340   0.5689
D(LOG_GNP(-1))           0.560739     0.058242      9.627821   0.0000
D(LOG_GNP(-2))           0.144034     0.066575      2.163483   0.0313
D(LOG_GNP(-3))           0.130570     0.058078      2.248192   0.0253
C                        0.001812     0.001220      1.484842   0.1387

R-squared                0.630569    Mean dependent var         0.007611
Adjusted R-squared       0.625348    S.D. dependent var         0.006103
S.E. of regression       0.003736    Akaike info criterion     -8.324485
Sum squared resid        0.003950    Schwarz criterion         -8.260892
Log likelihood           1203.726    Hannan-Quinn criter.      -8.299000
F-statistic              120.7609    Durbin-Watson stat         1.898292
Prob(F-statistic)        0.000000

9.6) Generate the log_gnp in first differences.

In E-views, select the Menu Quick, Generate series, and enter the first difference of log_GNP (for example, dlog_gnp = d(log_gnp)).

9.7) Plot the graph of the first differences of the log_GNP series. What can you conclude?

The series seems to have a constant mean and variance (i.e. it appears to cross its mean value frequently). As such, it seems that we are in the presence of a stationary series.

[Figure: DLOG_GNP plotted over time]

9.8) In E-views, run the ADF test in first differences. Use the log_GNP series. What can you conclude?

These results suggest that we can reject the unit root hypothesis in the first differences of the log GNP series. The estimated tau-statistic (-4.128201) is more negative than even the 1% critical value (-3.452911).
Null Hypothesis: D(LOG_GNP) has a unit root
Exogenous: Constant
Lag Length: 2 (Automatic - based on SIC, maxlag=15)

                                              t-Statistic    Prob.*
Augmented Dickey-Fuller test statistic          -4.128201    0.0010
Test critical values:    1% level               -3.452911
                         5% level               -2.871367
                         10% level              -2.572078

*MacKinnon (1996) one-sided p-values.

Augmented Dickey-Fuller Test Equation
Dependent Variable: D(LOG_GNP,2)
Method: Least Squares
Date: 01/11/21   Time: 13:58
Sample (adjusted): 1948Q1 2019Q4
Included observations: 288 after adjustments

Variable              Coefficient   Std. Error   t-Statistic    Prob.
D(LOG_GNP(-1))          -0.161100     0.039024     -4.128201   0.0000
D(LOG_GNP(-1),2)        -0.276681     0.060925     -4.541309   0.0000
D(LOG_GNP(-2),2)        -0.131830     0.057967     -2.274234   0.0237
C                        0.001149     0.000373      3.081540   0.0023

R-squared                0.183299    Mean dependent var        -7.18E-05
Adjusted R-squared       0.174671    S.D. dependent var         0.004107
S.E. of regression       0.003731    Akaike info criterion     -8.330280
Sum squared resid        0.003954    Schwarz criterion         -8.279406
Log likelihood           1203.560    Hannan-Quinn criter.      -8.309893
F-statistic              21.24677    Durbin-Watson stat         1.899194
Prob(F-statistic)        0.000000

9.9) In E-views, run the Phillips-Perron test (In E-views: Menu View-Unit Root test, select the Phillips-Perron test). What can you conclude?

We reject the null that the series has a unit root. As such, the Phillips-Perron test confirms that the DLOG_GNP series is stationary.

Null Hypothesis: DLOG_GNP has a unit root
Exogenous: Constant
Bandwidth: 6 (Newey-West automatic) using Bartlett kernel

                                              Adj. t-Stat    Prob.*
Phillips-Perron test statistic                  -5.907523    0.0000
Test critical values:    1% level               -3.452753
                         5% level               -2.871298
                         10% level              -2.572041

*MacKinnon (1996) one-sided p-values.

Residual variance (no correction)                1.50E-05
HAC corrected variance (Bartlett kernel)         1.43E-05

Phillips-Perron Test Equation
Dependent Variable: D(DLOG_GNP)
Method: Least Squares
Date: 01/11/21   Time: 14:20
Sample (adjusted): 1947Q3 2019Q4
Included observations: 290 after adjustments

Variable              Coefficient   Std. Error   t-Statistic    Prob.
DLOG_GNP(-1)            -0.222604     0.036953     -6.023950   0.0000
C                        0.001686     0.000366      4.607128   0.0000

R-squared                0.111900    Mean dependent var        -3.62E-05
Adjusted R-squared       0.108817    S.D. dependent var         0.004121
S.E. of regression       0.003890    Akaike info criterion     -8.253940
Sum squared resid        0.004358    Schwarz criterion         -8.228630
Log likelihood           1198.821    Hannan-Quinn criter.      -8.243799
F-statistic              36.28798    Durbin-Watson stat         2.348816
Prob(F-statistic)        0.000000
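
The sequence followed in question 9 (test the level, take logs, take first differences, test again) can be reproduced as a rough sketch in Python. Everything here is an assumption made for illustration: the data are simulated as a random walk with drift rather than read from GNPDEF.WF1, the drift and volatility values are arbitrary, and the helper `dickey_fuller_t` is hypothetical. It runs the simple Dickey-Fuller regression with a constant and no lagged differences, not the augmented version with SIC lag selection that E-views reports. The 5% critical value is the one printed in the ADF outputs above.

```python
import math
import random


def dickey_fuller_t(series):
    """t-statistic on gamma in: diff(y)_t = c + gamma * y_(t-1) + e_t.

    Simplified Dickey-Fuller regression (constant, no lagged differences),
    estimated by ordinary least squares.
    """
    y_lag = series[:-1]
    dy = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(dy)
    mean_x = sum(y_lag) / n
    mean_y = sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in y_lag)
    sxy = sum((x - mean_x) * (d - mean_y) for x, d in zip(y_lag, dy))
    gamma = sxy / sxx
    const = mean_y - gamma * mean_x
    rss = sum((d - const - gamma * x) ** 2 for x, d in zip(y_lag, dy))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)
    return gamma / se_gamma


# Simulated stand-in for the data: the log of the series is a random walk
# with drift (illustrative parameter values, not estimated from GNPDEF.WF1).
random.seed(0)
log_level = [3.0]
for _ in range(291):
    log_level.append(log_level[-1] + 0.008 + random.gauss(0.0, 0.006))

level = [math.exp(v) for v in log_level]                       # analogue of GNPDEF
dlog = [b - a for a, b in zip(log_level[:-1], log_level[1:])]  # analogue of DLOG_GNP

CRIT_5PCT = -2.871367  # 5% critical value from the E-views ADF outputs

for name, data in [("level", level), ("log", log_level), ("dlog", dlog)]:
    t_stat = dickey_fuller_t(data)
    verdict = "reject unit root" if t_stat < CRIT_5PCT else "cannot reject unit root"
    print(f"{name:>5}: t = {t_stat:8.3f} -> {verdict}")
```

Because the simulated log series contains exactly one unit root, a single difference is enough to make it stationary, which mirrors the pattern found for log_GNP in 9.5 and 9.8.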