BUSINESS SCHOOL
QBUS6830 Financial Time Series and Forecasting
Practice questions for final exam

Q1 (a) If we can observe repeated finite values of a real data series, like a price series, explain why and how it could possibly have an infinite mean, infinite variance and infinite 4th moments.

It is not just possible, but quite usual, for a variable to have infinite variance but still yield finite observed values. A variance is a mathematical formula that involves a sum over an infinite number of values over the real line, and effectively averages (X − μ)². As an example, a Student-t distribution with 2 degrees of freedom has infinite variance, yet will only generate finite values from its distribution. Variance being infinite simply means that the tails of the distribution are fat, thick or long enough that the weighted average of (X − μ)², weighted by the probability density function over the whole real line, is infinite. It implies we should expect to see outliers in data from this distribution. Another example is an asset price series. The price can never be infinite, but as the series progresses over time it does not have to have, and indeed usually does not have, a long-run mean or any mean reversion, and if it follows a random walk it has no long-run variance either.

(b) What is the purpose of a factor model?

The purpose of a factor model is to find a small number of underlying components in multiple series of data, so as to learn what might drive their variation. It applies to situations where a number of variables have been observed at the same times, over the same time period. The factor model tries to find a set of linear combinations of these series that can explain most of the variation in those series.

(c) Explain why the CAPM is not usually used for forecasting purposes. Explain one method you could use to forecast with this model.

The CAPM relates asset premiums to market premiums, both at the same time.
To forecast what will happen to the asset premium in the next period (e.g. tomorrow), using the CAPM would require knowing the market premium for that same time, the next period (e.g. tomorrow). Of course this is not possible. The only way to forecast with the CAPM is to make an assumption about what will happen tomorrow in the market. E.g. we can do a stress test by seeing what would happen if the market premium took a certain value, or a list of values.

(d) Why are likelihood methods most favoured, in general, for estimation in volatility models like GARCH, but not favoured when estimating CAPM models (where least squares is favoured)?

GARCH models are time series regressions in the squares of the data: i.e. they are models for squared returns. For LS estimation to have good properties in this case, we need the 4th moments of the squared returns, i.e. the 8th moment of returns, to be finite. Now, returns certainly do seem prone to outliers and extreme observations, and have much fatter tails than a Gaussian, making even a finite 4th moment questionable. Further, since the 8th moment involves the average of (r_t − μ)⁸, and return data has outliers, it seems somewhat unlikely that this 8th moment would be finite for return data. Thus, LS methods are not preferred for GARCH models in general.

When estimating a CAPM model, we only need the 4th moment of returns to be finite (not the 8th moment) for LS estimation to have good properties. In this case, LS estimation is favoured for regression since under the LS assumptions the estimates found have desirable properties: they are unbiased, consistent and relatively efficient among estimators for regression parameters. Further, likelihood methods force us to make an assumption about the conditional distribution of Y given X, while LS estimation does not. Also, LS estimators give the conditional mean of Y given X, which is important in asset pricing models.
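The least-squares estimation discussed above can be sketched numerically. The following simulates a CAPM-style regression of asset excess returns on market excess returns and recovers alpha and beta by LS; all parameter values here are made up purely for illustration.

```python
import random

# Hypothetical illustration: the "true" CAPM parameters below are invented.
random.seed(42)
n = 5000
true_alpha, true_beta = 0.01, 1.2
mkt = [random.gauss(0.03, 1.0) for _ in range(n)]                      # market excess returns
asset = [true_alpha + true_beta * m + random.gauss(0, 0.5) for m in mkt]  # asset excess returns

mx = sum(mkt) / n
my = sum(asset) / n
# LS slope = sample Cov(mkt, asset) / sample Var(mkt); intercept from the means
beta = sum((x - mx) * (y - my) for x, y in zip(mkt, asset)) / \
       sum((x - mx) ** 2 for x in mkt)
alpha = my - beta * mx
print(round(beta, 3), round(alpha, 3))
```

With a few thousand observations the LS estimates land very close to the values used to generate the data, illustrating the consistency property claimed above.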
(e) Explain the leverage effect and how it is potentially captured by the GJR-GARCH model.

The leverage effect is a theory that suggests that as asset prices fall, the volatility of that asset's returns increases. Naturally, as prices fall a firm's equity is decreased. If at the same time the level of debt for the company is unchanged, the drop in equity will increase the debt/equity ratio and thus leverage increases. The leverage effect associates this with a subsequent increase in volatility, due to the increased risk the firm faces.

The GJR-GARCH model

σ²_t = α₀ + α₁ a²_{t−1} + γ₁ I_{t−1} a²_{t−1} + β₁ σ²_{t−1}, where I_{t−1} = 1 if a_{t−1} < 0 and I_{t−1} = 0 if a_{t−1} ≥ 0,

includes a dummy variable that is 1 whenever a return shock is negative. This dummy variable allows the ARCH effect to be different for negative than for positive return shocks. If the coefficient γ₁ is positive, then volatility would be higher following negative return shocks (since a²_{t−1} is also positive) than following positive return shocks. Note that this is not exactly the leverage effect, since here volatility would increase only when the return was below its estimated mean (i.e. when a_t = r_t − μ_t < 0), not necessarily when the price fell. However, the GJR would capture the leverage effect if the mean was set to zero.

(f) Compare the Value at Risk and Expected Shortfall risk measures, listing and discussing at least one advantage and one disadvantage of each, compared to each other.

Value at Risk is the maximum amount that an asset return would realise in a fixed period of time at a given probability level. VaR at 1% is then the 1% percentile or quantile of the return distribution over a fixed period of time. Expected shortfall (ES) is the average amount that an asset return would realise in a fixed period of time if it was more extreme than the return quantile at a given probability level. ES at 1% is then the average return for returns below the 1% percentile or quantile of the return distribution, i.e. below the 1% VaR, over a fixed period of time.
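The two definitions above can be sketched numerically. This is a hypothetical illustration on simulated Gaussian daily returns (the mean and standard deviation are made up): the 1% VaR is the 1% sample quantile, and the 1% ES is the average of the returns below that quantile.

```python
import random

# Made-up return distribution, purely for illustration (percentage returns).
random.seed(0)
returns = [random.gauss(0.05, 1.5) for _ in range(10000)]

alpha = 0.01
srt = sorted(returns)              # ascending: the worst losses come first
k = int(alpha * len(srt))          # number of observations in the 1% tail
var_1pct = srt[k]                  # 1% VaR: (approximately) the 1% return quantile
es_1pct = sum(srt[:k]) / k         # 1% ES: average of returns below the 1% VaR
print(round(var_1pct, 2), round(es_1pct, 2))
```

By construction the ES is always more extreme (more negative) than the VaR at the same probability level, which is the source of its larger estimation uncertainty discussed below.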
An advantage of ES is that it represents the average loss in a specific part of the return distribution, which is perhaps more representative of such losses than the VaR, which represents the minimum loss in that part of the distribution. A disadvantage of ES is that it is often a point very far out in the tails of the return distribution, and is thus estimated with a high level of uncertainty or standard error (since hardly any actual observations are this far out in the tails in real data sets), and it is very sensitive to outliers.

A disadvantage of VaR is that it represents a minimum loss in one part of the return distribution, which is not really representative of the range of typical losses in that part of the distribution. An advantage of VaR is that it is not as extreme as ES and thus can be estimated with more certainty and a smaller standard error.

(g) A GARCH model specifies the one-step-ahead forecast return distribution. Explain why it does not specify the two-step-ahead return distribution and why this distribution is not the same as the one-step-ahead forecast return distribution.

A GARCH model can be written as:

r_t = μ_t + a_t ; a_t = σ_t ε_t ; σ²_t = α₀ + α₁ a²_{t−1} + β₁ σ²_{t−1} ; ε_t ~ D(0,1), where E(ε_t) = 0 and Var(ε_t) = 1.

This setting implies that the one-step-ahead distribution for returns is:

r_{t+1} | Ω_t ~ D(μ_{t+1}, σ²_{t+1}),

where μ_{t+1} and σ_{t+1} are known at time t, so only ε_{t+1} varies here, and it varies according to D. For the 2-step-ahead distribution,

r_{t+2} | Ω_t = μ_{t+2} + σ_{t+2} ε_{t+2}.

Here σ²_{t+2} = α₀ + α₁ a²_{t+1} + β₁ σ²_{t+1}. Since this depends on a²_{t+1}, which is not known at time t, the two-step-ahead standard deviation is a random variable. Thus r_{t+2} | Ω_t = μ_{t+2} + σ_{t+2} ε_{t+2} is a random variable that involves the distribution of the product of the rv σ_{t+2} and the rv ε_{t+2}. We do not, in general, know what the distribution of this new rv is, except that it is not the same as D.
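The argument above can be checked by simulation. The sketch below uses made-up GARCH(1,1) parameters and Gaussian errors: it draws from the 2-step-ahead return distribution (where σ²_{t+2} is random because it depends on the unknown shock a_{t+1}) and shows that the resulting kurtosis exceeds 3, so the 2-step-ahead distribution cannot equal the Gaussian D assumed for ε.

```python
import random

# Hypothetical GARCH(1,1) parameters, invented for this illustration.
random.seed(1)
a0, a1, b1 = 0.05, 0.30, 0.60
sig2_t1 = 1.0                     # sigma^2_{t+1}: known at time t

draws = []
for _ in range(200000):
    eps1 = random.gauss(0, 1)
    a_t1 = (sig2_t1 ** 0.5) * eps1                 # shock at t+1, unknown at t
    sig2_t2 = a0 + a1 * a_t1 ** 2 + b1 * sig2_t1   # random 2-step-ahead variance
    draws.append((sig2_t2 ** 0.5) * random.gauss(0, 1))  # r_{t+2} - mu_{t+2}

m2 = sum(d ** 2 for d in draws) / len(draws)
m4 = sum(d ** 4 for d in draws) / len(draws)
kurt = m4 / m2 ** 2
print(round(kurt, 2))   # clearly above 3: fatter-tailed than the Gaussian D
```

The mixing of a random variance with a Gaussian error produces a fat-tailed mixture distribution, which is exactly why the 2-step-ahead forecast distribution differs from the 1-step-ahead one.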
(h) Why is forecasting important in finance, especially in the context of investment?

Investment involves making a decision and then seeing and realising the subsequent result of that decision. In the simplest case, if we buy an asset we make money if the asset price subsequently increases and lose money if that price subsequently decreases. Thus, our investment decisions are based on what we think will happen after we make our investment decision. Forecasting is important, since we are at least implicitly forecasting what will occur. A little thought will help us realise that all investments require forecasts of what will happen to prices or other financial instruments.

Q2 We consider the daily prices for the asset NAB (National Australia Bank) from January 2003 until June 2012, with 2434 observations.

[Figure: NAB daily prices (top panel) and percentage log returns (bottom panel).]

Percentage log returns for NAB appear in the bottom plot above.

(a) (5 marks, as shown) GARCH models with Gaussian and Student-t errors are fit to this data, using only the first 2000 returns in the sample, with the following results:

GARCH-Gaussian: r_t = 0.005 + a_t ; a_t = σ_t ε_t ; σ²_t = 0.062 + 0.158 a²_{t−1} + 0.831 σ²_{t−1} ; ε_t ~ N(0,1)

GARCH-t: r_t = 0.041 + a_t ; a_t = σ_t ε_t ; σ²_t = 0.033 + 0.141 a²_{t−1} + 0.858 σ²_{t−1} ; ε_t ~ t*₅.₀(0,1)

i. Interpret the estimated GARCH-t model, i.e. explain the three parts of the estimated model.

The unconditional average return is estimated as 0.041%. The conditional distribution is estimated to be a standardised Student-t with 5 degrees of freedom. The volatility equation has an estimated intercept of 0.033, with ARCH effect of 0.141 and GARCH effect of 0.858. The ARCH effect indicates how much the volatility will increase by, if yesterday's squared shock was increased by 1%, holding yesterday's volatility constant.
Thus, if yesterday's squared shock was increased by 1%, the volatility estimate would increase by 0.141.

ii. If yesterday's variance was 0.93 and yesterday's return shock was −1.5%, estimate today's volatility.

The estimate is: σ²_t = 0.033 + 0.141 × (−1.5)² + 0.858 × 0.93 = 1.148, so the volatility estimate is σ_t = √1.148 ≈ 1.07%.

iii. Estimate the volatility persistence and the average variance in NAB returns, for both models.

Volatility persistence is given by the sum of the ARCH and GARCH effects: 0.141 + 0.858 = 0.999 for the GARCH-t model, and 0.158 + 0.831 = 0.989 for the GARCH-Gaussian model. Average variance is given by α₀ / (1 − α₁ − β₁) = 0.033 / (1 − 0.141 − 0.858) = 33 for the GARCH-t model, and 0.062 / (1 − 0.158 − 0.831) ≈ 5.64 for the GARCH-Gaussian model.

(b) Diagnostics are applied to the GARCH with Student-t errors, as follows. The degrees of freedom are estimated as 6.1, and these are used to form the transformed standardised residuals e_t = Φ⁻¹(T₆.₁(ε̂_t)), where ε̂_t are the standardised residuals from the 2nd model above and T₆.₁ is the standardised Student-t CDF with 6.1 degrees of freedom. These residuals e_t are plotted over time below, as is their ACF for the first 25 lags.

[Figure: transformed standardised residuals over time (top) and their sample autocorrelation function for the first 20 lags (bottom).]

i. Ljung-Box tests are applied to the residuals e_t, testing the first 8 and 13 lags. The p-values from these tests are 0.50 and 0.76. Conduct these tests by listing the hypotheses and stating the conclusions. Do the results agree with what you see in the plots?

The null hypothesis is that the first 8 autocorrelations all equal 0; the alternative is that at least one of these is non-zero. I choose a significance level of 5%, or 0.05. The p-value from the Ljung-Box statistic is 0.50. Since 0.50 > 0.05, we cannot reject the null and conclude that the first 8 autocorrelation estimates are not significantly different from 0, as a group. The test is the same when testing 13 lags, except now the null is that the first 13 autocorrelations all equal 0. Since the p-value is 0.76 > 0.05, we cannot reject the null and conclude that the first 13 autocorrelation estimates are not significantly different from 0, as a group.
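The Ljung-Box statistic behind these tests can be sketched in a few lines. This is a minimal illustration for h = 8 lags, applied to simulated white noise rather than the actual residuals; for an even number of degrees of freedom the chi-square survival function has a closed form, so no statistics library is needed.

```python
import math
import random

def ljung_box_pvalue(x, h):
    """Ljung-Box p-value for the first h autocorrelations (h must be even here)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    q = 0.0
    for k in range(1, h + 1):
        # lag-k sample autocorrelation
        rk = sum((x[i] - mean) * (x[i - k] - mean) for i in range(k, n)) / denom
        q += rk * rk / (n - k)
    q *= n * (n + 2)
    # chi-square survival function with even df = h:
    # P(X > q) = exp(-q/2) * sum_{j=0}^{h/2 - 1} (q/2)^j / j!
    s = sum((q / 2) ** j / math.factorial(j) for j in range(h // 2))
    return math.exp(-q / 2) * s

random.seed(3)
noise = [random.gauss(0, 1) for _ in range(2000)]   # simulated white noise
p = ljung_box_pvalue(noise, 8)
print(round(p, 3))
```

For white noise the p-value will typically be large, matching the "cannot reject" conclusion reached for the standardised residuals above.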
This agrees with the ACF plot above, which shows no clearly significant correlations in the first 20 lags (none of the correlations are outside the 95% intervals around 0 given in blue in the plot).

ii. The squares of the residuals e_t are also plotted, as is their ACF, below.

[Figure: squared transformed residuals over time (top) and their sample autocorrelation function for the first 20 lags (bottom).]

iii. Ljung-Box tests are applied to the squares of the residuals e_t, testing the first 8 and 13 lags. The p-values from these tests are 0.002 and 0.018. Conduct these tests by listing the hypotheses and stating the conclusions. Do the results agree with what you see in the plots?

The null hypothesis is that the first 8 autocorrelations all equal 0; the alternative is that at least one of these is non-zero. I choose a significance level of 5%, or 0.05. The p-value from the Ljung-Box statistic is 0.002. Since 0.002 < 0.05, we can reject the null and conclude that at least one of the first 8 autocorrelation estimates is significantly different from 0. The test is the same when testing 13 lags, except now the null is that the first 13 autocorrelations all equal 0. Since the p-value is 0.018 < 0.05, we can reject the null and conclude that at least one of the first 13 autocorrelation estimates is significantly different from 0.

This agrees with the ACF plot above, which shows that the first-lag autocorrelation is significant and that perhaps the 5th is marginally significant too (these two correlations are just outside the 95% intervals around 0 given in blue in the plot).

(c) A histogram of the residuals e_t is shown below, together with a qq-plot.

i. Does the distributional assumption of this model seem appropriate?

We expect to see a Gaussian N(0,1) distribution in the histogram, and the crosses appearing on the straight dashed line in the qq-plot, if the distribution is the correct one.
For a N(0,1), we expect most points to lie within (−3, 3), and in a very large sample a few to be close to −4 and 4. This seems to be the case in the histogram. Indeed, the blue crosses all seem very close to being right on the dashed line, as expected for a N(0,1).

[Figure: histogram of the residuals (top) and QQ plot of the sample data versus the standard normal (bottom).]

ii. The sample skewness and kurtosis are estimated as −0.066 and 3.018 respectively. A Jarque-Bera test is performed with a p-value of 0.459. Conduct this test by listing the hypotheses and stating the conclusion. Does the distributional assumption of this model seem appropriate?

The null hypothesis is that the skewness equals 0 and the kurtosis equals 3; the alternative is that either or both of these are not true. The p-value of 0.459 gives the probability of getting sample skewness and kurtosis as far from 0 and 3 as −0.066 and 3.018 are, if the null were true. Since 0.459 > 0.05, we cannot reject the null and conclude that these residuals could indeed follow a Gaussian N(0,1) distribution.

(d) 1-step-ahead 5% and 1% forecast VaRs are estimated for each day's return in the forecast sample (after day 2000) using these models, plus GJR-GARCH with Gaussian and Student-t errors, plus an IGARCH model with λ = 0.94, and a 100-day historical simulation method. The plot below shows 1-step-ahead 5% VaR forecasts for the last 433 days in the sample.

i. Compare and explain the behaviour of the three GARCH-type model 5% VaR forecasts. Note the differences and similarities and try to explain why they have occurred.

The three GARCH-type models all sit similarly on the "bottom shoulder" of the data in their 5% VaR forecasts. They each get more extreme (negative) following outlying or extreme returns, as their volatility estimates also will, and mostly they are quite close to each other as a group.
However, following highly extreme returns, the GARCH-Gaussian and GARCH-t immediately have more extreme 5% VaR forecasts than the IGARCH, while the IGARCH takes longer to move back towards the data than these two models. So the IGARCH tends to be firstly less extreme for a few days, but then more extreme than these two for long periods (around 50 days) after an extreme return. Why is this? The ARCH effects in the three models are 0.16, 0.14 and 0.06. GARCH-Gaussian and GARCH-t have much higher ARCH effects (0.16 and 0.14) and so react much more strongly to extreme or large returns than the IGARCH. However, the IGARCH is non-stationary and not mean-reverting, while the other two are mean-reverting in volatility. Thus the GARCH-Gaussian and GARCH-t have to revert to a long-run average, while the IGARCH does not, causing the latter to deviate from the data for longer following an extreme return.

[Figure: 1-step-ahead 5% VaR forecasts over the last 433 days: returns data with GARCH-Gaussian, GARCH-t, IGARCH and HS-100 day forecasts.]

ii. Why do the 100-day historical simulation VaRs stay flat for long periods?

Following an extreme return, this return will be in the last 100 days for 100 days! I.e. for the next 100 days, the sample quantile will be dominated by this extreme return, and so stay flat. When the extreme return is 101 days ago, there will be a marked change in the 5% sample VaR estimate (unless another extreme has occurred in that 100-day period).

(e) The table below shows the number of violations and violation rates for each of the models above, from their 5% VaR forecasts:

Model            Violations   Rate    Rate/0.05   Significantly different to 0.05?
GARCH-Gaussian   22           0.051   1.02        No
GARCH-t          25           0.058   1.15        No
GJR-Gaussian     22           0.051   1.02        No
GJR-t            22           0.051   1.02        No
IGARCH           20           0.046   0.92        No
HS-100 day       23           0.053   1.06        No

i. The 95% confidence interval for the violation rate is (0.0295, 0.0705), if the true rate was 0.05, in a sample of size 433. Fill in the last column of the table above.
All sample violation rates are inside the 95% CI, meaning none can be rejected as having a violation rate different from the required 0.05.

ii. Briefly discuss these results, compare the models' performance, and discuss why you believe each model has performed the way it has, regarding accuracy of forecasting 5% VaR in this data.

All models pass the unconditional coverage test and have acceptable violation rates. Even so, three models (GARCH-Gaussian, GJR-Gaussian and GJR-t) are closest to 0.05 on this measure, while the GARCH-t is furthest away with 0.058, 15% too many violations. The IGARCH and HS-100 methods have performed pretty well on this aspect, being about equally far away from 0.05. Note that the IGARCH is the only conservative risk model here, since it is the only one to have fewer violations than expected, at a rate of 0.046.

(f) The table below shows p-values from the independence and Dynamic Quantile (DQ) tests applied to the violations from each model's set of 5% VaR forecasts, as well as the criterion function loss values.

Model            Violations   Rate    Independence   DQ     Loss
GARCH-Gaussian   22           0.051   0.12           0.47   77.47
GARCH-t          25           0.058   0.68           0.50   77.20
GJR-Gaussian     22           0.051   0.12           0.49   77.37
GJR-t            22           0.051   0.12           0.54   76.67
IGARCH           20           0.046   0.94           0.60   75.86
HS-100 day       23           0.053   0.13           0.12   78.21

i. Briefly discuss these results, compare the models' performance, and discuss why you believe each model has performed the way it has, via these criteria.

No models are rejected by the independence test, nor by the DQ test. All seem quite comparable and to have violations that are roughly independent over time and close to the expected 5%. Regarding loss functions, the best model will have the lowest loss value, being the one that is "closest" to the unknown true 5% VaR values. The model that does best on this criterion is the IGARCH, followed closely by the GJR-t, while the worst is the HS-100.

ii.
Which model has performed the best? Why? Which has performed the worst? Why?

The HS-100 has performed the worst by loss function and has the lowest p-value for the DQ test. The GARCH-t has the violation rate furthest from 0.05, and also has the most violations and more than expected (0.058 and 25), which is not good for financial solvency. These two models have performed the worst. The best model seems to be among the GARCH-Gaussian, GJR-Gaussian, GJR-t and IGARCH. The first three did best by violation rate, but the IGARCH is conservative here, which is good for financial solvency (fewer violations than expected). The IGARCH does best by loss function, followed closely by the GJR-t. It seems the IGARCH model may be the best here.

FORMULAS YOU MAY ASSUME AND USE

Some formulas and information to assist you in the test

The assumptions of OLS regression are:
1. The population residuals ε and the X variables are uncorrelated. In other words, E(ε | X) = 0.
2. The data sample are iid.
3. The 4th moments of both Y and each X are finite, i.e. E(X⁴) < ∞ and E(Y⁴) < ∞. This implies that the mean and variance of each of Y and each X are also finite.
4. The X variables, if there is more than 1, are not perfectly correlated with each other, and none is a perfect linear combination of the others.

The assumptions of factor analysis are:
1. The data sample for Y are iid.
2. The estimated factors F are iid, with mean 0 and variance 1.
3. The population residuals ε and the factor variables are uncorrelated. In other words, Cov(ε, F_j) = 0 for each factor F_j.
4. The 4th moments of both Y and each factor F are finite, i.e. E(F⁴) < ∞ and E(Y⁴) < ∞. This implies that the mean and variance of each of Y and each F are also finite.

Omitted variables:
(a) When X occurs in time before Y. A variable Z is an omitted variable from the analysis of the relationship between two variables Y and X, under the following conditions:
1.
The variable Z is not accounted for specifically in the analysis.
2. The variable Z is correlated or associated with X.
3. The variable Z is causal for Y.

(b) If two variables X and Y occur simultaneously, one cannot cause the other. However, there may be an omitted variable if:
1. The variable Z is not accounted for specifically in the analysis.
2. The variable Z is correlated with, associated with, or causal for X and Y.

Value at Risk (VaR): the VaR is the minimum loss that could occur with probability level α over a fixed time period.

Expected Shortfall (ES): the ES is the average loss that could occur, for losses occurring with probability level α over a fixed time period.

GARCH models: the basic structure of a GARCH model has three components:

1. Mean equation: r_t = μ_t + a_t ; a_t = σ_t ε_t

2. Volatility equation, e.g.:
ARCH(p): σ²_t = α₀ + Σ_{i=1}^{p} α_i a²_{t−i}
GARCH(1,1): σ²_t = α₀ + α₁ a²_{t−1} + β₁ σ²_{t−1}
GJR-GARCH: σ²_t = α₀ + α₁ a²_{t−1} + γ₁ I_{t−1} a²_{t−1} + β₁ σ²_{t−1}, where I_{t−1} = 1 if a_{t−1} < 0 and I_{t−1} = 0 if a_{t−1} ≥ 0
RiskMetrics: σ²_t = (1 − λ) a²_{t−1} + λ σ²_{t−1}
EGARCH: log σ²_t = α₀ + α₁ (|ε_{t−1}| − E|ε_{t−1}|) + γ₁ ε_{t−1} + β₁ log σ²_{t−1}

3. Conditional distribution: ε_t ~ D(0,1), where E(ε_t) = 0 and Var(ε_t) = 1. E.g. Gaussian: ε_t ~ N(0,1); or standardised Student-t: ε_t = t_v / √(v/(v−2)) ~ t*_v(0,1).
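The volatility recursions above can be sketched directly. The following is a minimal illustration of the GARCH(1,1) equation, iterated over a short sequence of shocks; all parameter and shock values are made up.

```python
def garch_variance_path(shocks, a0, a1, b1, sig2_0):
    """Iterate the GARCH(1,1) recursion sigma^2_t = a0 + a1*a^2_{t-1} + b1*sigma^2_{t-1}."""
    sig2 = [sig2_0]
    for a in shocks:
        sig2.append(a0 + a1 * a ** 2 + b1 * sig2[-1])
    return sig2

# Hypothetical parameters and shocks, for illustration only.
path = garch_variance_path([0.5, -1.5, 0.2], a0=0.05, a1=0.1, b1=0.85, sig2_0=1.0)
print([round(v, 4) for v in path])
```

The same loop gives the GJR-GARCH or RiskMetrics recursions by swapping in the corresponding right-hand side from the formulas above.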