Coursework: Statistics CIVE50008
Deadline: 21/03/2022, online submission of a single Matlab script

The objective of this coursework is to provide a statistical analysis of measurements of temperature and rainfall. An understanding of both variables is important for hydrological applications (in particular drought management and flood estimation): while rainfall is the direct input to hydrological models, temperature is used to estimate actual evaporation.

The dataset comprises 5 variables in an .xlsx file called “dataset.xlsx”. The day, month and year of the measurements can be found in columns A, B, and C, respectively. Column D contains the temperature measurements (in degrees Celsius) and Column F contains the rainfall measurements (in mm). All these measurements correspond to values recorded at 8pm in geographically close locations.

Use the template file called “Surname_FirstName_CIDnumber.m” to perform the statistical analysis. Rename this template using your own information before submitting. All your lines of Matlab code should be gathered into one Matlab script so that, when it is run, the answers appear in the command window and the required figures are generated. Indicate the question you are answering using a comment line before the code addressing that question. If you need to obtain answers using a Matlab GUI, please put the answers in comment lines. When you are asked to comment on a result, please write your answer in up to five comment lines.

Coursework tasks

1. Import the data and create an array (X) that contains three columns: month, temperature and precipitation on the days when measurements are available. There are days with missing measurements; these are indicated by -99.0 in the data set. If one measurement is missing on a given day, all measurements on that day should be ignored. For the rest of the coursework, the recorded measurements will be assumed to be genuine, i.e. not erroneous.
2. Populate an array Y with the same information as X, but only for the days with non-zero rainfall.
3. Calculate the mean, mode and median of the temperature and rainfall data (for both the full series and the non-zero series).
4. Calculate the variance, standard deviation, coefficient of variation and mean absolute deviation of all variables. Briefly compare and comment on the observed dispersion and the differences between the two series.
5. Plot the Cumulative Distribution Function (CDF) for each variable (full series).
6. Produce a plot that indicates whether the temperature and rainfall data (full series) are skewed. Briefly describe how this is shown in the plot.
7. Explore appropriate distributions to approximate the temperature and rainfall data in the full series using q-q plots.
8. Produce histograms with a variable number of bins for the temperature data (full series) and the non-zero rainfall data. Use 5, 10, 20 and 30 bins and comment on the differences you observe based on the bin size.
9. Consider the non-zero rainfall series. Fit (continuous) distributions to the temperature and rainfall data and investigate which one provides the best fit. Produce the relevant figures and use an appropriate metric to justify your selection of the best-fit distribution. Which is the simplest distribution appropriate to each variable?
10. For the best-fit distributions you identified in (9), use both the Method of Moments and the Maximum Likelihood Method to obtain the parameters of each distribution (one distribution for each variable). Comment on the results.
11. Using the 2 distributions obtained in (10) and their parameters, generate an appropriate number of synthetic data series. Use these synthetic data to explore the validity of the law of large numbers and the central limit theorem.
12. Comment on the correlation of the two variables (temperature and rainfall). Compare your findings for the full series and the non-zero rainfall series.
13. Physical knowledge of the processes suggests that temperature should resemble a Normal distribution and non-zero rainfall should follow a Gamma distribution. Assume that this is indeed the case and that the two variables are independent. Construct a bivariate distribution to describe the joint PDF of temperature and non-zero rainfall. Create a surface plot of the joint PDF using appropriate bounds and parameters.
14. Give a 90% confidence interval for the population mean and a 90% confidence interval for the population standard deviation for the temperature data (full series).
15. Test with 1% significance the hypothesis that the normal distribution is a good fit for the temperatures (full series). Perform the test using two different methods and indicate the main difference between the two approaches.
16. Test with 90% confidence the hypothesis that daily August rainfall depths that are larger than 1 mm are from the same population as daily March rainfall depths that are larger than 1 mm, by doing a test comparing the means. Comment on the test and its results.
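
As an illustration of the missing-data rule in task 1 and the non-zero subset in task 2, the following is a minimal Matlab sketch rather than a model answer. It runs on a small made-up matrix instead of “dataset.xlsx”: the values, the variable names raw and isMissing, and the zero-filled fifth column are assumptions introduced here for illustration only (the brief does not say what column E contains).

```matlab
% Illustrative rows only: made-up values, NOT taken from dataset.xlsx.
% Column order follows the brief: day, month, year, temperature,
% (column E, assumed unused here), rainfall.
raw = [ 1 3 2001   8.5 0   0.0;
        2 3 2001 -99.0 0   1.2;
        3 3 2001  10.1 0 -99.0;
        4 3 2001   9.7 0   2.4];

temp = raw(:, 4);   % temperature in degrees Celsius
rain = raw(:, 6);   % rainfall in mm

% A day is dropped if either measurement carries the -99.0 flag.
isMissing = (temp == -99.0) | (rain == -99.0);

% Task 1: month, temperature and rainfall on days with complete data.
X = raw(~isMissing, [2 4 6]);

% Task 2: the rows of X with non-zero rainfall.
Y = X(X(:, 3) ~= 0, :);

disp(X)
disp(Y)
```

With these made-up rows, X keeps the first and fourth days and Y keeps only the fourth. Building one logical mask and reusing it for every column is what guarantees that a day with a single missing measurement is removed from all variables at once.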