R-PSTAT 105-Assignment 6

PSTAT 105: Assignment 6
Due Friday, February 25, 2022
Please complete these questions and submit the answers on GauchoSpace. Your calculations should be done
in R, and your results should be typed up to include R input and output as well as clear explanations for
your answers.
Please use the KernSmooth library and its locpoly function to calculate the nonparametric regression
estimators.
1. The file andro2.txt contains a record of a biological signal measured on a mouse. We are going
to model this data as noisy measurements around an unknown mean function.
(a) Estimate the value of the mean function at time 200, first by averaging the 35 points closest to that
time and then by averaging the 105 points closest to time 200. Give a 95% confidence interval for each
of these estimates using a t statistic, estimating the variance from the data used in that mean.
(b) In measuring the performance of a nonparametric regression estimator, we consider both the
variance and the bias of the estimate. Which of these two is not accounted for in your 95%
confidence intervals from part (a)? Explain.
(c) Plot the data with time on the x axis and signal on the y axis. Calculate the Nadaraya–Watson
kernel estimate using a Gaussian kernel, and plot the resulting estimate of the smooth mean function.
Use a bandwidth that you think works well.
(d) Make a new scatter plot and then add a local linear regression estimate. Choose a new bandwidth
that fits the data well.
(e) The scientists were looking at these measurements for evidence of a sequence of peaks in the signal
level occurring at regular intervals over time. Use your estimates of the mean function to locate the
peaks in the data stream, and calculate the average time between successive peaks.
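A minimal R sketch of parts (a), (c), (d) and (e). It runs on a simulated stand-in series because the layout of andro2.txt is not described here; with the real file, the two vectors would come from reading it in. The helper `knn_mean_ci` is a hypothetical name introduced for this sketch. In `locpoly`, `degree = 0` gives the local-constant (Nadaraya–Watson) fit, `degree = 1` the local linear fit, and `kernel = "normal"` is the Gaussian kernel. The bandwidth of 10 and the use of `dpill()` (KernSmooth's plug-in bandwidth selector for local linear regression) are illustrative starting points, not recommended values.

```r
library(KernSmooth)

# Simulated stand-in for andro2.txt (NOT the real measurements): a periodic
# mean with a period of 100 time units plus Gaussian noise.
set.seed(105)
times <- 1:600
signal <- sin(2 * pi * times / 100) + rnorm(600, sd = 0.3)

# (a) Hypothetical helper: average the k observations nearest x0 and build a
# t interval from the spread of those same k observations.
knn_mean_ci <- function(x, y, x0, k, level = 0.95) {
  yk <- y[order(abs(x - x0))[seq_len(k)]]
  est <- mean(yk)
  half <- qt(1 - (1 - level) / 2, df = k - 1) * sd(yk) / sqrt(k)
  c(estimate = est, lower = est - half, upper = est + half)
}
ci35 <- knn_mean_ci(times, signal, x0 = 200, k = 35)
ci105 <- knn_mean_ci(times, signal, x0 = 200, k = 105)
rbind(ci35, ci105)

# (c) Nadaraya-Watson: local-constant fit (degree = 0) with a Gaussian kernel.
nw <- locpoly(times, signal, degree = 0, kernel = "normal", bandwidth = 10)

# (d) Local linear fit (degree = 1), starting from the plug-in bandwidth.
h_ll <- dpill(times, signal)
ll <- locpoly(times, signal, degree = 1, kernel = "normal", bandwidth = h_ll)

plot(times, signal, pch = 20, col = "grey60", xlab = "time", ylab = "signal")
lines(nw$x, nw$y, lwd = 2, col = "blue")
lines(ll$x, ll$y, lwd = 2, col = "red", lty = 2)

# (e) A grid point is a local maximum where the first difference of the
# fitted curve changes sign from + to -.
peak_idx <- which(diff(sign(diff(nw$y))) == -2) + 1
peaks <- nw$x[peak_idx]
peaks
mean(diff(peaks))  # average time between successive peaks
```

The peak search works on differences along the fitted grid. If the bandwidth is too small, noise can create spurious local maxima near the flat tops of the curve, so check the located peaks against the plot.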
2. The data set GermProd.txt contains monthly measurements of industrial production in Germany.
(a) Use a Nadaraya–Watson estimator to fit a smooth mean to this data. Use a bandwidth of 24.
Plot your resulting estimate over a scatter plot of the original data series.
(b) Describe the problem that the estimator has at the boundaries of the data.
(c) Refit the data with a local-linear model using a similar bandwidth. Produce a plot that compares
the local-linear estimator with the N–W estimator.
(d) How does the local-linear estimator adjust for the boundary problem?
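A sketch of parts (a) and (c), using a simulated monthly series with a steady upward trend as a stand-in for GermProd.txt (the slope and noise level are invented). With the bandwidth of 24 from part (a), the kernel window at the last month reaches only backward. The Nadaraya–Watson fit there therefore averages earlier, lower values and falls below the trend, while the local linear fit also estimates the slope and follows it. The final line prints each fit's error at the last grid point against the known simulated trend.

```r
library(KernSmooth)

# Simulated stand-in for GermProd.txt: 30 years of monthly values with a
# steady upward trend (the slope and noise level are invented).
set.seed(105)
month <- 1:360
production <- 50 + 0.1 * month + rnorm(360, sd = 2)

nw <- locpoly(month, production, degree = 0, kernel = "normal", bandwidth = 24)
ll <- locpoly(month, production, degree = 1, kernel = "normal", bandwidth = 24)

plot(month, production, pch = 20, col = "grey60",
     xlab = "month", ylab = "production")
lines(nw$x, nw$y, lwd = 2, col = "blue")
lines(ll$x, ll$y, lwd = 2, col = "red", lty = 2)
legend("topleft", legend = c("Nadaraya-Watson", "local linear"),
       col = c("blue", "red"), lty = c(1, 2), lwd = 2)

# Error of each fit against the known simulated trend at the last grid point.
last <- length(nw$x)
true_end <- 50 + 0.1 * nw$x[last]
end_error <- c(nw = nw$y[last] - true_end, local_linear = ll$y[last] - true_end)
end_error
```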
3. The approval ratings data for George W. Bush are available on GauchoSpace. We would like to estimate
his approval rating on June 1, 2005 and October 1, 2001.
(a) Plot the approval ratings and a local-linear regression estimator using a reasonable bandwidth.
(b) Use the results from part (a) to find the estimated value on June 1, 2005.
(c) Fit a new Nadaraya–Watson kernel regression estimator with the same bandwidth, and compare
the estimate on October 1, 2001 from the kernel and local-linear estimates.
(d) Plot the kernel regression estimator over the scatter plot of the approval ratings.
(e) What effect does the discontinuity in this data at September 11, 2001 have on our estimation
procedures? Suggest a way to mitigate that effect.
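A sketch of question 3 on simulated daily ratings invented for illustration: flat at 55 before September 11, 2001, then a jump to 85 followed by a linear decline. Dates are converted to numeric day counts for `locpoly`. Because `locpoly` reports the fit only on its grid, the hypothetical helper `at_date` uses `approx()` to interpolate to a calendar date. The 20-day bandwidth is illustrative. `gridsize` is raised from its default of 401 so the grid stays finer than the bandwidth over the long date range. Fitting the post-break data separately is shown as one possible mitigation for part (e), not the only one.

```r
library(KernSmooth)

# Simulated stand-in for the approval data (NOT the real ratings): daily
# values at 55 before September 11, 2001, then a jump to 85 and a steady decline.
set.seed(105)
dates <- seq(as.Date("2001-02-01"), as.Date("2008-12-31"), by = "day")
day <- as.numeric(dates)
break_day <- as.numeric(as.Date("2001-09-11"))
approval <- ifelse(day < break_day, 55, 85 - 0.02 * (day - break_day)) +
  rnorm(length(day), sd = 3)

h <- 20  # illustrative bandwidth, in days
ll <- locpoly(day, approval, degree = 1, kernel = "normal",
              bandwidth = h, gridsize = 3000)
nw <- locpoly(day, approval, degree = 0, kernel = "normal",
              bandwidth = h, gridsize = 3000)

plot(dates, approval, pch = 20, col = "grey70", xlab = "date", ylab = "approval")
lines(as.Date(ll$x, origin = "1970-01-01"), ll$y, lwd = 2, col = "red")
lines(as.Date(nw$x, origin = "1970-01-01"), nw$y, lwd = 2, col = "blue", lty = 2)

# Hypothetical helper: interpolate a locpoly fit to a calendar date.
at_date <- function(fit, d) {
  approx(fit$x, fit$y, xout = as.numeric(as.Date(d)))$y
}
june_2005 <- at_date(ll, "2005-06-01")

# One possible mitigation: fit the post-break data on its own, so no kernel
# window mixes observations from before and after the jump.
post <- day >= break_day
ll_post <- locpoly(day[post], approval[post], degree = 1, kernel = "normal",
                   bandwidth = h, gridsize = 3000)

oct_2001 <- c(nw = at_date(nw, "2001-10-01"),
              local_linear = at_date(ll, "2001-10-01"),
              split_local_linear = at_date(ll_post, "2001-10-01"))
june_2005
oct_2001
```

In this simulation, both pooled fits are pulled toward the pre-break level on October 1, 2001, because their kernel windows reach back across the jump. The separately fitted curve is not.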