Python-S20

STA142B, S20
Prof. W. Polonik
Project 3
due Thursday, May 20, 2021
A. Data analysis: Perform the data analysis projects as instructed in the file Project3.ipynb
that you can find in ’files’ on Canvas.
B. Methodology: Answer the following questions, and submit your answers through
Canvas.
1. Let $X_1, \ldots, X_n$ be an iid sample from a distribution $F$ with density $F' = f$, and consider
the KDE with a uniform kernel:
$$\hat f_n(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{X_i - x}{h}\right),$$
where $K(t) = \mathbf{1}_{[-1/2,\,1/2]}(t)$, and $h$ is called the bandwidth. (Note that $K(t)$ is the density
of the $U([-\tfrac{1}{2}, \tfrac{1}{2}])$ distribution.) Let
$$p_h = F\left(x + \frac{h}{2}\right) - F\left(x - \frac{h}{2}\right).$$
(a) Show that
$$E\left(\hat f_n(x)\right) = \frac{p_h}{h}, \quad \text{and} \quad \operatorname{Var}\left(\hat f_n(x)\right) = \frac{1}{nh^2}\, p_h (1 - p_h).$$
(b) Show that, for each $x$, if $h \to 0$, then
$$\operatorname{bias}\left(\hat f_n(x)\right) \to 0.$$
(c) Show that if $nh \to \infty$ then
$$\operatorname{Var}\left(\hat f_n(x)\right) \to 0.$$
Comment on the interpretation of the above properties of the KDE: Ideally we would like
both the bias and the variance of an estimator to be small. Now, part (b) tells
us that we need to choose $h$ 'small' for the bias of the KDE to be small, while part (c) tells
us that $h$ cannot be too small for the variance to be small. So we need to strike a good
balance when choosing $h$.
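As an optional numerical check of the trade-off described in problem 1, the following minimal sketch evaluates the uniform-kernel KDE at a point for several bandwidths and reports the empirical mean and variance of the estimate over repeated samples. The function names `uniform_kde` and `replicate`, the standard normal sampling distribution, the evaluation point $x = 0$, and all parameter values are assumptions chosen for illustration; none of them come from the assignment.

```python
import math
import random
import statistics


def uniform_kde(sample, x, h):
    """Uniform-kernel KDE at x: (number of X_i with |X_i - x| <= h/2) / (n h)."""
    count = sum(1 for xi in sample if abs(xi - x) <= h / 2)
    return count / (len(sample) * h)


def replicate(h, n=200, reps=300, x=0.0, seed=0):
    """Empirical mean and variance of the KDE at x over `reps` N(0, 1) samples.

    The N(0, 1) sampling distribution is an illustrative assumption.
    """
    rng = random.Random(seed)
    estimates = [
        uniform_kde([rng.gauss(0.0, 1.0) for _ in range(n)], x, h)
        for _ in range(reps)
    ]
    return statistics.mean(estimates), statistics.variance(estimates)


# Density of N(0, 1) at x = 0, used only as a reference value.
true_density = 1.0 / math.sqrt(2.0 * math.pi)

for h in (0.05, 0.5, 4.0):
    mean_est, var_est = replicate(h)
    print(f"h={h}: mean={mean_est:.4f} (target {true_density:.4f}), var={var_est:.6f}")
```

Comparing the printed mean with the target across bandwidths, and the printed variance across bandwidths, gives an empirical picture to set beside the formulas in parts (a) through (c).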
2. An Erdős–Rényi graph is constructed as follows. Given are $n$ vertices. Each pair of vertices
is joined by an edge independently of all other pairs, with probability $p \in (0, 1)$.
(a) Find the expected degree of a given vertex.
(b) Find the variance of the degree of a given vertex.
(c) Find the expected number of edges of the graph.
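A hand-derived answer to problem 2 can be sanity-checked by simulation. The sketch below draws graphs according to the construction above and reports the empirical mean and variance of the degree of one vertex and the empirical mean number of edges. The helper names `sample_er_graph` and `degree`, as well as the values of `n`, `p`, and the number of replications, are hypothetical choices for illustration only.

```python
import random
import statistics
from itertools import combinations


def sample_er_graph(n, p, rng):
    """Return the edge set of one draw: each pair {i, j} is an edge independently w.p. p."""
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}


def degree(edges, v):
    """Number of edges incident to vertex v."""
    return sum(1 for e in edges if v in e)


rng = random.Random(1)
n, p, reps = 30, 0.2, 500  # illustrative values, not from the assignment

degrees, edge_counts = [], []
for _ in range(reps):
    edges = sample_er_graph(n, p, rng)
    degrees.append(degree(edges, 0))  # degree of one fixed vertex
    edge_counts.append(len(edges))

print("empirical mean degree:    ", statistics.mean(degrees))
print("empirical degree variance:", statistics.variance(degrees))
print("empirical mean edge count:", statistics.mean(edge_counts))
```

The printed values are Monte Carlo estimates, so they should agree with the exact expressions only up to sampling error.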
3. We still consider an Erdős–Rényi graph. A triangle in the graph consists of a triple of
edges $\{i, j\}, \{j, k\}, \{k, i\}$ with $i \neq j \neq k \neq i$.
(a) Find the expected number of triangles in the graph.
(b) Extra credit: Find an expression for the variance of the number of triangles in
the graph.
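For problem 3, a brute-force triangle count over simulated graphs gives an empirical mean and variance against which a derived expression can be compared. This is a minimal sketch; the names `sample_adjacency` and `count_triangles` and the parameter values are assumptions made for illustration, and checking all vertex triples is only practical for small $n$.

```python
import random
import statistics
from itertools import combinations


def sample_adjacency(n, p, rng):
    """Adjacency sets of one draw: each pair is joined independently w.p. p."""
    adj = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def count_triangles(adj):
    """Count vertex triples i < j < k whose three connecting edges are all present."""
    return sum(
        1
        for i, j, k in combinations(range(len(adj)), 3)
        if j in adj[i] and k in adj[j] and k in adj[i]
    )


rng = random.Random(2)
n, p, reps = 20, 0.3, 300  # illustrative values, not from the assignment

counts = [count_triangles(sample_adjacency(n, p, rng)) for _ in range(reps)]
print("empirical mean number of triangles:", statistics.mean(counts))
print("empirical variance of the count:   ", statistics.variance(counts))
```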