COMP90051

COMP90051 Statistical Machine Learning Practice Exam
The University of Melbourne
School of Computing and Information Systems
COMP90051
Statistical Machine Learning
2021 Semester 2 – Practice Exam
Identical examination papers: None
Exam duration: 120 minutes
Reading time: 15 minutes
Upload time: 30 minutes additional to exam + reading; upload via Canvas
Late submissions: -1 against final subject mark per minute late, starting 120+15+30 minutes after
exam start, up to 30 minutes late maximum. Late submissions permitted by provided OneDrive upload
link.
Length: This paper has 7 pages (real exam is longer) including this cover page.
Authorised materials: Lecture slides, workshop materials, prescribed reading, your own projects.
Calculators: permitted
Instructions to students: The total marks for the real exam is 120 (practice exam has fewer questions
and marks), corresponding to the number of minutes available. The mark will be scaled to compute your
final exam grade.
This paper has three parts, A-C. You should attempt all the questions.
This is an open book exam (see authorised materials above). You should enter your answers in a Word
document or PDF, which can include typed and/or hand-written answers. You should answer each question
on a separate page, i.e., start a new page for each of Questions 1–6 – parts within questions do not
need new pages. Write the question number clearly at the top of each page. You have unlimited attempts
to submit your answer during the course of the exam, but only your last submission is used for marking.
You must not use materials other than those authorised above. You should not use private tutor notes,
nor use materials off the Internet. You are not permitted to communicate with others for the duration of
the exam, other than to ask questions of the teaching staff via the Exam chat tool in Canvas (BigBlueButton).
Your computer, phone and/or tablet should only be used to access the authorised materials,
enter or photograph your answers, and upload these files.
Library: This paper is to be lodged with the Baillieu Library.
page 1 of 7 Continued overleaf . . .
COMP90051 Statistical Machine Learning
Practice Exam
Semester 2, 2021
Total marks: 120 in real exam based on more questions; 90 in this practice exam
Students must attempt all questions
Section A: Short Answer Questions [25 marks]
Answer each of the questions in this section as briefly as possible. Expect to answer each question in 1-3
lines, with longer responses expected for the questions with higher marks.
Question 1: [25 marks]
(a) In words or a mathematical expression, what quantity is minimised by linear regression? [5 marks]
Acceptable: The residual sum of squares (sum of squared errors)
Acceptable: The mean-squared error
Acceptable: $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ (terms are true and estimated labels) or this times a constant
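As an illustration of answer (a), the following minimal sketch fits a one-feature linear regression in closed form and checks that perturbing the fitted parameters only increases the residual sum of squares. The toy data and the helper names `rss` and `fit_least_squares` are assumptions made for this example, not part of the exam.

```python
# Hypothetical toy data (not from the exam).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.5]

def rss(w, b, xs, ys):
    """Residual sum of squares for the line y_hat = w * x + b."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))

def fit_least_squares(xs, ys):
    """Closed-form least squares for one feature plus an intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b

w_hat, b_hat = fit_least_squares(xs, ys)
best = rss(w_hat, b_hat, xs, ys)

# RSS at four nearby parameter settings; each should exceed the minimum.
perturbed = [
    rss(w_hat + dw, b_hat + db, xs, ys)
    for dw in (-0.1, 0.1)
    for db in (-0.1, 0.1)
]
print(w_hat, b_hat, best, perturbed)
```

Dividing the RSS by n gives the mean-squared error, which has the same minimiser; this is why the answers above are equivalent up to a constant.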
(b) In words or a mathematical expression, what is the marginal likelihood for a Bayesian probabilistic
model? [5 marks]
Acceptable: the joint likelihood of the data and prior, after marginalising out the model parameters
Acceptable: $p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta$, where x is the data, θ the model parameter(s), p(x|θ) the
likelihood and p(θ) the prior
Acceptable: the expected likelihood of the data, under the prior
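The integral in answer (b) can be checked numerically on a small conjugate model. The sketch below assumes a Bernoulli likelihood with a Beta(a, b) prior, a choice made only for illustration, and compares a midpoint-rule approximation of the integral with the known closed form B(a + heads, b + tails) / B(a, b). All function names are hypothetical.

```python
import math

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def likelihood(theta, heads, tails):
    """p(x | theta) for one specific sequence of coin flips."""
    return theta ** heads * (1.0 - theta) ** tails

def beta_prior(theta, a, b):
    """Density of the Beta(a, b) prior p(theta)."""
    return theta ** (a - 1) * (1.0 - theta) ** (b - 1) / math.exp(log_beta(a, b))

def marginal_numeric(heads, tails, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of p(x | theta) p(theta)."""
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * width
        total += likelihood(theta, heads, tails) * beta_prior(theta, a, b)
    return total * width

def marginal_closed_form(heads, tails, a, b):
    """Closed form for the Beta-Bernoulli marginal likelihood."""
    return math.exp(log_beta(a + heads, b + tails) - log_beta(a, b))

numeric = marginal_numeric(heads=3, tails=1, a=2, b=2)
closed = marginal_closed_form(heads=3, tails=1, a=2, b=2)
print(numeric, closed)
```

The numeric version makes the "expected likelihood under the prior" reading explicit: it averages the likelihood over values of θ, weighted by the prior density.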
(c) In words, what does Pr(A,B | C) = Pr(A | C) Pr(B | C) say about the dependence of A, B and C?
[5 marks]
Acceptable: A and B are conditionally independent given C.
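To make the statement in (c) concrete, the sketch below builds a joint distribution in which A and B each depend only on C, then checks the factorisation for every assignment. The probability values are hypothetical. It also shows that A and B need not be independent once C is marginalised out.

```python
from itertools import product

# Hypothetical numbers: Pr(C), Pr(A=True | C), Pr(B=True | C).
p_c = {True: 0.3, False: 0.7}
p_a_given_c = {True: 0.9, False: 0.2}
p_b_given_c = {True: 0.8, False: 0.1}

def bern(p_true, value):
    """Probability of a Boolean value under a Bernoulli with Pr(True) = p_true."""
    return p_true if value else 1.0 - p_true

# Joint table Pr(A, B, C) = Pr(C) Pr(A | C) Pr(B | C).
joint = {
    (a, b, c): p_c[c] * bern(p_a_given_c[c], a) * bern(p_b_given_c[c], b)
    for a, b, c in product([True, False], repeat=3)
}

def prob(pred):
    """Sum the joint over all assignments satisfying the predicate."""
    return sum(p for abc, p in joint.items() if pred(*abc))

def cond_indep_given_c(tol=1e-12):
    """Check Pr(A,B | C) = Pr(A | C) Pr(B | C) for every assignment."""
    for a, b, c in product([True, False], repeat=3):
        pc = prob(lambda A, B, C: C == c)
        p_ab = joint[(a, b, c)] / pc
        p_a = prob(lambda A, B, C: A == a and C == c) / pc
        p_b = prob(lambda A, B, C: B == b and C == c) / pc
        if abs(p_ab - p_a * p_b) > tol:
            return False
    return True

# Marginal check: Pr(A, B) versus Pr(A) Pr(B), with C summed out.
p_ab_marginal = prob(lambda A, B, C: A and B)
p_a_marginal = prob(lambda A, B, C: A)
p_b_marginal = prob(lambda A, B, C: B)
print(cond_indep_given_c(), p_ab_marginal, p_a_marginal * p_b_marginal)
```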
(d) What are the free parameters of a Gaussian mixture model? What algorithm is used to fit them for
maximum likelihood estimation? [10 marks]
Acceptable 5 marks: For a Gaussian mixture with k components the parameters are probabilities for
(k - 1) components, a mean vector for each of the k components, and a symmetric positive-definite
covariance matrix for each of the k components.
Acceptable 5 marks: The EM algorithm is appropriate for maximum likelihood estimates.
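The two parts of answer (d) can be sketched as follows. The first function counts free parameters for k components in d dimensions, using the fact that a symmetric d x d covariance matrix has d(d + 1)/2 free entries. The rest is a minimal one-dimensional EM loop on hypothetical data with a hypothetical initialisation, written for illustration rather than as a reference implementation.

```python
import math

def gmm_free_parameters(k, d):
    """(k - 1) weights + k mean vectors + k symmetric covariance matrices."""
    return (k - 1) + k * d + k * d * (d + 1) // 2

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )

def em_step(data, weights, means, variances):
    """One EM iteration for a 1-D Gaussian mixture."""
    k = len(weights)
    # E-step: responsibility of each component for each point.
    resp = []
    for x in data:
        joint = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: responsibility-weighted maximum likelihood updates.
    new_w, new_m, new_v = [], [], []
    for c in range(k):
        n_c = sum(r[c] for r in resp)
        mean = sum(r[c] * x for r, x in zip(resp, data)) / n_c
        var = sum(r[c] * (x - mean) ** 2 for r, x in zip(resp, data)) / n_c
        new_w.append(n_c / len(data))
        new_m.append(mean)
        new_v.append(var)
    return new_w, new_m, new_v

# Hypothetical data: two well-separated clusters around -2 and +2.
data = [-2.1, -1.9, -2.4, -1.6, 1.8, 2.2, 2.5, 1.5]
weights, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]

history = [log_likelihood(data, weights, means, variances)]
for _ in range(10):
    weights, means, variances = em_step(data, weights, means, variances)
    history.append(log_likelihood(data, weights, means, variances))

print(gmm_free_parameters(3, 2), means, history[-1])
```

Each EM iteration should not decrease the log-likelihood, which is what `history` tracks here.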
Section B: Method & Calculation Questions [45 marks]
In this section you are asked to demonstrate your conceptual understanding of methods that we have
studied in this subject, and your ability to perform numeric and mathematical calculations. NOTE: in
the real exam, a small number of questions from this section will be a bit harder/longer than others.
Question 2: [10 marks]
(a) Consider a 2-dimensional dataset, where each point is represented by two features and the label
(x1, x2, y). The features are binary, the label is the result of XOR function, and so the data consists
of four points (0, 0, 0), (0, 1, 1), (1, 0, 1) and (1, 1, 0). Design a feature space transformation that
would make the data linearly separable. [5 marks]
Acceptable: new feature space (x3), where $x_3 = (x_1 - x_2)^2$
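A quick check of this transformation on the four XOR points is sketched below. The threshold of 0.5 is an assumption chosen for illustration; any value strictly between 0 and 1 separates the transformed data.

```python
# The four XOR points from the question, as (x1, x2, y).
data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

def transform(x1, x2):
    """Map the two binary features to the single feature x3 = (x1 - x2)^2."""
    return (x1 - x2) ** 2

def predict(x3, threshold=0.5):
    """Linear classifier in the 1-D feature space: a single threshold on x3."""
    return 1 if x3 > threshold else 0

transformed = [(transform(x1, x2), y) for x1, x2, y in data]
predictions = [predict(x3) for x3, _ in transformed]
print(transformed, predictions)
```

In the new space both points labelled 0 map to x3 = 0 and both points labelled 1 map to x3 = 1, so a single threshold separates the classes.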
(b) How does SVM handle data that is not linearly separable? List two possible strategies. [5 marks]
Acceptable 2.5 marks: using kernels to transform the data.
Acceptable 2.5 marks: using soft-margin SVM to relax the constraints.
Acceptable 2.5 marks: using both soft-margin SVM and kernels.
Question 3: [10 marks]
Consider the data shown below with hard-margin linear SVM decision boundary shown between the
classes. The right half is classified as red squares and the left half is classified as blue circles. Answer the
following questions and explain your answers.
(a) Which points (by index 1–6) would be the support vectors of the SVM? [5 marks]
Acceptable: Points 1, 2, 4 would all be support vectors as they all lie on the margin.
(b) What is the value of the hard margin SVM loss for point 3? [5 marks]
Acceptable: zero, since the point is on the right side of the boundary and is outside the margin.
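The reasoning in both parts can be sketched in code. The figure's coordinates are not reproduced in this text, so the weights, bias and points below are hypothetical stand-ins: a vertical boundary at x1 = 0 with red squares (label +1) on the right. The hard-margin loss is taken to be zero when y(w·x + b) >= 1 and infinite otherwise.

```python
import math

def functional_margin(w, b, x, y):
    """y * (w . x + b); at least 1 means correct side and outside the margin."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

def hard_margin_loss(w, b, x, y):
    """Zero if the margin constraint holds, infinite otherwise."""
    return 0.0 if functional_margin(w, b, x, y) >= 1.0 else math.inf

def is_support_vector(w, b, x, y, tol=1e-9):
    """Support vectors lie exactly on the margin: y * (w . x + b) = 1."""
    return abs(functional_margin(w, b, x, y) - 1.0) <= tol

# Hypothetical boundary x1 = 0; these are NOT the points from the exam figure.
w, b = (1.0, 0.0), 0.0
far_point = ((3.0, 1.0), +1)      # correct side, outside the margin
margin_point = ((-1.0, 2.0), -1)  # exactly on the margin
inside_point = ((0.5, 0.0), +1)   # correct side but inside the margin

for x, y in (far_point, margin_point, inside_point):
    print(x, y, hard_margin_loss(w, b, x, y), is_support_vector(w, b, x, y))
```

A point like `far_point` incurs zero loss and is not a support vector, which mirrors the answer for point 3; a point on the margin also has zero loss but is a support vector.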
Question 4: [15 marks]
Consider the following directed PGM
where each random variable is Boolean-valued (True or False).
(a) Write the format (with empty values) of the conditional probability tables for this graph. [5 marks]
————
Pr(A=True)
————