Office Use Only
Semester One 2019
Examination Period
Faculty of Information Technology
EXAM CODES: FIT3181
TITLE OF PAPER: Final Examination Paper for Unit FIT3181 Deep Learning
EXAM DURATION: 2 hours writing time
READING TIME: 10 minutes
THIS PAPER IS FOR STUDENTS STUDYING AT: (tick where applicable)
□ Caulfield □ Clayton □ Parkville □ Peninsula
□ Monash Extension □ Off Campus Learning □ Malaysia □ Sth Africa
□ Other (specify)
During an exam, you must not have in your possession any item/material that has not been authorised for
your exam. This includes books, notes, paper, electronic device/s, mobile phone, smart watch/device,
calculator, pencil case, or writing on any part of your body. Any authorised items are listed below.
Items/materials on your desk, chair, in your clothing or otherwise on your person will be deemed to be in
your possession.
No examination materials are to be removed from the room. This includes retaining, copying, memorising
or noting down content of exam material for personal use or to share with any other person by any means
following your exam.
Failure to comply with the above instructions, or attempting to cheat or cheating in an exam is a discipline
offence under Part 7 of the Monash University (Council) Regulations, or a breach of instructions under Part
3 of the Monash University (Academic Board) Regulations.
AUTHORISED MATERIALS
OPEN BOOK □ YES □ NO
CALCULATORS □ YES □ NO
SPECIFICALLY PERMITTED ITEMS □ YES □ NO
If yes, items permitted are:
Candidates must complete this section if required to write answers within this paper
STUDENT ID: __ __ __ __ __ __ __ __ DESK NUMBER: __ __ __ __ __
PART A: Multiple-Choice Questions
● This part contains 8 multiple-choice questions
● The total number of marks for this part is 15
● For each multiple-choice question, you must select all applicable answers to receive the full
marks for that question.
Question A1 [1 mark]
In a CNN, unlike the convolutional layer, the pooling layer has no learnable parameters
a) True
b) False
Solution: (a)
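The claim can be checked directly in code. A minimal sketch (assuming TensorFlow 2.x with tf.keras; the layer sizes are illustrative only): the Conv2D layer reports trainable parameters, while the MaxPooling2D layer reports none.

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(filters=8, kernel_size=3),  # 8*(3*3*1) + 8 = 80 learnable parameters
        tf.keras.layers.MaxPooling2D(pool_size=2),          # 0 learnable parameters
    ])
    model.summary()  # the "Param #" column shows 80 for Conv2D and 0 for MaxPooling2D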
Question A2 [3 marks]
Consider a machine learning problem to detect breast cancer from a dataset consisting of
mammograms and medical data collected from a cohort of patients. The task is to predict whether
a patient has breast cancer. The following table summarizes the confusion matrix on the test
dataset. Select all applicable answers below:
                            True Labels
                      CANCER (1)    NORMAL (0)
Predicted  CANCER (1)      9            10
Class      NORMAL (0)      1            90
a) The total number of instances is 110 with 10 labelled as CANCER and 100 labelled as
NORMAL
b) The true positive rate (TPR) is 9/10 = 90%
c) This test dataset is a balanced dataset
d) It is not possible to calculate the AUC because we don’t have the ROC curve, which requires
performance information at different threshold levels
Solution: (a), (b), (d)
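The counts behind answers (a) and (b) can be verified with a few lines of plain Python (no libraries assumed):

    TP, FP = 9, 10   # predicted CANCER: 9 truly CANCER, 10 truly NORMAL
    FN, TN = 1, 90   # predicted NORMAL: 1 truly CANCER, 90 truly NORMAL

    total     = TP + FP + FN + TN   # 110 instances in total
    positives = TP + FN             # 10 labelled CANCER
    negatives = FP + TN             # 100 labelled NORMAL
    tpr       = TP / (TP + FN)      # true positive rate = 9/10 = 0.9
    print(total, positives, negatives, tpr)   # 110 10 100 0.9

Since the classes are split 10 vs 100, the test set is clearly imbalanced, which is why (c) is not selected.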
Question A3 [2 marks]
When training a deep neural network (DNN), which of the following statements are applicable
a) One can use a Stochastic Gradient method known as the Back Propagation algorithm to
optimize the loss function
b) The Back Propagation method is robust against overfitting, hence likely to always produce a
good global optimal solution
c) The learning rate is an important parameter when training a DNN
d) With TensorFlow, a simple way to detect the problem of gradient vanishing is to draw the
histograms of the gradients and visually inspect them in TensorBoard
Solution: (a), (c), (d)
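For option (d), the sketch below shows one way gradient histograms could be logged for TensorBoard (TensorFlow 2.x assumed; the toy model, the data and the logs/grads directory are placeholders):

    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(32, activation="sigmoid"),
                                 tf.keras.layers.Dense(1)])
    loss_fn = tf.keras.losses.MeanSquaredError()
    x, y = tf.random.normal([64, 10]), tf.random.normal([64, 1])

    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x))
    grads = tape.gradient(loss, model.trainable_variables)

    writer = tf.summary.create_file_writer("logs/grads")   # hypothetical log directory
    with writer.as_default():
        for var, g in zip(model.trainable_variables, grads):
            tf.summary.histogram("grad/" + var.name.replace(":", "_"), g, step=0)
    # then run `tensorboard --logdir logs` and inspect the Histograms tab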
Question A4 [2 marks]
Factors which have driven recent success in deep learning include:
a) Deep learning models are generally very flexible and powerful
b) Modern advancements in hardware have enabled fast distributed and parallel computation
c) The recent release of iPhone X has enabled centralized computation on a single device
d) The availability of massive scale modern datasets has enabled computer scientists to train
powerful machine learning models
Solution: (a), (b), (d)
Question A5 [2 marks]
Which of the following statements are true regarding the Gradient Descent (GD) method when
applied to minimize the objective function J(w):
a) With appropriate learning rate, GD guarantees to converge to a global minimum if J(w) is a
convex function
b) With appropriate learning rate, GD always guarantees to converge to some local minimum
if J(w) is a nonconvex function
c) GD updates the parameter in the opposite direction of the current gradient
d) GD is a second-order optimization method
e) GD runs much faster than Stochastic GD for large datasets
Solution: (a), (c)
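A minimal sketch of options (a) and (c) on a convex objective (the starting point and learning rate are illustrative choices):

    # J(w) = (w - 3)^2 is convex, with its global minimum at w = 3
    def grad_J(w):
        return 2.0 * (w - 3.0)      # dJ/dw

    w, lr = 0.0, 0.1                # initial parameter and learning rate
    for _ in range(100):
        w = w - lr * grad_J(w)      # step in the opposite direction of the gradient

    print(round(w, 4))              # ~3.0: GD reaches the global minimum of this convex J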
Question A6 [2 marks]
With respect to the Pooling layer in a CNN, which of the following statements are true
a) It operates on the combination of multiple activation maps to produce a dependent output
b) It reduces the resolution of the image, hence provides a better computational efficiency
c) Max-pooling is locally invariant in the sense that input numbers within a local filter window
can be shuffled without changing the final output
d) The output tensor of a max-pooling layer will always have the same depth as the input tensor
Solution: (b), (c), (d)
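Options (c) and (d) can be illustrated with NumPy (shapes chosen purely for illustration):

    import numpy as np

    # (c) shuffling the values inside a pooling window does not change the max
    window          = np.array([[1, 5], [3, 2]])
    shuffled_window = np.array([[2, 3], [5, 1]])
    print(window.max(), shuffled_window.max())    # 5 5

    # (d) 2x2 max-pooling is applied per channel, so the depth (16) is preserved
    x = np.random.rand(8, 8, 16)                  # height x width x depth
    pooled = x.reshape(4, 2, 4, 2, 16).max(axis=(1, 3))
    print(pooled.shape)                           # (4, 4, 16)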
Question A7 [1 mark]
Consider a text modelling problem where a given corpus of text contains the two words ‘king’
and ‘queen’. If these two words are represented as one-hot encoded vectors, what is the value of
their cosine similarity?
a) 0
b) 0.25
c) 0.5
d) 1.0
e) None of above
Solution: (a)
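With one-hot encodings the two words occupy different dimensions, so their dot product, and hence their cosine similarity, is 0. A tiny NumPy check with an assumed vocabulary of size 4:

    import numpy as np

    king  = np.array([1, 0, 0, 0])   # one-hot vector for 'king'
    queen = np.array([0, 1, 0, 0])   # one-hot vector for 'queen'
    cos = king @ queen / (np.linalg.norm(king) * np.linalg.norm(queen))
    print(cos)                       # 0.0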
Question A8 [2 marks]
You are given a corpus of text that is assumed to be sufficiently large to learn the semantic
meaning of words. After applying a word2vec embedding to this corpus, each word has been
associated with a vector. As you have learned from the lecture, this now allows you to perform
analogical reasoning. Which answer do you expect if we reason “mom – dad = ? – man”?
a) child
b) girl
c) mother
d) woman
Solution: (d)
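Rearranging the analogy gives ? = mom – dad + man, which in the embedding space should land near ‘woman’. A hedged sketch using gensim's KeyedVectors API (the file name is a placeholder, and the exact neighbour returned depends on the corpus):

    from gensim.models import KeyedVectors

    vectors = KeyedVectors.load("word2vec-sample.kv")   # hypothetical pretrained embeddings
    print(vectors.most_similar(positive=["mom", "man"], negative=["dad"], topn=1))
    # expected to rank 'woman' (or a close synonym) first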
PART B: Short Workout Questions
● This part contains 6 workout questions
● The total number of marks for this part is 35
Question B1 [3 marks]
Draw the computational graph for the function f(x, y, z) = σ(2x + 2y + z)
Solution:
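Assuming the reconstructed form f(x, y, z) = σ(2x + 2y + z), the same graph can be traced programmatically with TensorFlow, which lists the constituent nodes (multiplications, additions, sigmoid):

    import tensorflow as tf

    @tf.function
    def f(x, y, z):
        s = 2.0 * x + 2.0 * y + z    # weighted-sum node
        return tf.sigmoid(s)         # sigmoid output node

    concrete = f.get_concrete_function(tf.constant(1.0), tf.constant(2.0), tf.constant(3.0))
    print([op.type for op in concrete.graph.get_operations()])
    # the traced graph contains Mul, AddV2 and Sigmoid operations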
Question B2 [5 marks]
A Generative Adversarial Network (GAN) is a deep generative model that uses a generator G_θ
to simulate data by mapping a random noise vector z ∼ p(z) through G_θ. Given a training dataset