Python-ML1

Written Exam A: Attempt review ([WiSe 21/22] ML1)
Source: https://isis.tu-berlin.de/mod/quiz/review.php?attempt=2437839&cmid=1292905

Started on: Wednesday, 9 March 2022, 8:43 AM
State: Finished
Completed on: Wednesday, 9 March 2022, 10:43 AM
Time taken: 2 hours

Information
Eigenständigkeitserklärung / Declaration of Independence
By proceeding further, you confirm that you are completing the exam alone, without resources other than those that are authorized. Authorized resources include the ML1 course material, personal notes, online API documentation, and calculation/plotting tools.

Question 1 (Complete, marked out of 5.00)
Which of the following is true? In explainable machine learning, Shapley values:
a. can be computed in the order of a single forward/backward pass.
b. require an exponential number of function evaluations to be computed.
c. require O(d) function evaluations, where d is the number of input dimensions.
d. are a self-explainable model that must be trained alongside the actual model of interest.

Question 2 (Complete, marked out of 5.00)
Which of the following is true? Layer-wise relevance propagation (LRP) is a method for explainable AI that:
a. can be applied to any black-box machine learning model.
b. assumes that the machine learning model has a neural network (or computational graph) structure.
c. requires O(d) function evaluations, where d is the number of input dimensions, in order to produce an explanation.
d. can be applied to any black-box model, with the only condition that the gradient w.r.t. the input features can be computed.

Question 3 (Complete, marked out of 5.00)
Which of the following is true? In the context of model selection, the risk of overfitting is particularly high when:
a. the model is a low-bias estimator.
b. the model is a high-bias estimator.
c. the model is a low-variance estimator.
d. the model is a high-variance estimator.

Question 4 (Complete, marked out of 5.00)
Which of the following is true? In the soft-margin SVM, the parameter C controls:
a. how nonlinear the margin is allowed to be.
b. to which extent the training points can lie inside (or on the wrong side of) the margin.
c. how far from the origin the margin is allowed to be.
d. to which extent the training points can lie on the wrong side of the decision boundary.

Information
Assume you would like to build a neural network that implements some decision boundary in $\mathbb{R}^d$. For this, you have at your disposal neurons of the type

$a_j = \mathrm{sign}\Big(\sum_i a_i w_{ij} + b_j\Big)$

where $\sum_i$ sums over the indices of the incoming neurons. The sign function returns +1 when its input is positive, -1 when its input is negative, and 0 when its input is zero. Denote by $a_1$ and $a_2$ the two input neurons (initialized to the values $x_1$ and $x_2$ respectively). Denote by $a_3, a_4, a_5$ the hidden neurons, and by $a_6$ the output neuron.

Question 5 (Complete, marked out of 10.00)
Give the weights and biases associated to a neural network with the structure above that implements the function depicted below. [Figure of the target function not reproduced in this extract.]

Answer given:
$w_{13} = 1$, $w_{23} = 0$, $b_3 = 1$
$w_{14} = 1$, $w_{24} = 1$, $b_4 = 0$
$w_{15} = 0$, $w_{25} = 1$, $b_5 = 1$
$w_{36} = 1$, $w_{46} = 1$, $w_{56} = 1$, $b_6$ greater than -1 and less than 0

Question 6 (Complete, marked out of 5.00)
Give the derivative of the function implemented by your network w.r.t. the parameter $w_{13}$ when the function is evaluated at $(x_1, x_2) = (3, 0)$.

Answer given: 0
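Since the sign neurons are fully specified by the answer above, the construction can be sanity-checked numerically. The following is a minimal numpy sketch, not part of the exam record; the function name and the concrete choice $b_6 = -0.5$ from the allowed interval $(-1, 0)$ are assumptions of this sketch.

```python
import numpy as np

def network(x1, x2):
    """Evaluate the three-hidden-neuron sign network with the weights from Question 5."""
    a3 = np.sign(1 * x1 + 0 * x2 + 1)  # w13=1, w23=0, b3=1
    a4 = np.sign(1 * x1 + 1 * x2 + 0)  # w14=1, w24=1, b4=0
    a5 = np.sign(0 * x1 + 1 * x2 + 1)  # w15=0, w25=1, b5=1
    b6 = -0.5                          # any value in (-1, 0) works here (assumed choice)
    return np.sign(a3 + a4 + a5 + b6)  # w36 = w46 = w56 = 1

print(network(3, 0))  # output at the point used in Question 6
```

Because sign(.) is piecewise constant, an infinitesimal change of $w_{13}$ leaves the output unchanged wherever the derivative exists, which is consistent with the answer 0 given in Question 6.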
Information
A kernel $k : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ is positive semi-definite (PSD) if for any sequence of $N$ data points $x_1, \dots, x_N \in \mathbb{R}^d$ and real-valued scalars $c_1, \dots, c_N$, the following inequality holds:

$\sum_{i=1}^{N} \sum_{j=1}^{N} c_i c_j k(x_i, x_j) \geq 0.$

Furthermore, if a kernel is PSD and symmetric, there exists a feature map $\phi : \mathbb{R}^d \to \mathcal{H}$ associated to this kernel satisfying for all pairs $(x, x')$ the equality $k(x, x') = \langle \phi(x), \phi(x') \rangle$. Consider the kernel

$k(x, x') = \sin(\|x\|)\,\sin(\|x'\|) + \langle x, x' \rangle.$

Question 7 (Not answered, marked out of 10.00)
Rewrite the expression $\sum_i \sum_j c_i c_j k(x_i, x_j)$ as a sum of positive terms, e.g. squared terms. (Note: you just need to write the final form, no need for the intermediate steps.)

Question 8 (Not answered, marked out of 5.00)
Without explicitly computing the feature map $\phi(x)$, express the distance to the origin in feature space, i.e. compute $\|\phi(x) - 0\|$.

Question 9 (Not answered, marked out of 5.00)
Give a possible feature map associated to the kernel $k$ above, for the case where $x \in \mathbb{R}^2$.

Information
Consider some parameter $\theta \in \mathbb{R}$ and an estimator $\hat{\theta}$ of that parameter, based on observations. It is common to analyze such an estimator using a bias-variance analysis:

$\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta} - \theta]$
$\mathrm{Var}(\hat{\theta}) = E[(\hat{\theta} - E[\hat{\theta}])^2]$
$\mathrm{Error}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \mathrm{Bias}(\hat{\theta})^2 + \mathrm{Var}(\hat{\theta}).$

Let $X_1, \dots, X_N \in \mathbb{R}$ be a sample drawn i.i.d. from a univariate Gaussian distribution with mean $\mu$ and variance $\sigma^2$. We would like to build an estimator of the (unknown) parameter $\mu$ from those observations, and analyze its properties.

Question 10 (Complete, marked out of 5.00)
Assume $N \geq 3$ and consider the estimator $\hat{\mu} = \frac{X_1}{2} + \frac{X_2}{4} + \frac{X_3}{4}$ of the parameter $\mu$. Give its bias.

Answer given: $(-3/4)\,\mu$

Question 11 (Complete, marked out of 5.00)
Give its variance.

Answer given (as typed): 1/16*variance/N

Question 12 (Complete, marked out of 5.00)
Give its error.

Answer given (as typed): $\frac{\sigma^2 - 3\mu}{4}$

Question 13 (Complete, marked out of 5.00)
Consider a new data point $X_{N+1}$ drawn from the same distribution, and consider the new estimator

$\hat{\mu}^{(\mathrm{new})} = \frac{N}{N+1}\,\hat{\mu} + \frac{1}{N+1}\,X_{N+1}.$

Express the bias of this new estimator as a function of $\mathrm{Bias}(\hat{\mu})$.

Answer given (as typed): $E\left[\frac{N}{N+1} \times \frac{1}{4}\left(1 + x_1 + x_3 + x_5\right) + \frac{1}{N+1} x_{N+1} - \mu\right]$
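The bias, variance, and error definitions in the Information block above can be checked empirically against the estimator of Question 10. The following is a quick Monte Carlo sketch; the values of $\mu$, $\sigma$, $N$, and the trial count are assumptions of this sketch, as the exam does not fix them.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N, trials = 2.0, 1.5, 10, 200_000      # assumed values, not from the exam

X = rng.normal(mu, sigma, size=(trials, N))       # i.i.d. Gaussian sample, one row per trial
mu_hat = X[:, 0] / 2 + X[:, 1] / 4 + X[:, 2] / 4  # estimator from Question 10 (uses X1, X2, X3 only)

bias = mu_hat.mean() - mu                         # empirical E[mu_hat - mu]
var = mu_hat.var()                                # empirical E[(mu_hat - E[mu_hat])^2]
error = ((mu_hat - mu) ** 2).mean()               # empirical E[(mu_hat - mu)^2]

print(bias, var, error, bias**2 + var)            # last two agree up to Monte Carlo noise
```

For this estimator the coefficients sum to one, so the empirical bias converges to 0, and the variance converges to $(1/4 + 1/16 + 1/16)\,\sigma^2 = 3\sigma^2/8$.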
Information
In this exercise, we would like to implement the maximum-likelihood method to estimate the best parameter of a data density model $p(x|\theta)$ with respect to some dataset $D = (x_1, \dots, x_N)$, and use that approach to build a classifier. Assuming the data is generated independently and identically distributed (i.i.d.), the dataset likelihood is given by

$p(D|\theta) = \prod_{k=1}^{N} p(x_k|\theta)$

and the maximum likelihood solution is then computed as

$\hat{\theta} = \arg\max_\theta\, p(D|\theta) = \arg\max_\theta\, \log p(D|\theta)$

where the log term can also be expressed as a sum, i.e. $\log p(D|\theta) = \sum_{k=1}^{N} \log p(x_k|\theta)$. In your implementation, when possible, you should make use of numpy vector operations in order to avoid loops.

Question 14 (Not answered, marked out of 10.00)
Consider $x \in \mathbb{R}^2$ and the probability model

$p(x|\theta) = \frac{1}{4} \exp\left(-\frac{1}{2}\|x - \theta\|_1\right)$

where $\theta \in \mathbb{R}^2$. Write a function that takes as input some dataset D given as a numpy array of size N x 2, and some array of parameters THETA of size K x 2. The function should return an array of size K containing the corresponding log-probability scores.

Question 15 (Not answered, marked out of 10.00)
Write a procedure that finds, using grid search, the parameters $(\theta_1, \theta_2)$ that are optimal in the maximum likelihood sense for a given dataset D. The parameters should be searched for on the interval $[-5, 5]$. Furthermore, the parameters should be constrained to satisfy $\|\theta_2 + \theta_1\| < 5$.

Question 16 (Not answered, marked out of 5.00)
Explain in one sentence the problem you would face if applying this grid search procedure when the model has a similar structure but more than two parameters (e.g. 10 parameters or more). Explain in one sentence how the problem can be addressed for the given class of models.
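Questions 14 and 15 were left unanswered in this attempt; the sketches below are possible solutions, not the graded answers. For Question 14, the log-density is $\log p(x|\theta) = \log\frac{1}{4} - \frac{1}{2}\|x - \theta\|_1$, which can be evaluated for all K parameter candidates at once via broadcasting; the function name log_probability is an assumption of this sketch.

```python
import numpy as np

def log_probability(D, THETA):
    """Log-likelihood log p(D|theta) of the dataset for each candidate parameter.

    D: numpy array of shape (N, 2); THETA: numpy array of shape (K, 2).
    Returns an array of shape (K,), computed via broadcasting (no loops).
    """
    # (K, 1, 2) - (1, N, 2) -> (K, N, 2); L1 norm over the last axis -> (K, N)
    l1 = np.abs(THETA[:, None, :] - D[None, :, :]).sum(axis=2)
    # log p(x_k|theta) = log(1/4) - 0.5 * ||x_k - theta||_1, summed over the N points
    return (np.log(1 / 4) - 0.5 * l1).sum(axis=1)
```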
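For Question 15, a possible grid-search procedure reusing log_probability from the sketch above; the grid resolution (101 points per axis) is an arbitrary choice, and the exam's constraint $\|\theta_2 + \theta_1\| < 5$ is read here as $|\theta_1 + \theta_2| < 5$, since both parameters are scalars.

```python
import numpy as np

def grid_search_ml(D, num=101):
    """Maximum-likelihood grid search over [-5, 5]^2 subject to |theta1 + theta2| < 5."""
    grid = np.linspace(-5, 5, num)
    T1, T2 = np.meshgrid(grid, grid)
    THETA = np.column_stack([T1.ravel(), T2.ravel()])     # all candidate pairs, shape (K, 2)
    THETA = THETA[np.abs(THETA[:, 0] + THETA[:, 1]) < 5]  # enforce the constraint
    scores = log_probability(D, THETA)                    # log-likelihood of each candidate
    return THETA[np.argmax(scores)]                       # maximum-likelihood parameters
```

Regarding Question 16: the number of grid points grows exponentially with the number of parameters (with 101 values per axis, 10 parameters would already require $101^{10}$ evaluations); since the log-likelihood of this model class is concave and separable across coordinates, it can instead be maximized coordinate-wise (each coordinate's maximizer is the median of the corresponding data coordinates) or by gradient-based optimization.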