Homework Assignment 7
COGS 118A: Supervised Machine Learning Algorithms
Due: May 20th, 2022, 11:59pm (Pacific Time).

Instructions: Your math can be handwritten. In that case, use a scanner app to take photos of the pages and turn them into a PDF; Adobe, Evernote, Microsoft, and many others offer free scanner apps. Alternatively, your math may be done in LaTeX or in a Jupyter Notebook using LaTeX markdown and from there turned into a PDF. All of these different sources must be stitched together into a single PDF file; a PDF merge tool (e.g. smallpdf, combinepdf) can do this, among many other tools. Make sure to show the steps of your solution, not just the final answer.

Late Policy: You will receive at most 75 percent credit for any late assignment. Late submissions will not be accepted after 1 week.

Grade: out of 63 points

1 (12 points) Conceptual Questions

1. In Figure 1, we use a black line to mark the decision boundary of a linear SVM classifier parameterized by weight vector $w$ and bias $b$. If we start to increase the margin of the classifier, what would happen to $w$?

Figure 1: A linear SVM classifier with data points.

A. The magnitude of $w$ does not change.
B. $w$ will have smaller magnitude.
C. $w$ will have greater magnitude.
D. None of the above.

2. Which one below best describes what support vectors are:

A. The decision boundary.
B. The positive and the negative planes.
C. The training samples that determine the positive and the negative planes.
D. The test samples that determine the positive and the negative planes.

3. Consider a dataset $S = \{(x_i, y_i),\ i = 1, 2, \dots, n\}$ where $x = [x_1, x_2]$ and $y \in \{-1, +1\}$. Here we try to learn a linear SVM classifier parameterized by weight vector $w$ and bias $b$ to fit the dataset $S$. Which one is the appropriate loss function for the linear SVM classifier from the options below:

A. $L(w, b) = \frac{2}{\sqrt{w^\top w}} - C \sum_{i=1}^{n} \max\left(0,\ 1 - y_i (w^\top x_i + b)\right)$
B. $L(w, b) = \frac{1}{2} w^\top w + C \sum_{i=1}^{n} \max\left(0,\ 1 - y_i (w^\top x_i + b)\right)$
C. $L(w, b) = \frac{2}{\sqrt{w^\top w}} + C \sum_{i=1}^{n} \max\left(0,\ 1 - y_i (w^\top x_i + b)\right)$
D. $L(w, b) = \frac{1}{2} w^\top w - C \sum_{i=1}^{n} \max\left(0,\ 1 - y_i (w^\top x_i + b)\right)$

2 (16 points) K Nearest Neighbors

Consider a training dataset $S_{training} = \{(x_i, y_i),\ i = 1, 2, \dots, 8\}$ where each data point $(x, y)$ has a feature vector $x = [x_1, x_2]$ and the corresponding label $y \in \{-1, +1\}$. The points with their corresponding labels are shown in the figure below; the index $i$ of each training example $x_i$ is given in bold near the point. You are asked to predict the label of a point $x_{pred} = [1, 0]$, shown in the figure as a triangle ▲, using the k nearest neighbors (k-NN) method under Euclidean distance.

1. (4 points) List the $x$ vectors of all the data points in $S_{training}$ and their corresponding labels.

2. (4 points) Determine the predicted label for $x_{pred}$ using k-NN with different values of $k$ (a short sketch of the prediction procedure follows this problem):
(a) $k = 1$.
(b) $k = 5$.

3. (8 points) For k-NN and SVMs:
(a) Describe the differences between the decision boundaries of the two algorithms.
(b) When might an SVM be more appropriate than k-NN, and vice versa?
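For reference, k-NN prediction under Euclidean distance amounts to sorting the training points by their distance to $x_{pred}$ and taking a majority vote over the labels of the $k$ closest points. The sketch below is a minimal NumPy illustration of that procedure; the coordinates in `train_X` and the labels in `train_y` are hypothetical placeholders rather than the points from the figure, so only the mechanics carry over to this problem.

```python
import numpy as np

def knn_predict(train_X, train_y, x_query, k):
    """Predict a +1/-1 label by majority vote over the k nearest
    training points under Euclidean distance."""
    dists = np.linalg.norm(train_X - x_query, axis=1)  # distance to every training point
    nearest = np.argsort(dists)[:k]                    # indices of the k closest points
    vote = train_y[nearest].sum()                      # sum of +1/-1 labels
    return +1 if vote > 0 else -1

# Hypothetical placeholder data -- NOT the eight points from the figure above.
train_X = np.array([[0.0, 1.0], [2.0, 2.0], [1.0, 3.0], [3.0, 0.0],
                    [0.0, -1.0], [2.0, -2.0], [-1.0, 0.0], [3.0, 2.0]])
train_y = np.array([+1, +1, +1, +1, -1, -1, -1, -1])

x_pred = np.array([1.0, 0.0])
for k in (1, 5):
    print(f"k = {k}: predicted label = {knn_predict(train_X, train_y, x_pred, k)}")
```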
3 (15 points) Support Vector Machine

Consider a dataset $\{(x_i, y_i),\ i = 1, 2, \dots, 10\}$ where $x = [x_1, x_2]$ and $y \in \{-1, +1\}$, which is shown in Figure 2 below. Suppose we have trained a support vector machine (SVM) classifier on the dataset, which has the decision boundary $w^\top x + b = 0$. The SVM classifier is optimized as follows:

Find: $\operatorname{argmin}_{w} \frac{1}{2} \|w\|_2^2$

Subject to, $\forall i$:
$w^\top x_i + b \ge +1$, if $y_i = +1$,
$w^\top x_i + b \le -1$, if $y_i = -1$.

We define $w^\top x + b = +1$ as the positive plane and $w^\top x + b = -1$ as the negative plane. After training the SVM classifier using the gradient descent method for several iterations to reach an intermediate state, we have obtained the positive plane and the negative plane, which are indicated as solid lines in Figure 2 below.

Figure 2: The positive plane, the negative plane and the data points.

1. (2 points) Draw the decision boundary of the SVM classifier in Figure 2.

2. (8 points) Calculate $w$ and $b$ from the positive plane and the negative plane. Hint: the positive plane passes through $x_4 = (2, 2)$ and $x_8 = (3, 4)$; the negative plane passes through $x_3 = (2, 0)$ and $x_7 = (3, 2)$.

3. (5 points) Calculate the size of the margin based on your $w$. Hint: the margin can be obtained from $\frac{2}{\|w\|_2}$.

4 (10 points) SVM: Gradient

Given a training dataset $S_{training} = \{(x_i, y_i),\ i = 1, \dots, n\}$, we wish to optimize the loss $L(w, b)$ of a linear SVM classifier:

$L(w, b) = \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^{n} \left(1 - y_i (w^\top x_i + b)\right)_+$   (1)

where $(z)_+ = \max(0, z)$ is called the rectifier function and $C$ is a scalar constant. The optimal weight vector $w^*$ and bias $b^*$ used to build the SVM classifier are defined as follows:

$w^*, b^* = \operatorname{argmin}_{w, b} L(w, b)$   (2)

In this problem, we attempt to obtain the optimal parameters $w^*$ and $b^*$ by using a standard gradient descent algorithm (a small numerical sketch of such a procedure appears at the end of this assignment).

Hint: To find the derivative of $L(w, b)$, please consider two cases: (a) $1 - y_i (w^\top x_i + b) \ge 0$, (b) $1 - y_i (w^\top x_i + b) < 0$.

1. Find the derivative $\frac{\partial L(w, b)}{\partial w}$.

2. Find the derivative $\frac{\partial L(w, b)}{\partial b}$.

5 (10 points) SVM: Margin

As shown in Figure 3, we have the decision boundary (marked as a black line) defined as Eq. 3, given $w, b$:

$w^\top x + b = 0$   (3)

In parallel to the decision boundary, we have the positive plane (marked as a red line) defined as Eq. 4 and the negative plane (marked as a blue line) defined as Eq. 5:

$w^\top x + b = +1$   (4)
$w^\top x + b = -1$   (5)

We pick an arbitrary point $x^-$ on the negative plane and draw a purple line that passes through $x^-$ and is perpendicular to the negative plane. The intersection between this purple line and the positive plane is denoted $x^+$. Thus, we have the following Eq. 6 that indicates the relation between $x^+$ and $x^-$:

$x^+ = x^- + \lambda w$   (6)

where $\lambda \in \mathbb{R}$ is an undetermined scalar. The margin $M$ is defined as the distance between the positive and the negative planes, which can be calculated from Eq. 7:

$M = \|x^+ - x^-\|_2 = \sqrt{\langle \lambda w, \lambda w \rangle}$   (7)

Figure 3: The decision boundary, the positive plane and the negative plane.

Please derive the following according to Eq. 7:

$M = \frac{2}{\sqrt{\langle w, w \rangle}}$

Hint: You can first represent $\lambda$ in terms of $w$ by using Eqs. 4, 5, and 6.
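As a practical companion to Problems 4 and 5, the sketch below runs a minimal (sub)gradient-descent loop on the hinge loss of Eq. 1 and then reports the margin $\frac{2}{\|w\|_2}$. It is an illustration under stated assumptions, not the requested derivation: the two cases from the hint in Problem 4 appear as the `active` mask, and the toy data, the value of $C$, the learning rate, and the iteration count are all hypothetical placeholders.

```python
import numpy as np

def svm_subgradient_step(w, b, X, y, C, lr):
    """One subgradient step on L(w, b) = 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b))."""
    scores = X @ w + b
    active = (1 - y * scores) >= 0                 # case (a): the hinge term contributes a gradient
    # Case (b): inactive points contribute nothing, so only rows selected by `active` appear below.
    grad_w = w - C * (y[active][:, None] * X[active]).sum(axis=0)
    grad_b = -C * y[active].sum()
    return w - lr * grad_w, b - lr * grad_b

# Hypothetical, linearly separable toy data -- NOT the dataset from Figure 2.
X = np.array([[1.0, 2.0], [2.0, 3.0], [0.0, -1.0], [1.0, -2.0]])
y = np.array([+1, +1, -1, -1])

w, b = np.zeros(2), 0.0
for _ in range(2000):                              # fixed step count, purely for illustration
    w, b = svm_subgradient_step(w, b, X, y, C=1.0, lr=0.01)

print("w =", w, ", b =", b)
print("margin = 2 / ||w||_2 =", 2 / np.linalg.norm(w))
```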