1. (1 point) Consider the following 2D dataset: Which of the following figures correspond to possible values that PCA may return for $u^{(1)}$ (the first eigenvector / first principal component)? Check all that apply (you may have to check more than one figure).

Answer: A, B

2. (1 point) Which of the following is a reasonable way to select the number of principal components $k$? (Recall that $n$ is the dimensionality of the input data and $m$ is the number of input examples.)

Answer: D

A. Choose the value of $k$ that minimizes the approximation error $\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)}-x^{(i)}_{\text{approx}}\|^2$.
B. Choose $k$ to be 99% of $n$ (i.e., $k = 0.99 \cdot n$, rounded to the nearest integer).
C. Choose $k$ to be the smallest value so that at least 1% of the variance is retained.
D. Choose $k$ to be the smallest value so that at least 99% of the variance is retained.

3. (1 point) Suppose someone tells you that they ran PCA in such a way that "95% of the variance was retained." What is an equivalent statement to this?

Answer: C

A. $\frac{\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)}-x^{(i)}_{\text{approx}}\|^2}{\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)}\|^2} \ge 0.05$
B. $\frac{\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)}-x^{(i)}_{\text{approx}}\|^2}{\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)}\|^2} \ge 0.95$
C. $\frac{\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)}-x^{(i)}_{\text{approx}}\|^2}{\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)}\|^2} \le 0.05$
D. $\frac{\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)}-x^{(i)}_{\text{approx}}\|^2}{\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)}\|^2} \le 0.95$

4. (1 point) Which of the following statements are true? Check all that apply.

Answer: B, D

A. Given only $z^{(i)}$ and $U_{\text{reduce}}$, there is no way to reconstruct any reasonable approximation to $x^{(i)}$.
B. Given input data $x \in \mathbb{R}^n$, it makes sense to run PCA only with values of $k$ that satisfy $k \le n$. (In particular, running it with $k = n$ is possible but not helpful, and $k > n$ does not make sense.)
C. PCA is susceptible to local optima; trying multiple random initializations may help.
D. Even if all the input features are on very similar scales, we should still perform mean normalization (so that each feature has zero mean) before running PCA.

5. (1 point) Which of the following are recommended applications of PCA? Select all that apply.

Answer: C, D

A. Preventing overfitting: Reduce the number of features (in a supervised learning problem), so that there are fewer parameters to learn.
B. To get more features to feed into a learning algorithm.
C. Data visualization: Reduce data to 2D (or 3D) so that it can be plotted.
D. Data compression: Reduce the dimension of your data, so that it takes up less memory / disk space.
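
To make the answers to questions 2 and 3 concrete, here is a minimal NumPy sketch, not taken from the quiz itself. It assumes the data is an m-by-n array with one example per row, and the helper names `choose_k` and `error_ratio` are hypothetical. The first function picks the smallest k that retains a given fraction of the variance; the second computes the ratio of squared projection error to total variation after projecting onto k components and reconstructing, which should stay at or below 1 minus the retained fraction.

```python
import numpy as np


def choose_k(X, retain=0.99):
    """Smallest k whose top-k components retain at least `retain` of the variance.

    Hypothetical helper for illustration; X is assumed to be (m, n).
    """
    # Mean normalization: give every feature zero mean.
    X_centered = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional to the
    # variance captured along each principal direction.
    s = np.linalg.svd(X_centered, compute_uv=False)
    retained = np.cumsum(s ** 2) / np.sum(s ** 2)
    # First position where the cumulative fraction reaches the threshold.
    k = int(np.searchsorted(retained, retain)) + 1
    # Guard against floating-point round-off when retain is very close to 1.
    return min(k, len(s))


def error_ratio(X, k):
    """Squared projection error divided by total variation, using k components."""
    X_centered = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    U_reduce = Vt[:k].T            # (n, k): top-k principal directions
    Z = X_centered @ U_reduce      # (m, k): compressed representation
    X_approx = Z @ U_reduce.T      # (m, n): reconstruction from Z and U_reduce
    return np.sum((X_centered - X_approx) ** 2) / np.sum(X_centered ** 2)


if __name__ == "__main__":
    # Synthetic data with very different variances per feature (an assumption
    # made only for this demo).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5)) * np.array([10.0, 5.0, 1.0, 0.1, 0.01])
    k = choose_k(X, retain=0.99)
    print("chosen k:", k)
    print("error ratio at k:", error_ratio(X, k))
```

The reconstruction step in `error_ratio` also shows why option A in question 4 is false: the compressed points and the projection matrix are enough to recover an approximation of the original data.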