MTH6101: Introduction to Machine Learning

Main Examination period 2020 – May – Semester B

Question 1 [10 marks].

(a) Describe the problem of dimensionality reduction in unsupervised learning. [6]

(b) List two techniques for this problem. [4]

Question 2 [29 marks]. As part of the Karhunen-Loève expansion of the covariance matrix of a centered data set X with n = 100 observations on p = 5 variables, the following matrix was computed.

(a) Complete the following table and determine the number of components to retain using an 80% variance threshold. [12]
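The 80% rule retains the smallest number of components whose cumulative proportion of variance reaches the threshold. A minimal sketch of the computation, using hypothetical eigenvalues since the matrix Λ is not reproduced here:

```python
# Sketch: choosing the number of components with an 80% variance threshold.
# The eigenvalues below are hypothetical (Lambda is not reproduced here),
# sorted in decreasing order as in a Karhunen-Loeve expansion.
eigenvalues = [4.0, 2.5, 1.5, 1.2, 0.8]

total = sum(eigenvalues)
cumulative = 0.0
for k, lam in enumerate(eigenvalues, start=1):
    cumulative += lam
    proportion = cumulative / total
    print(f"component {k}: variance = {lam/total:.3f}, cumulative = {proportion:.3f}")
    if proportion >= 0.80:
        print(f"retain k = {k} components (first cumulative proportion >= 0.80)")
        break
```

With these hypothetical eigenvalues the cumulative proportions are 0.40, 0.65 and 0.80, so three components would be retained.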

(b) Using the matrix Λ above, determine if the data was scaled to compute the covariance matrix and briefly explain why. [4]

(c) Write (do not derive) the formula that links Λ with D. Recall that Λ is the eigenvalue matrix of the Karhunen-Loève decomposition of the covariance matrix Σ, and that D is the diagonal matrix of singular values from the singular value decomposition of the matrix X. [6]

(d) Use the formula you wrote to determine numerically the singular values di of the singular value decomposition of the data matrix X. [7]
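For a centered data matrix X with Σ = XᵀX/(n − 1), the eigenvalues λᵢ of Σ and the singular values dᵢ of X are linked by λᵢ = dᵢ²/(n − 1), i.e. dᵢ = √((n − 1)λᵢ). A sketch of the numerical step, with hypothetical eigenvalues since Λ is not reproduced here:

```python
import math

# Relation d_i = sqrt((n - 1) * lambda_i), assuming Sigma = X^T X / (n - 1)
# for a centered X. The eigenvalues are hypothetical (Lambda is not shown here).
n = 100
eigenvalues = [4.0, 2.5, 1.5, 1.2, 0.8]  # hypothetical lambda_i

singular_values = [math.sqrt((n - 1) * lam) for lam in eigenvalues]
for lam, d in zip(eigenvalues, singular_values):
    print(f"lambda = {lam:.2f} -> d = {d:.3f}")
```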

Question 3 [20 marks].

(a) Explain what is meant by single linkage in agglomerative clustering. [3]

(b) Consider the following distance matrix

where rows and columns are indexed as usual by individuals.

(i) If agglomerative single linkage clustering were to be performed, which individuals would be merged first and why? [4]

(ii) Explain why in the first step the result is the same regardless of the linkage used. [3]

(iii) Assume you are at a step in agglomerative clustering in which individuals 1, 2, 3 belong to one cluster and individuals 4, 5 belong to another cluster. Using single linkage, find the distance between these two clusters. [5]

(iv) Using average linkage, give the distance between the clusters in part (b)(iii). [5]
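Single linkage takes the smallest distance between any pair of individuals in the two clusters, while average linkage takes the mean over all such pairs. A sketch of both computations; the exam's distance matrix is not reproduced here, so the distances below are hypothetical:

```python
# Sketch: single vs average linkage between clusters {1,2,3} and {4,5}.
# d[(i, j)] is a hypothetical distance between individuals i and j (i < j);
# the exam's actual distance matrix is not reproduced here.
d = {(1, 2): 2.0, (1, 3): 6.0, (1, 4): 10.0, (1, 5): 9.0,
     (2, 3): 5.0, (2, 4): 9.0,  (2, 5): 8.0,
     (3, 4): 4.0, (3, 5): 5.0,  (4, 5): 3.0}

def dist(i, j):
    return d[(min(i, j), max(i, j))]

A, B = [1, 2, 3], [4, 5]
cross = [dist(i, j) for i in A for j in B]  # all between-cluster distances

single = min(cross)                 # single linkage: minimum cross distance
average = sum(cross) / len(cross)   # average linkage: mean cross distance
print(single, average)
```

With these hypothetical distances, single linkage gives 4.0 and average linkage gives 7.5.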

Question 4 [23 marks]. The following data are the results of a classification analysis. The output includes the validation output Ytrue and the classifications obtained with three trained classification algorithms termed Y1, Y2 and Y3.

##       Ytrue Y1 Y2 Y3
##  [1,]     1  1  0  1
##  [2,]     0  0  1  0
##  [3,]     1  1  0  0
##  [4,]     0  0  1  1
##  [5,]     0  1  1  0
##  [6,]     0  0  1  1
##  [7,]     0  0  0  0
##  [8,]     0  0  1  0
##  [9,]     0  0  0  0
## [10,]     1  1  0  0
## [11,]     1  1  0  1
## [12,]     1  1  0  0

(a) Complete the following confusion matrices. [9]

(b) Compute the False Positive Rate (FPR) and True Positive Rate (TPR) for each confusion matrix, completing the table below. [6]

(c) Plot your results on the ROC graph below and briefly comment on the performance of the classifiers. Which is the best classifier? [8]
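The confusion-matrix counts and rates asked for above can be sketched directly from the Ytrue, Y1, Y2, Y3 columns printed in the question, with TPR = TP/(TP + FN) and FPR = FP/(FP + TN):

```python
# Sketch: confusion-matrix counts, TPR and FPR for each classifier,
# using the Ytrue, Y1, Y2, Y3 columns from the printed output.
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1]
preds = {
    "Y1": [1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1],
    "Y2": [0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0],
    "Y3": [1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0],
}

rates = {}
for name, y_pred in preds.items():
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    tpr = tp / (tp + fn)  # true positive rate (sensitivity)
    fpr = fp / (fp + tn)  # false positive rate
    rates[name] = (tpr, fpr)
    print(f"{name}: TP={tp} FP={fp} FN={fn} TN={tn}  TPR={tpr:.3f} FPR={fpr:.3f}")
```

On the ROC graph each classifier is a single point (FPR, TPR); points toward the top-left corner indicate better performance.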

Question 5 [18 marks].

(a) The Lasso criterion is L = (1/2)||Y − Xβ||₂² + λ||β||₁. Explain what the components of the Lasso criterion are. [3]

(b) Explain what the solutions to the lasso are as λ → 0, and also as λ → ∞. [2]

(c) The following table contains output from a lasso fit to a model with d = 3 variables and n = 20 observations. For each row in the table, compute s, the proportion of shrinkage defined as s = s(λ) = ||β(λ)||₁ / max_λ ||β(λ)||₁, and write its value in the correct position to complete the table. [6]
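The shrinkage proportion for each row is the L1 norm of that row's coefficient vector divided by the largest L1 norm over the path (attained as λ → 0). A minimal sketch with hypothetical coefficient vectors, since the exam's table is not reproduced here:

```python
# Sketch: s(lambda) = ||beta(lambda)||_1 / max_lambda ||beta(lambda)||_1.
# These coefficient vectors are hypothetical (the exam's table is not shown),
# ordered from large lambda (heavy shrinkage) down to lambda -> 0.
betas = [
    [0.0, 0.0, 0.0],   # large lambda: all coefficients shrunk to zero
    [0.5, 0.0, 0.0],
    [1.0, 0.4, 0.0],
    [1.2, 0.8, 0.3],   # lambda -> 0: largest ||beta||_1
]

l1_norms = [sum(abs(b) for b in beta) for beta in betas]
max_norm = max(l1_norms)
s_values = [norm / max_norm for norm in l1_norms]
print(s_values)
```

Note that s runs from 0 (everything shrunk away) to 1 (no shrinkage), which is the horizontal axis conventionally used for lasso path plots.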

(d) Using your completed information, add the lasso paths to the following plot. In your plot, label each path according to its corresponding variable. [7]
