Andrew Ng Machine Learning: Week 8
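Preface
NetEase Cloud Classroom (bilingual subtitles, smooth playback): https://study.163.com/course/courseMain.htm?courseId=1004570029
Coursera: https://www.coursera.org/learn/machine-learning
I am a beginner. I watch the lectures on NetEase Cloud Classroom first, then do the assignments on Coursera, and keep this blog as a record. All images referenced in these posts are screenshots from the course.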
Unsupervised Learning
1. Model description
Tips: the training set contains no labels at all
2. K-Means algorithm
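Tips: the procedure is:
(1) Choose the number of clusters K.
(2) Randomly pick K training examples as the initial cluster centroids.
(3) Assign each point to its nearest centroid, so the points closest to centroid i form group i.
(4) Move each centroid to the mean of the points in its group.
(5) Repeat steps 3 and 4 until the assignments stop changing; both steps lower the cost function J (the average squared distance from each point to its assigned centroid).
(6) Repeat from step 2 with different random initializations and keep the clustering with the lowest J.
(7) Repeat from step 1 with other values of K when the number of clusters is not known in advance.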
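The K-means loop described here can be sketched in a few lines of NumPy. This is only a minimal illustration, not the course's Octave code: the function names (`assign_clusters`, `distortion`, `kmeans_once`, `kmeans`), the convergence check, and the toy data are all assumptions made for the example.

```python
import numpy as np

def assign_clusters(X, centroids):
    # Distance from every point to every centroid, shape (m, k).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def distortion(X, centroids, labels):
    # Cost J: mean squared distance from each point to its assigned centroid.
    return float(np.mean(np.sum((X - centroids[labels]) ** 2, axis=1)))

def kmeans_once(X, k, rng, max_iters=100):
    # Initialize the centroids as k distinct training examples.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iters):
        labels = assign_clusters(X, centroids)       # cluster assignment step
        new_centroids = centroids.copy()
        for j in range(k):                           # move-centroid step
            members = X[labels == j]
            if len(members) > 0:                     # keep an empty cluster's centroid as is
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):    # centroids stopped moving
            break
        centroids = new_centroids
    labels = assign_clusters(X, centroids)
    return centroids, labels, distortion(X, centroids, labels)

def kmeans(X, k, n_init=50, seed=0):
    # Try several random initializations and keep the run with the lowest J.
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(X, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])

# Toy data (made up for illustration): two well-separated groups of three points.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.0],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centroids, labels, J = kmeans(X, k=2)
```

Returning J from every run is what makes the comparison across random initializations possible; the same value can be compared across different K when experimenting with the number of clusters.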
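Dimensionality Reduction
1. Goal I: Data compression
2. Goal II: Visualization
Tips: reduce high-dimensional data to 2D or 3D so that it can be plotted
3. Principal Component Analysis (PCA)
(1) Problem formulation and data preprocessing
Tips: before running PCA, apply mean normalization (and feature scaling when the features have very different ranges). PCA then looks for k direction vectors (k=1 gives a line, k=2 gives a plane) onto which to project the data so that the projection error, i.e. the distance from each point to that line or plane, is as small as possible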
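The preprocessing step can be sketched as follows. This is an illustrative assumption of how one might do it in NumPy; the helper name `normalize_features` and the sample values are made up for the example.

```python
import numpy as np

def normalize_features(X):
    # Mean normalization: subtract each feature's mean.
    mu = X.mean(axis=0)
    # Feature scaling: divide by each feature's standard deviation.
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0   # avoid dividing by zero for constant features
    return (X - mu) / sigma, mu, sigma

# Two features with very different ranges (made-up values).
X = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])
X_norm, mu, sigma = normalize_features(X)
```

Keeping `mu` and `sigma` around matters because the same shift and scale must be applied to any new data that is later projected with the same principal components.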
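(2) Computation steps
1. Compute the covariance matrix
Tips: Σ (sigma) is an n×n matrix
2. SVD
Tips: U is an n×n matrix
3. Ureduce
Tips: Ureduce is an n×k matrix (the first k columns of U)
4. z(i)
Tips: z(i) = Ureduce^T × x(i), where z(i) is a k-dimensional vector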
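The four computation steps can be put together in a short NumPy sketch. It assumes the rows of X are already mean-normalized examples; the function name `pca_project` and the toy data are assumptions for illustration, and the course itself uses Octave rather than Python.

```python
import numpy as np

def pca_project(X, k):
    m, n = X.shape
    # Step 1: covariance matrix, n x n (X is assumed to be mean-normalized).
    Sigma = (X.T @ X) / m
    # Step 2: SVD. U is n x n; S holds the singular values in descending order.
    U, S, _ = np.linalg.svd(Sigma)
    # Step 3: keep the first k columns of U, giving an n x k matrix.
    U_reduce = U[:, :k]
    # Step 4: row i of Z is z(i) = U_reduce^T x(i), a k-dimensional vector.
    Z = X @ U_reduce
    return Z, U_reduce, S

# Toy data lying exactly on the line y = x, already centered at the origin.
X = np.array([[-2.0, -2.0], [-1.0, -1.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
Z, U_reduce, S = pca_project(X, k=1)
```

For this toy data the single retained direction is the diagonal of the plane (up to sign), so one coordinate per point is enough to describe the data.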
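(3) Choosing the number of principal components (i.e. the dimension k)
Tips: the numerator is the average squared projection error (the squared distance between each point and its projection); the denominator is the average squared length of the data vectors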
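Written out, the ratio this tip describes takes the form below. The symbol x_approx^(i) for the projection of x^(i) is notation assumed here, and the 0.01 threshold corresponds to retaining 99% of the variance.

```latex
\frac{\dfrac{1}{m}\sum_{i=1}^{m}\left\lVert x^{(i)} - x_{\mathrm{approx}}^{(i)}\right\rVert^{2}}
     {\dfrac{1}{m}\sum_{i=1}^{m}\left\lVert x^{(i)}\right\rVert^{2}} \le 0.01
```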
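Tips: choose the smallest k for which the sum of the first k diagonal entries of the matrix S returned by SVD (counting from the top-left toward the bottom-right), divided by the sum of all its diagonal entries, is ≥ 0.99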
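This rule for picking k can be sketched as below, assuming the diagonal of S is available as a 1-D array in descending order (which is how `np.linalg.svd` returns it). The function name `choose_k` and the sample values are made up for illustration.

```python
import numpy as np

def choose_k(S, retain=0.99):
    # Fraction of variance retained when keeping the first 1, 2, ..., n components.
    ratios = np.cumsum(S) / np.sum(S)
    # Index of the first ratio that reaches the threshold, converted to a count.
    k = int(np.searchsorted(ratios, retain)) + 1
    return min(k, len(S))   # guard against floating-point round-off at the end

# Made-up diagonal of S: cumulative ratios are 0.6, 0.9, 0.995, 1.0.
S = np.array([6.0, 3.0, 0.95, 0.05])
k = choose_k(S)
```

Because the ratios come from a single SVD, different thresholds (for example 0.95 instead of 0.99) can be tried without rerunning PCA.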
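(4) Advice for applying PCA
Tips: to speed up a supervised learning algorithm, take the inputs x out of the labeled training set, reduce them to z with PCA, and train on the new pairs (z, y)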
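A sketch of this workflow is shown below. It goes slightly beyond the note by also projecting a held-out set: the course recommends learning the mapping from the training inputs only and then reusing it on cross-validation and test data. The helper names `fit_pca` and `apply_pca` and the random data are assumptions for illustration.

```python
import numpy as np

def fit_pca(X_train, k):
    # Learn the mapping from the training inputs only.
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    Sigma = (Xc.T @ Xc) / len(Xc)
    U, _, _ = np.linalg.svd(Sigma)
    return mu, U[:, :k]

def apply_pca(X, mu, U_reduce):
    # Reuse the training mean and directions on any data set.
    return (X - mu) @ U_reduce

# Random stand-in data: 100 training and 20 test examples with 5 features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))
X_test = rng.normal(size=(20, 5))

mu, U_reduce = fit_pca(X_train, k=2)
Z_train = apply_pca(X_train, mu, U_reduce)   # train the learner on (Z_train, y_train)
Z_test = apply_pca(X_test, mu, U_reduce)     # evaluate it on Z_test
```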
Tips: before reaching for PCA, first run the algorithm on the raw data
Quiz
1.Question 1
For which of the following tasks might K-means clustering be a suitable algorithm? Select all that apply.
Answer: AB
2.Question 2
Answer: C
3.Question 3
K-means is an iterative algorithm, and two of the following steps are repeatedly carried out in its inner-loop. Which two?
Answer: BC
4.Question 4
Suppose you have an unlabeled dataset {x(1),…,x(m)}. You run K-means with 50 different random initializations, and obtain 50 different clusterings of the data. What is the recommended way for choosing which one of these 50 clusterings to use?
Answer: A
5.Question 5
Which of the following statements are true? Select all that apply.
Answer: BC
6.Question 6
Consider the following 2D dataset:
Which of the following figures correspond to possible values that PCA may return for u(1) (the first eigenvector / first principal component)? Check all that apply (you may have to check more than one figure).
Answer: BD
7.Question 7
Which of the following is a reasonable way to select the number of principal components k?
(Recall that n is the dimensionality of the input data and m is the number of input examples.)
Answer: A
8.Question 8
Suppose someone tells you that they ran PCA in such a way that “95% of the variance was retained.” What is an equivalent statement to this?
Answer: D
9.Question 9
Which of the following statements are true? Check all that apply.
Answer: AB
10.Question 10
Which of the following are recommended applications of PCA? Select all that apply.
Answer: AB