Contents
1. Clustering
1.1 Unsupervised Learning: Introduction
1.2 K-Means Algorithm
1.3 Optimization Objective
1.4 Random initialization
1.5 Choosing the number of clusters
2. Motivation of Dimensionality Reduction
2.1 Motivation I: Data Compression
2.2 Motivation II: Data visualization
3. Principal Component Analysis
3.1 PCA problem formulation
3.2 PCA algorithm
4. Applying PCA
4.1 Reconstruction from compressed representation
4.2 Choosing the number of principal components: the value of K
4.3 Advice for applying PCA
1. Clustering
1.1 Unsupervised Learning: Introduction
Supervised Learning:
fig 1 (from Coursera Week 8, Unsupervised Learning: Introduction)
note: given a set of labeled training data, find a suitable hypothesis to fit it
Unsupervised Learning:
fig 2 (from Coursera Week 8, Unsupervised Learning: Introduction)
Applications of clustering:
- Market segmentation
- Social network analysis
- Organize computing clusters
- Astronomical data analysis
1.2 K-Means Algorithm
fig 3 (from Coursera Week 8, K-Means Algorithm)
K-Means Algorithm:
Input:
- K (number of clusters)
- Training set {x(1), x(2), ..., x(m)}, x(i) ∈ R^n

Randomly initialize K cluster centroids u(1), u(2), ..., u(K) ∈ R^n
Repeat {
  for i = 1 to m
    c(i) = index (from 1 to K) of the cluster centroid closest to x(i)
  for k = 1 to K
    u(k) = mean of the points assigned to cluster k
}
note: if a centroid u(k) ends up with no examples assigned to it, simply remove it
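A minimal Octave sketch of the loop above (run_kmeans is a hypothetical helper name, not code from the course; X is an m-by-n data matrix):

function [c, mu] = run_kmeans(X, K, max_iters)
  m = size(X, 1);
  idx = randperm(m);
  mu = X(idx(1:K), :);                       % initialize centroids to K distinct random examples
  c = zeros(m, 1);
  for iter = 1:max_iters
    for i = 1:m                              % cluster assignment step
      [~, c(i)] = min(sum((mu - X(i, :)).^2, 2));
    end
    for k = 1:K                              % move centroid step
      if any(c == k)                         % empty clusters are left in place here;
        mu(k, :) = mean(X(c == k, :), 1);    % per the note above, they could be removed instead
      end
    end
  end
end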
K-means for non-separated clusters:
fig 4 (from Coursera Week 8, K-Means Algorithm)
1.3 Optimization Objective
K-means optimization objective:
c(i) = index of the cluster (1, 2, ..., K) to which example x(i) is currently assigned
u(k) = cluster centroid k (u(k) ∈ R^n)
u_c(i) = cluster centroid of the cluster to which example x(i) has been assigned
Optimization objective (distortion):
J(c(1), ..., c(m), u(1), ..., u(K)) = (1/m) * sum_{i=1}^{m} || x(i) - u_c(i) ||^2
min J with respect to c(1), ..., c(m) and u(1), ..., u(K)
Randomly initialize K cluster centroids u(1), u(2), ..., u(K) ∈ R^n
Repeat {
  for i = 1 to m
    c(i) = index (from 1 to K) of the cluster centroid closest to x(i)
  (the cluster assignment step minimizes J with respect to c(1), ..., c(m), holding the centroids fixed)
  for k = 1 to K
    u(k) = mean of the points assigned to cluster k
  (the move centroid step minimizes J with respect to u(1), ..., u(K), holding the assignments fixed)
}
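With the assignments c and centroids mu in hand, the distortion J is one line of Octave (a sketch; mu(c, :) selects, for each example, the centroid of its assigned cluster):

J = sum(sum((X - mu(c, :)).^2)) / size(X, 1);   % J = (1/m) * sum_i ||x(i) - u_c(i)||^2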
1.4 Random initialization
Should have K < m.
Randomly pick K training examples.
Set u(1), ..., u(K) equal to these K examples.
Local optima:
fig 5 (from Coursera Week 8, Random initialization)
To reduce the chance of landing in a bad local optimum, try many random initializations:
For i = 1 to 100 {
  (1) Randomly initialize K-means
  (2) Run K-means; get c(1), ..., c(m), u(1), ..., u(K)
  (3) Compute the cost function (distortion) J
}
Pick the clustering that gave the lowest J.
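A sketch of this restart loop in Octave, reusing the hypothetical run_kmeans helper from section 1.2:

best_J = Inf;
for t = 1:100
  [c, mu] = run_kmeans(X, K, 10);                 % random initialization happens inside
  J = sum(sum((X - mu(c, :)).^2)) / size(X, 1);   % distortion for this run
  if J < best_J
    best_J = J;  best_c = c;  best_mu = mu;       % keep the clustering with the lowest J
  end
end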
1.5 Choosing the number of clusters
Choosing the value of K:
Elbow method:
fig 6 (from Coursera Week 8, Choosing the number of clusters)
Sometimes, you're running K-means to get clusters to use for some later / downstream purpose. Evaluate K-means based on a metric for how well it performs for that later purpose.
In short: a better way to choose K is to understand what you are clustering for, and pick the number of clusters that best serves that downstream purpose of running K-means.
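A minimal Octave sketch of the elbow method from fig 6, again assuming the hypothetical run_kmeans helper (for a clean curve, each K should itself use the best of several random initializations):

Js = zeros(1, 10);
for K = 1:10
  [c, mu] = run_kmeans(X, K, 10);
  Js(K) = sum(sum((X - mu(c, :)).^2)) / size(X, 1);
end
plot(1:10, Js, '-o');
xlabel('K (number of clusters)');
ylabel('distortion J');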
2. Motivation of Dimensionality Reduction
2.1 Motivation I: Data Compression
fig 7 (from Coursera Week 8, Motivation I: Data Compression)
2.2 Motivation II: Data visualization
3. Principal Component Analysis
3.1 PCA problem formulation
PCA problem formulation:
fig 8 (from Coursera Week 8, PCA problem formulation)
PCA finds a low-dimensional surface such that the total squared distance from the sample points to that surface is minimized; equivalently, it finds a set of vectors and projects the original data onto the subspace spanned by them.
note: perform mean normalization and feature scaling before running PCA
Reduce from 2D to 1D: find a direction (a vector u(1)) onto which to project the data so as to minimize the projection error.
Either u(1) or -u(1) is fine (the sign of the direction does not matter).
Reduce from n-D to k-D: find k vectors u(1), ..., u(k) onto which to project the data, so as to minimize the projection error.
PCA is not linear regression:
fig 9 (from Coursera Week 8, PCA problem formulation)
left: linear regression (minimizes vertical distances to the line); right: PCA (minimizes orthogonal projection distances)
3.2 PCA algorithm
Data preprocessing:
Training set: x(1), x(2), ..., x(m)
Preprocessing: mean normalization (and feature scaling if features are on different scales)
PCA algorithm:
Reduce data from n-D to k-D:
(1) Compute the "covariance matrix":
    Sigma = (1/m) * sum_{i=1}^{m} x(i) * x(i)^T    (an n-by-n matrix)
(2) Compute its "eigenvectors":
    [U, S, V] = svd(Sigma)
    U is an n-by-n matrix whose columns are the eigenvectors
    Take the first k columns of U to form U_reduce, and use it to project the data
(3) z(i) = U_reduce^T * x(i)
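Putting the three steps together, a minimal Octave sketch (assumes X is m-by-n and already mean-normalized / feature-scaled, and that k has been chosen):

m = size(X, 1);                % number of training examples
Sigma = (1 / m) * X' * X;      % covariance matrix, n-by-n
[U, S, V] = svd(Sigma);        % columns of U are the eigenvectors of Sigma
U_reduce = U(:, 1:k);          % keep the first k columns
Z = X * U_reduce;              % row i of Z is z(i)' = (U_reduce^T * x(i))'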
4. Applying PCA
4.1 Reconstruction from compressed representation
z(i) = U_reduce^T * x(i)
x_approx(i) = U_reduce * z(i)  (x_approx(i) ≈ x(i) when most of the variance is retained)
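In vectorized Octave form, continuing the sketch from section 3.2:

X_approx = Z * U_reduce';      % row i is x_approx(i)' = (U_reduce * z(i))'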
4.2 Choosing the number of principal components: the value of K
Average squared projection error: (1/m) * sum_{i=1}^{m} || x(i) - x_approx(i) ||^2
Total variation in the data: (1/m) * sum_{i=1}^{m} || x(i) ||^2
Typically choose k to be the smallest value so that:
(average squared projection error) / (total variation) <= 0.01
-----> 99% of variance is retained (or 0.10 for 90%)
Algorithm:
fig 10 (from Coursera Week 8, Choosing the number of principal components)
Using [U, S, V] = svd(Sigma), this check is cheap: with the singular values S_11, ..., S_nn on the diagonal of S,
variance retained = (sum_{i=1}^{k} S_ii) / (sum_{i=1}^{n} S_ii)
Pick the smallest k for which this ratio is >= 0.99. The retained-variance ratio is also a useful single number for characterizing how well PCA is performing.
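A minimal Octave sketch of this selection rule, using the diagonal of S from the svd call in section 3.2:

sv = diag(S);                          % singular values S_11, ..., S_nn
for k = 1:length(sv)
  if sum(sv(1:k)) / sum(sv) >= 0.99    % at least 99% of variance retained
    break;                             % k is now the smallest such value
  end
end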
4.3 Advice for applying PCA
note: U_reduce is computed on the training set only, and is then applied to the CV and test sets (see the sketch at the end of this section)
Application of PCA:
- Compression: reduce memory and speed up learning algorithm
- Visualization
Bad use of PCA: trying to prevent overfitting by using z instead of x to reduce the number of features to k < n, on the theory that fewer features make overfitting less likely.
****use regularization instead**** (PCA throws away information without ever looking at the labels y, so it is not a substitute for regularization)
PCA is sometimes used where it shouldn't be:
Design ML system:
- Get training set
- Run PCA, x ----> z
- Train LR on (z(i), y(i))
- Test: map x(test) to z(test) with the same U_reduce and run the classifier on the test set
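A sketch of that mapping in Octave (variable names X_train, X_test, y_train are illustrative): the PCA parameters are fitted on the training set only and then reused.

Sigma = (1 / size(X_train, 1)) * X_train' * X_train;   % covariance from training data only
[U, S, V] = svd(Sigma);
U_reduce = U(:, 1:k);
Z_train = X_train * U_reduce;   % train the classifier on (Z_train, y_train)
Z_test  = X_test  * U_reduce;   % reuse the same mapping; never re-fit PCA on test data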
Before implementing PCA, first try running whatever you want to do with the original/raw data. Only if that doesn't do what you want should you then implement PCA.