Machine Learning Series: Coursera Week 8 Unsupervised Learning

Contents

1. Clustering

1.1 Unsupervised Learning: Introduction

1.2 K-Means Algorithm

1.3 Optimization Objective

1.4 Random initialization

1.5 Choosing the number of clusters

2. Motivation of Dimensionality Reduction

2.1 Motivation I: Data Compression

2.2 Motivation II: Data visualization

3. Principal Component Analysis

3.1 PCA problem formulation

3.2 PCA algorithm

4. Applying PCA

4.1 Reconstruction from compressed representation

4.2 Choosing the number of principal components: the value of k

4.3 Advice for applying PCA


1. Clustering

1.1 Unsupervised Learning: Introduction

Supervised Learning:

fig 1

(Source: Coursera Week 8, Unsupervised Learning: Introduction)

Note: given a set of labeled training data, the task is to find an appropriate hypothesis that fits it.

Unsupervised Learning:

fig 2

(Source: Coursera Week 8, Unsupervised Learning: Introduction)

Applications of clustering:

- Market segmentation

- Social network analysis

- Organize computing clusters

- Astronomical data analysis

1.2 K-Means Algorithm

fig 3

(Source: Coursera Week 8, K-Means Algorithm)

K-Means Algorithm:

Input:

- K (number of clusters)

- Training set {x(1), x(2), ..., x(m)}, x(i) ∈ R^n

Randomly initialize K cluster centroids u(1), u(2), ..., u(K) ∈ R^n
Repeat {
		for i = 1 to m
			c(i) := index (from 1 to K) of the cluster centroid closest to x(i)
		for k = 1 to K
			u(k) := mean of the points assigned to cluster k
	}

Note: if a centroid u(k) ends up with no points assigned to it, it is simply removed (or, alternatively, re-initialized randomly).
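As a concrete illustration of the loop above, here is a minimal NumPy sketch (the function name run_kmeans and the max_iters parameter are illustrative, not from the course):

    import numpy as np

    def run_kmeans(X, K, max_iters=10):
        """Minimal K-means. X is an (m, n) data matrix, K the number of clusters."""
        m, n = X.shape
        # Random initialization: pick K distinct training examples as the initial centroids.
        centroids = X[np.random.choice(m, K, replace=False)].astype(float)
        for _ in range(max_iters):
            # Cluster assignment step: c[i] = index of the centroid closest to x(i).
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)  # (m, K)
            c = np.argmin(dists, axis=1)
            # Move centroid step: each centroid becomes the mean of its assigned points.
            for k in range(K):
                assigned = X[c == k]
                if len(assigned) > 0:
                    centroids[k] = assigned.mean(axis=0)
                # If a centroid gets no points, it can be removed or re-initialized here.
        return c, centroids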

K-means for non-separated clusters:

fig 4

(Source: Coursera Week 8, K-Means Algorithm)

1.3 Optimization Objective

K-means optimization objective:

c(i) = index of the cluster (1, 2, ..., K) to which example x(i) is currently assigned.

u(k) = cluster centroid k (u(k) ∈ R^n).

u_c(i) = cluster centroid of the cluster to which example x(i) has been assigned.

Optimization objective (distortion):

J(c(1), ..., c(m), u(1), ..., u(K)) = (1/m) * sum_{i=1..m} || x(i) - u_c(i) ||^2

The two steps of the algorithm each minimize J with respect to a different set of variables:

Randomly initialize K cluster centroids u(1), u(2), ..., u(K) ∈ R^n
Repeat {
		for i = 1 to m
			c(i) := index (from 1 to K) of the cluster centroid closest to x(i)
		(cluster assignment step: minimizes J with respect to c(1), ..., c(m), holding u(1), ..., u(K) fixed)
		for k = 1 to K
			u(k) := mean of the points assigned to cluster k
		(move centroid step: minimizes J with respect to u(1), ..., u(K))
	}
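For reference, the distortion J defined above takes only a few lines to compute; this helper (compute_distortion is an illustrative name) assumes the same X, c, centroids layout as the run_kmeans sketch in section 1.2:

    import numpy as np

    def compute_distortion(X, c, centroids):
        """J = (1/m) * sum_i || x(i) - u_c(i) ||^2"""
        diffs = X - centroids[c]  # each example minus its assigned centroid
        return float(np.mean(np.sum(diffs ** 2, axis=1)))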

1.4 Random initialization

Should have K < m.

Randomly pick K training examples.

Set u(1), ..., u(K) equal to these K examples.

Local optima:

fig 5

(Source: Coursera Week 8, Random Initialization)

Because of local optima, run K-means many times (e.g. 50 to 1000) with different random initializations and keep the best result:

For i = 1 to 100 {

      (1) Randomly initialize K-means.

      (2) Run K-means. Get c(1), ..., c(m), u(1), ..., u(K).

      (3) Compute the cost (distortion) J(c(1), ..., c(m), u(1), ..., u(K)).

}

Pick the clustering that gave the lowest cost J.
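A minimal sketch of this procedure, reusing the hypothetical run_kmeans and compute_distortion helpers from the earlier sections and assuming X is the (m, n) data matrix:

    import numpy as np

    best_J, best_clustering = np.inf, None
    for _ in range(100):
        c, centroids = run_kmeans(X, K=5)  # K=5 is just an example value
        J = compute_distortion(X, c, centroids)
        if J < best_J:
            best_J, best_clustering = J, (c, centroids)
    # best_clustering now holds the clustering with the lowest distortion J.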

1.5 Choosing the number of clusters

Choosing the value of K:

Elbow method:

fig 6

(Source: Coursera Week 8, Choosing the Number of Clusters)
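One way to produce an elbow plot like this is to run K-means for a range of K values (keeping the best of several random initializations for each K) and plot the resulting distortion; a rough sketch, again reusing the hypothetical helpers above:

    import matplotlib.pyplot as plt

    Ks = range(1, 11)
    costs = []
    for K in Ks:
        # Best of 20 random initializations for each candidate K.
        costs.append(min(compute_distortion(X, *run_kmeans(X, K)) for _ in range(20)))
    plt.plot(list(Ks), costs, 'o-')
    plt.xlabel('K (number of clusters)')
    plt.ylabel('cost J (distortion)')
    plt.show()

If the plot shows a clear elbow, that K is a reasonable choice; often, however, the cost decreases smoothly and there is no obvious elbow.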

Sometimes, you're running K-means to get clusters to use for some later / downstream purpose. Evaluate K-means based on a metric for how well it performs for that later purpose.

That is, a better way to choose K is to understand the purpose of the clustering and pick the number of clusters that best serves that downstream purpose.

2. Motivation of Dimensionality Reduction

2.1 Motivation I: Data Compression

fig 7

(Source: Coursera Week 8, Motivation I: Data Compression)

2.2 Motivation II: Data visualization

Reducing the data to 2 or 3 dimensions makes it possible to plot the examples and inspect the data visually.

3. Principal Component Analysis

3.1 PCA problem formulation

PCA problem formulation:

fig 8

(Source: Coursera Week 8, PCA Problem Formulation)

PCA finds a lower-dimensional surface onto which to project the data such that the (squared) distances from the sample points to that surface are minimized. Equivalently, it finds a set of vectors and projects the original data onto the subspace spanned by those vectors.

Note: perform mean normalization and feature scaling before applying PCA.

Reduce from 2D to 1D: find a direction (a vector u(1)) onto which to project the data so as to minimize the projection error.

Either u(1) or -u(1) is fine (the sign of the direction does not matter).

Reduce from n dimensions to k dimensions: find k vectors u(1), ..., u(k) onto which to project the data, so as to minimize the projection error.

 PCA is not linear regression:

fig 9

(Source: Coursera Week 8, PCA Problem Formulation)

Left: linear regression minimizes the vertical distances between the points and the line; right: PCA minimizes the orthogonal projection distances onto the line.

3.2 PCA algorithm

Data preprocessing:

Training set: x(1), x(2), ..., x(m)

Preprocessing (mean normalization and, if needed, feature scaling): replace each feature x_j(i) with x_j(i) - u_j, where u_j is the mean of feature j over the training set; if features are on very different scales, also divide by the feature's range or standard deviation.

PCA algorithm:

reduce data from n-D to k-D

(1) Compute the "covariance matrix": Sigma = (1/m) * sum_{i=1..m} x(i) * x(i)^T (equivalently, Sigma = (1/m) * X^T * X, an n×n matrix)

(2) Compute "eigenvectors"

[U, S, V] = svd(Sigma)

U is an n×n matrix whose columns are the eigenvectors.

Take the first k columns of U to form a new matrix U_reduce, and use it to project the data:

(3) Z(i) = U_reduce^T * x(i)
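Putting the three steps together, a minimal NumPy sketch (pca_project is an illustrative name; X is assumed to be an (m, n) matrix that has already been mean-normalized and, if necessary, feature-scaled):

    import numpy as np

    def pca_project(X, k):
        """Project preprocessed data X (m, n) down to k dimensions."""
        m = X.shape[0]
        Sigma = (X.T @ X) / m            # (1) covariance matrix, n x n
        U, S, Vt = np.linalg.svd(Sigma)  # (2) columns of U are the eigenvectors
        U_reduce = U[:, :k]              # first k principal components
        Z = X @ U_reduce                 # (3) row i is U_reduce^T * x(i)
        return Z, U_reduce, S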

 

4. Applying PCA

4.1 Reconstruction from compressed representation

Z(i) = U_reduce^T * x(i)

x(i)_approx = U_reduce * Z(i)
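Continuing the pca_project sketch from section 3.2, the reconstruction is a single matrix product:

    # Z is (m, k) and U_reduce is (n, k); each row of X_approx approximates the original x(i).
    X_approx = Z @ U_reduce.T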

4.2 Choosing the number of principal components: the value of k

Average squared projection error: (1/m) * sum_{i=1..m} || x(i) - x(i)_approx ||^2

Total variation in the data: (1/m) * sum_{i=1..m} || x(i) ||^2

Typically, choose k to be the smallest value such that

(average squared projection error) / (total variation) <= 0.01

-----> "99% of the variance is retained" (or 90%)

Algorithm:

fig 10

(Source: Coursera Week 8, Choosing the Number of Principal Components)

Pick the smallest value of k for which this holds. In practice, the matrix S returned by svd makes this efficient: choose the smallest k such that (sum_{i=1..k} S_ii) / (sum_{i=1..n} S_ii) >= 0.99, so svd only needs to be run once.

The fraction of variance retained is also a useful quantity for reporting how well PCA is performing.
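Using the S returned by svd (a vector of singular values in NumPy), the smallest such k can be found without re-running PCA for each candidate value; choose_k is an illustrative name:

    import numpy as np

    def choose_k(S, variance_retained=0.99):
        """Smallest k such that sum(S[:k]) / sum(S) >= variance_retained."""
        ratios = np.cumsum(S) / np.sum(S)
        return int(np.argmax(ratios >= variance_retained)) + 1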

4.3 Advice for applying PCA

Note: U_reduce is computed on the training set only, and is then applied, unchanged, to the cross-validation and test sets.

Applications of PCA:

- Compression: reduce the memory/disk needed to store the data, or speed up a learning algorithm

- Visualization

Bad use of PCA: using z(i) instead of x(i) to reduce the number of features to k < n in order to prevent overfitting (the idea being that fewer features make overfitting less likely). This might work acceptably, but it is not a good way to address overfitting, because PCA throws away information without looking at the labels y(i).

Use regularization instead.

PCA is sometimes used where it shouldn't be: 

Design ML system:

- Get the training set.

- Run PCA to map x(i) ----> z(i).

- Train logistic regression on (z(i), y(i)).

- Test on the test set: map x(i)_test to z(i)_test (using the same U_reduce) and evaluate.
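A short sketch of the pipeline above, stressing that the mean and U_reduce are fit on the training set only and then reused unchanged on the test set (X_train, X_test, and k=50 are illustrative; any classifier can be trained on Z_train; pca_project is the sketch from section 3.2):

    # Fit the preprocessing and PCA on the training set only.
    mu = X_train.mean(axis=0)
    Z_train, U_reduce, S = pca_project(X_train - mu, k=50)

    # ... train logistic regression (or any other classifier) on (Z_train, y_train) ...

    # Apply the SAME mu and U_reduce to the test set; do not re-fit them on test data.
    Z_test = (X_test - mu) @ U_reduce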

Before implementing PCA, first try running whatever you want to do with the original/raw data x(i). Only if that does not do what you want should you then implement PCA and use z(i).
