Contents
1. Clustering
1.1 Unsupervised Learning: Introduction
1.2 K-Means Algorithm
1.3 Optimization Objective
1.4 Random initialization
1.5 Choosing the number of clusters
2. Motivation of Dimensionality Reduction
2.1 Motivation I: Data Compression
2.2 Motivation II: Data visualization
3. Principal Component Analysis
3.1 PCA problem formulation
3.2 PCA algorithm
4. Applying PCA
4.1 Reconstruction from compressed representation
4.2 Choosing the number of principal components: the value of K
4.3 Advice for applying PCA
1. Clustering
1.1 Unsupervised Learning: Introduction
Supervised Learning:
fig 1 (from Coursera Week 8, Unsupervised Learning: Introduction)
note: given a set of labeled training data, find a suitable hypothesis to fit it
Unsupervised Learning:
fig 2 (from Coursera Week 8, Unsupervised Learning: Introduction)
Applications of clustering:
- Market segmentation
- Social network analysis
- Organize computing clusters
- Astronomical data analysis
1.2 K-Means Algorithm
fig 3 (from Coursera Week 8, K-Means Algorithm)
K-Means Algorithm:
Input:
- K (number of clusters)
- Training set {x(1), x(2), ..., x(m)}, x(i) ∈ R^n

Randomly initialize K cluster centroids u(1), u(2), ..., u(K) ∈ R^n
Repeat {
  for i = 1 to m
    c(i) = index (from 1 to K) of the cluster centroid closest to x(i)
  for k = 1 to K
    u(k) = mean of the points assigned to cluster k
}
note: if a centroid u(k) ends up with no examples assigned to it, simply remove it
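A minimal Octave sketch of the loop above (run_kmeans is a hypothetical helper name, not code from the course; X is an m-by-n data matrix):

function [c, mu] = run_kmeans(X, K, max_iters)
  m = size(X, 1);
  idx = randperm(m);
  mu = X(idx(1:K), :);                       % initialize centroids to K distinct random examples
  c = zeros(m, 1);
  for iter = 1:max_iters
    for i = 1:m                              % cluster assignment step
      [~, c(i)] = min(sum((mu - X(i, :)).^2, 2));
    end
    for k = 1:K                              % move centroid step
      if any(c == k)                         % empty clusters are left in place here;
        mu(k, :) = mean(X(c == k, :), 1);    % per the note above, they could be removed instead
      end
    end
  end
end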
K-means for non-separated clusters:
fig 4 (from Coursera Week 8, K-Means Algorithm)
1.3 Optimization Objective
K-means optimization objective:
c(i) = index of the cluster (1, 2, ..., K) to which example x(i) is currently assigned
u(k) = cluster centroid k (u(k) ∈ R^n)
u_c(i) = cluster centroid of the cluster to which example x(i) has been assigned
Optimization objective (distortion):
J(c(1), ..., c(m), u(1), ..., u(K)) = (1/m) * sum_{i=1}^{m} || x(i) - u_c(i) ||^2
min J with respect to c(1), ..., c(m) and u(1), ..., u(K)
Randomly initialize K cluster centroids u(1), u(2), ..., u(K) ∈ R^n
Repeat {
  for i = 1 to m
    c(i) = index (from 1 to K) of the cluster centroid closest to x(i)
  (the cluster assignment step minimizes J with respect to c(1), ..., c(m), holding the centroids fixed)
  for k = 1 to K
    u(k) = mean of the points assigned to cluster k
  (the move centroid step minimizes J with respect to u(1), ..., u(K), holding the assignments fixed)
}
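With the assignments c and centroids mu in hand, the distortion J is one line of Octave (a sketch; mu(c, :) selects, for each example, the centroid of its assigned cluster):

J = sum(sum((X - mu(c, :)).^2)) / size(X, 1);   % J = (1/m) * sum_i ||x(i) - u_c(i)||^2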
1.4 Random initialization
Should have K < m.
Randomly pick K training examples.
Set u(1), ..., u(K) equal to these K examples.
Local optima:
fig 5 (from Coursera Week 8, Random initialization)
To reduce the chance of landing in a bad local optimum, try many random initializations:
For i = 1 to 100 {
  (1) Randomly initialize K-means
  (2) Run K-means; get c(1), ..., c(m), u(1), ..., u(K)
  (3) Compute the cost function (distortion) J
}
Pick the clustering that gave the lowest J.
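A sketch of this restart loop in Octave, reusing the hypothetical run_kmeans helper from section 1.2:

best_J = Inf;
for t = 1:100
  [c, mu] = run_kmeans(X, K, 10);                 % random initialization happens inside
  J = sum(sum((X - mu(c, :)).^2)) / size(X, 1);   % distortion for this run
  if J < best_J
    best_J = J;  best_c = c;  best_mu = mu;       % keep the clustering with the lowest J
  end
end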
1.5 Choosing the number of clusters
Choosing the value of K:
Elbow method:
fig 6 (from Coursera Week 8, Choosing the number of clusters)
Sometimes, you're running K-means to get clusters to use for some later / downstream purpose. Evaluate K-means based on a metric for how well it performs for that later purpose.
In short: a better way to choose K is to understand what you are clustering for, and pick the number of clusters that best serves that downstream purpose of running K-means.
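A minimal Octave sketch of the elbow method from fig 6, again assuming the hypothetical run_kmeans helper (for a clean curve, each K should itself use the best of several random initializations):

Js = zeros(1, 10);
for K = 1:10
  [c, mu] = run_kmeans(X, K, 10);
  Js(K) = sum(sum((X - mu(c, :)).^2)) / size(X, 1);
end
plot(1:10, Js, '-o');
xlabel('K (number of clusters)');
ylabel('distortion J');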
2. Motivation of Dimensionality Reduction
2.1 Motivation I: Data Compression
fig 7 (from Coursera Week 8, Motivation I: Data Compression)
2.2 Motivation II: Data visualization
3. Principal Component Analysis
3.1 PCA problem formulation
PCA problem formulation:
fig 8 (from Coursera Week 8, PCA problem formulation)
PCA finds a low-dimensional surface such that the total squared distance from the sample points to that surface is minimized; equivalently, it finds a set of vectors and projects the original data onto the subspace spanned by them.
note: perform mean normalization and feature scaling before running PCA
Reduce from 2D to 1D: find a direction (a vector u(1)) onto which to project the data so as to minimize the projection error.
Either u(1) or -u(1) is fine (the sign of the direction does not matter).
Reduce from n-D to k-D: find k vectors u(1), ..., u(k) onto which to project the data, so as to minimize the projection error.
PCA is not linear regression:
fig 9 (from Coursera Week 8, PCA problem formulation)
left: linear regression (minimizes vertical distances to the line); right: PCA (minimizes orthogonal projection distances)
3.2 PCA algorithm
Data preprocessing:
Training set: x(1), x(2), ..., x(m)
Preprocessing: mean normalization (and feature scaling if features are on different scales)
PCA algorithm:
Reduce data from n-D to k-D:
(1) Compute the "covariance matrix":
    Sigma = (1/m) * sum_{i=1}^{m} x(i) * x(i)^T    (an n-by-n matrix)
(2) Compute its "eigenvectors":
    [U, S, V] = svd(Sigma)
    U is an n-by-n matrix whose columns are the eigenvectors
    Take the first k columns of U to form U_reduce, and use it to project the data
(3) z(i) = U_reduce^T * x(i)
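Putting the three steps together, a minimal Octave sketch (assumes X is m-by-n and already mean-normalized / feature-scaled, and that k has been chosen):

m = size(X, 1);                % number of training examples
Sigma = (1 / m) * X' * X;      % covariance matrix, n-by-n
[U, S, V] = svd(Sigma);        % columns of U are the eigenvectors of Sigma
U_reduce = U(:, 1:k);          % keep the first k columns
Z = X * U_reduce;              % row i of Z is z(i)' = (U_reduce^T * x(i))'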
4. Applying PCA
4.1 Reconstruction from compressed representation
z(i) = U_reduce^T * x(i)
x_approx(i) = U_reduce * z(i)  (x_approx(i) ≈ x(i) when most of the variance is retained)
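In vectorized Octave form, continuing the sketch from section 3.2:

X_approx = Z * U_reduce';      % row i is x_approx(i)' = (U_reduce * z(i))'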
4.2 Choosing the number of principal components: the value of K
Average squared projection error: (1/m) * sum_{i=1}^{m} || x(i) - x_approx(i) ||^2
Total variation in the data: (1/m) * sum_{i=1}^{m} || x(i) ||^2
Typically choose k to be the smallest value so that:
(average squared projection error) / (total variation) <= 0.01
-----> 99% of variance is retained (or 0.10 for 90%)
Algorithm:
fig 10 (from Coursera Week 8, Choosing the number of principal components)
Using [U, S, V] = svd(Sigma), this check is cheap: with the singular values S_11, ..., S_nn on the diagonal of S,
variance retained = (sum_{i=1}^{k} S_ii) / (sum_{i=1}^{n} S_ii)
Pick the smallest k for which this ratio is >= 0.99. The retained-variance ratio is also a useful single number for characterizing how well PCA is performing.
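A minimal Octave sketch of this selection rule, using the diagonal of S from the svd call in section 3.2:

sv = diag(S);                          % singular values S_11, ..., S_nn
for k = 1:length(sv)
  if sum(sv(1:k)) / sum(sv) >= 0.99    % at least 99% of variance retained
    break;                             % k is now the smallest such value
  end
end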
4.3 Advice for applying PCA
note: U_reduce is computed on the training set only, and is then applied to the CV and test sets (see the sketch at the end of this section)
Application of PCA:
- Compression: reduce memory and speed up learning algorithm
- Visualization
Bad use of PCA: trying to prevent overfitting by using z instead of x to reduce the number of features to k < n, on the theory that fewer features make overfitting less likely.
****use regularization instead**** (PCA throws away information without ever looking at the labels y, so it is not a substitute for regularization)
PCA is sometimes used where it shouldn't be:
Design ML system:
- Get training set
- Run PCA, x ----> z
- Train LR on (z(i), y(i))
- Test: map x(test) to z(test) with the same U_reduce and run the classifier on the test set
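A sketch of that mapping in Octave (variable names X_train, X_test, y_train are illustrative): the PCA parameters are fitted on the training set only and then reused.

Sigma = (1 / size(X_train, 1)) * X_train' * X_train;   % covariance from training data only
[U, S, V] = svd(Sigma);
U_reduce = U(:, 1:k);
Z_train = X_train * U_reduce;   % train the classifier on (Z_train, y_train)
Z_test  = X_test  * U_reduce;   % reuse the same mapping; never re-fit PCA on test data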
Before implementing PCA, first try running whatever you want to do with the original/raw data. Only if that doesn't do what you want should you then implement PCA.