Machine Learning 07 - Unsupervised Learning

I am working through Stanford's Machine Learning course by Andrew Ng and taking notes as I go, to review and consolidate what I learn.
My knowledge is limited, so if you find mistakes or have suggestions, please bear with me and point them out.

7.1 Clustering

7.1.1 K-means algorithm

Intuition

The K-means algorithm alternates between two steps:

  • Cluster assignment step
  • Move centroid step

The algorithm is illustrated in the figure below:

(Figure: the two steps of K-means)

Symbols

  • $c^{(i)}$ : index of the cluster ($1, 2, \dots, K$) to which example $x^{(i)}$ is currently assigned
  • $\mu_k$ : cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$)
  • $\mu_{c^{(i)}}$ : cluster centroid of the cluster to which example $x^{(i)}$ has been assigned

Optimization objective

$$\min_{c^{(1)},\dots,c^{(m)},\,\mu_1,\dots,\mu_K} J(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K) = \frac{1}{m}\sum_{i=1}^{m} \left\| x^{(i)} - \mu_{c^{(i)}} \right\|^2$$
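
As a concrete reference, here is a minimal Octave sketch of this cost; the function name kmeans_cost and the data layout (X is m x n with one example per row, c holds cluster indices, mu holds centroids row-wise) are my own choices, not from the course.

% Distortion J: average squared distance from each example to its assigned centroid.
% X : m x n data matrix, c : m x 1 cluster indices, mu : K x n centroid matrix
function J = kmeans_cost(X, c, mu)
  m = size(X, 1);
  J = sum(sum((X - mu(c, :)).^2)) / m;
end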

K-means algorithm - Algorithm 4

Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \dots, \mu_K \in \mathbb{R}^n$
Repeat {
    for $i = 1$ to $m$
        $c^{(i)}$ := index (from $1$ to $K$) of the cluster centroid closest to $x^{(i)}$
    for $k = 1$ to $K$
        $\mu_k$ := average (mean) of the points assigned to cluster $k$
}
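
A minimal Octave sketch of this loop (the name run_kmeans, the fixed iteration count, and the initialization from K random examples are my own choices):

% One run of K-means: alternate cluster assignment and centroid moves.
% X : m x n data matrix, K : number of clusters
function [c, mu] = run_kmeans(X, K, max_iters)
  m   = size(X, 1);
  idx = randperm(m);
  mu  = X(idx(1:K), :);               % initialize centroids from K random examples
  c   = zeros(m, 1);
  for iter = 1:max_iters
    for i = 1:m                       % cluster assignment step
      [~, c(i)] = min(sum((mu - X(i, :)).^2, 2));
    end
    for k = 1:K                       % move centroid step
      if any(c == k)
        mu(k, :) = mean(X(c == k, :), 1);
      end
    end
  end
end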

7.1.2 Important tricks

We choose the $K$ cluster centroids at random, and different initializations can converge to different solutions, so K-means may get stuck in a local optimum.

For example:

(Figure: local optima resulting from different random initializations)

Random Initialization

For $i = 1$ to $100$ {
    Randomly initialize K-means.
    Run K-means. Get $c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K$.
    Compute the cost function (distortion) $J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K)$
}
Pick the clustering that gave the lowest cost $J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K)$.

For $K = 2$ to $10$, running many random initializations works well and can noticeably improve the result; when $K$ is large, a single run is already likely to find a reasonably good solution.
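
A sketch of this procedure in Octave, reusing the hypothetical run_kmeans and kmeans_cost helpers from the sketches above (X and K are assumed to be defined):

% Run K-means 100 times from different random initializations
% and keep the clustering with the lowest distortion J.
best_J = Inf;
for t = 1:100
  [c, mu] = run_kmeans(X, K, 50);     % 50 iterations per run, chosen arbitrarily
  J = kmeans_cost(X, c, mu);
  if J < best_J
    best_J = J;  best_c = c;  best_mu = mu;
  end
end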

Number of Clusters

Choosing the number of clusters is largely a judgment call; it is often done by hand, based on experience or by inspecting the data.

One heuristic to try (though not always effective) is the elbow method: plot the cost $J$ against the number of clusters $K$ and choose the $K$ at the "elbow" of the curve.

(Figure: elbow method, cost J plotted against the number of clusters K)
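
One way to draw this curve in Octave, again assuming the run_kmeans and kmeans_cost helpers from the earlier sketches:

% Elbow method: plot distortion J against the number of clusters K.
Ks = 2:10;
Js = zeros(size(Ks));
for t = 1:numel(Ks)
  [c, mu] = run_kmeans(X, Ks(t), 50);
  Js(t)   = kmeans_cost(X, c, mu);
end
plot(Ks, Js, '-o');
xlabel('K (number of clusters)');
ylabel('Cost J');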

Sometimes K-means is run for some later/downstream purpose. In that case, evaluate K-means (and choose $K$) based on how well it serves that later purpose.

7.2 Dimensionality Reduction

7.2.1 Intuition

The intuition for reducing data from 2D to 1D and from 3D to 2D is shown below:
(Figure: dimensionality reduction intuition, 2D → 1D and 3D → 2D)

Applications: data compression, data visualization, ...

7.2.2 Principal Component Analysis

To reduce data from $n$ dimensions to $k$ dimensions, what PCA does is:

Find $k$ vectors $u^{(1)}, u^{(2)}, \dots, u^{(k)} \in \mathbb{R}^n$ onto which to project the data so as to minimize the projection error.

Principal Component Analysis - Algorithm 5

Preprocessing: feature scaling / mean normalization (ensure zero mean).
Compute the covariance matrix:
$$\Sigma = \frac{1}{m}\sum_{i=1}^{m} (x^{(i)})(x^{(i)})^T$$
(written as Sigma in the code below)
Perform the singular value decomposition and project:
[U, S, V] = svd(Sigma);
Ureduce = U(:, 1 : k);
z = Ureduce' * x;

Reconstruction from Compressed Representation

$$x^{(i)}_{\text{approx}} = U_{\text{reduce}}\, z^{(i)}, \quad i = 1, 2, \dots, m$$
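
Putting the whole pipeline together, a minimal Octave sketch that works on an m x n data matrix X (one example per row), assuming k has already been chosen; the variable names beyond those in the snippets above are illustrative:

% PCA: mean-normalize, compute covariance, project to k dimensions, reconstruct.
mu_x   = mean(X);                                 % 1 x n mean used for mean normalization
X_norm = X - mu_x;                                % zero-mean data (add feature scaling if needed)
Sigma  = (1 / size(X, 1)) * (X_norm' * X_norm);   % n x n covariance matrix
[U, S, V] = svd(Sigma);
Ureduce  = U(:, 1:k);                             % first k principal components
Z        = X_norm * Ureduce;                      % m x k compressed representation (rows are z(i)')
X_approx = Z * Ureduce';                          % reconstruction, still in mean-normalized space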

7.2.3 Choosing k

Here $k$ (the dimension of $z$) is also called the number of principal components.

Typically, choose $k$ to be the smallest value such that

$$\frac{\frac{1}{m}\sum_{i=1}^{m} \left\| x^{(i)} - x^{(i)}_{\text{approx}} \right\|^2}{\frac{1}{m}\sum_{i=1}^{m} \left\| x^{(i)} \right\|^2} \le 0.01$$

The threshold $0.01$ means that $99\%$ of the variance is retained.

An easier way to compute this is shown below:

Choosing k - Algorithm 6

[U, S, V] = svd(Sigma);
Pick the smallest value of $k$ for which

$$\frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{n} S_{ii}} \ge 0.99$$
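
In Octave, this check over the diagonal of S might look like the following sketch (the name variance_retained is my own):

% Choose the smallest k retaining at least 99% of the variance,
% using the singular values on the diagonal of S from svd(Sigma).
[U, S, V] = svd(Sigma);
s = diag(S);                              % singular values S11, ..., Snn
variance_retained = cumsum(s) / sum(s);
k = find(variance_retained >= 0.99, 1);   % smallest k meeting the threshold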

7.2.4 Advice for applying PCA

Supervised learning speedup

Given a dataset: $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})$, with $x^{(i)} \in \mathbb{R}^n$

  • Extract the inputs to get an unlabeled dataset.
  • Apply the PCA algorithm.
  • Get the new training set.

Finally, we get the new training set: $(z^{(1)}, y^{(1)}), (z^{(2)}, y^{(2)}), \dots, (z^{(m)}, y^{(m)})$, with $z^{(i)} \in \mathbb{R}^k$

Note:

  • The mapping $x^{(i)} \to z^{(i)}$ should be defined by running PCA only on the training set.
  • The same mapping can then be applied to the cross-validation and test sets, as in the sketch below.
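
A sketch of this workflow in Octave, fitting the mapping on the training inputs only and reusing it unchanged for the other sets (all variable names here are illustrative):

% Learn the PCA mapping (mean and Ureduce) on the training inputs only.
mu_x      = mean(X_train);
Xn_train  = X_train - mu_x;
Sigma     = (1 / size(Xn_train, 1)) * (Xn_train' * Xn_train);
[U, S, V] = svd(Sigma);
Ureduce   = U(:, 1:k);
Z_train   = Xn_train * Ureduce;           % new training inputs z(i)
% Apply the same mu_x and Ureduce to the cross-validation and test sets.
Z_cv   = (X_cv   - mu_x) * Ureduce;
Z_test = (X_test - mu_x) * Ureduce;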

Bad use of PCA: to prevent overfitting

That is, using $z^{(i)}$ instead of $x^{(i)}$ to reduce the number of features to $k < n$.

Reason: PCA throws away some information that might be valuable, and it does so without looking at the labels $y^{(i)}$; if overfitting is the concern, regularization is the better tool.

Consider machine learning without PCA first

Before implementing PCA, first try running whatever you want to do with the raw/original data. Only if that does not work as well as desired should you then implement PCA.
