14-1 Motivation I: Data Compression
Data Compression
Reduce data from 2D to 1D: project onto a line, $x_1, x_2 \longrightarrow z_1$
Reduce data from 3D to 2D: project onto a plane, $x_1, x_2, x_3 \longrightarrow z_1, z_2$
14-2 Motivation II: Data Visualization
Data Visualization
14-3 Principal Component Analysis problem formulation
PCA: Principal Component Analysis
Principal Component Analysis (PCA) problem formulation
Reduce from 2-dimension to 1-dimension: Find a direction (a vector $u^{(1)} \in \mathbb{R}^n$) onto which to project the data so as to minimize the projection error.
Reduce from n-dimension to k-dimension: Find $k$ vectors $u^{(1)}, u^{(2)}, \cdots, u^{(k)}$ onto which to project the data, so as to minimize the projection error.
PCA is not linear regression
Linear regression minimizes the vertical distances between the points and the fitted line, i.e., the errors in predicting $y$ from $x$.
PCA minimizes the orthogonal projection distances; the features $x_1, x_2, \cdots$ are all treated symmetrically and there is no $y$ to predict.
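To make the contrast concrete, here is a sketch of the two objectives in the course's notation (with $x_{approx}^{(i)}$ denoting the projection of $x^{(i)}$ onto the chosen subspace):

$$\text{Linear regression:}\quad \min_{\theta}\ \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

$$\text{PCA:}\quad \min_{u^{(1)},\cdots,u^{(k)}}\ \frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)} - x_{approx}^{(i)}\right\|^2$$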
14-4 Principal Component Analysis algorithm
Data preprocessing
Training set: $x^{(1)}, x^{(2)}, \cdots, x^{(m)}$
Preprocessing (feature scaling/mean normalization):
$\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}$
Replace each $x_j^{(i)}$ with $x_j^{(i)} - \mu_j$.
If different features are on different scales (e.g., $x_1$ = size of house, $x_2$ = number of bedrooms), scale features to have a comparable range of values.
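A minimal Octave sketch of this preprocessing step, assuming the examples are stored as rows of an $m \times n$ matrix `X` (the variable names here are illustrative):

```matlab
% X is an m x n data matrix: one training example per row (illustrative name)
mu = mean(X);                              % 1 x n vector of per-feature means
X_norm = bsxfun(@minus, X, mu);            % mean normalization: subtract mu from every row
sigma = std(X_norm);                       % 1 x n vector of per-feature standard deviations
X_norm = bsxfun(@rdivide, X_norm, sigma);  % optional feature scaling to a comparable range
```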
Principal Component Analysis (PCA) algorithm
Reduce data from n-dimensions to k-dimensions
Compute the “covariance matrix”:
$$\Sigma = \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)}\right)\left(x^{(i)}\right)^T$$
Compute the “eigenvectors” of matrix $\Sigma$:
[U,S,V] = svd(Sigma);  % or eig(Sigma)
Sigma is an $n \times n$ matrix.
From [U,S,V] = svd(Sigma), we get:
$U = [u^{(1)}, u^{(2)}, u^{(3)}, \cdots, u^{(n)}] \in \mathbb{R}^{n \times n}$; taking its first $k$ columns gives $U_{reduce} \in \mathbb{R}^{n \times k}$.
After mean normalization (ensuring every feature has zero mean) and optionally feature scaling:
$Sigma = \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)}\right)\left(x^{(i)}\right)^T$
[U,S,V] = svd(Sigma);
Ureduce = U(:,1:k);
z = Ureduce' * x;
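Putting the steps together, a compact Octave sketch (assuming `X` is the $m \times n$ matrix of already mean-normalized examples, and `m`, `k` are set; variable names are illustrative):

```matlab
Sigma = (1/m) * (X' * X);      % n x n covariance matrix; equivalent to the sum over examples
[U, S, V] = svd(Sigma);        % columns of U are the principal directions u^(1), ..., u^(n)
Ureduce = U(:, 1:k);           % keep the first k columns (n x k)
Z = X * Ureduce;               % project every example at once: Z is m x k
% For a single column-vector example x in R^n:  z = Ureduce' * x;
```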
14-5 Choosing the number of principal components
Choosing k (number of principal components)
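The notes here give only the heading. A common criterion (the one this lecture uses) is to pick the smallest $k$ such that at least 99% of the variance is retained, which can be computed cheaply from the diagonal matrix $S$ returned by `svd`. A sketch:

```matlab
% S is the diagonal matrix from [U, S, V] = svd(Sigma)
sv = diag(S);                        % singular values S_11, ..., S_nn
retained = cumsum(sv) / sum(sv);     % fraction of variance retained for each choice of k
k = find(retained >= 0.99, 1);       % smallest k that retains at least 99% of the variance
```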
14-6 Reconstruction from compressed representation
Reconstruction from compressed representation
$z = U_{reduce}^T x \qquad x_{approx} = U_{reduce}\, z$
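A short Octave sketch of compression and reconstruction for a single example `x` (an $n \times 1$ column vector), reusing the `Ureduce` computed above:

```matlab
z = Ureduce' * x;          % compress: x in R^n  ->  z in R^k
x_approx = Ureduce * z;    % reconstruct: an n x 1 approximation of x in the k-dim subspace
% For a whole data matrix with examples as rows:  X_approx = Z * Ureduce';
```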
14-7 Advice for applying PCA
Supervised learning speedup
$(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \cdots, (x^{(m)}, y^{(m)})$
Extract inputs:
Unlabeled dataset: $x^{(1)}, x^{(2)}, \cdots, x^{(m)} \in \mathbb{R}^{10000} \longrightarrow z^{(1)}, z^{(2)}, \cdots, z^{(m)} \in \mathbb{R}^{1000}$
New training set:
$(z^{(1)}, y^{(1)}), (z^{(2)}, y^{(2)}), \cdots, (z^{(m)}, y^{(m)})$
Note: the mapping $x^{(i)} \rightarrow z^{(i)}$ should be defined by running PCA only on the training set. The same mapping can then be applied to the examples $x_{cv}^{(i)}$ and $x_{test}^{(i)}$ in the cross-validation and test sets.
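A sketch of this workflow in Octave, with illustrative names (`X_train`, `X_cv`, `X_test`) and illustrative sizes ($n = 10000$, $k = 1000$); the mean `mu` and `Ureduce` are computed from the training set only and then reused everywhere:

```matlab
mu = mean(X_train);                              % training-set feature means
Xn = bsxfun(@minus, X_train, mu);                % mean-normalize using training statistics only
Sigma = (1/m) * (Xn' * Xn);                      % covariance of the normalized training inputs
[U, S, V] = svd(Sigma);
Ureduce = U(:, 1:k);                             % e.g. k = 1000

Z_train = Xn * Ureduce;                          % new inputs for the supervised learner
Z_cv    = bsxfun(@minus, X_cv,   mu) * Ureduce;  % apply the SAME mapping to cross-validation
Z_test  = bsxfun(@minus, X_test, mu) * Ureduce;  % and test examples
```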
Application of PCA
- Compression
  - Reduce memory/disk needed to store data
  - Speed up learning algorithm
- Visualization
Bad use of PCA: To prevent overfitting
Use $z^{(i)}$ instead of $x^{(i)}$ to reduce the number of features to $k < n$.
Thus, fewer features, less likely to overfit
$\color{red}{\times}$
This might work OK, but it isn't a good way to address overfitting. Use regularization instead:
$$\min_{\theta} \frac{1}{2m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
PCA is sometimes used where it shouldn’t be
Design of ML system:
- Get training set $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \cdots, (x^{(m)}, y^{(m)})\}$
- Run PCA to reduce $x^{(i)}$ in dimension to get $z^{(i)}$
- Train logistic regression on $\{(z^{(1)}, y^{(1)}), \cdots, (z^{(m)}, y^{(m)})\}$
- Test on test set: map $x_{test}^{(i)}$ to $z_{test}^{(i)}$, then run $h_\theta(z)$ on $\{(z_{test}^{(1)}, y_{test}^{(1)}), \cdots, (z_{test}^{(m)}, y_{test}^{(m)})\}$
How about doing the whole thing without using PCA?
Before implementing PCA, first try running whatever you want to do with the original/raw data $x^{(i)}$. Only if that doesn't do what you want should you implement PCA and consider using $z^{(i)}$.