Andrew Ng · Machine Learning || Chap 14 Dimensionality Reduction Notes

This post summarizes the basics and applications of Principal Component Analysis (PCA): data compression, reducing memory/disk requirements, and speeding up learning algorithms. PCA reduces the dimensionality of the data by finding its principal directions (eigenvectors of the covariance matrix) and is also useful for data visualization. It should not be relied on to prevent overfitting; regularization is the right tool for that. PCA can be a useful part of a machine learning system, but use it with care: try the original data first, and bring in PCA only if that does not work well enough.

14-1 Motivation I: Data Compression

Data Compression

Reduce data from 2D to 1D: project onto a line, $x_1, x_2 \longrightarrow z_1$

Reduce data from 3D to 2D: project onto a plane, $x_1, x_2, x_3 \longrightarrow z_1, z_2$

14-2 Motivation II: Data Visualization

Data Visualization

14-3 Principal Component Analysis problem formulation

PCA: Principal Component Analysis

Principal Component Analysis (PCA) problem formulation

Reduce from 2-dimension to 1-dimension: find a direction (a vector $u^{(1)} \in \mathbb{R}^n$) onto which to project the data so as to minimize the projection error.

Reduce from n-dimension to k-dimension: find $k$ vectors $u^{(1)}, u^{(2)}, \cdots, u^{(k)}$ onto which to project the data, so as to minimize the projection error.

PCA is not linear regression

Linear regression: minimizes the vertical distances between the points and the fitted line, i.e. the errors in predicting $y$ from $x$.

PCA: minimizes the orthogonal projection distances from the points to the line; there is no distinguished $y$ to predict, and all features $x_1, x_2, \cdots$ are treated symmetrically.

14-4 Principal Component Analysis algorithm

Data preprocessing

Training set: $x^{(1)}, x^{(2)}, \cdots, x^{(m)}$

Preprocessing (feature scaling/mean normalization):

$\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}$

Replace each $x_j^{(i)}$ with $x_j^{(i)} - \mu_j$.

If different features are on different scales (e.g., $x_1$ = size of house, $x_2$ = number of bedrooms), scale the features to have a comparable range of values.
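
As a concrete illustration, here is a minimal Octave sketch of this preprocessing step (the function name featureNormalize and the returned mu/sigma are my own naming, not defined in the lecture):

```matlab
function [X_norm, mu, sigma] = featureNormalize(X)
  % X is an m x n matrix with one training example per row.
  mu     = mean(X);                          % 1 x n row vector of feature means
  sigma  = std(X);                           % 1 x n row vector of feature standard deviations
  X_norm = bsxfun(@minus, X, mu);            % mean normalization: every feature now has zero mean
  X_norm = bsxfun(@rdivide, X_norm, sigma);  % feature scaling: comparable range of values
end
```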

Principal Component Analysis (PCA) algorithm

Reduce data from n-dimensions to k-dimensions
Compute the covariance matrix:
$\Sigma = \frac{1}{m}\sum_{i=1}^{m} (x^{(i)})(x^{(i)})^T$
Compute the eigenvectors of the matrix $\Sigma$:

[U,S,V] = svd(Sigma);   % or eig(Sigma); Sigma is symmetric, so both yield the same eigenvectors

Sigma is an $n \times n$ matrix.

From [U,S,V] = svd(Sigma), we get:

$U = [u^{(1)}, u^{(2)}, u^{(3)}, \cdots, u^{(n)}] \in \mathbb{R}^{n \times n}$; $U_{reduce} = [u^{(1)}, \cdots, u^{(k)}]$ is made up of its first $k$ columns.



After mean normalization (ensure every feature has zero mean) and optionally feature scaling:

$\mathrm{Sigma} = \frac{1}{m}\sum_{i=1}^{m}(x^{(i)})(x^{(i)})^T$

[U,S,V] = svd(Sigma);

Ureduce = U(:,1:k);

z = Ureduce' * x;
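
Putting the steps together, a minimal vectorized Octave sketch could look like this (assuming X is the m x n matrix of mean-normalized training examples, one example per row, and k has already been chosen):

```matlab
m = size(X, 1);               % number of training examples
Sigma = (1/m) * (X' * X);     % n x n covariance matrix
[U, S, V] = svd(Sigma);       % columns of U are the eigenvectors u^(1), ..., u^(n)
Ureduce = U(:, 1:k);          % keep only the first k columns
Z = X * Ureduce;              % m x k matrix whose i-th row is z^(i) = Ureduce' * x^(i)
```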

14-5 Choosing the number of principal components

Choosing k (number of principal components)
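
The slides for this section appeared only as images in the original post. The criterion from the lecture is to pick the smallest $k$ such that 99% of the variance is retained, i.e.

$\frac{\frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)} - x_{approx}^{(i)}\right\|^2}{\frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)}\right\|^2} \le 0.01$

(the average squared projection error divided by the total variation in the data is at most 1%). Rather than re-running PCA for every candidate $k$, the diagonal matrix $S$ returned by svd(Sigma) can be used: choose the smallest $k$ with $\frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{n} S_{ii}} \ge 0.99$. A minimal Octave sketch (the loop and variable names are my own):

```matlab
[U, S, V] = svd(Sigma);
s = diag(S);                        % the diagonal entries S11, ..., Snn
n = length(s);
k = n;                              % fall back to keeping every component
for i = 1:n
  if sum(s(1:i)) / sum(s) >= 0.99   % fraction of variance retained by the first i components
    k = i;                          % smallest k that retains at least 99% of the variance
    break;
  end
end
```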

14-6 Reconstruction from compressed representation

Reconstruction from compressed representation

$z = U_{reduce}^T\, x \qquad x_{approx} = U_{reduce}\, z$
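
In Octave this is just one matrix product in each direction (a minimal sketch, reusing the Ureduce computed above; $x_{approx}$ lies in the k-dimensional subspace spanned by the first $k$ eigenvectors):

```matlab
z        = Ureduce' * x;   % compress:    x in R^n  ->  z in R^k
x_approx = Ureduce * z;    % reconstruct: z in R^k  ->  approximation of x in R^n
```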

14-7 Advice for applying PCA

Supervised learning speedup

$(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \cdots, (x^{(m)}, y^{(m)})$

Extract inputs:

Unlabeled dataset: $x^{(1)}, x^{(2)}, \cdots, x^{(m)} \in \mathbb{R}^{10000} \longrightarrow z^{(1)}, z^{(2)}, \cdots, z^{(m)} \in \mathbb{R}^{1000}$

​ New training set:

$(z^{(1)}, y^{(1)}), (z^{(2)}, y^{(2)}), \cdots, (z^{(m)}, y^{(m)})$

Note: the mapping $x^{(i)} \rightarrow z^{(i)}$ should be defined by running PCA only on the training set. The same mapping can then be applied to the examples $x_{cv}^{(i)}$ and $x_{test}^{(i)}$ in the cross-validation and test sets.
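
A minimal Octave sketch of this workflow; Xtrain, Xcv and Xtest are hypothetical variable names, and the key point is that mu and Ureduce are computed from the training set only:

```matlab
% Fit the mapping on the training set only.
mu      = mean(Xtrain);
Xn      = bsxfun(@minus, Xtrain, mu);     % mean-normalized training inputs
Sigma   = (1/size(Xn, 1)) * (Xn' * Xn);
[U, S, V] = svd(Sigma);
Ureduce = U(:, 1:k);
Ztrain  = Xn * Ureduce;                   % train the supervised model on (Ztrain, ytrain)

% Re-use the same mu and Ureduce for the cross-validation and test sets.
Zcv   = bsxfun(@minus, Xcv,   mu) * Ureduce;
Ztest = bsxfun(@minus, Xtest, mu) * Ureduce;
```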

Application of PCA

  • Compression
    • Reduce memory/disk needed to store data
    • Speed up learning algorithm
  • Visualization

Bad use of PCA: To prevent overfitting

Use $z^{(i)}$ instead of $x^{(i)}$ to reduce the number of features to $k < n$.
Thus, fewer features, less likely to overfit.

$\color{red}\large{\times}$

This might work OK, but it isn't a good way to address overfitting. Use regularization instead:

$\min_{\theta}\ \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$
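
For reference, a minimal Octave sketch of this regularized linear-regression cost (theta, X, y and lambda are assumed to be defined elsewhere; following the course convention, the bias term theta(1) is not regularized):

```matlab
function J = regularizedCost(theta, X, y, lambda)
  % X: m x (n+1) design matrix (first column all ones), y: m x 1 targets, theta: (n+1) x 1
  m = length(y);
  h = X * theta;                                   % linear hypothesis h_theta(x^(i)) for all i
  J = (1 / (2*m)) * sum((h - y).^2) ...
      + (lambda / (2*m)) * sum(theta(2:end).^2);   % regularize theta_1..theta_n, not the bias
end
```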

PCA is sometimes used where it shouldn’t be

Design of ML system:

  • Get training set $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \cdots, (x^{(m)}, y^{(m)})\}$

  • Run PCA to reduce the dimension of $x^{(i)}$, obtaining $z^{(i)}$

  • Train logistic regression on $\{(z^{(1)}, y^{(1)}), \cdots, (z^{(m)}, y^{(m)})\}$

  • Test on the test set: map $x_{test}^{(i)}$ to $z_{test}^{(i)}$, then run $h_\theta(z)$ on $\{(z_{test}^{(1)}, y_{test}^{(1)}), \cdots, (z_{test}^{(m)}, y_{test}^{(m)})\}$

How about doing the whole thing without using PCA?

Before implementing PCA, first try running whatever you want to do with the original/raw data $x^{(i)}$. Only if that doesn't do what you want should you implement PCA and consider using $z^{(i)}$.
