Machine Learning Basics 05: Principal Component Analysis (PCA)

1 Introduction

1.1 Characteristics
  • An unsupervised machine learning algorithm
  • Mainly used for dimensionality reduction; reducing the dimension can expose features that are easier for humans to interpret
  • Other applications: visualization; denoising
1.2 Mathematical Meaning


  1. Find the axis along which the samples are most spread out.
  2. Measure that spread with the variance: $Var(x)=\frac{1}{m}\sum_{i=1}^{m}(x_i-\bar{x})^2$
  3. Find the axis such that, after all points in the sample space are projected onto it, the variance of the projections is maximized.
1.3 Steps
  1. Demean all samples, i.e. shift the sample mean to 0, so that $Var(x)=\frac{1}{m}\sum_{i=1}^{m}x_i^2,\quad \bar{x}=0$
  2. Find the direction of an axis $w=(w_1,w_2)$ such that, once all samples are projected onto $w$, $Var(x_{pro})=\frac{1}{m}\sum_{i=1}^{m}(x_{pro}^{(i)}-\bar{x}_{pro})^2$ is maximized (a NumPy sketch of the demean step follows).
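
A minimal NumPy sketch of the demean step, assuming a sample matrix `X` of shape (m, n) (the name is illustrative, not from the original code):

import numpy as np

def demean(X):
    # shift every feature so that its column mean becomes 0
    return X - np.mean(X, axis=0)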
1.4 Derivation

1) $Var(x_{pro})=\frac{1}{m}\sum_{i=1}^{m}(x_{pro}^{(i)}-\bar{x}_{pro})^2 =\frac{1}{m}\sum_{i=1}^{m}\left\| x_{pro}^{(i)}-\bar{x}_{pro} \right\|^2 =\frac{1}{m}\sum_{i=1}^{m}\left\| x_{pro}^{(i)} \right\|^2$

2) Computing $\left\| x_{pro}^{(i)} \right\|$

Let $x^{(i)} = (x_1^{(i)}, x_2^{(i)})$ be a sample point and $x_{pro}^{(i)} = (x_{pro1}^{(i)}, x_{pro2}^{(i)})$ its projection onto the axis $w$.

$$x^{(i)}\cdot w=\left\| x^{(i)} \right\| \cdot \left\| w \right\| \cdot \cos\theta$$
Taking $\left\| w \right\| = 1$, we have $x^{(i)}\cdot w=\left\| x^{(i)} \right\| \cdot \cos\theta =\left\| x_{pro}^{(i)} \right\|$
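
A tiny numeric check of this identity (the point and direction below are made up purely for illustration):

import numpy as np

x_i = np.array([3.0, 4.0])
w = np.array([1.0, 0.0])      # a unit direction along the first axis
proj_len = x_i.dot(w)         # ||x_pro^(i)|| = 3.0
x_pro = proj_len * w          # the projected point (3.0, 0.0)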

Since $\left\| x_{pro}^{(i)} \right\| = x^{(i)}\cdot w$, it follows that
$$Var(x_{pro})=\frac{1}{m}\sum_{i=1}^{m}\left\| x_{pro}^{(i)} \right\|^2 =\frac{1}{m}\sum_{i=1}^{m}(x^{(i)}\cdot w)^2$$

3) Goal: find the $w$ that maximizes $Var(x_{pro})=\frac{1}{m}\sum_{i=1}^{m}(x^{(i)}\cdot w)^2$.
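
This objective is a one-liner in NumPy; a minimal sketch, assuming `X` is the demeaned sample matrix and `w` a unit vector (names are illustrative):

import numpy as np

def projected_variance(X, w):
    # Var(x_pro) = (1/m) * sum_i (x^(i) . w)^2
    return np.sum(X.dot(w) ** 2) / len(X)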

2 Computing the Gradient for Gradient Ascent

$$Var(x_{pro})=\frac{1}{m}\sum_{i=1}^{m}(x^{(i)}\cdot w)^2 =\frac{1}{m}\sum_{i=1}^{m}(x_1^{(i)}w_1+x_2^{(i)}w_2+\cdots+x_n^{(i)}w_n)^2 =\frac{1}{m}\sum_{i=1}^{m}\left(\sum_{j=1}^{n}x_j^{(i)}w_j\right)^2$$

Let $f(w)=\frac{1}{m}\sum_{i=1}^{m}(x_1^{(i)}w_1+x_2^{(i)}w_2+\cdots+x_n^{(i)}w_n)^2$; then

$$\nabla f=\begin{pmatrix} \frac{\partial f}{\partial w_1} \\ \frac{\partial f}{\partial w_2} \\ \vdots \\ \frac{\partial f}{\partial w_n} \end{pmatrix} =\frac{2}{m}\begin{pmatrix} \sum_{i=1}^{m}(x_1^{(i)}w_1+x_2^{(i)}w_2+\cdots+x_n^{(i)}w_n)\,x_1^{(i)} \\ \sum_{i=1}^{m}(x_1^{(i)}w_1+x_2^{(i)}w_2+\cdots+x_n^{(i)}w_n)\,x_2^{(i)} \\ \vdots \\ \sum_{i=1}^{m}(x_1^{(i)}w_1+x_2^{(i)}w_2+\cdots+x_n^{(i)}w_n)\,x_n^{(i)} \end{pmatrix} =\frac{2}{m}\begin{pmatrix} \sum_{i=1}^{m}(x^{(i)}\cdot w)\,x_1^{(i)} \\ \sum_{i=1}^{m}(x^{(i)}\cdot w)\,x_2^{(i)} \\ \vdots \\ \sum_{i=1}^{m}(x^{(i)}\cdot w)\,x_n^{(i)} \end{pmatrix}$$

$$=\frac{2}{m}\left((x^{(1)}\cdot w,\ x^{(2)}\cdot w,\ \cdots,\ x^{(m)}\cdot w)\cdot \begin{pmatrix} x_1^{(1)} & x_2^{(1)} & \cdots & x_n^{(1)} \\ x_1^{(2)} & x_2^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{(m)} & x_2^{(m)} & \cdots & x_n^{(m)} \end{pmatrix}\right)^{\intercal} =\frac{2}{m}\left((xw)^{\intercal}x\right)^{\intercal} =\frac{2}{m}x^{\intercal}(xw)$$
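
In code the final expression is a single line of matrix algebra; a minimal sketch, again assuming a demeaned matrix `X` and a vector `w`:

import numpy as np

def gradient(X, w):
    # grad f = (2/m) * X^T (X w)
    return X.T.dot(X.dot(w)) * 2.0 / len(X)

Gradient ascent then repeatedly moves `w` a small step `eta` along this gradient and renormalizes `w` to unit length, as the `first_component` function in the full code below does.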

3 Finding the First n Principal Components

1) After the first principal component has been found, how do we find the next one?

→ Change the data: remove each sample's component along the first principal component, then find the first principal component of the new data (see the sketch after the formulas below).


$$x^{(i)}\cdot w=\left\| x_{pro}^{(i)} \right\|$$
$$x_{pro}^{(i)}=\left\| x_{pro}^{(i)} \right\|\cdot w$$
$$x'^{(i)}=x^{(i)}-x_{pro}^{(i)}$$
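
A minimal vectorized sketch of this subtraction, assuming `X` holds the demeaned samples row-wise and `w` is the unit vector of the component just found:

import numpy as np

def remove_component(X, w):
    # subtract each sample's projection onto w; the result is orthogonal to w
    return X - X.dot(w).reshape(-1, 1) * w

Repeating the first-component search on the result yields the second component, and so on.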

4 Dimensionality Reduction

Dimensionality reduction shortens data-processing time, but some accuracy is lost.

1) Mapping high-dimensional data to a lower dimension

$$x=\begin{pmatrix} x_1^{(1)} & x_2^{(1)} & \cdots & x_n^{(1)} \\ x_1^{(2)} & x_2^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{(m)} & x_2^{(m)} & \cdots & x_n^{(m)} \end{pmatrix}$$

The first k principal components:

$$w_k=\begin{pmatrix} w_1^{(1)} & w_2^{(1)} & \cdots & w_n^{(1)} \\ w_1^{(2)} & w_2^{(2)} & \cdots & w_n^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ w_1^{(k)} & w_2^{(k)} & \cdots & w_n^{(k)} \end{pmatrix}$$

Mapping from n dimensions down to k dimensions:

$$\underset{m\times n}{x}\cdot \underset{n\times k}{w_k^{\intercal}}=\underset{m\times k}{x_k}=\begin{pmatrix} x_1^{(1)} & x_2^{(1)} & \cdots & x_k^{(1)} \\ x_1^{(2)} & x_2^{(2)} & \cdots & x_k^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{(m)} & x_2^{(m)} & \cdots & x_k^{(m)} \end{pmatrix}$$

Restoring from k dimensions back to n dimensions (the restored data has lost information):

$$\underset{m\times k}{x_k}\cdot \underset{k\times n}{w_k}=\underset{m\times n}{x_m}$$
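
These two matrix products are all that the reduction and the restoration need; a minimal sketch, assuming `W_k` stacks the k unit component vectors as rows (shape (k, n)):

import numpy as np

def transform(X, W_k):
    # project (m, n) data onto k components -> (m, k)
    return X.dot(W_k.T)

def inverse_transform(X_k, W_k):
    # map (m, k) reduced data back to the original n-dimensional space -> (m, n)
    return X_k.dot(W_k)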

5 Code

1) PCA.py

import numpy as np
import matplotlib.pyplot as plt


class PCA:

    def __init__(self, n_components):
        """ 初始化PCA"""
        assert n_components >= 1, "n_components must be valid."
        self.n_components = n_components
        # matrix of principal components found by fit (one component per row)
        self.components_ = None

    def fit(self, xx, eta=0.01, n_iters=1e4):
        """ 获得数据集xx的前n个主成分"""
        assert self.n_components <= xx.shape[1], "n_components must be greater than the feature num of xx."

        def demean(xx):
            return xx - np.mean(xx, axis=0)

        def f(w, xx):
            """ 函数f """
            return np.sum((xx.dot(w) ** 2)) / len(xx)

        def df(w, xx):
            """ 函数f的梯度 """
            return xx.T.dot(xx.dot(w)) * 2. / len(xx)

        def direction(w):
            """ 求w的单位向量"""
            return w / np.linalg.norm(w)

        def first_component(xx, initial_w, eta=0.01, n_iters=1e4, epsilon=1e-8):
            w = direction(initial_w)
            cur_iter = 0

            while cur_iter < n_iters:
                gradient = df(w, xx)
                last_w = w
                w = w + eta * gradient
                w = direction(w)  # Note 1: renormalize to a unit direction after every step
                if (abs(f(w, xx) - f(last_w, xx)) < epsilon):
                    break
                cur_iter += 1
            return w

        xx_pca = demean(xx)
        self.components_ = np.empty(shape=(self.n_components, xx.shape[1]))
        for i in range(self.n_components):
            # Note 2: the initial vector must not be the zero vector
            initial_w = np.random.random(xx_pca.shape[1])
            w = first_component(xx_pca, initial_w, eta, n_iters)
            self.components_[i, :] = w

            plt.scatter(xx_pca[:, 0], xx_pca[:, 1])
            plt.plot([0, w[0] * 30], [0, w[1] * 30], color='r')
            plt.show()

            # remove the component along the principal component just found
            xx_pca = xx_pca - xx_pca.dot(w).reshape(-1, 1) * w

        # Note 3: do not standardize the data with StandardScaler (no normalization) before PCA
        return self

    def transform(self, xx):
        """ 将给定的xx,映射到各个主成分分量中,降维 """
        assert xx.shape[1] == self.components_.shape[1]

        return xx.dot(self.components_.T)

    def inverse_transform(self, xx):
        """ 将给定的xx,反向映射回原来的特征空间"""
        assert xx.shape[1] == self.components_.shape[0]

        return xx.dot(self.components_)

    def __repr__(self):
        return "PCA(n_components=%d)" % self.n_components

2) Test code: pca_test.py

import numpy as np
import matplotlib.pyplot as plt
from PCA_pro.PCA import PCA

# 1 Prepare the data
xx = np.empty((100, 2))
xx[:, 0] = np.random.uniform(0., 100., size=100)
xx[:, 1] = 0.75 * xx[:, 0] + 3. + np.random.normal(0, 10., size=100)

plt.scatter(xx[:, 0], xx[:, 1])
plt.show()

# 2 Find the principal components
mpca2 = PCA(2)
mpca2.fit(xx)
print(mpca2.components_)

# 3 Reduce the dimensionality
xx_reduction = mpca2.transform(xx)
print("xx_reduction.shape:", xx_reduction.shape)

# Restore from the reduced data (information has been lost)
xx_restore = mpca2.inverse_transform(xx_reduction)
print("xx_restore.shape", xx_restore.shape)

plt.scatter(xx[:, 0], xx[:, 1], color='g')
plt.scatter(xx_restore[:, 0], xx_restore[:, 1], color='r', alpha=0.5)
plt.show()

3) Results:
(Plots: the original scatter of xx; the demeaned data with the first and then the second component direction drawn in red; the restored data overlaid on the original.)

[[ 0.76608005  0.64274518]
 [-0.64274259  0.76608221]]
 xx_reduction.shape: (100, 2)
xx_restore.shape (100, 2)
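
For comparison, the same reduction can be done with scikit-learn's PCA; a minimal sketch, assuming scikit-learn is installed and `xx` is the data prepared above (sklearn uses an SVD-based solver rather than gradient ascent, so component signs may differ):

from sklearn.decomposition import PCA as SKPCA

sk_pca = SKPCA(n_components=2)
sk_pca.fit(xx)
print(sk_pca.components_)                      # component directions
xx_reduction = sk_pca.transform(xx)            # (100, 2)
xx_restore = sk_pca.inverse_transform(xx_reduction)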