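(Machine Learning, Complete Series) Chapter 6: Support Vector Machines (SVM), Section 6.5: Kernel Logistic Regression and Kernel Linear Discriminant Analysis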

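The kernel feature mapping generalizes beyond its use in SVM and SVR. This section discusses two further kernelized methods: kernel logistic regression, and kernel linear discriminant analysis.
Extending the LDA algorithm with the kernel method yields the kernel linear discriminant analysis (KLDA) algorithm.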

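Kernel methods

Summarizing the kernel feature mapping discussed so far and generalizing it gives the representer theorem [Watermelon Book, Theorem 6.2]. The key point of the theorem is that the objective in [Watermelon Book, Eq. (6.57)] is a function of $h$, while $h$ is itself a function of $\boldsymbol{x}$. A special case is $\Omega = 0$, where the objective reduces to $\ell$, a function of the values $h(\boldsymbol{x}_i)$.

Learning methods based on kernel functions are collectively called "kernel methods".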

Kernel logistic regression

Kernel logistic regression: let $h(\boldsymbol{x}_i)=\boldsymbol{\beta}^\mathrm{T}\hat{\boldsymbol{x}}_i$. Then the objective of [Watermelon Book, Eq. (3.27)] becomes
$$
\begin{align}
\min \ell (h)=\sum_{i=1}^m\left[-y_ih(\boldsymbol{x}_i)+\ln \left(1+\mathrm{e}^{h(\boldsymbol{x}_i)}\right)\right] \tag{6.24}
\end{align}
$$

Now take $\Omega = 0$. The objective function of the representer theorem [Watermelon Book, Eq. (6.57)] becomes
$$
\begin{align}
F(h) =\ell (h) \tag{6.25}
\end{align}
$$
By the representer theorem, its solution can be written in the form of [Watermelon Book, Eq. (6.58)]:
$$
\begin{align}
h^*(\boldsymbol{x}) & =\sum_{i=1}^m{\alpha}_i^*\kappa (\boldsymbol{x},\boldsymbol{x}_i)\notag \\
& ={\boldsymbol{\alpha}^*}^\mathrm{T}\kappa (\boldsymbol{x},\boldsymbol{x}_{1:\,m}) \tag{6.26}
\end{align}
$$
where $\boldsymbol{\alpha}^*=({\alpha}_1^*;{\alpha}_2^*;\cdots;{\alpha}_m^*)$ and $\kappa (\boldsymbol{x},\boldsymbol{x}_{1:\,m})=(\kappa (\boldsymbol{x},\boldsymbol{x}_{1});\kappa (\boldsymbol{x},\boldsymbol{x}_{2});\cdots;\kappa (\boldsymbol{x},\boldsymbol{x}_{m}))$.

From Eq. (6.25),
$$
\begin{align}
\mathop{\min}\limits_{h \in \mathbb{H} }\ell (h) & =\ell (h^*)\notag \\
& =\sum_{i=1}^m\left[-y_ih^*(\boldsymbol{x}_i)+\ln \left(1+\mathrm{e}^{h^*(\boldsymbol{x}_i)}\right)\right]\notag \\
& =\sum_{i=1}^m\left[-y_i{\boldsymbol{\alpha}^*}^\mathrm{T}\kappa (\boldsymbol{x}_i,\boldsymbol{x}_{1:\,m})+\ln \left(1+\mathrm{e}^{{\boldsymbol{\alpha}^*}^\mathrm{T}\kappa (\boldsymbol{x}_i,\boldsymbol{x}_{1:\,m})}\right)\right]\notag \\
& \geqslant \mathop{\min}\limits_{\boldsymbol{\alpha}}\sum_{i=1}^m\left[-y_i\boldsymbol{\alpha}^\mathrm{T}\kappa (\boldsymbol{x}_i,\boldsymbol{x}_{1:\,m})+\ln \left(1+\mathrm{e}^{\boldsymbol{\alpha}^\mathrm{T}\kappa (\boldsymbol{x}_i,\boldsymbol{x}_{1:\,m})}\right)\right] \tag{6.27}
\end{align}
$$

Compare Eq. (6.27) with [Watermelon Book, Eq. (3.27)]: $\boldsymbol{\alpha}$ corresponds to $\boldsymbol{\beta}$, and $\kappa (\boldsymbol{x}_i,\boldsymbol{x}_{1:\,m})$ corresponds to $\hat{\boldsymbol{x}}_i$. The solution procedure for [Watermelon Book, Eq. (3.27)] therefore applies directly to the minimization on the right-hand side of Eq. (6.27), and $\boldsymbol{\alpha}^*$ can be taken to be that solution. The inequality in Eq. (6.27) is in fact an equality: every $\boldsymbol{\alpha}$ defines a function $h(\boldsymbol{x})=\boldsymbol{\alpha}^\mathrm{T}\kappa (\boldsymbol{x},\boldsymbol{x}_{1:\,m})$ in $\mathbb{H}$, so the minimum over $\boldsymbol{\alpha}$ cannot fall below the minimum over $\mathbb{H}$.
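The reduction in Eq. (6.27) can be tried numerically. The following is a minimal sketch, not the author's implementation. It assumes a Gaussian (RBF) kernel, labels $y_i \in \{0,1\}$ as in [Watermelon Book, Eq. (3.27)], and plain gradient descent with a conservative step size instead of the Newton iteration used for [Watermelon Book, Eq. (3.27)]. The names `rbf_kernel`, `klr_loss`, `fit_klr`, `predict` and the toy data are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Kernel matrix with entries exp(-gamma * ||a_i - b_j||^2) (assumed kernel)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def klr_loss(alpha, K, y):
    """Right-hand side of Eq. (6.27); row i of K is kappa(x_i, x_{1:m})."""
    h = K @ alpha
    return float(np.sum(-y * h + np.logaddexp(0.0, h)))   # logaddexp(0, h) = ln(1 + e^h)

def fit_klr(K, y, n_iter=500):
    """Gradient descent on alpha with step 1/L, where L bounds the curvature."""
    lam_max = np.linalg.eigvalsh(K)[-1]          # largest eigenvalue of K
    step = 4.0 / lam_max ** 2                    # Hessian = K diag(p(1-p)) K <= K^2 / 4
    alpha = np.zeros(K.shape[0])
    for _ in range(n_iter):
        p = 0.5 * (1.0 + np.tanh(0.5 * (K @ alpha)))   # sigmoid, written stably
        alpha -= step * (K @ (p - y))                   # gradient of klr_loss
    return alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)        # XOR-like toy labels
K = rbf_kernel(X, X)
alpha = fit_klr(K, y)

def predict(X_new):
    """h*(x) = sum_i alpha_i kappa(x, x_i) as in Eq. (6.26), thresholded at 0."""
    return (rbf_kernel(X_new, X) @ alpha > 0).astype(float)

print("loss at alpha = 0:", klr_loss(np.zeros(len(y)), K, y))
print("loss after fitting:", klr_loss(alpha, K, y))
```

Only the kernel matrix enters the optimization; the feature map $\phi$ never has to be evaluated, which is the point of the correspondence between $\kappa (\boldsymbol{x}_i,\boldsymbol{x}_{1:\,m})$ and $\hat{\boldsymbol{x}}_i$.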

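Kernel linear discriminant analysis

Extending linear discriminant analysis ([Watermelon Book, Section 3.4, LDA]) with the kernel method gives the KLDA algorithm. Its key points are captured by the mapping in Table 6.6.
[Table 6.6: mapping table; the original image is not available]

Table 6.6 shows that every formula in the feature space contains $\phi (\boldsymbol{x})$. We do not know $\phi$ itself, however. What we know is the kernel function $\kappa (\boldsymbol{x},\boldsymbol{x}_i)$, which represents it implicitly:
$$
\kappa (\boldsymbol{x},\boldsymbol{x}_i)={\phi (\boldsymbol{x}_i)}^\mathrm{T}\phi (\boldsymbol{x})
$$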

[Watermelon Book, Eq. (6.60)] is a maximization. Its reciprocal gives a minimization, which can serve as the loss function. In the representer theorem [Watermelon Book, Theorem 6.2], take the special case
$$
\begin{align}
\Omega & \equiv 0 \notag \\
\ell & =J^{-1}(\boldsymbol{w})\notag
\end{align}
$$
Then, for $J(\boldsymbol{w})>0$, [Watermelon Book, Eq. (6.57)] becomes
$$
\begin{align}
\min F(h)=\min \left(0+J^{-1}(\boldsymbol{w})\right)\ \Longleftrightarrow\ \max{J(\boldsymbol{w})} \tag{6.28}
\end{align}
$$
That is, [Watermelon Book, Eq. (6.57)] turns into [Watermelon Book, Eq. (6.60)], which is the optimization objective.

Suppose the optimal model $h(\boldsymbol{x})=\boldsymbol{w}^\mathrm{T}\phi (\boldsymbol{x})$ has been obtained by the corresponding method in Table 6.6. The representer theorem states that this optimum has the form of [Watermelon Book, Eq. (6.58)], i.e.
$$
\begin{align}
h(\boldsymbol{x}) & =\boldsymbol{w}^\mathrm{T}\phi (\boldsymbol{x})\notag \\
& =\sum_{i=1}^m{\alpha}_i \kappa (\boldsymbol{x},\boldsymbol{x}_i)\quad \text{(by [Watermelon Book, Eq. (6.58)])}\notag \\
& =\sum_{i=1}^m{\alpha}_i(\phi (\boldsymbol{x}_i))^\mathrm{T}\phi (\boldsymbol{x})\notag \\
& =\left[\sum_{i=1}^m{\alpha}_i\phi (\boldsymbol{x}_i)\right]^\mathrm{T}\phi (\boldsymbol{x})\notag
\end{align}
$$

Hence
$$
\begin{align}
\boldsymbol{w}=\sum_{i=1}^m{\alpha}_i\phi (\boldsymbol{x}_i) \tag{6.29}
\end{align}
$$

Suppose the training set consists of $n$ classes (sets): $D=\mathbf{X}_1\bigcup \mathbf{X}_2\bigcup\cdots\bigcup\mathbf{X}_n$, where $\mathbf{X}_i$ is the set of samples of class $i$, represented in matrix form. Let $m$ be the number of samples in $D$ and $m_i$ the number in $\mathbf{X}_i$. From Eq. (6.37) onward the derivation uses the two-class case with classes 0 and 1.

Applying the indicator function here gives
$$
\begin{align}
\mathbb{I} (\boldsymbol{x}_j \in \mathbf{X}_i)=
\begin{cases}
\ 1 ,\qquad \text{if $\boldsymbol{x}_j \in \mathbf{X}_i$} \\
\ 0 ,\qquad \text{if $\boldsymbol{x}_j \notin \mathbf{X}_i$}
\end{cases}\notag
\end{align}
$$

For convenience, rewrite it as
$$
\begin{align}
\mathbb{I}_i (\boldsymbol{x}_j )=
\begin{cases}
\ 1 ,\qquad \text{if $\boldsymbol{x}_j \in \mathbf{X}_i$} \\
\ 0 ,\qquad \text{if $\boldsymbol{x}_j \notin \mathbf{X}_i$}
\end{cases}\notag
\end{align}
$$

Applying $\mathbb{I}_i$ to all samples of $D$ gives a vector, denoted
$$
\begin{align}
\mathbb{I}_i (\boldsymbol{x}_{1:\, m} )\mathop{=} \limits^{\mathrm{def}} (\mathbb{I}_i (\boldsymbol{x}_1 );\mathbb{I}_i (\boldsymbol{x}_2 );\cdots;\mathbb{I}_i (\boldsymbol{x}_m )) \tag{6.30}
\end{align}
$$

$\phi (\boldsymbol{x}_i)$ is a (column) vector. Applying $\phi$ to all samples of $D$ gives a matrix, denoted
$$
\begin{align}
(\phi (\boldsymbol{x}_{1:\,m} ))^\mathrm{T} \mathop{=} \limits^{\mathrm{def}} (\phi (\boldsymbol{x}_1 ),\phi (\boldsymbol{x}_2 ),\cdots,\phi (\boldsymbol{x}_m )) \tag{6.31}
\end{align}
$$

$$
\begin{align}
\phi (\boldsymbol{x}_{1:\,m} )= ((\phi (\boldsymbol{x}_1 ))^\mathrm{T};(\phi (\boldsymbol{x}_2 ))^\mathrm{T};\cdots;(\phi (\boldsymbol{x}_m ))^\mathrm{T})\quad \text{(by Eq. (0.2) below)} \tag{6.32}
\end{align}
$$
This uses the identity
$$
\begin{align}
\mathbf{X}^\mathrm{T} & =(\boldsymbol{x}_1,\boldsymbol{x}_2,\cdots,\boldsymbol{x}_n)^\mathrm{T}\notag \\
& =(\boldsymbol{x}_1^\mathrm{T};\boldsymbol{x}_2^\mathrm{T};\cdots;\boldsymbol{x}_n^\mathrm{T}) \tag{0.2}
\end{align}
$$
From Eqs. (6.31) and (6.32),
$$
\begin{align}
\phi (\boldsymbol{x}_{1:\,m} )(\phi (\boldsymbol{x}_{1:\,m} ))^\mathrm{T} & = \left(\phi (\boldsymbol{x}_1 )^\mathrm{T};\phi (\boldsymbol{x}_2 )^\mathrm{T};\cdots;\phi (\boldsymbol{x}_m )^\mathrm{T}\right) \left(\phi (\boldsymbol{x}_1 ),\phi (\boldsymbol{x}_2 ),\cdots,\phi (\boldsymbol{x}_m )\right)\notag \\
& =([\phi (\boldsymbol{x}_i )^\mathrm{T}\phi (\boldsymbol{x}_j )]_{ij})\notag \\
& =([\kappa (\boldsymbol{x}_i,\boldsymbol{x}_j)]_{ij})\quad \text{(by [Watermelon Book, Eq. (6.22)])}\notag \\
& =\mathbf{K} \tag{6.33}
\end{align}
$$
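Eq. (6.33) can be checked numerically in the rare case where $\phi$ is known explicitly. The sketch below assumes the homogeneous quadratic kernel $\kappa (\boldsymbol{x},\boldsymbol{z})=(\boldsymbol{x}^\mathrm{T}\boldsymbol{z})^2$ on $\mathbb{R}^2$, whose feature map is $\phi (\boldsymbol{x})=(x_1^2;\sqrt{2}x_1x_2;x_2^2)$. This kernel, the random data, and the names `phi` and `kappa` are chosen for illustration and do not come from the text.

```python
import numpy as np

def phi(x):
    """Explicit feature map of the quadratic kernel (x^T z)^2 in two dimensions."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def kappa(x, z):
    """Kernel function: evaluates phi(x)^T phi(z) without forming phi."""
    return float(x @ z) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                      # five toy samples x_1, ..., x_5

Phi = np.stack([phi(x) for x in X])              # rows phi(x_i)^T, as in Eq. (6.32)
K_from_phi = Phi @ Phi.T                         # left-hand side of Eq. (6.33)
K_from_kappa = np.array([[kappa(xi, xj) for xj in X] for xi in X])

print(np.allclose(K_from_phi, K_from_kappa))
```

For kernels such as the Gaussian kernel the feature space is infinite-dimensional, so only the second construction, `K_from_kappa`, is available in practice.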

Using Eqs. (6.30) and (6.31), rewrite [Watermelon Book, Eq. (6.61)]:
$$
\begin{align}
{\boldsymbol{\mu}}_i^{\phi } & =\frac{1}{m_i}\left[\sum_{\boldsymbol{x}_j \in \mathbf{X}_i}{\phi }(\boldsymbol{x}_j)+\sum_{\boldsymbol{x}_j \notin \mathbf{X}_i}\boldsymbol{0}\right]\notag \\
& =\frac{1}{m_i}\left[\sum_{\boldsymbol{x}_j \in D}\mathbb{I} (\boldsymbol{x}_j \in \mathbf{X}_i){\phi }(\boldsymbol{x}_j)\right]\notag \\
& =\frac{1}{m_i}(\phi (\boldsymbol{x}_{1:\,m} ))^\mathrm{T}\mathbb{I}_i (\boldsymbol{x}_{1:\,m} ) \tag{6.34}
\end{align}
$$

Similarly,
$$
\begin{align}
{\boldsymbol{\mu}}_j^{\phi } =\frac{1}{m_j}(\phi (\boldsymbol{x}_{1:\,m} ))^\mathrm{T}\mathbb{I}_j (\boldsymbol{x}_{1:\,m} ) \tag{6.35}
\end{align}
$$

From Eqs. (6.34) and (6.35),
$$
\begin{align}
{\boldsymbol{\mu}}_i^{\phi } -{\boldsymbol{\mu}}_j^{\phi } =(\phi (\boldsymbol{x}_{1:\,m} ))^\mathrm{T}\left[\frac{1}{m_i}\mathbb{I}_i (\boldsymbol{x}_{1:\,m} )-\frac{1}{m_j}\mathbb{I}_j (\boldsymbol{x}_{1:\,m} )\right] \tag{6.36}
\end{align}
$$
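The indicator-vector form of the class means in Eqs. (6.34) and (6.36) is easy to confirm on a small example. In this sketch the matrix `Phi` is random and merely stands in for $\phi (\boldsymbol{x}_{1:\,m})$, and the class labels are a made-up toy assignment. The helper names `indicator` and `class_mean` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 6, 3
Phi = rng.normal(size=(m, d))             # row j stands in for phi(x_j)^T, as in Eq. (6.32)
labels = np.array([0, 1, 1, 0, 1, 0])     # toy class of each sample, three per class

def indicator(i):
    """Vector I_i(x_{1:m}) of Eq. (6.30): 1 where x_j is in class i, else 0."""
    return (labels == i).astype(float)

def class_mean(i):
    """Eq. (6.34): mu_i = (1/m_i) Phi^T I_i, where m_i is the class size."""
    one_i = indicator(i)
    return Phi.T @ one_i / one_i.sum()

# Eq. (6.36) with m_1 = m_0 = 3 for this toy labeling
diff = Phi.T @ (indicator(1) / 3 - indicator(0) / 3)

print(np.allclose(class_mean(1), Phi[labels == 1].mean(axis=0)))
print(np.allclose(diff, class_mean(1) - class_mean(0)))
```

Both checks compare the matrix expression against a direct average over the rows of each class.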

Using Eq. (6.36), rewrite [Watermelon Book, Eq. (6.62)]:
$$
\begin{align}
\mathbf{S}_{\mathrm{b}}^{\phi } & =(\phi (\boldsymbol{x}_{1:\,m} ))^\mathrm{T}\left[\frac{1}{m_1}\mathbb{I}_1 (\boldsymbol{x}_{1:\,m} )-\frac{1}{m_0}\mathbb{I}_0 (\boldsymbol{x}_{1:\,m} )\right]\left((\phi (\boldsymbol{x}_{1:\,m} ))^\mathrm{T}\left[\frac{1}{m_1}\mathbb{I}_1 (\boldsymbol{x}_{1:\,m} )-\frac{1}{m_0}\mathbb{I}_0 (\boldsymbol{x}_{1:\,m} )\right]\right)^\mathrm{T}\notag \\
& =(\phi (\boldsymbol{x}_{1:\,m} ))^\mathrm{T} \left[\frac{\mathbb{I}_1 (\boldsymbol{x}_{1:\,m} )}{m_1}-\frac{\mathbb{I}_0 (\boldsymbol{x}_{1:\,m})}{m_0} \right] \left[\frac{\mathbb{I}_1 (\boldsymbol{x}_{1:\,m} )}{m_1}-\frac{\mathbb{I}_0 (\boldsymbol{x}_{1:\,m})}{m_0}\right]^\mathrm{T}\phi (\boldsymbol{x}_{1:\,m} )\notag \\
& ={\phi}^\mathrm{T}[\cdot][\cdot]^\mathrm{T}{\phi}\qquad \text{(abbreviated)} \tag{6.37}
\end{align}
$$
In the abbreviated form, $\phi$ stands for $\phi (\boldsymbol{x}_{1:\,m})$ and $[\cdot]$ for the bracketed vector $\frac{1}{m_1}\mathbb{I}_1 (\boldsymbol{x}_{1:\,m})-\frac{1}{m_0}\mathbb{I}_0 (\boldsymbol{x}_{1:\,m})$.

Using Eq. (6.31), rewrite Eq. (6.29):
$$
\begin{align}
\boldsymbol{w}=\phi (\boldsymbol{x}_{1:\,m} )^\mathrm{T}\boldsymbol{\alpha},\quad (\boldsymbol{\alpha}=({\alpha}_1;{\alpha}_2;\cdots;{\alpha}_m)) \tag{6.38}
\end{align}
$$

From Eqs. (6.37) and (6.38), using the abbreviated notation where convenient,
$$
\begin{align}
\boldsymbol{w}^\mathrm{T}\mathbf{S}_{\mathrm{b}}^{\phi }\boldsymbol{w} & =\left(\phi (\boldsymbol{x}_{1:\,m} )^\mathrm{T}\boldsymbol{\alpha}\right)^\mathrm{T}\mathbf{S}_{\mathrm{b}}^{\phi }(\phi (\boldsymbol{x}_{1:\,m} ))^\mathrm{T}\boldsymbol{\alpha}\quad \text{(by Eq. (6.38))}\notag \\
& =\boldsymbol{\alpha}^\mathrm{T}\phi[{\phi}^\mathrm{T}[\cdot][\cdot]^\mathrm{T}{\phi}]{\phi}^\mathrm{T}\boldsymbol{\alpha}\quad \text{(by Eq. (6.37))}\notag \\
& =\boldsymbol{\alpha}^\mathrm{T}(\phi{\phi}^\mathrm{T})[\cdot][\cdot]^\mathrm{T}({\phi}{\phi}^\mathrm{T})\boldsymbol{\alpha}\notag \\
& =\boldsymbol{\alpha}^\mathrm{T}\mathbf{K}[\cdot][\cdot]^\mathrm{T}\mathbf{K}\boldsymbol{\alpha}\quad \text{(by Eq. (6.33))}\notag \\
& =\boldsymbol{\alpha}^\mathrm{T}(\mathbf{K}[\cdot])([\cdot]^\mathrm{T}\mathbf{K}^\mathrm{T})\boldsymbol{\alpha}\quad \text{(by the symmetry of $\mathbf{K}$)}\notag \\
& =\boldsymbol{\alpha}^\mathrm{T}(\mathbf{K}[\cdot])(\mathbf{K}[\cdot])^\mathrm{T}\boldsymbol{\alpha} \tag{6.39}
\end{align}
$$
where
$$
\begin{align}
\mathbf{K}[\cdot] & =\mathbf{K}\left[\frac{\mathbb{I}_1 (\boldsymbol{x}_{1:\,m} )}{m_1}-\frac{\mathbb{I}_0 (\boldsymbol{x}_{1:\,m})}{m_0}\right]\notag \\
& =\frac{1}{m_1}\mathbf{K}\mathbb{I}_1 (\boldsymbol{x}_{1:\,m})-\frac{1}{m_0}\mathbf{K}\mathbb{I}_0 (\boldsymbol{x}_{1:\,m} )\notag
\end{align}
$$

Introducing the definitions of [Watermelon Book, Eqs. (6.66) to (6.69)] and the notation $\boldsymbol{1}_i\mathop{=} \limits^{\mathrm{def}} \mathbb{I}_i (\boldsymbol{x}_{1:\,m})$, Eq. (6.39) becomes
$$
\begin{align}
\boldsymbol{w}^\mathrm{T}\mathbf{S}_{\mathrm{b}}^{\phi }\boldsymbol{w} & =\boldsymbol{\alpha}^\mathrm{T} \left[\frac{1}{m_1}\mathbf{K}\boldsymbol{1}_1-\frac{1}{m_0}\mathbf{K}\boldsymbol{1}_0\right] \left[\frac{1}{m_1}\mathbf{K}\boldsymbol{1}_1-\frac{1}{m_0}\mathbf{K}\boldsymbol{1}_0\right]^\mathrm{T} \boldsymbol{\alpha}\notag \\
& =\boldsymbol{\alpha}^\mathrm{T}(\hat{\boldsymbol{\mu} }_1-\hat{\boldsymbol{\mu} }_0)(\hat{\boldsymbol{\mu} }_1-\hat{\boldsymbol{\mu} }_0)^\mathrm{T}\boldsymbol{\alpha}\notag \\
& =\boldsymbol{\alpha}^\mathrm{T}\mathbf{M}\boldsymbol{\alpha} \tag{6.40}
\end{align}
$$
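The chain from Eq. (6.37) to Eq. (6.40) can be verified numerically when an explicit feature map is available. The sketch below again assumes the quadratic kernel $(\boldsymbol{x}^\mathrm{T}\boldsymbol{z})^2$ on $\mathbb{R}^2$ with feature map `phi`; the data, the labels, and the vector `alpha` are arbitrary toy values, not quantities from the text.

```python
import numpy as np

def phi(x):
    """Explicit feature map of the quadratic kernel (x^T z)^2 in two dimensions."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 2))
labels = np.array([0, 0, 0, 1, 1, 1, 1, 1])      # m_0 = 3, m_1 = 5
alpha = rng.normal(size=8)                       # arbitrary coefficient vector

Phi = np.stack([phi(x) for x in X])              # Eq. (6.32)
K = Phi @ Phi.T                                  # Eq. (6.33)
w = Phi.T @ alpha                                # Eq. (6.38)

# Feature-space side: S_b = (mu_1 - mu_0)(mu_1 - mu_0)^T
mu0 = Phi[labels == 0].mean(axis=0)
mu1 = Phi[labels == 1].mean(axis=0)
S_b = np.outer(mu1 - mu0, mu1 - mu0)
lhs = w @ S_b @ w

# Kernel side: mu_hat_i = K 1_i / m_i and M = (mu_hat_1 - mu_hat_0)(mu_hat_1 - mu_hat_0)^T
one0 = (labels == 0).astype(float)
one1 = (labels == 1).astype(float)
mu_hat0 = K @ one0 / one0.sum()
mu_hat1 = K @ one1 / one1.sum()
M = np.outer(mu_hat1 - mu_hat0, mu_hat1 - mu_hat0)
rhs = alpha @ M @ alpha

print(np.isclose(lhs, rhs))
```

The left side needs `Phi` explicitly, while the right side is computed from `K` alone, so it remains computable for kernels whose feature map is unknown.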

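By a derivation similar to that of Eq. (6.40),
$$
\begin{align}
\boldsymbol{w}^\mathrm{T}\mathbf{S}_{\mathrm{w}}^{\phi }\boldsymbol{w} & =\boldsymbol{\alpha}^\mathrm{T}\mathbf{N}\boldsymbol{\alpha} \tag{6.41}
\end{align}
$$

By Eqs. (6.40) and (6.41), the optimization objective changes from [Watermelon Book, Eq. (6.60)] to [Watermelon Book, Eq. (6.70)]. It can then be solved with the linear discriminant analysis (LDA) method of Chapter 3 (see the solution procedure for [Watermelon Book, Eq. (3.35)]).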
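A two-class KLDA solve along these lines might look as follows. This is a sketch under stated assumptions, not a definitive implementation: it assumes a Gaussian (RBF) kernel, forms $\mathbf{N}=\mathbf{K}\mathbf{K}^\mathrm{T}-\sum_{i=0}^{1}m_i\hat{\boldsymbol{\mu}}_i\hat{\boldsymbol{\mu}}_i^\mathrm{T}$ following [Watermelon Book, Eq. (6.69)], and takes $\boldsymbol{\alpha}\propto \mathbf{N}^{-1}(\hat{\boldsymbol{\mu}}_1-\hat{\boldsymbol{\mu}}_0)$ by analogy with the closed-form LDA solution of Chapter 3. The ridge term `eps` is an addition of this sketch, since $\mathbf{N}$ is often singular. The function names and toy data are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Kernel matrix with entries exp(-gamma * ||a_i - b_j||^2) (assumed kernel)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def klda_fit(K, labels, eps=1e-3):
    """Return (alpha, M, N) for two classes labeled 0 and 1."""
    m = K.shape[0]
    mu_hat, N = {}, K @ K.T
    for c in (0, 1):
        one_c = (labels == c).astype(float)              # indicator vector 1_c
        mu_hat[c] = K @ one_c / one_c.sum()              # mu_hat_c = K 1_c / m_c
        N -= one_c.sum() * np.outer(mu_hat[c], mu_hat[c])
    d = mu_hat[1] - mu_hat[0]
    M = np.outer(d, d)
    alpha = np.linalg.solve(N + eps * np.eye(m), d)      # eps * I: added ridge term
    return alpha / np.linalg.norm(alpha), M, N           # J does not depend on the scale of alpha

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, size=(15, 2)),
               rng.normal(3.0, 1.0, size=(15, 2))])
labels = np.array([0] * 15 + [1] * 15)

K = rbf_kernel(X, X, gamma=0.5)
alpha, M, N = klda_fit(K, labels)
h = K @ alpha              # projections h(x_j) = sum_i alpha_i kappa(x_j, x_i)

# Scatter of the projected values, computed directly from h
between = (h[labels == 1].mean() - h[labels == 0].mean()) ** 2
within = sum(((h[labels == c] - h[labels == c].mean()) ** 2).sum() for c in (0, 1))
print(np.isclose(alpha @ M @ alpha, between), np.isclose(alpha @ N @ alpha, within))
```

The final line compares $\boldsymbol{\alpha}^\mathrm{T}\mathbf{M}\boldsymbol{\alpha}$ and $\boldsymbol{\alpha}^\mathrm{T}\mathbf{N}\boldsymbol{\alpha}$ with the between-class and within-class scatter of the projections, which is what Eqs. (6.40) and (6.41) assert. A new point $\boldsymbol{x}$ is projected with `rbf_kernel(x_new, X, gamma=0.5) @ alpha`.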

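This raises a question: [Watermelon Book, Eq. (6.60)] and [Watermelon Book, Eq. (6.70)] look much alike in form, so why not solve the former directly?

The former solves for $\boldsymbol{w}$, which by Eq. (6.29) depends on the function $\phi (\boldsymbol{x}_i)$, and that function is usually unknown. After the conversion, everything involving $\phi (\boldsymbol{x}_i)$ is absorbed into the kernel matrix (Eq. (6.33)). The kernel matrix $\mathbf{K}$ appears inside $\mathbf{M}$ and $\mathbf{N}$, and $\mathbf{K}$ is usually known. In other words, [Watermelon Book, Eq. (6.70)] avoids the unknown function $\phi (\boldsymbol{x}_i)$, and that is the reason for transforming the objective.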

