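Kernel-induced feature mappings generalize well beyond SVM and SVR. Besides those two kernel extensions, this post discusses kernel logistic regression and kernel linear discriminant analysis.
Extending the LDA algorithm with kernel methods yields the kernel linear discriminant analysis (KLDA) algorithm.
Kernel methods
The kernel feature mappings discussed so far generalize to the representer theorem [Watermelon Book, Theorem 6.2]. The key point of the theorem is that the objective in [Watermelon Book, Eq. (6.57)] is a function of $h$, while $h$ is itself a function of $\boldsymbol{x}$. A special case: $\Omega =0$, with $\ell$ a function of the values $h(\boldsymbol{x}_i)$.
Learning methods based on kernel functions are collectively called "kernel methods".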
Kernel logistic regression
Let $h(\boldsymbol{x}_i)=\boldsymbol{\beta }^\mathrm{T}\hat{\boldsymbol{x}}_i$. The objective of [Watermelon Book, Eq. (3.27)] then becomes
$$
\begin{align}
\min \ell (h)=\sum_{i=1}^m\left[-y_ih(\boldsymbol{x}_i)+\ln \left(1+\mathrm{e}^{h(\boldsymbol{x}_i)}\right)\right] \tag{6.24}
\end{align}
$$
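As a rough illustration, Eq. (6.24) can be evaluated numerically as below. This is only a sketch: the function name `logistic_loss` and the sample values are made up for the example, and labels are assumed to lie in $\{0,1\}$ as in [Watermelon Book, Eq. (3.27)].

```python
import numpy as np

def logistic_loss(h, y):
    """Eq. (6.24): sum_i [-y_i * h_i + ln(1 + e^{h_i})], labels y_i in {0, 1}."""
    h = np.asarray(h, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.logaddexp(0, h) evaluates ln(e^0 + e^h) = ln(1 + e^h) without overflow.
    return float(np.sum(-y * h + np.logaddexp(0.0, h)))

# Hypothetical model outputs h(x_i) and labels y_i.
h_values = np.array([2.0, -1.0, 0.0])
labels = np.array([1, 0, 1])
loss = logistic_loss(h_values, labels)
```

Using `np.logaddexp` rather than `np.log(1 + np.exp(h))` is a design choice that keeps the loss finite for large $h$.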
Next take $\Omega =0$. The objective of the representer theorem [Watermelon Book, Eq. (6.57)] then becomes
$$
\begin{align}
F(h) =\ell (h) \tag{6.25}
\end{align}
$$
By the representer theorem, its solution can be written in the form of [Watermelon Book, Eq. (6.58)]:
$$
\begin{align}
h^*(\boldsymbol{x}) & =\sum_{i=1}^m{\alpha}_i^*\kappa (\boldsymbol{x},\boldsymbol{x}_i)\notag \\
 & ={\boldsymbol{\alpha}^*}^\mathrm{T}\kappa (\boldsymbol{x},\boldsymbol{x}_{1:\,m}) \tag{6.26}
\end{align}
$$
where $\boldsymbol{\alpha}^*=({\alpha}_1^*;{\alpha}_2^*;\cdots;{\alpha}_m^*)$ and $\kappa (\boldsymbol{x},\boldsymbol{x}_{1:\,m})=(\kappa (\boldsymbol{x},\boldsymbol{x}_{1});\kappa (\boldsymbol{x},\boldsymbol{x}_{2});\cdots;\kappa (\boldsymbol{x},\boldsymbol{x}_{m}))$.
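The kernel expansion of Eq. (6.26) can be sketched in code as follows. The Gaussian (RBF) kernel, the helper names `rbf_kernel` and `predict_h`, and the coefficient values are all assumptions made for illustration; the text does not fix a particular kernel.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian kernel matrix: entry (i, j) is exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

def predict_h(x_new, x_train, alpha, kernel=rbf_kernel):
    """Eq. (6.26): h(x) = alpha^T kappa(x, x_{1:m}) for each row x of x_new."""
    # Row j of kernel(x_new, x_train) is kappa(x_j, x_{1:m})^T.
    return kernel(x_new, x_train) @ alpha

x_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
alpha = np.array([0.5, -1.0, 2.0])   # made-up coefficients
x_new = np.array([[0.5, 0.5]])
h_new = predict_h(x_new, x_train, alpha)
```

Note that prediction needs only kernel evaluations against the training samples, never $\phi$ itself.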
From Eq. (6.25) we obtain
$$
\begin{align}
\mathop{\min}\limits_{h \in \mathbb{H} }\ell (h) & =\ell (h^*)\notag \\
 & =\sum_{i=1}^m\left[-y_ih^*(\boldsymbol{x}_i)+\ln \left(1+\mathrm{e}^{h^*(\boldsymbol{x}_i)}\right)\right]\notag \\
 & =\sum_{i=1}^m\left[-y_i{\boldsymbol{\alpha}^*}^\mathrm{T}\kappa (\boldsymbol{x}_i,\boldsymbol{x}_{1:\,m})+\ln \left(1+\mathrm{e}^{{\boldsymbol{\alpha}^*}^\mathrm{T}\kappa (\boldsymbol{x}_i,\boldsymbol{x}_{1:\,m})}\right)\right]\notag \\
 & \geqslant \mathop{\min}\limits_{\boldsymbol{\alpha}}\sum_{i=1}^m\left[-y_i\boldsymbol{\alpha}^\mathrm{T}\kappa (\boldsymbol{x}_i,\boldsymbol{x}_{1:\,m})+\ln \left(1+\mathrm{e}^{\boldsymbol{\alpha}^\mathrm{T}\kappa (\boldsymbol{x}_i,\boldsymbol{x}_{1:\,m})}\right)\right] \tag{6.27}
\end{align}
$$
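Compare Eq. (6.27) with [Watermelon Book, Eq. (3.27)]: $\boldsymbol{\alpha}$ corresponds to $\boldsymbol{\beta}$, and $\kappa (\boldsymbol{x}_i,\boldsymbol{x}_{1:\,m})$ corresponds to $\hat{\boldsymbol{x}}_i$. Applying the solution procedure of [Watermelon Book, Eq. (3.27)] directly therefore solves the minimization on the right-hand side of Eq. (6.27), and we take ${\boldsymbol{\alpha}^*}$ to be that solution. Conversely, every $\boldsymbol{\alpha}$ defines some $h \in \mathbb{H}$, so the minimum over $\boldsymbol{\alpha}$ cannot fall below $\min_{h}\ell (h)$; the inequality in Eq. (6.27) is in fact an equality.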
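A minimal sketch of this correspondence: minimize the right-hand side of Eq. (6.27) over $\boldsymbol{\alpha}$ with plain gradient descent. The choice of gradient descent (instead of, say, Newton's method), the RBF kernel, the step size, and the XOR-style toy data are all assumptions for the example, not something prescribed by the text.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

def klr_loss(alpha, K, y):
    """Right-hand side of Eq. (6.27); row i of K is kappa(x_i, x_{1:m})^T."""
    h = K @ alpha
    return float(np.sum(-y * h + np.logaddexp(0.0, h)))

def fit_klr(K, y, lr=0.5, n_iter=200):
    """Gradient descent on alpha (step size and iteration count are arbitrary)."""
    alpha = np.zeros(K.shape[0])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(K @ alpha)))   # sigmoid of h(x_i)
        grad = K.T @ (p - y)                      # gradient of the loss w.r.t. alpha
        alpha -= lr * grad / len(y)
    return alpha

# XOR-style toy data: not linearly separable in the input space.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
K = rbf_kernel(X, X)
alpha = fit_klr(K, y)
pred = (K @ alpha > 0).astype(float)   # h > 0 means P(y = 1) > 0.5
```

Here the kernel matrix row $\kappa (\boldsymbol{x}_i,\boldsymbol{x}_{1:\,m})$ simply takes the place of the feature vector $\hat{\boldsymbol{x}}_i$ in ordinary logistic regression.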
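Kernel linear discriminant analysis
Linear discriminant analysis ([Watermelon Book, Section 3.4, LDA]) extended with kernel methods gives the KLDA algorithm. Its key points are captured by the correspondences in Table 6.6.
Table 6.6 shows that every formula in the feature space contains $\phi (\boldsymbol{x})$. We do not know $\phi$ itself, however. What we know is the kernel function $\kappa (\boldsymbol{x},\boldsymbol{x}_i)$, which represents it implicitly:
$$
\kappa (\boldsymbol{x},\boldsymbol{x}_i)={\phi (\boldsymbol{x}_i)}^\mathrm{T}\phi (\boldsymbol{x})
$$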
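To make the implicit representation concrete, here is a small check with a kernel whose feature map happens to be known. The homogeneous degree-2 polynomial kernel on $\mathbb{R}^2$ and the names `phi` and `kappa` are chosen purely for illustration.

```python
import numpy as np

def phi(x):
    """Explicit feature map of the degree-2 polynomial kernel on R^2."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def kappa(x, z):
    """kappa(x, z) = (x^T z)^2, computed without ever forming phi."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
lhs = kappa(x, z)                 # kernel evaluation in the input space
rhs = float(phi(z) @ phi(x))      # inner product in the feature space
```

For most kernels of practical interest (the Gaussian kernel, for instance) no such finite explicit `phi` is available, which is why the derivation below works with $\kappa$ only.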
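[Watermelon Book, Eq. (6.60)] is a maximization ($\max$). Taking the reciprocal turns it into a minimization ($\min$) that can serve as the loss function. In the representer theorem [Watermelon Book, Theorem 6.2], take the special case
$$
\begin{align}
\Omega & \equiv 0 \notag \\
\ell & =J^{-1}(\boldsymbol{w})\notag
\end{align}
$$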
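Then [Watermelon Book, Eq. (6.57)] becomes
$$
\begin{align}
\min F(h)=\min \left(0+J^{-1}(\boldsymbol{w})\right)\iff \max{J(\boldsymbol{w})} \tag{6.28}
\end{align}
$$
The two problems share the same optimal $\boldsymbol{w}$ wherever $J(\boldsymbol{w})>0$, so [Watermelon Book, Eq. (6.57)] turns into [Watermelon Book, Eq. (6.60)], which is the optimization objective.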
Suppose the optimal model $h(\boldsymbol{x})=\boldsymbol{w}^\mathrm{T}\phi (\boldsymbol{x})$ has been obtained by the corresponding method in Table 6.6. The representer theorem says that this optimum has the form of [Watermelon Book, Eq. (6.58)], that is,
$$
\begin{align}
h(\boldsymbol{x}) & =\boldsymbol{w}^\mathrm{T}\phi (\boldsymbol{x})\notag \\
 & =\sum_{i=1}^m{\alpha}_i \kappa (\boldsymbol{x},\boldsymbol{x}_i)\quad \text{(by [Watermelon Book, Eq. (6.58)])}\notag \\
 & =\sum_{i=1}^m{\alpha}_i(\phi (\boldsymbol{x}_i))^\mathrm{T}\phi (\boldsymbol{x})\notag \\
 & =\left[\sum_{i=1}^m{\alpha}_i\phi (\boldsymbol{x}_i)\right]^\mathrm{T}\phi (\boldsymbol{x})\notag
\end{align}
$$
It follows that
$$
\begin{align}
\boldsymbol{w}=\sum_{i=1}^m{\alpha}_i\phi (\boldsymbol{x}_i) \tag{6.29}
\end{align}
$$
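Eq. (6.29) can be checked numerically when $\phi$ is available. The sketch below again assumes the degree-2 polynomial kernel (with its explicit map) and made-up coefficients $\alpha_i$; it compares the primal form $\boldsymbol{w}^\mathrm{T}\phi (\boldsymbol{x})$ with the kernel form $\sum_i \alpha_i\kappa (\boldsymbol{x},\boldsymbol{x}_i)$.

```python
import numpy as np

def phi(x):
    """Explicit feature map of the degree-2 polynomial kernel on R^2."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def kappa(x, z):
    """kappa(x, z) = (x^T z)^2 = phi(z)^T phi(x)."""
    return float(np.dot(x, z)) ** 2

X = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])   # toy training samples
alpha = np.array([0.3, -0.7, 1.2])                   # made-up coefficients

# Eq. (6.29): w is a linear combination of the mapped training samples.
w = sum(a * phi(xi) for a, xi in zip(alpha, X))

x = np.array([0.5, -1.5])
h_primal = float(w @ phi(x))                                     # w^T phi(x)
h_dual = float(sum(a * kappa(x, xi) for a, xi in zip(alpha, X)))  # kernel form
```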
Suppose the training set consists of $n$ classes (sets): $D=\mathbf{X}_1\cup \mathbf{X}_2\cup \cdots \cup \mathbf{X}_n$, where $\mathbf{X}_i$ is the set of samples in class $i$, represented as a matrix.
Applying the indicator function here gives
$$
\begin{align}
\mathbb{I} (\boldsymbol{x}_j \in \mathbf{X}_i)=
\begin{cases}
1, & \text{if } \boldsymbol{x}_j \in \mathbf{X}_i \\
0, & \text{if } \boldsymbol{x}_j \notin \mathbf{X}_i
\end{cases}\notag
\end{align}
$$
For convenience, we rewrite it as
$$
\begin{align}
\mathbb{I}_i (\boldsymbol{x}_j )=
\begin{cases}
1, & \text{if } \boldsymbol{x}_j \in \mathbf{X}_i \\
0, & \text{if } \boldsymbol{x}_j \notin \mathbf{X}_i
\end{cases}\notag
\end{align}
$$
Applying $\mathbb{I}_i$ to every sample in $D$ gives a vector, denoted
$$
\begin{align}
\mathbb{I}_i (\boldsymbol{x}_{1:\, m} )\overset{\mathrm{def}}{=} (\mathbb{I}_i (\boldsymbol{x}_1 );\mathbb{I}_i (\boldsymbol{x}_2 );\cdots;\mathbb{I}_i (\boldsymbol{x}_m )) \tag{6.30}
\end{align}
$$
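In code, the indicator vector of Eq. (6.30) is just a 0/1 mask over the label array. The helper name `indicator_vector` and the label values below are hypothetical.

```python
import numpy as np

def indicator_vector(labels, i):
    """Eq. (6.30): entry j is 1 if sample j belongs to class i, else 0."""
    return (np.asarray(labels) == i).astype(float)

labels = np.array([0, 1, 1, 0, 1])      # class of each sample x_1, ..., x_5
one_0 = indicator_vector(labels, 0)
one_1 = indicator_vector(labels, 1)
m_0, m_1 = one_0.sum(), one_1.sum()     # class sizes
```

Summing an indicator vector gives the class size $m_i$, which is used in the class-mean formulas later on.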
Since $\phi (\boldsymbol{x}_i)$ is a (column) vector, applying $\phi$ to every sample in $D$ gives a matrix, denoted
$$
\begin{align}
(\phi (\boldsymbol{x}_{1:\,m} ))^\mathrm{T} \overset{\mathrm{def}}{=} (\phi (\boldsymbol{x}_1 ),\phi (\boldsymbol{x}_2 ),\cdots,\phi (\boldsymbol{x}_m )) \tag{6.31}
\end{align}
$$
Then
$$
\begin{align}
\phi (\boldsymbol{x}_{1:\,m} )= \left((\phi (\boldsymbol{x}_1 ))^\mathrm{T};(\phi (\boldsymbol{x}_2 ))^\mathrm{T};\cdots;(\phi (\boldsymbol{x}_m ))^\mathrm{T}\right)\quad \text{(by Eq. (0.2) below)} \tag{6.32}
\end{align}
$$
This uses the identity
$$
\begin{align}
\mathbf{X}^\mathrm{T} & =(\boldsymbol{x}_1,\boldsymbol{x}_2,\cdots,\boldsymbol{x}_n)^\mathrm{T}\notag \\
 & =(\boldsymbol{x}_1^\mathrm{T};\boldsymbol{x}_2^\mathrm{T};\cdots;\boldsymbol{x}_n^\mathrm{T}) \tag{0.2}
\end{align}
$$
From Eqs. (6.31) and (6.32),
$$
\begin{align}
\phi (\boldsymbol{x}_{1:\,m} )(\phi (\boldsymbol{x}_{1:\,m} ))^\mathrm{T} & = \left(\phi (\boldsymbol{x}_1 )^\mathrm{T};\phi (\boldsymbol{x}_2 )^\mathrm{T};\cdots;\phi (\boldsymbol{x}_m )^\mathrm{T}\right) \left(\phi (\boldsymbol{x}_1 ),\phi (\boldsymbol{x}_2 ),\cdots,\phi (\boldsymbol{x}_m )\right)\notag \\
 & =\left([\phi (\boldsymbol{x}_i )^\mathrm{T}\phi (\boldsymbol{x}_j )]_{ij}\right)\notag \\
 & =\left([\kappa (\boldsymbol{x}_i,\boldsymbol{x}_j)]_{ij}\right)\quad \text{(by [Watermelon Book, Eq. (6.22)])}\notag \\
 & =\mathbf{K} \tag{6.33}
\end{align}
$$
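Eq. (6.33) can be verified on a small example where $\phi$ is known. As before, the degree-2 polynomial kernel and the sample points are assumptions made for the illustration; the matrix `Phi` stacks $\phi (\boldsymbol{x}_i)^\mathrm{T}$ as rows, following Eq. (6.32).

```python
import numpy as np

def phi(x):
    """Explicit feature map of the degree-2 polynomial kernel on R^2."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def kappa(x, z):
    """kappa(x, z) = (x^T z)^2."""
    return float(np.dot(x, z)) ** 2

X = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0], [-1.0, 0.5]])

Phi = np.stack([phi(x) for x in X])    # shape (m, 3); row i is phi(x_i)^T
K_from_phi = Phi @ Phi.T               # left-hand side of Eq. (6.33)
K_from_kernel = np.array([[kappa(xi, xj) for xj in X] for xi in X])
```

In practice only `K_from_kernel` is computable, since `Phi` is unavailable for a general kernel.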
Using Eqs. (6.30) and (6.31), rewrite [Watermelon Book, Eq. (6.61)]:
$$
\begin{align}
\boldsymbol{\mu}_i^{\phi } & =\frac{1}{m_i}\left[\sum_{\boldsymbol{x}_j \in \mathbf{X}_i}{\phi }(\boldsymbol{x}_j)+\sum_{\boldsymbol{x}_j \notin \mathbf{X}_i}\boldsymbol{0}\right]\notag \\
 & =\frac{1}{m_i}\left[\sum_{\boldsymbol{x}_j \in D}\mathbb{I} (\boldsymbol{x}_j \in \mathbf{X}_i){\phi }(\boldsymbol{x}_j)\right]\notag \\
 & =\frac{1}{m_i}(\phi (\boldsymbol{x}_{1:\,m} ))^\mathrm{T}\mathbb{I}_i (\boldsymbol{x}_{1:\,m} ) \tag{6.34}
\end{align}
$$
Likewise,
$$
\begin{align}
\boldsymbol{\mu}_j^{\phi } =\frac{1}{m_j}(\phi (\boldsymbol{x}_{1:\,m} ))^\mathrm{T}\mathbb{I}_j (\boldsymbol{x}_{1:\,m} ) \tag{6.35}
\end{align}
$$
From Eqs. (6.34) and (6.35),
$$
\begin{align}
\boldsymbol{\mu}_i^{\phi } -\boldsymbol{\mu}_j^{\phi } =(\phi (\boldsymbol{x}_{1:\,m} ))^\mathrm{T}\left[\frac{1}{m_i}\mathbb{I}_i (\boldsymbol{x}_{1:\,m} )-\frac{1}{m_j}\mathbb{I}_j (\boldsymbol{x}_{1:\,m} )\right] \tag{6.36}
\end{align}
$$
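The matrix forms in Eqs. (6.34) and (6.36) can be sanity-checked against directly computed class means. In this sketch a random matrix `Phi` stands in for the mapped samples (row $i$ plays the role of $\phi (\boldsymbol{x}_i)^\mathrm{T}$); the sizes and labels are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(5, 3))            # stand-in for phi(x_{1:m}), one row per sample
labels = np.array([0, 1, 1, 0, 1])

one_0 = (labels == 0).astype(float)      # indicator vectors, Eq. (6.30)
one_1 = (labels == 1).astype(float)
m_0, m_1 = one_0.sum(), one_1.sum()

# Class mean computed directly and via Eq. (6.34).
mu_1_direct = Phi[labels == 1].mean(axis=0)
mu_1_matrix = Phi.T @ one_1 / m_1

# Difference of class means computed directly and via Eq. (6.36).
diff_direct = mu_1_direct - Phi[labels == 0].mean(axis=0)
diff_matrix = Phi.T @ (one_1 / m_1 - one_0 / m_0)
```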
Use Eq. (6.36) to rewrite [Watermelon Book, Eq. (6.62)]:
$$
\begin{align}
\mathbf{S}_{\mathrm{b}}^{\phi } & =(\phi (\boldsymbol{x}_{1:\,m} ))^\mathrm{T}\left[\frac{1}{m_1}\mathbb{I}_1 (\boldsymbol{x}_{1:\,m} )-\frac{1}{m_0}\mathbb{I}_0 (\boldsymbol{x}_{1:\,m} )\right]\left((\phi (\boldsymbol{x}_{1:\,m} ))^\mathrm{T}\left[\frac{1}{m_1}\mathbb{I}_1 (\boldsymbol{x}_{1:\,m} )-\frac{1}{m_0}\mathbb{I}_0 (\boldsymbol{x}_{1:\,m} )\right]\right)^\mathrm{T}\notag \\
 & =(\phi (\boldsymbol{x}_{1:\,m} ))^\mathrm{T} \left[\frac{\mathbb{I}_1 (\boldsymbol{x}_{1:\,m} )}{m_1}-\frac{\mathbb{I}_0 (\boldsymbol{x}_{1:\,m})}{m_0} \right] \left[\frac{\mathbb{I}_1 (\boldsymbol{x}_{1:\,m} )}{m_1}-\frac{\mathbb{I}_0 (\boldsymbol{x}_{1:\,m})}{m_0}\right]^\mathrm{T}\phi (\boldsymbol{x}_{1:\,m} )\notag \\
 & ={\phi}^\mathrm{T}[\cdot][\cdot]^\mathrm{T}{\phi}\qquad \text{(shorthand)} \tag{6.37}
\end{align}
$$
In the shorthand, $\phi$ abbreviates $\phi (\boldsymbol{x}_{1:\,m} )$ and $[\cdot]$ abbreviates the bracketed vector $\frac{1}{m_1}\mathbb{I}_1 (\boldsymbol{x}_{1:\,m} )-\frac{1}{m_0}\mathbb{I}_0 (\boldsymbol{x}_{1:\,m} )$.
Use Eq. (6.31) to rewrite Eq. (6.29):
$$
\begin{align}
\boldsymbol{w}=\phi (\boldsymbol{x}_{1:\,m} )^\mathrm{T}\boldsymbol{\alpha},\quad \boldsymbol{\alpha}=({\alpha}_1;{\alpha}_2;\cdots;{\alpha}_m) \tag{6.38}
\end{align}
$$
From Eqs. (6.37) and (6.38) (using the shorthand where needed),
$$
\begin{align}
\boldsymbol{w}^\mathrm{T}\mathbf{S}_{\mathrm{b}}^{\phi }\boldsymbol{w} & =\left(\phi (\boldsymbol{x}_{1:\,m} )^\mathrm{T}\boldsymbol{\alpha}\right)^\mathrm{T}\mathbf{S}_{\mathrm{b}}^{\phi }(\phi (\boldsymbol{x}_{1:\,m} ))^\mathrm{T}\boldsymbol{\alpha}\quad \text{(by Eq. (6.38))}\notag \\
 & =\boldsymbol{\alpha}^\mathrm{T}\phi\left[{\phi}^\mathrm{T}[\cdot][\cdot]^\mathrm{T}{\phi}\right]{\phi}^\mathrm{T}\boldsymbol{\alpha}\quad \text{(by Eq. (6.37))}\notag \\
 & =\boldsymbol{\alpha}^\mathrm{T}(\phi{\phi}^\mathrm{T})[\cdot][\cdot]^\mathrm{T}({\phi}{\phi}^\mathrm{T})\boldsymbol{\alpha}\notag \\
 & =\boldsymbol{\alpha}^\mathrm{T}\mathbf{K}[\cdot][\cdot]^\mathrm{T}\mathbf{K}\boldsymbol{\alpha}\quad \text{(by Eq. (6.33))}\notag \\
 & =\boldsymbol{\alpha}^\mathrm{T}(\mathbf{K}[\cdot])([\cdot]^\mathrm{T}\mathbf{K}^\mathrm{T})\boldsymbol{\alpha}\quad \text{(by the symmetry of $\mathbf{K}$)}\notag \\
 & =\boldsymbol{\alpha}^\mathrm{T}(\mathbf{K}[\cdot])(\mathbf{K}[\cdot])^\mathrm{T}\boldsymbol{\alpha} \tag{6.39}
\end{align}
$$
where
$$
\begin{align}
\mathbf{K}[\cdot] & =\mathbf{K}\left[\frac{\mathbb{I}_1 (\boldsymbol{x}_{1:\,m} )}{m_1}-\frac{\mathbb{I}_0 (\boldsymbol{x}_{1:\,m})}{m_0}\right]\notag \\
 & =\frac{1}{m_1}\mathbf{K}\mathbb{I}_1 (\boldsymbol{x}_{1:\,m})-\frac{1}{m_0}\mathbf{K}\mathbb{I}_0 (\boldsymbol{x}_{1:\,m} )\notag
\end{align}
$$
Introduce the definitions and notation of [Watermelon Book, Eqs. (6.66) to (6.69)], with $\boldsymbol{1}_i\overset{\mathrm{def}}{=} \mathbb{I}_i (\boldsymbol{x}_{1:\,m})$. Eq. (6.39) then becomes
$$
\begin{align}
\boldsymbol{w}^\mathrm{T}\mathbf{S}_{\mathrm{b}}^{\phi }\boldsymbol{w} & =\boldsymbol{\alpha}^\mathrm{T} \left[\frac{1}{m_1}\mathbf{K}\boldsymbol{1}_1-\frac{1}{m_0}\mathbf{K}\boldsymbol{1}_0\right] \left[\frac{1}{m_1}\mathbf{K}\boldsymbol{1}_1-\frac{1}{m_0}\mathbf{K}\boldsymbol{1}_0\right]^\mathrm{T} \boldsymbol{\alpha}\notag \\
 & =\boldsymbol{\alpha}^\mathrm{T}(\hat{\boldsymbol{\mu} }_1-\hat{\boldsymbol{\mu} }_0)(\hat{\boldsymbol{\mu} }_1-\hat{\boldsymbol{\mu} }_0)^\mathrm{T}\boldsymbol{\alpha}\notag \\
 & =\boldsymbol{\alpha}^\mathrm{T}\mathbf{M}\boldsymbol{\alpha} \tag{6.40}
\end{align}
$$
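The identity in Eq. (6.40) can be checked numerically. The sketch below uses a random stand-in matrix `Phi` for the mapped samples (an assumption that makes the feature-space side computable) and compares $\boldsymbol{w}^\mathrm{T}\mathbf{S}_{\mathrm{b}}^{\phi }\boldsymbol{w}$ with $\boldsymbol{\alpha}^\mathrm{T}\mathbf{M}\boldsymbol{\alpha}$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 6, 4
Phi = rng.normal(size=(m, d))            # stand-in: row i plays the role of phi(x_i)^T
labels = np.array([0, 0, 0, 1, 1, 1])
alpha = rng.normal(size=m)               # arbitrary coefficient vector

K = Phi @ Phi.T                          # kernel matrix, Eq. (6.33)
w = Phi.T @ alpha                        # Eq. (6.38)
one_0 = (labels == 0).astype(float)
one_1 = (labels == 1).astype(float)

# Feature-space side: S_b = (mu_1 - mu_0)(mu_1 - mu_0)^T.
mu_0 = Phi[labels == 0].mean(axis=0)
mu_1 = Phi[labels == 1].mean(axis=0)
S_b = np.outer(mu_1 - mu_0, mu_1 - mu_0)
lhs = float(w @ S_b @ w)

# Kernel side: mu_hat_i = K 1_i / m_i and M = (mu_hat_1 - mu_hat_0)(...)^T.
mu_hat_0 = K @ one_0 / one_0.sum()
mu_hat_1 = K @ one_1 / one_1.sum()
M = np.outer(mu_hat_1 - mu_hat_0, mu_hat_1 - mu_hat_0)
rhs = float(alpha @ M @ alpha)
```

The kernel side touches `Phi` only through `K`, which is the point of the rewriting.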
By a derivation similar to that of Eq. (6.40),
$$
\begin{align}
\boldsymbol{w}^\mathrm{T}\mathbf{S}_{\mathrm{w}}^{\phi }\boldsymbol{w} & =\boldsymbol{\alpha}^\mathrm{T}\mathbf{N}\boldsymbol{\alpha} \tag{6.41}
\end{align}
$$
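Eq. (6.41) is stated without its derivation, so a numerical check may help. The sketch assumes the Watermelon Book's definition $\mathbf{N}=\mathbf{K}\mathbf{K}^\mathrm{T}-\sum_{i=0}^{1}m_i\hat{\boldsymbol{\mu}}_i\hat{\boldsymbol{\mu}}_i^\mathrm{T}$ and the within-class scatter $\mathbf{S}_{\mathrm{w}}^{\phi }=\sum_{i}\sum_{\boldsymbol{x}\in \mathbf{X}_i}(\phi (\boldsymbol{x})-\boldsymbol{\mu}_i^{\phi })(\phi (\boldsymbol{x})-\boldsymbol{\mu}_i^{\phi })^\mathrm{T}$; the random matrix `Phi` is again only a stand-in for the mapped samples.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 6, 4
Phi = rng.normal(size=(m, d))            # stand-in: row i plays the role of phi(x_i)^T
labels = np.array([0, 0, 0, 1, 1, 1])
alpha = rng.normal(size=m)               # arbitrary coefficient vector

K = Phi @ Phi.T                          # kernel matrix, Eq. (6.33)
w = Phi.T @ alpha                        # Eq. (6.38)

S_w = np.zeros((d, d))                   # within-class scatter in the feature space
N = K @ K.T                              # start of N = K K^T - sum_i m_i mu_hat_i mu_hat_i^T
for c in (0, 1):
    members = Phi[labels == c]
    centered = members - members.mean(axis=0)
    S_w += centered.T @ centered         # sum of (phi - mu)(phi - mu)^T over class c
    one_c = (labels == c).astype(float)
    m_c = one_c.sum()
    mu_hat_c = K @ one_c / m_c
    N -= m_c * np.outer(mu_hat_c, mu_hat_c)

lhs = float(w @ S_w @ w)                 # w^T S_w w
rhs = float(alpha @ N @ alpha)           # alpha^T N alpha
```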
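By Eqs. (6.40) and (6.41), the optimization objective changes from [Watermelon Book, Eq. (6.60)] to [Watermelon Book, Eq. (6.70)], which can then be solved in the same way as the linear discriminant analysis (LDA) of Chapter 3 (see the solution procedure for [Watermelon Book, Eq. (3.35)]).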
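One possible way to carry out this step, by analogy with the two-class LDA solution $\boldsymbol{w}\propto \mathbf{S}_{\mathrm{w}}^{-1}(\boldsymbol{\mu}_0-\boldsymbol{\mu}_1)$, is to take $\boldsymbol{\alpha}\propto \mathbf{N}^{-1}(\hat{\boldsymbol{\mu}}_1-\hat{\boldsymbol{\mu}}_0)$. The sketch below is not taken from the text: the function name `fit_klda`, the RBF kernel, the XOR-style data, and in particular the small ridge term `eps` (added because $\mathbf{N}$ is typically singular) are all assumptions.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

def fit_klda(K, labels, eps=1e-3):
    """Return alpha proportional to (N + eps*I)^{-1} (mu_hat_1 - mu_hat_0)."""
    m = K.shape[0]
    mu_hat = {}
    N = K @ K.T
    for c in (0, 1):
        one_c = (labels == c).astype(float)
        m_c = one_c.sum()
        mu_hat[c] = K @ one_c / m_c
        N -= m_c * np.outer(mu_hat[c], mu_hat[c])
    # eps * I is an assumed regularizer; without it the system may be singular.
    return np.linalg.solve(N + eps * np.eye(m), mu_hat[1] - mu_hat[0])

# XOR-style toy data: the two classes are not linearly separable in the input space.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
labels = np.array([0, 0, 1, 1])
K = rbf_kernel(X, X)
alpha = fit_klda(K, labels)
projections = K @ alpha      # h(x_j) = sum_i alpha_i * kappa(x_j, x_i)
```

The projections of the training samples then separate the two classes along a single axis; the overall scale and sign of `alpha` are immaterial to the objective.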
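A natural question: [Watermelon Book, Eq. (6.60)] and [Watermelon Book, Eq. (6.70)] look almost the same, so why not solve the former directly?
Because the former solves for $\boldsymbol{w}$, which by Eq. (6.29) depends on the function $\phi (\boldsymbol{x}_i)$, and that function is usually unknown. After the conversion, everything involving $\phi (\boldsymbol{x}_i)$ is absorbed into the kernel matrix (Eq. (6.33)). The kernel matrix $\mathbf{K}$ appears in $\mathbf{M}$ and $\mathbf{N}$, and $\mathbf{K}$ is usually known. In other words, [Watermelon Book, Eq. (6.70)] sidesteps the unknown $\phi (\boldsymbol{x}_i)$, and that is the reason for transforming the objective.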