The 【Machine Learning Basics】 series consists of my reading notes on Prof. Zhou Zhihua's book Machine Learning (《机器学习》).
1. Representer theorem
For both SVM and SVR, the learned model can always be written as a linear combination of kernel functions $\kappa (\mathbf x,\mathbf x_i)$. In fact, a more general result holds, known as the "representer theorem":

Representer theorem: Let $\mathbb{H}$ be the reproducing kernel Hilbert space (RKHS) associated with the kernel $\kappa$, and let $\| h\|_{\mathbb{H}}$ denote the norm of $h$ in $\mathbb{H}$. For any monotonically increasing function $\Omega:[0,\infty] \mapsto \mathbb{R}$ and any non-negative loss function $\ell : \mathbb{R}^m \mapsto [0,\infty]$, the solution of the optimization problem

$$\min \limits_{h \in \mathbb{H}} F(h)=\Omega(\| h\|_{\mathbb{H}})+\ell(h(\mathbf x_1),h(\mathbf x_2),\ldots,h(\mathbf x_m)) \tag{1}$$

can always be written as

$$h^*(\mathbf x)=\sum^m_{i=1}\alpha_i \kappa (\mathbf x,\mathbf x_i) \tag{2}$$

The representer theorem places no restriction on the loss function beyond non-negativity, and it only requires the regularization term $\Omega$ to be monotonically increasing; $\Omega$ does not even have to be convex. This means that for quite general loss functions and regularizers, the optimal solution $h^*(\mathbf x)$ of problem (1) can be expressed as a linear combination of kernel functions $\kappa (\mathbf x,\mathbf x_i)$, which shows how powerful kernel functions are.
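As one concrete instance of the theorem (my own illustration, not part of the book's statement), consider kernel ridge regression. The targets $y_i$, the weight $\lambda > 0$ and the kernel matrix $(\mathbf K)_{ij}=\kappa(\mathbf x_i,\mathbf x_j)$ are symbols assumed here for the sketch. Because the solution must have the form (2), the search over $\mathbb{H}$ collapses to a search over $m$ coefficients:

```latex
% Assumed for illustration: targets y_i and a regularization weight \lambda > 0
\Omega(t) = \lambda t^2, \qquad
\ell\big(h(\mathbf x_1),\ldots,h(\mathbf x_m)\big)
  = \sum_{i=1}^m \big(h(\mathbf x_i) - y_i\big)^2

% Substituting h = \sum_i \alpha_i \kappa(\cdot,\mathbf x_i) gives
% \|h\|_{\mathbb H}^2 = \mathbf\alpha^T \mathbf K \mathbf\alpha and
% (h(\mathbf x_1),\ldots,h(\mathbf x_m))^T = \mathbf K \mathbf\alpha, so (1) becomes
\min_{\mathbf\alpha \in \mathbb R^m}\;
  \lambda\, \mathbf\alpha^T \mathbf K \mathbf\alpha
  + \|\mathbf K \mathbf\alpha - \mathbf y\|_2^2

% Setting the gradient 2\mathbf K[(\mathbf K + \lambda \mathbf I)\mathbf\alpha - \mathbf y] to zero,
% one minimizer (unique when \mathbf K is positive definite) is
\mathbf\alpha = (\mathbf K + \lambda \mathbf I)^{-1}\,\mathbf y
```

Here $\Omega(t)=\lambda t^2$ is monotonically increasing on $[0,\infty)$ and the squared loss is non-negative, so the conditions of the theorem are met.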
2. Kernel methods
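A whole family of learning methods built on kernel functions has been developed, collectively called "kernel methods". The most common approach is "kernelization" (i.e. introducing a kernel function) to extend a linear learner into a nonlinear one. Below we use linear discriminant analysis as an example of this nonlinear extension, which yields "Kernelized Linear Discriminant Analysis" (KLDA).

We first assume that some mapping $\phi : \chi \mapsto \mathbb{F}$ maps the samples into a feature space $\mathbb{F}$, and then perform linear discriminant analysis in $\mathbb{F}$ to obtain:

$$h(\mathbf x)=\mathbf w^T \phi (\mathbf x) \tag{3}$$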
The learning objective of KLDA is:

$$\max \limits_{\mathbf w} J(\mathbf w)=\frac{\mathbf w^T \mathbf S_b^{\phi} \mathbf w}{\mathbf w^T \mathbf S_w^{\phi} \mathbf w} \tag{4}$$

where $\mathbf S_b^{\phi}$ and $\mathbf S_w^{\phi}$ are the between-class and within-class scatter matrices of the training samples in the feature space $\mathbb{F}$. Let $\mathbf{X}_i$ denote the set of samples of class $i \in \{0,1\}$, containing $m_i$ samples; the total number of samples is $m=m_0+m_1$. The mean of class $i$ in the feature space $\mathbb{F}$ is:

$$\mathbf{\mu}_i^{\phi}=\frac{1}{m_i} \sum_{\mathbf x \in \mathbf X_i} \phi(\mathbf x) \tag{5}$$

The two scatter matrices are:

$$\mathbf{S}_b^{\phi}=(\mathbf{\mu}_1^{\phi} - \mathbf{\mu}_0^{\phi})(\mathbf{\mu}_1^{\phi} - \mathbf{\mu}_0^{\phi})^T \tag{6}$$

$$\mathbf{S}_w^{\phi}=\sum_{i=0}^1 \sum_{\mathbf{x}\in \mathbf{X}_i}(\phi (\mathbf{x})-\mathbf{\mu}_i^{\phi})(\phi (\mathbf{x})-\mathbf{\mu}_i^{\phi})^T \tag{7}$$
The concrete form of the mapping $\phi$ is usually hard to know, so we use the kernel function $\kappa(\mathbf x,\mathbf x_i)=\phi(\mathbf x_i)^T \phi(\mathbf x)$ to express the mapping and the feature space $\mathbb{F}$ implicitly. Taking $J(\mathbf w)$ as the loss function $\ell$ in (1) and setting $\Omega \equiv 0$, the representer theorem says that $h(\mathbf x)$ can be written as:

$$h(\mathbf x)=\sum^m_{i=1}\alpha_i \kappa (\mathbf x,\mathbf x_i) \tag{8}$$

Here $\equiv$ means "identically equal to": $\Omega$ takes the value 0 for every argument.
Equating (3) and (8):

$$\mathbf w^T \phi (\mathbf x) = \sum^m_{i=1}\alpha_i \kappa (\mathbf x,\mathbf x_i) \tag{9}$$

Substituting $\kappa(\mathbf x,\mathbf x_i)=\phi(\mathbf x)^T \phi(\mathbf x_i)$ into (9):

$$\mathbf w^T \phi (\mathbf x) = \sum^m_{i=1}\alpha_i \phi(\mathbf x)^T \phi(\mathbf x_i) \tag{10}$$

$$\mathbf w^T \phi (\mathbf x) = \phi(\mathbf x)^T \sum^m_{i=1}\alpha_i \phi(\mathbf x_i) \tag{11}$$

Since $\mathbf w^T \phi (\mathbf x)$ is a scalar, and a scalar equals its own transpose:

$$\mathbf w^T \phi (\mathbf x) = (\mathbf w^T \phi (\mathbf x))^T = \phi(\mathbf x)^T \sum^m_{i=1}\alpha_i \phi(\mathbf x_i) \tag{12}$$

$$\mathbf w^T \phi (\mathbf x) = \phi(\mathbf x)^T \mathbf w = \phi(\mathbf x)^T \sum^m_{i=1}\alpha_i \phi(\mathbf x_i) \tag{13}$$

Equation (13) has to hold for every $\mathbf x$, which is satisfied by taking:

$$\mathbf w = \sum^m_{i=1}\alpha_i \phi(\mathbf x_i) \tag{14}$$
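The identity behind (9) and (14) can be checked numerically when the feature map is known explicitly. The sketch below is only an illustration under assumptions of my own: it uses the degree-2 homogeneous polynomial kernel on $\mathbb{R}^2$, whose feature map can be written down by hand, and the sample points and coefficients `alpha` are arbitrary made-up values.

```python
import math

def phi(x):
    """Explicit feature map of kappa(x, z) = (x . z)^2 on R^2 (assumed kernel)."""
    return [x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2]

def kappa(x, z):
    """Degree-2 homogeneous polynomial kernel: phi(x) . phi(z) = (x . z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Arbitrary illustrative samples and coefficients (not from the book).
X = [(1.0, 2.0), (0.5, -1.0), (-1.5, 0.3), (2.0, 1.0)]
alpha = [0.3, -0.7, 1.1, 0.2]

# Eq. (14): w = sum_i alpha_i * phi(x_i)
w = [sum(a * phi(x)[d] for a, x in zip(alpha, X)) for d in range(3)]

def h_primal(x):
    """Eq. (3): h(x) = w^T phi(x)."""
    return dot(w, phi(x))

def h_kernel(x):
    """Eq. (8): h(x) = sum_i alpha_i * kappa(x, x_i)."""
    return sum(a * kappa(x, xi) for a, xi in zip(alpha, X))

x_new = (0.7, -0.4)
print(h_primal(x_new), h_kernel(x_new))  # the two values agree up to rounding
```

The kernel form never touches `phi`, which is the point of kernelization: for kernels such as the Gaussian kernel, no finite-dimensional `phi` is available, yet `h_kernel` can still be evaluated.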
Let $\mathbf K \in \mathbb{R}^{m\times m}$ be the kernel matrix of the kernel $\kappa$, with $(\mathbf K)_{ij}=\kappa(\mathbf x_i,\mathbf x_j)$. Let $\mathbf l_i \in \{ 1,0 \}^{m\times 1}$ be the indicator vector of class $i$: the $j$-th component of $\mathbf l_i$ is 1 if and only if $\mathbf x_j \in \mathbf X_i$, and 0 otherwise. Further define:

$$\hat{\mathbf{\mu}}_0=\frac{1}{m_0} \mathbf{Kl}_0 \tag{15}$$

$$\hat{\mathbf{\mu}}_1=\frac{1}{m_1} \mathbf{Kl}_1 \tag{16}$$

$$\mathbf{M}=(\hat{\mathbf \mu}_0-\hat{\mathbf{\mu}}_1)(\hat{\mathbf \mu}_0-\hat{\mathbf{\mu}}_1)^T \tag{17}$$

$$\mathbf{N}=\mathbf{KK}^T-\sum_{i=0}^1 m_i \hat{\mathbf{\mu}}_i \hat{\mathbf{\mu}}_i^T \tag{18}$$

Then (4) is equivalent to:

$$\max \limits_{\mathbf{\alpha}} J(\mathbf{\alpha}) = \frac{\mathbf{\alpha}^T \mathbf{M} \mathbf{\alpha}}{\mathbf{\alpha}^T \mathbf{N} \mathbf{\alpha}} \tag{19}$$

Problem (19) has the same form as the linear discriminant analysis objective, so $\mathbf{\alpha}$ can be obtained with the same solution method, and the projection function $h(\mathbf x)$ then follows from (8).
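To make the last step concrete, here is a minimal sketch of solving (19) in the two-class case. Since $\mathbf M$ is a rank-one outer product, a maximizer is $\mathbf\alpha \propto \mathbf N^{-1}(\hat{\mathbf\mu}_1-\hat{\mathbf\mu}_0)$, just as in ordinary LDA. Several things here are my own assumptions rather than the book's: the Gaussian kernel with `gamma=1.0`, the toy data, the small ridge term `eps` added to $\mathbf N$ (it is singular in general, so some regularization is a common remedy), and the hand-written `solve` helper, used only to keep the example free of third-party dependencies.

```python
import math

def rbf(x, z, gamma=1.0):
    """Gaussian (RBF) kernel; gamma is an arbitrary illustrative choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

# Toy data: class 0 near the origin, class 1 on a ring (not linearly separable).
X = [(0.1, 0.0), (-0.2, 0.1), (0.0, -0.15), (0.15, 0.2),
     (2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
m = len(X)
idx = {c: [k for k in range(m) if y[k] == c] for c in (0, 1)}

K = [[rbf(X[i], X[j]) for j in range(m)] for i in range(m)]

# Eqs. (15)-(16): mu_hat_c = K l_c / m_c
mu_hat = {c: [sum(K[j][k] for k in idx[c]) / len(idx[c]) for j in range(m)]
          for c in (0, 1)}

# Eq. (18): N = K K^T - sum_c m_c mu_hat_c mu_hat_c^T, plus an assumed ridge term
N = [[sum(K[i][k] * K[j][k] for k in range(m)) for j in range(m)] for i in range(m)]
for c in (0, 1):
    for i in range(m):
        for j in range(m):
            N[i][j] -= len(idx[c]) * mu_hat[c][i] * mu_hat[c][j]
eps = 1e-3
for i in range(m):
    N[i][i] += eps

# Two-class solution of (19), up to scale: alpha = (N + eps I)^{-1} (mu_hat_1 - mu_hat_0)
diff = [a - b for a, b in zip(mu_hat[1], mu_hat[0])]
alpha = solve(N, diff)

def h(x):
    """Eq. (8): projection of x onto the learned direction."""
    return sum(a * rbf(x, xi) for a, xi in zip(alpha, X))

proj = [h(x) for x in X]
mean0 = sum(proj[k] for k in idx[0]) / len(idx[0])
mean1 = sum(proj[k] for k in idx[1]) / len(idx[1])
print("projected class means:", mean0, mean1)
```

With this sign convention the projected mean of class 1 is larger than that of class 0, because the regularized $\mathbf N$ is positive definite. The scale of `alpha` is arbitrary, since $J(\mathbf\alpha)$ is invariant to rescaling.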
2.1. Derivation of Eqs. (15) and (16)
To explain how these formulas are computed, we start with a small example and then extend it to the general form. Suppose there are only 4 samples, where samples 1 and 3 are labeled 0 and samples 2 and 4 are labeled 1. Then:

$$m=4 \tag{20}$$

$$m_0=2,m_1=2 \tag{21}$$

$$X_0=\{\mathbf x_1,\mathbf x_3 \},X_1=\{\mathbf x_2,\mathbf x_4 \} \tag{22}$$

$$\mathbf{K}=\begin{bmatrix} \kappa(\mathbf{x}_1,\mathbf{x}_1) & \kappa(\mathbf{x}_1,\mathbf{x}_2) & \kappa(\mathbf{x}_1,\mathbf{x}_3) & \kappa(\mathbf{x}_1,\mathbf{x}_4) \\ \kappa(\mathbf{x}_2,\mathbf{x}_1) & \kappa(\mathbf{x}_2,\mathbf{x}_2) & \kappa(\mathbf{x}_2,\mathbf{x}_3) & \kappa(\mathbf{x}_2,\mathbf{x}_4) \\ \kappa(\mathbf{x}_3,\mathbf{x}_1) & \kappa(\mathbf{x}_3,\mathbf{x}_2) & \kappa(\mathbf{x}_3,\mathbf{x}_3) & \kappa(\mathbf{x}_3,\mathbf{x}_4) \\ \kappa(\mathbf{x}_4,\mathbf{x}_1) & \kappa(\mathbf{x}_4,\mathbf{x}_2) & \kappa(\mathbf{x}_4,\mathbf{x}_3) & \kappa(\mathbf{x}_4,\mathbf{x}_4) \\ \end{bmatrix} \in \mathbb{R}^{4\times 4} \tag{23}$$

$$\mathbf{l}_0 = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \\ \end{bmatrix} \in \mathbb{R}^{4\times 1} \tag{24}$$

$$\mathbf{l}_1 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \\ \end{bmatrix} \in \mathbb{R}^{4\times 1} \tag{25}$$
Therefore:

$$\hat{\mathbf{\mu}}_0=\frac{1}{m_0} \mathbf{Kl}_0=\frac{1}{2} \begin{bmatrix} \kappa(\mathbf{x}_1,\mathbf{x}_1)+\kappa(\mathbf{x}_1,\mathbf{x}_3) \\ \kappa(\mathbf{x}_2,\mathbf{x}_1)+\kappa(\mathbf{x}_2,\mathbf{x}_3) \\ \kappa(\mathbf{x}_3,\mathbf{x}_1)+\kappa(\mathbf{x}_3,\mathbf{x}_3) \\ \kappa(\mathbf{x}_4,\mathbf{x}_1)+\kappa(\mathbf{x}_4,\mathbf{x}_3) \\\end{bmatrix} \in \mathbb{R}^{4\times 1} \tag{26}$$

$$\hat{\mathbf{\mu}}_1=\frac{1}{m_1} \mathbf{Kl}_1=\frac{1}{2} \begin{bmatrix} \kappa(\mathbf{x}_1,\mathbf{x}_2)+\kappa(\mathbf{x}_1,\mathbf{x}_4) \\ \kappa(\mathbf{x}_2,\mathbf{x}_2)+\kappa(\mathbf{x}_2,\mathbf{x}_4) \\ \kappa(\mathbf{x}_3,\mathbf{x}_2)+\kappa(\mathbf{x}_3,\mathbf{x}_4) \\ \kappa(\mathbf{x}_4,\mathbf{x}_2)+\kappa(\mathbf{x}_4,\mathbf{x}_4) \\\end{bmatrix} \in \mathbb{R}^{4\times 1} \tag{27}$$

From this result, the general form of $\hat{\mathbf{\mu}}_0,\hat{\mathbf{\mu}}_1$ follows directly:

$$\hat{\mathbf{\mu}}_0=\frac{1}{m_0} \mathbf{Kl}_0=\frac{1}{m_0} \begin{bmatrix} \sum_{\mathbf{x}\in X_0} \kappa(\mathbf{x}_1,\mathbf{x}) \\ \sum_{\mathbf{x}\in X_0} \kappa(\mathbf{x}_2,\mathbf{x}) \\ \vdots \\ \sum_{\mathbf{x}\in X_0} \kappa(\mathbf{x}_m,\mathbf{x}) \end{bmatrix} \in \mathbb{R}^{m\times 1} \tag{28}$$

$$\hat{\mathbf{\mu}}_1=\frac{1}{m_1} \mathbf{Kl}_1=\frac{1}{m_1} \begin{bmatrix} \sum_{\mathbf{x}\in X_1} \kappa(\mathbf{x}_1,\mathbf{x}) \\ \sum_{\mathbf{x}\in X_1} \kappa(\mathbf{x}_2,\mathbf{x}) \\ \vdots \\ \sum_{\mathbf{x}\in X_1} \kappa(\mathbf{x}_m,\mathbf{x}) \end{bmatrix} \in \mathbb{R}^{m\times 1} \tag{29}$$
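The 4-sample example can be reproduced in a few lines. This is only a sketch: the kernel (degree-2 polynomial) and the coordinates of the four points are made-up choices of mine, and the helper name `class_mean` is hypothetical. It computes $\hat{\mathbf\mu}_0$ once as the matrix-vector product in (15) and once by the explicit sums of (26).

```python
def kappa(x, z):
    """Degree-2 polynomial kernel (x . z)^2; an arbitrary illustrative choice."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

# Four made-up samples; samples 1 and 3 have label 0, samples 2 and 4 have label 1.
X = [(1.0, 2.0), (0.5, -1.0), (-1.5, 0.3), (2.0, 1.0)]
y = [0, 1, 0, 1]
m = len(X)

# Eq. (23): kernel matrix
K = [[kappa(X[i], X[j]) for j in range(m)] for i in range(m)]

def class_mean(K, y, c):
    """Eqs. (15)/(16): mu_hat_c = K l_c / m_c, with l_c the indicator vector of class c."""
    l = [1 if label == c else 0 for label in y]
    m_c = sum(l)
    return [sum(K[j][k] * l[k] for k in range(len(l))) / m_c for j in range(len(K))]

mu0 = class_mean(K, y, 0)
mu1 = class_mean(K, y, 1)

# Eq. (26) written out by hand: row j averages kappa(x_j, x_1) and kappa(x_j, x_3).
mu0_direct = [(kappa(X[j], X[0]) + kappa(X[j], X[2])) / 2 for j in range(m)]
# Eq. (27): row j averages kappa(x_j, x_2) and kappa(x_j, x_4).
mu1_direct = [(kappa(X[j], X[1]) + kappa(X[j], X[3])) / 2 for j in range(m)]

print(mu0, mu0_direct)
print(mu1, mu1_direct)
```

Each entry of $\hat{\mathbf\mu}_i$ is thus the average kernel value between one training sample and all samples of class $i$.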
2.2. Derivation of Eq. (19)
First, substituting (14) into the numerator of (4) gives:

$$\begin{aligned} \mathbf w^T \mathbf S_b^{\phi} \mathbf w &= \left( \sum^m_{i=1}\alpha_i \phi(\mathbf x_i) \right) ^T \cdot \mathbf S_b^{\phi} \cdot \sum^m_{i=1}\alpha_i \phi(\mathbf x_i) \\&= \sum^m_{i=1}\alpha_i \phi(\mathbf x_i) ^T \cdot \mathbf S_b^{\phi} \cdot \sum^m_{i=1}\alpha_i \phi(\mathbf x_i)\end{aligned} \tag{30}$$

where:

$$\begin{aligned} \mathbf{S}_b^{\phi}&=(\mathbf{\mu}_1^{\phi} - \mathbf{\mu}_0^{\phi})(\mathbf{\mu}_1^{\phi} - \mathbf{\mu}_0^{\phi})^T \\&= \left( \frac{1}{m_1}\sum_{\mathbf{x}\in X_1} \phi (\mathbf{x}) -\frac{1}{m_0}\sum_{\mathbf{x}\in X_0} \phi (\mathbf{x}) \right) \left( \frac{1}{m_1}\sum_{\mathbf{x}\in X_1} \phi (\mathbf{x}) -\frac{1}{m_0}\sum_{\mathbf{x}\in X_0} \phi (\mathbf{x}) \right)^T \\&= \left( \frac{1}{m_1}\sum_{\mathbf{x}\in X_1} \phi (\mathbf{x}) -\frac{1}{m_0}\sum_{\mathbf{x}\in X_0} \phi (\mathbf{x}) \right) \left( \frac{1}{m_1}\sum_{\mathbf{x}\in X_1} \phi (\mathbf{x})^T -\frac{1}{m_0}\sum_{\mathbf{x}\in X_0} \phi (\mathbf{x})^T \right) \end{aligned} \tag{31}$$

Substituting (31) into (30):

$$\begin{aligned} \mathbf w^T \mathbf S_b^{\phi} \mathbf w &= \sum^m_{i=1}\alpha_i \phi(\mathbf x_i) ^T \cdot \left( \frac{1}{m_1}\sum_{\mathbf{x}\in X_1} \phi (\mathbf{x}) -\frac{1}{m_0}\sum_{\mathbf{x}\in X_0} \phi (\mathbf{x}) \right) \left( \frac{1}{m_1}\sum_{\mathbf{x}\in X_1} \phi (\mathbf{x})^T -\frac{1}{m_0}\sum_{\mathbf{x}\in X_0} \phi (\mathbf{x})^T \right) \cdot \sum^m_{i=1}\alpha_i \phi(\mathbf x_i) \\&= \left( \frac{1}{m_1}\sum_{\mathbf{x}\in X_1} \sum^m_{i=1} \alpha_i \phi (\mathbf{x}_i)^T \phi (\mathbf{x}) -\frac{1}{m_0}\sum_{\mathbf{x}\in X_0} \sum^m_{i=1} \alpha_i \phi (\mathbf{x}_i)^T \phi (\mathbf{x}) \right) \left( \frac{1}{m_1}\sum_{\mathbf{x}\in X_1} \sum^m_{i=1} \alpha_i \phi (\mathbf{x})^T \phi (\mathbf{x}_i)-\frac{1}{m_0}\sum_{\mathbf{x}\in X_0} \sum^m_{i=1} \alpha_i \phi (\mathbf{x})^T \phi (\mathbf{x}_i) \right) \end{aligned} \tag{32}$$
Since $\kappa(\mathbf x_i,\mathbf x)=\phi (\mathbf x_i)^T \phi (\mathbf x)$ is a scalar, it equals its own transpose, i.e. $\kappa(\mathbf x_i,\mathbf x)=\phi (\mathbf x_i)^T \phi (\mathbf x)=(\phi (\mathbf x_i)^T \phi (\mathbf x))^T=\phi (\mathbf x)^T \phi (\mathbf x_i)=\kappa (\mathbf x_i , \mathbf x)^T$. Substituting this into (32) gives:

$$\mathbf w^T \mathbf S_b^{\phi} \mathbf w =\left( \frac{1}{m_1} \sum^m_{i=1} \sum_{\mathbf{x}\in X_1} \alpha_i \kappa(\mathbf x_i,\mathbf x) -\frac{1}{m_0} \sum^m_{i=1} \sum_{\mathbf{x}\in X_0} \alpha_i \kappa(\mathbf x_i,\mathbf x) \right) \left( \frac{1}{m_1} \sum^m_{i=1} \sum_{\mathbf{x}\in X_1} \alpha_i \kappa(\mathbf x_i,\mathbf x)-\frac{1}{m_0} \sum^m_{i=1} \sum_{\mathbf{x}\in X_0} \alpha_i \kappa(\mathbf x_i,\mathbf x) \right) \tag{33}$$

Let $\mathbf \alpha=(\alpha_1,\alpha_2,\cdots,\alpha_m)^T \in \mathbb{R}^{m\times 1}$. Using (28) and (29), each double sum in (33) is an inner product, $\frac{1}{m_i} \sum^m_{j=1} \sum_{\mathbf{x}\in X_i} \alpha_j \kappa(\mathbf x_j,\mathbf x)=\mathbf \alpha^T \hat{\mathbf \mu}_i$, so (33) simplifies to:

$$\begin{aligned} \mathbf w^T \mathbf S_b^{\phi} \mathbf w &= (\mathbf \alpha^T \hat{\mathbf \mu}_1 - \mathbf \alpha^T \hat{\mathbf \mu}_0) \cdot (\hat{\mathbf \mu}_1^T \mathbf \alpha -\hat{\mathbf \mu}_0^T \mathbf \alpha) \\&= \mathbf \alpha^T \cdot (\hat{\mathbf \mu}_1 -\hat{\mathbf \mu}_0) \cdot (\hat{\mathbf \mu}_1^T - \hat{\mathbf \mu}_0^T) \cdot \mathbf \alpha \\&= \mathbf \alpha^T \cdot (\hat{\mathbf \mu}_1 -\hat{\mathbf \mu}_0) \cdot (\hat{\mathbf \mu}_1 - \hat{\mathbf \mu}_0)^T \cdot \mathbf \alpha \\&= \mathbf \alpha^T \mathbf M \mathbf \alpha \end{aligned} \tag{34}$$

The last step uses the fact that $(\hat{\mathbf \mu}_1 -\hat{\mathbf \mu}_0)(\hat{\mathbf \mu}_1 - \hat{\mathbf \mu}_0)^T$ is the same matrix as $\mathbf M$ in (17), because an outer product is unchanged when the difference vector changes sign.
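The numerator identity $\mathbf w^T \mathbf S_b^{\phi} \mathbf w=\mathbf\alpha^T\mathbf M\mathbf\alpha$ can be sanity-checked numerically whenever $\phi$ is explicit. The sketch below assumes, purely for illustration, the degree-2 polynomial kernel with its 3-dimensional feature map, plus arbitrary samples, labels and coefficients of my own choosing. It evaluates the left side in feature space via (6) and (14), and the right side via (15) to (17).

```python
import math

def phi(x):
    """Explicit feature map of kappa(x, z) = (x . z)^2 on R^2 (assumed kernel)."""
    return [x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2]

def kappa(x, z):
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def outer(u, v):
    return [[a * b for b in v] for a in u]

def quad(v, A):
    """Quadratic form v^T A v."""
    n = len(v)
    return sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))

# Arbitrary illustrative data (not from the book).
X = [(1.0, 2.0), (0.5, -1.0), (-1.5, 0.3), (2.0, 1.0), (0.2, 0.9)]
y = [0, 1, 0, 1, 1]
alpha = [0.3, -0.7, 1.1, 0.2, -0.4]
m = len(X)
idx = {c: [k for k in range(m) if y[k] == c] for c in (0, 1)}

# Feature-space side: class means (5), S_b (6), w (14).
mean_phi = {c: [sum(phi(X[k])[d] for k in idx[c]) / len(idx[c]) for d in range(3)]
            for c in (0, 1)}
diff_phi = [a - b for a, b in zip(mean_phi[1], mean_phi[0])]
S_b = outer(diff_phi, diff_phi)
w = [sum(alpha[i] * phi(X[i])[d] for i in range(m)) for d in range(3)]
lhs = quad(w, S_b)

# Kernel side: K, mu_hat (15)-(16), M (17).
K = [[kappa(X[i], X[j]) for j in range(m)] for i in range(m)]
mu_hat = {c: [sum(K[j][k] for k in idx[c]) / len(idx[c]) for j in range(m)]
          for c in (0, 1)}
diff_hat = [a - b for a, b in zip(mu_hat[0], mu_hat[1])]
M = outer(diff_hat, diff_hat)
rhs = quad(alpha, M)

print(lhs, rhs)  # equal up to floating-point rounding
```

Note that the kernel side works with $m\times m$ quantities only, so it stays computable even when the feature space is very high-dimensional.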
This completes the derivation of the numerator of (19); we now derive its denominator. Substituting (14) into the denominator of (4) gives:

$$\begin{aligned} \mathbf w^T \mathbf S_w^{\phi} \mathbf w &= \left( \sum^m_{i=1}\alpha_i \phi(\mathbf x_i) \right) ^T \cdot \mathbf S_w^{\phi} \cdot \sum^m_{i=1}\alpha_i \phi(\mathbf x_i) \\&= \sum^m_{i=1}\alpha_i \phi(\mathbf x_i) ^T \cdot \mathbf S_w^{\phi} \cdot \sum^m_{i=1}\alpha_i \phi(\mathbf x_i)\end{aligned} \tag{35}$$

where:

$$\begin{aligned} \mathbf S^{\phi}_w &= \sum_{i=0}^1 \sum_{\mathbf{x}\in \mathbf{X}_i}\left( \phi (\mathbf{x})-\mathbf{\mu}_i^{\phi} \right) \left( \phi (\mathbf{x})-\mathbf{\mu}_i^{\phi} \right) ^T \\&= \sum_{i=0}^1 \sum_{\mathbf{x}\in \mathbf{X}_i}\left( \phi (\mathbf{x})-\mathbf{\mu}_i^{\phi} \right) \left( \phi (\mathbf{x})^T-\left( \mathbf{\mu}_i^{\phi} \right)^T \right) \\&= \sum_{i=0}^1 \sum_{\mathbf{x}\in \mathbf{X}_i} \left( \phi (\mathbf{x}) \phi (\mathbf{x})^T- \phi (\mathbf{x}) \left( \mathbf{\mu}_i^{\phi} \right)^T -\mathbf{\mu}_i^{\phi} \phi (\mathbf{x})^T +\mathbf{\mu}_i^{\phi} \left( \mathbf{\mu}_i^{\phi} \right)^T \right) \\&= \sum_{i=0}^1 \sum_{\mathbf{x}\in \mathbf{X}_i} \phi (\mathbf{x}) \phi (\mathbf{x})^T-\sum_{i=0}^1 \sum_{\mathbf{x}\in \mathbf{X}_i} \phi (\mathbf{x}) \left( \mathbf{\mu}_i^{\phi} \right)^T -\sum_{i=0}^1 \sum_{\mathbf{x}\in \mathbf{X}_i} \mathbf{\mu}_i^{\phi} \phi (\mathbf{x})^T +\sum_{i=0}^1 \sum_{\mathbf{x}\in \mathbf{X}_i} \mathbf{\mu}_i^{\phi} \left( \mathbf{\mu}_i^{\phi} \right)^T \end{aligned} \tag{36}$$
Since, by (5), $\sum_{\mathbf{x}\in \mathbf{X}_i} \phi (\mathbf{x}) = m_i \mathbf{\mu}_i^{\phi}$, we have:

$$\begin{aligned} \sum_{i=0}^1 \sum_{\mathbf{x}\in \mathbf{X}_i} \phi (\mathbf{x}) \left( \mathbf{\mu}_i^{\phi} \right)^T &= \sum_{\mathbf{x}\in \mathbf{X}_0} \phi (\mathbf{x}) \left( \mathbf{\mu}_0^{\phi} \right)^T+\sum_{\mathbf{x}\in \mathbf{X}_1} \phi (\mathbf{x}) \left( \mathbf{\mu}_1^{\phi} \right)^T \\&= m_0 \mathbf{\mu}_0^{\phi} \left( \mathbf{\mu}_0^{\phi} \right)^T +m_1 \mathbf{\mu}_1^{\phi} \left( \mathbf{\mu}_1^{\phi} \right)^T \end{aligned} \tag{37}$$

$$\begin{aligned} \sum_{i=0}^1 \sum_{\mathbf{x}\in \mathbf{X}_i} \mathbf{\mu}_i^{\phi} \phi (\mathbf{x})^T &= \sum_{i=0}^1 \mathbf \mu_i^{\phi} \sum_{\mathbf{x}\in \mathbf{X}_i} \phi (\mathbf{x})^T \\&= \mathbf \mu_0^{\phi} \sum_{\mathbf{x}\in \mathbf{X}_0} \phi (\mathbf{x})^T+ \mathbf \mu_1^{\phi} \sum_{\mathbf{x}\in \mathbf{X}_1} \phi (\mathbf{x})^T \\&= m_0 \mathbf{\mu}_0^{\phi} \left( \mathbf{\mu}_0^{\phi} \right)^T +m_1 \mathbf{\mu}_1^{\phi} \left( \mathbf{\mu}_1^{\phi} \right)^T \end{aligned} \tag{38}$$

Substituting (37) and (38) into (36), and writing $D=\mathbf X_0 \cup \mathbf X_1$ for the whole training set:

$$\begin{aligned} \mathbf S^{\phi}_w &= \sum_{\mathbf{x}\in D} \phi (\mathbf{x}) \phi (\mathbf{x})^T-2\left[ m_0 \mathbf{\mu}_0^{\phi} \left( \mathbf{\mu}_0^{\phi} \right)^T +m_1 \mathbf{\mu}_1^{\phi} \left( \mathbf{\mu}_1^{\phi} \right)^T \right] +m_0 \mathbf{\mu}_0^{\phi} \left( \mathbf{\mu}_0^{\phi} \right)^T +m_1 \mathbf{\mu}_1^{\phi} \left( \mathbf{\mu}_1^{\phi} \right)^T \\&= \sum_{\mathbf{x}\in D} \phi (\mathbf{x}) \phi (\mathbf{x})^T - m_0 \mathbf{\mu}_0^{\phi} \left( \mathbf{\mu}_0^{\phi} \right)^T -m_1 \mathbf{\mu}_1^{\phi} \left( \mathbf{\mu}_1^{\phi} \right)^T \end{aligned} \tag{39}$$
Substituting (39) back into (35):

$$\begin{aligned} \mathbf w^T \mathbf S_w^{\phi} \mathbf w &= \sum^m_{i=1}\alpha_i \phi(\mathbf x_i) ^T \cdot \mathbf S_w^{\phi} \cdot \sum^m_{i=1}\alpha_i \phi(\mathbf x_i) \\&= \sum^m_{i=1}\alpha_i \phi(\mathbf x_i) ^T \cdot \left( \sum_{\mathbf{x}\in D} \phi (\mathbf{x}) \phi (\mathbf{x})^T - m_0 \mathbf{\mu}_0^{\phi} \left( \mathbf{\mu}_0^{\phi} \right)^T -m_1 \mathbf{\mu}_1^{\phi} \left( \mathbf{\mu}_1^{\phi} \right)^T \right) \cdot \sum^m_{i=1}\alpha_i \phi(\mathbf x_i) \\&= \sum^m_{i=1} \sum_{j=1}^m \sum_{\mathbf x \in D} \alpha_i \phi(\mathbf x_i) ^T \phi (\mathbf{x}) \phi (\mathbf{x})^T \alpha_j \phi(\mathbf x_j) - \sum^m_{i=1} \sum_{j=1}^m \alpha_i \phi(\mathbf x_i) ^T m_0 \mathbf{\mu}_0^{\phi} \left( \mathbf{\mu}_0^{\phi} \right)^T \alpha_j \phi(\mathbf x_j) - \sum^m_{i=1} \sum_{j=1}^m \alpha_i \phi(\mathbf x_i) ^T m_1 \mathbf{\mu}_1^{\phi} \left( \mathbf{\mu}_1^{\phi} \right)^T \alpha_j \phi(\mathbf x_j) \end{aligned} \tag{40}$$

The first term of (40) simplifies as follows, where the last step uses $\sum_{\mathbf x \in D} \kappa(\mathbf x_i,\mathbf x) \kappa(\mathbf x_j,\mathbf x)=(\mathbf{KK}^T)_{ij}$:

$$\begin{aligned} \sum^m_{i=1} \sum_{j=1}^m \sum_{\mathbf x \in D} \alpha_i \phi(\mathbf x_i) ^T \phi (\mathbf{x}) \phi (\mathbf{x})^T \alpha_j \phi(\mathbf x_j) &= \sum^m_{i=1} \sum_{j=1}^m \sum_{\mathbf x \in D} \alpha_i \alpha_j \kappa(\mathbf x_i,\mathbf x) \kappa(\mathbf x_j,\mathbf x) \\&= \mathbf \alpha^T \mathbf{KK}^T \mathbf{\alpha} \end{aligned}\tag{41}$$
The second term of (40) simplifies to:

$$\begin{aligned} \sum^m_{i=1} \sum_{j=1}^m \alpha_i \phi(\mathbf x_i) ^T m_0 \mathbf{\mu}_0^{\phi} \left( \mathbf{\mu}_0^{\phi} \right)^T \alpha_j \phi(\mathbf x_j) &= m_0 \sum^m_{i=1} \sum_{j=1}^m \alpha_i \alpha_j \phi(\mathbf x_i) ^T \mathbf{\mu}_0^{\phi} \left( \mathbf{\mu}_0^{\phi} \right)^T \phi(\mathbf x_j) \\&= m_0 \sum^m_{i=1} \sum_{j=1}^m \alpha_i \alpha_j \phi(\mathbf x_i) ^T \left[ \frac{1}{m_0} \sum_{\mathbf x \in X_0} \phi (\mathbf x) \right] \left[ \frac{1}{m_0} \sum_{\mathbf x \in X_0} \phi (\mathbf x) \right]^T \phi(\mathbf x_j) \\&= m_0 \sum^m_{i=1} \sum_{j=1}^m \alpha_i \alpha_j \left[ \frac{1}{m_0} \sum_{\mathbf x \in X_0} \phi(\mathbf x_i) ^T \phi (\mathbf x) \right] \left[ \frac{1}{m_0} \sum_{\mathbf x \in X_0} \phi (\mathbf x)^T \phi(\mathbf x_j) \right] \\&= m_0 \sum^m_{i=1} \sum_{j=1}^m \alpha_i \alpha_j \left[ \frac{1}{m_0} \sum_{\mathbf x \in X_0} \kappa (\mathbf x_i,\mathbf x) \right] \left[ \frac{1}{m_0} \sum_{\mathbf x \in X_0} \kappa (\mathbf x_j,\mathbf x) \right] \\&= m_0 \mathbf \alpha^T \hat{\mathbf \mu}_0 \hat{\mathbf \mu}_0^T \mathbf \alpha \end{aligned} \tag{42}$$

Similarly, the third term of (40) simplifies to:

$$\sum^m_{i=1} \sum_{j=1}^m \alpha_i \phi(\mathbf x_i) ^T m_1 \mathbf{\mu}_1^{\phi} \left( \mathbf{\mu}_1^{\phi} \right)^T \alpha_j \phi(\mathbf x_j) = m_1 \mathbf \alpha^T \hat{\mathbf \mu}_1 \hat{\mathbf \mu}_1^T \mathbf \alpha \tag{43}$$

Substituting (41), (42) and (43) back into (40):

$$\begin{aligned} \mathbf w^T \mathbf S_w^{\phi} \mathbf w &= \mathbf \alpha^T \mathbf{KK}^T \mathbf{\alpha} - m_0 \mathbf \alpha^T \hat{\mathbf \mu}_0 \hat{\mathbf \mu}_0^T \mathbf \alpha - m_1 \mathbf \alpha^T \hat{\mathbf \mu}_1 \hat{\mathbf \mu}_1^T \mathbf \alpha \\&= \mathbf \alpha^T \cdot \left( \mathbf{KK}^T - m_0 \hat{\mathbf \mu}_0 \hat{\mathbf \mu}_0^T- m_1 \hat{\mathbf \mu}_1 \hat{\mathbf \mu}_1^T \right) \cdot \mathbf \alpha \\&= \mathbf \alpha^T \cdot \left( \mathbf{KK}^T - \sum_{i=0}^1 m_i \hat{\mathbf \mu}_i \hat{\mathbf \mu}_i^T \right) \cdot \mathbf \alpha \\&= \mathbf \alpha^T \mathbf{N \alpha} \end{aligned} \tag{44}$$
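The denominator identity $\mathbf w^T \mathbf S_w^{\phi} \mathbf w=\mathbf\alpha^T\mathbf N\mathbf\alpha$ can be checked in the same way. As before, this is a sketch under my own assumptions: a degree-2 polynomial kernel with an explicit feature map, and made-up samples, labels and coefficients. The left side is computed in feature space from (7) and (14), the right side from the kernel matrix via (18).

```python
import math

def phi(x):
    """Explicit feature map of kappa(x, z) = (x . z)^2 on R^2 (assumed kernel)."""
    return [x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2]

def kappa(x, z):
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def quad(v, A):
    """Quadratic form v^T A v."""
    n = len(v)
    return sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))

# Arbitrary illustrative data (not from the book).
X = [(1.0, 2.0), (0.5, -1.0), (-1.5, 0.3), (2.0, 1.0), (0.2, 0.9)]
y = [0, 1, 0, 1, 1]
alpha = [0.3, -0.7, 1.1, 0.2, -0.4]
m = len(X)
idx = {c: [k for k in range(m) if y[k] == c] for c in (0, 1)}

# Feature-space side: within-class scatter S_w from Eq. (7).
S_w = [[0.0] * 3 for _ in range(3)]
for c in (0, 1):
    mean = [sum(phi(X[k])[d] for k in idx[c]) / len(idx[c]) for d in range(3)]
    for k in idx[c]:
        r = [p - q for p, q in zip(phi(X[k]), mean)]
        for a in range(3):
            for b in range(3):
                S_w[a][b] += r[a] * r[b]
w = [sum(alpha[i] * phi(X[i])[d] for i in range(m)) for d in range(3)]  # Eq. (14)
lhs = quad(w, S_w)

# Kernel side: N = K K^T - sum_c m_c mu_hat_c mu_hat_c^T from Eq. (18).
K = [[kappa(X[i], X[j]) for j in range(m)] for i in range(m)]
N = [[sum(K[i][k] * K[j][k] for k in range(m)) for j in range(m)] for i in range(m)]
for c in (0, 1):
    mu_hat = [sum(K[j][k] for k in idx[c]) / len(idx[c]) for j in range(m)]
    for i in range(m):
        for j in range(m):
            N[i][j] -= len(idx[c]) * mu_hat[i] * mu_hat[j]
rhs = quad(alpha, N)

print(lhs, rhs)  # equal up to floating-point rounding
```

Together with the numerator check, this confirms on a small example that (4) and (19) describe the same objective once $\mathbf w$ is restricted to the form (14).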
Personal blog: https://shichaoxin.com
GitHub: https://github.com/x-jeff