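(Machine Learning, complete series) Chapter 13, Semi-Supervised Learning: 13.4 The Regularization Framework (Another Derivation of the Label Propagation Iteration)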

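Combining an "energy function" part with a supervised part gives a regularized objective. Starting from this regularization framework, we can derive the label propagation iteration by another route.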

Regularization framework

Consider Table 13.3 in Section 13.3 (Label Propagation Algorithm), and denote the $i$-th row of the matrix $\mathbf{F}$ by

$$
\begin{align}
\boldsymbol{F}_i=(f_i^1,f_i^2,f_i^3,\cdots,f_i^n) \tag{13.48}
\end{align}
$$

so that $\mathbf{F}$ is the matrix whose rows are stacked as

$$
\begin{align}
\mathbf{F}=(\boldsymbol{F}_1;\boldsymbol{F}_2;\cdots;\boldsymbol{F}_{l+u}) \tag{13.49}
\end{align}
$$

Likewise, the $i$-th row of the matrix $\mathbf{Y}$ given in Table 13.1 is denoted $\boldsymbol{Y}_i$.
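A minimal sketch of how $\mathbf{Y}$ can be laid out in code (pure Python, standard library only). It assumes the common label-propagation convention: the row of a labeled sample is one-hot at its class, and the row of an unlabeled sample is all zeros. The helper `build_label_matrix` and the toy labels are hypothetical and not taken from Table 13.1.

```python
from typing import List, Optional


def build_label_matrix(labels: List[Optional[int]], n_classes: int) -> List[List[float]]:
    """Hypothetical helper: row i of Y is one-hot at labels[i] for a labeled
    sample, and all zeros when labels[i] is None (unlabeled sample)."""
    Y = []
    for y in labels:
        row = [0.0] * n_classes
        if y is not None:
            row[y] = 1.0
        Y.append(row)
    return Y


# Toy data (assumed): l = 2 labeled samples first, then u = 2 unlabeled ones.
labels = [0, 1, None, None]
Y = build_label_matrix(labels, n_classes=2)
for i, Y_i in enumerate(Y, start=1):
    print(f"Y_{i} =", Y_i)
```

Placing the labeled samples first matches the indexing used throughout: $i\le l$ are labeled and $l<i\le l+u$ are unlabeled.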

We generalize the "energy function" of $f$ in Eq. (13.40) to an "energy function" of the matrix $\mathbf{F}$, where $\mathbf{W}=(w_{ij})$ is the symmetric affinity matrix and $d_i=\sum_{j=1}^{l+u}w_{ij}$:

$$
\begin{align}
\frac{1}{2}\left(\sum_{i=1}^{l+u}\sum_{j=1}^{l+u}w_{ij}\left\|\frac{1}{\sqrt{d_i}}\boldsymbol{F}_i-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\right\|_2^2\right) \tag{13.50}
\end{align}
$$

The supervised part is

$$
\begin{align}
\sum_{i=1}^l\|\boldsymbol{F}_i-\boldsymbol{Y}_i\|_2^2 \tag{13.51}
\end{align}
$$

We then bring the supervised part (the quantity measured by Eq. (13.51)) into the regularization, using a coefficient $\mu>0$ to balance Eq. (13.50) against Eq. (13.51):

$$
\begin{align}
L=\text{Eq. (13.50)}+\mu\,\text{Eq. (13.51)} \tag{13.52}
\end{align}
$$
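The pieces of Eq. (13.52) translate directly into functions. This is a sketch with hypothetical helper names (`degrees`, `energy`, `supervised`, `objective`) and toy values for $\mathbf{W}$, $\mathbf{F}$ and $\mathbf{Y}$; it assumes $\mathbf{W}$ is symmetric with positive row sums, so that every $d_i>0$.

```python
import math
from typing import List

Matrix = List[List[float]]


def degrees(W: Matrix) -> List[float]:
    # d_i = sum_j w_ij, the row sums of the symmetric affinity matrix W
    return [sum(row) for row in W]


def energy(W: Matrix, F: Matrix) -> float:
    """Eq. (13.50): 1/2 * sum_i sum_j w_ij * ||F_i/sqrt(d_i) - F_j/sqrt(d_j)||^2."""
    d = degrees(W)
    total = 0.0
    for i in range(len(W)):
        for j in range(len(W)):
            sq = sum((fi / math.sqrt(d[i]) - fj / math.sqrt(d[j])) ** 2
                     for fi, fj in zip(F[i], F[j]))
            total += W[i][j] * sq
    return 0.5 * total


def supervised(F: Matrix, Y: Matrix, l: int) -> float:
    """Eq. (13.51): squared error on the first l (labeled) rows only."""
    return sum(sum((f - y) ** 2 for f, y in zip(F[i], Y[i])) for i in range(l))


def objective(W: Matrix, F: Matrix, Y: Matrix, l: int, mu: float) -> float:
    """Eq. (13.52): L = energy + mu * supervised part, with mu > 0."""
    return energy(W, F) + mu * supervised(F, Y, l)


# Two connected samples, the first one labeled (l = 1); toy values are assumptions.
W = [[0.0, 1.0], [1.0, 0.0]]
Y = [[1.0, 0.0], [0.0, 0.0]]
F = [[1.0, 0.0], [0.0, 0.0]]
print(energy(W, F), supervised(F, Y, l=1), objective(W, F, Y, l=1, mu=0.5))
```

The energy vanishes whenever all normalized rows $\boldsymbol{F}_i/\sqrt{d_i}$ coincide, and pairs with $w_{ij}=0$ contribute nothing, so it only penalizes disagreement between connected samples.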

The goal is to minimize Eq. (13.52); this is exactly the regularization framework [Watermelon Book, Eq. (13.21)].

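Note that Eqs. (13.50) and (13.51) sum over different ranges, so we extend Eq. (13.51) to all $l+u$ samples:

$$
\begin{align}
\sum_{i=1}^{l+u}\|\boldsymbol{F}_i-\boldsymbol{Y}_i\|_2^2 \tag{13.53}
\end{align}
$$

This extension does give up some justification: the rows $\boldsymbol{Y}_i$ of the unlabeled samples ($i>l$) are zero, so the added terms equal $\sum_{i=l+1}^{l+u}\|\boldsymbol{F}_i\|_2^2$ and pull the scores of unlabeled samples toward zero. Machine learning often accepts such workarounds (approximate or suboptimal formulations) to make the computation tractable.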
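A quick numeric illustration of what the extension changes, on assumed toy values with zero rows $\boldsymbol{Y}_i$ for the unlabeled samples: the all-rows term (13.53) equals the labeled-only term (13.51) plus $\sum_{i=l+1}^{l+u}\|\boldsymbol{F}_i\|_2^2$. The helper `squared_error` is a hypothetical name.

```python
from typing import Iterable, List

Matrix = List[List[float]]


def squared_error(F: Matrix, Y: Matrix, rows: Iterable[int]) -> float:
    # Sum over the given rows of ||F_i - Y_i||_2^2
    return sum(sum((f - y) ** 2 for f, y in zip(F[i], Y[i])) for i in rows)


# Toy values (assumed): l = 2 labeled rows, u = 2 unlabeled rows whose Y_i are zero.
F = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.3], [0.1, 0.7]]
Y = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
l = 2

term_13_51 = squared_error(F, Y, range(l))          # labeled rows only
term_13_53 = squared_error(F, Y, range(len(F)))     # all l + u rows
extra = sum(sum(f * f for f in F[i]) for i in range(l, len(F)))  # ||F_i||^2 for i > l

print(term_13_51, term_13_53, extra)
```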

Hence

$$
\begin{align}
L=\text{Eq. (13.50)}+\mu\,\text{Eq. (13.53)} \tag{13.54}
\end{align}
$$

Differentiating Eq. (13.50) with respect to $\boldsymbol{F}_k$, we split the double sum according to whether $i$ and $j$ equal $k$:

$$
\begin{align}
& \quad \frac{\partial\,\text{Eq. (13.50)}}{\partial \boldsymbol{F}_k}\notag \\
& =\frac{1}{2}\left(\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{i\neq k}\sum_{j\neq k}[\cdot]+\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{i= k}\sum_{j\neq k}[\cdot]+\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{i\neq k}\sum_{j= k}[\cdot]+\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{i= k}\sum_{j= k}[\cdot]\right)\notag \\
& =\frac{1}{2}\left(0+\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{i= k}\sum_{j\neq k}[\cdot]+\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{i\neq k}\sum_{j= k}[\cdot]+0\right)\notag \\
& \quad \quad \quad \text{(the first sum does not contain $\boldsymbol{F}_k$; in the last, $i=j=k$ makes the norm zero)}\notag \\
& =\frac{1}{2}\left(\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{j\neq k}w_{kj}\left\|\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\right\|_2^2+\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{i\neq k}w_{ik}\left\|\frac{1}{\sqrt{d_i}}\boldsymbol{F}_i-\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k\right\|_2^2\right)\notag \\
& =\frac{1}{2}\left(\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{j\neq k}w_{kj}\left\|\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\right\|_2^2+\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{i\neq k}w_{ki}\left\|\frac{1}{\sqrt{d_i}}\boldsymbol{F}_i-\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k\right\|_2^2\right)\notag \\
& \quad \quad \quad \text{(by the symmetry of $\mathbf{W}$, $w_{ik}=w_{ki}$)}\notag \\
& =\frac{1}{2}\left(\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{j\neq k}w_{kj}\left\|\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\right\|_2^2+\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{j\neq k}w_{kj}\left\|\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\right\|_2^2\right)\notag \\
& \quad \quad \quad \text{(renaming $i$ as $j$ and using $\|\boldsymbol{a}-\boldsymbol{b}\|_2=\|\boldsymbol{b}-\boldsymbol{a}\|_2$)}\notag \\
& =\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{j\neq k}w_{kj}\left\|\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\right\|_2^2\notag \\
& =\sum_{j\neq k}w_{kj}\frac{\partial }{\partial \boldsymbol{F}_k}\left\|\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\right\|_2^2\notag \\
& =2\sum_{j\neq k}w_{kj}\left(\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\right)\frac{1}{\sqrt{d_k}}\quad \text{(by [Watermelon Book, Appendix Eq. (A.30)])}\notag \\
& =2\sum_{j\neq k}w_{kj}\left(\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\right)\frac{1}{\sqrt{d_k}}+2\sum_{j=k}w_{kj}\left(\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\right)\frac{1}{\sqrt{d_k}}\notag \\
& \quad \quad \quad \text{(the added term is 0, since $j=k$ makes the bracket vanish)}\notag \\
& =2\sum_{j=1}^{l+u}w_{kj}\left(\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\right)\frac{1}{\sqrt{d_k}}\notag \\
& =2\sum_{j=1}^{l+u}w_{kj}\frac{1}{\sqrt{d_k}}\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-2\sum_{j=1}^{l+u}w_{kj}\frac{1}{\sqrt{d_j}}\frac{1}{\sqrt{d_k}}\boldsymbol{F}_j\notag \\
& =2d_k\frac{1}{\sqrt{d_k}}\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-2\sum_{j=1}^{l+u}w_{kj}\frac{1}{\sqrt{d_j}}\frac{1}{\sqrt{d_k}}\boldsymbol{F}_j\quad \text{(since $d_k=\sum_{j=1}^{l+u}w_{kj}$)}\notag \\
& =2\boldsymbol{F}_k-2\sum_{j=1}^{l+u}\frac{1}{\sqrt{d_k}}w_{kj}\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j \tag{13.55}
\end{align}
$$

where $[\cdot]=w_{ij}\left\|\frac{1}{\sqrt{d_i}}\boldsymbol{F}_i-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\right\|_2^2$ and $k=1,2,\cdots,l+u$.
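Eq. (13.55) can be checked numerically. The sketch below compares it with central finite differences of Eq. (13.50) on a small, assumed symmetric $\mathbf{W}$ and an arbitrary $\mathbf{F}$; since the energy is quadratic in $\mathbf{F}$, the two should agree up to rounding error. All numbers and helper names are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]

# Toy symmetric affinity matrix and score matrix (assumed values).
W = [[0.0, 1.0, 0.5, 0.0],
     [1.0, 0.0, 0.0, 0.2],
     [0.5, 0.0, 0.0, 1.0],
     [0.0, 0.2, 1.0, 0.0]]
F = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.3], [0.1, 0.7]]
d = [sum(row) for row in W]          # d_k = sum_j w_kj
m, n = len(F), len(F[0])


def energy(F: Matrix) -> float:
    """Eq. (13.50) for the fixed W above."""
    return 0.5 * sum(
        W[i][j] * sum((F[i][q] / math.sqrt(d[i]) - F[j][q] / math.sqrt(d[j])) ** 2
                      for q in range(n))
        for i in range(m) for j in range(m))


def analytic_grad_row(F: Matrix, k: int) -> List[float]:
    """Eq. (13.55): 2 F_k - 2 * sum_j w_kj / sqrt(d_k d_j) * F_j."""
    return [2 * F[k][q] - 2 * sum(W[k][j] / math.sqrt(d[k] * d[j]) * F[j][q]
                                  for j in range(m))
            for q in range(n)]


def numeric_grad_row(F: Matrix, k: int, h: float = 1e-6) -> List[float]:
    """Central finite differences of the energy with respect to row F_k."""
    grad = []
    for q in range(n):
        plus = [row[:] for row in F]
        minus = [row[:] for row in F]
        plus[k][q] += h
        minus[k][q] -= h
        grad.append((energy(plus) - energy(minus)) / (2 * h))
    return grad


max_err = max(abs(a - b)
              for k in range(m)
              for a, b in zip(analytic_grad_row(F, k), numeric_grad_row(F, k)))
print(f"max |analytic - numeric| = {max_err:.2e}")
```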
Similarly, the derivative of Eq. (13.53) is

$$
\begin{align}
\frac{\partial\,\text{Eq. (13.53)}}{\partial \boldsymbol{F}_k} & =\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{i=1}^{l+u}\|\boldsymbol{F}_i-\boldsymbol{Y}_i\|_2^2\notag \\
& =\frac{\partial }{\partial \boldsymbol{F}_k}\left(\sum_{i=k}+\sum_{i\neq k}\right)\|\boldsymbol{F}_i-\boldsymbol{Y}_i\|_2^2\notag \\
& =\frac{\partial }{\partial \boldsymbol{F}_k}\|\boldsymbol{F}_k-\boldsymbol{Y}_k\|_2^2+0\notag \\
& =2(\boldsymbol{F}_k-\boldsymbol{Y}_k) \tag{13.56}
\end{align}
$$

Setting $\frac{\partial L}{\partial \boldsymbol{F}_k}=\boldsymbol{0}$, i.e. Eq. (13.55) $+\,\mu\times$ Eq. (13.56) $=\boldsymbol{0}$, gives

$$
\begin{align}
2\boldsymbol{F}_k-2\sum_{j=1}^{l+u}\frac{1}{\sqrt{d_k}}w_{kj}\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j+2\mu(\boldsymbol{F}_k-\boldsymbol{Y}_k)=\boldsymbol{0}\notag \\
(1+\mu)\boldsymbol{F}_k=\sum_{j=1}^{l+u}\frac{1}{\sqrt{d_k}}w_{kj}\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j+\mu\boldsymbol{Y}_k \tag{13.57}
\end{align}
$$
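The same finite-difference idea confirms that the gradient of Eq. (13.54) is Eq. (13.55) plus $\mu$ times Eq. (13.56), and that half of it is $(1+\mu)\boldsymbol{F}_k-\sum_j s^k_j\boldsymbol{F}_j-\mu\boldsymbol{Y}_k$, so a zero gradient is exactly Eq. (13.57). The toy data, $\mu=0.5$, and the helper names are assumptions for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]

# Toy data (assumed): 4 samples, 2 classes, the first l = 2 labeled, mu = 0.5.
W = [[0.0, 1.0, 0.5, 0.0],
     [1.0, 0.0, 0.0, 0.2],
     [0.5, 0.0, 0.0, 1.0],
     [0.0, 0.2, 1.0, 0.0]]
Y = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
F = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.3], [0.1, 0.7]]
mu = 0.5
d = [sum(row) for row in W]
m, n = len(F), len(F[0])
s = [[W[k][j] / math.sqrt(d[k] * d[j]) for j in range(m)] for k in range(m)]  # s^k_j


def objective(F: Matrix) -> float:
    """Eq. (13.54): energy (13.50) plus mu times the all-rows term (13.53)."""
    smooth = 0.5 * sum(
        W[i][j] * sum((F[i][q] / math.sqrt(d[i]) - F[j][q] / math.sqrt(d[j])) ** 2
                      for q in range(n))
        for i in range(m) for j in range(m))
    fit = sum((F[i][q] - Y[i][q]) ** 2 for i in range(m) for q in range(n))
    return smooth + mu * fit


def grad_row(F: Matrix, k: int) -> List[float]:
    """Eq. (13.55) + mu * Eq. (13.56)."""
    return [2 * F[k][q] - 2 * sum(s[k][j] * F[j][q] for j in range(m))
            + 2 * mu * (F[k][q] - Y[k][q])
            for q in range(n)]


def numeric_grad_row(F: Matrix, k: int, h: float = 1e-6) -> List[float]:
    """Central finite differences of L with respect to row F_k."""
    grad = []
    for q in range(n):
        plus = [row[:] for row in F]
        minus = [row[:] for row in F]
        plus[k][q] += h
        minus[k][q] -= h
        grad.append((objective(plus) - objective(minus)) / (2 * h))
    return grad


max_err = max(abs(a - b) for k in range(m)
              for a, b in zip(grad_row(F, k), numeric_grad_row(F, k)))

# Half the gradient rearranged into the form of Eq. (13.57).
half_grad_gap = max(
    abs(grad_row(F, k)[q] / 2
        - ((1 + mu) * F[k][q] - sum(s[k][j] * F[j][q] for j in range(m)) - mu * Y[k][q]))
    for k in range(m) for q in range(n))
print(f"gradient check error = {max_err:.2e}, rearrangement gap = {half_grad_gap:.2e}")
```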

Using Eq. (13.48), rewrite Eq. (13.57) element-wise:

$$
\begin{align}
(1+\mu)f^q_k & =\sum_{j=1}^{l+u}\frac{1}{\sqrt{d_k}}w_{kj}\frac{1}{\sqrt{d_j}}f^q_j+\mu y^q_k\notag \\
(1+\mu)f^q_k & =\sum_{j=1}^{l+u}s^k_jf^q_j+\mu y^q_k \tag{13.58}
\end{align}
$$

where $s^k_j=\frac{1}{\sqrt{d_k}}w_{kj}\frac{1}{\sqrt{d_j}}$, $k=1,2,\cdots,l+u$, and $q=1,2,\cdots,n$.

Then, by Eqs. (A21) and (A22) in the article "Vectors and Matrices (Learning Some Formulas and Their Derivation Techniques)", Eq. (13.58) can be written in matrix form:

$$
\begin{align}
(1+\mu)\mathbf{F} & =\mathbf{S}\mathbf{F}+\mu\mathbf{Y} \tag{13.59}
\end{align}
$$

where $\mathbf{S}$ is

$$
\begin{align}
\mathbf{S} & =([s^k_j]_{kj})\notag \\
& =\left(\left[\frac{1}{\sqrt{d_k}}w_{kj}\frac{1}{\sqrt{d_j}}\right]\right)\notag \\
& =\mathbf{U}_{\mathrm{diag}}\mathbf{W}\,\mathbf{V}_{\mathrm{diag}}\quad\text{(by Eq. (A15) in "Vectors and Matrices (Learning Some Formulas and Their Derivation Techniques)")} \tag{13.60}
\end{align}
$$

with the diagonal matrices

$$
\begin{align}
\mathbf{U}_{\mathrm{diag}} & =\mathrm{diag}\left(\frac{1}{\sqrt{d_1}},\frac{1}{\sqrt{d_2}},\cdots,\frac{1}{\sqrt{d_{l+u}}}\right)\notag \\
& =\mathbf{D}^{-\frac{1}{2}} \tag{13.61} \\
\mathbf{V}_{\mathrm{diag}} & =\mathbf{D}^{-\frac{1}{2}} \tag{13.62} \\
\mathbf{D} & =\mathrm{diag}(d_1,d_2,\cdots,d_{l+u}) \tag{13.63}
\end{align}
$$
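A sketch checking that the matrix product $\mathbf{D}^{-1/2}\mathbf{W}\mathbf{D}^{-1/2}$ of Eq. (13.60) reproduces the element-wise entries $s^k_j$ of Eq. (13.58). It uses small hand-written `matmul` and `diag` helpers (hypothetical names, standing in for a linear-algebra library) and an assumed toy $\mathbf{W}$.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain triple-loop matrix product A @ B."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def diag(values: List[float]) -> Matrix:
    """Square diagonal matrix with the given diagonal entries."""
    size = len(values)
    return [[values[i] if i == j else 0.0 for j in range(size)] for i in range(size)]


# Assumed toy affinity matrix (symmetric, zero diagonal, positive row sums).
W = [[0.0, 1.0, 0.5, 0.0],
     [1.0, 0.0, 0.0, 0.2],
     [0.5, 0.0, 0.0, 1.0],
     [0.0, 0.2, 1.0, 0.0]]
m = len(W)
d = [sum(row) for row in W]                              # diagonal of D, Eq. (13.63)
D_inv_sqrt = diag([1.0 / math.sqrt(di) for di in d])     # U_diag = V_diag = D^{-1/2}

S_matrix = matmul(matmul(D_inv_sqrt, W), D_inv_sqrt)     # Eq. (13.60)
S_elementwise = [[W[k][j] / (math.sqrt(d[k]) * math.sqrt(d[j])) for j in range(m)]
                 for k in range(m)]                      # s^k_j from Eq. (13.58)

max_diff = max(abs(a - b) for ra, rb in zip(S_matrix, S_elementwise) for a, b in zip(ra, rb))
is_symmetric = all(abs(S_matrix[i][j] - S_matrix[j][i]) < 1e-12
                   for i in range(m) for j in range(m))
print(f"max difference = {max_diff:.1e}, symmetric: {is_symmetric}")
```

Because $\mathbf{W}$ is symmetric, $\mathbf{S}$ is symmetric as well, i.e. $s^k_j=s^j_k$.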

Next, let

$$
\begin{align}
\alpha=\frac{1}{1+\mu} \tag{13.64}
\end{align}
$$

From Eqs. (13.59) to (13.64), dividing Eq. (13.59) by $1+\mu$ and noting that $\frac{\mu}{1+\mu}=1-\alpha$, we obtain

$$
\begin{align}
\mathbf{F}=\alpha\mathbf{S}\mathbf{F}+(1-\alpha)\mathbf{Y} \tag{13.65}
\end{align}
$$

where $\mathbf{S}=\mathbf{D}^{-\frac{1}{2}}\mathbf{W}\mathbf{D}^{-\frac{1}{2}}$.
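Eq. (13.65) is linear in $\mathbf{F}$: collecting $\mathbf{F}$ on the left gives $(\mathbf{I}-\alpha\mathbf{S})\mathbf{F}=(1-\alpha)\mathbf{Y}$, and for $0<\alpha<1$ the matrix $\mathbf{I}-\alpha\mathbf{S}$ is invertible because the eigenvalues of $\mathbf{S}$ lie in $[-1,1]$. The sketch below solves this system directly with plain Gauss-Jordan elimination (a general-purpose solver swapped in for illustration, not a step of the derivation) and checks that the result satisfies Eq. (13.65). The toy $\mathbf{W}$, $\mathbf{Y}$, $\alpha=0.5$ and the helper name `solve` are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]

# Toy data (assumed): 4 samples, 2 classes, the first l = 2 labeled.
W = [[0.0, 1.0, 0.5, 0.0],
     [1.0, 0.0, 0.0, 0.2],
     [0.5, 0.0, 0.0, 1.0],
     [0.0, 0.2, 1.0, 0.0]]
Y = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
alpha = 0.5                       # assumed; by Eq. (13.64) this means mu = 1
d = [sum(row) for row in W]
m, n = len(Y), len(Y[0])
S = [[W[i][j] / math.sqrt(d[i] * d[j]) for j in range(m)] for i in range(m)]


def solve(A: Matrix, B: Matrix) -> Matrix:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    size = len(A)
    M = [A[i][:] + B[i][:] for i in range(size)]      # augmented matrix [A | B]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(size):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[size:] for row in M]


# (I - alpha S) F = (1 - alpha) Y, i.e. Eq. (13.65) with F collected on the left.
A = [[(1.0 if i == j else 0.0) - alpha * S[i][j] for j in range(m)] for i in range(m)]
B = [[(1 - alpha) * y for y in row] for row in Y]
F_star = solve(A, B)

SF = [[sum(S[i][t] * F_star[t][q] for t in range(m)) for q in range(n)] for i in range(m)]
residual = max(abs(F_star[i][q] - (alpha * SF[i][q] + (1 - alpha) * Y[i][q]))
               for i in range(m) for q in range(n))
print(f"residual of Eq. (13.65): {residual:.1e}")
```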

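Turning Eq. (13.65) into an iteration gives

$$
\begin{align}
\mathbf{F}^{t+1}=\alpha\mathbf{S}\mathbf{F}^t+(1-\alpha)\mathbf{Y} \tag{13.66}
\end{align}
$$

For $0<\alpha<1$ this iteration converges from any starting point: the eigenvalues of the symmetric matrix $\mathbf{S}$ lie in $[-1,1]$, so $\alpha\mathbf{S}$ is a contraction, and the limit is the unique solution of Eq. (13.65), $\mathbf{F}^*=(1-\alpha)(\mathbf{I}-\alpha\mathbf{S})^{-1}\mathbf{Y}$.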
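A sketch of iteration (13.66): started from two different $\mathbf{F}^0$ (here $\mathbf{Y}$ and the zero matrix, both assumed choices), the iterates should reach the same limit, and that limit should satisfy Eq. (13.65). The toy data, $\alpha=0.5$, the helper `propagate` and the fixed iteration count `n_iter` are assumptions; in practice one would stop once successive iterates stop changing.

```python
import math
from typing import List

Matrix = List[List[float]]

# Toy data (assumed): 4 samples, 2 classes, the first l = 2 labeled.
W = [[0.0, 1.0, 0.5, 0.0],
     [1.0, 0.0, 0.0, 0.2],
     [0.5, 0.0, 0.0, 1.0],
     [0.0, 0.2, 1.0, 0.0]]
Y = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
alpha = 0.5                                   # assumed, 0 < alpha < 1
d = [sum(row) for row in W]
m, n = len(Y), len(Y[0])
S = [[W[i][j] / math.sqrt(d[i] * d[j]) for j in range(m)] for i in range(m)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain triple-loop matrix product A @ B."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def propagate(F0: Matrix, n_iter: int = 200) -> Matrix:
    """Run F^{t+1} = alpha * S F^t + (1 - alpha) * Y for n_iter steps."""
    F = [row[:] for row in F0]
    for _ in range(n_iter):
        SF = matmul(S, F)
        F = [[alpha * SF[i][q] + (1 - alpha) * Y[i][q] for q in range(n)]
             for i in range(m)]
    return F


F_from_Y = propagate(Y)
F_from_zero = propagate([[0.0] * n for _ in range(m)])

gap = max(abs(a - b) for ra, rb in zip(F_from_Y, F_from_zero) for a, b in zip(ra, rb))
SF = matmul(S, F_from_Y)
residual = max(abs(F_from_Y[i][q] - (alpha * SF[i][q] + (1 - alpha) * Y[i][q]))
               for i in range(m) for q in range(n))
print(f"gap between starts = {gap:.1e}, fixed-point residual = {residual:.1e}")
```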

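Eq. (13.66) is exactly [Watermelon Book, Eq. (13.19)]. So the iteration can also be derived from the regularization framework (another route): the optimal solution of [Watermelon Book, Eq. (13.21)], with its supervised term extended to all $l+u$ samples as in Eq. (13.53), is exactly the solution $\mathbf{F}^*$ of the algorithm in [Watermelon Book, Figure 13.5], where $\mu$ and $\alpha$ are related by Eq. (13.64): $\mu=\frac{1-\alpha}{\alpha}$.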
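Finally, a sketch tying the two views together under assumed toy data: choose $\alpha$, set $\mu=(1-\alpha)/\alpha$ by Eq. (13.64), run iteration (13.66), and check that the limit makes the gradient of Eq. (13.54) vanish and that a small perturbation increases $L$. The read-out of a class per sample as the column with the largest score follows the usual label-propagation convention and is an assumption here, as are $\alpha=0.8$ and the helper names.

```python
import math
from typing import List

Matrix = List[List[float]]

# Toy data (assumed): 4 samples, 2 classes, the first l = 2 labeled.
W = [[0.0, 1.0, 0.5, 0.0],
     [1.0, 0.0, 0.0, 0.2],
     [0.5, 0.0, 0.0, 1.0],
     [0.0, 0.2, 1.0, 0.0]]
Y = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
alpha = 0.8                       # assumed, 0 < alpha < 1
mu = (1 - alpha) / alpha          # Eq. (13.64) solved for mu
d = [sum(row) for row in W]
m, n = len(Y), len(Y[0])
S = [[W[i][j] / math.sqrt(d[i] * d[j]) for j in range(m)] for i in range(m)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain triple-loop matrix product A @ B."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def objective(F: Matrix) -> float:
    """Eq. (13.54): energy (13.50) + mu * sum_{i=1}^{l+u} ||F_i - Y_i||^2."""
    smooth = 0.5 * sum(
        W[i][j] * sum((F[i][q] / math.sqrt(d[i]) - F[j][q] / math.sqrt(d[j])) ** 2
                      for q in range(n))
        for i in range(m) for j in range(m))
    fit = sum((F[i][q] - Y[i][q]) ** 2 for i in range(m) for q in range(n))
    return smooth + mu * fit


# Iteration (13.66), started from F^0 = Y.
F = [row[:] for row in Y]
for _ in range(500):
    SF = matmul(S, F)
    F = [[alpha * SF[i][q] + (1 - alpha) * Y[i][q] for q in range(n)] for i in range(m)]

# Matrix form of Eq. (13.55) + mu * Eq. (13.56): 2F - 2SF + 2 mu (F - Y).
SF = matmul(S, F)
max_grad = max(abs(2 * F[i][q] - 2 * SF[i][q] + 2 * mu * (F[i][q] - Y[i][q]))
               for i in range(m) for q in range(n))

# L is a strictly convex quadratic here (Hessian 2(I - S) + 2 mu I with mu > 0),
# so a small nudge away from the stationary point must increase it.
nudged = [[F[i][q] + (0.01 if (i + q) % 2 == 0 else -0.01) for q in range(n)]
          for i in range(m)]
increase = objective(nudged) - objective(F)

# Read out a class for every sample as the column with the largest score.
predicted = [max(range(n), key=lambda q: F[i][q]) for i in range(m)]
print(f"mu = {mu:.2f}, max |dL/dF| = {max_grad:.1e}, L increase = {increase:.2e}")
print("predicted classes:", predicted)
```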

This is an original article.
