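Combining the "energy function" term with a supervised term gives a regularized objective; starting from this regularization framework, we can derive the iteration formula of the label propagation algorithm (an alternative route).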
Regularization framework
Referring to Table 13.3 in Section 13.3 (the label propagation algorithm), denote the $i$-th row vector of the matrix $\mathbf{F}$ by
\begin{align}
\boldsymbol{F}_i=(f_i^1\,,\,f_i^2\,,\,f_i^3\,,\,\cdots\,,\,f_i^n) \tag{13.48}
\end{align}
Then the matrix $\mathbf{F}$, with its rows stacked vertically, is
\begin{align}
\mathbf{F}=(\boldsymbol{F}_1\,;\,\boldsymbol{F}_2\,;\,\cdots\,;\,\boldsymbol{F}_{l+u}) \tag{13.49}
\end{align}
Likewise, the $i$-th row vector of the matrix $\mathbf{Y}$ corresponding to Table 13.1 is $\boldsymbol{Y}_i$.
We generalize the "energy function" of $f$, Eq. (13.40), to an "energy function" of the matrix $\mathbf{F}$:
\begin{align}
\frac{1}{2}\left(\sum_{i=1}^{l+u}\sum_{j=1}^{l+u}w_{ij}\bigg\|\frac{1}{\sqrt{d_i}}\boldsymbol{F}_i-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\bigg\|_2^2\right) \tag{13.50}
\end{align}
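To make Eq. (13.50) concrete, here is a minimal sketch that evaluates the energy directly from its double sum. It assumes that $d_i$ is the $i$-th row sum of $\mathbf{W}$ (the convention the later derivation relies on) and that every $d_i>0$; the function name `energy` and the toy data are made up for illustration.

```python
import math

def energy(W, F):
    """Eq. (13.50): 0.5 * sum_ij w_ij * ||F_i/sqrt(d_i) - F_j/sqrt(d_j)||^2."""
    d = [sum(row) for row in W]  # assumption: d_i = sum_j w_ij, all positive
    total = 0.0
    for i, Fi in enumerate(F):
        for j, Fj in enumerate(F):
            diff = [a / math.sqrt(d[i]) - b / math.sqrt(d[j])
                    for a, b in zip(Fi, Fj)]
            total += W[i][j] * sum(x * x for x in diff)
    return 0.5 * total

# Hypothetical toy data: two connected samples with opposite one-hot rows.
W_toy = [[0.0, 1.0], [1.0, 0.0]]
F_toy = [[1.0, 0.0], [0.0, 1.0]]
print(energy(W_toy, F_toy))  # prints 2.0
```

The energy is large when strongly connected samples carry different (degree-scaled) rows, and it drops to zero when they agree, for example with `F = [[1.0, 0.0], [1.0, 0.0]]` on the same graph.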
The supervised part is
\begin{align}
\sum_{i=1}^l\|\boldsymbol{F}_i-\boldsymbol{Y}_i\|_2^2 \tag{13.51}
\end{align}
Next we bring the supervised part, measured by Eq. (13.51), into the regularization, using a coefficient $\mu >0$ to balance Eq. (13.50) against Eq. (13.51):
\begin{align}
L=\,\text{Eq. (13.50)}+\mu \,\text{Eq. (13.51)} \tag{13.52}
\end{align}
The goal is to minimize Eq. (13.52), which gives the regularization framework [Watermelon Book Eq. (13.21)].
Note that Eq. (13.50) and Eq. (13.51) sum over different ranges: the former runs over all $l+u$ samples, the latter only over the $l$ labeled ones. We therefore extend Eq. (13.51) to
\begin{align}
\sum_{i=1}^{l+u}\|\boldsymbol{F}_i-\boldsymbol{Y}_i\|_2^2 \tag{13.53}
\end{align}
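Admittedly, this adjustment gives up some of the original justification, since unlabeled samples now also enter the supervised term. Machine learning often accepts such "approximate" or "suboptimal" workarounds in exchange for a computation that is tractable.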
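As a small illustration of what the adjustment changes, the sketch below compares Eq. (13.51) with Eq. (13.53) on toy data. It assumes, as in the Watermelon Book's definition of $\mathbf{Y}$, that the rows of $\mathbf{Y}$ for unlabeled samples are all zero; under that assumption the extra terms are just $\|\boldsymbol{F}_i\|_2^2$ for the unlabeled rows. The helper `fit_term` and the numbers are hypothetical.

```python
def fit_term(F, Y, count):
    """Sum of ||F_i - Y_i||^2 over the first `count` rows."""
    return sum((f - y) ** 2
               for Fi, Yi in zip(F[:count], Y[:count])
               for f, y in zip(Fi, Yi))

# Hypothetical toy data: l = 2 labeled rows followed by u = 1 unlabeled row.
l = 2
Y_toy = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # assumption: unlabeled row is all-zero
F_toy = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

labeled_only = fit_term(F_toy, Y_toy, l)           # Eq. (13.51)
all_rows = fit_term(F_toy, Y_toy, len(F_toy))      # Eq. (13.53)
extra = sum(f * f for f in F_toy[2])               # ||F_3||^2 of the unlabeled row

print(labeled_only, all_rows, extra)
```

Here the two sums differ exactly by `extra`, so in this setting Eq. (13.53) behaves like Eq. (13.51) plus a squared-norm penalty on the unlabeled rows.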
This gives
\begin{align}
L=\,\text{Eq. (13.50)}+\mu \,\text{Eq. (13.53)} \tag{13.54}
\end{align}
Differentiating Eq. (13.50) with respect to the row vector $\boldsymbol{F}_k$, and using $d_k=\sum_{j=1}^{l+u}w_{kj}$, we obtain
\begin{align}
& \quad \frac{\partial \,\text{Eq. (13.50)}}{\partial \boldsymbol{F}_k}\notag \\
& =\frac{1}{2}\left(\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{i\neq k}\sum_{j\neq k}[\cdot]+\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{i= k}\sum_{j\neq k}[\cdot]+\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{i\neq k}\sum_{j= k}[\cdot]+\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{i= k}\sum_{j= k}[\cdot]\right)\notag \\
& =\frac{1}{2}\left(0+\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{i= k}\sum_{j\neq k}[\cdot]+\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{i\neq k}\sum_{j= k}[\cdot]+0\right)\notag \\
& =\frac{1}{2}\left(\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{j\neq k}w_{kj}\bigg\|\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\bigg\|_2^2+\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{i\neq k}w_{ik}\bigg\|\frac{1}{\sqrt{d_i}}\boldsymbol{F}_i-\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k\bigg\|_2^2\right)\notag \\
& =\frac{1}{2}\left(\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{j\neq k}w_{kj}\bigg\|\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\bigg\|_2^2+\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{i\neq k}w_{ki}\bigg\|\frac{1}{\sqrt{d_i}}\boldsymbol{F}_i-\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k\bigg\|_2^2\right)\notag \\
& \quad \quad \quad \text{(by the symmetry of $\mathbf{W}$, $w_{ik}=w_{ki}$)}\notag \\
& =\frac{1}{2}\left(\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{j\neq k}w_{kj}\bigg\|\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\bigg\|_2^2+\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{j\neq k}w_{kj}\bigg\|\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\bigg\|_2^2\right)\notag \\
& \quad \quad \quad \text{(renaming $i$ as $j$ and using $\|\boldsymbol{a}-\boldsymbol{b}\|_2=\|\boldsymbol{b}-\boldsymbol{a}\|_2$)}\notag \\
& =\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{j\neq k}w_{kj}\bigg\|\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\bigg\|_2^2\notag \\
& =\sum_{j\neq k}w_{kj}\frac{\partial }{\partial \boldsymbol{F}_k}\bigg\|\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\bigg\|_2^2\notag \\
& =2\sum_{j\neq k}w_{kj}\left(\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\right)\frac{1}{\sqrt{d_k}}\quad \text{(by [Watermelon Book Appendix Eq. (A.30)])}\notag \\
& =2\sum_{j\neq k}w_{kj}\left(\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\right)\frac{1}{\sqrt{d_k}}+2\sum_{j=k}w_{kj}\left(\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\right)\frac{1}{\sqrt{d_k}}\notag \\
& \quad \quad \quad \text{(the added term has $j=k$, so it is in fact $0$)}\notag \\
& =2\sum_{j=1}^{l+u}w_{kj}\left(\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\right)\frac{1}{\sqrt{d_k}}\notag \\
& =2\sum_{j=1}^{l+u}w_{kj}\frac{1}{\sqrt{d_k}}\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-2\sum_{j=1}^{l+u}w_{kj}\frac{1}{\sqrt{d_j}}\frac{1}{\sqrt{d_k}}\boldsymbol{F}_j\notag \\
& =2d_k\frac{1}{\sqrt{d_k}}\frac{1}{\sqrt{d_k}}\boldsymbol{F}_k-2\sum_{j=1}^{l+u}w_{kj}\frac{1}{\sqrt{d_j}}\frac{1}{\sqrt{d_k}}\boldsymbol{F}_j\quad \text{(since $\textstyle\sum_{j}w_{kj}=d_k$)}\notag \\
& =2\boldsymbol{F}_k-2\sum_{j=1}^{l+u}\frac{1}{\sqrt{d_k}}w_{kj}\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j \tag{13.55}
\end{align}
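The closed form in Eq. (13.55) can be sanity-checked numerically. The sketch below compares it with central finite differences of the energy in Eq. (13.50). It assumes a symmetric $\mathbf{W}$ with positive row sums $d_k$; all function names and the toy matrices are hypothetical.

```python
import math

def energy(W, F):
    """Eq. (13.50), evaluated directly from the double sum."""
    d = [sum(row) for row in W]  # assumption: d_i = sum_j w_ij > 0
    total = 0.0
    for i, Fi in enumerate(F):
        for j, Fj in enumerate(F):
            total += W[i][j] * sum(
                (a / math.sqrt(d[i]) - b / math.sqrt(d[j])) ** 2
                for a, b in zip(Fi, Fj))
    return 0.5 * total

def grad_row(W, F, k):
    """Closed form of Eq. (13.55): 2*F_k - 2*sum_j s_kj*F_j."""
    d = [sum(row) for row in W]
    g = [2.0 * f for f in F[k]]
    for j, Fj in enumerate(F):
        s_kj = W[k][j] / math.sqrt(d[k] * d[j])
        for q, f in enumerate(Fj):
            g[q] -= 2.0 * s_kj * f
    return g

def numeric_grad_row(W, F, k, h=1e-5):
    """Central finite differences of the energy with respect to row k."""
    g = []
    for q in range(len(F[k])):
        plus = [row[:] for row in F]
        minus = [row[:] for row in F]
        plus[k][q] += h
        minus[k][q] -= h
        g.append((energy(W, plus) - energy(W, minus)) / (2.0 * h))
    return g

# Hypothetical symmetric affinity matrix (zero diagonal) and arbitrary F.
W_toy = [[0.0, 2.0, 1.0, 0.0],
         [2.0, 0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0, 3.0],
         [0.0, 1.0, 3.0, 0.0]]
F_toy = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.7, 0.3]]

max_gap = max(abs(a - b)
              for k in range(len(F_toy))
              for a, b in zip(grad_row(W_toy, F_toy, k),
                              numeric_grad_row(W_toy, F_toy, k)))
print(max_gap)  # should be tiny if the closed form is right
```

Because the energy is quadratic in $\mathbf{F}$, central differences carry no truncation error, so any remaining gap is floating-point rounding.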
where $[\cdot]=w_{ij}\big\|\frac{1}{\sqrt{d_i}}\boldsymbol{F}_i-\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j\big\|_2^2$ and $k=1,2,\cdots, l+u$.
Similarly, for Eq. (13.53),
\begin{align}
\frac{\partial \,\text{Eq. (13.53)}}{\partial \boldsymbol{F}_k} & =\frac{\partial }{\partial \boldsymbol{F}_k}\sum_{i=1}^{l+u}\|\boldsymbol{F}_i-\boldsymbol{Y}_i\|_2^2\notag \\
& =\frac{\partial }{\partial \boldsymbol{F}_k}\left(\sum_{i=k}+\sum_{i\neq k}\right)\|\boldsymbol{F}_i-\boldsymbol{Y}_i\|_2^2\notag \\
& =\frac{\partial }{\partial \boldsymbol{F}_k}\|\boldsymbol{F}_k-\boldsymbol{Y}_k\|_2^2+0\notag \\
& =2(\boldsymbol{F}_k-\boldsymbol{Y}_k) \tag{13.56}
\end{align}
Setting $\frac{\partial L}{\partial \boldsymbol{F}_k}=\boldsymbol{0}$, which by Eq. (13.54) means $\text{Eq. (13.55)}+\mu \,\text{Eq. (13.56)}=\boldsymbol{0}$, we get
\begin{align}
2\boldsymbol{F}_k-2\sum_{j=1}^{l+u}\frac{1}{\sqrt{d_k}}w_{kj}\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j +2\mu (\boldsymbol{F}_k-\boldsymbol{Y}_k) & =\boldsymbol{0}\notag \\
(1+\mu )\boldsymbol{F}_k & =\sum_{j=1}^{l+u}\frac{1}{\sqrt{d_k}}w_{kj}\frac{1}{\sqrt{d_j}}\boldsymbol{F}_j+\mu \boldsymbol{Y}_k \tag{13.57}
\end{align}
By Eq. (13.48), Eq. (13.57) can be rewritten elementwise as
\begin{align}
(1+\mu )f^q_k & =\sum_{j=1}^{l+u}\frac{1}{\sqrt{d_k}}w_{kj}\frac{1}{\sqrt{d_j}}f^q_j+\mu y^q_k\notag \\
(1+\mu )f^q_k & =\sum_{j=1}^{l+u}s^k_jf^q_j+\mu y^q_k \tag{13.58}
\end{align}
where $s^k_j=\frac{1}{\sqrt{d_k}}w_{kj}\frac{1}{\sqrt{d_j}}$, with $k=1,2,\cdots, l+u$ and $q=1,2,\cdots, n$.
Then, by Eqs. (A21) and (A22) in [Vectors and Matrices (Learning Some Formulas and Their Derivation Techniques)], Eq. (13.58) can be written in matrix form:
\begin{align}
(1+\mu )\mathbf{F} & =\mathbf{S}\mathbf{F}+\mu \mathbf{Y} \tag{13.59}
\end{align}
where $\mathbf{S}$ is
\begin{align}
\mathbf{S} & =\left([s^k_j]_{kj}\right)\notag \\
& =\left(\left[\frac{1}{\sqrt{d_k}}w_{kj}\frac{1}{\sqrt{d_j}}\right]\right)\notag \\
& =\mathbf{U}_{\mathrm{diag}}\mathbf{W}\,\mathbf{V}_{\mathrm{diag}}\quad \text{(by Eq. (A15) in [Vectors and Matrices (Learning Some Formulas and Their Derivation Techniques)])} \tag{13.60}
\end{align}
Here the diagonal matrices are
\begin{align}
\mathbf{U}_{\mathrm{diag}} & =\mathrm{diag}\left(\frac{1}{\sqrt{d_1}},\frac{1}{\sqrt{d_2}},\cdots,\frac{1}{\sqrt{d_{l+u}}}\right)\notag \\
& =\mathbf{D}^{-\frac{1}{2}} \tag{13.61} \\
\mathbf{V}_{\mathrm{diag}} & =\mathbf{D}^{-\frac{1}{2}} \tag{13.62} \\
\mathbf{D}=\mathbf{D}_{\mathrm{diag}} & =\mathrm{diag}({d_1},{d_2},\cdots,{d_{l+u}}) \tag{13.63}
\end{align}
Now let
\begin{align}
\alpha =\frac{1}{1+\mu} \tag{13.64}
\end{align}
From Eqs. (13.59) to (13.64), dividing Eq. (13.59) by $1+\mu$ and noting that $\frac{\mu}{1+\mu}=1-\frac{1}{1+\mu}=1-\alpha$, we get
\begin{align}
\mathbf{F}=\alpha \mathbf{S}\mathbf{F}+(1-\alpha ) \mathbf{Y} \tag{13.65}
\end{align}
where $\mathbf{S}=\mathbf{D}^{-\frac{1}{2}}\mathbf{W}\mathbf{D}^{-\frac{1}{2}}$.
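Since $\mathbf{D}^{-\frac{1}{2}}$ is diagonal, $\mathbf{S}$ never needs an explicit matrix product: each entry is just $w_{kj}/\sqrt{d_k d_j}$, as in Eq. (13.60). A minimal sketch, assuming $d_k$ is the $k$-th row sum of $\mathbf{W}$ and is positive (the function name and toy matrix are hypothetical):

```python
import math

def normalized_affinity(W):
    """S = D^{-1/2} W D^{-1/2}, computed elementwise as w_kj / sqrt(d_k * d_j)."""
    d = [sum(row) for row in W]  # assumption: d_k = sum_j w_kj > 0
    size = len(W)
    return [[W[k][j] / math.sqrt(d[k] * d[j]) for j in range(size)]
            for k in range(size)]

# Hypothetical symmetric affinity matrix with degrees d = (3, 2, 1).
W_toy = [[0.0, 2.0, 1.0],
         [2.0, 0.0, 0.0],
         [1.0, 0.0, 0.0]]
S_toy = normalized_affinity(W_toy)
for row in S_toy:
    print(row)
```

When $\mathbf{W}$ is symmetric, so is $\mathbf{S}$, because the same factor $1/\sqrt{d_k d_j}$ scales both $w_{kj}$ and $w_{jk}$.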
Turning the equality (13.65) into an iteration gives
\begin{align}
\mathbf{F}^{t+1}=\alpha \mathbf{S}\mathbf{F}^t+(1-\alpha ) \mathbf{Y} \tag{13.66}
\end{align}
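The iteration in Eq. (13.66) is short enough to run by hand on a toy graph. The sketch below is one possible implementation in plain Python, not the book's reference code: it assumes a symmetric $\mathbf{W}$ with positive row sums, starts from $\mathbf{F}^0=\mathbf{Y}$, uses a fixed number of steps instead of a convergence test, and all names and data are hypothetical.

```python
import math

def normalized_affinity(W):
    """S = D^{-1/2} W D^{-1/2}, entry by entry."""
    d = [sum(row) for row in W]  # assumption: positive row sums
    size = len(W)
    return [[W[k][j] / math.sqrt(d[k] * d[j]) for j in range(size)]
            for k in range(size)]

def propagate(W, Y, alpha, steps=200):
    """Iterate F <- alpha*S*F + (1 - alpha)*Y, starting from F = Y."""
    S = normalized_affinity(W)
    m, n = len(Y), len(Y[0])
    F = [row[:] for row in Y]
    for _ in range(steps):
        F = [[alpha * sum(S[k][j] * F[j][q] for j in range(m))
              + (1.0 - alpha) * Y[k][q]
              for q in range(n)]
             for k in range(m)]
    return F

# Hypothetical data: samples 0 and 1 are labeled (classes 0 and 1),
# samples 2 and 3 are unlabeled, so their rows of Y are all-zero.
W_toy = [[0.0, 2.0, 1.0, 0.0],
         [2.0, 0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0, 3.0],
         [0.0, 1.0, 3.0, 0.0]]
Y_toy = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]

F_star = propagate(W_toy, Y_toy, alpha=0.5)
labels = [row.index(max(row)) for row in F_star]
print(labels)  # prints [0, 1, 0, 1]
```

In this toy graph sample 2 is directly linked to labeled sample 0 and sample 3 to labeled sample 1, so each unlabeled sample ends up with the class of its labeled neighbor. With $0<\alpha<1$ the update shrinks the error at every step, which is why a fixed step count is enough here.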
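Eq. (13.66) is exactly [Watermelon Book Eq. (13.19)]. This shows that the iteration can also be derived from the regularization framework (an alternative route). In other words, we have shown that the optimal solution of [Watermelon Book Eq. (13.21)], in the adjusted form of Eq. (13.54), is exactly the solution $\mathbf{F}^*$ of the algorithm in [Watermelon Book Fig. 13.5], where $\mu$ and $\alpha$ are related by Eq. (13.64): $\mu=\frac{1-\alpha}{\alpha}$.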
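This correspondence can be probed numerically: run the iteration with some $\alpha$, set $\mu=\frac{1-\alpha}{\alpha}$, and check that random perturbations of the converged $\mathbf{F}^*$ only increase $L$ in Eq. (13.54). The sketch below does this on hypothetical toy data; it is a sanity check under the same assumptions as before (symmetric $\mathbf{W}$, positive row sums, all-zero rows of $\mathbf{Y}$ for unlabeled samples), not a proof.

```python
import math
import random

def normalized_affinity(W):
    d = [sum(row) for row in W]  # assumption: positive row sums
    size = len(W)
    return [[W[k][j] / math.sqrt(d[k] * d[j]) for j in range(size)]
            for k in range(size)]

def propagate(W, Y, alpha, steps=200):
    """Eq. (13.66): F <- alpha*S*F + (1 - alpha)*Y, starting from F = Y."""
    S = normalized_affinity(W)
    m, n = len(Y), len(Y[0])
    F = [row[:] for row in Y]
    for _ in range(steps):
        F = [[alpha * sum(S[k][j] * F[j][q] for j in range(m))
              + (1.0 - alpha) * Y[k][q] for q in range(n)]
             for k in range(m)]
    return F

def objective(W, F, Y, mu):
    """L of Eq. (13.54): energy (13.50) plus mu times the fit term (13.53)."""
    d = [sum(row) for row in W]
    smooth = 0.0
    for i, Fi in enumerate(F):
        for j, Fj in enumerate(F):
            smooth += W[i][j] * sum(
                (a / math.sqrt(d[i]) - b / math.sqrt(d[j])) ** 2
                for a, b in zip(Fi, Fj))
    fit = sum((f - y) ** 2 for Fi, Yi in zip(F, Y) for f, y in zip(Fi, Yi))
    return 0.5 * smooth + mu * fit

# Hypothetical toy problem: two labeled and two unlabeled samples.
W_toy = [[0.0, 2.0, 1.0, 0.0],
         [2.0, 0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0, 3.0],
         [0.0, 1.0, 3.0, 0.0]]
Y_toy = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]

alpha = 0.5
mu = (1.0 - alpha) / alpha          # Eq. (13.64) solved for mu
F_star = propagate(W_toy, Y_toy, alpha)
L_star = objective(W_toy, F_star, Y_toy, mu)

rng = random.Random(0)              # fixed seed so the check is repeatable
worse = []
for _ in range(20):
    F_pert = [[f + rng.uniform(-0.1, 0.1) for f in row] for row in F_star]
    worse.append(objective(W_toy, F_pert, Y_toy, mu) > L_star)
print(all(worse))
```

Every perturbed matrix should score a larger $L$ than $\mathbf{F}^*$, which is what one expects if the fixed point of the iteration is the minimizer of Eq. (13.54).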