word2vec comes in two flavors, CBOW and Skip-Gram. The derivations for how the parameters of both models are learned are explained in detail in "word2vec Parameter Learning Explained". While reading Section 1.1 (One-word context), I was puzzled by the derivation of Equation (8) and spent some time on it. The original text reads:
"Let us now derive the update equation of the weights between hidden and output layers. Take the derivative of $E$ with regard to the $j$-th unit's net input $u_j$, we obtain

$$\frac{\partial E}{\partial u_j} = y_j - t_j := e_j$$

where $t_j = \mathbb{1}(j = j^*)$, i.e., $t_j$ will only be 1 when the $j$-th unit is the output word, otherwise $t_j = 0$."
At first I did not understand how this step was reached, but the process turns out to be straightforward:
$$\begin{aligned} E &= \log\sum_{j'=1}^{V}\exp(u_{j'}) - u_{j^*} \\ e_j = \frac{\partial E}{\partial u_j} &= \frac{\exp(u_j)}{\sum_{j'=1}^{V}\exp(u_{j'})} - \frac{\partial u_{j^*}}{\partial u_j} \\ &= y_j - \mathbb{1}(j = j^*) \\ &= y_j - t_j \end{aligned}$$

The key step is the derivative of the second term: $\partial u_{j^*} / \partial u_j$ is 1 when $j = j^*$ and 0 otherwise, which is exactly the indicator $\mathbb{1}(j = j^*) = t_j$, while the first term is just the softmax output $y_j$.
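The gradient $e_j = y_j - t_j$ can be verified numerically. Below is a minimal sketch (not from the paper; the variable names and toy values are my own) that compares the analytic gradient against a finite-difference estimate of $E = \log\sum_{j'}\exp(u_{j'}) - u_{j^*}$:

```python
import math

def loss(u, j_star):
    # E = log(sum_{j'} exp(u_{j'})) - u_{j*}
    return math.log(sum(math.exp(x) for x in u)) - u[j_star]

u = [0.5, -1.2, 0.3, 2.0]  # toy net inputs (assumed values)
j_star = 3                 # index of the actual output word

# Analytic gradient: e_j = y_j - t_j, where y = softmax(u)
Z = sum(math.exp(x) for x in u)
analytic = [math.exp(x) / Z - (1.0 if j == j_star else 0.0)
            for j, x in enumerate(u)]

# Central finite-difference check of each component
eps = 1e-6
for j in range(len(u)):
    up, dn = u[:], u[:]
    up[j] += eps
    dn[j] -= eps
    numeric = (loss(up, j_star) - loss(dn, j_star)) / (2 * eps)
    assert abs(numeric - analytic[j]) < 1e-6
```

Since the softmax outputs sum to 1 and $t$ is one-hot, the components of $e$ also sum to 0, and $e_{j^*} = y_{j^*} - 1$ is always negative.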
Paper notes: derivation of the One-word context formula for the CBOW model in Section 1.1 of "word2vec Parameter Learning Explained"