How to understand a batch after embedding, using stock data analysis as an example: [b, word num, word vec] means there are b sets of stock data (price curves); on each curve we sample word num points, and the data at each point is a word vec. [word num, b, word vec] means that at each sampled point we take the values from all b sets, so each curve contributes its data (a word vec) at that point.
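A minimal sketch of converting between the two layouts with tf.transpose (the concrete dimension values here are illustrative assumptions):

```python
import tensorflow as tf

b, word_num, word_vec = 4, 80, 100             # assumed toy dimensions
x = tf.random.normal([b, word_num, word_vec])  # [b, word num, word vec]: b curves, word_num samples each

x_time_major = tf.transpose(x, perm=[1, 0, 2])  # [word num, b, word vec]: per sample point, b curves
print(x_time_major.shape)                       # (80, 4, 100)
```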
import tensorflow as tf
from tensorflow.keras import layers

x = tf.range(5)           # indices of 5 words
x = tf.random.shuffle(x)
net = layers.Embedding(10, 4)  # 10 is the vocabulary size (10 words); 4 means each word is represented as a 1x4 vector
out = net(x)              # calling the layer builds its weights; out has shape [5, 4]
a = net.trainable
print(a)                  # True
b = net.trainable_variables
print(b)                  # the [10, 4] embedding table
Derivation of the RNN formulas: x is the input, y is the output, and h is the output after each word passes through a fully connected layer (or another layer).
$$h_t = f_W(h_{t-1}, x_t)$$
$W_{hh}$ is the weight matrix between the outputs (hidden states) at consecutive steps; $W_{xh}$ is the weight matrix between the input and the output (hidden state).
$$h_t = \tanh(W_{hh}h_{t-1} + W_{xh}x_t)$$
$$y_t = W_{hy}h_t + b$$
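The two update equations can be written out directly as matrix products; a minimal single-step sketch (all dimension values, and the square output weight $W_{hy}$, are assumptions for illustration):

```python
import tensorflow as tf

batch, feature_len, hidden_len = 4, 100, 64  # assumed toy dimensions

x_t = tf.random.normal([batch, feature_len])       # input at step t
h_prev = tf.zeros([batch, hidden_len])             # h_{t-1}
W_xh = tf.random.normal([feature_len, hidden_len])
W_hh = tf.random.normal([hidden_len, hidden_len])
W_hy = tf.random.normal([hidden_len, hidden_len])  # output weight (shape assumed)
b = tf.zeros([hidden_len])

h_t = tf.tanh(x_t @ W_xh + h_prev @ W_hh)  # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
y_t = h_t @ W_hy + b                       # y_t = W_hy h_t + b
```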
Taking the derivative, where $E$ is the loss function:
$$\frac{\partial E_t}{\partial W_{hh}} = \sum_{i=0}^{t}\frac{\partial E_t}{\partial y_t}\frac{\partial y_t}{\partial h_t}\frac{\partial h_t}{\partial h_i}\frac{\partial h_i}{\partial W_{hh}}$$
$$\frac{\partial h_k}{\partial h_1} = \prod_{i}^{k}\mathrm{diag}\left(f'(W_{xh}x_i + W_{hh}h_{i-1})\right)W_{hh}$$
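In practice this summed chain rule need not be expanded by hand: tf.GradientTape unrolls the recurrence and accumulates $\frac{\partial E_t}{\partial W_{hh}}$ over all paths back through time. A minimal sketch with an assumed stand-in loss:

```python
import tensorflow as tf

batch, feature_len, hidden_len, steps = 4, 100, 64, 10  # assumed toy dimensions

W_xh = tf.Variable(tf.random.normal([feature_len, hidden_len], stddev=0.1))
W_hh = tf.Variable(tf.random.normal([hidden_len, hidden_len], stddev=0.1))
xs = tf.random.normal([steps, batch, feature_len])      # time-major inputs

with tf.GradientTape() as tape:
    h = tf.zeros([batch, hidden_len])
    for t in range(steps):                    # unroll h_t = tanh(W_xh x_t + W_hh h_{t-1})
        h = tf.tanh(xs[t] @ W_xh + h @ W_hh)
    E = tf.reduce_mean(h ** 2)                # stand-in loss E_t (assumption)
grads = tape.gradient(E, [W_xh, W_hh])        # sums contributions from every time step
print([g.shape for g in grads])               # [(100, 64), (64, 64)]
```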
3. RNN implementation in TensorFlow
$W_{hh} @ h_{t-1} + W_{xh} @ x_t$ can be understood in code as

$$[batch,\ feature\ len] @ [feature\ len,\ hidden\ len] + [batch,\ hidden\ len] @ [hidden\ len,\ hidden\ len]$$

(the first term is $x_t @ W_{xh}$ and the second is $h_{t-1} @ W_{hh}$), where feature len is the length of one "word"'s feature vector and hidden len is the dimensionality of the output. In the printed variables, kernel refers to $W_{xh}$ and recurrent_kernel refers to $W_{hh}$.
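The kernel/recurrent_kernel naming can be checked directly with layers.SimpleRNNCell (dimension values are again assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

cell = layers.SimpleRNNCell(64)        # hidden len = 64
cell.build(input_shape=(None, 100))    # feature len = 100
for v in cell.trainable_variables:
    print(v.name, v.shape)
# kernel:0            (100, 64)  -> W_xh
# recurrent_kernel:0  (64, 64)   -> W_hh
# bias:0              (64,)
```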