SimpleRNN `__init__` signature

```python
__init__(
    units,
    activation='tanh',
    use_bias=True,
    kernel_initializer='glorot_uniform',
    recurrent_initializer='orthogonal',
    bias_initializer='zeros',
    kernel_regularizer=None,
    recurrent_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    recurrent_constraint=None,
    bias_constraint=None,
    dropout=0.0,
    recurrent_dropout=0.0,
    return_sequences=False,
    return_state=False,
    go_backwards=False,
    stateful=False,
    unroll=False,
    **kwargs
)
```
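A minimal usage sketch of the layer with its default arguments (assuming TensorFlow 2.x is installed):

```python
import numpy as np
import tensorflow as tf

# Batch of 4 sequences, 10 time steps each, 8 features per step.
x = np.random.random((4, 10, 8)).astype("float32")

# Defaults: tanh activation, glorot_uniform kernel, orthogonal recurrent kernel.
layer = tf.keras.layers.SimpleRNN(units=32)

y = layer(x)
print(y.shape)  # (4, 32): by default only the last time step's output is returned
```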
Initialization parameters
Parameter | Description |
---|---|
units | Positive integer, dimensionality of the output space. |
activation | Activation function to use. Default: hyperbolic tangent (tanh). If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x). |
use_bias | Boolean, whether the layer uses a bias vector. |
kernel_initializer | Initializer for the kernel weights matrix, used for the linear transformation of the inputs. |
recurrent_initializer | Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state. |
bias_initializer | Initializer for the bias vector. |
kernel_regularizer | Regularizer function applied to the kernel weights matrix. |
recurrent_regularizer | Regularizer function applied to the recurrent_kernel weights matrix. |
bias_regularizer | Regularizer function applied to the bias vector. |
activity_regularizer | Regularizer function applied to the output of the layer (its "activation"). |
kernel_constraint | Constraint function applied to the kernel weights matrix. |
recurrent_constraint | Constraint function applied to the recurrent_kernel weights matrix. |
bias_constraint | Constraint function applied to the bias vector. |
dropout | Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs. |
recurrent_dropout | Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state. |
return_sequences | Boolean. Whether to return the last output in the output sequence, or the full sequence. |
return_state | Boolean. Whether to return the last state in addition to the output. |
go_backwards | Boolean (default False). If True, process the input sequence backwards and return the reversed sequence. |
stateful | Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch. |
unroll | Boolean (default False). If True, the network will be unrolled, else a symbolic loop will be used. Unrolling can speed-up a RNN, although it tends to be more memory-intensive. Unrolling is only suitable for short sequences. |
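The shape-related flags in the table above can be illustrated with a short sketch (assuming TensorFlow 2.x):

```python
import numpy as np
import tensorflow as tf

x = np.random.random((4, 10, 8)).astype("float32")

# return_sequences=True: one output per time step -> (samples, timesteps, units)
seq_layer = tf.keras.layers.SimpleRNN(32, return_sequences=True)
seq_out = seq_layer(x)
print(seq_out.shape)  # (4, 10, 32)

# return_state=True: the final hidden state is returned alongside the output;
# for SimpleRNN the last output and the last state hold the same values
state_layer = tf.keras.layers.SimpleRNN(32, return_state=True)
out, state = state_layer(x)
print(out.shape, state.shape)  # (4, 32) (4, 32)
```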
Call arguments
Argument | Description |
---|---|
inputs | A 3D tensor. |
mask | Binary tensor of shape (samples, timesteps) indicating whether a given timestep should be masked. |
training | Python boolean indicating whether the layer should behave in training mode or in inference mode. This argument is passed to the cell when calling it. This is only relevant if dropout or recurrent_dropout is used. |
initial_state | List of initial state tensors to be passed to the first call of the cell. |
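These call arguments can be exercised directly. A sketch, assuming TensorFlow 2.x; the mask here fabricates two padded time steps in the second sample:

```python
import numpy as np
import tensorflow as tf

x = np.random.random((2, 5, 8)).astype("float32")
layer = tf.keras.layers.SimpleRNN(16, dropout=0.2)

# initial_state: start from a chosen hidden state instead of zeros
init = tf.ones((2, 16))
y = layer(x, initial_state=[init], training=False)

# mask: treat the last two time steps of the second sample as padding
mask = tf.constant([[True, True, True, True, True],
                    [True, True, True, False, False]])
y_masked = layer(x, mask=mask, training=False)
print(y.shape, y_masked.shape)  # (2, 16) (2, 16)
```

With training=False the dropout is disabled, so both calls are deterministic.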
Theory
$W_{aa}$ is the recurrent weight matrix applied to the previous time step's hidden state. Its shape is $(units \times units)$: the input is the previous step's output of dimension units, and the output is this step's units hidden units.
$W_{ax}$ is the weight matrix for the current time step's input features (the same role as in a fully connected layer). Its shape is $(features \times units)$, where features is the input dimension coming from the previous layer (for example, the word-vector dimension of one word in a sentence) and units is the number of hidden units in this layer.
$b_a$ is the bias vector, with shape $(units)$: there is one bias per hidden unit.
So the total parameter count is (input weights + previous-time-step weights + biases):
$units \times features + units \times units + units$
This can also be written compactly, matching the recurrence $W_a[a^{<t-1>}, x^{<t>}] + b_a$:
$units \times (features + units) + units$
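The count above can be checked against Keras directly. A sketch, assuming TensorFlow 2.x; with units=32 and features=8 the formula gives 32*(8+32)+32 = 1312:

```python
import numpy as np
import tensorflow as tf

units, features = 32, 8
layer = tf.keras.layers.SimpleRNN(units)
layer(np.zeros((1, 5, features), dtype="float32"))  # build the weights

# kernel corresponds to W_ax, recurrent_kernel to W_aa, bias to b_a
kernel, recurrent_kernel, bias = layer.get_weights()
print(kernel.shape, recurrent_kernel.shape, bias.shape)  # (8, 32) (32, 32) (32,)

expected = units * (features + units) + units
print(layer.count_params(), expected)  # 1312 1312
```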
References:
https://tensorflow.google.cn/versions/r2.0/api_docs/python/tf/keras/layers/SimpleRNN
https://stackoverflow.com/questions/50134334/number-of-parameters-for-keras-simplernn
https://bbs.pinggu.org/thread-6874907-1-1.html