TensorFlow tf.keras.layers.GRU

init

__init__(
    units,
    activation='tanh',
    recurrent_activation='sigmoid',
    use_bias=True,
    kernel_initializer='glorot_uniform',
    recurrent_initializer='orthogonal',
    bias_initializer='zeros',
    kernel_regularizer=None,
    recurrent_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    recurrent_constraint=None,
    bias_constraint=None,
    dropout=0.0,
    recurrent_dropout=0.0,
    implementation=2,
    return_sequences=False,
    return_state=False,
    go_backwards=False,
    stateful=False,
    unroll=False,
    time_major=False,
    reset_after=True,
    **kwargs
)

Arguments

units: Positive integer, dimensionality of the output space.
activation: Activation function to use. Default: hyperbolic tangent (tanh). If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
recurrent_activation: Activation function to use for the recurrent step. Default: sigmoid. If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
use_bias: Boolean, whether the layer uses a bias vector.
kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs.
recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state.
bias_initializer: Initializer for the bias vector.
kernel_regularizer: Regularizer function applied to the kernel weights matrix.
recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights matrix.
bias_regularizer: Regularizer function applied to the bias vector.
activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
kernel_constraint: Constraint function applied to the kernel weights matrix.
recurrent_constraint: Constraint function applied to the recurrent_kernel weights matrix.
bias_constraint: Constraint function applied to the bias vector.
dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs.
recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state.
implementation: Implementation mode, either 1 or 2. Mode 1 will structure its operations as a larger number of smaller dot products and additions, whereas mode 2 will batch them into fewer, larger operations. These modes will have different performance profiles on different hardware and for different applications.
return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence (see the example after this list).
return_state: Boolean. Whether to return the last state in addition to the output.
go_backwards: Boolean (default False). If True, process the input sequence backwards and return the reversed sequence.
stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
unroll: Boolean (default False). If True, the network will be unrolled, else a symbolic loop will be used. Unrolling can speed up an RNN, although it tends to be more memory-intensive. Unrolling is only suitable for short sequences.
reset_after: GRU convention (whether to apply the reset gate after or before the matrix multiplication). False = "before", True = "after" (default and cuDNN compatible).
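
A minimal usage sketch (not part of the original post) showing how return_sequences and return_state change the returned shapes; the batch size, timesteps, and feature dimension below are arbitrary:

import numpy as np
import tensorflow as tf

# Toy input: (batch, timesteps, features)
x = np.random.random((32, 10, 8)).astype("float32")

# Default: only the last output is returned -> shape (batch, units)
print(tf.keras.layers.GRU(4)(x).shape)                         # (32, 4)

# return_sequences=True: the whole sequence is returned -> (batch, timesteps, units)
print(tf.keras.layers.GRU(4, return_sequences=True)(x).shape)  # (32, 10, 4)

# return_state=True: the final hidden state is returned as well
gru = tf.keras.layers.GRU(4, return_sequences=True, return_state=True)
whole_sequence, final_state = gru(x)
print(whole_sequence.shape, final_state.shape)                 # (32, 10, 4) (32, 4)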

Theory

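The two figures in the original post (not reproduced here) showed the GRU cell and its equations. A standard form of those equations, written in the same notation as the discussion below (the $\hat{c}$, $\Gamma_u$, $\Gamma_r$ convention from Andrew Ng's course; this is my reconstruction, not the original figures), is:

$\Gamma_u = \sigma(W_u[c^{<t-1>}, x^{<t>}] + b_u)$
$\Gamma_r = \sigma(W_r[c^{<t-1>}, x^{<t>}] + b_r)$
$\hat{c}^{<t>} = \tanh(W_c[\Gamma_r * c^{<t-1>}, x^{<t>}] + b_c)$
$c^{<t>} = \Gamma_u * \hat{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>}$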

$\hat{c}^{<t>}$ is the candidate memory state; its weights contain $units \times features + units \times units + bias$ parameters.

$\Gamma_u$ is the update gate. The $\sigma$ in its formula is the sigmoid function, which pushes $\Gamma_u$ towards 0 or 1. When $\Gamma_u$ is 0, $c^{<t>} = c^{<t-1>}$: the state is not updated and the previous step is remembered; otherwise it is updated.
$\Gamma_u$'s weights contain $units \times features + units \times units + bias$ parameters.

$\Gamma_r$ is the reset ("remember") gate. It controls how much of the previous state is carried into the current candidate state: when $\Gamma_r$ is 1, most of the previous information is carried over; the smaller it is, the more the previous state is ignored.
$\Gamma_r$'s weights contain $units \times features + units \times units + bias$ parameters.

So, since each bias has $units$ entries, the total number of parameters of the GRU layer is $(units \times features + units \times units + units) \times 3$.

Note:
In TensorFlow 2.x the default is reset_after=True, which uses separate biases for the input and recurrent kernels, so the total parameter count becomes $(units \times features + units \times units + units + units) \times 3$: the input-kernel bias and the recurrent-kernel bias are counted separately.
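
A quick sketch (mine, not the author's) that checks both counting formulas with layer.count_params(); units = 4, features = 3, and timesteps = 5 are arbitrary illustrative values:

import tensorflow as tf

units, features, timesteps = 4, 3, 5

for reset_after in (False, True):
    layer = tf.keras.layers.GRU(units, reset_after=reset_after)
    layer.build(input_shape=(None, timesteps, features))
    # reset_after=False: 3 * (units*features + units*units + units)         = 3 * (12 + 16 + 4)     = 96
    # reset_after=True:  3 * (units*features + units*units + units + units) = 3 * (12 + 16 + 4 + 4) = 108
    print(reset_after, layer.count_params())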

References:
The official tf.keras.layers.GRU documentation
https://www.imooc.com/article/36743
https://stackoverflow.com/questions/57318930/calculating-the-number-of-parameters-of-a-gru-layer-keras
