RNNCell Source Code Analysis

小智rando

Category: NLP  Tags: RNN

Article link: https://blog.csdn.net/qq_38016957/article/details/98472506

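In TensorFlow, the RNN-related source code falls into two groups. The first group consists of classes that implement the logic of a basic cell; they all inherit from RNNCell and include BasicRNNCell, BasicLSTMCell, GRUCell, and others. The second group consists of the loop-control routines that run a cell along the time axis, including tf.nn.dynamic_rnn for dynamic unidirectional RNNs and tf.nn.bidirectional_dynamic_rnn for dynamic bidirectional RNNs.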

1. BasicRNNCell

BasicRNNCell is the most basic RNN unit.

The other RNN cells differ from it mainly in their __init__ and __call__ methods.
Most RNN functions work as follows:

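def rnn(inputs, state, hidden_size, rnn_steps):
    outputs = []
    cell = RNNCell(hidden_size)
    for i in range(rnn_steps):
        # feed the input of time step i together with the previous state
        output, state = cell(inputs[i], state)
        outputs.append(output)  # collect the output of every step
    return outputs, state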

1) Initialize an empty list
2) Create an RNNCell instance
3) Run the RNNCell for rnn_steps steps
4) Collect all the intermediate outputs in the list
5) Return the list of all outputs and the last state
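
The five steps can be tried out without TensorFlow. The following is a minimal sketch, not the TensorFlow implementation: ToyRNNCell and run_rnn are hypothetical names, and the cell uses scalar weights with arbitrary values purely for illustration.

```python
import math


class ToyRNNCell:
    """Hypothetical stand-in for an RNNCell: scalar weights, tanh activation."""

    def __init__(self, w=0.5, u=0.5, b=0.0):
        self.w, self.u, self.b = w, u, b

    def __call__(self, x, state):
        # output = new_state = tanh(w * x + u * state + b)
        output = math.tanh(self.w * x + self.u * state + self.b)
        return output, output


def run_rnn(cell, inputs, initial_state):
    outputs = []                         # 1) empty list
    state = initial_state
    for x in inputs:                     # 3) one call per time step
        output, state = cell(x, state)
        outputs.append(output)           # 4) collect every intermediate output
    return outputs, state                # 5) all outputs plus the last state


cell = ToyRNNCell()                      # 2) create the cell
outputs, final_state = run_rnn(cell, [1.0, 2.0, 3.0], 0.0)
print(len(outputs), final_state == outputs[-1])  # 3 True
```

For this kind of cell the final state equals the last output, which is why the loop can return both from the same variable.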

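Let's look at how BasicRNNCell works internally.
Initialization:

num_units: the number of units (neurons) in the cell
activation: the activation function, tanh by default
reuse: whether the cell may reuse variables that already exist in its scope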

class BasicRNNCell(RNNCell):
  """The most basic RNN cell."""

  def __init__(self, num_units, input_size=None, activation=tf.nn.tanh, reuse=None):
    if input_size is not None:
      logging.warn("%s: The input_size parameter is deprecated.", self)
    self._num_units = num_units
    self._activation = activation
    self._reuse = reuse

  @property
  def state_size(self):
    return self._num_units

  @property
  def output_size(self):
    return self._num_units

  def __call__(self, inputs, state, scope=None):
    """Most basic RNN: output = new_state = act(W * input + U * state + B)."""
    with _checked_scope(self, scope or "basic_rnn_cell", reuse=self._reuse):
      output = self._activation(
          _linear([inputs, state], self._num_units, True))
    return output, output

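__call__ invokes _linear, a linear-mapping function, and then passes the result through the activation function _activation.
So output is computed as:
output = new_state = act(W * input + U * state + B)

Internally, _linear creates the weight variable, plus a bias variable if its bias argument is True.

It maps the second dimension of the input to output_size.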
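
To make the shapes concrete, here is a minimal sketch of one BasicRNNCell step in plain Python lists. It is an illustration under assumptions, not the library code: matmul and basic_rnn_step are hypothetical helpers, and the weight values are arbitrary.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def basic_rnn_step(inputs, state, weights, biases):
    # Concatenate inputs and state along the feature axis:
    # [batch, input_size + num_units].
    concat = [x_row + h_row for x_row, h_row in zip(inputs, state)]
    # Linear map to [batch, num_units], then bias and tanh.
    linear = matmul(concat, weights)
    output = [[math.tanh(v + b) for v, b in zip(row, biases)]
              for row in linear]
    return output, output  # output and new state are the same tensor


inputs = [[1.0, 2.0]]                     # batch of 1, input_size = 2
state = [[0.0, 0.0, 0.0]]                 # num_units = 3
weights = [[0.1] * 3 for _ in range(5)]   # shape [2 + 3, 3], arbitrary values
biases = [0.0, 0.0, 0.0]
output, new_state = basic_rnn_step(inputs, state, weights, biases)
```

The single weight matrix has input_size + num_units rows, so it plays the role of W and U stacked on top of each other.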
Let's look at how _linear works internally:

Args:
  args: a 2D Tensor or a list of 2D, batch x n, Tensors.
  output_size: int, second dimension of W[i].
  bias: boolean, whether to add a bias term or not.
  bias_start: starting value to initialize the bias; 0 by default.
Returns:
  A 2D Tensor with shape [batch x output_size] equal to
  sum_i(args[i] * W[i]), where W[i]s are newly created matrices.

import tensorflow as tf  # TensorFlow 1.x API
from tensorflow.python.util import nest

_BIAS_VARIABLE_NAME = "biases"
_WEIGHTS_VARIABLE_NAME = "weights"

def _linear(args, output_size, bias, bias_start=0.0):
  if args is None or (nest.is_sequence(args) and not args):
    raise ValueError("`args` must be specified")
  if not nest.is_sequence(args):
    args = [args]

  # Calculate the total size of arguments on dimension 1.
  total_arg_size = 0
  shapes = [a.get_shape() for a in args]
  for shape in shapes:
    if shape.ndims != 2:
      raise ValueError("linear is expecting 2D arguments: %s" % shapes)
    if shape[1].value is None:
      raise ValueError("linear expects shape[1] to be provided for shape %s, "
                       "but saw %s" % (shape, shape[1]))
    else:
      total_arg_size += shape[1].value

  dtype = [a.dtype for a in args][0]

  # Now the computation.
  scope = tf.get_variable_scope()
  with tf.variable_scope(scope) as outer_scope:
    weights = tf.get_variable(
        _WEIGHTS_VARIABLE_NAME, [total_arg_size, output_size], dtype=dtype)
    if len(args) == 1:
      res = tf.matmul(args[0], weights)
    else:
      res = tf.matmul(tf.concat(args, 1), weights)
    if not bias:
      return res
    with tf.variable_scope(outer_scope) as inner_scope:
      inner_scope.set_partitioner(None)
      biases = tf.get_variable(
          _BIAS_VARIABLE_NAME, [output_size],
          dtype=dtype,
          initializer=tf.constant_initializer(bias_start, dtype=dtype))
    return tf.nn.bias_add(res, biases)

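When the weight matrix is created it is given the name "weights" (the bias is named "biases"); this is the matrix that is multiplied with the input.
The names come from the module-level constants _BIAS_VARIABLE_NAME = "biases" and _WEIGHTS_VARIABLE_NAME = "weights".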
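A simple _linear example:
*This also explains the computation W * input + U * state + B: concatenating input with state and W with U, and then multiplying once, gives the same result as the two separate products.
First define the inputs x and y for _linear: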

tf.reset_default_graph()
x = tf.constant([[1,2,3,4,5,6],
                 [2,4,6,8,10,12],
                 [1,2,3,4,5,6]], dtype=tf.float32)

y = tf.constant([[1,2,3,4,5,6],
                 [2,4,6,8,10,12],
                 [1,2,3,4,5,6]], dtype=tf.float32)
                 
w = tf.get_variable('weights',
                    initializer=tf.ones(shape=[12 ,4]),
                    dtype=tf.float32)
scope = tf.get_variable_scope()
scope.reuse_variables()

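Pass the data to the function:
*Because reuse_variables() was called on the scope, _linear picks up the existing variable named 'weights' and multiplies the concatenated inputs by it.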

x_mapped = _linear(
    args=[x,y],
    output_size=4,
    bias=False)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
x_mapped.eval()

Output:

array([[42., 42., 42., 42.],
       [84., 84., 84., 84.],
       [42., 42., 42., 42.]], dtype=float32)
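
The same numbers can be reproduced without TensorFlow, which also checks the claim that one multiplication of the concatenated input by the stacked weights equals the sum of the two separate products. This is a sketch under assumptions: matmul is a hypothetical helper, and w_x and w_y stand for the top and bottom halves of the all-ones 'weights' matrix.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


x = [[1, 2, 3, 4, 5, 6],
     [2, 4, 6, 8, 10, 12],
     [1, 2, 3, 4, 5, 6]]
y = [row[:] for row in x]              # same values as x, as in the example

w_x = [[1.0] * 4 for _ in range(6)]    # plays the role of W, shape [6, 4]
w_y = [[1.0] * 4 for _ in range(6)]    # plays the role of U, shape [6, 4]
weights = w_x + w_y                    # stacked matrix, shape [12, 4]

# One multiplication on the concatenated input ...
concat = [x_row + y_row for x_row, y_row in zip(x, y)]
combined = matmul(concat, weights)

# ... versus two separate multiplications that are added afterwards.
separate = [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(matmul(x, w_x), matmul(y, w_y))]

print(combined == separate)   # True
print(combined[0])            # [42.0, 42.0, 42.0, 42.0]
```

Each row of x sums to 21 (or 42 for the middle row), so with all-ones weights the concatenated row sums to 42 (or 84), matching the TensorFlow output.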

2. BasicLSTMCell

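Initialization:

num_units: the number of units in the LSTM cell (the hidden size)
activation: the activation function, tanh by default
forget_bias: a bias added to the forget gate; a positive value reduces how much is forgotten at the start of training
state_is_tuple: controls the format of the state; True is the usual choice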

  def call(self, inputs, state):
    sigmoid = math_ops.sigmoid
    # Parameters of gates are concatenated into one multiply for efficiency.
    if self._state_is_tuple:
      c, h = state
    else:
      c, h = array_ops.split(value=state, num_or_size_splits=2, axis=1)
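    # This is where state_is_tuple matters. If it is True, the state must be
    # passed in as a (c, h) tuple. If it is False, the state must be a single
    # Tensor with c and h concatenated along axis 1. True is recommended
    # because it avoids the cost of the split.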

    concat = _linear([inputs, h], 4 * self._num_units, True)
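    # inputs and the previous output h are concatenated and passed through one
    # linear map, see formula (1.1). The output is 4 * num_units wide so that it
    # can be split directly into i, j, f, o (j is the g of the formula, the
    # new candidate input).
    # _linear() is a function in rnn_cell_impl.py that performs the linear map.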

    i, j, f, o = array_ops.split(value=concat, num_or_size_splits=4, axis=1)
    # Split into the four gate pre-activations.
    new_c = (
        c * sigmoid(f + self._forget_bias) + sigmoid(i) * self._activation(j))
    new_h = self._activation(new_c) * sigmoid(o)
    # Core computation: update the state to obtain new_c and new_h.

    if self._state_is_tuple:
      new_state = LSTMStateTuple(new_c, new_h)
    else:
      new_state = array_ops.concat([new_c, new_h], 1)
    return new_h, new_state

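In the call to _linear the output size is 4 * num_units. It is set to four times num_units because, in the LSTM formulas, the forget gate, the input gate, the output gate, and the candidate state all start from the same linear map of the concatenated [inputs, h]. The four computations can therefore be done together and split apart afterwards.
In BasicLSTMCell's call method, this combined tensor (named concat in the listing above; the name gate_inputs appears to refer to the same tensor) holds all four computations. array_ops.split then separates it into the input gate, the candidate state, the forget gate, and the output gate. The new state vector and output vector are computed from these according to the formulas, and state_is_tuple decides whether the two vectors are wrapped in an LSTMStateTuple or concatenated before being returned.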
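
The split-and-update part of the call can be sketched for a single example in plain Python. This is an illustration under assumptions rather than the library code: lstm_update is a hypothetical helper that takes the already computed 4 * num_units pre-activations, and the input values are arbitrary.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_update(concat, c, forget_bias=1.0):
    """concat: 4 * num_units pre-activations for one example, ordered i, j, f, o."""
    n = len(c)
    # Split the combined vector into four chunks of num_units each.
    i, j, f, o = (concat[k * n:(k + 1) * n] for k in range(4))
    # new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * tanh(j)
    new_c = [c_k * sigmoid(f_k + forget_bias) + sigmoid(i_k) * math.tanh(j_k)
             for c_k, i_k, j_k, f_k in zip(c, i, j, f)]
    # new_h = tanh(new_c) * sigmoid(o)
    new_h = [math.tanh(c_k) * sigmoid(o_k) for c_k, o_k in zip(new_c, o)]
    return new_h, (new_c, new_h)   # tuple form, as with state_is_tuple=True


concat = [0.0] * 8        # num_units = 2, all pre-activations set to zero
c = [1.0, -1.0]
new_h, new_state = lstm_update(concat, c)
new_c, _ = new_state
```

With all pre-activations at zero the candidate term vanishes, so new_c is simply c scaled by sigmoid(forget_bias), which shows how a positive forget_bias keeps most of the old cell state.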

3. GRUCell

In GRUCell, both the state vector and the output vector have size num_units.

def __call__(self, inputs, state, scope=None):
  """Gated recurrent unit (GRU) with nunits cells."""
  with _checked_scope(self, scope or "gru_cell", reuse=self._reuse):
    with vs.variable_scope("gates"):  # Reset gate and update gate.
      # We start with bias of 1.0 to not reset and not update.
      # Compute the values of both gates in one pass.
      value = sigmoid(_linear(
        [inputs, state], 2 * self._num_units, True, 1.0))
      # u here is the update gate, written z in the GRU formulas.
      r, u = array_ops.split(
          value=value,
          num_or_size_splits=2,
          axis=1)
    with vs.variable_scope("candidate"):
      c = self._activation(_linear([inputs, r * state],
                                   self._num_units, True))
    new_h = u * state + (1 - u) * c
  # In a GRU the output and the state are the same tensor h.
  return new_h, new_h
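
The GRU step can likewise be sketched for a single example in plain Python. This is a sketch under assumptions, not the TensorFlow code: linear and gru_step are hypothetical helpers, and the weights are set to zero so that the gates depend only on the bias start value of 1.0 used in the source.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def linear(vec, weights, bias):
    """vec has length n, weights has n rows and m columns; returns m values."""
    return [sum(v * w for v, w in zip(vec, col)) + b
            for col, b in zip(zip(*weights), bias)]


def gru_step(x, h, w_gates, b_gates, w_cand, b_cand):
    n = len(h)
    # Both gates come from one linear map of width 2 * num_units.
    gates = [sigmoid(v) for v in linear(x + h, w_gates, b_gates)]
    r, u = gates[:n], gates[n:]
    # The candidate sees the state scaled by the reset gate.
    reset_h = [r_k * h_k for r_k, h_k in zip(r, h)]
    c = [math.tanh(v) for v in linear(x + reset_h, w_cand, b_cand)]
    # new_h = u * state + (1 - u) * c
    new_h = [u_k * h_k + (1 - u_k) * c_k for u_k, h_k, c_k in zip(u, h, c)]
    return new_h, new_h


x, h = [1.0], [0.5, -0.5]                  # input size 1, num_units = 2
w_gates = [[0.0] * 4 for _ in range(3)]    # shape [1 + 2, 2 * 2]
b_gates = [1.0] * 4                        # bias start of 1.0
w_cand = [[0.0] * 2 for _ in range(3)]     # shape [1 + 2, 2]
b_cand = [0.0] * 2
new_h, new_state = gru_step(x, h, w_gates, b_gates, w_cand, b_cand)
```

With a gate bias of 1.0 and zero weights, u is sigmoid(1.0), roughly 0.73, so the new state stays close to the old one. This matches the source comment about starting out biased toward not resetting and not updating.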
