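In TensorFlow, the RNN-related source code falls into two groups. The first group is the classes that implement the basic cell logic. They all inherit from RNNCell and include BasicRNNCell, BasicLSTMCell, GRUCell, and others. The second group is the loop-control routines that run a cell across time steps, including the dynamic unidirectional RNN tf.nn.dynamic_rnn and the dynamic bidirectional RNN tf.nn.bidirectional_dynamic_rnn.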
1. BasicRNNCell
BasicRNNCell is the most basic RNN cell. The other RNN cells differ from it mainly in their __init__ and __call__ methods. Most RNN functions work as follows:
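# Pseudocode: RNNCell stands for any concrete cell, e.g. BasicRNNCell.
def run_rnn(inputs, state, hidden_size, rnn_steps):
    outputs = []
    cell = RNNCell(hidden_size)
    for i in range(rnn_steps):
        output, state = cell(inputs[i], state)  # feed the input of step i
        outputs.append(output)
    return outputs, state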
1) Initialize an empty list
2) Create an RNNCell instance
3) Run the RNNCell for rnn_steps times
4) Collect all the intermediate outputs in list
5) Return list of all outputs and the last state
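The five steps above can be sketched in plain Python without TensorFlow. The names toy_cell and run_rnn are hypothetical and used only for illustration. The toy cell works on scalars and is not the real BasicRNNCell.

```python
import math


def toy_cell(x, state):
    """Hypothetical stand-in for an RNNCell: new_state = tanh(x + state)."""
    new_state = math.tanh(x + state)
    # Like BasicRNNCell, the output and the new state are the same value.
    return new_state, new_state


def run_rnn(cell, inputs, initial_state):
    """Apply `cell` once per time step and collect every output."""
    outputs = []                        # 1) empty list
    state = initial_state
    for x in inputs:                    # 3) one iteration per time step
        output, state = cell(x, state)
        outputs.append(output)          # 4) keep the intermediate output
    return outputs, state               # 5) all outputs and the last state


outputs, final_state = run_rnn(toy_cell, [0.5, -1.0, 2.0], initial_state=0.0)
```

Here the cell is passed in rather than created inside the loop function, so the same loop can drive any cell that follows the (input, state) -> (output, state) convention.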
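Let's look at how BasicRNNCell works internally.
Initialization:
- num_units: the number of units (neurons) in the cell
- activation: the activation function, tanh by default
- reuse: whether the cell's variables may be reused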
class BasicRNNCell(RNNCell):
    """The most basic RNN cell."""

    def __init__(self, num_units, input_size=None, activation=tf.nn.tanh, reuse=None):
        if input_size is not None:
            logging.warn("%s: The input_size parameter is deprecated.", self)
        self._num_units = num_units
        self._activation = activation
        self._reuse = reuse

    @property
    def state_size(self):
        return self._num_units

    @property
    def output_size(self):
        return self._num_units

    def __call__(self, inputs, state, scope=None):
        """Most basic RNN: output = new_state = act(W * input + U * state + B)."""
        with _checked_scope(self, scope or "basic_rnn_cell", reuse=self._reuse):
            output = self._activation(
                _linear([inputs, state], self._num_units, True))
        return output, output
__call__ calls _linear, a linear-mapping function, and then passes the result through the activation function _activation. The output is therefore computed as:
output = new_state = act(W * input + U * state + B)
- _linear creates the weight variable internally, plus a bias variable if the bias argument is True
- _linear maps the input to dimension output_size
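The formula output = new_state = act(W * input + U * state + B) can be traced for a single example in plain Python. This is a minimal sketch and not the TensorFlow implementation. The helper names vec_matmul and basic_rnn_step and all numeric values are made up for illustration.

```python
import math


def vec_matmul(vec, mat):
    """Row vector times matrix, with `mat` given as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, mat))
            for j in range(len(mat[0]))]


def basic_rnn_step(x, h, W, U, B):
    """One step: output = new_state = tanh(x @ W + h @ U + B)."""
    xw = vec_matmul(x, W)
    hu = vec_matmul(h, U)
    out = [math.tanh(a + b + c) for a, b, c in zip(xw, hu, B)]
    return out, out  # output and new state are the same vector


# Illustrative values only: 2 input features, 3 hidden units.
x = [1.0, 2.0]
h = [0.0, 0.0, 0.0]
W = [[0.1, 0.2, 0.3],
     [0.0, -0.1, 0.1]]
U = [[0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0],
     [0.0, 0.0, 0.5]]
B = [0.0, 0.0, 0.0]

output, new_state = basic_rnn_step(x, h, W, U, B)
```

The returned vector has length 3, the number of hidden units, which is why state_size and output_size in BasicRNNCell both return num_units.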
Now look at how _linear works internally.
Args:
  args: a 2D Tensor or a list of 2D, batch x n, Tensors.
  output_size: int, second dimension of W[i].
  bias: boolean, whether to add a bias term or not.
  bias_start: starting value to initialize the bias; 0 by default.
Returns:
  A 2D Tensor with shape [batch x output_size] equal to sum_i(args[i] * W[i]), where W[i]s are newly created matrices.
import tensorflow as tf  # TensorFlow 1.x
from tensorflow.python.util import nest

_BIAS_VARIABLE_NAME = "biases"
_WEIGHTS_VARIABLE_NAME = "weights"


def _linear(args, output_size, bias, bias_start=0.0):
    if args is None or (nest.is_sequence(args) and not args):
        raise ValueError("`args` must be specified")
    if not nest.is_sequence(args):
        args = [args]

    # Calculate the total size of arguments on dimension 1.
    total_arg_size = 0
    shapes = [a.get_shape() for a in args]
    for shape in shapes:
        if shape.ndims != 2:
            raise ValueError("linear is expecting 2D arguments: %s" % shapes)
        if shape[1].value is None:
            raise ValueError("linear expects shape[1] to be provided for shape %s, "
                             "but saw %s" % (shape, shape[1]))
        else:
            total_arg_size += shape[1].value

    dtype = [a.dtype for a in args][0]

    # Now the computation.
    scope = tf.get_variable_scope()
    with tf.variable_scope(scope) as outer_scope:
        weights = tf.get_variable(
            _WEIGHTS_VARIABLE_NAME, [total_arg_size, output_size], dtype=dtype)
        if len(args) == 1:
            res = tf.matmul(args[0], weights)
        else:
            # Concatenate all inputs along axis 1, then do a single matmul.
            res = tf.matmul(tf.concat(args, 1), weights)
        if not bias:
            return res
        with tf.variable_scope(outer_scope) as inner_scope:
            inner_scope.set_partitioner(None)
            biases = tf.get_variable(
                _BIAS_VARIABLE_NAME, [output_size],
                dtype=dtype,
                initializer=tf.constant_initializer(bias_start, dtype=dtype))
        # bias_add lives in tf.nn; there is no top-level tf.bias_add.
        return tf.nn.bias_add(res, biases)
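When the weight matrix is created it is given the name "weights" (_WEIGHTS_VARIABLE_NAME), and the bias is given the name "biases" (_BIAS_VARIABLE_NAME). This weight matrix is what gets multiplied with the input.
A simple _linear example:
This also explains the computation W * input + U * state + B: concatenating input with state and W with U, and then doing one multiplication, gives the same result as the two separate products.
First define x and y, the inputs to _linear: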
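import tensorflow as tf  # TensorFlow 1.x; _linear is the function defined above

tf.reset_default_graph()
x = tf.constant([[1, 2, 3, 4, 5, 6],
                 [2, 4, 6, 8, 10, 12],
                 [1, 2, 3, 4, 5, 6]], dtype=tf.float32)
y = tf.constant([[1, 2, 3, 4, 5, 6],
                 [2, 4, 6, 8, 10, 12],
                 [1, 2, 3, 4, 5, 6]], dtype=tf.float32)
# Pre-create the variable named 'weights' so that _linear picks it up.
w = tf.get_variable('weights',
                    initializer=tf.ones(shape=[12, 4]),
                    dtype=tf.float32)
scope = tf.get_variable_scope()
scope.reuse_variables()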
Feed the data into the function:
Here the concatenated inputs are multiplied by the variable named 'weights'.
x_mapped = _linear(
    args=[x, y],
    output_size=4,
    bias=False)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
x_mapped.eval()
Output:
array([[42., 42., 42., 42.],
       [84., 84., 84., 84.],
       [42., 42., 42., 42.]], dtype=float32)
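The same numbers can be reproduced without TensorFlow to check that the fused form is equivalent to the separate products. In this sketch the 12 x 4 all-ones weight matrix is split into two hypothetical 6 x 4 halves, W for x and U for y. The matmul helper is a plain-Python stand-in for tf.matmul.

```python
def matmul(a, b):
    """Plain-Python matrix product of two list-of-rows matrices."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


x = [[1, 2, 3, 4, 5, 6],
     [2, 4, 6, 8, 10, 12],
     [1, 2, 3, 4, 5, 6]]
y = [row[:] for row in x]

W = [[1.0] * 4 for _ in range(6)]  # multiplies x
U = [[1.0] * 4 for _ in range(6)]  # multiplies y

# Separate products, then add: x @ W + y @ U
separate = [[p + q for p, q in zip(r1, r2)]
            for r1, r2 in zip(matmul(x, W), matmul(y, U))]

# Fused form: concatenate the inputs column-wise and stack the weights row-wise.
xy = [rx + ry for rx, ry in zip(x, y)]  # shape 3 x 12
WU = W + U                              # shape 12 x 4
fused = matmul(xy, WU)
```

Both forms give rows of 42, 84, and 42, matching the _linear output above. The fused form needs one matrix multiplication instead of two.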
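2. BasicLSTMCell
Initialization:
- num_units: the number of units in the LSTM cell (the size of the hidden state)
- activation: the activation function, tanh by default
- forget_bias: the bias added to the forget gate. A positive value reduces how much the cell forgets at the start of training. It is not a regularizer against overfitting.
- state_is_tuple: controls the format of the state. True is the usual choice.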
def call(self, inputs, state):
    sigmoid = math_ops.sigmoid
    # Parameters of gates are concatenated into one multiply for efficiency.
    # This is where state_is_tuple matters: if True, the incoming state must be
    # a (c, h) tuple; if False, it must be a single Tensor with c and h
    # concatenated along axis 1. True is recommended because it avoids the
    # cost of the split.
    if self._state_is_tuple:
        c, h = state
    else:
        c, h = array_ops.split(value=state, num_or_size_splits=2, axis=1)

    # inputs and the previous output h are concatenated and then linearly
    # mapped, see formula (1.1). The result has 4x the number of hidden units
    # so that it can be split directly into i, j, f, o (j is the g of the formula).
    # _linear() is a function in rnn_cell_impl.py that does the linear mapping.
    concat = _linear([inputs, h], 4 * self._num_units, True)

    # Split into the four parts.
    i, j, f, o = array_ops.split(value=concat, num_or_size_splits=4, axis=1)

    # Core computation: update the state to get new_c and new_h.
    new_c = (
        c * sigmoid(f + self._forget_bias) + sigmoid(i) * self._activation(j))
    new_h = self._activation(new_c) * sigmoid(o)

    if self._state_is_tuple:
        new_state = LSTMStateTuple(new_c, new_h)
    else:
        new_state = array_ops.concat([new_c, new_h], 1)
    return new_h, new_state
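In the _linear call the output size is 4 * num_units. It is four times num_units because, in the LSTM equations, the forget gate, input gate, output gate, and candidate state are all computed from the same kind of term, a linear map of the concatenated [inputs, h]. The four computations can therefore be done together and split apart afterwards.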
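In BasicLSTMCell's call method, concat (named gate_inputs in some TensorFlow versions) holds the four computations fused into one. array_ops.split then splits the result into the input gate, candidate state, forget gate, and output gate. The new cell state and the output vector are computed from the formulas. Depending on state_is_tuple, the two vectors are either wrapped in an LSTMStateTuple or concatenated into a single tensor before being returned.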
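The split-and-update part of call can be followed for one example in plain Python. This is a sketch that assumes the linear map has already produced the pre-activations. The function name lstm_step and the input values are illustrative. The layout [i | j | f | o] and the update formulas follow the code above.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(gates, c, forget_bias=1.0):
    """One LSTM update for a single example.

    `gates` plays the role of `concat`: a list of 4 * num_units
    pre-activations laid out as [i | j | f | o].
    """
    n = len(c)
    i, j, f, o = (gates[k * n:(k + 1) * n] for k in range(4))
    new_c = [c_k * sigmoid(f_k + forget_bias) + sigmoid(i_k) * math.tanh(j_k)
             for c_k, i_k, j_k, f_k in zip(c, i, j, f)]
    new_h = [math.tanh(nc) * sigmoid(o_k) for nc, o_k in zip(new_c, o)]
    # Tuple form of the state, as with state_is_tuple=True.
    return new_h, (new_c, new_h)


# Illustrative input: num_units = 2, all pre-activations zero.
c_prev = [1.0, -2.0]
new_h, (new_c, _) = lstm_step([0.0] * 8, c_prev)
```

With all pre-activations at zero, the candidate term vanishes and each new_c entry is the old cell value scaled by sigmoid(forget_bias). This shows how a positive forget_bias keeps most of the previous cell state.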
3. GRUCell
In GRUCell, the state vector and the output vector both have size num_units.
def __call__(self, inputs, state, scope=None):
    """Gated recurrent unit (GRU) with nunits cells."""
    with _checked_scope(self, scope or "gru_cell", reuse=self._reuse):
        with vs.variable_scope("gates"):  # Reset gate and update gate.
            # We start with bias of 1.0 to not reset and not update.
            # Both gates are computed in a single linear map.
            value = sigmoid(_linear(
                [inputs, state], 2 * self._num_units, True, 1.0))
            # u here is the update gate, written z in the GRU equations.
            r, u = array_ops.split(
                value=value,
                num_or_size_splits=2,
                axis=1)
        with vs.variable_scope("candidate"):
            c = self._activation(_linear([inputs, r * state],
                                         self._num_units, True))
        new_h = u * state + (1 - u) * c
    # In a GRU the output and the state are the same vector h.
    return new_h, new_h
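The GRU update above can also be written for a single example in plain Python. This is a minimal sketch. The names linear and gru_step, the weight layout, and the numeric values are assumptions made for illustration and are not TensorFlow APIs.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def linear(vec, weights, bias):
    """vec @ weights + bias for one example, with `weights` as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, weights)) + bias[j]
            for j in range(len(bias))]


def gru_step(x, h, W_gates, b_gates, W_cand, b_cand):
    """One GRU update: new_h = u * h + (1 - u) * c."""
    n = len(h)
    # Reset gate r and update gate u come from a single linear map.
    gates = [sigmoid(g) for g in linear(x + h, W_gates, b_gates)]
    r, u = gates[:n], gates[n:]
    # The candidate uses the reset-scaled previous state.
    rh = [r_k * h_k for r_k, h_k in zip(r, h)]
    c = [math.tanh(v) for v in linear(x + rh, W_cand, b_cand)]
    new_h = [u_k * h_k + (1 - u_k) * c_k for u_k, h_k, c_k in zip(u, h, c)]
    return new_h, new_h  # output and state are the same vector


# Illustrative setup: 1 input feature, 2 hidden units, zero weights,
# gate bias 1.0 as in the source, candidate bias 0.
x, h = [1.0], [0.5, -0.5]
W_gates = [[0.0] * 4 for _ in range(3)]
W_cand = [[0.0] * 2 for _ in range(3)]
new_h, new_state = gru_step(x, h, W_gates, [1.0] * 4, W_cand, [0.0] * 2)
```

With zero weights the candidate is zero and u equals sigmoid(1.0), so each new_h entry is about 0.73 times the old state. This is the "not reset and not update" starting behaviour that the gate bias of 1.0 is meant to produce.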