A detailed walkthrough of the multi-layer LSTM project code in TensorFlow (progress: 1/4)

1. Project repository

Multi-layer LSTM project

2. Project data

The data set is text8.zip.
Download command on Linux:
curl http://mattmahoney.net/dc/text8.zip > text8.zip

3. Command-line invocation

python3.5 ptb_word_lm.py --data_path=simple-examples/data/

4. Program entry point

Execution starts at lines 526-527 of ptb_word_lm.py:

if __name__ == "__main__":
  tf.app.run()

tf.app.run() first parses the flag arguments and then invokes the main function; the flags themselves are defined via tf.app.flags.FLAGS.
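For context, here is a minimal sketch of how flag definitions and tf.app.run() fit together. The flag names below mirror ones the tutorial uses, but the snippet itself is illustrative rather than copied from the project:

import tensorflow as tf

# Illustrative flag definitions; the real file defines more flags
# (model, data_path, save_path, num_gpus, ...).
flags = tf.flags
flags.DEFINE_string("data_path", None, "Where the training/test data is stored.")
flags.DEFINE_integer("num_gpus", 1, "Number of GPUs to use.")
FLAGS = flags.FLAGS


def main(_):
  # By the time main() runs, tf.app.run() has already parsed sys.argv into FLAGS.
  print("data_path =", FLAGS.data_path)


if __name__ == "__main__":
  tf.app.run()  # parse the command-line flags, then call main(argv)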

5. The main() function

main() starts at line 444 of ptb_word_lm.py; lines 445-454 check whether a data path was supplied and whether enough GPUs are available.

  if not FLAGS.data_path:
    raise ValueError("Must set --data_path to PTB data directory")
  gpus = [
      x.name for x in device_lib.list_local_devices() if x.device_type == "GPU"
  ]
  if FLAGS.num_gpus > len(gpus):
    raise ValueError(
        "Your machine has only %d gpus "
        "which is less than the requested --num_gpus=%d."
        % (len(gpus), FLAGS.num_gpus))

6. Loading the configuration

Lines 456-462 of ptb_word_lm.py read the raw data and build the train and eval configurations:

  raw_data = reader.ptb_raw_data(FLAGS.data_path)
  train_data, valid_data, test_data, _ = raw_data

  config = get_config()
  eval_config = get_config()
  eval_config.batch_size = 1
  eval_config.num_steps = 1

The configuration contains the following fields (a sketch of such a config class follows the list):

init_scale
learning_rate
max_grad_norm
num_layers
num_steps
hidden_size
max_epoch
max_max_epoch
keep_prob
lr_decay
batch_size
vocab_size
rnn_mode
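get_config() returns one of several plain configuration classes defined later in the file (SmallConfig, MediumConfig, LargeConfig, TestConfig), selected by the --model flag. As a hedged illustration, such a class looks roughly like this; the values follow the commonly cited SmallConfig, so check the file for the exact numbers:

class SmallConfig(object):
  """Small config -- a plain container of hyperparameters."""
  init_scale = 0.1      # range of the uniform weight initializer
  learning_rate = 1.0   # initial learning rate
  max_grad_norm = 5     # clip gradients to this global norm
  num_layers = 2        # number of stacked LSTM layers
  num_steps = 20        # number of unrolled time steps
  hidden_size = 200     # LSTM hidden units per layer
  max_epoch = 4         # epochs trained at the initial learning rate
  max_max_epoch = 13    # total number of training epochs
  keep_prob = 1.0       # dropout keep probability (1.0 = no dropout)
  lr_decay = 0.5        # learning-rate decay applied after max_epoch
  batch_size = 20
  vocab_size = 10000
  rnn_mode = "block"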

7. Initialization
7.1 initializer = tf.random_uniform_initializer(-config.init_scale, config.init_scale)

Line 465 of ptb_word_lm.py.
This creates an initializer that generates tensors with a uniform distribution. Its parameters are [2] (a short usage sketch follows the list):

minval: A Python scalar or a scalar tensor. Lower bound of the range of random values to generate.
maxval: A Python scalar or a scalar tensor. Upper bound of the range of random values to generate. Defaults to 1 for float types.
seed: A Python integer. Used to create random seeds. See tf.set_random_seed for behavior.
dtype: The data type.
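A minimal usage sketch (the variable name and shape are made up for illustration): every variable created under a scope that carries this initializer, and that does not specify its own, is drawn uniformly from [-init_scale, init_scale].

import tensorflow as tf

init_scale = 0.1
initializer = tf.random_uniform_initializer(-init_scale, init_scale)

with tf.variable_scope("Model", initializer=initializer):
  # "w" picks up the scope's initializer because it does not name its own.
  w = tf.get_variable("w", shape=[4, 3], dtype=tf.float32)

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  print(sess.run(w))  # all entries lie in [-0.1, 0.1]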

7.2 Initializing the Train (lines 468-473), Valid (lines 475-479), and Test (lines 481-486) blocks

Since the three blocks are nearly identical, take the Train block at lines 468-473 of ptb_word_lm.py as the example:

    with tf.name_scope("Train"):
      train_input = PTBInput(config=config, data=train_data, name="TrainInput")
      with tf.variable_scope("Model", reuse=None, initializer=initializer):
        m = PTBModel(is_training=True, config=config, input_=train_input)
      tf.summary.scalar("Training Loss", m.cost)
      tf.summary.scalar("Learning Rate", m.lr)

7.2.1 train_input = PTBInput(config=config, data=train_data, name="TrainInput")

The PTBInput class is at lines 102-110 of ptb_word_lm.py. It contains only an __init__ method, which sets the following attributes:

    self.batch_size = batch_size = config.batch_size
    self.num_steps = num_steps = config.num_steps
    self.epoch_size = ((len(data) // batch_size) - 1) // num_steps
    self.input_data, self.targets = reader.ptb_producer(
        data, batch_size, num_steps, name=name)

The reader.ptb_producer function is at line 86 of reader.py; its parameters are:

raw_data: one of the raw data outputs from ptb_raw_data.
batch_size: int, the batch size.
num_steps: int, the number of unrolls.
name: the name of this operation (optional).

It returns input_data of shape [batch_size, num_steps] and label data of the same shape. The shape follows from how ptb_producer slices the corpus: the flat sequence of word ids is reshaped into batch_size rows, each batch takes a window of num_steps consecutive columns from every row, and the labels are the same window shifted one word to the right. Before the words are mapped to ids, input_data and the labels look like this. Suppose we have the sentence

They also do not focus on a theoretical justification and fit accuracy estimates to a library of hand-designed learning curves.

If we take "They also do not focus" as input_data, the corresponding label is "also do not focus on".
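To make the shapes concrete, here is a small NumPy sketch of the slicing arithmetic that reader.ptb_producer performs (the real function builds TensorFlow ops and an input queue; the toy corpus below is made up):

import numpy as np

data = np.arange(25)          # a toy corpus of 25 word ids
batch_size, num_steps = 2, 3

batch_len = len(data) // batch_size                    # 12
grid = data[:batch_size * batch_len].reshape(batch_size, batch_len)
epoch_size = (batch_len - 1) // num_steps              # 3 (x, y) pairs per epoch

for i in range(epoch_size):
  x = grid[:, i * num_steps:(i + 1) * num_steps]           # shape [batch_size, num_steps]
  y = grid[:, i * num_steps + 1:(i + 1) * num_steps + 1]   # same window shifted by one word
  print(x.tolist(), y.tolist())
# First pair: x = [[0, 1, 2], [12, 13, 14]], y = [[1, 2, 3], [13, 14, 15]]

The epoch_size computed here is the same ((len(data) // batch_size) - 1) // num_steps formula that PTBInput stores.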

7.2.2 m = PTBModel(is_training=True, config=config, input_=train_input)

The PTBModel class spans lines 113-317 of ptb_word_lm.py. Its initialization sets the following attributes:

    self._is_training = is_training
    self._input = input_
    self._rnn_params = None
    self._cell = None
    self.batch_size = input_.batch_size
    self.num_steps = input_.num_steps
    size = config.hidden_size
    vocab_size = config.vocab_size

It contains the following methods (listed here only; each is analyzed when it is used):

_build_rnn_graph(self, inputs, config, is_training)
_build_rnn_graph_cudnn(self, inputs, config, is_training)
_get_lstm_cell(self, config, is_training)
_build_rnn_graph_lstm(self, inputs, config, is_training)
make_cell()
assign_lr(self, session, lr_value)
export_ops(self, name)
import_ops(self)

The initialization proceeds as follows:
① Initialize the attributes listed above.
② Embed the input data:

    with tf.device("/cpu:0"):
      embedding = tf.get_variable(
          "embedding", [vocab_size, size], dtype=data_type())
      inputs = tf.nn.embedding_lookup(embedding, input_.input_data)

Here embedding is the matrix of vector representations for every word in the input text (up to the configured vocab_size); in general each row represents one word.
tf.get_variable returns an existing variable with the given parameters or creates a new one. Its prototype is [3]:

tf.get_variable(
    name,
    shape=None,
    dtype=None,
    initializer=None,
    regularizer=None,
    trainable=None,
    collections=None,
    caching_device=None,
    partitioner=None,
    validate_shape=True,
    use_resource=None,
    custom_getter=None,
    constraint=None,
    synchronization=tf.VariableSynchronization.AUTO,
    aggregation=tf.VariableAggregation.NONE
)

The parameters are [4]:

name: The name of the new or existing variable.
shape: Shape of the new or existing variable.
dtype: Type of the new or existing variable (defaults to DT_FLOAT).
initializer: Initializer for the variable if one is created.
regularizer: A (Tensor -> Tensor or None) function; the result of applying it to a newly created variable is added to the collection tf.GraphKeys.REGULARIZATION_LOSSES and can be used for regularization.
trainable: If True, also adds the variable to the graph collection GraphKeys.TRAINABLE_VARIABLES.
collections: List of graph collection keys to add the variable to. Defaults to [GraphKeys.GLOBAL_VARIABLES].
caching_device: Optional device string or function describing where the variable should be cached for reading. Defaults to the variable's device; if not None, caches on another device. Typical use is to cache on the device where the ops using the variable reside, to deduplicate copying through Switch and other conditional statements.
partitioner: Optional callable that accepts the fully defined TensorShape and dtype of the variable to be created and returns a list of partitions for each axis (currently only one axis can be partitioned).
validate_shape: If False, allows the variable to be initialized with a value of unknown shape. If True (the default), the shape of initial_value must be known.
use_resource: If False, creates a regular Variable. If True, creates an experimental ResourceVariable with well-defined semantics instead. Defaults to False (this will later change to True).
custom_getter: Callable that takes the true getter as its first argument and allows overriding the internal get_variable method. Its signature should match that of this method, but the most future-proof version allows for changes: def custom_getter(getter, *args, **kwargs). Direct access to all get_variable arguments is also allowed: def custom_getter(getter, name, *args, **kwargs). A simple identity custom getter that creates variables with a modified name is: def custom_getter(getter, name, *args, **kwargs): return getter(name + '_suffix', *args, **kwargs)

Common choices for the initializer argument include [5] (a short get_variable sketch follows this list):

tf.constant_initializer: initialize to a constant
tf.random_normal_initializer: normal distribution
tf.truncated_normal_initializer: truncated normal distribution
tf.random_uniform_initializer: uniform distribution
tf.zeros_initializer: all zeros
tf.ones_initializer: all ones
tf.uniform_unit_scaling_initializer: uniformly distributed random values that do not change the magnitude of the output
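Because tf.get_variable either creates or retrieves a variable depending on the enclosing variable_scope, the Train/Valid/Test graphs above can share a single set of weights. A short sketch of that mechanism (the names and shapes are illustrative only):

import tensorflow as tf

with tf.variable_scope("Model", reuse=None,
                       initializer=tf.zeros_initializer()):
  w1 = tf.get_variable("embedding", shape=[10, 4])   # created here

# With reuse=True, get_variable returns the existing variable instead of
# creating a new one; asking for an unknown name would raise a ValueError.
with tf.variable_scope("Model", reuse=True):
  w2 = tf.get_variable("embedding")

print(w1 is w2)  # True -- both names refer to the same variable object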

Once the vector representations of all words are available, the vectors for the words of the current input are looked up:

inputs = tf.nn.embedding_lookup(embedding, input_.input_data)

tf.nn.embedding_lookup selects the elements of a tensor at the given indices (a small sketch follows the parameter list below). Its prototype is [6]:

tf.nn.embedding_lookup(
    params,
    ids,
    partition_strategy='mod',
    name=None,
    validate_indices=True,
    max_norm=None
)

The parameters are:

params: A single tensor representing the complete embedding tensor, or a list of P tensors all of same shape except for the first dimension, representing sharded embedding tensors. Alternatively, a PartitionedVariable, created by partitioning along dimension 0. Each element must be appropriately sized for the given partition_strategy.
ids: A Tensor with type int32 or int64 containing the ids to be looked up in params.
partition_strategy: A string specifying the partitioning strategy, relevant if len(params) > 1. Currently "div" and "mod" are supported. Default is "mod".
name: A name for the operation (optional).
validate_indices: DEPRECATED. If this operation is assigned to CPU, values in indices are always validated to be within range. If assigned to GPU, out-of-bound indices result in safe but unspecified behavior, which may include raising an error.
max_norm: If not None, each embedding is clipped if its l2-norm is larger than this value.
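A tiny sketch of the lookup (all numbers are made up): an embedding matrix of shape [vocab_size, size] indexed by ids of shape [batch_size, num_steps] yields a tensor of shape [batch_size, num_steps, size], i.e. it is plain row selection.

import numpy as np
import tensorflow as tf

vocab_size, size = 5, 3
table = np.arange(vocab_size * size, dtype=np.float32).reshape(vocab_size, size)

embedding = tf.constant(table)                    # [vocab_size, size]
ids = tf.constant([[0, 2], [4, 1]])               # [batch_size=2, num_steps=2]
inputs = tf.nn.embedding_lookup(embedding, ids)   # [2, 2, 3]

with tf.Session() as sess:
  print(sess.run(inputs))
  # inputs[0, 1] equals table[2]; each id is replaced by its embedding row.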

③ Apply dropout when required. During training, if config.keep_prob < 1, tf.nn.dropout randomly zeroes a fraction (1 - keep_prob) of the embedded inputs and scales the surviving elements by 1/keep_prob (see the sketch after the snippet):

    if is_training and config.keep_prob < 1:
      inputs = tf.nn.dropout(inputs, config.keep_prob)
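A minimal sketch of the effect (the tensor is illustrative, not project code): with keep_prob = 0.5, roughly half of the entries are zeroed and the survivors are scaled by 1/0.5 = 2, so the expected value of each element is unchanged.

import tensorflow as tf

x = tf.ones([2, 4])
keep_prob = 0.5

# Each element is independently kept with probability keep_prob and scaled
# by 1/keep_prob; dropped elements become 0.
y = tf.nn.dropout(x, keep_prob)

with tf.Session() as sess:
  print(sess.run(y))   # a mix of 0.0 and 2.0 entries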

④ Obtain the output and cell state of the LSTM:

output, state = self._build_rnn_graph(inputs, config, is_training)

④-a Choosing between the LSTM graph and the cuDNN implementation (lines 171-175 of ptb_word_lm.py):
The _build_rnn_graph method is defined as:

  def _build_rnn_graph(self, inputs, config, is_training):
    if config.rnn_mode == CUDNN:
      return self._build_rnn_graph_cudnn(inputs, config, is_training)
    else:
      return self._build_rnn_graph_lstm(inputs, config, is_training)

The _build_rnn_graph_lstm method is defined as follows (lines 211-244 of ptb_word_lm.py):

  def _build_rnn_graph_lstm(self, inputs, config, is_training):
    def make_cell():
      cell = self._get_lstm_cell(config, is_training)
      if is_training and config.keep_prob < 1:
        cell = tf.contrib.rnn.DropoutWrapper(
            cell, output_keep_prob=config.keep_prob)
      return cell

    cell = tf.contrib.rnn.MultiRNNCell(
        [make_cell() for _ in range(config.num_layers)], state_is_tuple=True)

    self._initial_state = cell.zero_state(config.batch_size, data_type())
    state = self._initial_state
    # Simplified version of tf.nn.static_rnn().
    # This builds an unrolled LSTM for tutorial purposes only.
    # In general, use tf.nn.static_rnn() or tf.nn.static_state_saving_rnn().
    #
    # The alternative version of the code below is:
    #
    # inputs = tf.unstack(inputs, num=self.num_steps, axis=1)
    # outputs, state = tf.nn.static_rnn(cell, inputs,
    #                                   initial_state=self._initial_state)
    outputs = []
    with tf.variable_scope("RNN"):
      for time_step in range(self.num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)
    output = tf.reshape(tf.concat(outputs, 1), [-1, config.hidden_size])
    return output, state

④-b Building an LSTM cell
First consider the make_cell helper inside _build_rnn_graph_lstm. The variable cell receives a single LSTM cell from _get_lstm_cell; if dropout applies (training and config.keep_prob < 1) the cell is wrapped in a DropoutWrapper, otherwise it is returned as-is. At this point cell is an instance of one of TensorFlow's predefined LSTM cell classes.
The relevant code (lines 216-221 of ptb_word_lm.py):

    def make_cell():
      cell = self._get_lstm_cell(config, is_training)
      if is_training and config.keep_prob < 1:
        cell = tf.contrib.rnn.DropoutWrapper(
            cell, output_keep_prob=config.keep_prob)
      return cell

The _get_lstm_cell method is defined as follows (lines 201-209 of ptb_word_lm.py):

  def _get_lstm_cell(self, config, is_training):
    if config.rnn_mode == BASIC:
      return tf.contrib.rnn.BasicLSTMCell(
          config.hidden_size, forget_bias=0.0, state_is_tuple=True,
          reuse=not is_training)
    if config.rnn_mode == BLOCK:
      return tf.contrib.rnn.LSTMBlockCell(
          config.hidden_size, forget_bias=0.0)
    raise ValueError("rnn_mode %s not supported" % config.rnn_mode)

The difference between BasicLSTMCell and LSTMBlockCell in this code is that LSTMBlockCell fuses the whole LSTM step into a single op and is usually faster, while BasicLSTMCell is the plain reference implementation; both compute the same LSTM cell.
④-c Building the multi-layer LSTM
The relevant code (line 223 of ptb_word_lm.py):

    cell = tf.contrib.rnn.MultiRNNCell(
        [make_cell() for _ in range(config.num_layers)], state_is_tuple=True)

Next, tf.contrib.rnn.MultiRNNCell stacks these cells into a multi-layer RNN. Its constructor is [8]

tf.contrib.rnn.MultiRNNCell(
    cells,
    state_is_tuple=True
)

The parameters are:

cells: A list of RNNCells that will be composed in this order, one per layer.
state_is_tuple: If True, accepted and returned states are n-tuples, where n = len(cells). If False, the states are all concatenated along the column axis (this behavior is deprecated).

Note the following [9][10]:
Reusing one BasicLSTMCell object directly, as below, is wrong; it makes all layers share the same weights:

  basic_cell = tf.nn.rnn_cell.BasicLSTMCell(rnn_unit)
  multi_cell = tf.nn.rnn_cell.MultiRNNCell([basic_cell]*layer_num)

Passing [cell]*2 (or [cell, cell]) to MultiRNNCell has the same problem, since the list still contains the same cell object repeated:

  cell = tf.contrib.rnn.MultiRNNCell([cell]*2, state_is_tuple=True)

The officially recommended way is to create a fresh cell per layer with a list comprehension:

  num_units = [128, 64]
  cells = [BasicLSTMCell(num_units=n) for n in num_units]
  stacked_rnn_cell = MultiRNNCell(cells)

This is exactly what the project's [make_cell() for _ in range(config.num_layers)] does: make_cell() is called once per layer, so every layer gets its own variables.
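A hedged sketch of the recommended pattern (the layer sizes are arbitrary example values), showing that each layer keeps its own state when state_is_tuple=True:

import tensorflow as tf

num_units = [128, 64]          # hidden size per layer (arbitrary example values)

# One fresh cell object per layer, as in the project's make_cell() comprehension,
# so no weights are shared between layers.
cells = [tf.nn.rnn_cell.BasicLSTMCell(n) for n in num_units]
stacked = tf.contrib.rnn.MultiRNNCell(cells, state_is_tuple=True)

batch_size = 20
init_state = stacked.zero_state(batch_size, tf.float32)

# With state_is_tuple=True the state is one LSTMStateTuple (c, h) per layer.
print(len(init_state))        # 2
print(init_state[0].c.shape)  # (20, 128)
print(init_state[1].h.shape)  # (20, 64)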

④-d Obtaining the cell state and outputs
The relevant code (lines 237-244 of ptb_word_lm.py):

    self._initial_state = cell.zero_state(config.batch_size, data_type())
    state = self._initial_state
    outputs = []
    with tf.variable_scope("RNN"):
      for time_step in range(self.num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)
    output = tf.reshape(tf.concat(outputs, 1), [-1, config.hidden_size])
    return output, state
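To see what the final concat and reshape produce, here is a small NumPy sketch (the values are fabricated just to tag each output with its batch index and time step): each cell_output is [batch_size, hidden_size], concatenating the num_steps outputs along axis 1 gives [batch_size, num_steps * hidden_size], and the reshape yields [batch_size * num_steps, hidden_size] with all time steps of one batch element grouped together. This flat 2-D output is what the subsequent softmax layer consumes.

import numpy as np

batch_size, num_steps, hidden_size = 2, 3, 4

outputs = []
for t in range(num_steps):
  step = np.zeros((batch_size, hidden_size), dtype=np.float32)
  for b in range(batch_size):
    step[b, :] = 10 * b + t     # encode (batch b, time t) as the value 10*b + t
  outputs.append(step)          # each step: [batch_size, hidden_size]

concat = np.concatenate(outputs, axis=1)   # [2, 12] = [batch_size, num_steps*hidden_size]
output = concat.reshape(-1, hidden_size)   # [6, 4]  = [batch_size*num_steps, hidden_size]

print(output[:, 0])
# [ 0.  1.  2. 10. 11. 12.] -> rows 0-2 are batch element 0's steps 0-2,
#                              rows 3-5 are batch element 1's steps 0-2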
