Weight Initialization of GRU and LSTM in TensorFlow

GRU and LSTM Weight Initialization

When writing a model, you sometimes want the RNN's weight matrices to be initialized in a particular way, for example Xavier or orthogonal. In that case, all you need is:

import tensorflow as tf
from tensorflow.contrib.rnn import GRUCell, LSTMCell, MultiRNNCell

cell = LSTMCell if self.args.use_lstm else GRUCell
# variable_scope needs a name; the initializer attached here applies to all
# variables created inside this scope.
with tf.variable_scope("birnn", initializer=tf.orthogonal_initializer()):
    inputs = tf.nn.embedding_lookup(embedding, questions_bt)
    cell_fw = MultiRNNCell(cells=[cell(hidden_size) for _ in range(num_layers)])
    cell_bw = MultiRNNCell(cells=[cell(hidden_size) for _ in range(num_layers)])
    outputs, last_states = tf.nn.bidirectional_dynamic_rnn(cell_fw=cell_fw,
                                                           cell_bw=cell_bw,
                                                           inputs=inputs,
                                                           dtype=tf.float32,
                                                           swap_memory=True)
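
The same pattern works with any other initializer. Below is a minimal, self-contained sketch assuming TensorFlow 1.x with tf.contrib available; the scope name, placeholder shape, and sizes are made up for illustration, and Xavier initialization is used instead of orthogonal:

import tensorflow as tf
from tensorflow.contrib.rnn import GRUCell, MultiRNNCell

hidden_size, num_layers = 128, 2
# Hypothetical input: batch x time x embedding_dim
inputs = tf.placeholder(tf.float32, [None, None, 300])

# Every variable created under this scope picks up the Xavier initializer
# unless an inner scope or get_variable() call overrides it.
with tf.variable_scope("birnn_xavier",
                       initializer=tf.contrib.layers.xavier_initializer()):
    cell_fw = MultiRNNCell([GRUCell(hidden_size) for _ in range(num_layers)])
    cell_bw = MultiRNNCell([GRUCell(hidden_size) for _ in range(num_layers)])
    outputs, last_states = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=cell_fw, cell_bw=cell_bw, inputs=inputs,
        dtype=tf.float32, swap_memory=True)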

But does writing it this way actually initialize the weights correctly? Let's follow the source of bidirectional_dynamic_rnn, looking only at the forward direction for now:

with vs.variable_scope("fw") as fw_scope:
    output_fw, output_state_fw = dynamic_rnn(
        cell=cell_fw, inputs=inputs, sequence_length=sequence_length,
        initial_state=initial_state_fw, dtype=dtype,
        parallel_iterations=parallel_iterations, swap_memory=swap_memory,
        time_major=time_major, scope=fw_scope)

We see that it adds a variable_scope named fw_scope. Reading further into dynamic_rnn, this scope is only used for managing the variables' caching device, and dynamic_rnn actually calls the following:

(outputs, final_state) = _dynamic_rnn_loop(
    cell,
    inputs,
    state,
    parallel_iterations=parallel_iterations,
    swap_memory=swap_memory,
    sequence_length=sequence_length,
    dtype=dtype)

In short, after one call leads to another, everything ends up at a single statement:

call_cell = lambda: cell(input_t, state)

So in the end everything calls the __call__() method of GRUCell or LSTMCell. Following along, GRU's __call__() looks like this:

def __call__(self, inputs, state, scope=None):
    """Gated recurrent unit (GRU) with nunits cells."""
    with _checked_scope(self, scope or "gru_cell", reuse=self._reuse):
        with vs.variable_scope("gates"):  # Reset gate and update gate.
            # We start with bias of 1.0 to not reset and not update.
            value = sigmoid(_linear(
                [inputs, state], 2 * self._num_units, True, 1.0))
            r, u = array_ops.split(
                value=value,
                num_or_size_splits=2,
                axis=1)
        with vs.variable_scope("candidate"):
            c = self._activation(_linear([inputs, r * state],
                                         self._num_units, True))
        new_h = u * state + (1 - u) * c
    return new_h, new_h

Wait, where are the weights and biases? They don't appear in __init__() either. See that _linear()? That is where all the weights actually live (the same holds for LSTMCell), and the interesting part is inside it:

with vs.variable_scope(scope) as outer_scope:
    weights = vs.get_variable(
        _WEIGHTS_VARIABLE_NAME, [total_arg_size, output_size], dtype=dtype)
    # ....some code
    with vs.variable_scope(outer_scope) as inner_scope:
        inner_scope.set_partitioner(None)
        biases = vs.get_variable(
            _BIAS_VARIABLE_NAME, [output_size],
            dtype=dtype,
            initializer=init_ops.constant_initializer(bias_start, dtype=dtype))

So this function adds yet another variable_scope and then calls get_variable() to fetch the weights and biases. Now that several variable_scopes end up nested inside the one we defined, does our initializer still take effect? Let's test it (a sketch of such a test follows):
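
A minimal sketch of such a test, assuming TensorFlow 1.x; the scope and variable names are made up for illustration. It mimics what _linear() does: get_variable() inside nested scopes that do not declare an initializer of their own:

import numpy as np
import tensorflow as tf

with tf.variable_scope("outer", initializer=tf.orthogonal_initializer()):
    with tf.variable_scope("inner"):            # no initializer of its own
        w1 = tf.get_variable("w1", [4, 4])
    with tf.variable_scope("inner2", initializer=tf.zeros_initializer()):
        w2 = tf.get_variable("w2", [4, 4])      # the inner initializer wins

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    v1, v2 = sess.run([w1, w2])
    # w1 inherited the orthogonal initializer from "outer": w1.T @ w1 ~ I
    print(np.allclose(v1.T.dot(v1), np.eye(4), atol=1e-5))  # expected: True
    # w2 used its own scope's zeros initializer
    print(np.allclose(v2, 0.0))                              # expected: True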

Sure enough, the test shows that for nested variable_scopes, if the inner scope does not specify an initializer, the outer one applies. So the conclusion practically writes itself:

  • For these two RNN variants as implemented in TensorFlow 1.1.0, it is enough to wrap the call in a variable_scope that carries an initializer; the weights will then be initialized that way;

  • However, neither LSTM nor GRU exposes an initializer for the biases (although the initial bias value can apparently be set; see the sketch after this list).
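
On the bias point, a minimal sketch assuming the TensorFlow 1.1 tf.contrib.rnn API: the LSTM cell lets you set the initial value of the forget-gate bias via forget_bias, while GRUCell hard-codes its gate bias start value (the 1.0 passed to _linear in __call__ above):

from tensorflow.contrib.rnn import GRUCell, LSTMCell

# LSTMCell exposes the initial value of the forget-gate bias:
lstm = LSTMCell(num_units=128, forget_bias=1.0)

# GRUCell takes no bias argument; as seen in __call__ above, it always passes
# bias_start=1.0 to _linear for the gates, so the gate biases start at 1.0.
gru = GRUCell(num_units=128)
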
Original post:
http://cairohy.github.io/2017/05/05/ml-coding-summarize/Tensorflow%E4%B8%ADGRU%E5%92%8CLSTM%E7%9A%84%E6%9D%83%E9%87%8D%E5%88%9D%E5%A7%8B%E5%8C%96/
