GRU and LSTM Weight Initialization
When writing a model, you sometimes want the RNN's weight matrices to be initialized in a particular way, say Xavier or orthogonal. In that case, all you need is:
```python
import tensorflow as tf
from tensorflow.contrib.rnn import GRUCell, LSTMCell, MultiRNNCell

cell = LSTMCell if self.args.use_lstm else GRUCell
# variable_scope needs a name; everything created inside it inherits
# the orthogonal initializer unless an inner scope overrides it.
with tf.variable_scope("rnn", initializer=tf.orthogonal_initializer()):
    input = tf.nn.embedding_lookup(embedding, questions_bt)
    cell_fw = MultiRNNCell(cells=[cell(hidden_size) for _ in range(num_layers)])
    cell_bw = MultiRNNCell(cells=[cell(hidden_size) for _ in range(num_layers)])
    outputs, last_states = tf.nn.bidirectional_dynamic_rnn(cell_bw=cell_bw,
                                                           cell_fw=cell_fw,
                                                           dtype="float32",
                                                           inputs=input,
                                                           swap_memory=True)
```
But does writing it this way actually initialize the weights as intended? Let's follow the code of bidirectional_dynamic_rnn and see, looking only at the forward direction for now:
```python
with vs.variable_scope("fw") as fw_scope:
    output_fw, output_state_fw = dynamic_rnn(
        cell=cell_fw, inputs=inputs, sequence_length=sequence_length,
        initial_state=initial_state_fw, dtype=dtype,
        parallel_iterations=parallel_iterations, swap_memory=swap_memory,
        time_major=time_major, scope=fw_scope)
```
So it opens another variable_scope called "fw" (fw_scope). Following dynamic_rnn further, we find that this scope is only used for cache management, while dynamic_rnn actually ends up calling the following:
```python
(outputs, final_state) = _dynamic_rnn_loop(
    cell,
    inputs,
    state,
    parallel_iterations=parallel_iterations,
    swap_memory=swap_memory,
    sequence_length=sequence_length,
    dtype=dtype)
```
In short, after a chain of calls, everything eventually arrives at this one statement:
```python
call_cell = lambda: cell(input_t, state)
```
So in the end the __call__() method of GRUCell or LSTMCell gets called. Following that in, GRU's __call__() looks like this:
```python
def __call__(self, inputs, state, scope=None):
    """Gated recurrent unit (GRU) with nunits cells."""
    with _checked_scope(self, scope or "gru_cell", reuse=self._reuse):
        with vs.variable_scope("gates"):
            value = sigmoid(_linear(
                [inputs, state], 2 * self._num_units, True, 1.0))
            r, u = array_ops.split(
                value=value,
                num_or_size_splits=2,
                axis=1)
        with vs.variable_scope("candidate"):
            c = self._activation(_linear([inputs, r * state],
                                         self._num_units, True))
        new_h = u * state + (1 - u) * c
    return new_h, new_h
```
Wait, where are the weights and biases? They don't appear in __init__() either. See that _linear() call? That's where all the weights actually live (the same goes for LSTMCell), and inside it lies the trick:
```python
with vs.variable_scope(scope) as outer_scope:
    weights = vs.get_variable(
        _WEIGHTS_VARIABLE_NAME, [total_arg_size, output_size], dtype=dtype)
    with vs.variable_scope(outer_scope) as inner_scope:
        inner_scope.set_partitioner(None)
        biases = vs.get_variable(
            _BIAS_VARIABLE_NAME, [output_size],
            dtype=dtype,
            initializer=init_ops.constant_initializer(bias_start, dtype=dtype))
```
So inside this function, yet another variable_scope is opened, and get_variable() is called to create the weights and biases. The question, then: with several variable_scopes nested inside the one we defined, does our initializer still take effect? Let's run an experiment:

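Here is a minimal sketch of such an experiment, using only variable_scope and get_variable (the scope and variable names are made up for illustration): the outer scope sets an orthogonal initializer, the inner scope sets none, and we check whether the resulting matrix is actually orthogonal.

```python
import numpy as np
import tensorflow as tf

# Outer scope carries the initializer; the inner scope does not set one.
with tf.variable_scope("outer", initializer=tf.orthogonal_initializer()):
    with tf.variable_scope("inner"):
        w = tf.get_variable("weights", shape=[4, 4])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    w_val = sess.run(w)
    # An orthogonal matrix satisfies W^T W = I (up to numerical error).
    print(np.allclose(w_val.T.dot(w_val), np.eye(4), atol=1e-5))  # True
```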
Sure enough, the test shows that if a nested variable_scope has no initializer of its own, the outer scope's initializer is used. So the conclusion practically writes itself:
- In the TensorFlow 1.1.0 implementation of these two RNN variants, it is enough to set an initializer on the variable_scope you call them from, and their weights will be initialized that way (the sketch below shows a quick way to check which variables this covers);
- But neither LSTM nor GRU offers a way to choose a bias initializer (although it seems the bias's initial value can be specified).
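As a sanity check on the graph built at the top of the post, you can list the trainable variables and confirm that the gate weights are created under the outer scope, and therefore pick up its orthogonal initializer. The variable names below are what I would expect from the TF 1.1 scoping walked through above; treat them as illustrative rather than authoritative.

```python
# After constructing the bidirectional RNN inside the "rnn" scope above:
for v in tf.trainable_variables():
    print(v.name, v.get_shape())

# Expected output, roughly:
#   rnn/bidirectional_rnn/fw/multi_rnn_cell/cell_0/gru_cell/gates/weights:0
#   rnn/bidirectional_rnn/fw/multi_rnn_cell/cell_0/gru_cell/gates/biases:0
#   rnn/bidirectional_rnn/fw/multi_rnn_cell/cell_0/gru_cell/candidate/weights:0
#   ...
```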
Original post:
http://cairohy.github.io/2017/05/05/ml-coding-summarize/Tensorflow%E4%B8%ADGRU%E5%92%8CLSTM%E7%9A%84%E6%9D%83%E9%87%8D%E5%88%9D%E5%A7%8B%E5%8C%96/