Keras TimeDistributed Explained in Detail
- The official Keras API describes it rather tersely:
- keras.layers.TimeDistributed(layer)
This wrapper applies a layer to every temporal slice of an input.
The input should be at least 3D, and the first dimension (after the batch axis) is taken to be the temporal dimension.
Consider a batch of 32 samples, where each sample is a sequence of 10 vectors of 16 dimensions. The batch input shape is then (32, 10, 16), while input_shape, which excludes the samples dimension, is (10, 16).
You can then use TimeDistributed to apply a Dense layer to each of the 10 timesteps, independently:
In [2]: import keras
Using TensorFlow backend.
In [3]: from keras.models import Sequential
In [4]: from keras.layers import TimeDistributed, Dense
In [5]:
...: model = Sequential()
...: model.add(TimeDistributed(Dense(8), input_shape=(10, 16)))
In [6]: model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
time_distributed_1 (TimeDist (None, 10, 8) 136
=================================================================
Total params: 136
Trainable params: 136
Non-trainable params: 0
_________________________________________________________________
In [7]:
...: model = Sequential()
...: model.add(TimeDistributed(Dense(89), input_shape=(10, 16)))
In [8]: model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
time_distributed_2 (TimeDist (None, 10, 89) 1513
=================================================================
Total params: 1,513
Trainable params: 1,513
Non-trainable params: 0
_________________________________________________________________
In [9]:
...: model = Sequential()
...: model.add(TimeDistributed(Dense(89), input_shape=(None, 16)))
In [10]: model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
time_distributed_3 (TimeDist (None, None, 89) 1513
=================================================================
Total params: 1,513
Trainable params: 1,513
Non-trainable params: 0
_________________________________________________________________
In [11]:
...: model = Sequential()
...: model.add(TimeDistributed(Dense(89), input_shape=(None, 16, 150)))
In [12]: model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
time_distributed_4 (TimeDist (None, None, 16, 89) 13439
=================================================================
Total params: 13,439
Trainable params: 13,439
Non-trainable params: 0
_________________________________________________________________
In [13]: 89 * 150 + 89
Out[13]: 13439
In [14]:
...: model = Sequential()
...: model.add(TimeDistributed(Dense(89), input_shape=(None, 16, 150, 100)))
...:
In [15]: model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
time_distributed_5 (TimeDist (None, None, 16, 150, 89) 8989
=================================================================
Total params: 8,989
Trainable params: 8,989
Non-trainable params: 0
_________________________________________________________________
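A quick way to confirm the Dense parameter counts in the summaries above: Dense acts only on the last input axis, so its parameter count is last_dim * units + units no matter how many leading axes the input has. A minimal pure-Python check (no Keras needed; `dense_params` is a helper written for this post):

```python
def dense_params(last_dim, units):
    """Parameter count of Dense(units) on an input whose last axis has
    size last_dim: one weight per (input, output) pair plus one bias
    per output unit. Leading axes (batch, time, ...) don't matter."""
    return last_dim * units + units

# matches the model summaries above
print(dense_params(16, 8))    # time_distributed_1: 136
print(dense_params(16, 89))   # time_distributed_2/3: 1513
print(dense_params(150, 89))  # time_distributed_4: 13439
print(dense_params(100, 89))  # time_distributed_5: 8989
```

Note that time_distributed_4 and time_distributed_5 ignore everything except the last input dimension (150 and 100 respectively), which is exactly what the summaries show.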
In [16]: from keras.layers import Conv2D
In [17]:
...: model = Sequential()
...: model.add(TimeDistributed(Conv2D(64, (3, 3)), input_shape=(None, 299, 299, 3)))
In [18]: model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
time_distributed_6 (TimeDist (None, None, 297, 297, 64 1792
=================================================================
Total params: 1,792
Trainable params: 1,792
Non-trainable params: 0
_________________________________________________________________
In [19]: 3*3*3*64 + 64
Out[19]: 1792
In [20]:
...: model = Sequential()
...: model.add(TimeDistributed(Conv2D(64, (3, 3)), input_shape=(None, 299, 299, 5)))
In [21]: model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
time_distributed_7 (TimeDist (None, None, 297, 297, 64 2944
=================================================================
Total params: 2,944
Trainable params: 2,944
Non-trainable params: 0
_________________________________________________________________
In [22]: 5*3*3*64 + 64
Out[22]: 2944
- As a layer wrapper: this wrapper lets you apply a layer to every time step of the input, with the layer's parameters shared across time steps.
- What it operates on: each time step. The number of time steps is preserved (the same before and after the operation); in a sense, the time axis behaves like a second batch dimension (call it the time step axis, or a "time batch").
# As the first layer in a model
# full input shape = (None, 10, 16)
model = Sequential()
model.add(TimeDistributed(Dense(8), input_shape=(10, 16)))
# now model.output_shape == (None, 10, 8)

# full input shape = (None, 10, 299, 299, 3)
model = Sequential()
model.add(TimeDistributed(Conv2D(64, (3, 3)),
                          input_shape=(10, 299, 299, 3)))
# now model.output_shape == (None, 10, 297, 297, 64)
- Parameter count: like an LSTM, this layer shares its parameters across the different time steps.
In [21]: #coding:utf-8
...: from keras.models import Input,Model
...: from keras.layers import Dense,Conv2D,TimeDistributed
...:
...: input_ = Input(shape=(12,8))
...: out = TimeDistributed(Dense(units=10))(input_)
...: #out = Dense(units=10)(input_)
...: model = Model(inputs=input_,outputs=out)
...: model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 12, 8) 0
_________________________________________________________________
time_distributed_6 (TimeDist (None, 12, 10) 90
=================================================================
Total params: 90
Trainable params: 90
Non-trainable params: 0
_________________________________________________________________
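The weight sharing can be seen directly: applying one shared weight matrix and bias to every time step in a loop gives the same result as one broadcast matrix multiply over the whole sequence, which is what TimeDistributed(Dense) computes. A NumPy sketch (the random W, b, and shapes below are illustrative, not the model's actual weights):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 10))      # ONE shared weight matrix: 8 inputs -> 10 units
b = rng.normal(size=10)           # ONE shared bias vector
x = rng.normal(size=(4, 12, 8))   # (batch, timesteps=12, features=8)

# what TimeDistributed(Dense(10)) computes: the SAME W, b at every step
per_step = np.stack([x[:, t] @ W + b for t in range(12)], axis=1)

# equivalently, one matmul broadcast over the time axis
batched = x @ W + b

assert per_step.shape == (4, 12, 10)
assert np.allclose(per_step, batched)
```

The shared parameters number 8 * 10 + 10 = 90, matching the 90 in the summary above, regardless of the 12 time steps.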
In [24]: from keras.models import Input,Model
...: from keras.layers import Dense,Conv2D,TimeDistributed
...:
...: input_ = Input(shape=(12,32,32,3))
...: out = TimeDistributed(Conv2D(filters=32,kernel_size=(3,3),padding='same'))(input_)
...: model = Model(inputs=input_,outputs=out);model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_4 (InputLayer) (None, 12, 32, 32, 3) 0
_________________________________________________________________
time_distributed_9 (TimeDist (None, 12, 32, 32, 32) 896
=================================================================
Total params: 896
Trainable params: 896
Non-trainable params: 0
_________________________________________________________________
In [31]: from keras.models import Input,Model
...: from keras.layers import Dense,Conv2D,TimeDistributed
...:
...: input_ = Input(shape=(32,32,2))
...: out = Conv2D(filters=32,kernel_size=(3,3),padding='same')(input_)
...: model = Model(inputs=input_,outputs=out);model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_11 (InputLayer) (None, 32, 32, 2) 0
_________________________________________________________________
conv2d_12 (Conv2D) (None, 32, 32, 32) 608
=================================================================
Total params: 608
Trainable params: 608
Non-trainable params: 0
_________________________________________________________________
totalParameters = filters * kernelSize_width * kernelSize_height * inputChannels + bias

32 * 3 * 3 * 2 + 32 = 608
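The formula above can be checked against every Conv2D summary in this post; a minimal helper (written for this post, not a Keras API):

```python
def conv2d_params(in_channels, kernel_h, kernel_w, filters):
    """totalParameters = filters * kernel_h * kernel_w * in_channels + bias,
    with one bias per filter. Spatial input size and the number of
    time steps don't appear in the count."""
    return filters * kernel_h * kernel_w * in_channels + filters

print(conv2d_params(2, 3, 3, 32))  # 608  (conv2d_12 above)
print(conv2d_params(3, 3, 3, 32))  # 896  (time_distributed_9)
print(conv2d_params(3, 3, 3, 64))  # 1792 (time_distributed_6)
print(conv2d_params(5, 3, 3, 64))  # 2944 (time_distributed_7)
```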
- TimeDistributed pulls off this trick by applying the same Dense layer (the same weights) to the LSTM output one time step at a time. The output layer therefore needs only one connection to each LSTM unit (plus one bias).
This uses far fewer parameters than feeding the sequence into a plain fully connected layer.
For example:
With a plain fully connected layer, weights cannot be shared across time steps, so:
| shape | operation | parameters |
| --- | --- | --- |
| (None, timestep, input_dim) | input | 0 |
| (None, timestep * input_dim) | reshape | 0 |
| (None, output_dim) | Dense(output_dim) | timestep × input_dim × output_dim + output_dim |
With TimeDistributed:

| shape | operation | parameters |
| --- | --- | --- |
| (None, timestep, input_dim) | input | 0 |
| (None, timestep, output_dim) | TimeDistributed(Dense(output_dim)) | input_dim × output_dim + output_dim |
Overall, this reduces the parameter count by roughly a factor of timestep.
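Plugging in the numbers from the first example in this post (timestep=10, input_dim=16, output_dim=8) makes the saving concrete:

```python
timestep, input_dim, output_dim = 10, 16, 8

# plain fully connected: flatten time into features, no weight sharing
flat_params = timestep * input_dim * output_dim + output_dim  # 1288

# TimeDistributed(Dense): one weight matrix shared by all time steps
td_params = input_dim * output_dim + output_dim               # 136

print(flat_params, td_params)  # 1288 136
```

136 matches the 136 parameters of time_distributed_1 in the first summary, and 1288 / 136 ≈ 9.5, i.e. roughly the factor of timestep = 10 claimed above.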