Passing output of 3DCNN layer to LSTM layer

Topic: passing the output of a 3DCNN (3D convolutional neural network) layer to an LSTM (long short-term memory) layer

Background:

While trying to learn recurrent neural networks (RNNs), I am attempting to train an automatic lip-reading model using a 3DCNN + LSTM. I tried out code I found for this on Kaggle.


from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv3D, MaxPooling3D, Reshape,
                                     LSTM, Dropout, Flatten, Dense)

model = Sequential()

# 1st layer group
model.add(Conv3D(32, (3, 3, 3), strides = 1, input_shape=(22, 100, 100, 1), activation='relu', padding='valid'))
model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=2))

model.add(Conv3D(64, (3, 3, 3), activation='relu', strides=1))
model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=2))

model.add(Conv3D(128, (3, 3, 3), activation='relu', strides=1))
model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=2))

shape = model.get_output_shape_at(0)
model.add(Reshape((shape[-1],shape[1]*shape[2]*shape[3])))

# LSTMS - Recurrent Network Layer
model.add(LSTM(32, return_sequences=True))
model.add(Dropout(.5))

model.add(Flatten())

# FC layers group
model.add(Dense(2048, activation='relu'))
model.add(Dropout(.5))
model.add(Dense(1024, activation='relu'))
model.add(Dropout(.5))

model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])
model.summary()
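To see what this code actually hands to the LSTM, the layer shapes can be traced by hand. The following is a rough NumPy sketch of the shape arithmetic (my illustration, not from the original post), assuming the 'valid' padding and pool sizes shown above:

```python
import numpy as np

# Shape arithmetic for the model above (batch axis omitted).
# A 'valid' Conv3D with kernel 3 shrinks each dimension by 2;
# MaxPooling3D with pool 2, stride 2 halves each dimension (floor).
def conv3d_valid(shape, k=3):
    return tuple(d - k + 1 for d in shape)

def pool3d(shape, p=2):
    return tuple(d // p for d in shape)

s = (22, 100, 100)           # frames x height x width
s = pool3d(conv3d_valid(s))  # after Conv3D(32) + pool -> (10, 49, 49)
s = pool3d(conv3d_valid(s))  # after Conv3D(64) + pool -> (4, 23, 23)
s = pool3d(conv3d_valid(s))  # after Conv3D(128) + pool -> (1, 10, 10)
print(s)  # (1, 10, 10), with 128 channels alongside

# Reshape((shape[-1], shape[1]*shape[2]*shape[3])) then turns the
# (1, 10, 10, 128) feature volume into (128, 100): the 128 channels
# become the LSTM's timesteps, the 1*10*10 positions its features.
x = np.zeros((1, *s, 128))            # dummy batch of one volume
seq = x.reshape(1, 128, 1 * 10 * 10)  # what Reshape((128, 100)) yields
print(seq.shape)  # (1, 128, 100)
```

So the LSTM receives a sequence of 128 timesteps with 100 features each. Note that the video's temporal dimension (22 frames) has already been pooled down to 1 by this point, so the "sequence" the LSTM sees runs over channels, not frames.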

However, it returns the following error:

   11 model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=2))
     12 
---> 13 shape = model.get_output_shape_at(0)
     14 model.add(Reshape((shape[-1],shape[1]*shape[2]*shape[3])))
     15 
RuntimeError: The layer sequential_2 has never been called and thus has no defined output shape.

From my understanding, the author of the code was trying to get the output shape up to that point and reshape it so it could be forwarded to the LSTM layer. I found a similar post, following which I made the change below, and the error was fixed.


shape = model.layers[-1].output_shape
# shape = model.get_output_shape_at(0)

I am still confused about what this code does to forward the output of the CNN layers to the LSTM layer. Any help in understanding the above is appreciated. Thank you!


Solution:

As the code runs from top to bottom, the inputs flow through the graph from top to bottom. You are getting this error because you can't call this function in eager mode, and TensorFlow 2.x runs eagerly by default; once you have fit the model and trained it for at least one epoch, you can use model.get_output_at(0). Otherwise, use model.layers[-1].output.


The CNN layers will extract features locally, and the LSTM will then extract and learn from those features sequentially. Using Conv together with an LSTM is a good approach, but I would recommend using tf.keras.layers.ConvLSTM3D directly. Check it here: https://www.tensorflow.org/api_docs/python/tf/keras/layers/ConvLSTM3D

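One caveat about the Reshape((shape[-1], shape[1]*shape[2]*shape[3])) trick used below: Keras's Reshape only reorders the flattened values, it does not transpose axes, so each resulting "timestep" is not a clean per-channel slice. A toy NumPy sketch (my illustration, with made-up dimensions) shows the difference from an explicit axis move:

```python
import numpy as np

# Toy feature volume: (batch, time, height, width, channels)
x = np.arange(1 * 2 * 3 * 3 * 4).reshape(1, 2, 3, 3, 4)

# What Reshape((channels, t*h*w)) does: a flat, order-preserving reshape.
flat = x.reshape(1, 4, 2 * 3 * 3)

# An explicit channels-to-front transpose, then flatten per channel.
moved = np.moveaxis(x, -1, 1).reshape(1, 4, 2 * 3 * 3)

# The two disagree: the plain reshape mixes channels within a "timestep".
print(np.array_equal(flat, moved))  # False
```

If the intent is really one timestep per channel, a transpose (e.g. a Permute layer) would be needed before the Reshape; as written, the model still trains, but the LSTM's timesteps are an arbitrary regrouping of the flattened volume.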

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv3D, MaxPooling3D, Reshape,
                                     LSTM, Dropout, Flatten, Dense)

tf.keras.backend.clear_session()
model = Sequential()

# 1st layer group
model.add(Conv3D(32, (3, 3, 3), strides = 1, input_shape=(22, 100, 100, 1), activation='relu', padding='valid'))
model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=2))

model.add(Conv3D(64, (3, 3, 3), activation='relu', strides=1))
model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=2))

model.add(Conv3D(128, (3, 3, 3), activation='relu', strides=1))
model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=2))
shape = model.layers[-1].output_shape
model.add(Reshape((shape[-1],shape[1]*shape[2]*shape[3])))

# LSTMS - Recurrent Network Layer
model.add(LSTM(32, return_sequences=True))
model.add(Dropout(.5))

model.add(Flatten())

# FC layers group
model.add(Dense(2048, activation='relu'))
model.add(Dropout(.5))
model.add(Dense(1024, activation='relu'))
model.add(Dropout(.5))

model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])
model.summary()

