Understanding LSTM (with a Keras Code Example)

Background: I have recently been using a CRNN for OCR; the R in CRNN stands for the recurrent part, implemented here as an LSTM. After years of avoiding LSTMs ever since I started doing deep learning, there was no dodging them this time, so I decided to finally understand them properly.

1. My earlier misconception: when the tensor fed into an LSTM has shape [None, c, n], I used to believe the layer contained c cells, each processing one length-n vector. That is wrong.

Correction: the "time steps" are a virtual construct. Looking at Figure 1, it seems as though a first cell, a second cell, and a third cell all coexist as equals; in reality they are the same cell shown at different moments in time. You can think of it like this: a cell is a door, and the data are the people walking through it. There is only one door, but there can be many people.

Figure 1: The classic LSTM diagram
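To make the "one door, many people" picture concrete, here is a minimal NumPy sketch of the LSTM recurrence. The shapes match the example in point 2 below; the gate ordering, initialization, and variable names are illustrative assumptions, not Keras internals:

```python
import numpy as np

n_in, n_hid, T = 64, 256, 32   # input dim, hidden dim, number of time steps

# ONE set of weights -- the single "door" -- packed for all four gates.
W = 0.01 * np.random.randn(4 * n_hid, n_in + n_hid)
b = np.zeros(4 * n_hid)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

h = np.zeros(n_hid)   # hidden state: all zeros at t0 (see point 3)
c = np.zeros(n_hid)   # cell state: all zeros at t0
xs = np.random.randn(T, n_in)

for x_t in xs:   # T inputs pass, one after another, through the SAME weights
    z = W @ np.concatenate([x_t, h]) + b
    i, f, g, o = np.split(z, 4)                   # input/forget/candidate/output gates
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update the cell state
    h = sigmoid(o) * np.tanh(c)                   # update the hidden state
```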

2. Supporting evidence for the claim above: counting LSTM parameters

Suppose the last dimension of the input is 64 and the last dimension of the output is 256.

Calculation:

4 * (256 * (64 + 256) + 256) = 328704

4: the four gates (forget, input, candidate, output)

256 and 64: the output and input dimensions

(64 + 256): the previous step's output concatenated with the current time step's input

the final 256: the bias terms

Note that the sequence length appears nowhere in this count: the same weights are reused at every time step, which is exactly why there is only one cell.
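This is easy to verify in Keras (a minimal sketch; the sequence length of 10 is arbitrary, and changing it leaves the parameter count unchanged, which is the whole point):

```python
from keras.layers import Input, LSTM
from keras.models import Model

x = Input(shape=(10, 64))   # 10 time steps (arbitrary), 64 input features
y = LSTM(256)(x)
Model(x, y).summary()       # the LSTM layer reports 328,704 parameters
```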

3. A small question: at time t0, what is the output of the "previous" time step?

By default, all zeros (this applies to both the hidden state and the cell state).
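If zeros are not what you want, the default can be overridden by passing initial_state to the Keras layer (a minimal sketch):

```python
from keras.layers import Input, LSTM

x = Input(shape=(None, 64))
h0 = Input(shape=(256,))                    # custom initial hidden state
c0 = Input(shape=(256,))                    # custom initial cell state
y = LSTM(256)(x, initial_state=[h0, c0])    # replaces the all-zero default
```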

4. A CRNN model in Keras:

```python
from keras import backend as K
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Input, Dense, Activation
from keras.layers import Reshape, Lambda, BatchNormalization
from keras.layers import add, concatenate
from keras.layers import LSTM
from keras.models import Model
# from parameter import *
K.set_learning_phase(0)  # 0 = inference-mode behavior for layers such as BatchNormalization

# Loss and train functions, network architecture
def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    # the 2 is critical here since the first couple outputs of the RNN
    # tend to be garbage:
    y_pred = y_pred[:, 2:, :]
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
max_text_len = 7000
num_classes = 7000
# img_w, img_h = 
def get_Model(training):
    input_shape = (128, 64, 1)     # fixed input size; the hard-coded Reshape below requires it

    # Build the network
    inputs = Input(name='the_input', shape=input_shape, dtype='float32')  # (None, 128, 64, 1)

    # Convolution layer (VGG)
    inner = Conv2D(64, (3, 3), padding='same', name='conv1', kernel_initializer='he_normal')(inputs)  # (None, 128, 64, 64)
    inner = BatchNormalization()(inner)
    inner = Activation('relu')(inner)
    inner = MaxPooling2D(pool_size=(2, 2), name='max1')(inner)  # (None,64, 32, 64)

    inner = Conv2D(128, (3, 3), padding='same', name='conv2', kernel_initializer='he_normal')(inner)  # (None, 64, 32, 128)
    inner = BatchNormalization()(inner)
    inner = Activation('relu')(inner)
    inner = MaxPooling2D(pool_size=(2, 2), name='max2')(inner)  # (None, 32, 16, 128)

    inner = Conv2D(256, (3, 3), padding='same', name='conv3', kernel_initializer='he_normal')(inner)  # (None, 32, 16, 256)
    inner = BatchNormalization()(inner)
    inner = Activation('relu')(inner)
    inner = Conv2D(256, (3, 3), padding='same', name='conv4', kernel_initializer='he_normal')(inner)  # (None, 32, 16, 256)
    inner = BatchNormalization()(inner)
    inner = Activation('relu')(inner)
    inner = MaxPooling2D(pool_size=(1, 2), name='max3')(inner)  # (None, 32, 8, 256)

    inner = Conv2D(512, (3, 3), padding='same', name='conv5', kernel_initializer='he_normal')(inner)  # (None, 32, 8, 512)
    inner = BatchNormalization()(inner)
    inner = Activation('relu')(inner)
    inner = Conv2D(512, (3, 3), padding='same', name='conv6')(inner)  # (None, 32, 8, 512)
    inner = BatchNormalization()(inner)
    inner = Activation('relu')(inner)
    inner = MaxPooling2D(pool_size=(1, 2), name='max4')(inner)  # (None, 32, 4, 512)

    inner = Conv2D(512, (2, 2), padding='same', kernel_initializer='he_normal', name='conv7')(inner)  # (None, 32, 4, 512)
    inner = BatchNormalization()(inner)
    inner = Activation('relu')(inner)

    # CNN to RNN
    inner = Reshape(target_shape=(32, 2048), name='reshape')(inner)  # (None, 32, 2048): fold the 4x512 feature map into one vector per time step
    inner = Dense(64, activation='relu', kernel_initializer='he_normal', name='dense1')(inner)  # (None, 32, 64)

    # RNN layer
    lstm_1 = LSTM(256, return_sequences=True, kernel_initializer='he_normal', name='lstm1')(inner)  # (None, 32, 256)
    lstm_1b = LSTM(256, return_sequences=True, go_backwards=True, kernel_initializer='he_normal', name='lstm1_b')(inner)
    reversed_lstm_1b = Lambda(lambda inputTensor: K.reverse(inputTensor, axes=1))(lstm_1b)  # undo go_backwards so both directions align in time

    lstm1_merged = add([lstm_1, reversed_lstm_1b])  # (None, 32, 256)
    lstm1_merged = BatchNormalization()(lstm1_merged)
    
    lstm_2 = LSTM(256, return_sequences=True, kernel_initializer='he_normal', name='lstm2')(lstm1_merged)
    lstm_2b = LSTM(256, return_sequences=True, go_backwards=True, kernel_initializer='he_normal', name='lstm2_b')(lstm1_merged)
    reversed_lstm_2b = Lambda(lambda inputTensor: K.reverse(inputTensor, axes=1))(lstm_2b)

    lstm2_merged = concatenate([lstm_2, reversed_lstm_2b])  # (None, 32, 512)
    lstm2_merged = BatchNormalization()(lstm2_merged)

    # transforms RNN output to character activations:
    inner = Dense(num_classes, kernel_initializer='he_normal', name='dense2')(lstm2_merged)  # (None, 32, 7000)
    y_pred = Activation('softmax', name='softmax')(inner)

    labels = Input(name='the_labels', shape=[max_text_len], dtype='float32')  # (None, 7000)
    input_length = Input(name='input_length', shape=[1], dtype='int64')     # (None, 1)
    label_length = Input(name='label_length', shape=[1], dtype='int64')     # (None, 1)

    # Keras doesn't currently support loss funcs with extra parameters
    # so CTC loss is implemented in a lambda layer
    loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length]) #(None, 1)

    if training:
        return Model(inputs=[inputs, labels, input_length, label_length], outputs=loss_out)
    else:
        return Model(inputs=[inputs], outputs=y_pred)

model = get_Model(training=True)
model.summary()
```
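Because the training model's output is already the CTC loss, it is usually compiled with a dummy loss function that simply forwards y_pred. This is the standard pattern for the Lambda-layer CTC setup; the optimizer here is an arbitrary choice, not part of the original code:

```python
# The value handed to this "loss" is the CTC loss computed by the Lambda
# layer, so we return it unchanged.
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='adam')
```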

Environment:

tf=2.3
keras=2.4.3
