2019/03/31 Innovation Training ------ Street View Translation APP

面向窗外

Category: Innovation Training. Tags: OCR, deep learning, DTRN, perspective transformation

Article link: https://blog.csdn.net/m0_37663286/article/details/88936765

Innovation Training Progress Log

Team Progress
Individual Progress
- Perspective transformation
- DTRN

Team Progress

The first two weeks went smoothly. The team made steady, orderly progress, with working time holding at about 15 hours per week.
XCY: https://blog.csdn.net/u013575592/article/details/88936340
XYC: https://blog.csdn.net/weixin_44633882/article/details/88938392

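Division of Work

At the start of the project we settled the division of work for the first phase: YC and CY are responsible for reproducing CTPN and training it on the dataset to obtain a preliminary text detection model, and JY is responsible for reproducing DTRN and training it on the dataset to obtain a preliminary text recognition model.

Text Detection Progress: CTPN

Studied the Faster-RCNN and CTPN papers, discussed their details, and built the training part of the model framework in TensorFlow.

Text Recognition Progress: DTRN and Image Perspective Transformation

Studied the DTRN paper and built the training part of the model framework in TensorFlow, although the loss function has not been written yet. Applied a perspective transformation to the images to suit our dataset.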

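Individual Progress

Perspective transformation

While reproducing DTRN, I found that the datasets used in the paper annotate text with standard rectangles, and the text sits fairly regularly inside each box. Our dataset annotates text with quadrilaterals instead. Substituting a plain rectangle makes the box either too large (pulling in unwanted background) or too small (leaving part of the text out). The irregular quadrilateral therefore has to be mapped onto a regular rectangle, which a single perspective transformation achieves. The result is shown in the figure below.

[Figure: result of the perspective transformation]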
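To put a number on the "box too large" problem, the following minimal sketch compares the area of a quadrilateral annotation with the area of its axis-aligned bounding rectangle. The corner coordinates are made up for illustration and do not come from our dataset; the function names are hypothetical as well.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bounding_box(points):
    """Smallest axis-aligned rectangle (x_min, y_min, x_max, y_max) covering the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def background_fraction(points):
    """Share of the bounding rectangle that lies outside the quadrilateral."""
    x_min, y_min, x_max, y_max = bounding_box(points)
    box_area = (x_max - x_min) * (y_max - y_min)
    return 1.0 - polygon_area(points) / box_area


# Hypothetical slanted text region, corners in clockwise order.
quad = [(10, 40), (110, 10), (120, 40), (20, 70)]
print(bounding_box(quad))         # (10, 10, 120, 70)
print(background_fraction(quad))  # 0.5
```

For this made-up region, half of the bounding rectangle is background, which is the kind of noise the perspective transformation is meant to remove.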

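import cv2
import numpy as np


def regular_rectangle(image, orig, width, height):
    """
    Warp a quadrilateral text region onto an upright rectangle.

    :param image: the source image
    :param orig: list with the coordinates of the 4 corner points from the dataset;
                 orig[i][0] is read as the coordinates of corner i
    :param width: the new image's width
    :param height: the new image's height
    :return: the region converted from a quadrilateral to a rectangle
    """
    # Source corners are taken in index order 0, 1, 3, 2 so that they pair up with
    # pts2: top-left, top-right, bottom-left, bottom-right.
    pts1 = np.float32([[orig[0][0], orig[1][0], orig[3][0], orig[2][0]]])
    pts2 = np.float32([[0, 0], [width, 0], [0, height], [width, height]])

    # 3x3 matrix that maps pts1 onto pts2, then resample the image with it
    M = cv2.getPerspectiveTransform(pts1, pts2)
    dst = cv2.warpPerspective(image, M, (width, height))
    return dst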
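For readers curious about what sits behind the transform matrix, here is a rough pure-Python sketch of how a 3×3 perspective matrix can be recovered from four point correspondences: fix the bottom-right entry to 1 and solve the remaining eight unknowns as a linear system. It is an illustration of the math only, not OpenCV's implementation; all function names and the sample coordinates are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def perspective_matrix(src, dst):
    """3x3 matrix H (with H[2][2] = 1) such that H maps each src point to its dst point."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), same pattern for v
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_perspective(matrix, point):
    """Map one (x, y) point through the matrix, including the perspective divide."""
    x, y = point
    u, v, w = (row[0] * x + row[1] * y + row[2] for row in matrix)
    return u / w, v / w


# Hypothetical quadrilateral, ordered top-left, top-right, bottom-left, bottom-right.
src = [(10, 40), (110, 10), (20, 70), (130, 50)]
dst = [(0, 0), (100, 0), (0, 32), (100, 32)]
H = perspective_matrix(src, dst)
for p in src:
    print(p, "->", apply_perspective(H, p))
```

Each source corner should land on its target corner up to floating-point error; every other pixel of the region is mapped by the same matrix.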

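DTRN

DTRN is an end-to-end model: it takes an image as input and directly outputs the recognized text. Its main architecture is maxoutCNN + BiLSTM + CTC. The training steps are as follows:

Resize the input image to a height of 32.
Generate 32 × 32 sliding windows over the resized image.
Feed the sliding windows one by one into the maxoutCNN, whose structure is shown below. Maxout is an activation function that splits the feature channels into groups and outputs the maximum of each group, which effectively improves the model's fitting capacity.

[Figure: maxoutCNN architecture]

Each sliding window yields one feature map from the maxoutCNN. The windows of one image together form a feature sequence, which is fed into the BiLSTM to produce the output.
CTC is introduced when computing the loss. This part is not implemented yet and remains to be completed.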
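The first three steps above can be sketched in plain Python. This is only an illustration under assumptions of mine: the resize keeps the aspect ratio, the window stride is 4 pixels (the stride is not stated above), and maxout groups neighbouring channels. The function names are hypothetical and are not the helpers used in the project.

```python
def resized_width(orig_height, orig_width, target_height=32):
    """Width after scaling to target_height, assuming the aspect ratio is kept."""
    return max(1, round(orig_width * target_height / orig_height))


def window_offsets(width, window=32, stride=4):
    """Left x-coordinates of the sliding windows (stride is an assumed value)."""
    if width <= window:
        return [0]  # a single window; a narrower image would need padding
    return list(range(0, width - window + 1, stride))


def maxout(features, num_units):
    """Split the channel values into num_units contiguous groups, keep each group's maximum."""
    if len(features) % num_units != 0:
        raise ValueError("channel count must be divisible by num_units")
    group = len(features) // num_units
    return [max(features[i * group:(i + 1) * group]) for i in range(num_units)]


# A hypothetical 64 x 200 image becomes 32 x 100 and yields 18 windows.
width = resized_width(64, 200)
print(width, len(window_offsets(width)))          # 100 18

# Six channel values reduced to three units, two channels per group.
print(maxout([1.0, 5.0, -2.0, 3.0, 0.5, 0.0], 3))  # [5.0, 3.0, 0.5]
```

The maxout call shows why the channel counts in the network halve or quarter after each convolution: each output unit is the maximum over one group of input channels.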

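# Written against TensorFlow 1.x: tf.contrib is not available in TensorFlow 2.x.
import tensorflow as tf
import utils.generate_sliding_windows as sliding_windows  # project helper that cuts an image into windows
maxout = tf.contrib.layers.maxout
slim = tf.contrib.slim
DEBUG = False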


def train(dataset, char_table, char_table_size):
    # Placeholders for a batch of 32x32 RGB windows and their labels.
    # They are declared for later use; the loop below builds the graph per window.
    with tf.variable_scope('input'):
        input_images = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])
        labels = tf.placeholder(tf.int32, shape=[None, char_table_size])

        # Log up to 10 of the input images to TensorBoard
        tf.summary.image('images', input_images, 10)

    for index in range(len(dataset)):
        image = dataset[index][0][0]
        words = dataset[index][1]  # ground-truth text, unused until the loss is added

        window_images = sliding_windows.generate(image)

        for wi in window_images:
            print("window_image.shape\t{}".format(wi.shape))
            # Cast to float32 and add a batch dimension, then pass this window through the network
            wi = tf.expand_dims(tf.cast(wi, tf.float32), 0)
            net = maxoutCNN(wi, debug=DEBUG)
            output = BiLSTM(net, 128, debug=DEBUG)
            print("SINGLE_WINDOW LSTM OUTPUT:\t{}".format(output.shape))
            # TODO: compute the loss with tf.nn.ctc_loss() (not implemented yet)


def maxoutCNN(single_window_image, debug=False):
    # Five valid-padded convolutions, each followed by maxout over channel groups.
    # For a 32x32 input the spatial size shrinks 32 -> 24 -> 16 -> 8 -> 1 -> 1.
    conv_1 = slim.conv2d(single_window_image, 96, 9, scope='conv_1', padding='valid', reuse=tf.AUTO_REUSE)
    maxout_1 = maxout(conv_1, 48, scope='maxout_1')

    conv_2 = slim.conv2d(maxout_1, 128, 9, scope='conv_2', padding='valid', reuse=tf.AUTO_REUSE)
    maxout_2 = maxout(conv_2, 64, scope='maxout_2')

    conv_3 = slim.conv2d(maxout_2, 256, 9, scope='conv_3', padding='valid', reuse=tf.AUTO_REUSE)
    maxout_3 = maxout(conv_3, 128, scope='maxout_3')

    conv_4 = slim.conv2d(maxout_3, 512, 8, scope='conv_4', padding='valid', reuse=tf.AUTO_REUSE)
    maxout_4 = maxout(conv_4, 128, scope='maxout_4')

    conv_5 = slim.conv2d(maxout_4, 144, 1, scope='conv_5', padding='valid', reuse=tf.AUTO_REUSE)
    maxout_5 = maxout(conv_5, 36, scope='maxout_5')

    # Softmax over the 36 output channels
    softmax = slim.softmax(logits=maxout_5)

    if debug:
        print('Input shape:', single_window_image.shape)
        print('After conv_1:', conv_1.shape)
        print('After maxout_1:', maxout_1.shape)
        print('After conv_2:', conv_2.shape)
        print('After maxout_2:', maxout_2.shape)
        print('After conv_3:', conv_3.shape)
        print('After maxout_3:', maxout_3.shape)
        print('After conv_4:', conv_4.shape)
        print('After maxout_4:', maxout_4.shape)
        print('After conv_5:', conv_5.shape)
        print('After maxout_5:', maxout_5.shape)
        print('After softmax:', softmax.shape)
    return softmax


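def BiLSTM(sequence, hidden_unit_num, debug=False):
    if debug:
        print("sequence:\t{}".format(sequence))

    sequence = tf.convert_to_tensor(sequence)
    # Unpack the 4-D CNN output; the first dimension is the batch
    B, W, H, C = sequence.shape

    # Drop the batch dimension (this only works when B == 1);
    # the RNN then reads the tensor as [batch=W, time=H, features=C]
    sequence = tf.reshape(sequence, [W, H, C])

    lstm_fw_cell = tf.keras.layers.LSTMCell(hidden_unit_num)
    lstm_bw_cell = tf.keras.layers.LSTMCell(hidden_unit_num)

    # lstm_out is a (forward, backward) pair of output tensors
    lstm_out, last_state = tf.nn.bidirectional_dynamic_rnn(lstm_fw_cell, lstm_bw_cell, sequence, dtype=tf.float32)

    # Concatenate both directions along the feature axis
    lstm_out = tf.concat(lstm_out, axis=-1)

    return lstm_out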
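The training loop above leaves the CTC loss as a commented-out call. As a note for the later implementation, here is a small pure-Python sketch of what CTC computes: the probability that per-timestep symbol distributions collapse to a target label sequence, summed over all alignments with the forward algorithm. It is a teaching sketch, not the TensorFlow API; using index 0 for the blank is an arbitrary choice made here, and library implementations follow their own convention. A brute-force version is included only to cross-check the result on tiny inputs.

```python
import itertools
import math

BLANK = 0  # blank index chosen for this sketch only


def ctc_collapse(path, blank=BLANK):
    """Merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return out


def ctc_neg_log_likelihood(probs, target, blank=BLANK):
    """probs[t][k] is the probability of symbol k at time step t."""
    ext = [blank]
    for s in target:
        ext += [s, blank]  # target interleaved with blanks
    size = len(ext)
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping a blank is allowed only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new
    likelihood = alpha[size - 1] + (alpha[size - 2] if size > 1 else 0.0)
    return math.inf if likelihood == 0.0 else -math.log(likelihood)


def brute_force_likelihood(probs, target, blank=BLANK):
    """Sum the probability of every path that collapses to target (tiny inputs only)."""
    total = 0.0
    for path in itertools.product(range(len(probs[0])), repeat=len(probs)):
        if ctc_collapse(path, blank) == list(target):
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return total


# Made-up distributions: 3 time steps, symbols {blank, 1, 2}.
probs = [[0.1, 0.6, 0.3], [0.2, 0.3, 0.5], [0.3, 0.2, 0.5]]
print(ctc_neg_log_likelihood(probs, [1, 2]))
print(-math.log(brute_force_likelihood(probs, [1, 2])))
```

The two printed values should agree up to rounding. In the real model the per-timestep distributions would come from the BiLSTM output, one time step per sliding window.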
