TACo: A Data Augmentation Technique for Text Recognition

陈壮实的搬砖生活

Category: OCR. Tags: computer vision, artificial intelligence, python, data augmentation, OCR

Article link: https://blog.csdn.net/qq_41915623/article/details/125455631

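1. Introduction

TACo is a data augmentation technique that corrupts vertical or horizontal tiles of the original image in order to improve the model's ability to generalize. Four corruption types are supported: [random, black, white, mean]. Two corruption orientations are supported: [vertical, horizontal].

Source code: https://github.com/kartikgill/taco-box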

2. Illustration

(1) Original image:
(2) Corrupted image:

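3. Corruption steps (using vertical orientation and the random type as the example)

Step1: First check that the input image is a 2-D grayscale image, because only 2-D grayscale images are corrupted;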

        if len(image.shape) < 2 or len(image.shape) > 3:    # make sure the input is a 2-D grayscale image
            raise Exception("Input image with Invalid Shape!")

        if len(image.shape) == 3:
            raise Exception("Only Gray Scale Images are supported!")
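
Because the check above rejects 3-D arrays, a color image has to be converted before it reaches the augmenter. The following is a minimal sketch of one way to do that, assuming an H x W x 3 array in RGB channel order. The helper name `to_gray` and the choice of BT.601 luma weights are assumptions made for illustration; they are not part of taco-box.

```python
import numpy as np

def to_gray(image):
    """Return a 2-D grayscale array (hypothetical helper, not part of taco-box)."""
    image = np.asarray(image)
    if image.ndim == 2:
        return image                                  # already grayscale
    if image.ndim == 3 and image.shape[2] == 3:
        # BT.601 luma weights; this assumes RGB channel order
        weights = np.array([0.299, 0.587, 0.114])
        return image @ weights                        # weighted channel sum, shape (H, W)
    raise ValueError(f"unsupported image shape: {image.shape}")

rgb = np.zeros((32, 128, 3), dtype=np.uint8)
gray = to_gray(rgb)
print(gray.shape)   # (32, 128)
```

The result of the weighted sum is a float array, so cast it back to `uint8` if the rest of the pipeline expects that dtype.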

Step2: Then pick a random integer between the preset minimum and maximum tile widths and use it as the tile width;

        if orientation == 'vertical':
            tiles = []
            start = 0
            tile_width = random.randint(min_tw, max_tw)
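
Two details are easy to miss here: `random.randint` includes both endpoints, and the width is drawn once per call, so all tiles of one image share the same width (apart from a possibly narrower last tile). The sketch below shows what that implies for the tile count, assuming the tiles span the full image width. The image width is a made-up value for illustration.

```python
import math
import random

min_tw, max_tw = 20, 100      # the class defaults for vertical tiles
img_w = 256                   # hypothetical image width in pixels

tile_width = random.randint(min_tw, max_tw)    # both 20 and 100 are possible values
n_tiles = math.ceil(img_w / tile_width)        # tiles needed to span the full width

print(tile_width, n_tiles)
```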

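Step3: Next slice the original image into tiles of that width, and decide for each tile whether to corrupt it according to the preset corruption probability;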

            while start < img_w:    # cover every column, including the last one
                tile = image[:, start:start + tile_width]   # slicing clamps at the image edge
                if random.random() <= self.corruption_probability_vertical:     # corrupt the tile if the random number is <= the preset probability
                    tile = self._corrupted_tile(tile, corruption_type)
                tiles.append(tile)
                start = start + tile_width
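
To make the slicing concrete, the sketch below lists the column range and the corrupt-or-keep decision for each tile. The helper `plan_tiles` is hypothetical and assumes the tiles should cover every column of the image; it only plans the tiles and does not touch pixel data.

```python
import random

def plan_tiles(img_w, tile_width, probability, rng=random):
    """Return (start, end, corrupt) per tile; hypothetical helper for illustration."""
    plan = []
    start = 0
    while start < img_w:
        end = min(start + tile_width, img_w)       # the last tile may be narrower
        corrupt = rng.random() <= probability      # independent decision per tile
        plan.append((start, end, corrupt))
        start = end
    return plan

for start, end, corrupt in plan_tiles(img_w=100, tile_width=30, probability=0.25):
    print(start, end, corrupt)
```

With a width of 100 and a tile width of 30, the ranges are 0-30, 30-60, 60-90 and 90-100; which tiles are flagged changes from run to run.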

Step4: Concatenate the tiles and return the composite image (that is, the augmented image).

       augmented_image = np.hstack(tiles)
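
Putting the four steps together, here is a self-contained sketch of the vertical case as a plain function. It is not the taco-box implementation: the function name `vertical_taco`, the use of `numpy.random.Generator`, and the cast back to the input dtype are all assumptions made for this example.

```python
import numpy as np

def vertical_taco(image, min_tw=20, max_tw=100, probability=0.25, rng=None):
    """Corrupt random vertical tiles of a 2-D grayscale image with uniform noise."""
    if image.ndim != 2:                                        # Step 1: validate input
        raise ValueError("only 2-D grayscale images are supported")
    rng = np.random.default_rng() if rng is None else rng
    img_w = image.shape[1]
    tile_width = int(rng.integers(min_tw, max_tw + 1))         # Step 2: inclusive upper bound
    tiles = []
    for start in range(0, img_w, tile_width):                  # Step 3: slice and maybe corrupt
        tile = image[:, start:start + tile_width]
        if rng.random() <= probability:
            tile = rng.uniform(0, 255, size=tile.shape).astype(image.dtype)
        tiles.append(tile)
    return np.hstack(tiles)                                    # Step 4: stitch tiles back together

image = np.full((32, 256), 255, dtype=np.uint8)
augmented = vertical_taco(image, rng=np.random.default_rng(0))
print(augmented.shape, augmented.dtype)   # (32, 256) uint8
```

Passing a seeded generator makes the augmentation reproducible, which can help when debugging a training pipeline.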

4. Source code

import matplotlib.pyplot as plt
import random
import numpy as np


class Taco:
    def __init__(self,
                cp_vertical=0.25,
                cp_horizontal=0.25,
                max_tw_vertical=100,
                min_tw_vertical=20,
                max_tw_horizontal=50,
                min_tw_horizontal=10
                ):
        """
        -: Creating Taco object and setting up parameters:-

        -------Arguments--------
        :cp_vertical:        corruption probability of vertical tiles
        :cp_horizontal:      corruption probability for horizontal tiles
        :max_tw_vertical:    maximum possible tile width for vertical tiles in pixels
        :min_tw_vertical:    minimum tile width for vertical tiles in pixels
        :max_tw_horizontal:  maximum possible tile width for horizontal tiles in pixels
        :min_tw_horizontal:  minimum tile width for horizontal tiles in pixels

        """
        self.corruption_probability_vertical = cp_vertical
        self.corruption_probability_horizontal = cp_horizontal
        self.max_tile_width_vertical = max_tw_vertical
        self.min_tile_width_vertical = min_tw_vertical
        self.max_tile_width_horizontal = max_tw_horizontal
        self.min_tile_width_horizontal = min_tw_horizontal

    def apply_vertical_taco(self, image, corruption_type='random'):
        """
        Only applies taco augmentations in vertical direction.
        Default corruption type is 'random', other supported types are [black, white, mean].

        -------Arguments-------
        :image:            A gray scaled input image that needs to be augmented.
        :corruption_type:  Type of corruption needs to be applied [one of- black, white, random or mean]

        -------Returns--------
        A TACO augmented image.

        """
        if len(image.shape) < 2 or len(image.shape) > 3:    # make sure the input is a 2-D grayscale image
            raise Exception("Input image with Invalid Shape!")

        if len(image.shape) == 3:
            raise Exception("Only Gray Scale Images are supported!")

        img_h, img_w = image.shape[0], image.shape[1]

        image = self._do_taco(image, img_h, img_w,
                                        self.min_tile_width_vertical,
                                        self.max_tile_width_vertical,
                                        orientation='vertical',
                                        corruption_type=corruption_type)

        return image

    def apply_horizontal_taco(self, image, corruption_type='random'):
        """
        Only applies taco augmentations in horizontal direction.
        Default corruption type is 'random', other supported types are [black, white, mean].

        -------Arguments-------
        :image:            A gray scaled input image that needs to be augmented.
        :corruption_type:  Type of corruption needs to be applied [one of- black, white, random or mean]

        -------Returns--------
        A TACO augmented image.

        """
        if len(image.shape) < 2 or len(image.shape) > 3:
            raise Exception("Input image with Invalid Shape!")

        if len(image.shape) == 3:
            raise Exception("Only Gray Scale Images are supported!")

        img_h, img_w = image.shape[0], image.shape[1]

        image = self._do_taco(image, img_h, img_w,
                                        self.min_tile_width_horizontal,
                                        self.max_tile_width_horizontal,
                                        orientation='horizontal',
                                        corruption_type=corruption_type)

        return image

    def apply_taco(self, image, corruption_type='random'):
        """
        Applies taco augmentations in both directions (vertical and horizontal).
        Default corruption type is 'random', other supported types are [black, white, mean].

        -------Arguments-------
        :image:            A gray scaled input image that needs to be augmented.
        :corruption_type:  Type of corruption needs to be applied [one of- black, white, random or mean]

        -------Returns--------
        A TACO augmented image.

        """
        image = self.apply_vertical_taco(image, corruption_type)
        image = self.apply_horizontal_taco(image, corruption_type)

        return image

    def visualize(self, image, title='example_image'):
        """
        A function to display images with given title.
        """
        plt.figure(figsize=(5, 2))
        plt.imshow(image, cmap='gray')
        plt.title(title)
        plt.tight_layout()
        plt.show()

    def _do_taco(self, image, img_h, img_w, min_tw, max_tw, orientation, corruption_type):
        """
        apply taco algorithm on image and return augmented image.
        """
        if orientation =='vertical':
            tiles = []
            start = 0
            tile_width = random.randint(min_tw, max_tw)
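            while start < img_w:    # cover every column, including the last one
                tile = image[:, start:start + tile_width]   # slicing clamps at the image edge
                if random.random() <= self.corruption_probability_vertical:     # corrupt the tile if the random number is <= the preset probability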
                    tile = self._corrupted_tile(tile, corruption_type)
                tiles.append(tile)
                start = start + tile_width
            augmented_image = np.hstack(tiles)
        else:
            tiles = []
            start = 0
            tile_width = random.randint(min_tw, max_tw)
            while start < img_h:    # cover every row, including the last one
                tile = image[start:start + tile_width, :]   # slicing clamps at the image edge
                if random.random() <= self.corruption_probability_horizontal:   # use the horizontal probability for horizontal tiles
                    tile = self._corrupted_tile(tile, corruption_type)
                tiles.append(tile)
                start = start + tile_width
            augmented_image = np.vstack(tiles)
        return augmented_image

    def _corrupted_tile(self, tile, corruption_type):
        """
        Return a corrupted tile with given shape and corruption type.
        """
        tile_shape = tile.shape
        if corruption_type == 'random':
            corrupted_tile = np.random.random(tile_shape)*255
        elif corruption_type == 'white':
            corrupted_tile = np.ones(tile_shape)*255
        elif corruption_type == 'black':
            corrupted_tile = np.zeros(tile_shape)
        elif corruption_type == 'mean':
            corrupted_tile = np.ones(tile_shape)*np.mean(tile)
        else:
            # fail with a clear message instead of an unbound local variable
            raise ValueError("Unsupported corruption type: %s" % corruption_type)
        return corrupted_tile
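
One thing worth knowing about `_corrupted_tile`: `np.random.random`, `np.ones` and `np.zeros` return `float64` arrays by default, so stacking a corrupted tile next to `uint8` tiles promotes the whole augmented image to `float64`. If the original dtype matters downstream, a variant along the following lines may help. This is only a sketch; `corrupt_tile` and its dtype cast are assumptions, not part of taco-box.

```python
import numpy as np

def corrupt_tile(tile, corruption_type="random", rng=None):
    """Return a tile of the same shape and dtype, filled according to corruption_type."""
    rng = np.random.default_rng() if rng is None else rng
    if corruption_type == "random":
        filled = rng.uniform(0, 255, size=tile.shape)     # uniform noise in [0, 255)
    elif corruption_type == "white":
        filled = np.full(tile.shape, 255.0)
    elif corruption_type == "black":
        filled = np.zeros(tile.shape)
    elif corruption_type == "mean":
        filled = np.full(tile.shape, tile.mean())         # flat tile at the mean intensity
    else:
        raise ValueError(f"unknown corruption type: {corruption_type!r}")
    return filled.astype(tile.dtype)                      # keep the caller's dtype, e.g. uint8

tile = np.array([[0, 100], [200, 100]], dtype=np.uint8)
print(corrupt_tile(tile, "mean"))
```

Casting floats to an integer dtype truncates toward zero, which is usually acceptable for noise tiles but slightly biases the 'mean' fill downward.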
