Image Augmentation for Character Recognition

Latest recommended article published on 2024-07-31 11:32:55

猫猫与橙子

Category: ocr. Tags: data augmentation, text recognition

Article link: https://blog.csdn.net/qq_22764813/article/details/107663601

Common image augmentations for character recognition:

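1. Padding + crop

Purpose: pad the image, then crop it at a random position. This reduces the impact of unstable results from the detection model, and of shifts in where the text sits within the whole image.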
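
The post gives no code for this step. Below is a minimal sketch that assumes only NumPy. The function name `pad_and_random_crop`, the `pad` and `rng` parameters, and the choice of edge replication for the padding are all assumptions made for illustration, not the author's implementation:

```python
import numpy as np


def pad_and_random_crop(im, pad=4, rng=None):
    """Pad by replicating edges, then crop back to the original size at a random offset."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = im.shape[:2]
    # Pad only the two spatial axes; a channel axis (if present) is left alone.
    pad_width = [(pad, pad), (pad, pad)] + [(0, 0)] * (im.ndim - 2)
    padded = np.pad(im, pad_width, mode="edge")
    # Offsets cover [0, 2 * pad], so the content moves by at most `pad` pixels.
    top = int(rng.integers(0, 2 * pad + 1))
    left = int(rng.integers(0, 2 * pad + 1))
    return padded[top:top + h, left:left + w]
```

The output keeps the input's shape and dtype, so it can be dropped in front of the other augmentations in this list. Constant padding (for example with the background color) would be an equally reasonable choice.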

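2. Brightness and contrast changes

Purpose: brightness is a relative effect that depends on lighting; contrast can be loosely described as the difference between the maximum and minimum pixel values in the image matrix. Varying the contrast and brightness of the training images makes recognition less sensitive to lighting.

Code: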

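    # The methods in this post are excerpted from an augmentation class and
    # assume `import cv2` and `import numpy as np` at module level.
    def contrast_brightness(self, im: np.ndarray, c, b) -> np.ndarray:
        """
        Change the contrast and brightness of a color image.
        :param im: input color image (H x W x C)
        :param c: contrast gain
        :param b: brightness offset; the larger b is, the brighter the result
        :return: the adjusted color image
        """
        # The channel count must not be unpacked into `c`: the original did so
        # and overwrote the contrast argument with the number of channels.
        h, w, channels = im.shape
        blank = np.zeros([h, w, channels], im.dtype)
        # dst = im * c + blank * (1 - c) + b, saturated to the dtype range
        dst = cv2.addWeighted(im, c, blank, 1 - c, b)
        return dst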

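3. Histogram equalization

Purpose: histogram equalization is a simple and effective enhancement technique. It changes the gray level of each pixel by reshaping the image histogram, and is mainly used to raise the contrast of images with a narrow dynamic range. In such images the gray levels are concentrated in a small interval, which makes the image look unclear.

(The method can also be applied to color images.)

Code: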

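    def histogram_equalization(self, im: np.ndarray):
        """
        Per-channel histogram equalization of a color image.
        :param im: input color image (8-bit, channels in OpenCV's BGR order)
        :return: the equalized image
        """
        # For a color image, split the channels and equalize each one separately
        (b, g, r) = cv2.split(im)
        bH = cv2.equalizeHist(b)
        gH = cv2.equalizeHist(g)
        rH = cv2.equalizeHist(r)
        # Merge the channels back together
        result = cv2.merge((bH, gH, rH))
        return result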

4. Adding Gaussian noise

Purpose: increases the variety of noise-corrupted text samples.

Code:

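    # Requires: from skimage.util import random_noise
    def add_noise(self, im: np.ndarray):
        """
        Add Gaussian noise to an image.
        :param im: image array
        :return: noisy image array; random_noise returns values in [0, 1], so the result is multiplied by 255
        """
        return (random_noise(im, mode='gaussian', clip=True) * 255).astype(im.dtype)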

5. Image rotation (or affine transformation)

Purpose: adds text samples at multiple angles.

Code:

    # Requires: import math, import numbers
    def random_rotate_img(self, img, degrees, same_size=False):
        """
        Rotate the image by an angle drawn uniformly from the given range.
        :param img: image
        :param degrees: a non-negative number d (the range is [-d, d]) or a (min, max) sequence of length 2
        :param same_size: whether to keep the output the same size as the input
        :return: the rotated image
        """
        if isinstance(degrees, numbers.Number):
            if degrees < 0:
                raise ValueError("If degrees is a single number, it must be non-negative.")
            degrees = (-degrees, degrees)
        elif isinstance(degrees, (list, tuple, np.ndarray)):
            if len(degrees) != 2:
                raise ValueError("If degrees is a sequence, it must be of len 2.")
        else:
            raise TypeError('degrees must be a Number, list, tuple or np.ndarray')
        # ---------------------- rotate the image ----------------------
        w = img.shape[1]
        h = img.shape[0]
        angle = np.random.uniform(degrees[0], degrees[1])

        if same_size:
            nw = w
            nh = h
        else:
            # Degrees to radians
            rangle = np.deg2rad(angle)
            # Width and height of the rotated image's bounding box
            nw = (abs(np.sin(rangle) * h) + abs(np.cos(rangle) * w))
            nh = (abs(np.cos(rangle) * h) + abs(np.sin(rangle) * w))
        # Build the affine matrix (rotation about the centre of the new canvas)
        rot_mat = cv2.getRotationMatrix2D((nw * 0.5, nh * 0.5), angle, 1)
        # Offset from the centre of the original image to the centre of the new one
        rot_move = np.dot(rot_mat, np.array([(nw - w) * 0.5, (nh - h) * 0.5, 0]))
        # Add the offset to the translation part of the affine matrix
        rot_mat[0, 2] += rot_move[0]
        rot_mat[1, 2] += rot_move[1]
        # Apply the affine transform
        rot_img = cv2.warpAffine(img, rot_mat, (int(math.ceil(nw)), int(math.ceil(nh))), flags=cv2.INTER_LANCZOS4)
        return rot_img

6. Grayscale conversion

Purpose: for recognizing black-and-white text.

Code:

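    def to_grayImg(self, im: np.ndarray) -> np.ndarray:
        """
        Convert an image to grayscale while keeping its channel count.
        :param im: input BGR image
        :return: image whose first three channels all hold the gray values
        """
        h, w, c = im.shape
        gray_im = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
        # Copy the gray plane into each channel so the output shape matches the input
        new_img = np.ones([h, w, c], np.uint8)
        new_img[:, :, 0] = gray_im
        new_img[:, :, 1] = gray_im
        new_img[:, :, 2] = gray_im
        return new_img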

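7. Median blur, mean blur, motion blur (suited to text detection on moving targets)

(I did not use this augmentation: the training samples were already fairly blurry, so no extra blur was added.)

Code: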

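    def blur_demo(self, image):
        """
        Mean blur: works well for suppressing random noise.
        :param image: input image
        :return: blurred image
        """
        # ksize is (width, height): (1, 15) blurs vertically, (15, 1) blurs horizontally
        dst = cv2.blur(image, (1, 15))
        return dst


    def median_blur_demo(self, image):
        """
        Median blur: works well for removing salt-and-pepper noise.
        :param image: input image
        :return: blurred image
        """
        dst = cv2.medianBlur(image, 5)
        return dst


    def custom_blur_demo(self, image):
        """
        Blur with a user-defined kernel.
        :param image: input image
        :return: blurred image
        """
        # Divide by 25 so the kernel sums to 1; otherwise the output would saturate
        kernel = np.ones([5, 5], np.float32) / 25
        dst = cv2.filter2D(image, -1, kernel)
        return dst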

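8. Mixup

Purpose: text regions differ in their backgrounds, and the same characters can give different test results depending on the background. Mixing the text region with a variety of background images enriches the sample data: the text is the same, but the background changes.

Code: (the background images I chose are larger than the text images)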

    # Requires: import random
    def MixupDetection(self, img1, background, h_back, w_back, lambd=0.5):
        """
        Apply mixup between a text image and a background crop.
        :param img1: text image (3 channels)
        :param background: background image, assumed at least as large as img1
        :param h_back: height of the background image
        :param w_back: width of the background image
        :param lambd: mixup weight given to the background
        :return: the mixed image
        """
        h_txt, w_txt, c_txt = img1.shape

        h_len = h_back - h_txt
        w_len = w_back - w_txt

        # Pick a random top-left corner for the background crop
        if h_len > 2:
            h_clip = random.randint(0, h_len - 1)
        else:
            h_clip = 0
        if w_len > 2:
            w_clip = random.randint(0, w_len - 1)
        else:
            w_clip = 0

        # Crop a background patch the same size as the text image
        back_img = background[h_clip:h_clip + h_txt, w_clip:w_clip + w_txt, :]
        # Blend in float32, then convert back to uint8
        mix_img = np.zeros([h_txt, w_txt, 3], np.float32)
        mix_img[:, :, :] = back_img.astype(np.float32) * lambd
        mix_img[:, :, :] += img1.astype(np.float32) * (1. - lambd)
        mix_img = mix_img.astype(np.uint8)

        return mix_img

9. Perspective transform (code not written)

10. Color jitter (jitter)

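import random

def jitter(img):
    """
    Jitter: shift the image content diagonally by a few pixels (modifies img in place).
    """
    h, w, _ = img.shape  # a NumPy image is (height, width, channels)
    if h > 10 and w > 10:
        thres = min(w, h)
        # Shift by at most about 1% of the shorter side
        s = int(random.random() * thres * 0.01)
        src_img = img.copy()
        for i in range(s):
            img[i:, i:, :] = src_img[:h - i, :w - i, :]
        return img
    else:
        return img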

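11. Color inversion

# new_img is the uint8 image produced by the preceding steps
tp = random.randint(1, 100)
if tp >= 50:  # invert roughly half of the samples
    new_img = 255 - new_img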

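12. Color shift

def flag():
    """
    Random sign, +1 or -1. The post calls flag() without defining it,
    so this definition is an assumption.
    """
    return 1 if random.random() > 0.5 else -1

def cvtColor(img):
    """
    Randomly scale the V (brightness) channel in HSV space by a very small factor.
    """
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    delta = 0.001 * random.random() * flag()
    hsv[:, :, 2] = hsv[:, :, 2] * (1 + delta)
    new_img = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
    return new_img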

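13. Adding noise

def add_gasuss_noise(image, mean=0, var=0.1):
    """
    Add Gaussian noise with the given mean and variance.
    """
    # The standard deviation is sqrt(var); with the defaults the noise is
    # small on a 0-255 scale
    noise = np.random.normal(mean, var**0.5, image.shape)
    out = image + 0.5 * noise
    # Clamp to the valid range before converting back to uint8
    out = np.clip(out, 0, 255)
    out = np.uint8(out)
    return out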

Other reference links:

Some tips for Chinese OCR training and prediction