Text segmentation with Python + OpenCV (horizontal layout: receipt text / vertical layout: classical Chinese text)

Latest recommended article published on 2024-03-30 11:31:49

告白少年

Category: Image Enhancement. Tags: python, computer vision, opencv

Article link: https://blog.csdn.net/qq_43555843/article/details/117412056

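There are two common ways to segment the text in an image. The first is the projection method, which suits neatly typeset images with generous spacing between characters and between lines. The second is OpenCV contour detection, which suits images where the text is arranged irregularly.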

Projection method

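Project the text image horizontally and vertically, that is, count the foreground pixels in every row and in every column, and use those counts to split the text.
The counting is done on the preprocessed (binarized) image, once in the horizontal direction and once in the vertical direction. A binarized image contains only black and white pixels, so we count either the white points or the black points. A row or column with a count near zero is a gap, so the counts reveal the top and bottom boundary of each line and the left and right boundary of each column, which is all we need to segment the text.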
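A tiny example makes the two projections concrete. This is only a sketch on a synthetic 8x10 image, assuming white (255) pixels are the text; the names `row_profile` and `col_profile` are chosen here for illustration.

```python
import numpy as np

# Synthetic binary image (assumption): 255 = text, 0 = background.
binary = np.zeros((8, 10), dtype=np.uint8)
binary[1:3, 2:8] = 255   # first "text line" occupies rows 1-2
binary[5:7, 1:5] = 255   # second "text line" occupies rows 5-6

foreground = binary == 255
row_profile = foreground.sum(axis=1)  # horizontal projection: one count per row
col_profile = foreground.sum(axis=0)  # vertical projection: one count per column

print(row_profile.tolist())  # [0, 6, 6, 0, 0, 4, 4, 0]
print(col_profile.tolist())  # [0, 2, 4, 4, 4, 2, 2, 2, 0, 0]
```

The zeros in `row_profile` mark the blank rows between the two lines, and those are the places where the image can be cut.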

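Algorithm steps:

Binarize the original image; the segmentation uses horizontal and vertical projection, splitting out each line and each block according to the size of the projected regions.
Before projecting, adjust the image with a morphological dilation.
Compute the horizontal projection and the vertical projection separately.
Derive the complete line and block regions from the length and height of the projections.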
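The last two steps can be sketched without OpenCV. The helpers below, `find_runs` and `segment_blocks`, are hypothetical names introduced here, not part of the article's code. They assume an already binarized image with white (255) text, and they skip the dilation step, which the article's code performs with `cv2.dilate`.

```python
import numpy as np

def find_runs(profile, min_count=1, min_length=1):
    """Return (start, end) pairs, end exclusive, where profile >= min_count."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_count and start is None:
            start = i                      # entering a text region
        elif value < min_count and start is not None:
            if i - start >= min_length:    # drop runs that are too short
                runs.append((start, i))
            start = None                   # leaving the text region
    if start is not None and len(profile) - start >= min_length:
        runs.append((start, len(profile)))  # region touches the image edge
    return runs

def segment_blocks(binary, min_count=1, min_length=1):
    """Split a binary image (255 = text) into (top, bottom, left, right) boxes."""
    foreground = binary == 255
    boxes = []
    # Horizontal projection gives the lines ...
    for top, bottom in find_runs(foreground.sum(axis=1), min_count, min_length):
        band = foreground[top:bottom]
        # ... and a vertical projection inside each line gives the blocks.
        for left, right in find_runs(band.sum(axis=0), min_count, min_length):
            boxes.append((top, bottom, left, right))
    return boxes

# Synthetic test image: two blocks on the first line, one on the second.
binary = np.zeros((8, 12), dtype=np.uint8)
binary[1:3, 1:4] = 255
binary[1:3, 6:10] = 255
binary[5:7, 2:9] = 255
boxes = segment_blocks(binary)
print(boxes)  # [(1, 3, 1, 4), (1, 3, 6, 10), (5, 7, 2, 9)]
```

On real scans `min_count` and `min_length` would need tuning against noise, in the same spirit as the fixed thresholds used in the receipt code.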

Horizontal text: receipt text segmentation

# Horizontal segmentation of a receipt
import cv2
import numpy as np

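img = cv2.imread(r"C:\Users\An\Pictures\1.jpg")
if img is None:
    # cv2.imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError("could not read the input image")
cv2.imshow("Orig Image", img)
# Print the image size and channel information
sp = img.shape
print("Image info:", sp)
sz1 = sp[0]  # height (rows) of the image
sz2 = sp[1]  # width (columns) of the image
sz3 = sp[2]  # number of channels (3 for a BGR image)
print('width: %d \n height: %d \n channels: %d' % (sz2, sz1, sz3))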
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
retval, threshold_img = cv2.threshold(gray_img, 120, 255, cv2.THRESH_BINARY_INV)
cv2.imshow("threshold_img", threshold_img)

# Horizontal projection: count the white (foreground) pixels in each row
gray_value_x = (threshold_img == 255).sum(axis=1).tolist()
print("Row projection:", gray_value_x)
# Draw an image that visualizes the horizontal projection
hori_projection_img = np.zeros((sp[0], sp[1], 1), np.uint8)
for i in range(sz1):
    for j in range(gray_value_x[i]):
        hori_projection_img[i, j] = 255
cv2.imshow("hori_projection_img", hori_projection_img)
text_rect = []
# Detect the text lines from the horizontal projection
inline_x = 0
start_x = 0
text_rect_x = []
for i in range(len(gray_value_x)):
    if inline_x == 0 and gray_value_x[i] > 10:
        inline_x = 1
        start_x = i
    elif inline_x == 1 and gray_value_x[i] < 10 and (i - start_x) > 5:
        inline_x = 0
        if i - start_x > 10:
            rect = [start_x - 1, i + 1]
            text_rect_x.append(rect)
print("Line regions, start and end Y of each line:", text_rect_x)
# Split each line into segments
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 3))
dilate_img = cv2.dilate(threshold_img