Recognizing text, digits and more in images with Python

戒酒的李白-Lisage

Last modified 2022-05-05 22:53:03


First published 2022-01-18 15:00:52

Article link: https://blog.csdn.net/Ljj9889/article/details/122542498


Today I'd like to share a common everyday task: recognizing the data in images and PDF documents, mainly text and digits.

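Two third-party libraries are needed: pyocr and cnocr. (In the script below the recognition itself is done by cnocr alone; pyocr is never called. The script also relies on opencv-python for cv2 and Pillow for PIL.)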

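A big pitfall here: installing these two libraries took me a long detour. If it wasn't the shapely package that was missing, it was some python-xxx package. I tried many fixes before finally getting it to work.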

Note: it's best to install from a mirror, because a slow download can end in a timeout (another big pitfall):

pip install pyocr -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install cnocr -i https://pypi.tuna.tsinghua.edu.cn/simple

Once everything is installed, enter the following code:

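import io
import os
import warnings

import cv2                   # opencv-python
from PIL import Image as PI  # Pillow
from cnocr import CnOcr

warnings.filterwarnings("ignore")

# Recognizer for Chinese text and digits
ocr = CnOcr()

image_src = 'images'
img_list = []

# Collect the file names directly inside image_src
for root, dirs, files in os.walk(image_src):
    img_list = files
    print(img_list)
    break  # os.walk also visits subfolders; stop after the top level

for i in img_list:
    images_url = os.path.join(image_src, i)
    img = cv2.imread(images_url)  # NumPy array of shape (H, W, 3), or None
    if img is None:
        continue  # not an image OpenCV can decode, skip it
    with open(images_url, 'rb') as fp:
        a = fp.read()  # raw bytes of the file
    new_img = PI.open(io.BytesIO(a))

    # Pixel box of the invoice number; depends on the invoice layout
    left = 600
    right = 716
    top = 53
    bottom = 77

    img_x = new_img.crop((left, top, right, bottom))
    img_x.convert("RGB").save("img.png")
    img_text = ocr.ocr("img.png")

    new_no = ''
    for j in img_text:
        if isinstance(j, dict):   # newer cnocr: {'text': ..., 'score': ...}
            new_no += j['text']
        else:                     # cnocr 2.0.x: (list of chars, prob)
            new_no += ''.join(j[0])
    print(f'Invoice number: {new_no}')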
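A note on gathering the input files: os.walk yields one (dirpath, dirnames, filenames) tuple per directory, subfolders included. A loop that reassigns the list on every iteration therefore keeps only the last directory it visits. Below is a minimal sketch of a stricter way to collect the invoices. It assumes only the top level of the folder matters and that the invoices are ordinary image files. The helper name list_images and the extension set are hypothetical choices, not part of the original script.

```python
import os

# Extensions treated as images (an assumption; adjust to your files)
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def list_images(folder):
    """Return sorted full paths of image files directly inside folder.

    Subfolders are ignored, and so are files whose extension is not in IMAGE_EXTS.
    """
    paths = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        ext = os.path.splitext(name)[1].lower()
        if os.path.isfile(path) and ext in IMAGE_EXTS:
            paths.append(path)
    return sorted(paths)
```

In the script, looping over list_images(image_src) would give full paths directly, so there is no need to join the folder name by hand. Non-image files such as notes or system files never reach cv2.imread.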

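A brief explanation:

1. Initialize a recognizer instance (CnOcr()).

2. The first for loop walks the specified directory and collects the names of the files in it into a list.

3. The second for loop iterates over this list, takes each file name in turn and reads that file. cv2 is OpenCV's Python interface, a library dedicated to image processing.

4. Reading the file with with open(..., 'rb') gives its raw bytes (a bytes object), not an array; you can print it to check. The 3-D array (height × width × channels) is what cv2.imread returns. The bytes are wrapped in io.BytesIO and opened with PIL's Image.open. Then crop cuts out the rectangle given by its left, top, right and bottom pixel coordinates (think of it as taking a screenshot of a fixed region).

5. Finally, save the cropped rectangle as an image and let the ocr method of the CnOcr instance convert it to text. Joining the recognized characters together gives the invoice number.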
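Step 5 depends on the shape of cnocr's return value. In the version whose source is quoted below, ocr() returns a list of (list of chars, prob) tuples, one per text line. As far as I know, later cnocr releases return a list of dicts with 'text' and 'score' keys instead. Here is a small sketch of the concatenation step that accepts both shapes. The helper join_ocr_lines and its min_prob filter are hypothetical additions, not part of the cnocr API.

```python
def join_ocr_lines(result, min_prob=0.0):
    """Concatenate recognized lines into one string.

    Accepts either (chars, prob) tuples (cnocr 2.0.x style) or dicts with
    'text'/'score' keys (later cnocr style). Lines whose probability is
    below min_prob are dropped.
    """
    pieces = []
    for line in result:
        if isinstance(line, dict):
            text, prob = line.get("text", ""), line.get("score", 1.0)
        else:
            chars, prob = line[0], line[1]
            text = "".join(chars)
        if prob >= min_prob:
            pieces.append(text)
    return "".join(pieces)


# Usage with made-up recognition results in the 2.0.x tuple format
print(join_ocr_lines([(['0', '1', '2'], 0.98), (['3', '4'], 0.4)], min_prob=0.5))
```

A min_prob above zero can help discard lines the model is unsure about. The right value depends on your invoices, so treat 0.5 only as a placeholder.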

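Key source code, for understanding what happens under the hood (Pillow's Image.crop and cnocr's CnOcr.ocr):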

    def crop(self, box=None):
        """
        Returns a rectangular region from this image. The box is a
        4-tuple defining the left, upper, right, and lower pixel
        coordinate. See :ref:`coordinate-system`.

        Note: Prior to Pillow 3.4.0, this was a lazy operation.

        :param box: The crop rectangle, as a (left, upper, right, lower)-tuple.
        :rtype: :py:class:`~PIL.Image.Image`
        :returns: An :py:class:`~PIL.Image.Image` object.
        """

        if box is None:
            return self.copy()

        self.load()
        return self._new(self._crop(self.im, box))
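As the docstring says, box is (left, upper, right, lower) with the origin at the top-left corner. The right and lower edges are exclusive, so the cropped image is (right - left) by (lower - upper) pixels. For the box used in the script, that is 716 - 600 = 116 pixels wide and 77 - 53 = 24 pixels high. The pure-Python sketch below mimics this slicing on a list of pixel rows, purely to illustrate the coordinate convention. It is a simplification with hypothetical helper names. It raises an error for boxes outside the image, whereas Pillow handles that case itself.

```python
def crop_rows(pixels, box):
    """Return the sub-grid covered by box = (left, upper, right, lower).

    pixels is a list of rows (pixels[y][x]); right and lower are exclusive,
    matching the coordinate convention of PIL.Image.crop.
    """
    left, upper, right, lower = box
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    if not (0 <= left <= right <= width and 0 <= upper <= lower <= height):
        raise ValueError(f"box {box} is outside the {width}x{height} image")
    return [row[left:right] for row in pixels[upper:lower]]


def crop_size(box):
    """Width and height of the crop, i.e. (right - left, lower - upper)."""
    left, upper, right, lower = box
    return right - left, lower - upper


print(crop_size((600, 53, 716, 77)))
```

Because the edges are exclusive, two boxes that share an edge value (for example right=300 and left=300) tile an image without overlapping.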

    def ocr(
        self, img_fp: Union[str, Path, torch.Tensor, np.ndarray]
    ) -> List[Tuple[List[str], float]]:
        """
        Recognition function.

        Args:
            img_fp (Union[str, Path, torch.Tensor, np.ndarray]): image file path; or color image torch.Tensor or np.ndarray,
                with shape [height, width] or [height, width, channel].
                channel should be 1 (gray image) or 3 (RGB formatted color image). scaled in [0, 255].

        Returns:
            list of (list of chars, prob), such as
            [(['第', '一', '行'], 0.80), (['第', '二', '行'], 0.75), (['第', '三', '行'], 0.9)]
        """
        img = self._prepare_img(img_fp)

        if min(img.shape[0], img.shape[1]) < 2:
            return []
        if img.mean() < 145:  # invert white-on-black images to black-on-white
            img = 255 - img
        line_imgs = line_split(np.squeeze(img, axis=-1), blank=True)
        line_img_list = [np.expand_dims(line_img, axis=-1) for line_img, _ in line_imgs]
        line_chars_list = self.ocr_for_single_lines(line_img_list)
        return line_chars_list
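Two preprocessing steps in CnOcr.ocr are easy to miss. First, if the mean gray level is below 145, the image is treated as light text on a dark background and inverted with 255 - img. Second, line_split cuts the image into single text lines before recognition. The sketch below illustrates both ideas on a grayscale image stored as a list of rows. It is a simplified stand-in, not cnocr's actual line_split. It assumes (hypothetically) that a row counts as blank when every pixel is brighter than blank_threshold, and it returns the row range of each text line.

```python
def invert_if_dark(gray, mean_threshold=145):
    """Invert a grayscale image (0-255) when its mean is below mean_threshold.

    This turns light-on-dark text into dark-on-light, mirroring the check in
    CnOcr.ocr. A new list of rows is always returned.
    """
    values = [v for row in gray for v in row]
    if values and sum(values) / len(values) < mean_threshold:
        return [[255 - v for v in row] for row in gray]
    return [list(row) for row in gray]


def split_lines(gray, blank_threshold=200):
    """Return (start, end) row ranges (end exclusive) of text lines.

    A row is blank when every pixel is brighter than blank_threshold;
    consecutive non-blank rows form one text line.
    """
    ranges = []
    start = None
    for y, row in enumerate(gray):
        has_ink = any(v <= blank_threshold for v in row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            ranges.append((start, y))
            start = None
    if start is not None:
        ranges.append((start, len(gray)))
    return ranges
```

The order matters. Splitting on blank rows only works once the background is light, which is why the inversion comes first.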

Here is the result: