Bank Card Recognition with PaddleOCR

wunianwn

Last modified 2023-11-07 16:12:05

Tags: python paddle

First published 2023-09-20 15:21:54

Article link: https://blog.csdn.net/wunianwn/article/details/133081529

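This article describes how to optimize the bank card recognition process, including image preprocessing (Hough-transform deskewing, yolov5 detection and cropping), resizing, model recognition (PaddleOCR), and post-processing (filtering out the card number and expiry date). Tuning parameters and using a suitable dataset improved recognition accuracy and the ability to handle images of different sizes.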

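Main features and performance

Function: recognize the card number and expiry date on a bank card
Environment: ubuntu20, paddlepaddle-gpu 2.4.2, paddleocr 2.7
Hardware: GPU: T4
Performance: about 250 ms inference per 1080P image
Accuracy: on the test set, text detection hmean is 93% and text recognition hmean is 95%; overall accuracy in real use is estimated at under 90%
Dataset: 3000 images for text detection; text recognition uses 2000 card numbers + 500 expiry dates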
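
The hmean reported for text detection is the harmonic mean of precision and recall. A minimal sketch of the calculation follows; the function name and the precision/recall values are illustrative, not the author's measurements.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values, chosen only to show the arithmetic
print(round(hmean(0.90, 0.96), 3))  # 0.929
```

Because it is a harmonic mean, the score is pulled toward the lower of the two values, so a detector cannot reach a high hmean on precision or recall alone.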

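Overall pipeline

Straighten the image first; a skewed image hurts text detection.
Bank card detection: use yolov5 to detect the card and crop out a clean, background-free card image (this raises the recognition rate and reduces background interference).
Scale the image to the size the model expects. Images of different sizes affect text detection accuracy, so bring each image to roughly the size of the training data to reduce the effect of image size.
Feed the scaled image to the model. Recognition is two-stage: detection first, then recognition; after detection, the text regions are cropped and sent to the recognition model.
Post-processing: tune the text detection parameters so text regions are detected accurately, run the recognition model to get the text, then apply rules to drop unneeded text.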
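
The five steps above can be sketched as a simple chain of functions. This is only a skeleton under stated assumptions: `recognize_card` and all of its parameter names are hypothetical, each step is passed in by the caller, and returning `None` when no card is found is a choice made here, not something the original specifies.

```python
def recognize_card(image, deskew, detect_and_crop, resize, run_ocr, postprocess):
    """Chain the five steps; every argument after `image` is a caller-supplied function."""
    image = deskew(image)            # 1. straighten the image
    card = detect_and_crop(image)    # 2. detect the card and crop away the background
    if card is None:                 # assumption: the detector returns None when no card is found
        return None
    card = resize(card)              # 3. scale to the size the model expects
    texts = run_ocr(card)            # 4. text detection followed by text recognition
    return postprocess(texts)        # 5. keep only the card number and expiry date
```

Passing the steps in as functions keeps each stage independently testable and makes it easy to skip or swap one of them.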

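Straightening the image

The idea is to use the Hough transform to correct the skew of a text image. Images rotated by more than 45° are left untouched, so the output is always close to 0°, 90°, 180° or 270°. The test images are around 1080P and the computation takes about 30 ms; for a 4K image it rises to about 800 ms, so resize the image to 1080P first.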

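import math

import cv2
import numpy as np


def HoughTransformcv2(img):
    ori_img = img.copy()
    # Shrink the image to speed up computation. Both axes use the same scale factor
    # so that line angles are preserved (a fixed 400x400 resize would distort them).
    scale = 400.0 / max(img.shape[:2])
    img = cv2.resize(img, None, fx=scale, fy=scale)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150, apertureSize=3)
    # Hough transform
    lines = cv2.HoughLines(edges, 1, np.pi / 180, 0)
    if lines is None:
        # No line found: return the image unchanged
        return ori_img
    rotate_angle = 0
    # Only the first returned line is used
    for rho, theta in lines[0]:
        a = np.cos(theta)
        b = np.sin(theta)
        x0 = a * rho
        y0 = b * rho
        x1 = int(x0 + 1000 * (-b))
        y1 = int(y0 + 1000 * (a))
        x2 = int(x0 - 1000 * (-b))
        y2 = int(y0 - 1000 * (a))
        # Exactly vertical or horizontal line: nothing to correct
        if x1 == x2 or y1 == y2:
            continue
        t = float(y2 - y1) / (x2 - x1)
        rotate_angle = math.degrees(math.atan(t))
        # Leave images tilted by 45° or more alone
        if abs(rotate_angle) >= 45:
            rotate_angle = 0
    # Skip rotation for small angles: the model can read such images, and this saves time
    if abs(rotate_angle) < 10:
        return ori_img
    h, w = ori_img.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, rotate_angle, 1.0)
    rotate_img = cv2.warpAffine(ori_img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
    return rotate_img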

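[Figure: a skewed input image and its corrected output]

After correction the text is horizontal. Other cases exist, though: if the skew is between 45° and 90°, the image ends up vertical.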
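
The slope that the function computes from two points on the line reduces to a closed form: a Hough line whose normal has angle theta runs at theta minus 90°. The sketch below shows that shortcut together with the 45° cut-off. It is not the author's code, and `skew_from_theta` and `max_skew` are names chosen here for illustration.

```python
import math


def skew_from_theta(theta, max_skew=45.0):
    """Return the skew in degrees for a Hough line with normal angle theta (radians).

    Skews of max_skew degrees or more are treated as "no correction" and give 0.
    """
    angle = math.degrees(theta) - 90.0  # the line runs perpendicular to its normal
    return angle if abs(angle) < max_skew else 0.0


print(skew_from_theta(math.radians(100)))  # roughly 10 degrees of skew
print(skew_from_theta(math.radians(30)))   # 60 degrees off horizontal, so no correction
```

Working directly with theta avoids the integer rounding of the endpoint coordinates and the division-by-zero guard.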

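Bank card detection

(If you have plenty of data from your own scenario, more than 200 images, you can label it directly and skip the steps below.)
A yolov5 model detects the bank card. The training data is DAMO Academy's synthetic card dataset, which was built for training card-rectification models and contains both the large bounding box and the four corner points. Only the bounding box is needed, so a script converts the labels to YOLO format with a single class.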

import os


# Convert an (x1, y1, x2, y2) box to YOLO format: normalized center x, center y, width, height
def label_exchange(points, width, height):
    width = float(width)
    height = float(height)
    w = (points[2] - points[0]) / width
    h = (points[3] - points[1]) / height
    x = ((points[2] - points[0]) / 2 + points[0]) / width
    y = ((points[3] - points[1]) / 2 + points[1]) / height
    # Cap values that exceed the image bounds
    return [min(x, 1), min(y, 1), min(w, 1), min(h, 1)]


with open(r"SyntheticCards_train100k\labelv2.txt", 'r', encoding='utf-8') as reader:
    lines = reader.readlines()

# Group the lines into records; each record starts with a header line containing "#"
result_list = []
result = ""
for line in lines:
    if "#" in line and result:
        result_list.append(result)
        result = ""
    result += line
# Keep the last record as well (the original loop dropped it)
if result:
    result_list.append(result)

label_save_path = r"labels"
os.makedirs(label_save_path, exist_ok=True)
for item in result_list:
    file_name = ""
    try:
        data_list = item.split("\n")
        # Header fields: marker, file name, image width, image height
        _, file_name, width, height = data_list[0].split(" ")
        label_file_path = os.path.join(label_save_path, file_name.replace("data/", "").replace("jpg", "txt"))
        with open(label_file_path, 'w', encoding='utf-8') as writer:
            for data in data_list[1:]:
                if data == "":
                    continue
                # Only the first four values (the bounding box) are used
                boxes = [float(value) for value in data.split(" ")[:4]]
                labels = label_exchange(boxes, width, height)
                # Single class, so the class id is always 0
                writer.write("0 " + " ".join(str(value) for value in labels) + "\n")
    except Exception as e:
        print(file_name)
        print(e)
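
To make the conversion concrete, here is a small worked example. It is a sketch that assumes the box is given as pixel corners (x1, y1, x2, y2), as `label_exchange` above implies. `to_yolo` is a hypothetical helper that, unlike the script, clips the box to the image before normalizing.

```python
def to_yolo(x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel box (x1, y1, x2, y2) to YOLO (x_center, y_center, width, height) in [0, 1]."""
    # Clip the box to the image first so every normalized value stays within [0, 1]
    x1, x2 = max(0.0, min(x1, img_w)), max(0.0, min(x2, img_w))
    y1, y2 = max(0.0, min(y1, img_h)), max(0.0, min(y2, img_h))
    return (
        (x1 + x2) / 2 / img_w,
        (y1 + y2) / 2 / img_h,
        (x2 - x1) / img_w,
        (y2 - y1) / img_h,
    )


print(to_yolo(100, 200, 500, 400, 1000, 800))  # (0.3, 0.375, 0.4, 0.25)
```

Clipping the corners rather than capping the final values keeps the center and size consistent with each other when a box extends past the image edge.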

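After training on the DAMO Academy dataset, training continued with 100 additional bank card images.
[Figure: detection result]
Then crop the card out using the box coordinates.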
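
A minimal sketch of the cropping step, assuming the detected box is available in the same normalized YOLO format as the training labels. `yolo_to_pixel_box` is a hypothetical helper, not part of yolov5.

```python
def yolo_to_pixel_box(xc, yc, w, h, img_w, img_h):
    """Convert a normalized YOLO box to integer pixel corners (x1, y1, x2, y2), clipped to the image."""
    x1 = max(0, int(round((xc - w / 2) * img_w)))
    y1 = max(0, int(round((yc - h / 2) * img_h)))
    x2 = min(img_w, int(round((xc + w / 2) * img_w)))
    y2 = min(img_h, int(round((yc + h / 2) * img_h)))
    return x1, y1, x2, y2


x1, y1, x2, y2 = yolo_to_pixel_box(0.3, 0.375, 0.4, 0.25, 1000, 800)
print(x1, y1, x2, y2)  # 100 200 500 400
# With an OpenCV/NumPy image (rows first), the crop would be: card = image[y1:y2, x1:x2]
```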
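If the aspect ratio (width/height) is less than 1, the card is vertical. Rotate the image by 90° so that it ends up at 0° or 180°; OCR with the direction classifier handles both orientations. The rotation code is as follows: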

import cv2
def rotate_image_with_aspect_ratio_less_than_one(input_path, output_path):
    try:
        # Read the image
        image = cv2.imread(input_path)

        # Get the image height and width
        height, width = image.shape[:2]

        # Check whether the aspect ratio (width / height) is less than 1
        if width < height:
            # Aspect ratio below 1: rotate 90° clockwise (transpose, then flip horizontally)
            rotated_image = cv2.transpose(image)
            rotated_image = cv2.flip(rotated_image, flipCode=1)
            cv2.imwrite(output_path, rotated_image)
            print(f"Rotated image: {input_path} -> {output_path}")
        else:
            # Aspect ratio of 1 or more: no rotation
            cv2.imwrite(output_path, image)
            print(f"Image not rotated: {input_path} -> {output_path}")
    except Exception as e:
        print(f"Error while processing image: {e}")

Scaling the image to the model's input size

Training used 960*960 images. Scale the image proportionally so that its longest side is 960:

import cv2


def img_resize(image):
    height, width = image.shape[0], image.shape[1]
    # Target frame for the new resolution
    width_new = 960
    height_new = 960
    # Compare aspect ratios to decide which side to fit to the frame
    if width / height >= width_new / height_new:
        img_new = cv2.resize(image, (width_new, int(height * width_new / width)))
    else:
        img_new = cv2.resize(image, (int(width * height_new / height), height_new))
    return img_new
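
The size arithmetic can be checked on its own, without loading an image. This sketch mirrors the branch logic of `img_resize` for a square target; `fit_within` and `max_side` are names introduced here for illustration.

```python
def fit_within(width, height, max_side=960):
    """Return (new_width, new_height) with the longest side set to max_side, ratio preserved."""
    if width >= height:
        return max_side, int(height * max_side / width)
    return int(width * max_side / height), max_side


print(fit_within(1920, 1080))  # (960, 540)
print(fit_within(1080, 1920))  # (540, 960)
```

A landscape 1080P frame becomes 960x540 and a portrait one 540x960, so the card is never stretched.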

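Model inference

Images produced by the steps above are at 0° or 180°, so paddleocr's direction classifier is needed. Below is the command for running paddleocr's inference script; for the parameters and their usage, see the PaddleOCR inference parameter documentation.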

python3 tools/infer/predict_system.py \
    --det_model_dir=det_model_dir \
    --rec_model_dir=rec_model_dir \
    --det_limit_type=max \
    --det_limit_side_len=960 \
    --det_db_unclip_ratio=3 \
    --det_db_box_thresh=0.01 \
    --det_db_thresh=0.01 \
    --image_dir=test.jpg \
    --draw_img_save_dir=infer/test \
    --rec_char_dict_path=/mnt/workspace/PaddleOCR/ppocr/utils/bank_keys.txt \
    --use_dilation=True \
    --use_angle_cls=True

Alternatively, install paddleocr with pip and call it from Python:

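from paddleocr import PaddleOCR, draw_ocr
from PIL import Image

# Placeholder paths; point these at your own detection and recognition model directories
det_model_dir = "det_model_dir"
rec_model_dir = "rec_model_dir"

# The model directories must contain the model and params files. If they are missing, paddleocr
# now downloads a model automatically, but only the most basic one.
# use_gpu: set to True if paddle is the GPU build. use_angle_cls: whether to use the direction
# classifier, which handles images at 0° or 180°.
# Other parameters can be set the same way as in the script, e.g. det_db_unclip_ratio=3
ocr = PaddleOCR(use_angle_cls=True, use_gpu=False, det_model_dir=det_model_dir, rec_model_dir=rec_model_dir)
img_path = 'test.jpg'  # your own image; place it in the code directory and adjust the name
result = ocr.ocr(img_path, cls=True)
# In paddleocr 2.7 the result holds one list of lines per input image
lines = result[0] or []
for line in lines:
    print(line)
# Draw the result
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in lines]
txts = [line[1][0] for line in lines]
scores = [line[1][1] for line in lines]
# draw_ocr needs a font file; pass font_path=... if its default font is not available
im_show = draw_ocr(image, boxes, txts, scores)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')  # the result image is saved next to the code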

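Post-processing

The trained model detects only card numbers and expiry dates, but it may still pick up some other text, so the model output needs filtering. A regular expression can select digit strings of length 15 to 21 as the card number, and text containing the "/" character as the expiry date. Example code: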

import re

data = [
    "Card number: 123456789012345",
    "Expiration: 12/24",
    "Card number: 9876543210987654",
    "Expiration: 01/23",
    "Some other text without card number or expiration"
]

card_numbers = []
expiration_dates = []

for item in data:
    # Match digit strings of 15 to 21 digits
    card_matches = re.findall(r'\b\d{15,21}\b', item)
    card_numbers.extend(card_matches)

    # Match text shaped like MM/YY, where MM and YY are 2 to 4 digits long
    expiration_matches = re.findall(r'\b\d{2,4}/\d{2,4}\b', item)
    expiration_dates.extend(expiration_matches)

print("Card numbers:", card_numbers)
print("Expiry dates:", expiration_dates)
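
The same rules can be wrapped in a function that takes the recognized text lines directly. The sketch below rests on one assumption that the original does not state: the recognizer may return the card number with spaces between digit groups, so whitespace is stripped before matching. `extract_card_fields` and both pattern names are hypothetical.

```python
import re

# Digit-only lookarounds instead of \b, so a number glued to letters still matches
CARD_RE = re.compile(r"(?<!\d)\d{15,21}(?!\d)")
EXPIRY_RE = re.compile(r"(?<!\d)\d{2,4}/\d{2,4}(?!\d)")


def extract_card_fields(texts):
    """Return (card_numbers, expiry_dates) found in a list of recognized text lines."""
    card_numbers, expiry_dates = [], []
    for text in texts:
        expiry_dates.extend(EXPIRY_RE.findall(text))
        # Assumption: digit groups may be separated by spaces in the OCR output
        compact = re.sub(r"\s+", "", text)
        card_numbers.extend(CARD_RE.findall(compact))
    return card_numbers, expiry_dates


print(extract_card_fields(["6222 0212 3456 7890 123", "VALID THRU 12/24", "BANK OF EXAMPLE"]))
```

Stripping whitespace can also merge two unrelated numbers that happen to sit on the same line, so whether to apply it depends on how the recognition model was trained to output card numbers.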

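Summary

1. Image size (text detection model). Training originally used 800-pixel images while testing used 4K images, and nothing was detected at all; after shrinking the images, text could be detected. So the image size was simply unified between training and inference: following the model, training images were resized so the longest side is 960 pixels, then labeled and used for training. At inference, the image is first scaled to 1080P, the bank card is cropped out, and the crop is scaled again for recognition.
2. Dataset size: the more data, the better. At first only 1000 images were used and recognition was poor; after crawling more images and retraining, the results improved somewhat.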