【Captcha Recognition Based on Baidu PP-OCRv3】

Snoopy1316

Last modified 2024-02-06 09:14:43

Tags: python, artificial intelligence, programming languages, flask, httpx

First published 2023-06-15 15:54:15

Article link: https://blog.csdn.net/qq_43361923/article/details/131228254

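The captcha dataset used in this project consists of random combinations of digits and upper- and lowercase letters, with random noise pixels added and the characters placed at random positions.
For example:
(image: sample captcha)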

Dataset:

Download link: https://aistudio.baidu.com/aistudio/datasetdetail/159309

Dataset preprocessing:

# Build the full label file
import random
import os

train_path = r"E:\code\PaddleOCR\work\Verification_code"
SUM = []
for root, dirs, files in os.walk(train_path):  # root directory, subdirectories, files
    for file in files:
        imgpath = os.path.join(root, file)
        # One sample per line: image path, a tab, then the label taken from the file name
        SUM.append(imgpath + "\t" + file.split(".")[0] + "\n")
# Write the full label file once, after the walk has finished
allstr = ''.join(SUM)
with open('work/total_list.txt', 'w', encoding='utf-8') as f:
    f.write(allstr)
print("Dataset size: {}".format(len(SUM)))


random.shuffle(SUM)
train_len = int(len(SUM) * 0.8)
test_list = SUM[train_len:]
train_list = SUM[:train_len]
print('Training set size: {}, validation set size: {}'.format(len(train_list), len(test_list)))
# Write the training-set label file
train_txt = ''.join(train_list)
with open('work/train_list.txt', 'w', encoding='utf-8') as f_train:
    f_train.write(train_txt)
# Write the validation-set label file
test_txt = ''.join(test_list)
with open('work/test_list.txt', 'w', encoding='utf-8') as f_test:
    f_test.write(test_txt)

# Build the character dictionary
class_set = set()
lines = []
# Source file: here we use the dataset's full label file
with open("work/total_list.txt", "r", encoding="utf-8") as file:
    for i in file:
        a = i.strip('\n').split('\t')[-1]
        lines.append(a)
for line in lines:
    for e in line:
        class_set.add(e)
class_list = list(class_set)
class_list.sort()
print("class num: {0}".format(len(class_list)))
with open("work/new_dict.txt", "w", encoding='utf-8') as label_list:
    for c in class_list:
        label_list.write("{0}\n".format(c))

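The dictionary used to train a text recognition model must contain every character you want recognized correctly. It must be written one character per line and saved in UTF-8. This project uses 10 digits (0-9), 26 uppercase letters (A-Z), and 26 lowercase letters (a-z), 62 characters in total. Here the dictionary is generated by collecting the set of characters that appear in the labels of the full dataset. This approach works in the vast majority of cases and is especially handy when you do not know in advance which characters the dataset contains. The label files generated above are in SimpleDataSet format: each line holds the image path and its label, separated by the delimiter '\t'.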
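As a quick sanity check on the two files described above, the sketch below verifies that every character used in a SimpleDataSet-style label file also appears in the dictionary. The function name `find_unknown_chars` and the in-memory sample data are hypothetical and not part of PaddleOCR; in practice you would pass in the lines of work/train_list.txt and work/new_dict.txt.

```python
def find_unknown_chars(label_lines, dict_lines):
    """Return the sorted label characters that are missing from the dictionary.

    label_lines: iterable of "image_path\tlabel" strings (SimpleDataSet format).
    dict_lines:  iterable of dictionary lines, one character per line.
    """
    known = {line.rstrip("\n") for line in dict_lines if line.rstrip("\n")}
    unknown = set()
    for line in label_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        label = line.split("\t")[-1]
        unknown.update(ch for ch in label if ch not in known)
    return sorted(unknown)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the real label and dictionary files.
    labels = ["a/W30J.png\tW30J\n", "a/w36a.png\tw36a\n"]
    dictionary = ["0\n", "3\n", "6\n", "J\n", "W\n", "a\n", "w\n"]
    print(find_unknown_chars(labels, dictionary))
```

An empty list means the dictionary covers every label character; anything else names the characters the model could never output.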

Environment setup:

Windows 10, paddlepaddle-gpu==2.3.0 (CUDA 10.1), Python 3.7, paddleocr==2.6
(For the Docker image built later: Ubuntu 16.04, pulling the Paddle base image for CUDA 10.1)

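You can train on the dataset yourself; only the config file needs to be changed, to specify the dataset paths and the directory where the trained model is saved. The modified yml file is pasted here directly: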

Global:
  debug: false
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 100
  save_model_dir: ./output/v3_en_mobile
  save_epoch_step: 50
  eval_batch_step: [0, 2000]
  cal_metric_during_train: true
  pretrained_model:  ./pretrain_models/en_PP-OCRv3_rec_train/best_accuracy
  checkpoints:
  save_inference_dir:
  use_visualdl: true
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ./work/new_dict.txt
  max_text_length: &max_text_length 6
  infer_mode: false
  use_space_char: false
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3_en.txt


Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05


Architecture:
  model_type: rec
  algorithm: SVTR
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 64
            depth: 2
            hidden_dims: 120
            use_guide: True
          Head:
            fc_decay: 0.00001
      - SARHead:
          enc_dim: 512
          max_text_length: *max_text_length

Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - SARLoss:

PostProcess:  
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: False

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./work/Verification_code
    ext_op_transform_idx: 1
    label_file_list:
    - ./work/train_list.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape: [48, 320, 3]
    - RecAug:
    - MultiLabelEncode:
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 64
    drop_last: true
    num_workers: 4
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./work/Verification_code
    label_file_list:
    - ./work/test_list.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 32
    num_workers: 4
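The config above sets max_text_length to 6, so it is worth checking the label files against that limit before training; labels longer than the limit are likely to be dropped or mis-encoded. The following is a minimal sketch, assuming the tab-separated label format produced earlier. The helper name `labels_exceeding` is hypothetical and not a PaddleOCR function.

```python
def labels_exceeding(label_lines, max_text_length=6):
    """Return (path, label) pairs whose label is longer than max_text_length."""
    too_long = []
    for line in label_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # Split on the last tab so paths containing tabs do not break parsing.
        path, _, label = line.rpartition("\t")
        if len(label) > max_text_length:
            too_long.append((path, label))
    return too_long


if __name__ == "__main__":
    # Hypothetical sample lines; in practice iterate over work/train_list.txt.
    sample = ["img/w36a.png\tw36a\n", "img/long.png\tabcdefgh\n"]
    for path, label in labels_exceeding(sample, max_text_length=6):
        print("label too long:", path, label)
```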

Training can be started directly with Paddle's standard command line:
python3 tools/train.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml

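Accuracy (acc) reaches 0.93, which is quite good, though there is still room for improvement: easily confused characters are still misrecognized, for example 0 and O, l and 1, w and W, x and X, and z, Z, and 2. Beyond knowledge-distillation-based training and data augmentation, you can also synthesize more data, generating captchas in a variety of styles, to further improve the model's accuracy.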
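To see which character pairs are actually being confused, one option is to tally per-position mismatches between ground-truth labels and predictions. The sketch below is an illustration only: `confusion_pairs` is a hypothetical helper, the sample labels are made up, and pairs of different length are skipped because positions cannot be aligned naively.

```python
from collections import Counter


def confusion_pairs(truths, preds):
    """Count (true_char, predicted_char) mismatches for equal-length label pairs."""
    pairs = Counter()
    for truth, pred in zip(truths, preds):
        if len(truth) != len(pred):
            continue  # insertions/deletions: positions cannot be aligned naively
        for t, p in zip(truth, pred):
            if t != p:
                pairs[(t, p)] += 1
    return pairs


if __name__ == "__main__":
    truths = ["W30J", "w36a", "xZ2l"]  # hypothetical ground-truth labels
    preds = ["W3OJ", "W36a", "xz21"]   # hypothetical model outputs
    for (t, p), n in confusion_pairs(truths, preds).most_common():
        print(f"{t!r} -> {p!r}: {n}")
```

The most frequent pairs indicate which characters would benefit most from extra synthesized samples.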

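The key step here is converting the training model into an inference model:
The *.pdparams, *.pdopt, and *.states files hold the model parameters, optimizer state, and intermediate training information saved during training. They are mostly used for evaluating model metrics and resuming training, so for real applications they need to be converted into the inference model used by the prediction engine (inference.pdmodel and inference.pdiparams), and deployment is then built on that inference model.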
python tools/export_model.py -c ./en_PP-OCRv3_rec.yml -o Global.pretrained_model=./output/v3_en_mobile/best_accuracy Global.save_inference_dir=./inference/en_PP-OCRv3_rec/
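After exporting, it can help to confirm that the two inference files named above were actually written before moving on to deployment. This is a small sketch using only the standard library; the helper name `missing_inference_files` is hypothetical, and the directory is the save_inference_dir from the command above.

```python
from pathlib import Path

# File names taken from the description of the exported inference model above.
EXPECTED_FILES = ("inference.pdmodel", "inference.pdiparams")


def missing_inference_files(model_dir):
    """Return the expected inference files that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_inference_files("./inference/en_PP-OCRv3_rec/")
    print("missing files:", missing or "none")
```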

Model deployment: Flask framework

import os
import socket
from flask import Flask, request

app = Flask(__name__)


def host_ip():
    """
    Look up this machine's IP address
    :return: ip
    """
    ip = '0.0.0.0'
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(('8.8.8.8', 80))
        ip = s.getsockname()[0]
    except OSError as ex:
        hostname = socket.gethostname()
        ip = socket.gethostbyname(hostname)
    finally:
        s.close()

    return ip

@app.route('/', methods=["GET"])
def hello_world():
    return 'hello world'


@app.route("/captcha_ocr", methods=["POST"])
def ocr_html_post():
    file = request.files['file']
    print(file.filename)
    # Save the upload under a fixed name (overwriting any previous one), then recognize it
    file.save('cache.png')
    ocr_str = ocr('cache.png')
    return str(ocr_str)




_engine = None  # PaddleOCR instance, created on first use and then reused


def ocr(img_path):
    from paddleocr import PaddleOCR, draw_ocr
    from PIL import Image

    global _engine
    if _engine is None:
        # Load the models into memory once; later requests reuse the same instance
        _engine = PaddleOCR(use_angle_cls=True, lang="en", use_gpu=True, rec_image_shape="3, 48, 320",
                            rec_char_dict_path="./work/new_dict.txt", rec_char_type='en',
                            rec_algorithm='SVTR_LCNet',
                            rec_model_dir='./inference/en_PP-OCRv3_rec/',
                            cls_model_dir='./output/inference/ch_ppocr_mobile_v2.0_cls_infer/',
                            det_model_dir='./output/inference/en_PP-OCRv3_det_infer/')

    # img_path = './test/W30J.png'
    result = _engine.ocr(img_path, cls=True)
    result = result[0] or []  # lines for the single input image; empty if nothing was detected
    for line in result:
        print(line)

    # Draw the detected boxes and recognized text onto the image for inspection
    image = Image.open(img_path).convert('RGB')
    boxes = [line[0] for line in result]
    txts = [line[1][0] for line in result]
    scores = [line[1][1] for line in result]
    im_show = draw_ocr(image, boxes, txts, scores, font_path='doc/fonts/simfang.ttf')
    im_show = Image.fromarray(im_show)
    im_show.save('result.jpg')
    print('Recognized text:', txts, 'confidence:', scores)
    return txts, scores


if __name__ == '__main__':
    # app.run(port=5067, debug=True)
    hostip = host_ip()
    app.run(debug=True, port=5067, host=hostip)

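A note on the model paths above: text detection and orientation classification use the ready-made PaddleOCR inference models, which can be downloaded from the official site; the main difference is that the recognition model is the custom one trained above. When setting the parameters, make sure the dictionary is the same one used to train the model; otherwise the trained model will produce results that are far off.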

Client code:

import requests
url = "http://X.X.X.X:5067/captcha_ocr"
# Open the image in a with-block so the file handle is closed after the upload
with open('./test/W45G.png', 'rb') as f:
    r = requests.post(url, files={'file': f})

print(r.text)

The run log is as follows:

[2023/06/15 15:38:53] ppocr DEBUG: Namespace(alpha=1.0, benchmark=False, beta=1.0, cls_batch_num=6, cls_image_shape='3, 48, 192', cls_model_dir='./output/inference/ch_ppocr_mobile_v2.0_cls_infer/', cls_thresh=0.9, cpu_threads=10, crop_res_save_dir='./output', det=True, det_algorithm='DB', det_box_type='quad', det_db_box_thresh=0.6, det_db_score_mode='fast', det_db_thresh=0.3, det_db_unclip_ratio=1.5, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_limit_side_len=960, det_limit_type='max', det_model_dir='./output/inference/en_PP-OCRv3_det_infer/', det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, det_pse_thresh=0, det_sast_nms_thresh=0.2, det_sast_score_thresh=0.5, draw_img_save_dir='./inference_results', drop_score=0.5, e2e_algorithm='PGNet', e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_limit_side_len=768, e2e_limit_type='max', e2e_model_dir=None, e2e_pgnet_mode='fast', e2e_pgnet_score_thresh=0.5, e2e_pgnet_valid_set='totaltext', enable_mkldnn=False, fourier_degree=5, gpu_id=0, gpu_mem=500, help='==SUPPRESS==', image_dir=None, image_orientation=False, ir_optim=True, kie_algorithm='LayoutXLM', label_list=['0', '180'], lang='en', layout=True, layout_dict_path=None, layout_model_dir=None, layout_nms_threshold=0.5, layout_score_threshold=0.5, max_batch_size=10, max_text_length=25, merge_no_span_structure=True, min_subgraph_size=15, mode='structure', ocr=True, ocr_order_method=None, ocr_version='PP-OCRv3', output='./output', page_num=0, precision='fp32', process_id=0, re_model_dir=None, rec=True, rec_algorithm='SVTR_LCNet', rec_batch_num=6, rec_char_dict_path='./work/new_dict.txt', rec_char_type='en', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_model_dir='./inference/en_PP-OCRv3_rec/', recovery=False, save_crop_res=False, save_log_path='./log_output/', scales=[8, 16, 32], ser_dict_path='../train_data/XFUND/class_list_xfun.txt', ser_model_dir=None, show_log=True, sr_batch_num=1, sr_image_shape='3, 32, 
128', sr_model_dir=None, structure_version='PP-StructureV2', table=True, table_algorithm='TableAttn', table_char_dict_path=None, table_max_len=488, table_model_dir=None, total_process_num=1, type='ocr', use_angle_cls=True, use_dilation=False, use_gpu=True, use_mp=False, use_npu=False, use_onnx=False, use_pdf2docx_api=False, use_pdserving=False, use_space_char=True, use_tensorrt=False, use_visual_backbone=True, use_xpu=False, vis_font_path='./doc/fonts/simfang.ttf', warmup=False)
[2023/06/15 15:38:57] ppocr DEBUG: dt_boxes num : 1, elapse : 1.6441783905029297
[2023/06/15 15:38:57] ppocr DEBUG: cls num  : 1, elapse : 0.036865234375
[2023/06/15 15:38:57] ppocr DEBUG: rec_res num  : 1, elapse : 0.02038741111755371
[[[6.0, 1.0], [115.0, 3.0], [114.0, 42.0], [5.0, 40.0]], ('w36a', 0.9875728487968445)]
Recognized text: ['w36a'] confidence: [0.9875728487968445]
172.22.188.43 - - [15/Jun/2023 15:38:57] "POST /captcha_ocr HTTP/1.1" 200 -
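Because the endpoint returns the `(txts, scores)` tuple converted with `str()`, the client receives a Python-literal string rather than JSON. One way to turn it back into data is `ast.literal_eval`, sketched below under the assumption that the response body has exactly that shape; `parse_ocr_response` is a hypothetical helper name.

```python
import ast


def parse_ocr_response(text):
    """Parse the service's stringified (txts, scores) tuple into (text, score) pairs."""
    txts, scores = ast.literal_eval(text)
    return list(zip(txts, scores))


if __name__ == "__main__":
    # Assumed shape of r.text for the request logged above.
    body = "(['w36a'], [0.9875728487968445])"
    for text, score in parse_ocr_response(body):
        print(text, round(score, 4))
```

`ast.literal_eval` only accepts literals, so it does not execute arbitrary code from the response the way `eval` would.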

okk~

Packaging the image:

First, export the dependencies:
pip freeze > requirements.txt

Copy all the files to the server, then write a Dockerfile:

# Pull the base image
FROM registry.baidubce.com/ais-public/ais2.3:cuda10.1_cudnn7-ubuntu16.04-py37
# Set environment variables
ENV PATH=/home/bml/anaconda3/envs/py3.7.4/bin:${PATH}

# Create the working directory
RUN mkdir -p /home/captcha-gpu
WORKDIR /home/captcha-gpu
# Copy contents
COPY . /home/captcha-gpu

# Install Python dependencies
RUN pip install --index-url https://mirrors.ustc.edu.cn/pypi/web/simple --requirement requirements.txt
RUN python -m pip install paddlepaddle-gpu==2.3.2.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
RUN pip install protobuf==3.20.0  -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip install onnx==1.12  -i https://pypi.tuna.tsinghua.edu.cn/simple
# Start the Flask service
CMD ["python","flask-ocr.py"]

okk~ Time to put our Docker command knowledge to use:
Build the image:
docker build -f Dockerfile -t yolov5:v0 .
Export the image:
docker save yolov5:v0 -o /home/yolov5_v0.tar