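Captcha recognition based on Baidu PP-OCRv3
The captcha dataset used in this project consists of random combinations of digits and upper- and lowercase letters, with random noise pixels added and the text placed at a random position.
Dataset:
Download link: https://aistudio.baidu.com/aistudio/datasetdetail/159309
Dataset preprocessing: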
# Generate the full label file and split the dataset
import codecs
import os
import random

train_path = r"E:\code\PaddleOCR\work\Verification_code"
SUM = []
for root, dirs, files in os.walk(train_path):  # root directory, sub-directories, files
    for file in files:
        imgpath = os.path.join(root, file)
        # The file name without its extension is the captcha label
        SUM.append(imgpath + "\t" + file.split(".")[0] + "\n")

# Write the full label file
allstr = ''.join(SUM)
with open('work/total_list.txt', 'w', encoding='utf-8') as f:
    f.write(allstr)
print("Dataset size: {}".format(len(SUM)))

# Shuffle, then use 80% for training and 20% for validation
random.shuffle(SUM)
train_len = int(len(SUM) * 0.8)
test_list = SUM[train_len:]
train_list = SUM[:train_len]
print('Training samples: {}, validation samples: {}'.format(len(train_list), len(test_list)))

# Write the label file of the training set
train_txt = ''.join(train_list)
with open('work/train_list.txt', 'w', encoding='utf-8') as f_train:
    f_train.write(train_txt)

# Write the label file of the test set
test_txt = ''.join(test_list)
with open('work/test_list.txt', 'w', encoding='utf-8') as f_test:
    f_test.write(test_txt)

# Build the dictionary
class_set = set()
lines = []
# Source document: here we use the full label file of the dataset
with open("work/total_list.txt", "r", encoding="utf-8") as file:
    for i in file:
        a = i.strip('\n').split('\t')[-1]
        lines.append(a)
for line in lines:
    for e in line:
        class_set.add(e)
class_list = list(class_set)
class_list.sort()
print("class num: {0}".format(len(class_list)))
with codecs.open("work/new_dict.txt", "w", encoding='utf-8') as label_list:
    for c in class_list:
        label_list.write("{0}\n".format(c))
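The dictionary used to train a text recognition model must contain every character that should be recognized. It has to be written with one character per line and saved in UTF-8 encoding. This project uses 10 digits (0-9), 26 uppercase letters (A-Z) and 26 lowercase letters (a-z), 62 characters in total. Here the dictionary is built by collecting the label contents of the whole dataset into a set; this approach works in the vast majority of cases and is especially handy when the text content of the dataset is not known in advance. The label files generated here are in SimpleDataSet format: each line holds the image path and its label, separated by a tab ('\t').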
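As a cross-check on the dictionary described above, the sketch below builds the expected 62-character set with the standard string module and reports any label characters that are missing from a given dictionary. The helper names (expected_charset, chars_missing_from_dict) and the sample lines are illustrative assumptions, not part of PaddleOCR; the label lines are assumed to be in the tab-separated format described above.

```python
import string


def expected_charset():
    """Return the 62 characters: digits, uppercase and lowercase letters."""
    return sorted(string.digits + string.ascii_uppercase + string.ascii_lowercase)


def chars_missing_from_dict(label_lines, dict_chars):
    """Collect label characters that are absent from the dictionary.

    label_lines: iterable of "image_path<TAB>label" strings.
    dict_chars: iterable of single characters (one per dictionary line).
    """
    known = set(dict_chars)
    missing = set()
    for line in label_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # The label is the last tab-separated field of the line
        label = line.split("\t")[-1]
        missing.update(ch for ch in label if ch not in known)
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical sample lines, only for illustration
    sample = ["imgs/W30J.png\tW30J\n", "imgs/w36a.png\tw36a\n"]
    print(len(expected_charset()))
    print(chars_missing_from_dict(sample, expected_charset()))
```

An empty result means every label character is covered by the dictionary; anything else points at characters the model could never output.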
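Environment setup:
Windows 10, paddlepaddle-gpu 2.3.0 (CUDA 10.1), Python 3.7, paddleocr==2.6
(For the Docker image built later: Ubuntu 16.04, pulling the CUDA 10.1 Paddle base image)
You can train on the dataset yourself; only the configuration file needs to be modified: set the dataset paths and the directory where the trained model is saved. The modified yml file is pasted here: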
Global:
  debug: false
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 100
  save_model_dir: ./output/v3_en_mobile
  save_epoch_step: 50
  eval_batch_step: [0, 2000]
  cal_metric_during_train: true
  pretrained_model: ./pretrain_models/en_PP-OCRv3_rec_train/best_accuracy
  checkpoints:
  save_inference_dir:
  use_visualdl: true
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ./work/new_dict.txt
  max_text_length: &max_text_length 6
  infer_mode: false
  use_space_char: false
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3_en.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: rec
  algorithm: SVTR
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 64
            depth: 2
            hidden_dims: 120
            use_guide: True
          Head:
            fc_decay: 0.00001
      - SARHead:
          enc_dim: 512
          max_text_length: *max_text_length

Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - SARLoss:

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: False

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./work/Verification_code
    ext_op_transform_idx: 1
    label_file_list:
      - ./work/train_list.txt
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - RecConAug:
          prob: 0.5
          ext_data_num: 2
          image_shape: [48, 320, 3]
      - RecAug:
      - MultiLabelEncode:
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_sar
            - length
            - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 64
    drop_last: true
    num_workers: 4

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./work/Verification_code
    label_file_list:
      - ./work/test_list.txt
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - MultiLabelEncode:
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_sar
            - length
            - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 32
    num_workers: 4
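To get a feel for what the Optimizer.lr settings in the config (Cosine, learning_rate 0.001, warmup_epoch 5, with epoch_num 500) mean in practice, here is a rough sketch of a linear warm-up followed by cosine decay. It is a simplified approximation written for illustration, not PaddleOCR's actual scheduler code; lr_at_step is a made-up helper, and steps_per_epoch is an assumed value that depends on the dataset size and batch size.

```python
import math


def lr_at_step(step, steps_per_epoch, epoch_num=500, base_lr=0.001, warmup_epoch=5):
    """Approximate learning rate at a 0-based global training step."""
    warmup_steps = warmup_epoch * steps_per_epoch
    total_steps = epoch_num * steps_per_epoch
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warm-up
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr towards 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    steps_per_epoch = 100  # assumed value, only for illustration
    for epoch in (0, 2, 5, 250, 500):
        print(epoch, lr_at_step(epoch * steps_per_epoch, steps_per_epoch))
```

The learning rate starts at 0, reaches 0.001 once the 5 warm-up epochs are over, and then falls smoothly towards 0 by the last epoch.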
Training can be started directly with the original Paddle command line:
python3 tools/train.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml
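The accuracy (acc) reaches 0.93, which is a good result, although there is still room for improvement. Easily confused characters are still misrecognized: 0 and O, l and 1, w and W, x and X, and z, Z and 2. Beyond knowledge-distillation-based training and data augmentation, more data can also be synthesized, generating a wide variety of captchas to raise the model's accuracy further.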
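One way to see which of these character pairs hurt the most is to count character-level mismatches between the ground-truth labels and the predictions on the validation set. The following is a minimal sketch under the assumption that both are available as plain lists of strings; confusion_pairs is a hypothetical helper, and the strings in the usage part are made up for illustration, not real results.

```python
from collections import Counter


def confusion_pairs(ground_truths, predictions):
    """Count (expected_char, predicted_char) mismatches.

    Only pairs of equal length are compared position by position. Pairs with
    different lengths are counted separately, because they cannot be aligned
    without an edit-distance alignment.
    """
    pairs = Counter()
    length_mismatches = 0
    for truth, pred in zip(ground_truths, predictions):
        if len(truth) != len(pred):
            length_mismatches += 1
            continue
        for t, p in zip(truth, pred):
            if t != p:
                pairs[(t, p)] += 1
    return pairs, length_mismatches


if __name__ == "__main__":
    # Made-up labels and predictions, only to show the call
    truths = ["W30J", "l1xZ", "abc"]
    preds = ["w3OJ", "11xz", "ab"]
    pairs, skipped = confusion_pairs(truths, preds)
    for (expected, got), count in pairs.most_common():
        print(expected, "->", got, count)
    print("skipped (length mismatch):", skipped)
```

The most frequent pairs indicate which characters are worth targeting when synthesizing additional captchas.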
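The key step here is converting the training model into an inference model:
The .pdparams, .pdopt and .states files hold the model parameters, optimizer state and intermediate training information saved during training. They are mostly used for evaluating model metrics and resuming training, so for real applications the model has to be converted into an inference model for the prediction engine (inference.pdmodel and inference.pdiparams); further deployment is then based on that inference model.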
python tools/export_model.py -c ./en_PP-OCRv3_rec.yml -o Global.pretrained_model=./output/v3_en_mobile/best_accuracy Global.save_inference_dir=./inference/en_PP-OCRv3_rec/
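After the export it is worth confirming that the two inference files named above actually exist before wiring the model into a service. A small sketch, assuming the export directory used in the command above; missing_inference_files is an illustrative helper, not a PaddleOCR function.

```python
from pathlib import Path

# The two files the text above names as the inference model
REQUIRED_FILES = ("inference.pdmodel", "inference.pdiparams")


def missing_inference_files(inference_dir):
    """Return the required inference files that are absent from inference_dir."""
    root = Path(inference_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_inference_files("./inference/en_PP-OCRv3_rec/")
    if missing:
        print("Export incomplete, missing:", missing)
    else:
        print("Inference model files found.")
```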
Model deployment: Flask
import socket

from flask import Flask, request
from paddleocr import PaddleOCR, draw_ocr
from PIL import Image

app = Flask(__name__)
_ocr_engine = None  # created on first use, then reused for every request


def host_ip():
    """
    Look up the local IP address
    :return: ip
    """
    ip = '0.0.0.0'
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packet; it only selects the outgoing interface
        s.connect(('8.8.8.8', 80))
        ip = s.getsockname()[0]
    except OSError:
        hostname = socket.gethostname()
        ip = socket.gethostbyname(hostname)
    finally:
        s.close()
    return ip


def get_ocr_engine():
    """Build the PaddleOCR engine once so the models are loaded into memory only once."""
    global _ocr_engine
    if _ocr_engine is None:
        _ocr_engine = PaddleOCR(use_angle_cls=True, lang="en", use_gpu=True, rec_image_shape="3, 48, 320",
                                rec_char_dict_path="./work/new_dict.txt", rec_char_type='en',
                                rec_algorithm='SVTR_LCNet',
                                rec_model_dir='./inference/en_PP-OCRv3_rec/',
                                cls_model_dir='./output/inference/ch_ppocr_mobile_v2.0_cls_infer/',
                                det_model_dir='./output/inference/en_PP-OCRv3_det_infer/')
    return _ocr_engine


@app.route('/', methods=["GET"])
def hello_world():
    return 'hello world'


@app.route("/captcha_ocr", methods=["POST"])
def ocr_html_post():
    file = request.files['file']
    print(file.filename)
    # Save the upload under a fixed name; an existing cache.png is overwritten
    file.save('cache.png')
    ocr_str = ocr('cache.png')
    return str(ocr_str)


def ocr(img_path):
    # img_path = './test/W30J.png'
    result = get_ocr_engine().ocr(img_path, cls=True)
    for idx in range(len(result)):
        res = result[idx]
        for line in res:
            print(line)
    # Only one image is passed in, so take the first (and only) result
    result = result[0]
    image = Image.open(img_path).convert('RGB')
    boxes = [line[0] for line in result]
    txts = [line[1][0] for line in result]
    scores = [line[1][1] for line in result]
    # Draw the boxes and texts on the image and save it for inspection
    im_show = draw_ocr(image, boxes, txts, scores, font_path='doc/fonts/simfang.ttf')
    im_show = Image.fromarray(im_show)
    im_show.save('result.jpg')
    # Labels kept as in the original run log: "识别结果为" = recognition result,
    # "准确率为" = the recognition confidence score
    print('识别结果为:', txts, '准确率为:', scores)
    return txts, scores


if __name__ == '__main__':
    # app.run(port=5067, debug=True)
    hostip = host_ip()
    app.run(debug=True, port=5067, host=hostip)
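In the model paths above, text detection and angle classification use the existing PaddleOCR inference models, which can be downloaded from the official site; the main difference is that the recognition model is the custom one trained above. When setting the parameters, make sure the dictionary is the one used for training; otherwise the results of the trained model will be noticeably off.
Client code: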
import requests

url = "http://X.X.X.X:5067/captcha_ocr"
# Open the image in a with block so the file handle is closed after the upload
with open('./test/W45G.png', 'rb') as f:
    r = requests.post(url, files={'file': f})
print(r.text)
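The service returns str() of a (texts, scores) tuple, so the response body is the text form of a Python literal rather than JSON. A minimal sketch for turning it back into lists on the client side, assuming the body contains only plain Python literals (lists, strings, floats); parse_ocr_response is a hypothetical helper name.

```python
import ast


def parse_ocr_response(text):
    """Parse a reply such as "(['w36a'], [0.98])" into (texts, scores)."""
    # literal_eval only evaluates literals, so it is safer than eval()
    texts, scores = ast.literal_eval(text)
    return list(texts), [float(s) for s in scores]


if __name__ == "__main__":
    texts, scores = parse_ocr_response("(['w36a'], [0.9875728487968445])")
    print(texts[0], scores[0])
```

Returning JSON from the Flask route would avoid this step, but that would be a change to the service itself.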
The run log is as follows:
[2023/06/15 15:38:53] ppocr DEBUG: Namespace(alpha=1.0, benchmark=False, beta=1.0, cls_batch_num=6, cls_image_shape='3, 48, 192', cls_model_dir='./output/inference/ch_ppocr_mobile_v2.0_cls_infer/', cls_thresh=0.9, cpu_threads=10, crop_res_save_dir='./output', det=True, det_algorithm='DB', det_box_type='quad', det_db_box_thresh=0.6, det_db_score_mode='fast', det_db_thresh=0.3, det_db_unclip_ratio=1.5, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_limit_side_len=960, det_limit_type='max', det_model_dir='./output/inference/en_PP-OCRv3_det_infer/', det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, det_pse_thresh=0, det_sast_nms_thresh=0.2, det_sast_score_thresh=0.5, draw_img_save_dir='./inference_results', drop_score=0.5, e2e_algorithm='PGNet', e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_limit_side_len=768, e2e_limit_type='max', e2e_model_dir=None, e2e_pgnet_mode='fast', e2e_pgnet_score_thresh=0.5, e2e_pgnet_valid_set='totaltext', enable_mkldnn=False, fourier_degree=5, gpu_id=0, gpu_mem=500, help='==SUPPRESS==', image_dir=None, image_orientation=False, ir_optim=True, kie_algorithm='LayoutXLM', label_list=['0', '180'], lang='en', layout=True, layout_dict_path=None, layout_model_dir=None, layout_nms_threshold=0.5, layout_score_threshold=0.5, max_batch_size=10, max_text_length=25, merge_no_span_structure=True, min_subgraph_size=15, mode='structure', ocr=True, ocr_order_method=None, ocr_version='PP-OCRv3', output='./output', page_num=0, precision='fp32', process_id=0, re_model_dir=None, rec=True, rec_algorithm='SVTR_LCNet', rec_batch_num=6, rec_char_dict_path='./work/new_dict.txt', rec_char_type='en', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_model_dir='./inference/en_PP-OCRv3_rec/', recovery=False, save_crop_res=False, save_log_path='./log_output/', scales=[8, 16, 32], ser_dict_path='../train_data/XFUND/class_list_xfun.txt', ser_model_dir=None, show_log=True, sr_batch_num=1, sr_image_shape='3, 32, 128', sr_model_dir=None, structure_version='PP-StructureV2', table=True, table_algorithm='TableAttn', table_char_dict_path=None, table_max_len=488, table_model_dir=None, total_process_num=1, type='ocr', use_angle_cls=True, use_dilation=False, use_gpu=True, use_mp=False, use_npu=False, use_onnx=False, use_pdf2docx_api=False, use_pdserving=False, use_space_char=True, use_tensorrt=False, use_visual_backbone=True, use_xpu=False, vis_font_path='./doc/fonts/simfang.ttf', warmup=False)
[2023/06/15 15:38:57] ppocr DEBUG: dt_boxes num : 1, elapse : 1.6441783905029297
[2023/06/15 15:38:57] ppocr DEBUG: cls num : 1, elapse : 0.036865234375
[2023/06/15 15:38:57] ppocr DEBUG: rec_res num : 1, elapse : 0.02038741111755371
[[[6.0, 1.0], [115.0, 3.0], [114.0, 42.0], [5.0, 40.0]], ('w36a', 0.9875728487968445)]
识别结果为: ['w36a'] 准确率为: [0.9875728487968445]
172.22.188.43 - - [15/Jun/2023 15:38:57] "POST /captcha_ocr HTTP/1.1" 200 -
okk~
Building the Docker image:
First, export the dependencies:
pip freeze > requirements.txt
Copy all the files to the server:
Write a Dockerfile:
# Pull the base image
FROM registry.baidubce.com/ais-public/ais2.3:cuda10.1_cudnn7-ubuntu16.04-py37
# Set environment variables
ENV PATH=/home/bml/anaconda3/envs/py3.7.4/bin:${PATH}
# Create the working directory
RUN mkdir -p /home/captcha-gpu
WORKDIR /home/captcha-gpu
# Copy contents
COPY . /home/captcha-gpu
# Install the Python dependencies
RUN pip install --index-url https://mirrors.ustc.edu.cn/pypi/web/simple --requirement requirements.txt
RUN python -m pip install paddlepaddle-gpu==2.3.2.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
RUN pip install protobuf==3.20.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip install onnx==1.12 -i https://pypi.tuna.tsinghua.edu.cn/simple
# Start the Flask service
CMD ["python","flask-ocr.py"]
okk~ Time to put our Docker command knowledge to use:
Build the image:
docker build -f Dockerfile -t yolov5:v0 .
Export the image:
docker save yolov5:v0 -o /home/yolov5_v0.tar
okk~ All tasks are done.
#★,°:.☆( ̄▽ ̄)/$:.°★ 。