PP-OCRv5 Custom-Dataset Training and Recognition: A Pitfall Guide


License: CC 4.0 BY-SA

Tags: #AI #ocr #paddlepaddle


Versions used:

GPU: A5000

CUDA: 11.8

cudnn=8.9.2 or 8.7

torch=2.1.1+cu118

torchaudio=2.1.2+cu118

torchvision=0.16.2+cu118

numpy=2.0.1

paddleocr=3.1.0

PaddleOCR-release-3.1

paddlepaddle-gpu=3.1.0

paddlex=3.1.3

pandas=2.3.0
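
Version mismatches are a common source of trouble with this stack, so it can help to compare the installed packages against the list above before training. The following is a minimal sketch using only the standard library; the distribution names are assumed to be the PyPI names, and `installed_version` and `version_report` are hypothetical helper names.

```python
from importlib import metadata

# Versions listed in this post. The keys are assumed to be the PyPI
# distribution names of the corresponding packages.
EXPECTED = {
    "paddlepaddle-gpu": "3.1.0",
    "paddleocr": "3.1.0",
    "paddlex": "3.1.3",
    "numpy": "2.0.1",
    "pandas": "2.3.0",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_report(expected=EXPECTED):
    """Map each package name to (expected, installed, matches)."""
    report = {}
    for name, want in expected.items():
        have = installed_version(name)
        report[name] = (want, have, have == want)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in version_report().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected={want} installed={have} {status}")
```

A mismatch is not necessarily a problem; the report only shows where the local environment differs from the one used here.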

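Detection config: config/PP-OCRv5/PP-OCRv5_mobile_det.yml

Recognition config: config/PP-OCRv5/PP-OCRv5_mobile_rec.yml

Pretrained detection model: PP-OCRv5_mobile_det_pretrained.pdparams

Pretrained recognition model: PP-OCRv5_mobile_rec_pretrained.pdparams

Training is done twice: once for the text detection model (det) and once for the text recognition model (rec).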
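
The two training runs can be sketched as two invocations of PaddleOCR's `tools/train.py`, which takes a config with `-c` and overrides with `-o`. The snippet below only builds and prints the commands. The `pretrained/` directory and the recognition weights file name are assumptions, and `train_command` is a hypothetical helper.

```python
import sys


def train_command(config_path, pretrained_model, extra_opts=None):
    """Build the argv list for PaddleOCR's tools/train.py.

    Overrides are passed after -o as key=value pairs.
    """
    opts = {"Global.pretrained_model": pretrained_model}
    opts.update(extra_opts or {})
    cmd = [sys.executable, "tools/train.py", "-c", config_path, "-o"]
    cmd += [f"{key}={value}" for key, value in opts.items()]
    return cmd


# Config paths as given in this post; the pretrained/ directory is an assumption.
det_cmd = train_command(
    "config/PP-OCRv5/PP-OCRv5_mobile_det.yml",
    "pretrained/PP-OCRv5_mobile_det_pretrained.pdparams",
)
rec_cmd = train_command(
    "config/PP-OCRv5/PP-OCRv5_mobile_rec.yml",
    "pretrained/PP-OCRv5_mobile_rec_pretrained.pdparams",
)

if __name__ == "__main__":
    # Print only. To launch a run, pass a command to
    # subprocess.run(cmd, check=True) from the PaddleOCR repository root.
    for cmd in (det_cmd, rec_cmd):
        print(" ".join(cmd))
```

Dataset paths and the character dictionary still have to be set in each config (or passed as further overrides) before either command will train on custom data.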

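Exporting the trained models: the export directory names are up to you; here they are infer_det and infer_rec. Each exported directory contains three files: a .json, a .yml, and a .pdiparams file. The .yml file hides a major pitfall: Global.model_name must be one of the names in the list below, otherwise inference fails with: Error: Model name mismatch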

[STFPM, PP-DocBee-2B, PP-DocBee-7B, PP-Chart2Table, PP-DocBee2-3B, PP-ShiTuV2_rec,
PP-ShiTuV2_rec_CLIP_vit_base, PP-ShiTuV2_rec_CLIP_vit_large, MobileFaceNet,
ResNet50_face, LaTeX_OCR_rec, UniMERNet, PP-FormulaNet-S, PP-FormulaNet-L,
PP-FormulaNet_plus-S, PP-FormulaNet_plus-M, PP-FormulaNet_plus-L,
CLIP_vit_base_patch16_224, CLIP_vit_large_patch14_224, ConvNeXt_tiny, ConvNeXt_small,
ConvNeXt_base_224, ConvNeXt_base_384, ConvNeXt_large_224, ConvNeXt_large_384,
MobileNetV1_x0_25, MobileNetV1_x0_5, MobileNetV1_x0_75, MobileNetV1_x1_0,
MobileNetV2_x0_25, MobileNetV2_x0_5, MobileNetV2_x1_0, MobileNetV2_x1_5,
MobileNetV2_x2_0, MobileNetV3_large_x0_35, MobileNetV3_large_x0_5,
MobileNetV3_large_x0_75, MobileNetV3_large_x1_0, MobileNetV3_large_x1_25,
MobileNetV3_small_x0_35, MobileNetV3_small_x0_5, MobileNetV3_small_x0_75,
MobileNetV3_small_x1_0, MobileNetV3_small_x1_25, MobileNetV4_conv_small,
MobileNetV4_conv_medium, MobileNetV4_conv_large, MobileNetV4_hybrid_medium,
MobileNetV4_hybrid_large, PP-HGNet_tiny, PP-HGNet_small, PP-HGNet_base,
PP-HGNetV2-B0, PP-HGNetV2-B1, PP-HGNetV2-B2, PP-HGNetV2-B3, PP-HGNetV2-B4,
PP-HGNetV2-B5, PP-HGNetV2-B6, PP-LCNet_x0_25, PP-LCNet_x0_25_textline_ori, PP-LCNet_x0_35,
PP-LCNet_x0_5, PP-LCNet_x0_75, PP-LCNet_x1_0, PP-LCNet_x1_0_doc_ori,
PP-LCNet_x1_0_textline_ori, PP-LCNet_x1_5, PP-LCNet_x2_0, PP-LCNet_x2_5,
PP-LCNetV2_small, PP-LCNetV2_base, PP-LCNetV2_large, ResNet101, ResNet152,
ResNet18, ResNet34, ResNet50, ResNet200_vd, ResNet101_vd, ResNet152_vd,
ResNet18_vd, ResNet34_vd, ResNet50_vd, SwinTransformer_tiny_patch4_window7_224,
SwinTransformer_small_patch4_window7_224, SwinTransformer_base_patch4_window7_224,
SwinTransformer_base_patch4_window12_384, SwinTransformer_large_patch4_window7_224,
SwinTransformer_large_patch4_window12_384, StarNet-S1, StarNet-S2, StarNet-S3,
StarNet-S4, FasterNet-L, FasterNet-M, FasterNet-S, FasterNet-T0, FasterNet-T1,
FasterNet-T2, PP-LCNet_x1_0_table_cls, ResNet50_ML, PP-LCNet_x1_0_ML,
PP-HGNetV2-B0_ML, PP-HGNetV2-B4_ML, PP-HGNetV2-B6_ML, CLIP_vit_bE_plus-S,
PP-YOLOE_plus-X, RT-DETR-H, RT-DETR-L, RT-DETR-R18, RT-DETR-R50, RT-DETR-X,
PicoDet_layout_1x, PicoDet_layout_1x_table, PicoDet-S_layout_3cls,
PicoDet-S_layout_17cls, PicoDet-L_layout_3cls, PicoDet-L_layout_17cls,
RT-DETR-H_layout_3cls, RT-DETR-H_layout_17cls, YOLOv3-DarkNet53, YOLOv3-MobileNetV3,
YOLOv3-ResNet50_vd_DCN, YOLOX-L, YOLOX-M, YOLOX-N, YOLOX-S, YOLOX-T, YOLOX-X,
FasterRCNN-ResNet34-FPN, FasterRCNN-ResNet50, FasterRCNN-ResNet50-FPN,
FasterRCNN-ResNet50-vd-FPN, FasterRCNN-ResNet50-vd-SSLDv2-FPN, FasterRCNN-ResNet101,
FasterRCNN-ResNet101-FPN, FasterRCNN-ResNeXt101-vd-FPN, FasterRCNN-Swin-Tiny-FPN,
Cascade-FasterRCNN-ResNet50-FPN, Cascade-FasterRCNN-ResNet50-vd-SSLDv2-FPN, PicoDet-M,
PicoDet-XS, FCOS-ResNet50, DETR-R50, PP-ShiTuV2_det, PP-YOLOE-L_human,
PP-YOLOE-S_human, PP-YOLOE-L_vehicle, PP-YOLOE-S_vehicle, PP-YOLOE_plus_SOD-L,
PP-YOLOE_plus_SOD-S, PP-YOLOE_plus_SOD-largesize-L, CenterNet-DLA-34,
CenterNet-ResNet50, PicoDet_LCNet_x2_5_face, BlazeFace, BlazeFace-FPN-SSH,
PP-YOLOE_plus-S_face, PP-YOLOE-R-L, Co-Deformable-DETR-R50,
Co-Deformable-DETR-Swin-T, Co-DINO-R50, Co-DINO-Swin-L,
RT-DETR-L_wired_table_cell_det, RT-DETR-L_wireless_table_cell_det,
PP-DocLayout-L, PP-DocLayout-M, PP-DocLayout-S, PP-DocLayout_plus-L,
PP-DocBlockLayout, Mask-RT-DETR-S, Mask-RT-DETR-M, Mask-RT-DETR-X,
Mask-RT-DETR-H, Mask-RT-DETR-L, SOLOv2, MaskRCNN-ResNet50, MaskRCNN-ResNet50-FPN,
MaskRCNN-ResNet50-vd-FPN, MaskRCNN-ResNet101-FPN, MaskRCNN-ResNet101-vd-FPN,
MaskRCNN-ResNeXt101-vd-FPN, MaskRCNN-ResNet50-vd-SSLDv2-FPN,
Cascade-MaskRCNN-ResNet50-FPN, Cascade-MaskRCNN-ResNet50-vd-SSLDv2-FPN,
PP-YOLOE_seg-S, PP-TinyPose_128x96, PP-TinyPose_256x192, BEVFusion,
whisper_large, whisper_medium, whisper_base, whisper_small, whisper_tiny,
GroundingDINO-T, YOLO-Worldv2-L, SAM-H_point, SAM-H_box, Deeplabv3_Plus-R101,
Deeplabv3_Plus-R50, Deeplabv3-R101, Deeplabv3-R50, OCRNet_HRNet-W48,
OCRNet_HRNet-W18, PP-LiteSeg-T, PP-LiteSeg-B, SegFormer-B0, SegFormer-B1,
SegFormer-B2, SegFormer-B3, SegFormer-B4, SegFormer-B5, SeaFormer_base,
SeaFormer_tiny, SeaFormer_small, SeaFormer_large, MaskFormer_tiny, MaskFormer_small,
SLANet, SLANet_plus, SLANeXt_wired, SLANeXt_wireless, PP-OCRv5_mobile_det,
PP-OCRv5_server_det, PP-OCRv4_mobile_det, PP-OCRv4_server_det,
PP-OCRv4_mobile_seal_det, PP-OCRv4_server_seal_det, PP-OCRv3_mobile_det,
PP-OCRv3_server_det, PP-OCRv3_mobile_rec, en_PP-OCRv3_mobile_rec,
korean_PP-OCRv3_mobile_rec, japan_PP-OCRv3_mobile_rec, chinese_cht_PP-OCRv3_mobile_rec,
te_PP-OCRv3_mobile_rec, ka_PP-OCRv3_mobile_rec, ta_PP-OCRv3_mobile_rec,
latin_PP-OCRv3_mobile_rec, arabic_PP-OCRv3_mobile_rec, cyrillic_PP-OCRv3_mobile_rec,
devanagari_PP-OCRv3_mobile_rec, PP-OCRv4_mobile_rec, PP-OCRv4_server_rec,
en_PP-OCRv4_mobile_rec, PP-OCRv4_server_rec_doc, ch_SVTRv2_rec, ch_RepSVTR_rec,
PP-OCRv5_server_rec, PP-OCRv5_mobile_rec, latin_PP-OCRv5_mobile_rec,
eslav_PP-OCRv5_mobile_rec, korean_PP-OCRv5_mobile_rec, AutoEncoder_ad, DLinear_ad,
Nonstationary_ad, PatchTST_ad, TimesNet_ad, TimesNet_cls, DLinear, NLinear,
Nonstationary, PatchTST, RLinear, TiDE, TimesNet, PP-TSM-R50_8frames_uniform,
PP-TSMv2-LCNetV2_8frames_uniform, PP-TSMv2-LCNetV2_16frames_uniform, YOWO]
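
If an exported .yml ends up with a name outside this list, one way to repair it is to rewrite the `model_name` line under `Global` by hand or with a small script. The sketch below does this with plain text processing so it needs no YAML library. It assumes the file has a top-level `Global:` key with an indented `model_name:` entry; `get_model_name`, `set_model_name`, and the `OCR_V5_NAMES` subset are hypothetical names introduced for illustration.

```python
# A small subset of the accepted names above, relevant to PP-OCRv5 det/rec.
OCR_V5_NAMES = {
    "PP-OCRv5_mobile_det",
    "PP-OCRv5_server_det",
    "PP-OCRv5_mobile_rec",
    "PP-OCRv5_server_rec",
}


def _is_top_level_key(line):
    """True for a non-empty, non-indented, non-comment line."""
    first = line[:1]
    return bool(first) and not first.isspace() and first != "#"


def get_model_name(yml_text):
    """Return the value of Global.model_name, or None if it is missing."""
    in_global = False
    for line in yml_text.splitlines():
        if _is_top_level_key(line):
            in_global = line.strip() == "Global:"
        elif in_global and line.strip().startswith("model_name:"):
            return line.split(":", 1)[1].strip().strip("'\"")
    return None


def set_model_name(yml_text, new_name):
    """Return yml_text with Global.model_name replaced by new_name."""
    out, in_global, done = [], False, False
    for line in yml_text.splitlines(keepends=True):
        if _is_top_level_key(line):
            in_global = line.strip() == "Global:"
        elif in_global and not done and line.strip().startswith("model_name:"):
            indent = line[: len(line) - len(line.lstrip())]
            newline = "\n" if line.endswith("\n") else ""
            line = f"{indent}model_name: {new_name}{newline}"
            done = True
        out.append(line)
    if not done:
        raise ValueError("Global.model_name not found")
    return "".join(out)


if __name__ == "__main__":
    sample = "Global:\n  model_name: my_custom_rec\n"
    name = get_model_name(sample)
    if name not in OCR_V5_NAMES:
        print(set_model_name(sample, "PP-OCRv5_mobile_rec"))
```

To apply it to a real export, read the .yml file into a string, pass it through `set_model_name`, and write the result back; keeping a backup copy of the original file is a sensible precaution.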

Running your own models:

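from paddleocr import PaddleOCR

ocr = PaddleOCR(
    # Model names; each must match Global.model_name in the exported .yml
    text_detection_model_name="PP-OCRv5_mobile_det",
    text_recognition_model_name="PP-OCRv5_mobile_rec",

    # Directories that hold the exported inference models
    text_detection_model_dir=r"C:\Users\Administrator\Desktop\Mo_paddleocr\PaddleOCR-release-3.1\output\output_infer_det",
    text_recognition_model_dir=r"C:\Users\Administrator\Desktop\Mo_paddleocr\PaddleOCR-release-3.1\output\output_infer_rec",

    # Turn off the optional document and text-line preprocessing modules
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
)

# Run detection + recognition on one test image
result = ocr.predict("train_data\\icdar2015\\text_localization\\det\\test\\28.jpg")
for res in result:
    res.print()                  # print the result to the console
    res.save_to_img("output")    # save the visualization
    res.save_to_json("output")   # save the structured result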

text_detection_model_name="PP-OCRv5_mobile_det",
text_recognition_model_name="PP-OCRv5_mobile_rec",

These two values must match the model_name in the corresponding exported yml file.
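
One way to avoid hard-coding the names is to read them from the exported directories and build the constructor arguments from that. This is a rough sketch, assuming each export directory contains exactly one .yml file with `model_name` nested under a top-level `Global` key; `exported_model_name` and `ocr_kwargs` are hypothetical helpers, not part of PaddleOCR.

```python
import re
from pathlib import Path

# Finds model_name among the indented lines directly under a top-level "Global:" key.
_GLOBAL_MODEL_NAME = re.compile(
    r"^Global:[ \t]*\r?\n(?:[ \t]+.*\n)*?[ \t]+model_name:[ \t]*(\S+)",
    re.MULTILINE,
)


def exported_model_name(model_dir):
    """Read Global.model_name from the single .yml file in an export directory."""
    yml_files = sorted(Path(model_dir).glob("*.yml"))
    if len(yml_files) != 1:
        raise FileNotFoundError(
            f"expected exactly one .yml in {model_dir}, found {len(yml_files)}"
        )
    text = yml_files[0].read_text(encoding="utf-8")
    match = _GLOBAL_MODEL_NAME.search(text)
    if match is None:
        raise ValueError(f"Global.model_name not found in {yml_files[0]}")
    return match.group(1).strip("'\"")


def ocr_kwargs(det_dir, rec_dir):
    """Build the name/dir keyword arguments used in the PaddleOCR call above."""
    return {
        "text_detection_model_name": exported_model_name(det_dir),
        "text_recognition_model_name": exported_model_name(rec_dir),
        "text_detection_model_dir": str(det_dir),
        "text_recognition_model_dir": str(rec_dir),
    }

# Intended use (not executed here):
#   ocr = PaddleOCR(**ocr_kwargs(det_dir, rec_dir), use_doc_unwarping=False)
```

Because the names come straight from the yml files, the constructor arguments and the exported configs cannot drift apart; the names still have to be ones that the library accepts.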

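Model export

Models are usually exported with the official command-line tool. I found that tedious, so I edited tools/export_model.py directly; after the change, simply run export_model.py.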

# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import sys

__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
sys.path.insert(0, os.path.abspath(os.path.join(__dir__, "..")))

import argparse

from tools.program import load_config, merge_config, ArgsParser
from ppocr.utils.export_model import export


def main():
    # Original command-line handling, replaced by the hard-coded paths below
    # FLAGS = ArgsParser().parse_args()
    # config = load_config(FLAGS.config)
    # config = merge_config(config, FLAGS.opt)

    # config.yml written to the output directory when training finished
    my_config = "C:\\Users\\Administrator\\Desktop\\Mo_paddleocr\\PaddleOCR-release-3.1\\output\\Msy_test_ppocr_rec\\config.yml"

    config = load_config(my_config)
    # Overrides that would otherwise be passed with -o on the command line
    my_opt = {
        "Global.pretrained_model": "C:\\Users\\Administrator\\Desktop\\Mo_paddleocr\\PaddleOCR-release-3.1\\output\\Msy_test_ppocr_rec\\best_model\\model.pdparams",
        "Global.save_inference_dir": "C:\\Users\\Administrator\\Desktop\\Mo_paddleocr\\PaddleOCR-release-3.1\\output_infer_rec",
    }

    config = merge_config(config, my_opt)
    # export model
    export(config)


if __name__ == "__main__":
    main()
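
After the script finishes, a quick sanity check is to confirm that the export directory really contains the three file types mentioned earlier (.json, .yml, .pdiparams). A minimal sketch, with `missing_export_suffixes` as a hypothetical helper name:

```python
from pathlib import Path

# File types an exported model directory is expected to contain, per this post.
REQUIRED_SUFFIXES = (".json", ".yml", ".pdiparams")


def missing_export_suffixes(export_dir):
    """Return the required suffixes that have no matching file in export_dir."""
    present = {p.suffix for p in Path(export_dir).iterdir() if p.is_file()}
    return [suffix for suffix in REQUIRED_SUFFIXES if suffix not in present]


if __name__ == "__main__":
    import sys

    # Usage: python check_export.py <export_dir>
    if len(sys.argv) != 2:
        sys.exit("usage: python check_export.py <export_dir>")
    missing = missing_export_suffixes(sys.argv[1])
    print("export looks complete" if not missing else f"missing: {missing}")
```

An empty result only shows that the files exist; the model_name check on the .yml file is still needed.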
