PaddleOCRv3, Part 3: Training the rec (Recognition) Model

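The dataset was generated with TextRecognitionDataGenerator. Image file names follow the format [label]_[index].png, so the prefix is the label; for example, the label of 72K_123.png is 72K.
The function below writes the label files gt_train.txt and gt_test.txt (prefixed with a date).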

import os
import random


def TextRecognitionDataGenerator():
    """
    Walk every subdirectory under root and write the image entries to
    <date>_gt_train.txt and <date>_gt_test.txt.
    The data was generated with TextRecognitionDataGenerator, so the
    file-name prefix is the label.
    """
    extdName = ["bmp", "jpg", "jpeg", "png"]
    root = r"\\192.168.1.247\Pictures\imageAndModel\paddle_OCR_dataset\OCR_dataset"
    train_ratio = 0.85  # fraction of samples that go to the training set
    date = "20220720"

    with open(os.path.join(root, date + "_gt_train.txt"), "w", encoding="utf-8") as train_f:
        with open(os.path.join(root, date + "_gt_test.txt"), "w", encoding="utf-8") as test_f:

            for subdir in os.listdir(root):
                subdir = os.path.join(root, subdir)
                if not os.path.isdir(subdir):
                    continue
                for file in os.listdir(subdir):
                    ext = file.rsplit(".", 1)[-1]
                    if ext.lower() in extdName:
                        # Randomly assign each image to the training or test list
                        if random.random() < train_ratio:
                            write_f = train_f
                        else:
                            write_f = test_f

                        # Split on the last "_" only, so labels containing "_" stay intact
                        label = file.rsplit("_", 1)[0]
                        father_dir = os.path.basename(subdir)
                        # One sample per line: <relative image path>\t<label>
                        write_msg = os.path.join(father_dir, file) + "\t" + label + "\n"
                        write_f.write(write_msg)
                        print(write_msg)
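
The naming convention [label]_[index].png can be checked in isolation. The sketch below uses a hypothetical helper, label_from_filename (not part of the original script), and assumes the index never contains an underscore. It splits on the last underscore only, because splitting on every underscore would truncate a label that itself contains one.

```python
import os


def label_from_filename(filename):
    """Return the label part of a '[label]_[index].ext' file name.

    Hypothetical helper for illustration; assumes the index itself
    contains no underscore.
    """
    # Drop any directory part and the extension
    stem = os.path.splitext(os.path.basename(filename))[0]
    # Split on the LAST underscore only: everything before it is the label
    label, sep, _index = stem.rpartition("_")
    if not sep:
        raise ValueError(f"no '_' separator in file name: {filename!r}")
    return label


if __name__ == "__main__":
    for name in ["72K_123.png", "A_B_7.jpg"]:
        print(name, "->", label_from_filename(name))
```

Raising on a missing separator is a design choice for this sketch: a file that does not follow the convention is reported instead of silently receiving a wrong label.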


  • Contents of 20220720_gt_train.txt
    Format: imagePath \t label
    The image path and the label are separated by a tab (\t).
  • Prepare the character dictionary: write every character that appears in any label into the dictionary file.
  • yml config
    Fields that need to be changed:

Global:

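save_model_dir: directory where the model is saved
character_dict_path: path to the character dictionary
save_res_path: path where the prediction results are saved

Optimizer:

learning_rate: 0.0001 (learning rate; it can be lowered a little when fine-tuning)

Train:

data_dir: path to the dataset
label_file_list: path to gt_train.txt
batch_size_per_card: 32 (the batch size)

Eval:

data_dir: path to the dataset
label_file_list: path to gt_test.txt
batch_size_per_card: 32 (the batch size)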
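
The character-dictionary step can be sketched as follows. This is a minimal sketch, not the author's script: collect_chars and build_char_dict are hypothetical names, and the code assumes the label files use the imagePath \t label format and that the dictionary holds one character per line, like the dictionaries shipped under ppocr/utils. The space character is left out on the assumption that use_space_char: true covers it.

```python
import os
import tempfile


def collect_chars(label_files):
    """Collect every distinct character used in the labels of the given files."""
    chars = set()
    for path in label_files:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\r\n")
                if not line:
                    continue
                # Each line is "<image path>\t<label>"; split on the first tab only
                _image_path, sep, label = line.partition("\t")
                if not sep:
                    continue  # malformed line without a tab, skipped in this sketch
                chars.update(label)
    chars.discard(" ")  # assumption: spaces are handled by use_space_char
    return sorted(chars)


def build_char_dict(label_files, dict_path):
    """Write the collected characters to dict_path, one character per line."""
    chars = collect_chars(label_files)
    with open(dict_path, "w", encoding="utf-8") as f:
        for ch in chars:
            f.write(ch + "\n")
    return chars


if __name__ == "__main__":
    # Tiny self-contained demo on a temporary label file
    tmp = tempfile.mkdtemp()
    label_file = os.path.join(tmp, "demo_gt_train.txt")
    with open(label_file, "w", encoding="utf-8") as f:
        f.write("sub/72K_123.png\t72K\n")
    print(build_char_dict([label_file], os.path.join(tmp, "demo_dict.txt")))
```

Passing both the training and the test label file to build_char_dict keeps characters that only occur in the test split from being missed.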

Below is the config file I used for training.

Global:
  debug: false
  use_gpu: true
  epoch_num: 100
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/myOCR_model2
  save_epoch_step: 3
  eval_batch_step: [0, 500]
  cal_metric_during_train: true
  pretrained_model:
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: ./doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/my_en_dict2.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/myOCR_model2/predicts_ppocrv3_en.txt


Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05


Architecture:
  model_type: rec
  algorithm: SVTR
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 64
            depth: 2
            hidden_dims: 120
            use_guide: True
          Head:
            fc_decay: 0.00001
      - SARHead:
          enc_dim: 512
          max_text_length: *max_text_length

Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - SARLoss:

PostProcess:  
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: False

Train:
  dataset:
    name: SimpleDataSet
    data_dir: D:\myAPP\pythonDoc\PaddleOCRv3\train_data\Paddle_OCR\OCR_dataset
    ext_op_transform_idx: 1
    label_file_list:
    - D:\myAPP\pythonDoc\PaddleOCRv3\train_data\Paddle_OCR\OCR_dataset\20220720_gt_train.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape: [48, 320, 3]
    - RecAug:
    - MultiLabelEncode:
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 32
    drop_last: true
    num_workers: 1
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: D:\myAPP\pythonDoc\PaddleOCRv3\train_data\Paddle_OCR\OCR_dataset
    label_file_list:
    - D:\myAPP\pythonDoc\PaddleOCRv3\train_data\Paddle_OCR\OCR_dataset\20220720_gt_test.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 32
    num_workers: 1
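
Before starting a run with this config, it can help to check the label files against max_text_length and the character dictionary. The sketch below is not part of PaddleOCR: load_dict and find_problem_labels are hypothetical helpers, and the premise that over-length labels or characters missing from the dictionary cause samples to be dropped or mis-encoded is an assumption worth verifying against your PaddleOCR version.

```python
import os
import tempfile


def load_dict(dict_path, use_space_char=True):
    """Read a one-character-per-line dictionary file into a set."""
    with open(dict_path, encoding="utf-8") as f:
        chars = {line.rstrip("\r\n") for line in f}
    chars.discard("")  # ignore blank lines
    if use_space_char:
        chars.add(" ")  # mirrors use_space_char: true in the config
    return chars


def find_problem_labels(label_file, allowed_chars, max_text_length=25):
    """Return (line_number, text, reason) tuples for suspicious entries."""
    problems = []
    with open(label_file, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue
            _path, sep, label = line.partition("\t")
            if not sep:
                problems.append((line_no, line, "missing tab separator"))
                continue
            if len(label) == 0:
                problems.append((line_no, label, "empty label"))
            elif len(label) > max_text_length:
                problems.append((line_no, label, "longer than max_text_length"))
            unknown = sorted(set(label) - allowed_chars)
            if unknown:
                reason = "characters not in dictionary: " + "".join(unknown)
                problems.append((line_no, label, reason))
    return problems


if __name__ == "__main__":
    # Tiny self-contained demo with temporary files
    tmp = tempfile.mkdtemp()
    dict_path = os.path.join(tmp, "demo_dict.txt")
    label_path = os.path.join(tmp, "demo_gt.txt")
    with open(dict_path, "w", encoding="utf-8") as f:
        f.write("7\n2\nK\n")
    with open(label_path, "w", encoding="utf-8") as f:
        f.write("sub/72K_1.png\t72K\nsub/72X_2.png\t72X\n")
    for problem in find_problem_labels(label_path, load_dict(dict_path)):
        print(problem)
```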

  • Train the rec model
python tools/train.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec_my2.yml -o Global.pretrained_model=./pretrain_models/en_PP-OCRv3_rec_train/best_accuracy
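
The -o Global.pretrained_model=... option overrides the empty pretrained_model entry under Global in the config file, so the pretrained weights do not have to be hard-coded in the yml. The sketch below only illustrates the idea of a dotted-key override on a nested dict; apply_override is a hypothetical helper and not PaddleOCR's actual implementation.

```python
def apply_override(config, option):
    """Apply one 'Section.key=value' override to a nested config dict (in place)."""
    # Split key from value at the first "=", so "=" or "." inside the value is harmless
    dotted_key, sep, value = option.partition("=")
    if not sep:
        raise ValueError(f"expected KEY=VALUE, got {option!r}")
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # walk down, creating sections as needed
    node[keys[-1]] = value  # the value stays a plain string in this sketch
    return config


if __name__ == "__main__":
    config = {"Global": {"pretrained_model": None, "epoch_num": 100}}
    apply_override(
        config,
        "Global.pretrained_model=./pretrain_models/en_PP-OCRv3_rec_train/best_accuracy",
    )
    print(config["Global"]["pretrained_model"])
```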