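Datasets: /PaddleOCR/doc/doc_ch/datasets.md
Data synthesis tools: /PaddleOCR/doc/doc_ch/data_synthesis.md
Text recognition training: /PaddleOCR/doc/doc_ch/recognition.md
I don't have data of my own yet, so I am practicing with open-source data.
I followed the instructions in recognition.md step by step. Training is normally done on Linux, but I trained on Windows 10, so parts of the original document did not fit my setup, and the document may also contain a few errors.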
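1. Download the data
The icdar15 dataset for recognition can no longer be downloaded from its original source, so you can get it from my Baidu Netdisk. Link: https://pan.baidu.com/s/1c_8qzXPeJ6WMd30PaHx8LA
Extraction code: s1wk
Although the images carry a .jpg suffix, they are actually PNG files, so you need to change the suffix to .png with a script.
The script is: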
ren *.jpg *.png
Save it under a name such as rename.bat, put it in the image folder, and double-click it. Every file with a .jpg suffix is then renamed to a .png suffix.
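The `ren` one-liner renames every .jpg file unconditionally. As a more cautious alternative, here is a minimal Python sketch that renames only the files whose first bytes match the PNG signature. The function name `rename_png_disguised_as_jpg` and the folder path in the usage note are my own placeholders, not part of PaddleOCR.

```python
from pathlib import Path

# The 8-byte signature that every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def rename_png_disguised_as_jpg(folder):
    """Rename *.jpg files that are really PNG files to *.png.

    Returns the list of new paths. Files that do not start with the PNG
    signature, or whose target name already exists, are left untouched.
    """
    renamed = []
    for path in sorted(Path(folder).glob("*.jpg")):
        with path.open("rb") as f:
            header = f.read(len(PNG_SIGNATURE))
        if header != PNG_SIGNATURE:
            continue  # a genuine JPEG (or something else): skip it
        target = path.with_suffix(".png")
        if target.exists():
            continue  # never overwrite an existing file
        path.rename(target)
        renamed.append(target)
    return renamed
```

It could be called as `rename_png_disguised_as_jpg("./train_data/ic15_data/train")`, and again for the test folder.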
PaddleOCR provides label files for training on the icdar2015 dataset. Download them as follows:
# Training set labels
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
# Test set labels
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
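`wget` may not be available in a Windows 10 command prompt. Below is a minimal Python sketch of the same two downloads using only the standard library. The URLs are the ones above; the helper names `local_path_for` and `download_to` are hypothetical.

```python
import urllib.request
from pathlib import Path

LABEL_URLS = [
    "https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt",
    "https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt",
]


def local_path_for(url, dest_dir):
    """Return dest_dir/<last URL component>, like `wget -P dest_dir url`."""
    return Path(dest_dir) / url.rsplit("/", 1)[-1]


def download_to(url, dest_dir):
    """Download url into dest_dir (created if missing) and return the path."""
    target = local_path_for(url, dest_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(target))
    return target
```

To fetch both label files, loop over `LABEL_URLS` and call `download_to(url, "./train_data/ic15_data")` for each.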
The final training set should have the following file structure:
|-train_data
  |-ic15_data
    |- rec_gt_train.txt
    |- train
      |- word_001.png
      |- word_002.png
      |- word_003.png
      | ...
Like the training set, the test set needs a folder containing all the images (test) and a rec_gt_test.txt file. Its structure is as follows:
|-train_data
  |-ic15_data
    |- rec_gt_test.txt
    |- test
      |- word_001.png
      |- word_002.png
      |- word_003.png
      | ...
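Before training, it is worth checking that every image named in a label file actually exists on disk, for example after the .jpg to .png rename. The following is a minimal sketch. It assumes each label line has the form `image path<TAB>text` with the image path relative to the data directory, so please verify that against your own rec_gt_train.txt. `find_missing_images` is a hypothetical helper, not a PaddleOCR function.

```python
from pathlib import Path


def find_missing_images(label_file, data_dir):
    """Return image paths listed in label_file that do not exist under data_dir.

    Assumption: each non-empty line is "<image path>\t<label text>".
    """
    missing = []
    with open(label_file, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # ignore blank lines
            image_rel = line.split("\t", 1)[0]
            if not (Path(data_dir) / image_rel).is_file():
                missing.append(image_rel)
    return missing
```

An empty list from `find_missing_images("./train_data/ic15_data/rec_gt_train.txt", "./train_data/ic15_data/")` means all listed images were found.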
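2. Start training
PaddleOCR provides training, evaluation, and prediction scripts. This section uses the CRNN recognition model as the example.
First download the pretrained model. You can download an already trained model and fine-tune it on the icdar2015 data: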
cd PaddleOCR/
# Download the MobileNetV3 pretrained model
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_bilstm_ctc_v2.0_train.tar
# Extract the model parameters
cd pretrain_models
tar -xf rec_mv3_none_bilstm_ctc_v2.0_train.tar && rm -rf rec_mv3_none_bilstm_ctc_v2.0_train.tar
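If `tar` is not available in your shell, the standard-library `tarfile` module can do the extraction step. This is only a sketch: `extract_and_remove` is a hypothetical helper that mirrors the `tar -xf ... && rm -rf ...` line above.

```python
import tarfile
from pathlib import Path


def extract_and_remove(archive, dest_dir):
    """Extract a tar archive into dest_dir, then delete the archive."""
    archive = Path(archive)
    # Newer Python versions support extraction filters; "data" rejects
    # unsafe members such as absolute paths. Older versions lack the argument.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive) as tar:
        tar.extractall(dest_dir, **kwargs)
    archive.unlink()
```

For the model above, the call would be `extract_and_remove("./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train.tar", "./pretrain_models")`, run from the PaddleOCR/ directory.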
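Start training (from the PaddleOCR/ root directory, so return there first if you are still in pretrain_models):
If you installed the CPU version, set the use_gpu field in the configuration file to false.
# GPU training: single-card and multi-card training are supported; specify the card IDs with the --gpus parameter
# Train on the icdar15 English data; the training log is saved automatically as train.log under "{save_model_dir}"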
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml
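Because I trained on Windows 10 with a single GPU, my training command is:
# Single-GPU training, launched directly without paddle.distributed.launch
# Train on the icdar15 English data; the training log is saved automatically as train.log under "{save_model_dir}"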
python tools/train.py -c configs/rec/rec_icdar15_train.yml
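Some of the paths in rec_icdar15_train.yml are also wrong for my setup and need to be changed to your actual paths.
Pay particular attention to these three: character_dict_path:, data_dir:, and label_file_list:
I ran into this before: the character_dict_path: value was missing the leading "./", and acc stayed at 0 throughout training.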
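A quick way to confirm that the three paths resolve from the directory where you launch training is sketched below. It takes the values copied by hand from the YAML file and does not parse the file itself. `check_config_paths` is a hypothetical helper, and it only checks that the paths exist, so it does not by itself explain the acc = 0 symptom described above.

```python
from pathlib import Path


def check_config_paths(character_dict_path, data_dir, label_file_list, root="."):
    """Return a list of (config key, path) pairs that do not exist under root."""
    root = Path(root)
    problems = []
    if not (root / character_dict_path).is_file():
        problems.append(("character_dict_path", character_dict_path))
    if not (root / data_dir).is_dir():
        problems.append(("data_dir", data_dir))
    for label_file in label_file_list:
        if not (root / label_file).is_file():
            problems.append(("label_file_list", label_file))
    return problems
```

Run from the PaddleOCR/ directory, `check_config_paths("./ppocr/utils/ic15_dict.txt", "./train_data/ic15_data/", ["./train_data/ic15_data/rec_gt_train.txt"])` should return an empty list when everything is in place.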
Some notes:
1. A pretrained model is used; the rec_icdar15_train.yml file below contains the corresponding entry (tested, works):
pretrained_model: ./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train/best_accuracy
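2. If you do not put the pretrained model path after pretrained_model: in the YAML file, passing it on the command line should also work, as in the command below (this did not pass my test; note that the command uses the key pretrain_weights while the YAML file uses pretrained_model, which may be the cause):
# Single-GPU training, launched directly without paddle.distributed.launch
# Train on the icdar15 English data; the training log is saved automatically as train.log under "{save_model_dir}"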
python tools/train.py -c configs/rec/rec_icdar15_train.yml -o Global.pretrain_weights=./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train/best_accuracy
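3. Supporting uppercase letters requires two changes: add the uppercase letters to the character set, and use uppercase letters in the labels (untested).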
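For the first of those two changes, here is a sketch that writes a character dictionary containing digits, lowercase letters, and uppercase letters. It assumes the dictionary format is one character per line, as in ppocr/utils/ic15_dict.txt. The helper names and the output file name in the usage note are my own choices, and like the note above this has not been tested against an actual training run.

```python
import string


def build_charset():
    """Digits, then lowercase, then uppercase ASCII letters (62 characters)."""
    return list(string.digits + string.ascii_lowercase + string.ascii_uppercase)


def write_dict(path, chars):
    """Write one character per line, UTF-8 encoded, with Unix line endings."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for ch in chars:
            f.write(ch + "\n")
```

After `write_dict("./ppocr/utils/ic15_dict_upper.txt", build_charset())`, character_dict_path: would need to point at the new file, and the label files would need to keep their uppercase letters.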
For reference, the full configuration file:
Global:
  use_gpu: true
  epoch_num: 720
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec/ic15/
  save_epoch_step: 3
  # evaluation is run every 2000 iterations
  eval_batch_step: [0, 2000]
  # if pretrained_model is saved in static mode, load_static_weights must set to True
  cal_metric_during_train: True
  pretrained_model: ./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train/best_accuracy
  checkpoints: #./output/rec/ic15/best_accuracy
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_words_en/word_10.png
  # for data or label process
  character_dict_path: ./ppocr/utils/ic15_dict.txt
  character_type: ch
  max_text_length: 25
  infer_mode: False
  use_space_char: False
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    learning_rate: 0.0005
  regularizer:
    name: 'L2'
    factor: 0
Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: large
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 96
  Head:
    name: CTCHead
    fc_decay: 0
Loss:
  name: CTCLoss
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ic15_data/
    label_file_list: ["./train_data/ic15_data/rec_gt_train.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 100]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 256
    drop_last: True
    num_workers: 8
    use_shared_memory: False
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ic15_data/
    label_file_list: ["./train_data/ic15_data/rec_gt_test.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 100]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 256
    num_workers: 4
    use_shared_memory: False
Because a pretrained model is used, acc is fairly high right from the start.
Below is an excerpt captured during training:
[2021/04/25 22:55:46] root INFO: epoch: [33/720], iter: 560, lr: 0.000500, loss: 0.326820, acc: 0.968750, norm_edit_dis: 0.986556, reader_cost: 0.07451 s, batch_cost: 0.16704 s, samples: 2560, ips: 1532.54701
[2021/04/25 22:55:47] root INFO: save model in ./output/rec/ic15/latest
[2021/04/25 22:55:47] root INFO: save model in ./output/rec/ic15/iter_epoch_33
[2021/04/25 22:55:47] root INFO: Initialize indexs of datasets:['./train_data/ic15_data/rec_gt_train.txt']
[2021/04/25 22:55:50] root INFO: epoch: [34/720], iter: 570, lr: 0.000500, loss: 0.256092, acc: 0.968750, norm_edit_dis: 0.989618, reader_cost: 0.07875 s, batch_cost: 0.17310 s, samples: 2560, ips: 1478.88973
[2021/04/25 22:55:52] root INFO: save model in ./output/rec/ic15/latest
[2021/04/25 22:55:52] root INFO: Initialize indexs of datasets:['./train_data/ic15_data/rec_gt_train.txt']
[2021/04/25 22:55:53] root INFO: epoch: [35/720], iter: 580, lr: 0.000500, loss: 0.221114, acc: 0.972656, norm_edit_dis: 0.992008, reader_cost: 0.02936 s, batch_cost: 0.05761 s, samples: 768, ips: 1333.15019
[2021/04/25 22:55:56] root INFO: epoch: [35/720], iter: 590, lr: 0.000500, loss: 0.212451, acc: 0.972656, norm_edit_dis: 0.992313, reader_cost: 0.07461 s, batch_cost: 0.16886 s, samples: 2560, ips: 1516.03996
[2021/04/25 22:55:58] root INFO: save model in ./output/rec/ic15/latest
[2021/04/25 22:55:58] root INFO: Initialize indexs of datasets:['./train_data/ic15_data/rec_gt_train.txt']
[2021/04/25 22:56:00] root INFO: epoch: [36/720], iter: 600, lr: 0.000500, loss: 0.217688, acc: 0.972656, norm_edit_dis: 0.991217, reader_cost: 0.04877 s, batch_cost: 0.10549 s, samples: 1536, ips: 1456.10046
[2021/04/25 22:56:03] root INFO: epoch: [36/720], iter: 610, lr: 0.000500, loss: 0.230538, acc: 0.972656, norm_edit_dis: 0.990990, reader_cost: 0.07450 s, batch_cost: 0.16910 s, samples: 2560, ips: 1513.88089
[2021/04/25 22:56:03] root INFO: save model in ./output/rec/ic15/latest
[2021/04/25 22:56:03] root INFO: save model in ./output/rec/ic15/iter_epoch_36
[2021/04/25 22:56:03] root INFO: Initialize indexs of datasets:['./train_data/ic15_data/rec_gt_train.txt']
[2021/04/25 22:56:06] root INFO: epoch: [37/720], iter: 620, lr: 0.000500, loss: 0.200544, acc: 0.976562, norm_edit_dis: 0.991990, reader_cost: 0.07021 s, batch_cost: 0.15514 s, samples: 2304, ips: 1485.15136
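To follow acc over a long run without scrolling through the console, the per-iteration lines can be parsed out of train.log. The sketch below is based only on the line format shown in the excerpt above. `parse_train_line` is a hypothetical helper; it returns None for lines that do not match, such as the "save model" messages.

```python
import re

# Matches the per-iteration lines of the training log excerpt.
LINE_RE = re.compile(
    r"epoch: \[(?P<epoch>\d+)/(?P<total>\d+)\], iter: (?P<iter>\d+), "
    r"lr: (?P<lr>[^,]+), loss: (?P<loss>[\d.]+), acc: (?P<acc>[\d.]+)"
)


def parse_train_line(line):
    """Extract epoch, iter, loss and acc from one log line, or return None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "epoch": int(m["epoch"]),
        "iter": int(m["iter"]),
        "loss": float(m["loss"]),
        "acc": float(m["acc"]),
    }
```

Applying it to every line of the log and dropping the None results gives a list of records that can be printed or plotted.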
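Once this simple example runs end to end, you can swap in your own data and train recognition on it.
Then test on a single image:
# Predict the English result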
python tools/infer_rec.py -c configs/rec/rec_icdar15_train.yml -o Global.pretrained_model=./output/rec/ic15/latest Global.load_static_weights=false Global.infer_img=./train_data/ic15_data/test/word_10.png
Test result:
[2021/04/25 22:56:17] root INFO: train with paddle 2.0.0 and device CUDAPlace(0)
W0425 22:56:17.050060 3624 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 10.2, Runtime API Version: 10.2
W0425 22:56:17.062021 3624 device_context.cc:372] device: 0, cuDNN Version: 7.6.
[2021/04/25 22:56:19] root INFO: load pretrained model from ['./output/rec/ic15/latest']
[2021/04/25 22:56:19] root INFO: infer_img: ./train_data/ic15_data/test/word_10.png
[2021/04/25 22:56:19] root INFO: result: ('pain', 0.99994206)
[2021/04/25 22:56:19] root INFO: success!
The actual image is shown below; the recognition result is correct.