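Recently I wanted to use Baidu's PaddleOCR for license recognition, and to bump the recognition rate a bit by training on some of my own data. But model training kept failing, and this one error tormented a newbie like me for a whole afternoon >_<, so I'm writing this post to keep a record of it!
The error it kept throwing: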
Traceback (most recent call last):
  File ".\tools\train.py", line 148, in <module>
    main(config, device, logger, vdl_writer)
  File ".\tools\train.py", line 52, in main
    train_dataloader = build_dataloader(config, 'Train', device, logger)
  File "D:\PaddleOCR-release-2.4\ppocr\data\__init__.py", line 64, in build_dataloader
    dataset = eval(module_name)(config, mode, logger, seed)
  File "D:\PaddleOCR-release-2.4\ppocr\data\simple_dataset.py", line 41, in __init__
    ) == data_source_num, "The length of ratio_list should be the same as the file_list."
AssertionError: The length of ratio_list should be the same as the file_list.
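After a lot of digging, it turned out that the data_dir and label_file_list paths in the .yml config file the model uses (under the configs folder) were not written correctly. These paths have to point to your own dataset.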