[Shandong University School of Software, 2021 Cohort Project Training] OCR Model Optimization Directions

江应怜744

Last modified 2024-05-30 23:28:51

Tags: computer vision, python, deep learning

First published 2024-05-30 23:27:39

Article link: https://blog.csdn.net/qq_62496566/article/details/139337127

Model optimization directions

VisualDL

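First, our evaluation of acc was not yet visual enough. BML CodeLab has an integrated VisualDL visualization panel that can be opened from the left sidebar, which is very convenient.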

The core code, shown below, pulls the acc and loss values out of the training log and writes them to VisualDL:

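from visualdl import LogWriter

# writer = LogWriter(logdir="./log") is created once before the training loop;
# logs is one training log line and global_step is the current step
acc_value, loss_value = None, None

# Split the log line on ", " to get the individual "key: value" parts
parts = logs.split(', ')

# Walk through the parts, find the keys we need and extract their values
for part in parts:
    key_value = part.split(': ')
    if len(key_value) == 2:
        key, value = key_value
        if key == 'acc':
            acc_value = float(value)
        elif key == 'loss':
            loss_value = float(value)

# Add a data point tagged `acc` to the writer
if acc_value is not None:
    writer.add_scalar(tag="acc", step=global_step, value=acc_value)
# Add a data point tagged `loss` to the writer
if loss_value is not None:
    writer.add_scalar(tag="loss", step=global_step, value=loss_value)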
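The parsing step can also be pulled out into a small standalone function, which makes it easier to test. This is a minimal sketch, assuming each log line holds `key: value` pairs separated by `, `; the name `parse_log_metrics` and the sample log line used with it are hypothetical.

```python
from typing import Dict


def parse_log_metrics(line: str) -> Dict[str, float]:
    """Parse 'key: value' pairs separated by ', ' into a dict of floats.

    Parts that are not a single 'key: value' pair, or whose value is not a
    plain number (e.g. 'epoch: [1/100]'), are skipped instead of raising.
    """
    metrics: Dict[str, float] = {}
    for part in line.split(', '):
        key_value = part.split(': ')
        if len(key_value) != 2:
            continue  # not a well-formed "key: value" pair
        key, value = key_value
        try:
            metrics[key.strip()] = float(value)
        except ValueError:
            continue  # non-numeric field, ignore it
    return metrics
```

With such a helper, the logging step becomes `m = parse_log_metrics(logs)` followed by `writer.add_scalar(tag="acc", step=global_step, value=m["acc"])` when `"acc" in m`. For the hypothetical line `"epoch: [1/100], global_step: 10, acc: 0.5, loss: 1.25"` it returns `{"global_step": 10.0, "acc": 0.5, "loss": 1.25}`.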

Using A币 (platform credits) to run 4-GPU script training on A100

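Once the data has been preprocessed, open a new terminal and paste the run command into it; this way the notebook's save failures no longer get in the way.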

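# Run from the PaddleOCR root: the -c config path below is relative.
# (No leading "!": that prefix is notebook syntax and fails in a plain terminal.)
python -m paddle.distributed.launch --gpus '0,1,2,3' /home/aistudio/PaddleOCR/tools/train.py \
    -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec_hgnet_fp32_ultra.yml \
    -o Global.pretrained_model=/home/aistudio/ch_PP-OCRv4_rec_server_train/best_accuracy.pdparams
# Optional: to resume from the latest checkpoint, add to the -o options:
#   Global.checkpoints=/home/aistudio/output/rec_ppocr_v4_hgnet/latest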

Trick attempts

Rotating images by aspect ratio

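for image_file in image_file_list:
    img, flag, _ = check_and_read(image_file)
    if not flag:
        img = cv2.imread(image_file)
    if img is None:
        logger.info("error in loading image:{}".format(image_file))
        continue
    valid_image_file_list.append(image_file)

    # Rotate "tall, narrow" images (height more than 1.5x the width) 90 degrees
    # counterclockwise so the text line lies horizontally for the recognizer
    h, w = img.shape[:2]
    if h > 1.5 * w:
        img = cv2.rotate(img, cv2.ROTATE_90_COUNTERCLOCKWISE)
    # Keep the (possibly rotated) image for recognition
    img_list.append(img)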

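In our tests, counterclockwise rotation worked better than clockwise. The upper bound of this trick is a proper image orientation classifier, but because the training images are very blurry, the classifier did not perform well.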
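To check the rotation rule outside the inference script, here is a minimal standalone sketch. The helper `rotate_if_tall` is hypothetical; it uses NumPy's `np.rot90`, which rotates the row/column plane counterclockwise and gives the same result as `cv2.rotate(img, cv2.ROTATE_90_COUNTERCLOCKWISE)`, so OpenCV is not needed to try it.

```python
import numpy as np


def rotate_if_tall(img: np.ndarray, ratio: float = 1.5) -> np.ndarray:
    """Rotate an image 90 degrees counterclockwise when its height exceeds
    `ratio` times its width; otherwise return the input unchanged.

    Works for both grayscale (H, W) and color (H, W, C) arrays, since
    np.rot90 rotates only the first two axes by default.
    """
    h, w = img.shape[:2]
    if h > ratio * w:
        # np.rot90 returns a view; make it contiguous for downstream consumers
        return np.ascontiguousarray(np.rot90(img, k=1))
    return img
```

A 4x2 array `[[0, 1], [2, 3], [4, 5], [6, 7]]` comes back as the 2x4 array `[[1, 3, 5, 7], [0, 2, 4, 6]]`: the top-right pixel moves to the top-left, as expected for a counterclockwise turn.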

Replacing full-width punctuation with half-width punctuation

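By counting the share of full-width and half-width punctuation in the original dataset, we found that half-width punctuation is significantly more common than full-width. Based on this finding, we hypothesized that converting full-width punctuation to half-width would help. After screening, the symbol replacements below gave a sizable improvement: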

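def convert_punctuation(text):
    """
    Convert full-width (Chinese) punctuation in the text to the corresponding half-width (English) punctuation.
    """
    punctuation_map = {
        '，': ',',  # full-width comma -> half-width comma
        '。': '.',  # Chinese full stop -> period
        '！': '!',  # full-width exclamation mark -> half-width exclamation mark
        '％': '%',  # full-width percent sign -> half-width percent sign
        '‘': "'",   # left single quotation mark -> apostrophe
        '’': "'",   # right single quotation mark -> apostrophe
    }
    for cn_punc, en_punc in punctuation_map.items():
        text = text.replace(cn_punc, en_punc)
    return text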
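The full-width versus half-width statistic described above can be gathered with a short counting script. This is a minimal sketch, assuming the labels live in a PaddleOCR-style recognition label file with one `image_path<TAB>label` entry per line; `read_labels`, `punctuation_share`, and the `PAIRS` list (which mirrors `punctuation_map`) are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical pair list mirroring punctuation_map: (full-width, half-width)
PAIRS = [('，', ','), ('。', '.'), ('！', '!'), ('％', '%'), ('‘', "'"), ('’', "'")]


def read_labels(path: str) -> List[str]:
    """Return the label column of a file with one 'image_path\\tlabel' entry per line."""
    labels = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if "\t" in line:
                labels.append(line.split("\t", 1)[1])
    return labels


def punctuation_share(labels: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Return {full_width_symbol: (full_width_count, half_width_count)} over all labels."""
    counts = Counter(ch for text in labels for ch in text)
    return {full: (counts[full], counts[half]) for full, half in PAIRS}
```

For example, `punctuation_share(["你好，世界.", "100%，ok!"])` gives `(2, 0)` for `'，'` and `(0, 1)` for `'。'`; on a real dataset you would call `punctuation_share(read_labels(path))` with your own label file path.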

Replacing some misrecognized characters

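OCR frequently confuses visually similar characters. To correct these, we apply string replacement to the predicted text using what is essentially a dictionary of corrections. An excerpt: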

"富家": "佛家",
"富尼": "富居",
"寒时": "寒時",
"寒本": "根本",

The dictionary contains 7,044 rules. Below we explain how to use it and how to generate it.

Using the dictionary

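To use the dictionary, import it from its .py file, add the function below, and call it on each prediction before the results are written to the txt file: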

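from OCRwikisome import OCRwikisome  # correction dictionary {wrong string: correct string} defined in OCRwikisome.py

def convert_OCR_some(text):
    # Try the rules in dictionary order and return as soon as one rule changes the text,
    # so at most one correction is applied to each prediction
    for wrong, right in OCRwikisome.items():
        corrected = text.replace(wrong, right)
        if corrected != text:
            return corrected
    return text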
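`convert_OCR_some` stops at the first rule that changes the text, so at most one rule fires per prediction. If you instead want every matching rule applied, a plain loop of `str.replace` calls would let one rule's output be rewritten again by a later rule. A minimal sketch of an alternative, assuming you want all rules applied in one pass with no chaining: the hypothetical `build_corrector` compiles the dictionary keys into a single regular-expression alternation, longest keys first.

```python
import re
from typing import Callable, Dict


def build_corrector(rules: Dict[str, str]) -> Callable[[str], str]:
    """Compile the rules once and return a function that applies all of them in a single pass."""
    # Drop empty keys: an empty pattern would match at every position
    keys = [k for k in rules if k]
    if not keys:
        return lambda text: text
    # Longest keys first, so a longer rule wins over a shorter one starting at the same position
    keys.sort(key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Each match is replaced once; replaced text is not scanned again, so rules never chain
    return lambda text: pattern.sub(lambda m: rules[m.group(0)], text)
```

Build it once, e.g. `convert = build_corrector(OCRwikisome)`, and then call `convert(text)` on each prediction before writing it out. Compiling once matters here because the dictionary has thousands of rules.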