[SDK Case Series 02] OCR Recognition with MindX SDK + PyTorch CRNN

hiascend

Last modified 2023-09-19 11:11:03

First published 2023-01-06 15:36:02

Article link: https://blog.csdn.net/hiascend/article/details/128579990

Source code download:

https://gitee.com/open-ascend/atlas_mindxsdk_samples/blob/master/contrib/cv/ocr/image_crnn

Quick-start guide (assuming the MindX SDK environment is already deployed):

1. Obtain the model file

(1) The crnn.onnx file

https://gitee.com/ai_samples/pytorch_models/tree/master/cv/ocr/crnn

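Place it in the image_crnn/data/models/crnn directory.

2. Convert the model file

(1) Run the model conversion in the image_crnn/data/models/crnn directory. Depending on the chip type, run either atc_310.sh or atc_310P3.sh: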

bash atc_310.sh

bash atc_310P3.sh

3. In run_cpp.sh and run_python.sh, set MX_SDK_HOME to the MindX SDK installation directory

export MX_SDK_HOME=/usr/local/sdk_home/mxVision

4. Run run_cpp.sh or run_python.sh

bash run_cpp.sh

bash run_python.sh

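I. Install the Ascend driver

Install the Ascend driver first, following the installation manual for your product. After installation, run npu-smi info; output like the following indicates that the driver was installed successfully.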

[root@localhost ~]#
[root@localhost ~]# npu-smi info
+-------------------------------------------------------------------------------------------------+
| npu-smi 22.0.2                   Version: 22.0.2                                                |
+------------------+--------------+---------------------------------------------------------------+
| NPU    Name      | Health       | Power(W)             Temp(C)           Hugepages-Usage(page)  |
| Chip   Device    | Bus-Id       | AICore(%)            Memory-Usage(MB)                         |
+==================+==============+===============================================================+
| 1      310       | OK           | 12.8                 45                0   / 0                |
| 0      0         | 0000:05:00.0 | 0                    2621  / 8192                             |
+==================+==============+===============================================================+
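
The health check above can also be scripted. The following is a minimal Python sketch that parses text shaped like the sample npu-smi info output shown here. The function name parse_npu_health is hypothetical, and the table layout may differ between driver versions, so the parsing rules are an assumption rather than a documented format.

```python
import re

# A shortened copy of the sample output shown above.
SAMPLE_OUTPUT = """
| NPU    Name      | Health       | Power(W)             Temp(C)           Hugepages-Usage(page)  |
| Chip   Device    | Bus-Id       | AICore(%)            Memory-Usage(MB)                         |
| 1      310       | OK           | 12.8                 45                0   / 0                |
| 0      0         | 0000:05:00.0 | 0                    2621  / 8192                             |
"""


def parse_npu_health(text):
    """Return (npu_id, name, health) tuples found in npu-smi style table text."""
    rows = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue  # borders, prompts, and the version banner
        id_and_name = re.fullmatch(r"(\d+)\s+(\S+)", cells[0])
        # NPU rows carry a health word (e.g. OK); chip rows carry a bus id instead.
        if id_and_name and re.fullmatch(r"[A-Za-z]+", cells[1]):
            rows.append((int(id_and_name.group(1)), id_and_name.group(2), cells[1]))
    return rows


if __name__ == "__main__":
    for npu_id, name, health in parse_npu_health(SAMPLE_OUTPUT):
        print("NPU %d (%s): %s" % (npu_id, name, health))
```

In practice the text would come from running npu-smi info (for example through subprocess) instead of the embedded sample.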

II. Install MindX SDK > mxVision

(1) The MindX SDK must be obtained from the official website.

(2) mxVision user guide:

https://www.hiascend.com/document/detail/zh/mind-sdk/30rc3/quickstart/visionquickstart/visionquickstart_0000.html

(3) Install the MindX SDK

./Ascend-mindxsdk-mxvision_3.0.RC2_linux-aarch64.run --install --install-path=/usr/local/sdk_home

--install-path specifies the installation path.

(4) After a successful installation, the following message is displayed

Installing collected packages:mindx
Successfully installed mindx-3.0.RC2

(5) After a successful installation, check the installation directory; mxVision should be present

[root@localhost sdk_home]#
[root@localhost sdk_home]# pwd
/usr/local/sdk_home
[root@localhost sdk_home]# ls
mxVision mxVision-3.0.RC2
[root@localhost sdk_home]#
[root@localhost sdk_home]#

(6) The MindX SDK uses the OSD feature at run time. After installation, run the following command to generate the om file

bash /usr/local/sdk_home/mxVision/operators/opencvosd/generate_osd_om.sh

On success, the output looks like this

[root@localhost ~]# bash /usr/local/sdk_home/mxVision/operators/opencvosd/generate_osd_om.sh
ASCEND_HOME is set to /usr/local/Ascend by user
Set ASCEND_VERSION to the default value:ascend-toolkit/latest
ATC start working now,please wait for a moment.
ATC run success, welcome to the next use.

The model has been successfully converted to om,please get it under /usr/local/sdk_home/mxVision/operators/opencvosd.
[root@localhost ~]#

(7) After installing the MindX SDK, configure the environment variables

Add the following line to the .bashrc file

# Configured when installing mxVision
. /usr/local/sdk_home/mxVision/set_env.sh

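To make the environment variables permanent, modify the ~/.bashrc file as follows:

a) As the running user, run vi ~/.bashrc in any directory to open the .bashrc file, and append the line above after the last line of the file.

b) Run :wq! to save the file and exit.

c) Run source ~/.bashrc to apply the change immediately.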
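
Before moving on, it can help to confirm that the installation directory contains the files this article relies on. The sketch below is a minimal check, not part of the SDK: the function name missing_sdk_files and the list EXPECTED_FILES are hypothetical, and the two relative paths are simply the ones used in the steps above.

```python
import os

# Files referenced earlier in this article, relative to the mxVision directory.
EXPECTED_FILES = [
    "set_env.sh",
    os.path.join("operators", "opencvosd", "generate_osd_om.sh"),
]


def missing_sdk_files(sdk_home):
    """Return the expected files that do not exist under sdk_home."""
    return [
        rel for rel in EXPECTED_FILES
        if not os.path.isfile(os.path.join(sdk_home, rel))
    ]


if __name__ == "__main__":
    # Falls back to the installation path used in this article.
    sdk_home = os.environ.get("MX_SDK_HOME", "/usr/local/sdk_home/mxVision")
    missing = missing_sdk_files(sdk_home)
    if missing:
        print("Missing under %s: %s" % (sdk_home, ", ".join(missing)))
    else:
        print("mxVision files found under %s" % sdk_home)
```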

III. ATC model conversion

1. Convert the trained crnn.pth model to ONNX and place it in the image_crnn/data/models/crnn directory

Download location:

https://gitee.com/ai_samples/pytorch_models/tree/master/cv/ocr/crnn

[root@localhost crnn]#
[root@localhost crnn]# ls
aipp_crnn_rgb_gray_norm.config  atc_310.sh  atc_310P3.sh  crnn.cfg  crnn-label.names  crnn.onnx
[root@localhost crnn]#

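2. Run the model conversion command

(1) AIPP requires an aipp.config file. Inserting the AIPP operator during ATC conversion lets the model consume DVPP-processed data directly. For AIPP parameter configuration, see the "ATC Tool User Guide" chapter of the CANN Development Auxiliary Tools Guide (Inference).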

https://support.huawei.com/enterprise/zh/doc/EDOC1100234054?idPath=23710424%7C251366513%7C22892968%7C251168373

aipp_crnn_rgb_gray_norm.config

aipp_op {
aipp_mode: static
input_format: RGB888_U8

src_image_size_w: 100
src_image_size_h: 32

csc_switch: true
rbuv_swap_switch: true

matrix_r0c0 : 76
matrix_r0c1 : 150
matrix_r0c2 : 30
matrix_r1c0 : 0
matrix_r1c1 : 0
matrix_r1c2 : 0
matrix_r2c0 : 0
matrix_r2c1 : 0
matrix_r2c2 : 0
output_bias_0 : 0
output_bias_1 : 0
output_bias_2 : 0

min_chn_0: 127.5
min_chn_1: 127.5
min_chn_2: 127.5
var_reci_chn_0: 0.0078431
var_reci_chn_1: 0.0078431
var_reci_chn_2: 0.0078431
}
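
To make the effect of this configuration concrete, the following Python sketch applies the same arithmetic to a single pixel. It assumes the AIPP formulas as described in the ATC guide (channel swap, then color space conversion as matrix product shifted right by 8 plus bias, then normalization as (value - mean_chn - min_chn) * var_reci_chn, with the unset mean_chn taken as 0). The function name aipp_gray_norm is hypothetical, and hardware rounding and data types are not modeled.

```python
def aipp_gray_norm(b, g, r):
    """Approximate channel 0 of the AIPP output for one 8-bit BGR pixel."""
    # rbuv_swap_switch: true swaps channel 0 and channel 2, so BGR data is read as RGB.
    ch0, ch1, ch2 = r, g, b
    # csc_switch: true with matrix row 0 = (76, 150, 30) and output_bias_0 = 0.
    # The weights sum to 256, so the result is a gray value in 0..255.
    gray = ((76 * ch0 + 150 * ch1 + 30 * ch2) >> 8) + 0
    # min_chn_0 = 127.5 and var_reci_chn_0 = 0.0078431 (about 1 / 127.5)
    # map the gray value to roughly [-1, 1].
    return (gray - 127.5) * 0.0078431


if __name__ == "__main__":
    for name, pixel in [("white", (255, 255, 255)), ("black", (0, 0, 0))]:
        print(name, round(aipp_gray_norm(*pixel), 4))
```

Matrix rows 1 and 2 are all zeros, so only the first output channel carries data, which is consistent with the single-channel model input used in the conversion command.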

(2) Run the ATC help command atc --help to view the available parameters.

(3) Model conversion

The model conversion command for the Ascend310 chip is:

atc \
    --mode=0 \
    --framework=5 \
    --model=crnn.onnx \
    --output=crnn \
    --input_format=NCHW \
    --input_shape="actual_input_1:1,1,32,100" \
    --enable_small_channel=1 \
    --log=info \
    --soc_version=Ascend310 \
    --insert_op_conf=aipp_crnn_rgb_gray_norm.config

The model conversion command for the Ascend310P3 chip is:

atc \
    --mode=0 \
    --framework=5 \
    --model=crnn.onnx \
    --output=crnn \
    --input_format=NCHW \
    --input_shape="actual_input_1:1,1,32,100" \
    --enable_small_channel=1 \
    --log=info \
    --soc_version=Ascend310P3 \
    --insert_op_conf=aipp_crnn_rgb_gray_norm.config

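Parameter description:

--model: path to the ONNX model file to convert.

--framework: 5 denotes an ONNX model.

--output: name of the converted om model. With --output=crnn the result is crnn.om.

--input_format: format of the input data.

--input_shape: shape of the input data. actual_input_1 is the name of the model input node; set its shape according to your actual use case.

--soc_version: the target chip, Ascend310 or Ascend310P3 in this sample.

--enable_small_channel=1: a special optimization for convolution operators on 4-D data in vision models, which can improve performance. For other models it may degrade performance and is not recommended.

--insert_op_conf=aipp_crnn_rgb_gray_norm.config: adjust this to the actual path of your AIPP configuration file. The option inserts the AIPP node, whose operator settings come from the config file. Its functions include image color space conversion, cropping, and normalization. It mainly processes raw image input and is often used together with DVPP.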
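
Since the two commands above differ only in --soc_version, they can be generated from one place. The sketch below builds the argument list in Python using only the flags shown in this article. The function name build_atc_command is hypothetical, and the sketch only prints the command; passing the list to subprocess.run would execute it on a machine where atc is installed.

```python
import shlex

SUPPORTED_SOC_VERSIONS = ("Ascend310", "Ascend310P3")


def build_atc_command(soc_version,
                      model="crnn.onnx",
                      output="crnn",
                      aipp_config="aipp_crnn_rgb_gray_norm.config"):
    """Return the atc argument list used in this article for the given chip."""
    if soc_version not in SUPPORTED_SOC_VERSIONS:
        raise ValueError("unsupported soc_version: %s" % soc_version)
    return [
        "atc",
        "--mode=0",
        "--framework=5",
        "--model=%s" % model,
        "--output=%s" % output,
        "--input_format=NCHW",
        "--input_shape=actual_input_1:1,1,32,100",
        "--enable_small_channel=1",
        "--log=info",
        "--soc_version=%s" % soc_version,
        "--insert_op_conf=%s" % aipp_config,
    ]


if __name__ == "__main__":
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(build_atc_command("Ascend310P3")))
```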

(4) After conversion, crnn.om is generated in the directory

[root@localhost crnn]#
[root@localhost crnn]# ls
aipp_crnn_rgb_gray_norm.config  atc_310.sh  atc_310P3.sh  crnn.cfg  crnn-label.names  crnn.om  crnn.onnx
[root@localhost crnn]#

IV. Using image_crnn

1. In run_cpp.sh and run_python.sh, set MX_SDK_HOME to the MindX SDK installation directory

export MX_SDK_HOME=/usr/local/sdk_home/mxVision

2. Run run_cpp.sh or run_python.sh

bash run_cpp.sh

bash run_python.sh

3. The text recognized by OCR matches the text in test.jpg

CRNN IS GOOD

V. image_crnn in detail

1. Technical flow

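Image decoding: calls OpenCV to decode the input image into BGR image data (the OpenCV path does not produce YUV, unlike DVPP decoding).

Image resizing: calls OpenCV to resize the image to 100 x 32 (width x height), the input size of the model.

Model inference: the CRNN model performs OCR recognition on the text in the image.

Model post-processing: converts the inference output into text.

Data serialization: assembles the stream result into a JSON string for output.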
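
The article does not spell out the post-processing step. CRNN outputs are commonly decoded with greedy CTC decoding (take the best class per time step, collapse repeats, drop blanks), so the sketch below illustrates that scheme under this assumption. It is not the implementation inside libcrnnpostprocess.so, and the label list and blank index used here are illustrative rather than read from crnn-label.names or crnn.cfg.

```python
def argmax_per_step(scores):
    """Pick the highest-scoring class index at every time step."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def ctc_greedy_decode(indices, labels, blank_index):
    """Collapse repeated indices, then drop blanks, and map the rest to labels."""
    chars = []
    previous = None
    for index in indices:
        if index != previous and index != blank_index:
            chars.append(labels[index])
        previous = index
    return "".join(chars)


if __name__ == "__main__":
    labels = ["C", "R", "N"]   # illustrative label set
    blank = 3                  # assumed: blank is the class after the last label
    # Per-step best classes; the blank between the two N runs keeps both letters.
    steps = [0, 0, 3, 1, 3, 2, 2, 3, 2]
    print(ctc_greedy_decode(steps, labels, blank))
```

The blank symbol is what allows doubled letters to survive: without a blank between them, two consecutive runs of the same class would collapse into one character.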

2. Pipeline in detail

{
  "classification": {
    "stream_config": {  ## Selects the chip (device) on which the stream runs
      "deviceId": "0"
    },
    "mxpi_imagedecoder0": {  ## Image decoding (OpenCV)
      "props": {
        "handleMethod": "opencv"
      },
      "factory": "mxpi_imagedecoder",
      "next": "mxpi_imageresize0"
    },
    "mxpi_imageresize0": {  ## Image resizing (OpenCV)
      "props": {
        "handleMethod": "opencv",
        "resizeHeight": "32",
        "resizeWidth": "100",
        "resizeType": "Resizer_Stretch"
      },
      "factory": "mxpi_imageresize",
      "next": "mxpi_tensorinfer0"
    },
    "mxpi_tensorinfer0": {  ## Model inference
      "props": {
        "dataSource": "mxpi_imageresize0",
        "modelPath": "data/models/crnn/crnn.om",  ## Model path
        "waitingTime": "2000",
        "outputDeviceId": "-1"
      },
      "factory": "mxpi_tensorinfer",
      "next": "mxpi_classpostprocessor0"
    },
    "mxpi_classpostprocessor0": {  ## Model post-processing
      "props": {
        "dataSource": "mxpi_tensorinfer0",
        "postProcessConfigPath": "data/models/crnn/crnn.cfg",
        "labelPath": "data/models/crnn/crnn-label.names",
        "postProcessLibPath": "libcrnnpostprocess.so"
      },
      "factory": "mxpi_textgenerationpostprocessor",
      "next": "mxpi_dataserialize0"
    },
    "mxpi_dataserialize0": {  ## Data serialization
      "props": {
        "outputDataKeys": "mxpi_classpostprocessor0"
      },
      "factory": "mxpi_dataserialize",
      "next": "appsink0"
    },
    "appsrc0": {
      "props": {
        "blocksize": "409600"
      },
      "factory": "appsrc",
      "next": "mxpi_imagedecoder0"
    },
    "appsink0": {  ## Outputs the inference result
      "props": {
        "blocksize": "4096000"
      },
      "factory": "appsink"
    }
  }
}
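
The elements in the pipeline are declared in arbitrary order and linked through their "next" fields. The Python sketch below follows those links from appsrc0 to recover the processing order. It uses a reduced copy of the pipeline above (only the "factory" and "next" fields, without the ## annotations, which are not valid JSON). The function name element_order is hypothetical, and the sketch handles only a single string in "next", which is all this pipeline uses.

```python
# Reduced copy of the "classification" stream: only the linkage fields are kept.
STREAM = {
    "stream_config": {"deviceId": "0"},
    "mxpi_imagedecoder0": {"factory": "mxpi_imagedecoder", "next": "mxpi_imageresize0"},
    "mxpi_imageresize0": {"factory": "mxpi_imageresize", "next": "mxpi_tensorinfer0"},
    "mxpi_tensorinfer0": {"factory": "mxpi_tensorinfer", "next": "mxpi_classpostprocessor0"},
    "mxpi_classpostprocessor0": {"factory": "mxpi_textgenerationpostprocessor",
                                 "next": "mxpi_dataserialize0"},
    "mxpi_dataserialize0": {"factory": "mxpi_dataserialize", "next": "appsink0"},
    "appsrc0": {"factory": "appsrc", "next": "mxpi_imagedecoder0"},
    "appsink0": {"factory": "appsink"},
}


def element_order(stream, start="appsrc0"):
    """Follow the "next" links from start and return the element names in order."""
    order = []
    name = start
    while name is not None:
        if name in order:
            raise ValueError("cycle detected at %s" % name)
        if name not in stream:
            raise KeyError("element %s is referenced but not defined" % name)
        order.append(name)
        name = stream[name].get("next")
    return order


if __name__ == "__main__":
    print(" -> ".join(element_order(STREAM)))
```

A check like this catches a misspelled element name in "next" before the stream is created.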

3. C++ source code in detail

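#include <string>
#include "MxBase/Log/Log.h"
#include "MxStream/StreamManager/MxStreamManager.h"

// ReadPipelineConfig and ReadFile are helper functions assumed to be defined
// elsewhere in the sample source; they are not shown in this excerpt.
int main(int argc, char* argv[])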
{
    // Read the pipeline configuration file
    std::string pipelineConfigPath = "data/pipeline/Sample.pipeline";
    std::string pipelineConfig = ReadPipelineConfig(pipelineConfigPath);
    if (pipelineConfig == "") {
        LogError << "Read pipeline failed.";
        return APP_ERR_COMM_INIT_FAIL;
    }
    // Initialize the Stream manager resources
    MxStream::MxStreamManager mxStreamManager;
    APP_ERROR ret = mxStreamManager.InitManager();
    if (ret != APP_ERR_OK) {
        LogError << GetError(ret) << "Failed to init Stream manager.";
        return ret;
    }
    // Create the Stream from the given pipeline configuration
    ret = mxStreamManager.CreateMultipleStreams(pipelineConfig);
    if (ret != APP_ERR_OK) {
        LogError << GetError(ret) << "Failed to create Stream.";
        return ret;
    }
    // Read the test image
    MxStream::MxstDataInput dataBuffer;
    ret = ReadFile("data/test.jpg", dataBuffer);
    if (ret != APP_ERR_OK) {
        LogError << GetError(ret) << "Failed to read image file.";
        return ret;
    }
    std::string streamName = "classification";
    int inPluginId = 0;
    // Send the test image to the Stream for inference
    ret = mxStreamManager.SendData(streamName, inPluginId, dataBuffer);
    if (ret != APP_ERR_OK) {
        LogError << GetError(ret) << "Failed to send data to stream.";
        delete dataBuffer.dataPtr;
        dataBuffer.dataPtr = nullptr;
        return ret;
    }
    // Get the inference result
    MxStream::MxstDataOutput* output = mxStreamManager.GetResult(streamName, inPluginId);
    if (output == nullptr) {
        LogError << "Failed to get pipeline output.";
        delete dataBuffer.dataPtr;
        dataBuffer.dataPtr = nullptr;
        // ret is still APP_ERR_OK at this point, so return an explicit failure code
        return APP_ERR_COMM_FAILURE;
    }
    // Print the inference result
    std::string result = std::string((char *)output->dataPtr, output->dataSize);
    LogInfo << "Results:" << result;

    // Destroy the Stream
    mxStreamManager.DestroyAllStreams();
    delete dataBuffer.dataPtr;
    dataBuffer.dataPtr = nullptr;

    delete output;
    return 0;
}

4. Python source code in detail

from StreamManagerApi import StreamManagerApi, MxDataInput

if __name__ == '__main__':
    # Initialize the Stream manager resources
    streamManagerApi = StreamManagerApi()
    ret = streamManagerApi.InitManager()
    if ret != 0:
        print("Failed to init Stream manager, ret=%s" % str(ret))
        exit()

    # Create the Stream from the given pipeline configuration
    with open("data/pipeline/Sample.pipeline", 'rb') as f:
        pipelineStr = f.read()
    ret = streamManagerApi.CreateMultipleStreams(pipelineStr)
    if ret != 0:
        print("Failed to create Stream, ret=%s" % str(ret))
        exit()

    # Read the test image
    dataInput = MxDataInput()
    with open("data/test.jpg", 'rb') as f:
        dataInput.data = f.read()

    # Send the test image to the Stream for inference
    streamName = b'classification'
    inPluginId = 0
    uniqueId = streamManagerApi.SendDataWithUniqueId(streamName, inPluginId, dataInput)
    if uniqueId < 0:
        print("Failed to send data to stream.")
        exit()

    # Get the inference result (the last argument, 3000, is the timeout passed to the call)
    inferResult = streamManagerApi.GetResultWithUniqueId(streamName, uniqueId, 3000)
    if inferResult.errorCode != 0:
        print("GetResultWithUniqueId error. errorCode=%d, errorMsg=%s" % (
            inferResult.errorCode, inferResult.data.decode()))
        exit()

    # Print the inference result
    print(inferResult.data.decode())

    # Destroy the Stream
    streamManagerApi.DestroyAllStreams()