OpenVINO Series 16: Handwritten OCR with OpenVINO

In this example, we run OCR on handwritten Simplified Chinese and Japanese text. Each model can only process one line of symbols at a time.

Environment:

  • Runtime: Windows 10, 10th-gen Intel i5 laptop
  • IDE: VSCode
  • OpenVINO version: 2022.1
  • Code: 11-OCR


1 Using the Handwritten Recognition Models

The models used in this notebook are handwritten-japanese-recognition and handwritten-simplified-chinese. To decode the model output into readable text, the kondate_nakayosi and scut_ept character lists are used, respectively. Both models are available in the Open Model Zoo.

1.1 handwritten-japanese-recognition

We will not explain the model's internal algorithm here; we only describe its input and output.

Input: [1,1,96,2000], in [B,C,H,W] layout, where B is the batch size, C the number of channels, H the image height, and W the image width.

Note: the source image is resized to the target height (96) while keeping its aspect ratio; the resized width must not exceed 2000, and the image is then padded on the right with edge values up to the full width of 2000.

Output: [186,1,4442], in [W,B,L] layout, where W is the output sequence length, B the batch size, and L the confidence distribution across the symbols supported by the Kondate and Nakayosi datasets.

1.2 handwritten-simplified-chinese

Input: [1,1,96,2000], in [B,C,H,W] layout, where B is the batch size, C the number of channels, H the image height, and W the image width.

Note: same preprocessing as above; the source image is resized to a height of 96 with its aspect ratio preserved (resized width at most 2000), then padded on the right with edge values to a width of 2000 (see the sketch below).

Output: [186,1,4059], in [W,B,L] layout, where W is the output sequence length, B the batch size, and L the confidence distribution across the symbols supported by SCUT-EPT.
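
Both models thus share the same preprocessing contract. Below is a minimal sketch of the resize-and-pad step, assuming an OpenCV grayscale image already cropped to a single text line (the helper name preprocess is hypothetical; the actual code used in this article appears in section 2.2):

import cv2
import numpy as np

def preprocess(gray_image: np.ndarray, target_h: int = 96, target_w: int = 2000) -> np.ndarray:
    # Resize a grayscale line image to the model height, then edge-pad to the model width
    h, _ = gray_image.shape
    scale = target_h / h
    # Resize while preserving the aspect ratio (width scales together with height)
    resized = cv2.resize(gray_image, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
    # Pad only on the right with edge values, so the glyph proportions are untouched
    padded = np.pad(resized, ((0, 0), (0, target_w - resized.shape[1])), mode="edge")
    # Add batch and channel dimensions: [H, W] -> [1, 1, H, W]
    return padded[None, None, :, :]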

2 Handwritten Recognition Model Code

2.1 Select and load a handwriting recognition model

from collections import namedtuple
from itertools import groupby
from pathlib import Path

import cv2
import matplotlib.pyplot as plt
import numpy as np
from openvino.runtime import Core

# Directories where data will be placed
model_folder = "model"
data_folder = "data"
charlist_folder = f"{data_folder}/charlists"
# Precision used by model
precision = "FP16"

# To group files, you have to define the collection. In this case, you can use `namedtuple`.
Language = namedtuple(
    typename="Language", field_names=["model_name", "charlist_name", "demo_image_name"]
)
chinese_files = Language(
    model_name="handwritten-simplified-chinese-recognition-0001",
    charlist_name="chinese_charlist.txt",
    demo_image_name="handwritten_chinese_test.jpg",
)
japanese_files = Language(
    model_name="handwritten-japanese-recognition-0001",
    charlist_name="japanese_charlist.txt",
    demo_image_name="handwritten_japanese_test.png",
)

print("1 - Choose a language model to download, either Chinese or Japanese.")
# Select language by using either language='chinese' or language='japanese'
language = "chinese"
languages = {"chinese": chinese_files, "japanese": japanese_files}
selected_language = languages.get(language)

# Download the model
path_to_model_weights = Path(f'{model_folder}/intel/{selected_language.model_name}/{precision}/{selected_language.model_name}.bin')
if not path_to_model_weights.is_file():
    download_command = f'omz_downloader --name {selected_language.model_name} --output_dir {model_folder} --precision {precision}'
    print(download_command)
    ! $download_command
else:
    print("model has been downloaded.")

print("2 - Load the model, and print its input and output")
ie = Core()
path_to_model = path_to_model_weights.with_suffix(".xml")
model = ie.read_model(model=path_to_model)
# Select Device Name
compiled_model = ie.compile_model(model=model, device_name="CPU")
recognition_output_layer = compiled_model.output(0)
recognition_input_layer = compiled_model.input(0)
print("- model input shape: {}".format(recognition_input_layer))
print("- model output shape: {}".format(recognition_output_layer))

Terminal output:

1 - Choose a language model to download, either Chinese or Japanese.
model has been downloaded.
2 - Load the model, and print its input and output
- model input shape: <ConstOutput: names[actual_input] shape{1,1,96,2000} type: f32>
- model output shape: <ConstOutput: names[output] shape{186,1,4059} type: f32>

2.2 Load the image and resize it to the model input size

The next step is to load the image. The model expects a single-channel image as input, which is why we read it in grayscale. After loading the input image, we compute the scale ratio, i.e. the ratio between the required input-layer height and the current image height. In the cell below, the image is resized and padded so that the characters keep their proportions while matching the input shape.

print("3 - load image to test.")
# Read file name of demo file based on the selected model
file_name = selected_language.demo_image_name
# Text detection models expects an image in grayscale format
# IMPORTANT!!! This model allows to read only one line at time
# Read image
image = cv2.imread(filename=f"{data_folder}/{file_name}", flags=cv2.IMREAD_GRAYSCALE)
# Fetch shape
image_height, _ = image.shape
print("- Original image shape: {}".format(image.shape))
print("- Image scale needs to be reshaped into: {}".format(recognition_input_layer.shape))
# B,C,H,W = batch size, number of channels, height, width
_, _, H, W = recognition_input_layer.shape
print("- We need to first resize image then add paddings in order to align with model input size.")
# Calculate scale ratio between input shape height and image height to resize image
scale_ratio = H / image_height
# Resize image to expected input sizes
resized_image = cv2.resize(
    image, None, fx=scale_ratio, fy=scale_ratio, interpolation=cv2.INTER_AREA
)
# Pad image to match input size, without changing aspect ratio
resized_image = np.pad(
    resized_image, ((0, 0), (0, W - resized_image.shape[1])), mode="edge"
)
# Reshape to the network input shape
input_image = resized_image[None, None, :, :]

## Visualise Input Image
plt.figure()
plt.axis("off")
plt.imshow(image, cmap="gray", vmin=0, vmax=255);
plt.figure(figsize=(20, 1))
plt.axis("off")
plt.imshow(resized_image, cmap="gray", vmin=0, vmax=255);

Terminal output:

3 - Load the image to test.
- Original image shape: (115, 1250)
- Image needs to be resized and padded to match the model input shape: {1, 1, 96, 2000}
- We first resize the image, then add padding to match the model input size.

The original input image:

[image]

After resizing and padding:

[image]

2.3 Prepare the charlist

Now the model is loaded and the image is ready. The next step is to load the downloaded character list. Before using it, a blank symbol must be added at the beginning of the charlist.

print("4 - Prepare Charlist, which is a ground truth list which we could match with our inference results.")
# Get dictionary to encode output, based on model documentation
used_charlist = selected_language.charlist_name
# With both models, there should be blank symbol added at index 0 of each charlist
blank_char = "~"
with open(f"{charlist_folder}/{used_charlist}", "r", encoding="utf-8") as charlist:
    letters = blank_char + "".join(line.strip() for line in charlist)
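
A quick sanity check, not in the original notebook: the charlist length should match the model's class dimension L (e.g. 4059 for the Simplified Chinese model), with the blank at index 0.

# len(letters) should equal the last output dimension (L) of the model,
# e.g. 4059 for handwritten-simplified-chinese-recognition-0001
print(len(letters))
assert letters[0] == blank_char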

2.4 Run inference and decode the output

Now run inference. compiled_model() takes a list of inputs in the same order as the model inputs. We can then fetch the result from the output tensor.

The model output is in W x B x L format, where:

  • W - output sequence length
  • B - batch size
  • L - confidence distribution across the supported symbols (Kondate and Nakayosi for the Japanese model, SCUT-EPT for the Simplified Chinese model)

To get a more readable format, we pick the symbol with the highest probability at each step. Following CTC greedy decoding, we then collapse consecutive duplicate symbols and remove the blanks, as the toy example below shows.

The last step is to look up the symbols at the corresponding indexes in the charlist.
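
As a toy illustration of the collapse-then-drop-blanks rule (the index sequence here is hypothetical; index 0 denotes the blank):

from itertools import groupby
import numpy as np

toy = np.array([0, 7, 7, 0, 0, 3, 3, 3])            # hypothetical argmax output over 8 steps
collapsed = np.array([k for k, _ in groupby(toy)])  # [0, 7, 0, 3]: merge consecutive repeats
decoded = collapsed[collapsed != 0]                 # [7, 3]: drop the blanks
print(decoded)  # these indexes would then be looked up in the charlist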

# Run inference on the model
predictions = compiled_model([input_image])[recognition_output_layer]
print("5 - Model Inference. Prediction results shape: {}".format(predictions.shape))
# Remove batch dimension
predictions = np.squeeze(predictions)
print("- We first squeeze the inference result into shape: {}".format(predictions.shape))
# Run argmax to pick the symbols with the highest probability
predictions_indexes = np.argmax(predictions, axis=1)
# Use groupby to collapse consecutive duplicate letters, as required by CTC greedy decoding
output_text_indexes = list(groupby(predictions_indexes))
# Remove grouper objects
output_text_indexes, _ = np.transpose(output_text_indexes, (1, 0))
print("- We pick the highest-probability characters and collapse repeated letters, giving shape: {}".format(output_text_indexes.shape))
# Remove blank symbols
output_text_indexes = output_text_indexes[output_text_indexes != 0]
print("- We remove blank symbols, giving shape: {}".format(output_text_indexes.shape))
# Assign letters to indexes from output array
output_text = [letters[letter_index] for letter_index in output_text_indexes]
print("- Final results: {}".format(output_text))
# Print Output
plt.figure(figsize=(20, 1))
plt.axis("off")
plt.imshow(resized_image, cmap="gray", vmin=0, vmax=255)

Terminal output:

5 - Model Inference. Prediction results shape: (186, 1, 4059)
- We first squeeze the inference result into shape: (186, 4059)
- We pick the highest-probability characters and collapse repeated letters, giving shape: (32,)
- We remove blank symbols, giving shape: (20,)
- Final results: ['人', '有', '悲', '欢', '离', '合', ',', '月', '有', '阴', '睛', '圆', '缺', ',', '此', '事', '古', '难', '全', '。']
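
To read the prediction as a single line of text rather than a list of characters, the decoded symbols can simply be joined (a small follow-up, not part of the original cell):

# Join the decoded characters into one string
recognized_line = "".join(output_text)
print(recognized_line)  # 人有悲欢离合,月有阴睛圆缺,此事古难全。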