关于油印印刷字体OCR识别的代码流程与解析（数据集联系博主索要）

本文链接：https://blog.csdn.net/chenai886/article/details/144600639

本文将详细解析基于OCR技术对酸奶包装印刷字体进行识别的代码流程。本代码实现了从图像预处理、区域提取到字符识别的完整步骤。其主要应用场景是对工业生产中的油印或类似字体的批量识别需求。

代码流程解析与注释

以下为代码的详细解析与注释：

1. 导入OCR字体库

read_ocr_class_mlp ('Industrial_0-9A-Z_NoRej.omc', OCRHandle)

作用：加载OCR多层感知机(Multi-Layer Perceptron, MLP)字符分类器模型。
参数解释：
- 'Industrial_0-9A-Z_NoRej.omc'：指定的OCR模型文件，适用于识别工业印刷字体中的数字和大写字母。
- OCRHandle：用于存储OCR模型的句柄，便于后续识别调用。

2. 初始化显示窗口

dev_update_off ()
dev_set_draw ('margin')
dev_set_line_width (2)
dev_close_window ()
dev_open_window (0, 0, 512, 355, 'black', WindowHandle)
set_display_font (WindowHandle, 25, 'mono', 'false', 'false')

作用：初始化程序运行的显示窗口及绘制参数。
- dev_update_off()：关闭窗口更新，提高处理效率。
- dev_set_draw('margin')：设置绘制方式为矩形边框。
- dev_set_line_width(2)：设置绘制线宽为2。
- dev_open_window：打开显示窗口，设置窗口大小和背景颜色。
- set_display_font：设置显示文字的字体和大小。

3. 读取待识别图像文件

list_files ('', ['files','follow_links'], ImageFiles)
tuple_regexp_select (ImageFiles, ['\\.(tif|tiff|gif|bmp|jpg|jpeg|jp2|png|pcx|pgm|ppm|pbm|xwd|ima|hobj)$','ignore_case'], ImageFiles)

作用：从指定文件夹中读取符合格式的图像文件。
- list_files('', ['files','follow_links'], ImageFiles)：列出当前目录中的所有文件。
- tuple_regexp_select：筛选出符合正则表达式的图像文件（如.jpg、.png等）。

4. 图像预处理

read_image (Image, ImageFiles[Index])
count_seconds (T1)
get_image_size (Image, Width, Height)
rgb1_to_gray (Image, GrayImage)

作用：读取图像并进行预处理。
- read_image：读取图像。
- count_seconds (T1)：记录图像读取开始时间，便于后续计算耗时。
- get_image_size：获取图像尺寸。
- rgb1_to_gray：将彩图转换为灰度图，以减少数据维度和计算复杂度。

5. 图像矫正

text_line_orientation (GrayImage, GrayImage, 86, rad(0), rad(30), OrientationAngle)
rotate_image (GrayImage, ImageRotate, -OrientationAngle / rad(180) * 180, 'constant')

作用：对倾斜图像进行矫正。
- text_line_orientation：检测文本行的倾斜角度（OrientationAngle），设定范围为0°到30°。
- rotate_image：基于检测到的倾斜角度对图像进行旋转矫正。

6. 提取感兴趣区域（ROI）

gen_rectangle1 (Rectangle, 520, 422, 1080, 2082)
reduce_domain (GrayImage, Rectangle, ImageReduced)
crop_domain (ImageReduced, ImagePart)

作用：提取图像中的感兴趣区域，去除无关部分。
- gen_rectangle1：生成矩形ROI，定义字符区域的上下左右坐标。
- reduce_domain：将灰度图像裁剪到指定的矩形ROI范围。
- crop_domain：实际提取矩形区域的图像部分。

7. 二值化与去噪

threshold (ImagePart, Region, 0, 150)
opening_circle (Region, RegionOpening, 3.5)

作用：通过二值化处理提取字符区域，并去除噪点。
- threshold：基于灰度阈值（0到150）对图像进行二值化，提取出字符区域。
- opening_circle：形态学操作，去除字符区域的细小噪点，保留字符的完整性。

8. 提取并切分字符区域

connection (RegionOpening, ConnectedRegions)
select_shape (ConnectedRegions, SelectedRegions, 'height', 'and', 150, 500)
union1 (SelectedRegions, RegionUnion)
region_features (RegionUnion, 'width', Value)
partition_dynamic (RegionUnion, Partitioned, Value/10, 20)

作用：提取字符区域并按字符宽度切分。
- connection：将二值化后的字符区域连通。
- select_shape：根据区域高度筛选字符（150到500之间）。
- union1：将所有字符区域合并为一个整体。
- region_features：计算字符区域的宽度。
- partition_dynamic：动态分割字符区域。

9. 排序并识别字符

sort_region (Partitioned, SortedRegions, 'character', 'true', 'row')
do_ocr_multi_class_mlp (SortedRegions, ImagePart, OCRHandle, Class, Confidence)

作用：对字符区域排序后进行OCR识别。
- sort_region：根据字符区域的几何位置进行排序（行优先）。
- do_ocr_multi_class_mlp：使用OCR模型对每个字符区域进行识别，返回识别结果（Class）和置信度（Confidence）。

10. 结果显示

smallest_rectangle1 (Rectangle, RowT, ColL, RowB, ColR)
count_seconds (T2)
dev_clear_window ()
dev_set_part (0, 0, Width-1, Height-1)
dev_display (Image)
dev_display (Rectangle)
dev_disp_text (sum(Class)+'耗时:'+1000*(T2-T1), 'image',RowB-200, ColL, 'black','box', false)

作用：将识别结果及处理耗时显示在窗口中。
- smallest_rectangle1：获取识别字符的最小外接矩形。
- count_seconds (T2)：记录处理结束时间。
- dev_disp_text：显示识别结果及耗时。

11. 清理资源

clear_ocr_class_mlp (OCRHandle)

作用：释放OCR模型资源，避免内存泄漏。

完整代码解读

* 导入OCR多层感知机(Multi-Layer Perceptron)字符识别模型
* 这个模型专门用于识别工业字体，包括数字和大写字母
read_ocr_class_mlp ('Industrial_0-9A-Z_NoRej.omc', OCRHandle)

* 设置窗口显示参数
* 关闭窗口自动更新以提升运行效率
dev_update_off ()

* 设置绘制模式为边框
dev_set_draw ('margin')

* 设置绘制线宽为2
dev_set_line_width (2)

* 关闭已存在的窗口
dev_close_window ()

* 打开一个新的显示窗口，设置背景颜色为黑色，大小为512x355
dev_open_window (0, 0, 512, 355, 'black', WindowHandle)

* 设置显示文字的字体为单行字体，大小为25，字体非粗体、非斜体
set_display_font (WindowHandle, 25, 'mono', 'false', 'false')

* 列出当前目录中的所有文件，并存储在变量ImageFiles中
list_files ('', ['files','follow_links'], ImageFiles)

* 使用正则表达式筛选出图像文件，例如tif、jpg、png等格式
tuple_regexp_select (ImageFiles, ['\\.(tif|tiff|gif|bmp|jpg|jpeg|jp2|png|pcx|pgm|ppm|pbm|xwd|ima|hobj)$','ignore_case'], ImageFiles)

* 遍历图像文件列表
for Index := 0 to |ImageFiles| - 1 by 1

    * 读取当前图像文件
    read_image (Image, ImageFiles[Index])

    * 获取当前时间戳，用于计算耗时
    count_seconds (T1)

    * 获取图像的宽度和高度
    get_image_size (Image, Width, Height)

    * 将彩图转换为灰度图，简化图像数据
    rgb1_to_gray (Image, GrayImage)

    * 检测文本行的倾斜角度
    * 允许的角度范围是0到30度
    text_line_orientation (GrayImage, GrayImage, 86, rad(0),rad(30), OrientationAngle)

    * 根据检测到的倾斜角度对图像进行旋转矫正
    rotate_image (GrayImage, ImageRotate, -OrientationAngle / rad(180) * 180, 'constant')

    * 生成一个矩形ROI，用于指定感兴趣区域
    * 这里假设感兴趣区域的坐标为(520, 422)到(1080, 2082)
    gen_rectangle1 (Rectangle, 520, 422, 1080, 2082)

    * 在图像中裁剪出感兴趣区域
    reduce_domain (GrayImage, Rectangle, ImageReduced)

    * 从裁剪后的图像中提取实际的内容区域
    crop_domain (ImageReduced, ImagePart)

    * 对图像进行二值化处理，提取出字符区域
    * 这里设定灰度阈值范围为0到150
    threshold (ImagePart, Region, 0, 150)

    * 使用形态学操作去除细小的噪点
    * opening_circle可以保留字符区域的完整性
    opening_circle (Region, RegionOpening, 3.5)

    * 将所有连通区域分开，用于进一步的字符提取
    connection (RegionOpening, ConnectedRegions)

    * 根据高度筛选出可能是字符的区域
    * 筛选范围是高度150到500之间
    select_shape (ConnectedRegions, SelectedRegions, 'height', 'and', 150, 500)

    * 合并所有的字符区域为一个整体
    union1 (SelectedRegions, RegionUnion)

    * 计算合并后字符区域的宽度
    region_features (RegionUnion, 'width', Value)

    * 如果宽度为0，说明未检测到字符，显示提示信息
    if(|Value| = 0)
        smallest_rectangle1 (Rectangle, RowT, ColL, RowB, ColR)
        dev_clear_window ()
        dev_set_part (0, 0, Width-1, Height-1)
        dev_display (Image)
        dev_display (Rectangle)
        dev_disp_text ('未发现字符','image',RowT, ColL, 'red', 'box', false)
        continue
    endif

    * 对字符区域进行动态分割，按宽度进行切分
    partition_dynamic (RegionUnion, Partitioned,Value/10, 20)

    * 对分割后的字符区域按照行进行排序
    sort_region (Partitioned, SortedRegions, 'character', 'true', 'row')

    * 使用OCR模型对每个字符区域进行识别
    * 返回识别的字符类别（Class）和置信度（Confidence）
    do_ocr_multi_class_mlp (SortedRegions, ImagePart, OCRHandle, Class, Confidence)

    * 获取识别结果的最小外接矩形，用于显示
    smallest_rectangle1 (Rectangle, RowT, ColL, RowB, ColR)

    * 获取当前时间戳，用于计算处理耗时
    count_seconds (T2)

    * 清空显示窗口
    dev_clear_window ()

    * 设置显示窗口的显示范围
    dev_set_part (0, 0, Width-1, Height-1)

    * 显示原始图像
    dev_display (Image)

    * 显示最小外接矩形
    dev_display (Rectangle)

    * 在图像上显示识别结果和处理耗时
    * 结果格式为字符加耗时
    dev_disp_text (sum(Class)+'耗时:'+1000*(T2-T1), 'image',RowB-200, ColL, 'black','box', false)

    * 暂停程序，便于用户查看结果
    stop ()

endfor

* 释放OCR模型资源，避免内存泄漏
clear_ocr_class_mlp (OCRHandle)