CycleGAN as a Denoising Engine for OCR Images

weixin_26632369

Tags: Computer Vision

Original article: https://medium.com/towards-artificial-intelligence/cyclegan-as-a-denoising-engine-for-ocr-images-8d2a4988f769

This article explores how CycleGAN can be used as a denoising engine for OCR (Optical Character Recognition) images to improve recognition accuracy.

Deep Learning

With the rapid growth of digitization, digitized content has become crucial for data processing, storage, and transmission. Optical Character Recognition (“OCR”) is the process of converting typed, handwritten, or printed text into a digital format that is editable, searchable, and interpretable, eliminating the need for manual data entry.

More often than not, scanned documents contain noise that prevents OCR from recognizing the full text. The scanning process often introduces noise such as watermarks, background noise, blur caused by camera motion or shake, faded text, wrinkles, or coffee stains. This noise poses readability challenges for current text recognition algorithms and significantly degrades their performance.

Fig 2. Scanned document converted into a text document using OCR

Basic OCR Pre-processing Methods

Binarization converts a color image into an image consisting only of black and white pixels by applying a fixed threshold.
Skew correction generally involves estimating the skew angle of the document image and then correcting the image based on that angle.
Noise removal smooths the image by removing small dots or patches whose intensity is higher than that of the rest of the image.
Thinning and skeletonization make the stroke width of handwritten text uniform, since different writers have different writing styles.
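Here is a plain-Python sketch of the binarization and noise-removal steps above. It is a minimal illustration, not the article's implementation. It assumes a grayscale page stored as a list of rows, with pixel values from 0 (black) to 255 (white). For noise removal it uses a median filter, a common choice for removing isolated specks that the article does not name explicitly. The names `binarize` and `median_filter` and the default threshold of 128 are hypothetical.

```python
from statistics import median_low


def binarize(image, threshold=128):
    """Map each grayscale pixel to 0 (black) or 255 (white) using a fixed threshold."""
    return [[255 if px >= threshold else 0 for px in row] for row in image]


def median_filter(image, radius=1):
    """Replace each pixel with the median of its (2*radius+1) x (2*radius+1) window.

    Near the borders, only the neighbours that fall inside the image are used.
    Isolated specks smaller than the window are removed, but strokes thinner
    than the window can be eroded as well.
    """
    height, width = len(image), len(image[0])
    cleaned = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            # median_low always returns an existing pixel value, so output stays integral
            row.append(median_low(window))
        cleaned.append(row)
    return cleaned


# A light page (value 250) with one dark speck in the middle.
noisy_page = [[250] * 5 for _ in range(5)]
noisy_page[2][2] = 10

clean_page = binarize(median_filter(noisy_page))
print(clean_page[2][2])  # the speck is gone: prints 255
```

Filtering before thresholding keeps a single stray dark pixel from surviving as a black dot in the binarized output.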

CycleGAN as an Advanced OCR Pre-processing Method

Generative Adversarial Networks (“GANs”) are deep learning-based generative models. The GAN architecture involves two sub-models: a generator that produces new examples, and a discriminator that classifies examples as either real (drawn from the domain) or fake (produced by the generator).
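The following sketch makes the adversarial setup concrete. It shows the standard binary cross-entropy objectives for the two sub-models, written in plain Python rather than TensorFlow so that it runs on its own. It assumes the discriminator outputs the probability that its input is real. The generator term is the common "non-saturating" variant. The function names are illustrative, not the article's code.

```python
import math

EPS = 1e-12  # keeps log() away from zero


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator.

    d_real: probabilities the discriminator assigns to real samples (target label 1)
    d_fake: probabilities it assigns to generated samples (target label 0)
    """
    real_term = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: low when the discriminator calls generated samples real."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)


# A discriminator that separates real from fake well has a low loss,
# while the generator it is beating has a high loss.
print(round(discriminator_loss([0.9, 0.95], [0.05, 0.1]), 3))
print(round(generator_loss([0.05, 0.1]), 3))
```

Training alternates between lowering `discriminator_loss` (updating the discriminator) and lowering `generator_loss` (updating the generator), which is the adversarial game described above.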

CycleGAN, implemented in TensorFlow, was selected as an advanced OCR pre-processing method. An advantage of CycleGAN is that it does not require paired training data. Paired data are datasets in which every data point in one independent sample is uniquely matched to a data point in another independent sample. Here, that would mean every noisy scan comes with a clean version of exactly the same page.

Examples from both the input and output domains are still required, but they do not need to correspond to each other directly. Since paired data is hard to obtain in most domains, the unsupervised training capabilities of CycleGAN are very useful.
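The difference between paired and unpaired training data is easiest to see in how batches are drawn. This sketch is illustrative only: the file names are hypothetical placeholders for image paths, and the batching helpers are not part of the original implementation. With paired data, each noisy scan must arrive with its own clean version. With unpaired data, the two domains can be sampled independently, and the pools need not even be the same size.

```python
import random


def paired_batch(pairs, batch_size, rng):
    """Paired data: every noisy scan is stored together with its exact clean version."""
    chosen = rng.sample(pairs, batch_size)
    noisy, clean = zip(*chosen)
    return list(noisy), list(clean)


def unpaired_batch(noisy_pool, clean_pool, batch_size, rng):
    """Unpaired data: each domain is sampled independently; no correspondence is assumed."""
    return rng.sample(noisy_pool, batch_size), rng.sample(clean_pool, batch_size)


# Hypothetical file names standing in for document images.
noisy_scans = ["stained_01.png", "faded_02.png", "wrinkled_03.png"]
clean_pages = ["typed_a.png", "typed_b.png", "typed_c.png", "typed_d.png"]

rng = random.Random(0)
noisy_batch, clean_batch = unpaired_batch(noisy_scans, clean_pages, 2, rng)
print(noisy_batch, clean_batch)
```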

In the absence of paired training images, CycleGAN can use unpaired data to learn a mapping from the distribution of noisy images to the distribution of denoised images. This achieves the image-to-image translation needed to clean the noisy documents.

Image-to-image translation is the process of transforming an image from one domain (i.e., noisy document images) to another (i.e., clean document images). The content of the image, such as the text, should stay recognizably the same. Only the characteristics that distinguish the two domains, such as background noise, should change.

CycleGAN Architecture

The CycleGAN architecture comprises two pairs of generators and discriminators. Each generator has a corresponding discriminator, which attempts to distinguish the generator's synthesized images from real ones. As with any GAN, the generators and discriminators learn adversarially: each generator tries to “fool” its discriminator, while the discriminator learns not to be “fooled”.
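The two generator/discriminator pairs can be written out explicitly. In this sketch, G maps noisy pages to clean ones and is judged by D_Y, the clean-domain discriminator. F maps clean pages back to noisy ones and is judged by D_X. The sketch assumes the least-squares adversarial loss that the original CycleGAN paper uses in place of the log-likelihood objective. The article does not state which variant its TensorFlow implementation uses, and the function names are hypothetical. Discriminator scores are plain floats, one per image.

```python
def mean_squared_error(scores, target):
    """Average squared distance between discriminator scores and a target label."""
    return sum((s - target) ** 2 for s in scores) / len(scores)


def lsgan_discriminator_loss(real_scores, fake_scores):
    # The discriminator pushes scores on real images toward 1 and on generated images
    # toward 0; the 0.5 factor halves its objective, as in the CycleGAN paper.
    return 0.5 * (mean_squared_error(real_scores, 1.0) + mean_squared_error(fake_scores, 0.0))


def lsgan_generator_loss(fake_scores):
    # The generator wants its outputs scored as real (1), i.e. to "fool" the discriminator.
    return mean_squared_error(fake_scores, 1.0)


def cyclegan_adversarial_losses(dy_real, dy_fake, dx_real, dx_fake):
    """Adversarial terms for both generator/discriminator pairs.

    dy_real: D_Y scores on real clean pages    dy_fake: D_Y scores on G(noisy pages)
    dx_real: D_X scores on real noisy pages    dx_fake: D_X scores on F(clean pages)
    Returns (combined loss for G and F, loss for D_Y, loss for D_X).
    """
    generators = lsgan_generator_loss(dy_fake) + lsgan_generator_loss(dx_fake)
    return (
        generators,
        lsgan_discriminator_loss(dy_real, dy_fake),
        lsgan_discriminator_loss(dx_real, dx_fake),
    )


# Perfect discriminators (real -> 1, fake -> 0) leave both generators with a high loss.
print(cyclegan_adversarial_losses([1.0], [0.0], [1.0], [0.0]))  # (2.0, 0.0, 0.0)
```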

To ensure that the generators preserve the text of the dirty documents, the model computes a cycle consistency loss. This loss measures how closely an image that has been translated to the other domain and back again resembles its original version.
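The cycle consistency loss can be sketched as follows. This minimal illustration assumes the L1 (mean absolute difference) form and the weight λ = 10 from the original CycleGAN paper. Images are flat lists of 0 to 255 pixel values. The two "generators" are toy brightness shifts rather than trained networks, chosen only to show when the loss is and is not zero.

```python
def mean_abs_diff(a, b):
    """L1 distance between two images given as equal-length flat lists of pixels."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(x_noisy, y_clean, G, F, lam=10.0):
    """lam * ( |F(G(x)) - x|_1 + |G(F(y)) - y|_1 )

    G: noisy -> clean generator, F: clean -> noisy generator.
    """
    forward = mean_abs_diff(F(G(x_noisy)), x_noisy)   # noisy -> clean -> noisy
    backward = mean_abs_diff(G(F(y_clean)), y_clean)  # clean -> noisy -> clean
    return lam * (forward + backward)


# Toy "generators": G brightens by 50, F darkens by 50, both clipped to [0, 255].
def G(img):
    return [min(255, p + 50) for p in img]


def F(img):
    return [max(0, p - 50) for p in img]


# The noisy image survives its round trip exactly, but the dark pixel (30) in the
# clean image is clipped to 0 by F, so G cannot restore it and the loss is non-zero.
print(cycle_consistency_loss([100, 150], [250, 30], G, F))  # 100.0
```

In training, this term is added to the adversarial losses. It penalizes a generator that discards information, such as the text strokes, which the reverse generator would need to rebuild the original page.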
