Java OCR image denoising: how to separate noise from text in an image when preprocessing for OCR

天才娜娜ln

Article tags: java, OCR image denoising

I am applying OCR to subtitles in TV footage, using Tesseract 3.x with C++. As a preprocessing step for OCR, I am trying to separate the text from the background.

Here's the original image:

And the preprocessed image:

The OCR result is: Sicemn clone

As the preprocessed image above shows, some "fog" remains around the letters, which prevents the OCR module from doing its job properly.

Is there any way to detect this "fog" programmatically and remove it, or some image processing that would remove or reduce it in the preprocessed image?

Since the preprocessing logic is heavily tuned to handle many different images, I would rather find a way to "clean" the preprocessed image than modify the preprocessing logic (tuning it for this picture could affect other pictures).

Any suggestions are very welcome.

Update

sixela's answer is great and works in most cases.

It does not work when the background also contains colors similar to the text.

Example of a case that does not work:

Example of the result:

The Gaussian filter seems to cause problems with this type of footage.

This implies that different footage may require different approaches.

Solution

I managed to get a clearer (though not perfect) image using morphological operations and thresholding.

Here is how:

1. I started by converting the original image to greyscale.

2. Applied a Gaussian blur (9x9 kernel) to denoise the greyscale image.

3. Applied a top-hat morphological operation (3x3 kernel) to extract the white text.

4. Binarized the result with Otsu's thresholding method.

5. Dilated the binary image.

6. Applied an inverted binary threshold to turn the white text black.
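For readers unfamiliar with the top-hat step in the list above, here is a minimal pure-Python sketch of the idea. It is not OpenCV's implementation: the helper names (`_window`, `erode`, `dilate`, `top_hat`) are made up for illustration, the image is assumed to be a small list of rows of integers, and pixels outside the image are simply ignored. A white top-hat is the image minus its opening, where the opening is an erosion followed by a dilation.

```python
def _window(img, y, x, r):
    """Values of the in-bounds pixels in the (2r+1)x(2r+1) square centred on (y, x)."""
    h, w = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - r), min(h, y + r + 1))
            for i in range(max(0, x - r), min(w, x + r + 1))]


def erode(img, r=1):
    """Each pixel becomes the minimum of its window (bright regions shrink)."""
    return [[min(_window(img, y, x, r)) for x in range(len(img[0]))]
            for y in range(len(img))]


def dilate(img, r=1):
    """Each pixel becomes the maximum of its window (bright regions grow)."""
    return [[max(_window(img, y, x, r)) for x in range(len(img[0]))]
            for y in range(len(img))]


def top_hat(img, r=1):
    """White top-hat: the image minus its opening (erosion, then dilation)."""
    opened = dilate(erode(img, r), r)
    return [[p - o for p, o in zip(row, opened_row)]
            for row, opened_row in zip(img, opened)]


# Hypothetical test image: flat grey background (100) with one bright pixel (200).
img = [[100] * 5 for _ in range(5)]
img[2][2] = 200
result = top_hat(img)  # r=1 corresponds to a 3x3 window
```

In this toy case the flat background is subtracted away and only the bright detail narrower than the window remains. Bright areas larger than the window survive the opening, so the subtraction removes them too, which is presumably why a small kernel isolates thin subtitle strokes.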

I finally obtained the following image:

which gives this text as the OCR result: "Since vou don'k"

PS: This result can of course be improved by tweaking the parameters (kernel size, for example), but I hope it can guide you. I used OpenCV in Python to quickly try out these methods.

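import cv2

# Load the input as a single-channel greyscale image (flag 0).
image = cv2.imread('./inputImg.png', 0)

# Denoise with a 9x9 Gaussian blur (sigma derived from the kernel size).
imgBlur = cv2.GaussianBlur(image, (9, 9), 0)

# Top-hat with a 3x3 rectangular kernel keeps small bright details (the white text).
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
imgTH = cv2.morphologyEx(imgBlur, cv2.MORPH_TOPHAT, kernel)

# Otsu chooses the threshold automatically; the 0 passed as thresh is ignored.
_, imgBin = cv2.threshold(imgTH, 0, 250, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# Dilate to thicken the text strokes.
imgdil = cv2.dilate(imgBin, kernel)

# Invert: every non-zero pixel becomes 0, so the white text turns black.
_, imgBin_Inv = cv2.threshold(imgdil, 0, 250, cv2.THRESH_BINARY_INV)

cv2.imshow('original', image)
cv2.imshow('bin', imgBin)
cv2.imshow('dil', imgdil)
cv2.imshow('inv', imgBin_Inv)

# Keep the windows open until a key is pressed; without this nothing is displayed.
cv2.waitKey(0)
cv2.destroyAllWindows()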
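
The Otsu step in the code above leaves the choice of threshold to OpenCV. As a rough illustration of what that choice involves, here is a pure-Python sketch that picks the threshold maximizing the between-class variance of the histogram. The names `otsu_threshold` and `binarize` are hypothetical helpers, the input is assumed to be a flat list of 8-bit values, and OpenCV's own implementation may differ in details such as tie-breaking.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold t that best separates values <= t from values > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of their values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # nothing in the dark class yet
        w1 = total - w0
        if w1 == 0:
            break               # nothing left in the bright class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(pixels, t, maxval=255):
    """Values above t become maxval, everything else becomes 0."""
    return [maxval if p > t else 0 for p in pixels]


# Hypothetical data: a dark background (10) and bright text (200) in equal parts.
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
mask = binarize(pixels, t)
```

With two well-separated groups like these, any threshold between them gives the same split. On real footage where the background contains colors close to the text, the histogram is no longer cleanly bimodal, and a single global threshold is likely to let some of the background through.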