Opencv | Document Scanning & Optical Character Recognition (OCR)

Step 1. Import the required packages and a Python file named resize written for this project.

import cv2
import numpy as np
import resize

Step 2. Import the image and do some preliminary processing.
Read in the picture to be scanned. If the resolution is good enough, we can also capture the picture with the laptop camera, as sketched below.
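As a side note, here is a minimal, hedged sketch of grabbing a single frame from the laptop camera with cv2.VideoCapture instead of reading test.jpg (camera index 0 is assumed to be the built-in webcam):

import cv2

cap = cv2.VideoCapture(0)   # open the default (built-in) camera
ret, frame = cap.read()     # grab a single frame
cap.release()
if ret:
    image = frame           # use the captured frame in place of test.jpg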

image = cv2.imread('test.jpg')
image = cv2.resize(image, (1500, 1125))

orig = image.copy()
# Create a copy of the original image.

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
# Grayscale the image, then apply a Gaussian blur to reduce noise

edged = cv2.Canny(blurred, 0, 50)
# Use canny algorithm for edge detection
orig_edged = edged.copy()
# Create a copy processed by the canny algorithm.
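If you want to check the edge map before going on, a quick hedged sketch (and, as an assumption not in the original pipeline, a common median-based heuristic for choosing the Canny thresholds):

cv2.imshow('edged', edged)       # inspect the edge map; press any key to close
cv2.waitKey(0)
cv2.destroyAllWindows()

v = np.median(blurred)           # median intensity of the blurred image
lower = int(max(0, 0.67 * v))    # heuristic lower threshold (assumption)
upper = int(min(255, 1.33 * v))  # heuristic upper threshold (assumption)
edged_auto = cv2.Canny(blurred, lower, upper)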

Step 3. Get the approximate contours of the image.

Find the contours in the edge image, sort them so the largest come first, and look for the document outline.

contours, hierarchy = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
# findContours() for finding contours from binary images

contours = sorted(contours, key=cv2.contourArea, reverse=True)
# Sort the contours by area, largest first

# Get approximate contours:
for c in contours:
    p = cv2.arcLength(c, True)
    # Calculate the perimeter of the closed contour (or the length of the curve).
    approx = cv2.approxPolyDP(c, 0.02 * p, True)
    # Approximate the contour with a polygon, using 0.02 * p as the precision.
    # Because the approximated curve should be closed, the parameter closed is True.
    if len(approx) == 4:
        # A four-point approximation is the rectangular document outline we are looking for.
        target = approx
        break
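One caveat: if no contour is approximated by four points, target is never assigned and the next step fails with a NameError. A minimal, hedged variant of the same loop with an explicit guard (the guard is an addition, not part of the original code):

target = None                       # hypothetical guard, not in the original code
for c in contours:
    p = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * p, True)
    if len(approx) == 4:
        target = approx
        break

if target is None:
    raise RuntimeError('No 4-point contour found; adjust the Canny thresholds or retake the photo.')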

Step 4. Create a function to rectify the target contour, i.e. put its four corners into a consistent order.

ps: The function rectify is stored in resize.py.

def rectify(h):
    h = h.reshape((4, 2))  
    hnew = np.zeros((4, 2), dtype=np.float32)  
    add = h.sum(1)
    # Sum the x and y coordinate of each point.
    hnew[0] = h[np.argmin(add)]
    # The point with the smallest coordinate sum is the top-left corner.
    hnew[2] = h[np.argmax(add)]
    # The point with the largest coordinate sum is the bottom-right corner.
    diff = np.diff(h, axis=1)
    # Calculate the discrete difference (y - x) for each point.
    hnew[1] = h[np.argmin(diff)]
    hnew[3] = h[np.argmax(diff)]
    # The smallest difference is the top-right corner, the largest is the bottom-left.
    # Together these are the four ordered vertices of the detected document.
    return hnew
    
approx = resize.rectify(target)
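To see why the sum/diff trick in rectify orders the corners correctly, here is a tiny hedged check with made-up coordinates (the sample points are invented purely for illustration):

import numpy as np
import resize

# Unordered corner set: bottom-right, top-left, bottom-left, top-right.
pts = np.array([[390, 580], [10, 20], [15, 590], [380, 30]], dtype=np.float32)
print(resize.rectify(pts))
# -> top-left (10, 20), top-right (380, 30), bottom-right (390, 580), bottom-left (15, 590)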

Step 5. Map our target to a 400 × 600 rectangle with a perspective transformation.

pts2 = np.float32([[0, 0], [400, 0], [400, 600], [0, 600]])

M = cv2.getPerspectiveTransform(approx, pts2)
# Use the getPerspectiveTransform function to obtain the perspective transformation matrix.
# (approx holds the four vertex positions of the quadrilateral in the source image; pts2 holds the four corresponding vertex positions in the target image.)

dst = cv2.warpPerspective(orig, M, (400,600))
# Use the warpPerspective function to perform perspective transformation on the source image, the output image dst size is 400 * 600.
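The 400 × 600 output size is fixed here. If you would rather keep the document's own proportions, a hedged alternative sketch that derives the size from the ordered corners (not part of the original pipeline):

(tl, tr, br, bl) = approx
width = int(max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl)))
height = int(max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl)))
pts2_auto = np.float32([[0, 0], [width, 0], [width, height], [0, height]])
M_auto = cv2.getPerspectiveTransform(approx, pts2_auto)
dst_auto = cv2.warpPerspective(orig, M_auto, (width, height))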

Step 6. Use several different methods to post-process the perspective-transformed image and obtain the final result.
We can compare the different processing methods below and choose the most suitable one as the final result. The processed images are not shown in this article; if you are interested, try it yourself (a small viewer sketch follows the code).

dst = cv2.cvtColor(dst, cv2.COLOR_BGR2GRAY)
# Grayscale the image after perspective transformation
cv2.drawContours(image, [target], -1, (0, 255, 0), 2)
# Draw the detected outline on the original image; contourIdx = -1 means draw all contours in the list, the color is green and the thickness is 2.
ret, th1 = cv2.threshold(dst, 127, 255, cv2.THRESH_BINARY)
#Threshold
ret2, th2 = cv2.threshold(dst, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# Otsu's binarization
th3 = cv2.adaptiveThreshold(dst, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 2)
#Adaptive threshold of mean
th4 = cv2.adaptiveThreshold(dst, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
#Adaptive threshold of gaussian
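To actually compare the four thresholded results side by side (they are not shown in the article), a quick hedged viewer sketch:

for name, img in [('binary', th1), ('otsu', th2),
                  ('adaptive mean', th3), ('adaptive gaussian', th4)]:
    cv2.imshow(name, img)   # one window per result
cv2.waitKey(0)              # press any key to close all windows
cv2.destroyAllWindows()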

The original image is:

result:

Step 7. Do the Optical Character Recognition (OCR).

ps:
1. For Windows users, before installing the pytesseract package, we need to install the Tesseract program (tesseract-ocr-setup-4.00.00dev.exe) on the Windows system; the path snippet below may also be needed.
2. For Mac users, we need to install Homebrew, install Tesseract with Homebrew, and then install the pytesseract package.
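On Windows, if tesseract.exe is not on the PATH, pytesseract also has to be told where the executable lives. A minimal hedged sketch (the install path below is an assumption; adjust it to your own installation):

import pytesseract

# Hypothetical install location; point this at your own tesseract.exe.
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'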

Code:

from PIL import Image
import pytesseract
import cv2
import os

preprocess = 'thresh'
image = cv2.imread('scan.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

if preprocess == 'thresh':
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    # Otsu binarization to separate the text from the background.
if preprocess == 'blur':
    gray = cv2.medianBlur(gray, 3)
    # Median blur to reduce salt-and-pepper noise.
# Choose an appropriate preprocessing method for the image.

filename = "{}.jpg".format(os.getpid())
cv2.imwrite(filename, gray)
# Write the preprocessed image to a temporary file named after the process id.
text = pytesseract.image_to_string(Image.open(filename))
# Use OCR to recognize the text information in the image.
print(text)
# Print the recognized text.
os.remove(filename)
# Delete the temporary file.

cv2.imshow('image',image)
cv2.imshow('output',gray)
cv2.waitKey(0)
cv2.destroyAllWindows()

By default it can only recognize English text and numbers; however, the scanned picture contains Japanese, so the result is not good.
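If the Japanese language data for Tesseract (the jpn traineddata file) is installed, the recognizer can be pointed at it explicitly; a hedged sketch using pytesseract's lang argument:

# Requires the 'jpn' language pack to be installed alongside Tesseract.
text_jp = pytesseract.image_to_string(Image.open('scan.jpg'), lang='jpn')
print(text_jp)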

The scanned image:

result:
Therefore, we chose another scanned image, mostly in English and numbers, for text recognition to see the real effect of OCR.
The scanned image:

The results are as follows.

The result is much better, but there are still a few small mistakes that cannot be recognized correctly.
Thank you for reading!

--credit by dora 2020.4.13


Resources:
https://www.bilibili.com/video/BV1X4411Z7qV?p=10
https://blog.csdn.net/showgea/article/details/82656515
https://zhuanlan.zhihu.com/p/93092044
https://blog.csdn.net/gaoyu1253401563/article/details/84995349
https://zhuanlan.zhihu.com/p/59805070
