Project details:
First, load the source image and preprocess its size.
Load the source image as image and keep a copy orig; then resize image to a height of 500 px, scaling the width by the same ratio so the original h:w aspect ratio is preserved.
Edge detection and contour detection
After grayscaling -> edge detection -> contour detection, sort the contours by area (note that sorted() is ascending by default, i.e. smallest to largest; we want largest first, hence reverse=True) and keep the five largest. Then approximate each contour with a polygon (cv.approxPolyDP): the detected contours include circles and long rectangles, but our target contour is a quadrilateral (roughly a rectangle), so we filter for the first approximation with exactly four vertices and take its corner coordinates.
Perspective transform of the coordinates
The vertices returned by the polygon-approximation step come back in counter-clockwise order, but we want them in clockwise order starting at the top-left, so the corner coordinates must be re-sorted first. Next, compute the width and height of the quadrilateral and build a destination array dst = [[0,0],[width-1,0],[width-1,height-1],[0,height-1]]. Feed the ordered quadrilateral corners and dst into cv.getPerspectiveTransform to obtain the transform matrix M, then warp the original image with M; the resulting warped image has size (width, height), and the perspective transform is done.
In short: read the two coordinate arrays and compute the transform matrix; then apply the perspective transform to the original image according to that matrix and write the result onto the target canvas.
OCR recognition
Before OCR, preprocess the image to be recognized: convert it to grayscale and binarize it, then run the OCR call on the result.
Source code:
import cv2 as cv
import numpy as np
import pytesseract
def order_point(pts):
    # Order the 4 corners as top-left, top-right, bottom-right, bottom-left
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]     # top-left has the smallest x + y
    rect[2] = pts[np.argmax(s)]     # bottom-right has the largest x + y
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]  # top-right has the smallest y - x
    rect[3] = pts[np.argmax(diff)]  # bottom-left has the largest y - x
    return rect
def four_point_transfer(image, pts):
    rect = order_point(pts)
    (tl, tr, br, bl) = rect
    width1 = np.sqrt((tr[0]-tl[0])**2 + (tr[1]-tl[1])**2)
    width2 = np.sqrt((br[0]-bl[0])**2 + (br[1]-bl[1])**2)
    # built-in max compares two scalars; np.max takes the maximum of an array
    width = int(max(width1, width2))
    height1 = np.sqrt((tr[0]-br[0])**2 + (tr[1]-br[1])**2)
    height2 = np.sqrt((tl[0]-bl[0])**2 + (tl[1]-bl[1])**2)
    height = int(max(height1, height2))
    dst = np.array([[0, 0], [width-1, 0], [width-1, height-1], [0, height-1]], dtype="float32")
    M = cv.getPerspectiveTransform(rect, dst)
    # dsize must be an integer (width, height) pair
    warped = cv.warpPerspective(image, M, (width, height))
    return warped
def resize(image, height=None):
    if height is None:
        return image
    h, w = image.shape[:2]  # shape is (h, w, channels): image[row, col, channel]
    r = height / h
    width = int(w * r)
    # cv.resize takes its size argument as (width, height), i.e. columns first;
    # cv.boundingRect likewise returns x, y, width, height.
    # Most other conventions are row-first (height, width): shape,
    # ROI indexing, and creating a new Mat/array.
    image = cv.resize(image, (width, height), interpolation=cv.INTER_AREA)
    return image
image = cv.imread(r"E:\opencv\picture\page.jpg")  # raw string so the backslashes are not treated as escapes
orig = image.copy()
image = resize(image,height=500)
ratio = orig.shape[0]/500
# edge detection
image_gray = cv.cvtColor(image,cv.COLOR_BGR2GRAY)
image_gray = cv.GaussianBlur(image_gray,(5,5),0)
image_edge = cv.Canny(image_gray,75,200)
# contour detection
# findContours returns (contours, hierarchy) in OpenCV 4.x and
# (image, contours, hierarchy) in 3.x; [-2] picks the contours in both
image_contours = cv.findContours(image_edge.copy(), cv.RETR_LIST, cv.CHAIN_APPROX_SIMPLE)[-2]
contours = sorted(image_contours, key=cv.contourArea, reverse=True)[:5]
for c in contours:
    arc = cv.arcLength(c, closed=True)
    approx = cv.approxPolyDP(c, arc * 0.02, True)
    if len(approx) == 4:
        screen_shot = approx
        break
cv.drawContours(image,[screen_shot],-1,(0,0,255),2)
warped = four_point_transfer(orig, screen_shot.reshape(4, 2) * ratio)
cv.imshow('warped_window', resize(warped, height=650))
warped = cv.cvtColor(warped, cv.COLOR_BGR2GRAY)
scan = cv.threshold(warped,0,255,cv.THRESH_BINARY|cv.THRESH_OTSU)[1]
cv.imwrite("E:/opencv/picture/scan.png",scan)
cv.imshow("scan ",scan)
scanstring = pytesseract.image_to_string(scan)
print(scanstring)
cv.waitKey(0)
cv.destroyAllWindows()