OpenCV in Practice 1: An Automatic Document Scanner with OpenCV (Main Feature: Straightening Skewed Documents)

吹夏天的风

Modified on 2024-06-27 11:28:03


Article tags: opencv, artificial intelligence, computer vision

First published on 2024-06-27 11:23:20

Article link: https://blog.csdn.net/m0_64298393/article/details/140006200


Original article: https://learnopencv.com/automatic-document-scanner-using-opencv/ PS: The original author's code is rather terse and a little buggy; copied as-is, it does not run correctly.

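Goal: automatically turn a skewed document like the one on the left into an upright, front-facing document like the one on the right.

The main techniques used are:
1. Morphological Operation
2. Edge Detection
3. Contour Detection
4. Homography
5. GrabCut (image segmentation)
6. Perspective Transform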

OK, let's get started.

1. Load the image and apply repeated morphological closing to wipe the text and noise off the page

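import cv2
import numpy as np

oriImg = cv2.imread("test.jpg")  # path to the original image
if oriImg is None:  # imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError("Could not read test.jpg")
kernel = np.ones((5, 5), np.uint8)  # 5x5 structuring element for the closing
img = cv2.morphologyEx(oriImg, cv2.MORPH_CLOSE, kernel, iterations=3)
"""
* `oriImg`: the input image. Morphology is often applied to binary or grayscale images, but here it is a 3-channel BGR image; OpenCV processes each channel independently.
* `cv2.MORPH_CLOSE`: the type of morphological operation to perform, in this case closing.
    Closing: dilation followed by erosion. It fills small holes and small dark spots inside bright foreground objects; on this page it erases the dark text strokes and leaves a blank sheet.
    PS: Opening is erosion followed by dilation; it removes small bright specks (noise).
* `kernel`: an array defining the structuring element, which sets the shape and size of the operation. Common structuring elements are rectangles, ellipses and crosses. The kernel's size and shape affect the result.
* `iterations=3`: the number of iterations. For closing, OpenCV dilates 3 times and then erodes 3 times. More iterations strengthen the effect but can also distort the image too much.
"""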

2. Separate the foreground from the background with grabCut. If the segmentation is poor, try increasing grabCut's iteration count (iterCount).

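mask = np.zeros(img.shape[:2], np.uint8)  # all-black single-channel mask the size of the image; grabCut writes its labels here
bgdModel = np.zeros((1, 65), np.float64)  # internal background model used by grabCut
fgdModel = np.zeros((1, 65), np.float64)  # internal foreground model used by grabCut
margin = 20  # treat a 20-pixel band around the image border as background; the rectangle inside it is the probable foreground
# rect is (x, y, w, h): subtracting 2 * margin keeps the band on the right and bottom too (with "- 20" the rectangle reached the right and bottom edges)
rect = (margin, margin, img.shape[1] - 2 * margin, img.shape[0] - 2 * margin)
cv2.grabCut(img, mask, rect, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_RECT)  # 5 iterations, initialized from the rectangle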
"""
img：输入图像，应该是8位单通道图像（灰度图）或8位3通道图像（彩色图）。对于彩色图像，函数会将其转换为 HSV 色彩空间。
mask：这是一个与输入图像大小相同的矩阵，用于存储分割信息。这个矩阵的每个元素可以是以下四个值之一：
GC_BGD --> 定义为明显的背景像素 0
GC_FGD --> 定义为明显的前景像素 1
GC_PR_BGD --> 定义为可能的背景像素 2
GC_PR_FGD --> 定义为可能的前景像素 3
rect：这是一个包含前景对象（你想分割出来的对象）的矩形框的坐标。它是一个四元组，格式为 (x, y, w, h)，其中 (x, y) 是矩形左上角的坐标，w 是矩形的宽度，h 是矩形的高度。
bgdModel 和 fgdModel：这两个是输出数组，它们保存了背景模型和前景模型的内部数据。这些模型在函数内部被更新，并用于下一次迭代。通常，你不需要直接访问或修改这些数据。它们的尺寸应该是 65 * (1 + number of channels in img)，其中 number of channels 是输入图像中的通道数（对于灰度图像是1，对于彩色图像是3）。
iterCount：迭代次数。这决定了算法将运行多少次迭代来尝试改进分割结果。增加这个值可能会改善分割的准确性，但也会增加计算时间。
mode：这个参数决定了函数的初始模式。在大多数情况下，你会使用 cv2.GC_INIT_WITH_RECT，它表示初始分割将由用户提供的矩形决定。
"""
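mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')  # background (0 and 2) -> 0, foreground (1 and 3) -> 1
img = img * mask2[:, :, np.newaxis]
# np.newaxis adds a dimension: img is 3-D (h, w, 3) while mask2 is 2-D (h, w), so mask2[:, :, np.newaxis] has shape (h, w, 1) and broadcasts across the 3 channels.
# Multiplying the image by the mask turns the background black (x0) and leaves the foreground unchanged (x1).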

3. Edge detection with Canny

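gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale
gray = cv2.GaussianBlur(gray, (11, 11), 0)   # Gaussian blur to suppress noise
# Edge Detection.
canny = cv2.Canny(gray, 0, 200)  # Canny edge detection (low threshold 0, high threshold 200)
canny = cv2.dilate(canny, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))  # dilate the detected edges to close small gaps, so the page outline is easier to find as a contour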

4. Find the contours and the four page corners

con = np.zeros_like(img)  # blank (all-black) canvas for drawing
# Finding contours for the detected edges.
contours, hierarchy = cv2.findContours(canny, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)  # find all contours
# Keeping only the largest detected contours.
page = sorted(contours, key=cv2.contourArea, reverse=True)[:5]  # sort all contours by area (largest first) and keep the top 5; there may be many contours, though in a clear case like this one the first is enough
con = cv2.drawContours(con, page, -1, (0, 255, 255), 3)  # draw the kept contours on the black canvas
# Loop over the contours.

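for c in page:  # iterate over the contours, largest first
    # Approximate the contour.
    epsilon = 0.02 * cv2.arcLength(c, True)
    """This line computes the perimeter (arc length) of contour c and multiplies it by a small factor (0.02 here) to get the threshold epsilon, which controls how much detail the approximation keeps.

        cv2.arcLength(c, True): computes the perimeter of contour c. The second argument True means the contour is closed (its first and last points are connected).
        0.02: a factor that sets the approximation accuracy. The smaller it is, the closer the approximation stays to the original contour; the larger it is, the more the contour is simplified.
        """
    corners = cv2.approxPolyDP(c, epsilon, True)  # approximate the contour with a polygon
    """
    This line approximates contour c with the Douglas-Peucker algorithm and returns a new point set, corners, that defines the approximated polygon.

    c: the original contour to approximate.
    epsilon: the threshold computed above, i.e. the maximum distance allowed between the original contour and its approximation.
    True: the contour is closed.
    """
    # If our approximated contour has four points
    if len(corners) == 4:   # stop at the first (i.e. largest) contour whose approximation has exactly four corners
        break
else:
    # no candidate had four corners; without this, corners would silently hold the last contour's approximation
    raise RuntimeError("No contour with four corners was found")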
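# Sorting the corners and converting them to desired shape.
corners = sorted(np.concatenate(corners).tolist())  # flatten the (4, 1, 2) array into a list of [x, y] points, sorted by x (then y)
# Displaying the corners.
for index, c in enumerate(corners):  # label the four points A-D to see their order; in this example it is top-left, bottom-left, top-right, bottom-right
    character = chr(65 + index)
    cv2.putText(con, character, tuple(c), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 1, cv2.LINE_AA)
show(con)  # show() is defined in the complete code; call it once, after all labels are drawn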

5. Straighten the document (perspective transform)

def order_points(pts):  # order the points clockwise (sometimes you cannot read the order off a drawing); this heuristic does not work in every case
    '''Rearrange coordinates to order:
      top-left, top-right, bottom-right, bottom-left'''
    rect = np.zeros((4, 2), dtype='float32')  # 4x2 array for the ordered points
    pts = np.array(pts)  # convert the list of corners to an array
    s = pts.sum(axis=1)  # x + y for each point
    # Top-left point will have the smallest sum.
    rect[0] = pts[np.argmin(s)]  # smallest x + y -> top-left
    # Bottom-right point will have the largest sum.
    rect[2] = pts[np.argmax(s)]  # largest x + y -> bottom-right

    diff = np.diff(pts, axis=1)  # y - x for each point
    # Top-right point will have the smallest difference.
    rect[1] = pts[np.argmin(diff)]  # smallest y - x -> top-right
    # Bottom-left will have the largest difference.
    rect[3] = pts[np.argmax(diff)]  # largest y - x -> bottom-left
    # Return the ordered coordinates.
    return rect.astype('int').tolist()
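The comment above warns that the sum/difference heuristic does not work in every case. The sketch below, an illustration rather than the author's code, reproduces `order_points` and feeds it a moderately skewed quadrilateral (ordered correctly) and a square rotated by 45 degrees (where ties in x + y and y - x make it return the same point twice). `order_points_by_angle` is a hypothetical alternative that sorts the points by their angle around the centroid; because the image y axis points down, increasing angle runs clockwise on screen.

```python
import numpy as np

def order_points(pts):
    """Same heuristic as in the article: top-left, top-right, bottom-right, bottom-left."""
    rect = np.zeros((4, 2), dtype='float32')
    pts = np.array(pts)
    s = pts.sum(axis=1)             # x + y
    rect[0] = pts[np.argmin(s)]     # top-left
    rect[2] = pts[np.argmax(s)]     # bottom-right
    diff = np.diff(pts, axis=1)     # y - x
    rect[1] = pts[np.argmin(diff)]  # top-right
    rect[3] = pts[np.argmax(diff)]  # bottom-left
    return rect.astype('int').tolist()

def order_points_by_angle(pts):
    """Hypothetical alternative: clockwise (on screen) by angle around the centroid."""
    pts = np.asarray(pts, dtype=np.float64)
    cx, cy = pts.mean(axis=0)
    angles = np.arctan2(pts[:, 1] - cy, pts[:, 0] - cx)  # in [-pi, pi]; y axis points down
    return pts[np.argsort(angles)].astype(int).tolist()

skewed = [[300, 260], [60, 40], [40, 220], [330, 80]]   # shuffled corners
diamond = [[100, 0], [200, 100], [100, 200], [0, 100]]  # square rotated by 45 degrees

print(order_points(skewed))            # [[60, 40], [330, 80], [300, 260], [40, 220]]
print(order_points(diamond))           # [[100, 0], [100, 0], [200, 100], [100, 200]]
print(order_points_by_angle(diamond))  # [[100, 0], [200, 100], [100, 200], [0, 100]]
```

The angle-based order always yields four distinct points, but the starting corner depends on the tilt, so for heavily rotated pages the straightened output may come out turned by 90 degrees.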


rect = order_points(corners)  # the four corners in clockwise order: tl, tr, br, bl
(tl, tr, br, bl) = rect
# Finding the maximum width.
widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
maxWidth = max(int(widthA), int(widthB))  # width of the straightened document: the longer of the bottom and top edges (Euclidean distance, i.e. the Pythagorean theorem)
# Finding the maximum height.
heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
maxHeight = max(int(heightA), int(heightB))  # height of the straightened document: the longer of the right and left edges


# Final destination co-ordinates.
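destination_corners = [[0, 0], [maxWidth, 0], [maxWidth, maxHeight], [0, maxHeight]]  # corners of the straightened document, in the same order as rect
# Getting the homography.
M = cv2.getPerspectiveTransform(np.float32(rect), np.float32(destination_corners))  # 3x3 matrix mapping the skewed corners onto the upright ones
# PS: the original author's code is M = cv2.getPerspectiveTransform(np.float32(corners), np.float32(destination_corners)), but the four points in corners are in the wrong order (sorted by x), so the document is not straightened correctly; using the reordered rect fixes this
# Perspective transform using homography.
final = cv2.warpPerspective(oriImg, M, (destination_corners[2][0], destination_corners[2][1]), flags=cv2.INTER_LINEAR)  # warp the original image with the matrix; dsize is (width, height)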

6. Complete code

# -*- coding: utf-8 -*- 
# @Time : 2024/6/27 9:27 
# @Author : yjy
# @File : test.py
# @desc:
# Repeated Closing operation to remove text from the document.

import numpy as np
import cv2
def show(img):
    cv2.imshow("test2", img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
oriImg = cv2.imread("test.jpg")

kernel = np.ones((5,5),np.uint8)
img = cv2.morphologyEx(oriImg , cv2.MORPH_CLOSE, kernel, iterations= 3)
mask = np.zeros(img.shape[:2],np.uint8)
bgdModel = np.zeros((1,65),np.float64)
fgdModel = np.zeros((1,65),np.float64)
rect = (20, 20, img.shape[1] - 40, img.shape[0] - 40)  # (x, y, w, h): keep a 20-pixel background band on every side
cv2.grabCut(img,mask,rect,bgdModel,fgdModel,5,cv2.GC_INIT_WITH_RECT)
mask2 = np.where((mask==2)|(mask==0),0,1).astype('uint8')
img = img*mask2[:,:,np.newaxis]
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (11, 11), 0)
# Edge Detection.
canny = cv2.Canny(gray, 0, 200)
canny = cv2.dilate(canny, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))
# Blank canvas.
con = np.zeros_like(img)
# Finding contours for the detected edges.
contours, hierarchy = cv2.findContours(canny, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
# Keeping only the largest detected contour.
page = sorted(contours, key=cv2.contourArea, reverse=True)[:5]
con = cv2.drawContours(con, page, -1, (0, 255, 255), 3)
# Blank canvas.
con = np.zeros_like(img)
# Loop over the contours.
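for c in page:
    # Approximate the contour.
    epsilon = 0.02 * cv2.arcLength(c, True)
    corners = cv2.approxPolyDP(c, epsilon, True)
    # If our approximated contour has four points
    if len(corners) == 4:
        break
else:
    raise RuntimeError("No contour with four corners was found")
cv2.drawContours(con, [c], -1, (0, 255, 255), 3)  # wrap c in a list so it is drawn as one contour
cv2.drawContours(con, corners, -1, (0, 255, 0), 10)  # each corner point is drawn as a thick dot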
# Sorting the corners and converting them to desired shape.
corners = sorted(np.concatenate(corners).tolist())
# Displaying the corners.
for index, c in enumerate(corners):
    character = chr(65 + index)
    cv2.putText(con, character, tuple(c), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 1, cv2.LINE_AA)
show(con)  # show once, after all four labels are drawn


def order_points(pts):
    '''Rearrange coordinates to order:
      top-left, top-right, bottom-right, bottom-left'''
    rect = np.zeros((4, 2), dtype='float32')
    pts = np.array(pts)
    s = pts.sum(axis=1)
    # Top-left point will have the smallest sum.
    rect[0] = pts[np.argmin(s)]
    # Bottom-right point will have the largest sum.
    rect[2] = pts[np.argmax(s)]

    diff = np.diff(pts, axis=1)
    # Top-right point will have the smallest difference.
    rect[1] = pts[np.argmin(diff)]
    # Bottom-left will have the largest difference.
    rect[3] = pts[np.argmax(diff)]
    # Return the ordered coordinates.
    return rect.astype('int').tolist()
rect = order_points(corners)
(tl, tr, br, bl) = rect
# Finding the maximum width.
widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
maxWidth = max(int(widthA), int(widthB))
# Finding the maximum height.
heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
maxHeight = max(int(heightA), int(heightB))
# Final destination co-ordinates.
destination_corners = [[0, 0], [maxWidth, 0], [maxWidth, maxHeight], [0, maxHeight]]
# Getting the homography.
M = cv2.getPerspectiveTransform(np.float32(rect), np.float32(destination_corners))
# Perspective transform using homography.
final = cv2.warpPerspective(oriImg, M, (destination_corners[2][0], destination_corners[2][1]), flags=cv2.INTER_LINEAR)
show(final)

7. Limitations

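If the document in the image does not have four clearly defined corners, it cannot be straightened correctly, as in the figure below.
[Figure: a document without four clearly visible corners]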
