Building a CamScanner-Style Scanner in Python: Document Detection and Image Rectification

This post describes a rectification method for scanned documents and photos of regularly shaped objects: detect the edges, extract the document contour, and apply a perspective transform to straighten and clean up the image. The steps are edge detection, contour finding, and a four-point transform for rectification.


The method works for scanned documents, or for photos of objects with a regular shape (four identifiable corner points).

0. Import the required packages

import numpy as np
import argparse
import cv2
import imutils

1. Edge Detection

First, preprocess the image.

# Set up the path of the image to read
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", default="scan.jpg",
	help = "Path to the image to be scanned") ## change this to the image you want to scan
args = vars(ap.parse_args())

# Load the original image, keep a copy and the resize ratio, then resize
image = cv2.imread(args["image"])
ratio = image.shape[0] / 500.0
orig = image.copy()
image = imutils.resize(image, height = 500) # resize to height 500; the width is computed automatically to keep the aspect ratio

# Grayscale, blur, then find the edges
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5, 5), 0)
edged = cv2.Canny(gray, 75, 200)

# show the original image and the edge detected image
print("STEP 1: Edge Detection")
cv2.imshow("Image", image)
cv2.imshow("Edged", edged)
cv2.waitKey(0)
cv2.destroyAllWindows()

(Original image and the edge-detected image)
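The Canny thresholds of 75 and 200 above are hand-tuned for this sample photo. A common alternative (not part of the original script) is to derive the thresholds from the median brightness of the blurred grayscale image; the sketch below is a minimal version of that idea, with the helper name and `sigma` value chosen purely for illustration.

```python
import numpy as np
import cv2

def auto_canny(gray, sigma=0.33):
    # Pick lower/upper Canny thresholds around the median intensity
    v = np.median(gray)
    lower = int(max(0, (1.0 - sigma) * v))
    upper = int(min(255, (1.0 + sigma) * v))
    return cv2.Canny(gray, lower, upper)

# edged = auto_canny(gray)  # drop-in replacement for the fixed 75/200 call
```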

2. Find the Contours

# find the contours in the edged image, keeping only the
# largest ones, and initialize the screen contour
cnts = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE) ## find the contours
cnts = imutils.grab_contours(cnts)
cnts = sorted(cnts, key = cv2.contourArea, reverse = True)[:5]

# loop over the contours
for c in cnts:
	# approximate the contour
	peri = cv2.arcLength(c, True)
	approx = cv2.approxPolyDP(c, 0.02 * peri, True)

	# if our approximated contour has four points, then we
	# can assume that we have found our screen
	if len(approx) == 4:
		screenCnt = approx
		break

# show the contour (outline) of the piece of paper
print("STEP 2: Find contours of paper")
cv2.drawContours(image, [screenCnt], -1, (0, 255, 0), 2)
cv2.imshow("Outline", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

(Detected paper outline)
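One caveat: the loop above assumes that one of the five largest contours approximates to exactly four points. If the page edges are broken or partly out of frame, `screenCnt` is never assigned and the `drawContours` call fails with a `NameError`. A small defensive variant (my own addition, not in the original post) could look like this:

```python
screenCnt = None
for c in cnts:
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)
    if len(approx) == 4:
        screenCnt = approx
        break

if screenCnt is None:
    # No 4-point contour was found; fall back to the bounding box of the
    # largest contour so the perspective transform still gets four corners.
    x, y, w, h = cv2.boundingRect(cnts[0])
    screenCnt = np.array([[[x, y]], [[x + w, y]], [[x + w, y + h]], [[x, y + h]]])
```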

3. Image Rectification

In this part, two helper functions need to be defined first.

def order_points(pts):
	# initialzie a list of coordinates that will be ordered
	# such that the first entry in the list is the top-left,
	# the second entry is the top-right, the third is the
	# bottom-right, and the fourth is the bottom-left
	rect = np.zeros((4, 2), dtype = "float32")

	# the top-left point will have the smallest sum, whereas
	# the bottom-right point will have the largest sum
	s = pts.sum(axis = 1)
	rect[0] = pts[np.argmin(s)]
	rect[2] = pts[np.argmax(s)]

	# now, compute the difference between the points, the
	# top-right point will have the smallest difference,
	# whereas the bottom-left will have the largest difference
	diff = np.diff(pts, axis = 1)
	rect[1] = pts[np.argmin(diff)]
	rect[3] = pts[np.argmax(diff)]

	# return the ordered coordinates
	return rect


def four_point_transform(image, pts):
	# obtain a consistent order of the points and unpack them
	# individually
	rect = order_points(pts)
	(tl, tr, br, bl) = rect

	# compute the width of the new image, which will be the
	# maximum distance between bottom-right and bottom-left
	# x-coordiates or the top-right and top-left x-coordinates
	widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
	widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
	maxWidth = max(int(widthA), int(widthB))

	# compute the height of the new image, which will be the
	# maximum distance between the top-right and bottom-right
	# y-coordinates or the top-left and bottom-left y-coordinates
	heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
	heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
	maxHeight = max(int(heightA), int(heightB))

	# now that we have the dimensions of the new image, construct
	# the set of destination points to obtain a "birds eye view",
	# (i.e. top-down view) of the image, again specifying points
	# in the top-left, top-right, bottom-right, and bottom-left
	# order
	dst = np.array([
		[0, 0],
		[maxWidth - 1, 0],
		[maxWidth - 1, maxHeight - 1],
		[0, maxHeight - 1]], dtype = "float32")

	# compute the perspective transform matrix and then apply it
	M = cv2.getPerspectiveTransform(rect, dst)
	warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))

	# return the warped image
	return warped
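Before calling these helpers on real contour points, a quick sanity check of `order_points` with some made-up corner coordinates (purely illustrative values) shows the sum/difference trick at work:

```python
pts = np.array([[420, 30], [30, 25], [35, 560], [410, 550]], dtype="float32")
print(order_points(pts))
# Ordered as top-left, top-right, bottom-right, bottom-left:
# [[ 30.  25.]
#  [420.  30.]
#  [410. 550.]
#  [ 35. 560.]]
```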

Then simply call the function. If you also want a scanned, binarized (black-and-white) result, uncomment the code below; I only need the original image rectified, so it stays commented out. (Left: the original image; right: the rectified / scanned result.)

warped = four_point_transform(orig, screenCnt.reshape(4, 2) * ratio)

# convert the warped image to grayscale, then threshold it
# to give it that 'black and white' paper effect
# This is the binarization step (a black-and-white scan); uncomment it if you need it.
# It requires: from skimage.filters import threshold_local
# warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
# T = threshold_local(warped, 11, offset = 10, method = "gaussian")
# warped = (warped > T).astype("uint8") * 255

# show the original and scanned images
print("STEP 3: Apply perspective transform")
cv2.imshow("Original", imutils.resize(orig, height = 650))
cv2.imshow("Scanned", imutils.resize(warped, height = 650))
cv2.waitKey(0)
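If you do want the black-and-white "scanned" output, the commented lines above become runnable once `threshold_local` is imported from scikit-image (which must be installed separately); for example:

```python
from skimage.filters import threshold_local

# Adaptive thresholding gives the classic black-and-white scan effect
warped_gray = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
T = threshold_local(warped_gray, 11, offset=10, method="gaussian")
scanned = (warped_gray > T).astype("uint8") * 255

cv2.imshow("Scanned (B/W)", imutils.resize(scanned, height=650))
cv2.waitKey(0)
```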

(Original image alongside the rectified result)

### Implementing and Configuring CamScanner-Style Page Enhancement

#### Overview of page enhancement

CamScanner's page enhancement rests on high-definition scanning and intelligent image processing. These techniques markedly improve the quality of the original document, making it easier to read and archive[^1].

Concretely, page enhancement usually covers the following aspects:

- **Smart cropping**: detect the document boundary algorithmically and remove the surrounding background.
- **Color correction**: adjust brightness, contrast, and color balance so the scanned image looks natural.
- **Denoising**: reduce noise caused by poor lighting or other factors.
- **Deskewing**: automatically straighten the document image when the shooting angle is off.

Together, these techniques yield a high-quality digitization pipeline.

#### Architecture notes

From a development standpoint, imitating or extending such features involves the following layers.

##### Front-end interaction

The UI should stay simple while remaining functional, for example by offering mode options (black-and-white / color) and allowing manual fine-tuning of parameters[^2].

##### Back-end processing

The back end receives client requests and executes the corresponding business logic. Scenarios such as a leave-approval workflow follow a typical three-layer model: presentation layer, business logic layer, and data access layer[^4]. A document management system can be organized the same way, which keeps the code base easier to maintain and upgrade.

It is also worth noting that OCR (Optical Character Recognition) is commonly integrated as a value-added service: it extracts key information from scans and greatly simplifies later retrieval and classification, cutting manual effort and improving efficiency[^1].

#### Recommended tools

If you plan to build such an application yourself, existing open-source libraries can shorten the development cycle. OpenCV provides many basic computer-vision components and is well suited to tasks such as edge detection; Tesseract is a powerful OCR engine with support for many languages. Whichever stack you choose, weigh it against the constraints of your target platform.

```python
import cv2

def enhance_image(image_path):
    img = cv2.imread(image_path)

    # Convert to grayscale and apply Otsu thresholding for better contrast
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Remove point noise with a median filter
    denoised_img = cv2.medianBlur(thresh, 3)

    # Save the enhanced image next to the script
    output_file = 'enhanced_' + image_path.split('/')[-1]
    cv2.imwrite(output_file, denoised_img)
    return output_file

if __name__ == "__main__":
    input_image = './example.jpg'
    result = enhance_image(input_image)
    print(f"Enhanced image saved as {result}")
```

The script above demonstrates basic image preprocessing: grayscale conversion, Otsu thresholding, and median-filter denoising. It is meant for reference and learning; a real project will likely need additional steps to meet its specific requirements.
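Since Tesseract is mentioned above as the OCR engine, here is a minimal sketch of running OCR on an enhanced page with `pytesseract` (this assumes Tesseract and the `pytesseract` package are installed; the file name is hypothetical):

```python
import pytesseract
from PIL import Image

# Extract text from an already enhanced/rectified page image
text = pytesseract.image_to_string(Image.open("enhanced_example.jpg"), lang="eng")
print(text)
```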