使用Python和OpenCV创建自己的“ CamScanner”-CSDN博客

本文介绍了计算机视觉的原理及其在AI和机器学习领域的快速发展，指出数据是推动这一进步的关键因素。特斯拉汽车公司是计算机视觉技术的先驱，其自动驾驶车辆依赖于这项技术。通过结合图像数据和复杂的机器学习算法，可以创建真正的人工智能。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

感谢Soham Mhatre为本文做出了重要贡献。 (Thanks to Soham Mhatre for contributing significantly towards this article.)

计算机视觉又为何嗡嗡声？ (Computer Vision and why the buzz?)

Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the human visual system can do. Basically, it’s a scientific field to make the computers understand a photo/video similar to how it will be interpreted by a human being.

计算机视觉是一门跨学科的科学领域，涉及计算机如何从数字图像或视频中获得高级了解。从工程学的角度来看，它试图理解和自动化人类视觉系统可以完成的任务。 基本上 ，使计算机理解与人类如何理解照片/视频类似的科学领域。

那么为什么嗡嗡声 (So why the buzz)

Advancement in AI and Machine Learning has accelerated the developments in computer vision. Earlier these were two separate fields and there were different techniques, coding languages & academic researchers in both of them. But now, the gap has reduced significantly and more and more data scientists are working in the field of computer vision and vice-a-versa. The reason is the simple common denominator in both the fields— Data.

人工智能和机器学习的进步加速了计算机视觉的发展。之前，这是两个单独的领域，并且两者都有不同的技术，编码语言和学术研究人员。但是现在，这种差距已大大缩小，越来越多的数据科学家正在计算机视觉领域进行研究，反之亦然。原因是两个字段(数据)中的简单公分母。

At the end of the day, a computer will learn by consuming data. And AI helps the computers to not only process, but also improve it’s Understanding/Interpretation by trial-and-error. So now, if we can combine the data from images and run complex machine learning algorithms on it, what we get is an actual AI.

最终，计算机将通过使用数据来学习。 AI不仅可以帮助计算机进行处理，还可以通过反复试验来提高对计算机的理解/解释 。因此，现在，如果我们可以合并图像中的数据并在其上运行复杂的机器学习算法，那么我们得到的就是一个真正的AI。

One modern company who has pioneered the technology of Computer Vision is Tesla Motors

特斯拉汽车公司 ( Tesla Motors)是率先开发计算机视觉技术的现代公司

Tesla Motors is known for pioneering the self-driving vehicle revolution in the world. They are also known for achieving high reliability in autonomous vehicles. Tesla cars depend entirely upon computer vision.

特斯拉汽车公司(Tesla Motors)以在世界上引领自动驾驶汽车革命而闻名。它们还以在自动驾驶汽车中实现高可靠性而闻名。特斯拉汽车完全取决于计算机视觉。

今天我们要实现什么？ (What are we gonna achieve today?)

For this article we will concentrate only on Computer Vision and leave Machine Learning for some later time. Also we will just use just one library OpenCV to create the whole thing.

在本文中，我们将仅专注于计算机视觉，并在以后再使用机器学习。同样，我们将仅使用一个库OpenCV 来创建整个程序 。

指数 (Index)

What is OpenCV?
什么是OpenCV？
Preprocess the image using different concepts such as blurring, thresholding, denoising (Non-Local Means).
使用不同的概念(例如模糊，阈值处理，去噪(非局部均值))对图像进行预处理。
Canny Edge detection & Extraction of biggest contour
Canny Edge检测和最大轮廓提取
Finally — Sharpening & Brightness correction
最后-锐化和亮度校正

什么是OpenCV (What is OpenCV)

OpenCV is a library of programming functions mainly aimed at real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage and then Itseez. The library is cross-platform and free for use under the open-source BSD license. It was initially developed in C++ but now it’s available across multiple languages such Python, Java, etc.

OpenCV是主要针对实时计算机视觉的编程功能库。它最初由英特尔开发，后来得到了Willow Garage和Itseez的支持。该库是跨平台的，可在开源BSD许可下免费使用。它最初是用C ++开发的，但现在可以在多种语言中使用，例如Python，Java等。

从预处理开始 (Start with Preprocessing)

燃烧 (BLURRING)

The goal of blurring is to reduce the noise in the image. It removes high frequency content (e.g: noise, edges) from the image — resulting in blurred edges. There are multiple blurring techniques (filters) in OpenCV, and the most common are:

模糊的目的是减少图像中的噪点。它可以去除图像中的高频成分(例如，噪声，边缘)，从而导致边缘模糊。 OpenCV中有多种模糊技术(过滤器)，最常见的是：

Averaging — It simply takes the average of all the pixels under kernel area and replaces the central element with this average

平均 -仅取内核区域下所有像素的平均值，然后用该平均值替换中心元素

Gaussian Filter — Instead of a box filter consisting of equal filter coefficients, a Gaussian kernel is used

高斯滤波器 -使用高斯核代替由相等滤波器系数组成的盒式滤波器

Median Filter — Computes the median of all the pixels under the kernel window and the central pixel is replaced with this median value

中值过滤器 —计算内核窗口下所有像素的中值，并将中心像素替换为该中值

Bilateral Filter — Advanced version of Gaussian blurring. Not only does it removes noise, but also smoothens edges.

双边过滤器 -高斯模糊的高级版本。它不仅可以消除噪音，还可以平滑边缘。

阈值 (THRESHOLDING)

In image processing, thresholding is the simplest method of segmenting images. From a grayscale image, thresholding can be used to create binary images. This is generally done so as to clearly differentiate between different shades of pixel intensities. Most common thresholding techniques in OpenCV are:

在图像处理中，阈值化是分割图像的最简单方法。从灰度图像中，阈值可用于创建二进制图像。通常这样做是为了清楚地区分像素强度的不同阴影。 OpenCV中最常用的阈值技术是：

Simple Thresholding — If pixel value is greater than a threshold value, it is assigned one value (may be white), else it is assigned another value (may be black)

简单阈值处理 -如果像素值大于阈值，则为其分配一个值(可以是白色)，否则可以分配另一个值(可以是黑色)

Adaptive Thresholding — Algorithm calculates the threshold for a small regions of the image. So we get different thresholds for different regions of the same image and it gives us better results for images with varying illumination.

自适应阈值处理 -算法为图像的一小部分计算阈值。因此，对于同一图像的不同区域，我们获得了不同的阈值，对于光照变化的图像，它可以提供更好的结果。

Note:Remember to convert the images to grayscale before thresholding

注意：请记住在阈值化之前将图像转换为灰度

去噪 (DENOISING)

There is another kind of de-noising that we conduct —Non-Local Means Denoising. The principle of the initial denoising methods were to replace the colour of a pixel with an average of the colours of nearby pixels. The variance law in probability theory ensures that if nine pixels are averaged, the noise standard deviation of the average is divided by three. Hence giving us a denoised picture.

我们还会进行另一种降噪- 非本地均值降噪。 初始去噪方法的原理是用附近像素的平均颜色替换像素的颜色。概率论中的方差定律可确保如果将9个像素平均，则将平均噪声标准偏差除以3。因此给了我们一张去噪的图片。

But what if there is edge or elongated pattern where denoising by averaging wont work. Therefore, we need to scan a vast portion of the image in search of all the pixels that really resemble the pixel we want to denoise. Denoising is then done by computing the average colour of these most resembling pixels. This is called — Non-Local Means Denoising.

但是，如果存在边缘或拉长的图案而无法通过平均去噪怎么办？因此，我们需要扫描图像的很大一部分，以查找与我们要去噪的像素非常相似的所有像素。然后通过计算这些最相似像素的平均颜色来进行降噪。这称为非局部均值去噪。

Use cv2.fastNlMeansDenoising for the same.

使用cv2.fastNlMeansDenoising相同。

Canny Edge检测和最大轮廓提取 (Canny Edge detection & Extraction of biggest contour)

After image blurring & thresholding, the next step is to find the biggest contour (biggest bounding box) and crop out the image. This is done by using Canny Edge Detection followed by extraction of biggest contour using four-point transformation.

在图像模糊和阈值化之后，下一步是找到最大的轮廓(最大的边界框)并裁剪出图像。这是通过使用Canny Edge Detection进行的，然后使用四点变换提取最大轮廓。

佳能 (CANNY EDGE)

Canny edge detection is a multi-step algorithm that can detect edges. We should send a de-noised image to this algorithm so that it is able to detect relevant edges only.

Canny边缘检测是可以检测边缘的多步算法。我们应该将降噪后的图像发送到此算法，以便它只能检测相关的边缘。

查找轮廓 (FIND CONTOURS)

After finding the edges, pass the image through cv2.findcontours(). It joins all the continuous points (along the edges), having same colour or intensity. After this we will get all contours — rectangles, spheres, etc

找到边缘后，将图像传递到cv2.findcontours() 。它连接具有相同颜色或强度的所有连续点(沿边)。之后，我们将获得所有轮廓-矩形，球形等

Use cv2.convexHull() and cv2.approxPolyDP to find the biggest rectangular contour(approx) in the photo.

使用cv2.convexHull()和cv2.approxPolyDP查找照片中最大的矩形轮廓。

提取最大的轮廓 (EXTRACTING THE BIGGEST CONTOUR)

Although we have found the biggest contour which looks like a rectangle, we still need to find the corners so as to find the exact co-ordinates to crop the image.

尽管我们找到了看起来像矩形的最大轮廓，但仍然需要找到拐角以便找到精确的坐标来裁剪图像。

For this first you pass the co-ordinates of the approx rectangle(biggest contour) and apply an order points transformation on the same. The resultant is an exact (x,y) coordinates of the biggest contour.

首先，您传递近似矩形(最大轮廓)的坐标，并在其上应用顺序点转换。结果是最大轮廓的精确(x，y)坐标。

Four Point Transformation — Using the above (x,y) coordinates, calculate the width and height of the contour. Pass it through the cv2.warpPerspective()to crop the contour. Voila — you have the successfully cropped out the relevant data from the input image

四点变换 —使用上述(x，y)坐标，计算轮廓的宽度和高度。通过cv2.warpPerspective()来裁剪轮廓。瞧-您已成功从输入图像中裁剪出相关数据

Notice — How well the image is cropped out even though its a poorly lit and clicked image

注意—即使光线不足并单击的图像，其裁剪效果也很好

最后-锐化和亮度校正 (Finally — Sharpening & Brightness correction)

Now that we have cropped out the relevant info (biggest contour) from the image, the last step is to sharpen the picture so that we get well illuminated and readable document.

现在我们已经从图像中裁剪出了相关的信息(最大轮廓)，最后一步是使图片锐化，从而使我们获得照亮且可读的文档。

— For this we use hue, saturation, value (h,s,v) concept where value represents the brightness. Can play around with this value to increase the brightness of the documents

—为此，我们使用色相，饱和度，值(h，s，v)概念，其中值表示亮度。 可以玩这个值来增加文件的亮度

— Kernel Sharpening - A kernel, convolution matrix, or mask is a small matrix. It is used for blurring, sharpening, embossing, edge detection, and more. This is accomplished by doing a convolution between a kernel and an image

— 内核锐化-内核 ， 卷积矩阵或掩码是小矩阵。它用于模糊，锐化，压纹，边缘检测等。这是通过在内核和映像之间进行卷积来完成的