51c Vision ~CV~ Collection 7 (51cto opencv, CSDN blog)

Article link: https://blog.csdn.net/weixin_49587977/article/details/146222014

My own original post: https://blog.51cto.com/whaosoft/12206577

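1. Improving Image Brightness/Exposure with the Fourier Transform

The Fourier transform

The Fourier transform decomposes a waveform (essentially any real-world waveform) into sinusoids. In other words, it gives us another way to represent the waveform. For digital images, frequency relates to image detail: low frequencies describe the overall shapes, and high frequencies describe the finer details.

In addition, the discrete Fourier transform (DFT), used for digital data, produces a new image in which noise or artifacts that are hard to see or handle in the spatial domain become visible. As an example, let's look at how the DFT represents a moiré pattern, a periodic noise on an image. The pattern shown below is typical of old printed images.

The DFT result above is a little noisy, but it is good enough for understanding and for practical use. Symmetric groups of dots can be seen around the middle of the gray spectrum plot; these dots represent the periodic moiré noise. They can be removed with a dedicated filter, and applying the inverse DFT afterwards yields a new image without the noise.

Homomorphic filtering

With the concept briefly introduced, this section presents a way to implement a homomorphic filter using the DFT. It is a method for improving the lighting exposure of pictures that contain large shadowed areas.

Every pixel of an image can be described by an illumination component and a reflectance component. The illumination i(x, y) is the amount of light falling on the pixel and varies slowly across space. The reflectance r(x, y) describes how much of that light is reflected; it is closely tied to the material doing the reflecting and can change quickly.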

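So each pixel can be understood as the product of these two components:

$$f(x, y) = i(x, y)\, r(x, y)$$

Fixing the lighting of a scene is therefore not just a matter of boosting the illumination component, because it is multiplied with the reflectance, and applying the DFT directly to the image does not separate the two. Taking the logarithm first turns the product into a sum, $\ln f = \ln i + \ln r$, which can then be filtered in the frequency domain.

The full problem is harder than this. Skipping most of the mathematics, here is a summary of how the filtered image is produced (the formulas follow the code listing further down, where yh, yl, c and d0 correspond to $\gamma_H$, $\gamma_L$, $c$ and $D_0$).

The homomorphic filter in the frequency domain is H(u, v):

$$H(u, v) = (\gamma_H - \gamma_L)\left(1 - e^{-c\, D^2(u, v) / D_0^2}\right) + \gamma_L$$

where D(u, v) is the distance from the center of the shifted M x N spectrum:

$$D(u, v) = \sqrt{(u - M/2)^2 + (v - N/2)^2}$$

The filtered image is then obtained from:

$$S(u, v) = H(u, v)\, Z(u, v), \qquad g(x, y) = e^{s(x, y)}$$

where $Z(u, v)$ is the DFT of $\ln f(x, y)$ and $s(x, y)$ is the inverse DFT of $S(u, v)$.

The process is also illustrated step by step in the figure below: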
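To get a feel for how such a filter treats different frequencies, the sketch below evaluates the gain at the center and at a corner of a centered spectrum. The function name `homomorphic_gain` and all parameter values are assumptions chosen for illustration; they are not taken from the original code.

```python
import math

def homomorphic_gain(u, v, rows, cols, yl, yh, c, d0):
    """Gain H(u, v) of the high-emphasis filter for a centered spectrum."""
    # Squared distance from the center of the shifted spectrum
    d2 = (u - rows / 2.0) ** 2 + (v - cols / 2.0) ** 2
    return (yh - yl) * (1.0 - math.exp(-c * d2 / (d0 * d0))) + yl

# Illustrative values only: yl < 1 damps low frequencies (illumination),
# yh > 1 boosts high frequencies (reflectance detail).
rows, cols = 512, 512
center_gain = homomorphic_gain(256, 256, rows, cols, yl=0.5, yh=2.0, c=1.0, d0=30.0)
corner_gain = homomorphic_gain(0, 0, rows, cols, yl=0.5, yh=2.0, c=1.0, d0=30.0)
print(center_gain, corner_gain)
```

With these values the gain equals yl at the center and approaches yh far from it, which is what lets the filter darken slowly varying lighting while keeping detail.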

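I will now explain step by step how to implement the DFT of a digital image; but first, let's recall how the coordinates of a digital image are laid out. In the introduction in part one, I mentioned that the (0,0) coordinate of a digital image is at the top-left corner. If we take the DFT of the image as is, the zero-frequency component of the result ends up at a corner rather than at the center of the spectrum image. To help visualization and the application of filters, the spectrum is therefore rearranged as shown in the figure below, which gives us a centered spectrum. (The Fourier plot of the moiré pattern above was already centered in this way.)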
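The rearrangement can be illustrated without any library. The sketch below is a plain-Python stand-in (the helper name `shift_quadrants` is hypothetical) that cyclically shifts a 2-D grid by half its size along both axes, which is the same effect `np.fft.fftshift` has on a 2-D array.

```python
def shift_quadrants(grid):
    """Cyclically shift a 2-D list by half its size along both axes."""
    rows, cols = len(grid), len(grid[0])
    shifted = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            shifted[(r + rows // 2) % rows][(c + cols // 2) % cols] = grid[r][c]
    return shifted

# Mark the zero-frequency bin, which the DFT places at index (0, 0)
spectrum = [[0] * 4 for _ in range(4)]
spectrum[0][0] = 1
centered = shift_quadrants(spectrum)
print(centered[2][2])  # the marked bin is now at the center
```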

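Although this technique amounts to splitting the spectrum into quadrants and swapping them, the whole step can be done with a ready-made function introduced later (np.fft.fftshift in the code below). One more point before implementing: DFT computation is optimized for certain image sizes. According to the OpenCV documentation, arrays whose size is a product of powers of 2, 3 and 5 are processed most efficiently, so the image size needs to be recomputed and the image will most likely have to be padded.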
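The idea behind the size optimization can be sketched with a brute-force search for the next size whose only prime factors are 2, 3 and 5. This is an illustration of the concept under that assumption, not OpenCV's actual implementation of `cv2.getOptimalDFTSize`; the name `next_fast_size` is hypothetical.

```python
def next_fast_size(n):
    """Smallest m >= n whose only prime factors are 2, 3 and 5."""
    m = max(n, 1)
    while True:
        rest = m
        for p in (2, 3, 5):
            while rest % p == 0:
                rest //= p
        if rest == 1:
            return m
        m += 1

for size in (7, 11, 1000, 1001):
    print(size, "->", next_fast_size(size))
```

The difference between the returned size and the original size is the amount of padding the image needs along that axis.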

import cv2
import numpy as np
from math import exp, sqrt


image = cv2.imread("teste.jpg", 0)
height, width = image.shape
dft_M = cv2.getOptimalDFTSize(height)
dft_N = cv2.getOptimalDFTSize(width)


#Filter parameters
yh, yl, c, d0, = 0, 0, 0, 0
#User parameters
y_track, d0_track, c_track = 0, 0, 0
complex = 0

First, we read a poorly exposed image as grayscale, get its optimal DFT size, and initialize a few global variables that the filter needs.

def main():
    #copyMakeBorder(src, top, bottom, left, right, borderType[, dst[, value]]) 
    #BORDER_CONSTANT = Pad the image with a constant value (i.e. black or 0)
    padded = cv2.copyMakeBorder(image, 0, dft_M - height, 0, dft_N - width, cv2.BORDER_CONSTANT, value=0)
    # Convert to float before adding 1: on uint8 data, 255 + 1 wraps around to 0
    padded = np.log1p(np.float32(padded)) #so we never have log of 0
    global complex
    complex = cv2.dft(padded / 255.0, flags = cv2.DFT_COMPLEX_OUTPUT)
    complex = np.fft.fftshift(complex)
    # +1 keeps the logarithm finite where the magnitude is 0
    img = 20 * np.log(cv2.magnitude(complex[:,:,0], complex[:,:,1]) + 1)


    cv2.namedWindow('Image', cv2.WINDOW_NORMAL)
    cv2.imshow("Image", image)
    cv2.resizeWindow("Image", 400, 400)


    cv2.namedWindow('DFT', cv2.WINDOW_NORMAL)
    cv2.imshow("DFT", np.uint8(img))
    cv2.resizeWindow("DFT", 250, 250)


    cv2.createTrackbar("YL", "Image", y_track, 100, setyl)
    cv2.createTrackbar("YH", "Image", y_track, 100, setyh)
    cv2.createTrackbar("C", "Image", c_track, 100, setc)
    cv2.createTrackbar("D0", "Image", d0_track, 100, setd0)


    cv2.waitKey(0)     
    cv2.destroyAllWindows()

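Now, with the ideal size known, we pad the image with a constant-valued border at the bottom and on the right (any placement would work). After padding we transform to the frequency domain and then apply the shift mentioned earlier for a better representation (see the documentation). Displaying the result takes one extra step, because the values have a real and an imaginary part: we show the log-scaled magnitude.

With the image and its DFT in hand, it is time to implement the filter itself. The code is commented according to the explanation above.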

def homomorphic():
    global yh, yl, c, d0, complex
    # D(u, v)^2: squared distance of each bin from the center of the shifted spectrum
    u = np.arange(dft_M, dtype = np.float32).reshape(-1, 1) - dft_M/2.0
    v = np.arange(dft_N, dtype = np.float32).reshape(1, -1) - dft_N/2.0
    du2 = (u*u + v*v) / (d0*d0)
    #H(u, v)
    re = np.exp(- c * du2)
    H = np.float32((yh - yl) * (1 - re) + yl)
    #S(u, v): H is real-valued, so scale the real and imaginary planes alike
    filtered = complex * H[:, :, np.newaxis]
     #inverse DFT (does the shift back first)
    filtered = np.fft.ifftshift(filtered)
    filtered = cv2.idft(filtered)
    #normalization to be representable 
    filtered = cv2.magnitude(filtered[:, :, 0], filtered[:, :, 1])
    cv2.normalize(filtered, filtered, 0, 1, cv2.NORM_MINMAX)
    #g(x, y) = exp(s(x, y))
    filtered = np.exp(filtered)
    cv2.normalize(filtered, filtered,0, 1, cv2.NORM_MINMAX)


    cv2.namedWindow('homomorphic', cv2.WINDOW_NORMAL)
    cv2.imshow("homomorphic", filtered)
    cv2.resizeWindow("homomorphic", 600, 550)

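Next come the functions that update a filter parameter whenever the user moves a trackbar; each one then calls the homomorphic function. The trackbars are created in the main function and take an upper limit, the callback tied to the bar, and a variable holding the initial value. By convention the bars stay in the 0-100 range and are responsible for changing the filter parameters.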

def setyl(y_track):
    global yl
    yl = y_track
    if yl == 0:
        yl = 1
    if yl > yh:
        yl = yh - 1
    homomorphic()


def setyh(y_track):
    global yh
    yh = y_track
    if yh == 0:
        yh = 1
    if yl > yh:
        yh = yl + 1
    homomorphic()


def setc(c_track):
    global c
    c = c_track/100.0
    if c == 0:
        c = 0.01  # keep c non-zero; the original assigned to the local c_track instead
    homomorphic()


def setd0(d0_track):
    global d0
    d0 = d0_track
    if d0 == 0:
        d0 = 1
    homomorphic()

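D0, C, YL and YH are the tunable parameters; their handling is not meant to process user input rigorously, that is, it was written for experimentation.

None of the parameters is allowed to become 0, even if the user sets it so, because a zero would cancel the parameter's effect. The C parameter is scaled down because the result saturates more easily with it than with the others, and YL and YH (gamma low and gamma high, respectively) are handled so that YH always stays greater than YL.

After tuning, the exposure of the photo is much better, as expected. The black border is the padding added for the DFT size optimization; it does not interfere with the filter computation.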

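2. A More Robust and Faster Way to Find Circles: EdgeDrawing

How to find circles with the EdgeDrawing module in OpenCV

Background

Starting with OpenCV 4.5.2, the contrib modules wrap the open-source library ED_Lib, which finds lines, line segments, ellipses and circles in images. GitHub repository:

https://github.com/CihanTopal/ED_Lib

Brief overview of the algorithm:

Edge Drawing (ED) is a proactive approach to the edge detection problem. Many existing edge detectors follow a subtractive approach: after applying a gradient filter to the image, they eliminate pixels according to several rules, such as non-maximum suppression and hysteresis in Canny. ED instead works additively, picking edge pixels one by one, hence the name "Edge Drawing". The resulting arbitrarily shaped edge segments are then processed to extract higher-level edge features such as lines, circles and ellipses. The popular way to extract edge pixels from a thresholded gradient magnitude is non-maximum suppression, which tests whether each pixel has the maximum gradient response along its gradient direction and eliminates it if not. That method does not check the state of neighboring pixels, however, so it can produce low-quality edge segments (in terms of continuity, smoothness, thinness and localization). Instead of non-maximum suppression, ED picks out a set of edge pixels and joins them by maximizing the total gradient response of the edge segment. It can therefore extract high-quality edge segments without an extra hysteresis step.

OpenCV documentation for the class:

https://docs.opencv.org/4.5.2/d1/d1c/classcv_1_1ximgproc_1_1EdgeDrawing.html

Steps

The EdgeDrawing class lives in the ximgproc module of opencv_contrib. Using it from C++ requires:

① OpenCV >= 4.5.2

② The contrib modules built with CMake

③ Including the edge_drawing.hpp header

From Python, install opencv-contrib-python >= 4.5.2.

【1】 Python demo: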

#公众号--OpenCV与AI深度学习


'''
This example illustrates how to use cv.ximgproc.EdgeDrawing class.
Usage:
    ed.py [<image_name>]
    image argument defaults to board.jpg
'''
# Python 2/3 compatibility
from __future__ import print_function
import numpy as np
import cv2 as cv
import random as rng
import sys
rng.seed(12345)
def main():
    try:
        fn = sys.argv[1]
    except IndexError:
        fn = 'board.jpg'
    src = cv.imread(cv.samples.findFile(fn))
    gray = cv.cvtColor(src, cv.COLOR_BGR2GRAY)
    cv.imshow("source", src)
    ssrc = src.copy()*0
    lsrc = src.copy()
    esrc = src.copy()
    ed = cv.ximgproc.createEdgeDrawing()
    # you can change parameters (refer to the documentation to see all parameters)
    EDParams = cv.ximgproc_EdgeDrawing_Params()
    EDParams.MinPathLength = 50     # try changing this value between 5 and 1000
    EDParams.PFmode = False         # default value; try switching it to True
    EDParams.MinLineLength = 20     # try changing this value between 5 and 100
    EDParams.NFAValidation = True   # default value; try switching it to False
    ed.setParams(EDParams)
    # Detect edges
    # you should call this before detectLines() and detectEllipses()
    ed.detectEdges(gray)
    segments = ed.getSegments()
    lines = ed.detectLines()
    ellipses = ed.detectEllipses()
    # Draw detected edge segments
    for i in range(len(segments)):
        color = (rng.randint(0,255), rng.randint(0,255), rng.randint(0,255))
        cv.polylines(ssrc, [segments[i]], False, color, 1, cv.LINE_8)
    cv.imshow("detected edge segments", ssrc)
    # Draw detected lines
    if lines is not None: # only iterate and draw if lines were found
        lines = np.uint16(np.around(lines))
        for i in range(len(lines)):
            cv.line(lsrc, (lines[i][0][0], lines[i][0][1]), (lines[i][0][2], lines[i][0][3]), (0, 0, 255), 1, cv.LINE_AA)
    cv.imshow("detected lines", lsrc)
    # Draw detected circles and ellipses
    if ellipses is not None: # only iterate and draw if circles or ellipses were found
        for i in range(len(ellipses)):
            center = (int(ellipses[i][0][0]), int(ellipses[i][0][1]))
            axes = (int(ellipses[i][0][2])+int(ellipses[i][0][3]),int(ellipses[i][0][2])+int(ellipses[i][0][4]))
            angle = ellipses[i][0][5]
            color = (0, 0, 255)
            if ellipses[i][0][2] == 0:
                color = (0, 255, 0)
            cv.ellipse(esrc, center, axes, angle,0, 360, color, 2, cv.LINE_AA)
    cv.imshow("detected circles and ellipses", esrc)
    cv.waitKey(0)
    print('Done')
if __name__ == '__main__':
    print(__doc__)
    main()
    cv.destroyAllWindows()

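Run it with: ed.py [<image_name>]

Example 1: edge_drawing.py 1.png

Example 2: edge_drawing.py 2.png

Example 3: edge_drawing.py 3.png

In the images above, green marks the detected ellipses and red marks the detected circles. EdgeDrawing can also return edge segments and find lines, with results like the following:

【2】 C++ demo: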

//公众号--OpenCV与AI深度学习


#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/ximgproc/edge_drawing.hpp>


using namespace std;
using namespace cv;
using namespace ximgproc;


int main()
{
  Mat src = imread("./imgs/11.bmp");
  if (src.empty())
  {
    cout << "src image is empty, check again!" << endl;
    return -1;
  }
  //resize(src, src, Size(), 0.2, 0.2);
  imshow("src", src);
  Mat gray;
  cvtColor(src, gray, COLOR_BGR2GRAY);


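  double start = static_cast<double>(getTickCount()); // start timing
  
  Ptr<EdgeDrawing> ed = createEdgeDrawing();
  ed->params.EdgeDetectionOperator = EdgeDrawing::PREWITT;
  ed->params.MinPathLength = 50; // try changing this value between 5 and 1000
  ed->params.PFmode = false; // default value; try switching it to true
  ed->params.MinLineLength = 10; // try changing this value between 5 and 100
  ed->params.NFAValidation = false; // the default is true
  ed->params.GradientThresholdValue = 20;

  // The original listing is cut off here. The rest is an assumed completion
  // modeled on OpenCV's edge_drawing.cpp sample.
  ed->detectEdges(gray); // must be called before detectEllipses()
  vector<Vec6d> ellipses;
  ed->detectEllipses(ellipses);

  double secs = (static_cast<double>(getTickCount()) - start) / getTickFrequency();
  cout << "time: " << secs << " s, found: " << ellipses.size() << endl;

  for (size_t i = 0; i < ellipses.size(); i++)
  {
    Point center((int)ellipses[i][0], (int)ellipses[i][1]);
    Size axes((int)ellipses[i][2] + (int)ellipses[i][3], (int)ellipses[i][2] + (int)ellipses[i][4]);
    // a zero radius marks an ellipse (green); otherwise it is a circle (red)
    Scalar color = ellipses[i][2] == 0 ? Scalar(0, 255, 0) : Scalar(0, 0, 255);
    ellipse(src, center, axes, ellipses[i][5], 0, 360, color, 2, LINE_AA);
  }
  imshow("detected circles and ellipses", src);
  waitKey(0);
  return 0;
}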

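Example 1:

Example 2:

Example 3:

Brief summary

Overall, the circle and line detection offered by EdgeDrawing is simple to use and works well; in simple cases the default parameters are enough. For tuning, consult the documentation and experiment. A few commonly used parameters are described briefly here.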

Ptr<EdgeDrawing> ed = createEdgeDrawing();
ed->params.EdgeDetectionOperator = EdgeDrawing::LSD;
ed->params.MinPathLength = 50; // try changing this value between 5 and 1000
ed->params.PFmode = false; // default value; try switching it to true
ed->params.MinLineLength = 10; // try changing this value between 5 and 100
ed->params.NFAValidation = true; // default value; try switching it to false
ed->params.GradientThresholdValue = 20;

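【1】 EdgeDetectionOperator: the gradient operator used by the algorithm. There are four choices and the default is PREWITT; try different operators and compare the results.

【2】 GradientThresholdValue: the gradient threshold. The smaller the value, the better low-contrast circles are found. For example, below are the results with gradient thresholds of 100 and 50:

【3】 NFAValidation: defaults to true. Indicates whether the NFA (Number of False Alarms) algorithm is used to validate lines and ellipses. Setting it to false finds more circles or lines.

【4】 MinPathLength: the minimum length, in connected pixels, of a path processed to create an edge segment in the gradient image. Only pixels with a value above GradientThresholdValue are processed. The default is 10. For example, below are the results with MinPathLength set to 50 and 10 (the smaller the value, the smaller the circles that are found):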

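3. Ball Tracking and Landing Prediction

In this project I built a ball-tracking system with Python, OpenCV, Cvzone and a bit of math (polynomial regression in particular). The system tracks the ball in a video, visualizes its trajectory, and even predicts whether it will land in a specific region, a virtual "basket". Below I break down how I built it, the challenges I ran into, and some of the neat features I implemented along the way.

Step 1: Setting up the environment

The key libraries are OpenCV for image and video processing, Cvzone for simplifying computer vision tasks, and NumPy for the math. First I imported the necessary modules: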

import math
import cv2
import cvzone
from cvzone.ColorModule import ColorFinder
import numpy as np

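Initialization:

cap = cv2.VideoCapture('Videos/vid (3).mp4')  # grab frames from the video file

Step 2: Detecting the ball

To detect the ball I used HSV (hue, saturation, value) color detection. This lets me track the ball by its color by setting a specific range of HSV values: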

myColorFinder = ColorFinder(False)
hsvVals = {'hmin': 8, 'smin': 96, 'vmin': 115, 'hmax': 14, 'smax': 255, 'vmax': 255}

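These HSV values correspond to the color of the ball in my video. With the color range defined, I grab each frame from the video and process it to find the ball.

Step 3: Tracking the ball

Once the ball's color has been detected, I use 'cvzone.findContours()' to locate the ball in each frame. This is where it gets interesting. For every frame in which the ball is detected, I save its position in two lists: 'posListX' for the x coordinates and 'posListY' for the y coordinates.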

imgContours, contours = cvzone.findContours(img, mask, minArea=500)
if contours:
 posListX.append(contours[0]['center'][0])
 posListY.append(contours[0]['center'][1])

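The findContours function makes it very easy to find the exact position of the ball, drawing a green circle around the detected ball.

Step 4: Visualizing the trajectory

Now that I have the ball's position across frames, I use polynomial regression to model its trajectory. Polynomial regression finds the coefficients A, B and C of the quadratic equation y = Ax² + Bx + C that models the ball's motion in 2D space: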

A, B, C = np.polyfit(posListX, posListY, 2)

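With this equation I can predict the y coordinate for any given x coordinate, which lets me draw the ball's predicted path on the video.

Step 5: Predicting the outcome

The final touch is predicting whether the ball will land in a specific region. I take the quadratic equation obtained from the polynomial regression and solve it for the x position at which the ball reaches the height of the basket.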

a = A
b = B
c = C - 590 # 590 represents the y-value threshold for the "basket" area
x = int((-b - math.sqrt(b ** 2 - (4 * a * c))) / (2 * a))
prediction = 330 < x < 430 # Basket zone

If the ball's x coordinate falls between 330 and 430 (my "basket" range), it is a successful shot! If not, it is a miss.
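The prediction step can fail when the fitted curve never reaches the basket height (negative discriminant) or when the fit is nearly a straight line. The sketch below is one possible guarded version; the function names and the sample coefficients are made up for illustration, and unlike the snippet above it returns both crossing points instead of only the one with the minus sign.

```python
import math

def crossing_points(a, b, c, y_target):
    """Real x values where y = a*x**2 + b*x + c reaches y_target, sorted."""
    c_shifted = c - y_target
    if a == 0:
        # Degenerate fit: a straight line (or a constant)
        return [] if b == 0 else [-c_shifted / b]
    disc = b * b - 4 * a * c_shifted
    if disc < 0:
        return []  # the fitted curve never reaches y_target
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])

def lands_in_zone(xs, x_min, x_max):
    """True if any crossing point lies strictly inside the zone."""
    return any(x_min < x < x_max for x in xs)

# Made-up coefficients for illustration: y = 0.01 * (x - 300)**2 + 100
xs = crossing_points(0.01, -6.0, 1000.0, 590)
print(xs, lands_in_zone(xs, 330, 430))
```

Which of the two crossing points is physically meaningful depends on the direction of the throw, so a real application would pick the root on the side the ball is travelling toward.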

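Step 6: Displaying the result

I use 'cvzone.putTextRect()' to show the prediction on screen in real time. If the ball is predicted to land in the basket, a green "Basket" message appears; otherwise a red "No Basket" message pops up.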

if prediction:
 cvzone.putTextRect(imgContours, "Basket", (50, 150), scale=5, thickness=5, colorR=(0, 200, 0), offset=20)
else:
 cvzone.putTextRect(imgContours, "No Basket", (50, 150), scale=5, thickness=5, colorR=(0, 0, 200), offset=20)

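4. A Modular Image Processing Pipeline

In this post we will learn how to implement a simple modular pipeline for image processing, using OpenCV for image processing and manipulation and Python generators for the pipeline steps.

An image processing pipeline is a set of tasks executed in a predefined order to transform an image into a desired result or to extract some interesting features.

Example tasks include:

image transformations such as translation, rotation, resizing, flipping and cropping,
image enhancement,
extraction of regions of interest (ROI),
computing feature descriptors,
image or object classification,
object detection,
image annotation for machine learning.

The final result may be a new image, or just a JSON file containing some information about the image.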
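To make the idea of generator-based steps concrete, here is a minimal sketch in which each step is a generator that consumes items from the previous step. The step names and the dictionary layout are assumptions for illustration, not the structure used by the original project, and the "processing" is a stand-in that needs no image library.

```python
def load_items(paths):
    """Source step: wrap each path in a dict that later steps can extend."""
    for path in paths:
        yield {"image_file": path}

def keep_extensions(items, valid_exts=(".jpg", ".png")):
    """Filter step: pass through only items with an accepted extension."""
    for item in items:
        if item["image_file"].lower().endswith(valid_exts):
            yield item

def annotate(items):
    """Transform step: add a derived field (stand-in for real processing)."""
    for item in items:
        item["name"] = item["image_file"].rsplit("/", 1)[-1]
        yield item

# Steps are chained lazily; nothing runs until the result is consumed
paths = ["a/cat.JPG", "a/notes.txt", "b/dog.png"]
pipeline = annotate(keep_extensions(load_items(paths)))
results = list(pipeline)
print([r["name"] for r in results])
```

Because every step only pulls one item at a time, steps can be added, removed or reordered without loading the whole image set into memory.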

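Suppose we have a large number of images in a directory and want to detect the faces in them and write each face to a separate file. We would also like a JSON summary file that tells us where each face was found and in which file. Our face detection pipeline looks like this:

Face detection pipeline

This is a very simple example that can be summed up in the following code: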

import cv2
import os
import json
import numpy as np

def parse_args():
    import argparse

    # Parse command line arguments
    ap = argparse.ArgumentParser(description="Image processing pipeline")
    ap.add_argument("-i", "--input", required=True,
                    help="path to input image files")
    ap.add_argument("-o", "--output", default="output",
                    help="path to output directory")
    ap.add_argument("-os", "--out-summary", default=None,
                    help="output JSON summary file name")
    ap.add_argument("-c", "--classifier", default="models/haarcascade/haarcascade_frontalface_default.xml",
                    help="path to where the face cascade resides")

    return vars(ap.parse_args())

def list_images(path, valid_exts=None):
    image_files = []
    # Loop over the input directory structure
    for (root_dir, dir_names, filenames) in os.walk(path):
        for filename in sorted(filenames):
            # Determine the file extension of the current file
            ext = filename[filename.rfind("."):].lower()
            if valid_exts is None or ext.endswith(valid_exts):
                # Construct the path to the file and yield it
                file = os.path.join(root_dir, filename)
                image_files.append(file)

    return image_files

def main(args):
    os.makedirs(args["output"], exist_ok=True)

    # load the face detector
    detector = cv2.CascadeClassifier(args["classifier"])

    # list images from input directory
    input_image_files = list_images(args["input"], (".jpg", ".png"))

    # Storage for JSON summary
    summary = {}

    # Loop over the image paths
    for image_file in input_image_files:
        # Load the image and convert it to grayscale
        image = cv2.imread(image_file)
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

        # Detect faces
        face_rects = detector.detectMultiScale(gray, scaleFactor=1.05, minNeighbors=5,
                                               minSize=(30, 30), flags=cv2.CASCADE_SCALE_IMAGE)
        summary[image_file] = {}
        # Loop over all detected faces
        for i, (x, y, w, h) in enumerate(face_rects):
            face = image[y:y+h, x:x+w]

            # Prepare output directory for faces
            output = os.path.join(*(image_file.split(os.path.sep)[1:]))
            output = os.path.join(args["output"], output)
            os.makedirs(output, exist_ok=True)

            # Save faces
            face_file = os.path.join(output, f"{i:05d}.jpg")
            cv2.imwrite(face_file, face)

            # Store summary data
            summary[image_file][face_file] = np.array([x, y, w, h], dtype=int).tolist()

        # Display summary
        print(f"[INFO] {image_file}: face detections {len(face_rects)}")

    # Save summary data
    if args["out_summary"]:
        summary_file = os.path.join(args["output"], args["out_summary"])
        print(f"[INFO] Saving summary to {summary_file}...")
        with open(summary_file, 'w') as json_file:
            json_file.write(json.dumps(summary))

if __name__ == '__main__':
    args = parse_args()
    main(args)

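A simple image processing script for face detection and extraction

The comments in the code are fairly self-explanatory, but let's go through it. First we define the command-line argument parser (lines 6-20), which accepts the following arguments:

--input: the path to the directory containing our images (it may have subdirectories). This is the only mandatory argument.

--output: the output directory where the pipeline results are saved.

--out-summary: if we want a JSON summary, just provide its file name (for example output.json).

--classifier: the path to the pre-trained Haar cascade used for face detection.

Next we define the list_images function (lines 22-34), which walks the input directory structure to collect the image paths. For face detection we use the Viola-Jones algorithm, known as the Haar cascade (line 40). In the era of deep learning this is a rather old algorithm, and it is prone to false positives (reporting a face where there is none).

A sample image from the TV series "Friends", with some false positives

The main processing loop works as follows: we iterate over the image files (line 49), read them one by one (line 51), detect faces (line 55), save them to a prepared directory (lines 59-72), and save a summary report with the face coordinates (lines 78-82).

Setting up the project environment: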

$ git clone https://github.com/jagin/image-processing-pipeline.git
$ cd image-processing-pipeline
$ git checkout 77c19422f0d7a90f1541ff81782948e9a12d2519
$ conda env create -f environment.yml
$ conda activate pipeline

To make sure your code runs the same way, check that you have checked out the right commit:
77c19422f0d7a90f1541ff81782948e9a12d2519

Let's run it:

$ python process_images.py --input assets/images -os output.json

We get a nice summary:

[INFO] assets/images/friends/friends_01.jpg: face detections 2
[INFO] assets/images/friends/friends_02.jpg: face detections 3
[INFO] assets/images/friends/friends_03.jpg: face detections 5
[INFO] assets/images/friends/friends_04.jpg: face detections 14
[INFO] assets/images/landscapes/landscape_01.jpg: face detections 0
[INFO] assets/images/landscapes/landscape_02.jpg: face detections 0
[INFO] Saving summary to output/output.json...

The face images for each input image (false positives included) are stored in separate directories.

output
├── images
│   └── friends
│       ├── friends_01.jpg
│       │   ├── 00000.jpg
│       │   └── 00001.jpg
│       ├── friends_02.jpg
│       │   ├── 00000.jpg
│       │   ├── 00001.jpg
│       │   └── 00002.jpg
│       ├── friends_03.jpg
│       │   ├── 00000.jpg
│       │   ├── 00001.jpg
│       │   ├── 00002.jpg
│       │   ├── 00003.jpg
│       │   └── 00004.jpg
│       └── friends_04.jpg
│           ├── 00000.jpg
│           ├── 00001.jpg
│           ├── 00002.jpg
│           ├── 00003.jpg
│           ├── 00004.jpg
│           ├── 00005.jpg
│           ├── 00006.jpg
│           ├── 00007.jpg
│           ├── 00008.jpg
│           ├── 00009.jpg
│           ├── 00010.jpg
│           ├── 00011.jpg
│           ├── 00012.jpg
│           └── 00013.jpg
└── output.json

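5. Object Extraction Based on Contour Analysis

This article shows how to use contour analysis in OpenCV to extract the white regions of an image, walking through the workflow from image preprocessing to contour detection and area calculation.

Q: How can the white regions be found with OpenCV? Is there a good approach for finding the two white regions in the image below?

It can be done with contour analysis.

Solving it with OpenCV

The workflow designed in the OpenCV实验大师 tool, version 1.1, is as follows:

The result of each step is as follows:

The area measurements and statistics are as follows:

OpenCV workflow engine SDK support

The exported vm configuration file can be loaded into the workflow engine to reuse the workflow and process multiple images. The SDK calling code is as follows: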

#include "main_workflow.h"
#include <iostream>
#include <fstream>

int main(int argc, char** argv) {
  std::shared_ptr<QTongCoreCVWorkFlow> engine(new QTongCoreCVWorkFlow());
  bool succ = engine->initWorkFlow("D:/12121.vm", "69585e470300cdb5a6910131eb639882");
  if (!succ) {
    std::cout << "Could not load workflow file here..." << std::endl;
    return -1;
  }
  cv::Mat frame = cv::imread("D:/facedb/CT_Testing/nCovAg6.bmp");
  cv::namedWindow("OpenCV实验大师 C++工作流引擎演示", cv::WINDOW_NORMAL);
  cv::Mat result;
  std::vector<std::string> logs;
  engine->run_workflow(frame, result, logs);

  cv::imshow("OpenCV实验大师 C++工作流引擎演示", result);
  cv::waitKey(0);
  cv::destroyAllWindows();
  return 0;
}

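The result is as follows:

6. Computing a Bottle's Angle with CV and Keypoint Detection

Keypoint detection in computer vision is a technique for identifying distinctive points or locations in an image that can serve as references for further analysis, such as object recognition, pose estimation or motion tracking.

In this article we show how to use keypoint labels to train a computer vision model that identifies points of interest on a water bottle.

We then use this information to estimate the bottle's orientation. The orientation of a water bottle is its rotational alignment or position relative to a reference frame or axis. Put simply, it is the angle by which the bottle is tilted or turned relative to some direction.

For example, if you place a water bottle on a table and it stands perfectly upright, with its base parallel to the table surface, its orientation is taken to be 90 degrees. If you then tilt the bottle slightly to the left or right, its orientation changes accordingly.

In computer vision or robotics, determining the orientation of a water bottle may involve measuring the angle by which it is tilted or rotated from a predefined reference direction. This information can be used in applications such as object detection, manipulation or tracking in automated systems.

Determining the orientation of an object, whether a water bottle or anything else, has many practical applications in different fields, for example:

In robotics, industrial robots need to grasp objects precisely to perform tasks such as assembly, sorting, packaging or placing objects at specified positions.
In transportation and logistics, knowing the orientation of packages helps maximize space utilization when loading containers, trucks or cargo planes.

Approach to estimating the bottle's orientation

First, we need to train a keypoint detection model that identifies the top and bottom keypoints of the water bottle. Training is covered in the steps below. The model gives us the x and y coordinates of these keypoints. Then, using trigonometry, we compute the angle of the line segment formed by the two points (x1, y1) and (x2, y2) to estimate the bottle's orientation, as follows:

First, find the differences in the x coordinates (Δx) and the y coordinates (Δy):

$$\Delta x = x_1 - x_2, \qquad \Delta y = y_1 - y_2$$

Then use the arctangent function to find the angle:

$$\theta = \operatorname{atan2}(\Delta y, \Delta x)$$

These formulas match the calculate_angle function in the code further down, where (x1, y1) is the bottom keypoint and (x2, y2) is the top keypoint.

This information is used to estimate the orientation of the object, in our example a water bottle with top and bottom keypoints. If the bottle stands upright, at an angle of 90 degrees, the orientation is considered correct; otherwise it is considered incorrect.

As long as an object's keypoints are identified correctly, this idea can be used to estimate the orientation of any physical object.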
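The angle computation can be tried on its own with a few hand-picked points. The sketch below uses only the standard library; the function name `segment_angle` and the sample coordinates are made up for illustration.

```python
import math

def segment_angle(x1, y1, x2, y2):
    """Angle in degrees, in [0, 360), of the vector from (x2, y2) to (x1, y1)."""
    angle = math.degrees(math.atan2(y1 - y2, x1 - x2))
    return angle % 360

# Image coordinates: y grows downward, so an upright bottle has its bottom
# keypoint below (larger y than) its top keypoint.
print(segment_angle(100, 300, 100, 100))  # bottom directly below top
print(segment_angle(300, 100, 100, 100))  # bottom to the right of top
```

Using atan2 rather than a plain arctangent of Δy/Δx avoids a division by zero for a perfectly vertical bottle and keeps track of the quadrant.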

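Implementation in detail

The steps are:

Collect and label a water bottle dataset
Train a keypoint detection model
Build an application that detects the keypoints and estimates the bottle's orientation

Step #1: Collect and label a water bottle dataset

The water bottle dataset was collected manually. It contains water bottles placed in three different orientations, as shown in the figure below.

After collecting the dataset, upload it to a Roboflow project for labeling and training. To label the dataset you first need to create a keypoint skeleton. For more details on creating and labeling a keypoint project with Roboflow, see the link below:

https://blog.roboflow.com/keypoint-detection-on-roboflow/

For this project I defined a keypoint skeleton describing a top point and a bottom point, as shown in the figure below.

Once the keypoint skeleton class is defined, each image is labeled by dragging a bounding box and positioning the "top" and "bottom" keypoints where they belong, as shown in the figure below.

All images are labeled with the keypoint class, and the dataset is generated.

Step #2: Train the keypoint detection model

When labeling is complete, a dataset version is generated and the model is trained with Roboflow's automatic training feature. The training accuracy achieved is 99.5%.

The model is deployed to a cloud API automatically. Roboflow offers a range of options for testing and deploying models, such as live testing in a web browser and deployment to edge devices. The accompanying image shows the model being tested through Roboflow's web interface.

Step #3: Build an application that detects the keypoints and estimates the bottle's orientation

This step involves building an application that detects water bottle keypoints in a live camera feed. We start with a basic Python script that detects the keypoints of a water bottle and displays them on an image together with the bounding box. For this we use the provided test image.

First, install the Roboflow Python package and the Inference SDK package, which we use to run inference on our model:

pip install roboflow inference-sdk inference

Then we can write a script to run inference. Create a new file and add the following code: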

from inference_sdk import InferenceHTTPClient
import cv2
import json
CLIENT = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",
    api_key="YOUR_API_KEY"
)
# infer on a local image
json_data = CLIENT.infer("bottle.jpg", model_id="bottle-keypoints/1")
print(json_data)

The code above returns its output in JSON form, as shown below. The prediction result is stored in the json_data variable.

{'time': 0.09515731000010419, 'image': {'width': 800, 'height': 360}, 'predictions': [{'x': 341.5, 'y': 196.5, 'width': 309.0, 'height': 85.0, 'confidence': 0.9074831008911133, 'class': 'bottle', 'class_id': 0, 'detection_id': 'bff695c1-df86-4576-83ad-8c802e08774e', 'keypoints': [{'x': 186.0, 'y': 198.0, 'confidence': 0.9994387626647949, 'class_id': 0, 'class_name': 'top'}, {'x': 496.0, 'y': 202.0, 'confidence': 0.9994300007820129, 'class_id': 1, 'class_name': 'bottom'}]}]}

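We round-trip it through a JSON string (in the code below) to get a plain dictionary, and then use it to draw the bounding box and keypoints on the test image.

To display the image with the bounding box and keypoints returned by the keypoint detection model, we need to load the image. Then we iterate over the stored predictions and draw the bounding box and keypoints, as in the following code.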

json_string = json.dumps(json_data)
data = json.loads(json_string )
image = cv2.imread("bottle.jpg") 
for prediction in data['predictions']:
    x = int(prediction['x'])
    y = int(prediction['y'])
    width = int(prediction['width'])
    height = int(prediction['height'])


    x1 = int(x - (width / 2))
    y1 = int(y - (height / 2))
    x2 = int(x + (width / 2))
    y2 = int(y + (height / 2))


    # Draw bounding box
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)


    # Draw keypoints
    for keypoint in prediction['keypoints']:
        keypoint_x = int(keypoint['x'])
        keypoint_y = int(keypoint['y'])
        class_name = keypoint['class_name']
        if class_name == 'top':
            color = (0, 0, 255)  # Red color for top keypoints
        elif class_name == 'bottom':
            color = (255, 0, 0)  # Blue color for bottom keypoints
        else:
            color = (0, 255, 0)  # Green color for other keypoints
        cv2.circle(image, (keypoint_x, keypoint_y), 5, color, -1)
cv2.imshow("Image with Bounding Boxes and Keypoints", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

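The result looks like this:

Next, we update this code to run inference on a video stream. We use a webcam to capture video and run inference on every frame.

Create a new file and add the following code: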

from inference_sdk import InferenceHTTPClient
import cv2
import json
import math
CLIENT = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",
    api_key="YOUR_API_KEY"
)


def calculate_angle(x1, y1, x2, y2):
    # Calculate the differences in coordinates
    delta_x = x1 - x2
    delta_y = y1 - y2


    # Calculate the angle using arctan2 and convert it to degrees
    angle_rad = math.atan2(delta_y, delta_x)
    angle_deg = math.degrees(angle_rad)


    # Ensure the angle is between 0 and 360 degrees
    mapped_angle = angle_deg % 360
    if mapped_angle < 0:
        mapped_angle += 360  # Ensure angle is positive


    return mapped_angle


cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()


    if not ret:
        break


    # Perform inference on the current frame
    json_data = CLIENT.infer(frame, model_id="bottle-keypoints/1")


    # Convert JSON data to dictionary
    data = json.loads(json.dumps(json_data))
# Variables to store bottom and top keypoint coordinates
    bottom_x, bottom_y = None, None
    top_x, top_y = None, None


    # Iterate through predictions
    for prediction in data['predictions']:
        x = int(prediction['x'])
        y = int(prediction['y'])
        width = int(prediction['width'])
        height = int(prediction['height'])


        x1 = int(x - (width / 2))
        y1 = int(y - (height / 2))
        x2 = int(x + (width / 2))
        y2 = int(y + (height / 2))


        # Draw bounding box
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)


        # Draw keypoints
        for keypoint in prediction['keypoints']:
            keypoint_x = int(keypoint['x'])
            keypoint_y = int(keypoint['y'])
            class_name = keypoint['class_name']
            if class_name == 'top':
                color = (0, 0, 255)  # Red color for top keypoints
                top_x, top_y = keypoint_x, keypoint_y
            elif class_name == 'bottom':
                color = (255, 0, 0)  # Blue color for bottom keypoints
                bottom_x, bottom_y = keypoint_x, keypoint_y
            else:
                color = (0, 255, 0)  # Green color for other keypoints
            cv2.circle(frame, (keypoint_x, keypoint_y), 5, color, -1)
    if bottom_x is not None and bottom_y is not None and top_x is not None and top_y is not None:
        angle = calculate_angle(bottom_x, bottom_y, top_x, top_y)


        # Display the angle on the frame
        cv2.putText(frame, "Angle: {:.2f} degrees".format(angle), (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, ( 251, 241, 25), 2)
        # Check for orientation
        if 85 <= angle <= 95 or 265 <= angle <= 275:  # Angle close to 90 or 270 degrees
            cv2.putText(frame, "Correct orientation", (50, 100), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        else:  # Any other angle, including those close to 0 or 180 degrees
            cv2.putText(frame, "Wrong orientation", (50, 100), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    # Display the frame with predictions and angle
    cv2.imshow('Webcam', frame)


    # Check for 'q' key press to exit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break


# Release the webcam and close OpenCV windows
cap.release()
cv2.destroyAllWindows()

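In our code we use a function to compute the angle between the model's keypoints. With this angle we determine whether the water bottle is oriented correctly.

We open the video stream and capture frames from the webcam, run inference on each frame, and keep the prediction results. Then we draw the bounding box and keypoints on every frame of the captured video.

After that, we compute the angle and display it on the video frame.

Finally, we evaluate the orientation: if the bottle's angle is about 90 degrees, it is considered correctly positioned; if it is tilted close to 0 or 180 degrees, it is considered incorrectly positioned.

This is the final output of our system:

7. Removing Watermarks from Documents

Watermarks are important for protecting intellectual property, but they sometimes get in the way. If you work with images of documents, books or other scanned material, you may want to remove these watermarks for personal use, restoration or research. This article shows how to remove watermarks from document images with Python, OpenCV and Tkinter.

Implementation steps

Now let's break down the Python code used to remove watermarks from document images. The code combines OpenCV for image processing, Tkinter for GUI interaction and Matplotlib for visualization.

1. Install and import the necessary libraries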

pip install opencv-python pillow matplotlib numpy

# imported necessary library
import tkinter
from tkinter import *
import tkinter as tk
import tkinter.messagebox as mbox
from tkinter import ttk
from tkinter import filedialog
from PIL import ImageTk, Image
import cv2
import os
import matplotlib.pyplot as plt

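Here we import all the libraries we need.

tkinter: the standard Python library for building graphical user interfaces (GUIs). We use it to build a simple GUI for file handling and user interaction.
cv2: the OpenCV library for image processing. It provides a wide range of image processing functions.
matplotlib.pyplot: used to display images in a window for visualization.
PIL: the Python Imaging Library, used for handling and converting image files.

2. Set the working directory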

#images folder path
os.chdir("images")

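This changes the working directory to the folder that contains the images, so the script can find them easily. Make sure the path is correct and the images are stored in the "images" directory.

3. Set up global variables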

click1 = False
point1 = (0, 0)
img = None

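These global variables keep track of the mouse click state and the image being processed. Specifically:

click1: tracks whether the mouse button is pressed.
point1: stores the coordinates of the first click.
img: stores the image being processed.

4. Handle mouse events for cropping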

def click(event, x, y, flags, params):
    global click1, point1, img
    if event == cv2.EVENT_LBUTTONDOWN:
        click1 = True
        point1 = (x, y)
    elif event == cv2.EVENT_MOUSEMOVE and click1:
        img_copy = img.copy()
        cv2.rectangle(img_copy, point1, (x, y), (0, 0, 255), 2)
        plt.imshow(cv2.cvtColor(img_copy, cv2.COLOR_BGR2RGB))
        plt.show()
    elif event == cv2.EVENT_LBUTTONUP:
        click1 = False
        sub_img = img[point1[1]:y, point1[0]:x]
        plt.imshow(cv2.cvtColor(sub_img, cv2.COLOR_BGR2RGB))
        plt.show()

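This function handles mouse events to select a region of interest (ROI) on the image. It works as follows:

cv2.EVENT_LBUTTONDOWN: when the left mouse button is pressed, the starting coordinates are recorded.
cv2.EVENT_MOUSEMOVE: while the left button is held down and the mouse moves, a rectangle showing the selected area is drawn dynamically on a copy of the image.
cv2.EVENT_LBUTTONUP: once the left mouse button is released, the selected part of the image is displayed.

This lets the user visually select the part of the image they want to focus on, for example to remove a specific watermark.

5. Load and save the image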

def open_and_save_img(img_path):
    global img
    img = cv2.imread(img_path, 1)
    plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    plt.show()


    img1 = cv2.imread(img_path)
    _, thresh = cv2.threshold(img1, 150, 255, cv2.THRESH_BINARY)
    plt.imshow(cv2.cvtColor(thresh, cv2.COLOR_BGR2RGB))
    plt.show()


    # Saving the images without watermark
    cv2.imwrite('Image_With_Watermark.jpg', img)
    cv2.imwrite('Image_Without_Watermark.jpg', thresh)

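The function performs the following tasks:

1. Loading the image:

It reads the image from the given file path with cv2.imread().

It displays the image with Matplotlib, converting the colors from BGR (used by OpenCV) to RGB so they are shown correctly.

2. Thresholding the image:

Thresholding converts the image to a binary (black-and-white) form. The cv2.threshold() call applies a threshold of 150: with cv2.THRESH_BINARY, values above 150 become 255 and all others become 0. When the watermark is lighter than the printed text, it is pushed to white while the dark text stays black, which makes the watermark easy to isolate and remove.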
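The effect of binary thresholding on a watermark can be seen on a tiny hand-made example. The sketch below applies the same per-pixel rule in plain Python; the helper name `threshold_binary` and the pixel values are made up for illustration.

```python
def threshold_binary(pixels, thresh, maxval=255):
    """Per-pixel rule of binary thresholding: maxval above thresh, else 0."""
    return [[maxval if p > thresh else 0 for p in row] for row in pixels]

# Made-up grayscale values: 30 = dark text, 200 = light watermark, 245 = paper
page = [
    [245, 200, 245],
    [30, 200, 30],
    [245, 30, 245],
]
for row in threshold_binary(page, 150):
    print(row)
```

The watermark pixels (200) end up indistinguishable from the paper, while the text pixels (30) stay black. A watermark darker than the threshold would survive, so the threshold value has to be chosen per document.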

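3. Saving the images:

The original image is saved as "Image_With_Watermark.jpg".

The processed (binary) image is saved as "Image_Without_Watermark.jpg". In this image the watermark is removed or minimized.

6. The main function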

def main():
    img_path = "1e958f1c-book_229.jpg"  # Change this to your image path
    open_and_save_img(img_path)




if __name__ == "__main__":
    main()

Source code and sample material:

https://github.com/mdmonsurali/Document-AI/blob/main/Denoising%20technique/Watermark%20remove/watermark_remove.ipynb

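8. Segmenting and Counting Densely Packed Circles with Traditional Methods

This section shows how to segment and count densely packed circles with traditional OpenCV methods.

Background

The sample image comes from the internet; the goal is to segment and count the circular objects in the image below.

The result achieved in this article is as follows:

Implementation steps

【1】 Grayscale conversion + median filtering + binarization, to obtain the reference background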

import math
import cv2

img = cv2.imread('src.jpg')
cv2.imshow("src",img)


gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
cv2.imshow("gray",gray)


blur = cv2.medianBlur(gray,7)
cv2.imshow("blur",blur)


_,thres = cv2.threshold(gray, 199, 255, cv2.THRESH_BINARY_INV )
cv2.imshow("thresh",thres)

【2】 Apply the Laplacian to the grayscale image to extract edges, then threshold the result

lap =cv2.Laplacian(gray, -1, ksize = 5)
cv2.imshow("laplacian",lap)
_,lap_thres = cv2.threshold(lap, 250, 255, cv2.THRESH_BINARY)
cv2.imshow("lap_thres",lap_thres)

【3】 Dilate the image above to thicken the edges

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(3,3))
dilation = cv2.dilate(lap_thres,kernel,iterations = 1)
cv2.imshow("dilation",dilation)

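【4】 Subtract the image above from the binary image of step 【1】, then erode to remove noise and bring out the interior of the circles:

# cv2.subtract saturates at 0; plain uint8 "-" would wrap 0 - 255 around to 1
diff = cv2.subtract(thres, dilation)
erode = cv2.erode(diff,kernel,iterations = 1)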
cv2.imshow("diff",erode)

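【5】 Contour analysis: compute the minimum enclosing circle and the contour area, keep the contours whose contour area / circle area ratio is greater than 0.2, draw the enclosing circles, and count them.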
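The filtering rule in this step can be isolated from the OpenCV calls. The sketch below expresses it as two small functions; the function names and the sample measurements are made up for illustration, while the thresholds (radius above 10, ratio above 0.2) are the ones used in the listing that follows.

```python
import math

def fill_ratio(contour_area, radius):
    """Fraction of the minimum enclosing circle covered by the contour."""
    return contour_area / (math.pi * radius * radius)

def is_valid_circle(contour_area, radius, min_radius=10, min_ratio=0.2):
    """Keep blobs that are large enough and fill enough of their circle."""
    return radius > min_radius and fill_ratio(contour_area, radius) > min_ratio

# Made-up measurements: a round blob and a thin sliver, both with radius 20
print(is_valid_circle(1100.0, 20.0))  # ratio about 0.88
print(is_valid_circle(150.0, 20.0))   # ratio about 0.12
```

A thin or ragged contour has a large enclosing circle but little area, so its ratio is low and it is rejected as noise.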

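contours,hierarchy = cv2.findContours(erode, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
# "colors" is not defined in the original listing; a 9-entry BGR palette is assumed here
colors = [(255,0,0),(0,255,0),(0,0,255),(255,255,0),(255,0,255),(0,255,255),(128,0,255),(0,128,255),(255,128,0)]
count = 0  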
for i in range(0,len(contours)):
    center,radius = cv2.minEnclosingCircle(contours[i])
    if radius > 10:
        area = cv2.contourArea(contours[i])
        if area / (math.pi * radius * radius) > 0.2:
            count += 1
            cv2.circle(img,(int(center[0]),int(center[1])),int(radius),colors[i%9],-1)
strCount = 'count=%d'%count
cv2.putText(img,strCount,(10,100),0,2,(255,255,0),3)

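The final result is as follows:

Summary

The core idea in this example is to subtract the edge region (obtained with the Laplacian rather than Canny) from the foreground region of the targets (obtained by binarization) to get the contours of the circle interiors, and then post-process them. The same task can also be solved with a distance transform plus watershed; try it yourself if you are interested.

9. Blur Detection / Autofocus

Implementing image blur detection / camera autofocus with OpenCV.

To tell whether a picture is in focus, modern consumer cameras use sophisticated phase-detection circuitry and dedicated sensors. But how can we determine whether a photo is in focus after it has been taken? Having such a measurement helps in many ways (picking the best picture in a sequence, controlling a motorized lens, producing sharp time-lapse videos, and so on).

In our case the Laplacian, while not a perfect solution, can distinguish in-focus frames from blurred frames of the same scene. It is hard to describe in a few words what the Laplacian does, but you can always read more on its Wikipedia page. In the code below, the focus measure is the variance of the Laplacian of the grayscale frame: sharp frames have strong edges and therefore a high variance.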
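Why the variance of the Laplacian separates sharp from soft content can be checked on two tiny synthetic patterns. The sketch below is a plain-Python approximation that uses the 4-neighbor Laplacian on interior pixels only, so it will not match the OpenCV result at the image borders; the function name and the test patterns are made up for illustration.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbor Laplacian over the interior pixels."""
    rows, cols = len(gray), len(gray[0])
    responses = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            responses.append(
                gray[r - 1][c] + gray[r + 1][c] + gray[r][c - 1] + gray[r][c + 1]
                - 4 * gray[r][c]
            )
    mean = sum(responses) / len(responses)
    return sum((x - mean) ** 2 for x in responses) / len(responses)

# Made-up 6x6 test patterns: a hard vertical edge and a gradual ramp
sharp = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
soft = [[0, 51, 102, 153, 204, 255] for _ in range(6)]
print(laplacian_variance(sharp), laplacian_variance(soft))
```

Both patterns span the same brightness range, but only the hard edge produces a strong Laplacian response, so its variance is far higher than that of the smooth ramp.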

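I will again use OpenCV to solve this. Let's capture a short video clip and run the script to see the results. The script displays the video with a quality bar at the bottom and saves a text file with the numeric results for further analysis.

Implementation and code

Note the red bar at the bottom, which indicates focus quality

Analysis of the whole clip shows that the measure distinguishes in-focus from out-of-focus frames very precisely. Unfortunately, under extreme conditions it is hard to determine the degree of blur.

To show how focus/blur is distributed over time, I used the charting function of LibreOffice. Below is the blur measure plotted against the frame number.

Implementation code: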

import cv2
from tqdm import trange




cap = cv2.VideoCapture('10.avi')
f = open('results.txt', 'w')


frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))


for i in trange(frame_count, unit=' frames', leave=False, dynamic_ncols=True, desc='Calculating blur ratio'):
  ret, frame = cap.read()
  if not ret:  # stop if a frame cannot be read
    break
  gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
  # Focus measure: variance of the Laplacian (higher means sharper)
  fm = cv2.Laplacian(gray, cv2.CV_64F).var()


  # Sample quality bar. Parameters adjusted manually to fit horizontal image size
  cv2.rectangle(frame, (0, 1080), (int(fm*1.6), 1040), (0,0,255), thickness=cv2.FILLED)


  im = cv2.resize(frame, None,fx=0.5, fy=0.5, interpolation = cv2.INTER_CUBIC)
  cv2.imshow("Output", im)


  f.write(str(fm)+'\n')  # one value per line


  k = cv2.waitKey(1) & 0xff
  if k == 27:
    break

# Release the resources once the loop ends
f.close()
cap.release()
cv2.destroyAllWindows()

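10. Solar Panel Defect Detection with Python and CLIP

The efficiency of solar panels directly affects renewable energy production, and soiling and defects can reduce their performance significantly. This project uses CLIP for automated solar panel inspection.

CLIP (Contrastive Language-Image Pre-training), developed by OpenAI, classifies images by comparing them against natural-language text prompts. This makes it well suited to solar panel inspection, because it can be adapted to different defect types without extensive retraining.

Predictive maintenance of solar panels with computer vision can cover several tasks, such as detecting: 1/ things that cover part of a photovoltaic module (and therefore reduce its efficiency), such as bird droppings, snow or dust; 2/ physical damage to the module, such as a broken panel (usually from something falling on it); 3/ corrosion; or 4/ vegetation encroachment, when plants under the panel grow too large and interfere with the module.

In this project we use the Hugging Face Transformers library to access the CLIP model. The image dataset is a set of solar panel images split into "clean" and "not clean" categories.

The image set comes from Kaggle:

https://www.kaggle.com/datasets/pythonafroz/solar-panel-images

I had a lot of trouble getting the kagglehub download to work. In the end I downloaded the images the old-fashioned way and set up the folder structure needed for the analysis.

I first built a model that classified the condition of the photovoltaic modules into one of 6 categories based on the image. CLIP did poorly with this setup, because it essentially measures how closely CLIP's answer matches the existing category names. CLIP performed better once the task was set up as a binary classification problem.

The dataset is loaded and preprocessed with TensorFlow's image dataset utilities.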

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from transformers import CLIPProcessor, CLIPModel
import torch
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns


# Image dimensions
img_height, img_width = 224, 224


# Load the CLIP model and processor
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


# Data augmentation (defined here, but not applied in the code shown below)
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomBrightness(0.2),
    tf.keras.layers.RandomContrast(0.2),
])


# Load the dataset with an 80/20 training/validation split
train_ds = tf.keras.utils.image_dataset_from_directory(
    '/content/binary_solar_panels/',
    validation_split=0.2,
    subset='training',
    image_size=(img_height, img_width),
    batch_size=32,
    seed=42
)


val_ds = tf.keras.utils.image_dataset_from_directory(
    '/content/binary_solar_panels/',
    validation_split=0.2,
    subset='validation',
    image_size=(img_height, img_width),
    batch_size=32,
    seed=42
)


class_names = train_ds.class_names
print("Classes:", class_names)

One of CLIP's strengths is its ability to understand natural-language descriptions. We define a set of prompts for each class:

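# More detailed and specific prompts
text_descriptions = [
    [
        "a solar panel with an intact, undamaged surface",
        "a spotless solar panel in perfect condition",
        "a clean and well-maintained solar panel",
        "a solar panel with a clear glass surface",
        "a solar panel that looks brand new"
    ],
    [
        "a solar panel with visible dirt or damage",
        "a solar panel covered in bird droppings",
        "a damaged or faulty solar panel",
        "a solar panel covered in dust and dirt",
        "a solar panel with debris on its surface"
    ]
]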

The `predict_clip` function processes the images and text prompts to produce predictions:

# Function to predict using CLIP with ensemble of prompts
def predict_clip(image_batch, temperature=100.0):
    images = [Image.fromarray(img.numpy().astype("uint8")) for img in image_batch]
    # Process images
    image_inputs = processor(
        images=images,
        return_tensors="pt",
        padding=True
    )
    # Initialize aggregated predictions
    total_predictions = np.zeros((len(images), 2))
    # Process each set of prompts
    with torch.no_grad():
        image_features = model.get_image_features(**image_inputs)
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        for clean_prompt, not_clean_prompt in zip(text_descriptions[0], text_descriptions[1]):
            # Process text descriptions
            text_inputs = processor(
                text=[clean_prompt, not_clean_prompt],
                return_tensors="pt",
                padding=True
            )
            text_features = model.get_text_features(**text_inputs)
            text_features = text_features / text_features.norm(dim=-1, keepdim=True)
            # Calculate similarity with temperature scaling
            similarity = (temperature * image_features @ text_features.T).softmax(dim=-1)
            total_predictions += similarity.numpy()
    # Average predictions across all prompt pairs
    return total_predictions / len(text_descriptions[0])
# Evaluate model with different temperature values
temperatures = [50.0, 100.0, 150.0]
best_accuracy = 0
best_temperature = None
best_threshold = None
best_predictions = None
for temp in temperatures:
    print(f"\nTesting temperature: {temp}")
    y_true = []
    y_pred_probs = []
    for images, labels in val_ds:
        predictions = predict_clip(images, temperature=temp)
        y_true.extend(labels.numpy())
        y_pred_probs.extend(predictions[:, 1])
    y_true = np.array(y_true)
    y_pred_probs = np.array(y_pred_probs)
    # Try different thresholds
    thresholds = np.arange(0.3, 0.7, 0.05)
    for threshold in thresholds:
        y_pred = (y_pred_probs > threshold).astype(int)
        accuracy = np.mean(y_pred == y_true)
        if accuracy > best_accuracy:
            best_accuracy = accuracy
            best_temperature = temp
            best_threshold = threshold
            best_predictions = y_pred
print(f"\nBest temperature: {best_temperature}")
print(f"Best threshold: {best_threshold}")
print(f"Best accuracy: {best_accuracy:.3f}")
# Use best parameters for final evaluation
y_true = []
y_pred_probs = []
for images, labels in val_ds:
    predictions = predict_clip(images, temperature=best_temperature)
    y_true.extend(labels.numpy())
    y_pred_probs.extend(predictions[:, 1])
y_true = np.array(y_true)
y_pred_probs = np.array(y_pred_probs)
y_pred = (y_pred_probs > best_threshold).astype(int)
# Print classification report
print("\nClassification Report:")
print(classification_report(y_true, y_pred, target_names=class_names))
# Plot confusion matrix
plt.figure(figsize=(8, 6))
cm = confusion_matrix(y_true, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.xticks([0.5, 1.5], class_names)
plt.yticks([0.5, 1.5], class_names)
plt.show()

# Function to visualize predictions
def plot_predictions(dataset, num_images=25):
    plt.figure(figsize=(20, 20))
    for images, labels in dataset.take(1):
        predictions = predict_clip(images, temperature=best_temperature)
        predicted_classes = (predictions[:, 1] > best_threshold).astype(int)


        for i in range(min(num_images, len(images))):
            ax = plt.subplot(5, 5, i + 1)
            plt.imshow(images[i].numpy().astype("uint8"))


            predicted_class = class_names[predicted_classes[i]]
            actual_class = class_names[labels[i]]
            prob = predictions[i][1]


            color = 'green' if predicted_class == actual_class else 'red'
            plt.title(f"Actual: {actual_class}\nPred: {predicted_class}\nConf: {prob:.2f}", 
                     color=color, fontsize=10)
            plt.axis("off")
    plt.tight_layout()
    plt.show()


# Plot sample predictions
plot_predictions(val_ds)


# Plot probability distributions
plt.figure(figsize=(10, 6))
clean_probs = y_pred_probs[y_true == 0]
not_clean_probs = y_pred_probs[y_true == 1]


plt.hist(clean_probs, alpha=0.5, label='Clean', bins=20, density=True)
plt.hist(not_clean_probs, alpha=0.5, label='Not Clean', bins=20, density=True)
plt.axvline(x=best_threshold, color='r', linestyle='--', label=f'Threshold ({best_threshold:.3f})')
plt.xlabel('Probability of Not Clean Class')
plt.ylabel('Density')
plt.title('Distribution of CLIP Probabilities')
plt.legend()
plt.show()

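We evaluate the model with different temperature values and thresholds to find the best configuration, and visualize the results with a confusion matrix and sample predictions. Note that the temperature and threshold are selected on the same validation set that is used for the final report, so the reported accuracy is somewhat optimistic.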
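The role of the temperature can be seen in isolation with a small NumPy sketch. It mirrors the structure of `predict_clip` (normalize features, scale the cosine similarities by the temperature, apply a softmax, average over prompt pairs) but uses toy 2-D vectors instead of real CLIP embeddings; the names `softmax` and `ensemble_probs` and all the numbers are assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def ensemble_probs(image_feats, prompt_pairs, temperature):
    """Average softmax(temperature * cosine similarity) over (clean, not clean) prompt pairs."""
    image_feats = image_feats / np.linalg.norm(image_feats, axis=-1, keepdims=True)
    total = np.zeros((image_feats.shape[0], 2))
    for pair in prompt_pairs:  # each pair has shape (2, dim)
        pair = pair / np.linalg.norm(pair, axis=-1, keepdims=True)
        total += softmax(temperature * image_feats @ pair.T)
    return total / len(prompt_pairs)

# Toy 2-D "embeddings" (assumption: real CLIP features are high-dimensional)
image = np.array([[1.0, 0.2]])
pairs = [np.array([[1.0, 0.0], [0.0, 1.0]]),
         np.array([[0.9, 0.1], [0.2, 0.8]])]

for t in (1.0, 50.0, 100.0):
    print(t, ensemble_probs(image, pairs, t).round(3))
```

A higher temperature pushes the probabilities towards 0 or 1, which is why the decision threshold and the temperature are best tuned together.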

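This implementation demonstrates that CLIP is effective for solar panel defect detection. Its zero-shot capability makes it easy to adapt to new defect types simply by updating the text prompts. This flexibility makes CLIP particularly suitable for industrial applications in which the defect classes may evolve over time.

By comparison, a transfer-learning approach using MobileNetV2 reached an accuracy of 0.87. Priyanka's tutorial used VGG16. I found MobileNetV2 to be about as accurate, and much faster to fine-tune for the solar panel inspection task we care about.

Full MobileNetV2 implementation and results: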

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
# Set image dimensions
img_height = 244
img_width = 244
# Load and split dataset
train_ds = tf.keras.utils.image_dataset_from_directory(
    '/content/a/Faulty_solar_panel/',
    validation_split=0.2,
    subset='training',
    image_size=(img_height, img_width),
    batch_size=32,
    seed=42,
    shuffle=True
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    '/content/a/Faulty_solar_panel',
    validation_split=0.2,
    subset='validation',
    image_size=(img_height, img_width),
    batch_size=32,
    seed=42,
    shuffle=True
)
# Function to convert multi-class labels to binary (Clean vs Not Clean)
def to_binary_labels(images, labels):
    binary_labels = tf.where(labels == 1, 0, 1)  # Assuming 'Clean' is label 1
    return images, binary_labels
# Apply binary conversion to datasets
train_ds_binary = train_ds.map(to_binary_labels)
val_ds_binary = val_ds.map(to_binary_labels)
# Data preprocessing
AUTOTUNE = tf.data.AUTOTUNE
train_ds_binary = train_ds_binary.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds_binary = val_ds_binary.cache().prefetch(buffer_size=AUTOTUNE)
# Create the model
def create_model():
    base_model = tf.keras.applications.MobileNetV2(
        input_shape=(img_height, img_width, 3),
        include_top=False,
        weights='imagenet'
    )
    base_model.trainable = False
    model = tf.keras.Sequential([
        base_model,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1, activation='sigmoid')  # Binary classification
    ])
    return model
# Create and compile model
model = create_model()
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)
# Callbacks
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True
)
# Train the model
epochs = 20
history = model.fit(
    train_ds_binary,
    validation_data=val_ds_binary,
    epochs=epochs,
    callbacks=[early_stopping]
)
# Plot training results
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.tight_layout()
plt.show()
# Evaluate the model
y_true = []
y_pred = []
for images, labels in val_ds_binary:
    predictions = model.predict(images)
    y_true.extend(labels.numpy())
    y_pred.extend((predictions > 0.5).astype(int).flatten())
# Print classification report
print("\nClassification Report:")
print(classification_report(y_true, y_pred, target_names=['Clean', 'Not Clean']))
# Plot confusion matrix
cm = confusion_matrix(y_true, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.xticks([0.5, 1.5], ['Clean', 'Not Clean'])
plt.yticks([0.5, 1.5], ['Clean', 'Not Clean'])
plt.show()
# Function to plot predictions
def plot_predictions(dataset, num_images=25):
    plt.figure(figsize=(20, 20))
    for images, labels in dataset.take(1):
        predictions = model.predict(images)
        for i in range(min(num_images, len(images))):
            ax = plt.subplot(5, 5, i + 1)
            plt.imshow(images[i].numpy().astype("uint8"))
            predicted_class = "Clean" if predictions[i] < 0.5 else "Not Clean"
            actual_class = "Clean" if labels[i] == 0 else "Not Clean"
            color = 'green' if predicted_class == actual_class else 'red'
            plt.title(f"Actual: {actual_class}\nPred: {predicted_class}", 
                     color=color, fontsize=10)
            plt.axis("off")
    plt.tight_layout()
    plt.show()
# Plot sample predictions
plot_predictions(val_ds_binary)

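11. Camera Distance Measurement

Camera distance measurement means computing the distance from the camera to a target object in a photo. It can be done with the triangle similarity method, or with a more complex but more accurate camera model based on the camera's intrinsic parameters.

Computing the distance to an object with triangle similarity

Suppose an object of width W is placed at distance D from the camera and photographed. Measuring the object's width in pixels P in the photo gives the formula for the camera's focal length F:

F = (P × D) / W

For example, I placed an 8.5 x 11 inch sheet of paper horizontally (W = 11 inches) at a distance of 24 inches from the camera (D = 24 inches). After taking the photo, image processing gives a pixel width of P = 248 pixels for the paper. The focal length F is therefore:

F = (248 px × 24 in) / 11 in ≈ 541.09

If the camera is now moved closer to or farther from the object, triangle similarity gives the formula for the distance D' from the object to the camera, where P is the pixel width measured in the new photo:

D' = (W × F) / P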
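The two steps, calibrating the focal length with F = (P × D) / W and then measuring with D' = (W × F) / P, can be checked with a few lines of plain Python. This is a worked example of the arithmetic only; the function names and the second pixel width of 124 px are hypothetical values chosen for illustration.

```python
def focal_length(pixel_width, known_distance, known_width):
    """Calibration step: F = (P * D) / W."""
    return pixel_width * known_distance / known_width

def distance_to_object(known_width, focal, pixel_width):
    """Measurement step: D' = (W * F) / P."""
    return known_width * focal / pixel_width

# Calibration photo from the text: 11 in wide paper, 24 in away, 248 px wide
F = focal_length(pixel_width=248, known_distance=24.0, known_width=11.0)
print(round(F, 2))  # 541.09

# Hypothetical second photo in which the paper appears 124 px wide
print(round(distance_to_object(11.0, F, 124), 2))  # 48.0
```

Halving the pixel width doubles the estimated distance, which is exactly the inverse proportionality that triangle similarity predicts.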

That is the basic principle; next we implement it with OpenCV.

Finding the target contour

# import the necessary packages
from imutils import paths
import numpy as np
import imutils
import cv2

def find_marker(image):
    # convert the image to grayscale, blur it, and detect edges
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)
    edged = cv2.Canny(gray, 35, 125)

    # find the contours in the edged image and keep the largest one;
    # we'll assume that this is our piece of paper in the image
    cnts = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)
    c = max(cnts, key=cv2.contourArea)

    # compute the bounding box of the paper region and return it
    return cv2.minAreaRect(c)

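Here we use an 8.5 x 11 inch sheet of paper as the target object. The first task is to find the target object in the image.

The following three lines convert the image to grayscale, blur it slightly to remove high-frequency noise, and then run edge detection.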

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5, 5), 0)
edged = cv2.Canny(gray, 35, 125)

After these steps the image looks like this:

The edges of the paper are now clearly visible. The next step is to find the contour of the paper.

cnts = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)
c = max(cnts, key = cv2.contourArea)

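cv2.findContours finds the many contours in the image; we then take the one with the largest area and assume that it is the contour of the target object.

This assumption only holds for our scenario. In practice, the way to locate the target object in an image depends heavily on the application.

For our scenario, simple edge detection followed by picking the largest contour is enough. To make the program more robust, you can also apply contour approximation, discard every contour that does not have four points (the paper is a rectangle with four corners), and then pick the largest contour that has four points.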

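Note: for details on this method, see this article on building a mobile document scanner.

We could also find the target object by its color, since the object and the background differ clearly in color. Keypoint detection, local invariant descriptors and keypoint matching are further options. These methods are outside the scope of this article, however, and depend heavily on the specific scenario.

We now have the contour of the target object. The find_marker function returns the minimum-area bounding rectangle of that contour: its center (x, y) coordinates, its width and height in pixels, and its rotation angle.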
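One practical detail: cv2.minAreaRect returns a tuple of the form ((cx, cy), (w, h), angle), and which of the two sides is reported first depends on the angle. A small helper that always picks the longer side makes the pixel width independent of that ordering. This is a sketch under the assumption that the paper always lies in landscape orientation, so that its 11-inch side is the longer one in the image; the helper name and the sample tuples are made up.

```python
def marker_pixel_width(marker):
    """Return the longer side of a ((cx, cy), (w, h), angle) rectangle, in pixels."""
    (_, _), (side_a, side_b), _ = marker
    return max(side_a, side_b)

# Two descriptions of the same rectangle, as minAreaRect may report it
print(marker_pixel_width(((320.0, 240.0), (248.0, 191.6), -0.0)))   # 248.0
print(marker_pixel_width(((320.0, 240.0), (191.6, 248.0), -90.0)))  # 248.0
```

If the object can appear in either orientation, this shortcut no longer applies and the side has to be chosen from the angle or from the object's known aspect ratio.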

Computing the distance

Next we use triangle similarity to compute the distance from the target to the camera.

def distance_to_camera(knownWidth, focalLength, perWidth):
    # compute and return the distance from the marker to the camera
    return (knownWidth * focalLength) / perWidth

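The distance_to_camera function takes the real width of the target, the computed focal length and the pixel width of the target in the image, and computes the distance from the target to the camera with the triangle similarity formula.

Here is the setup needed before calling distance_to_camera: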

# initialize the known distance from the camera to the object, which
# in this case is 24 inches
KNOWN_DISTANCE = 24.0


# initialize the known object width, which in this case, the piece of
# paper is 11 inches wide
KNOWN_WIDTH = 11.0


# load the first image that contains an object that is KNOWN TO BE 2 feet
# from our camera, then find the paper marker in the image, and initialize
# the focal length
image = cv2.imread("images/2ft.jpg")
marker = find_marker(image)
focalLength = (marker[1][0] * KNOWN_DISTANCE) / KNOWN_WIDTH

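First we measure the width of the target object and its distance from the camera, and compute the camera's focal length with the method described above. Strictly speaking, this is not real camera calibration. Real camera calibration involves the camera's intrinsic parameters; see here for more on that topic.

cv2.imread loads the image from disk, find_marker returns the position and size of the target object in the image, and the focal length is then computed using triangle similarity.

Now that we have the focal length, we can compute the distance from the target object to the camera.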

# loop over the images
for imagePath in sorted(paths.list_images("images")):
    # load the image, find the marker in the image, then compute the
    # distance to the marker from the camera
    image = cv2.imread(imagePath)
    marker = find_marker(image)
    inches = distance_to_camera(KNOWN_WIDTH, focalLength, marker[1][0])

    # draw a bounding box around the marker and display it
    box = cv2.cv.BoxPoints(marker) if imutils.is_cv2() else cv2.boxPoints(marker)
    box = np.int32(box)  # np.int0 was removed in NumPy 2.0
    cv2.drawContours(image, [box], -1, (0, 255, 0), 2)
    cv2.putText(image, "%.2fft" % (inches / 12),
        (image.shape[1] - 200, image.shape[0] - 20), cv2.FONT_HERSHEY_SIMPLEX,
        2.0, (0, 255, 0), 3)
    cv2.imshow("image", image)
    cv2.waitKey(0)

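The for loop goes through each image and computes the distance from the target object to the camera. In the output, we draw the bounding box from the contour information and display the distance. Here are a few of the resulting images: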