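Python OpenCV 3.x Examples: Chapters 6 to 11

Original: OpenCV 3.x with Python By Example

License: CC BY-NC-SA 4.0

Translator: 飞龙

This article is part of the ApacheCN Computer Vision Translation Collection and was produced with a machine translation post-editing (MTPE) workflow to maximize efficiency.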


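6. Seam Carving

In this chapter, we are going to learn about content-aware image resizing, which is also known as seam carving. We will discuss how to detect the interesting parts of an image and how to use that information to resize a given image without degrading those interesting elements.

By the end of this chapter, you will know:

What content awareness is
How to quantify and identify the interesting parts of an image
How to use dynamic programming for image content analysis
How to increase and decrease the width of an image while keeping the height fixed, without degrading the regions of interest
How to make an object disappear from an image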

Why do we care about seam carving?

Before we start discussing seam carving, we need to understand why it is needed in the first place. Why should we care about the image content? Why can't we just resize the image and move on with our lives? To answer these questions, let's consider the following image:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/dd751d23-260b-4927-b523-6a1a12724de8.jpg]

Now, let's say we want to reduce the width of the image while keeping the height constant. If we simply resize it, it will look like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/0fb1aacb-dfc3-455c-8d0a-d575cf735cf3.jpg]

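As you can see, the ducks in the image look skewed, and the overall quality of the image has degraded. Intuitively speaking, the ducks are the interesting part of the image, so when we resize it, we want the ducks to remain intact. This is where seam carving comes in. Using seam carving, we can detect these interesting regions and make sure they don't get degraded.

How does it work?

We have been talking about image resizing and how we should take the image content into account when resizing. So why on earth is it called seam carving? Shouldn't it just be called content-aware image resizing? Well, many different terms are used to describe this process, such as image retargeting, liquid rescaling, seam carving, and so on. It is called seam carving because of the way we resize the image. The algorithm was proposed by Shai Avidan and Ariel Shamir in the paper "Seam Carving for Content-Aware Image Resizing" (SIGGRAPH 2007). You can refer to the original paper on this page.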

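We know that the goal is to resize the given image while keeping the interesting content intact. We do this by finding the least important paths in the image. These paths are called seams. Once we find these seams, we remove them from the image (or stretch them) to obtain a rescaled image. This process of removing, stretching, or carving eventually results in a resized image, which is why we call it seam carving. Consider the following image:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/b5bfeef3-5e1f-41c8-959f-8c41b1865233.png]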

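In the preceding image, we can see how an image can be roughly divided into interesting and uninteresting parts. We need to make sure that our algorithm can detect the uninteresting parts and operate on them. Let's consider the duck image and the constraints we have to work with. We want to keep the height constant and reduce the width. This means that we need to find vertical seams in the image and remove them. These seams start at the top and end at the bottom (or vice versa). If we wanted to change the height instead, the seams would be horizontal: they would start on the left side and end on the right side. A vertical seam is simply a chain of connected pixels that starts in the first row of the image and ends in the last row.

How do we define interesting?

Before we start computing seams, we need to decide on the metric we are going to use to compute them. We need a way to assign an importance to each pixel so that we can identify the least important paths. In computer vision terminology, we need to assign an energy value to each pixel so that we can find the path of minimum energy. Coming up with a good way to assign energy values is very important, because it affects the quality of the output.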

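One of the metrics we can use is the value of the derivative at each point. This is a good indicator of the level of activity in that neighborhood. If there is some activity, the pixel values change rapidly, so the value of the derivative at that point will be high. On the other hand, if the region is plain and uninteresting, the pixel values don't change rapidly, so the value of the derivative at that point in the grayscale image will be low.

For each pixel location, we compute the energy by combining the x and y derivatives at that point (compute_energy_matrix() in this section uses an equal-weighted sum of the absolute Sobel responses). We compute the derivatives by taking the difference between the current pixel and its neighbors. Recall that we did something similar when we used the Sobel filter for edge detection in Chapter 2, Detecting Edges and Applying Image Filters. Once we compute these values, we store them in a matrix called the energy matrix, which will be used to define the seams.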
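A minimal NumPy sketch of the energy idea, assuming simple forward differences instead of the Sobel operator used by compute_energy_matrix() (the helper name simple_energy is hypothetical). It only illustrates that flat regions get zero energy while an edge gets a high value:

```python
import numpy as np

def simple_energy(gray):
    """Energy = |dI/dx| + |dI/dy| using forward differences (hypothetical helper)."""
    gray = gray.astype(np.float64)          # avoid uint8 wrap-around
    dx = np.zeros_like(gray)
    dy = np.zeros_like(gray)
    # Difference between each pixel and its right / lower neighbor;
    # the last column (for dx) and the last row (for dy) stay at 0.
    dx[:, :-1] = np.abs(gray[:, 1:] - gray[:, :-1])
    dy[:-1, :] = np.abs(gray[1:, :] - gray[:-1, :])
    return dx + dy

# Flat image with one vertical edge between columns 1 and 2
img = np.array([[10, 10, 200, 200],
                [10, 10, 200, 200],
                [10, 10, 200, 200]], dtype=np.uint8)
energy = simple_energy(img)
print(energy)
```

Only column 1, where the intensity jumps, receives non-zero energy, so a seam would avoid it.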

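How do we compute the seams?

Now that we have the energy matrix, we can start computing the seams. We need to find the path through the image with the least energy. Computing all possible paths is prohibitively expensive (the number of paths grows exponentially with the number of rows), so we need a smarter way to do it. This is where dynamic programming comes in. In fact, seam carving is a direct application of dynamic programming.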

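We need to start from each pixel in the first row and find our way to the last row. In order to find the path of least energy, we compute and store the best path to each pixel in a table. Once we have built this table, we can find the path to a particular pixel by backtracking through the rows of that table.

For each pixel in the current row, we compute the energy of the three pixel positions we can move to in the next row: bottom-left, bottom, and bottom-right. We keep repeating this process until we reach the bottom. Once we reach the bottom, we pick the pixel with the smallest cumulative value and backtrack our way to the top. This gives us the path of least energy. Every time we remove a seam, the width of the image decreases by one pixel, so we need to keep removing seams until we reach the required image size.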
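The recurrence can also be written compactly with NumPy. This is a minimal sketch, not the book's implementation: instead of pushing costs down to the three pixels below, each pixel pulls the cheapest of the three pixels above it, which fills the same cumulative table. The function name find_seam_dp is hypothetical.

```python
import numpy as np

def find_seam_dp(energy):
    """Return one column index per row for the minimum-energy vertical seam."""
    rows, cols = energy.shape
    cost = energy.astype(np.float64).copy()          # cumulative minimum energy
    parent = np.zeros((rows, cols), dtype=np.int64)  # column offset: -1, 0 or +1
    for r in range(1, rows):
        # Cost of arriving from the upper-left, upper and upper-right pixels
        left = np.r_[np.inf, cost[r - 1, :-1]]
        up = cost[r - 1]
        right = np.r_[cost[r - 1, 1:], np.inf]
        choices = np.vstack([left, up, right])       # shape (3, cols)
        best = np.argmin(choices, axis=0)            # 0, 1 or 2
        cost[r] += choices[best, np.arange(cols)]
        parent[r] = best - 1                         # -1, 0 or +1
    # Backtrack from the cheapest pixel in the last row
    seam = np.zeros(rows, dtype=np.int64)
    seam[-1] = np.argmin(cost[-1])
    for r in range(rows - 1, 0, -1):
        seam[r - 1] = seam[r] + parent[r, seam[r]]
    return seam

energy = np.array([[9, 1, 9, 9],
                   [9, 9, 1, 9],
                   [9, 9, 9, 1],
                   [9, 9, 1, 9]])
seam = find_seam_dp(energy)
print(seam)
```

The seam follows the diagonal chain of 1s, moving at most one column per row, which is exactly the connectivity constraint described above.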

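First, we'll provide a set of functions to compute the energy of an image, locate its seams, and draw them. These functions are used by each of the following code examples, and you can include them as a library in any of your own projects:

import cv2
import numpy as np

# Draw vertical seam on top of the image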
def overlay_vertical_seam(img, seam): 
    img_seam_overlay = np.copy(img)

    # Extract the list of points from the seam 
    x_coords, y_coords = np.transpose([(i,int(j)) for i,j in enumerate(seam)]) 

    # Draw a green line on the image using the list of points 
    img_seam_overlay[x_coords, y_coords] = (0,255,0) 
    return img_seam_overlay

# Compute the energy matrix from the input image 
def compute_energy_matrix(img): 
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) 

    # Compute X derivative of the image 
    sobel_x = cv2.Sobel(gray,cv2.CV_64F, 1, 0, ksize=3) 

    # Compute Y derivative of the image 
    sobel_y = cv2.Sobel(gray,cv2.CV_64F, 0, 1, ksize=3) 

    abs_sobel_x = cv2.convertScaleAbs(sobel_x) 
    abs_sobel_y = cv2.convertScaleAbs(sobel_y) 

    # Return weighted summation of the two images i.e. 0.5*X + 0.5*Y 
    return cv2.addWeighted(abs_sobel_x, 0.5, abs_sobel_y, 0.5, 0) 

# Find vertical seam in the input image 
def find_vertical_seam(img, energy): 
    rows, cols = img.shape[:2] 

    # Initialize the seam vector with 0 for each element 
    seam = np.zeros(img.shape[0]) 

    # Initialize distance and edge matrices 
    dist_to = np.zeros(img.shape[:2]) + float('inf')
    dist_to[0,:] = np.zeros(img.shape[1]) 
    edge_to = np.zeros(img.shape[:2]) 

    # Dynamic programming; iterate using double loop and compute the paths efficiently 
    for row in range(rows-1): 
        for col in range(cols): 
            if col != 0 and dist_to[row+1, col-1] > dist_to[row, col] + energy[row+1, col-1]: 
                dist_to[row+1, col-1] = dist_to[row, col] + energy[row+1, col-1]
                edge_to[row+1, col-1] = 1 

            if dist_to[row+1, col] > dist_to[row, col] + energy[row+1, col]: 
                dist_to[row+1, col] = dist_to[row, col] + energy[row+1, col] 
                edge_to[row+1, col] = 0 

            if col != cols-1 and \
                    dist_to[row+1, col+1] > dist_to[row, col] + energy[row+1, col+1]:
                dist_to[row+1, col+1] = dist_to[row, col] + energy[row+1, col+1]
                edge_to[row+1, col+1] = -1

    # Retracing the path 
    # Start from the column with the smallest cumulative energy in the last row
    seam[rows-1] = np.argmin(dist_to[rows-1, :]) 
    for i in range(rows-1, 0, -1): 
        seam[i-1] = seam[i] + edge_to[i, int(seam[i])] 

    return seam

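Let's consider our duck image again. If we compute the first 30 seams, it will look like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/ff9b72b4-f929-4011-b00d-364c3a52305b.png]

These green lines indicate the least important paths. As we can see here, they carefully go around the ducks to make sure that the interesting regions are not touched. In the upper half of the image, the seams wind around the branches, so their quality is preserved; technically speaking, the branches are interesting too. If we go on to remove the first 100 seams, it will look like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/36a676fe-cd3f-4602-99b5-c5f8a7547c0b.png]

Now, compare this with the naively resized image. Doesn't it look much better? The ducks look a lot nicer in this version.

Let's take a look at the code to see how it's done: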

import sys 
import cv2 
import numpy as np 

# Remove the input vertical seam from the image 
def remove_vertical_seam(img, seam): 
    rows, cols = img.shape[:2] 

    # To delete a point, move every point after it one step towards the left 
    for row in range(rows): 
        for col in range(int(seam[row]), cols-1): 
            img[row, col] = img[row, col+1] 

    # Discard the last column to create the final output image 
    img = img[:, 0:cols-1] 
    return img 

if __name__=='__main__': 
    # Assumes compute_energy_matrix(), find_vertical_seam() and
    # overlay_vertical_seam() from the library above are defined in this file.

    # Make sure the size of the input image is reasonable. 
    # Large images take a lot of time to be processed. 
    # Recommended size is 640x480. 
    img_input = cv2.imread(sys.argv[1]) 

    # Use a small number to get started. Once you get an 
    # idea of the processing time, you can use a bigger number. 
    # To get started, you can set it to 20. 
    num_seams = int(sys.argv[2]) 

    img = np.copy(img_input) 
    img_overlay_seam = np.copy(img_input) 
    energy = compute_energy_matrix(img) 

    for i in range(num_seams): 
        seam = find_vertical_seam(img, energy) 
        img_overlay_seam = overlay_vertical_seam(img_overlay_seam, seam) 
        img = remove_vertical_seam(img, seam) 
        energy = compute_energy_matrix(img) 
        print('Number of seams removed = ', i+1) 

    cv2.imshow('Input', img_input) 
    cv2.imshow('Seams', img_overlay_seam) 
    cv2.imshow('Output', img) 
    cv2.waitKey()

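We use remove_vertical_seam to remove vertical seams from the original image, which reduces the width of the image while keeping the interesting parts intact.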
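The double loop in remove_vertical_seam is easy to follow but slow in pure Python. As an alternative sketch (the helper remove_seam_vectorized is hypothetical and not part of the book's code), the same removal can be expressed with a boolean mask in NumPy:

```python
import numpy as np

def remove_seam_vectorized(img, seam):
    """Drop the pixel at column seam[row] from every row (hypothetical helper)."""
    rows, cols = img.shape[:2]
    keep = np.ones((rows, cols), dtype=bool)
    keep[np.arange(rows), np.asarray(seam, dtype=int)] = False
    # Boolean indexing returns the kept pixels in row-major order, so a
    # reshape restores the image with one column fewer (channels preserved).
    return img[keep].reshape((rows, cols - 1) + img.shape[2:])

gray = np.arange(12).reshape(3, 4)
out = remove_seam_vectorized(gray, [0, 1, 3])
print(out)

color = np.zeros((2, 3, 3), dtype=np.uint8)
print(remove_seam_vectorized(color, [2, 0]).shape)
```

Because it casts the seam to integers, it also accepts the float seam array returned by find_vertical_seam().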

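Can we expand an image?

We know that we can use seam carving to reduce the width of an image without degrading the interesting regions. So naturally, we need to ask ourselves whether we can expand an image without ruining the interesting regions. As it turns out, we can do it using the same logic. When computing the seams, we just need to add a column instead of removing one.

If we expand the duck image naively, it will look like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/cb4531a6-8508-422a-bfc8-301d93fc5b7d.jpg]

If we do it in a smarter way (that is, by using seam carving), it will look like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/815130c0-2b94-44f7-b77a-5949a298df2f.png]

As you can see, the width of the image has increased and the ducks don't look stretched. The following is the code to do it: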

import sys 
import cv2 
import numpy as np 

# Add a vertical seam to the image 
def add_vertical_seam(img, seam, num_iter): 
    seam = seam + num_iter 
    rows, cols = img.shape[:2] 
    zero_col_mat = np.zeros((rows,1,3), dtype=np.uint8) 
    img_extended = np.hstack((img, zero_col_mat)) 

    for row in range(rows): 
        for col in range(cols, int(seam[row]), -1): 
            img_extended[row, col] = img[row, col-1] 

        # To insert a value between two columns, take the average 
        # value of the neighbors. It looks smooth this way and we 
        # can avoid unwanted artifacts. 
        for i in range(3): 
            v1 = img_extended[row, int(seam[row])-1, i] 
            v2 = img_extended[row, int(seam[row])+1, i] 
            img_extended[row, int(seam[row]), i] = (int(v1)+int(v2))//2 

    return img_extended 

if __name__=='__main__': 
    img_input = cv2.imread(sys.argv[1]) 
    num_seams = int(sys.argv[2]) 

    img = np.copy(img_input) 
    img_output = np.copy(img_input) 
    img_overlay_seam = np.copy(img_input) 
    energy = compute_energy_matrix(img) # Defined in the seam-carving library above

    for i in range(num_seams): 
        seam = find_vertical_seam(img, energy) # Defined in the seam-carving library above
        img_overlay_seam = overlay_vertical_seam(img_overlay_seam, seam)
        img = remove_vertical_seam(img, seam) # Defined in the previous code sample
        img_output = add_vertical_seam(img_output, seam, i) 
        energy = compute_energy_matrix(img) 
        print('Number of seams added =', i+1)

    cv2.imshow('Input', img_input) 
    cv2.imshow('Seams', img_overlay_seam)
    cv2.imshow('Output', img_output) 
    cv2.waitKey()

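In this case, we added an extra function, add_vertical_seam, to the code. We use it to add vertical seams so that the image looks natural: seams are added to increase the width of the image without changing the original proportions of the interesting regions. Note that the seams are still found by repeatedly shrinking a working copy (img) with remove_vertical_seam; each seam is then inserted into the growing output image, shifted to the right by the number of seams already added.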
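To isolate the insertion step, here is a simplified sketch that inserts one new pixel per row next to the seam. Unlike add_vertical_seam above, which averages the left and right neighbors at the seam position, this hypothetical insert_seam averages the seam pixel with its right neighbor; treat it as an illustration of the idea, not a drop-in replacement:

```python
import numpy as np

def insert_seam(img, seam):
    """Insert one pixel per row just after column seam[row] (hypothetical helper).

    The new pixel is the average of the seam pixel and its right neighbor
    (or a copy of the seam pixel at the right border).
    """
    rows, cols = img.shape[:2]
    out = np.zeros((rows, cols + 1) + img.shape[2:], dtype=img.dtype)
    for r in range(rows):
        c = int(seam[r])
        right = img[r, min(c + 1, cols - 1)]
        # Widen to int32 before adding so uint8 values do not overflow
        new_px = (img[r, c].astype(np.int32) + right.astype(np.int32)) // 2
        out[r, :c + 1] = img[r, :c + 1]
        out[r, c + 1] = new_px
        out[r, c + 2:] = img[r, c + 1:]
    return out

img = np.array([[0, 10, 20],
                [30, 40, 50]], dtype=np.uint8)
wider = insert_seam(img, [0, 2])
print(wider)
```

Averaging keeps the inserted column visually consistent with its neighbors, which is the same motivation given in the book's code comment.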

Can we remove an object completely?

This is perhaps the most interesting application of seam carving. We can make an object disappear from an image completely. Let's consider the following image:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/863078d5-84ea-4757-8849-d9ef4abc1ea6.jpg]

Use the mouse to enclose the region to be removed:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/ea400a6d-c475-4ad8-b600-522b274b5ed8.png]

After removing the chair on the right, it will look like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/5c5798f6-6f62-4a1f-b94f-9a58ae7a6587.png]

It's as if the chair never existed! Before we look at the code, it's important to know that this takes a while to run. So just wait for a few minutes to get an idea of the processing time, and then adjust the size of the input image accordingly. Let's take a look at the code:

import sys
import cv2 
import numpy as np 

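# Assumes compute_energy_matrix(), find_vertical_seam(),
# remove_vertical_seam() and add_vertical_seam() from the previous
# examples are defined in this file.

# Draw rectangle on top of the input image 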
def draw_rectangle(event, x, y, flags, params): 
    global x_init, y_init, drawing, top_left_pt, bottom_right_pt, img_orig 

    # Detecting a mouse click 
    if event == cv2.EVENT_LBUTTONDOWN: 
        drawing = True 
        x_init, y_init = x, y 

    # Detecting mouse movement 
    elif event == cv2.EVENT_MOUSEMOVE: 
        if drawing: 
            top_left_pt, bottom_right_pt = (x_init,y_init), (x,y) 
            img[y_init:y, x_init:x] = 255 - img_orig[y_init:y, x_init:x] 
            cv2.rectangle(img, top_left_pt, bottom_right_pt, (0,255,0), 2) 

    # Detecting the mouse button up event 
    elif event == cv2.EVENT_LBUTTONUP: 
        drawing = False 
        top_left_pt, bottom_right_pt = (x_init,y_init), (x,y) 

        # Create the "negative" film effect for the selected region
        img[y_init:y, x_init:x] = 255 - img[y_init:y, x_init:x] 

        # Draw rectangle around the selected region 
        cv2.rectangle(img, top_left_pt, bottom_right_pt, (0,255,0), 2) 
        rect_final = (x_init, y_init, x-x_init, y-y_init) 

        # Remove the object in the selected region 
        remove_object(img_orig, rect_final) 

# Computing the energy matrix using modified algorithm 
def compute_energy_matrix_modified(img, rect_roi):
    # Compute weighted summation i.e. 0.5*X + 0.5*Y 
    energy_matrix = compute_energy_matrix(img)
    x,y,w,h = rect_roi 

    # We want the seams to pass through this region, so make sure the energy values in this region are set to 0 
    energy_matrix[y:y+h, x:x+w] = 0 

    return energy_matrix 

# Remove the object from the input region of interest 
def remove_object(img, rect_roi): 
    num_seams = rect_roi[2] + 10 
    energy = compute_energy_matrix_modified(img, rect_roi) 

    # Start a loop and remove one seam at a time
    for i in range(num_seams): 
        # Find the vertical seam that can be removed 
        seam = find_vertical_seam(img, energy) 

        # Remove that vertical seam 
        img = remove_vertical_seam(img, seam) 
        x,y,w,h = rect_roi 

        # Compute energy matrix after removing the seam; the zeroed
        # region shrinks by one column per iteration (never below 0)
        energy = compute_energy_matrix_modified(img, (x,y,max(w-i, 0),h)) 
        print('Number of seams removed =', i+1)

    img_output = np.copy(img) 

    # Fill up the region with surrounding values so that the size 
    # of the image remains unchanged 
    for i in range(num_seams): 
        seam = find_vertical_seam(img, energy) 
        img = remove_vertical_seam(img, seam)
        img_output = add_vertical_seam(img_output, seam, i)
        energy = compute_energy_matrix(img) 
        print('Number of seams added =', i+1) 

    cv2.imshow('Input', img_input) 
    cv2.imshow('Output', img_output) 
    cv2.waitKey() 

if __name__=='__main__': 
    img_input = cv2.imread(sys.argv[1])
    drawing = False 
    img = np.copy(img_input) 
    img_orig = np.copy(img_input) 

    cv2.namedWindow('Input') 
    cv2.setMouseCallback('Input', draw_rectangle)
    print('Draw a rectangle with the mouse over the object to be removed')
    while True:
        cv2.imshow('Input', img) 
        c = cv2.waitKey(10) 
        if c == 27: 
            break 

    cv2.destroyAllWindows()

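How did we do it?

The basic logic remains the same here: we are using seam carving to remove an object. Once we select the region of interest, we make all the seams pass through this region. We do this by manipulating the energy matrix after every iteration, using a new function called compute_energy_matrix_modified. Once the energy matrix has been computed, we assign zero values to the region of interest. Since energy values are never negative, a zeroed region is the cheapest place for a seam to pass, so the minimum-energy seams are drawn through it. After we remove all the seams related to this region, we keep adding seams until the image is expanded back to its original width.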
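The effect of zeroing the energy can be checked on a toy energy matrix. This sketch re-implements a compact seam search (the function min_seam_cost is hypothetical, not the book's find_vertical_seam) and compares the seam before and after the region of interest is set to zero:

```python
import numpy as np

def min_seam_cost(energy):
    """Cumulative-minimum DP; returns the seam as one column index per row."""
    rows, cols = energy.shape
    cost = energy.astype(np.float64).copy()
    for r in range(1, rows):
        padded = np.r_[np.inf, cost[r - 1], np.inf]
        # For each column, take the best of upper-left, upper and upper-right
        cost[r] += np.minimum(np.minimum(padded[:-2], padded[1:-1]), padded[2:])
    seam = [int(np.argmin(cost[-1]))]
    for r in range(rows - 2, -1, -1):
        c = seam[-1]
        lo, hi = max(c - 1, 0), min(c + 2, cols)
        seam.append(lo + int(np.argmin(cost[r, lo:hi])))
    return np.array(seam[::-1])

energy = np.full((5, 8), 10.0)
energy[:, 0] = 1.0                 # without intervention, column 0 is cheapest
before = min_seam_cost(energy)

forced = energy.copy()
forced[:, 3:5] = 0.0               # same trick as compute_energy_matrix_modified
after = min_seam_cost(forced)
print(before, after)
```

Before the change the seam hugs column 0; once columns 3 and 4 are zeroed, the seam runs straight through the selected region.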

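Summary

In this chapter, we learned about content-aware image resizing. We discussed how to quantify interesting and uninteresting regions in images. We learned how to compute seams in an image and how to use dynamic programming to do it efficiently. We discussed how to use seam carving to reduce the width of an image, and how we can use the same logic to expand it. We also learned how to remove an object from an image completely.

In the next chapter, we are going to discuss how to do shape analysis and image segmentation. We will see how to use those principles to find the exact boundaries of an object of interest in an image.

7. Detecting Shapes and Segmenting an Image

In this chapter, we are going to learn about shape analysis and image segmentation. We will learn how to recognize shapes and estimate their exact boundaries. We will discuss how to segment an image into its constituent parts using various methods. We will also learn how to separate the foreground from the background.

By the end of this chapter, you will know:

What contour analysis and shape matching are
How to match shapes
What image segmentation is
How to segment an image into its constituent parts
How to separate the foreground from the background
How to use various techniques to segment an image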

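Contour analysis and shape matching

Contour analysis is a very useful tool in the field of computer vision. We deal with a lot of shapes in the real world, and contour analysis helps in analyzing those shapes using various algorithms. When we convert an image to grayscale and threshold it, we are left with a bunch of lines and contours. Once we understand the properties of different shapes, we can extract detailed information from an image.

Let's say we want to identify the boomerang shape in the following image:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/4457dbe8-14d8-4bd9-bf79-3f32dfaa6c81.png]

In order to do that, we first need to know what a regular boomerang looks like:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/3a4eb77c-e2a7-4e2a-8b8d-5fd77ce08f3f.png]

Now, using the preceding image as a reference, can we identify which shape in the original image corresponds to a boomerang? As you may notice, we cannot use a simple correlation-based approach, because the shapes are all distorted. This means that an approach that looks for an exact match won't work! We need to understand the features of the shape and match the corresponding features to identify the boomerang shape. OpenCV provides several shape-matching tools that we can use for this purpose. If you want to learn more, you can visit this page.

The matching is based on the concept of Hu moments, which in turn are related to image moments. You can refer to the following paper to learn more about moments. The concept of image moments basically refers to a weighted, power-raised summation of the pixels within a shape: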

$$I = \sum_{i=1}^{N} w_i \, p_i^{k}$$

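In the preceding equation, p refers to the pixels inside the contour, w refers to the weights, N refers to the number of points inside the contour, k refers to the power, and I refers to the moment. Depending on the values we choose for w and k, we can extract different characteristics from that contour.

Perhaps the simplest example is computing the area of the contour. To do this, we need to count the number of pixels within that region. So, mathematically speaking, in the weighted and power-raised summation form, we just need to set w to 1 and k to 0. This gives us the area of the contour. Depending on how we compute these moments, they help us understand different shapes. This also gives rise to some interesting properties that help us determine a shape similarity metric.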
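As a concrete check of the w = 1, k = 0 case, the sketch below computes raw moments of a binary mask with NumPy. Raw moments M_pq can be read as choosing the weight x^p y^q for each pixel of the shape; with p = q = 0 every weight is 1, so M00 is simply the pixel count (the area). These are the same quantities cv2.moments reports as m00, m10 and m01 for a raster image; the helper raw_moment is hypothetical.

```python
import numpy as np

def raw_moment(mask, p, q):
    """M_pq = sum of x**p * y**q over the pixels where mask is non-zero."""
    ys, xs = np.indices(mask.shape)
    return float(np.sum((xs ** p) * (ys ** q) * (mask > 0)))

# 4x6 image with a filled 2x3 rectangle at rows 1-2, columns 2-4
mask = np.zeros((4, 6), dtype=np.uint8)
mask[1:3, 2:5] = 255

area = raw_moment(mask, 0, 0)           # weight 1 for every pixel -> pixel count
cx = raw_moment(mask, 1, 0) / area      # centroid x
cy = raw_moment(mask, 0, 1) / area      # centroid y
print(area, cx, cy)
```

Higher-order combinations of such moments, normalized appropriately, are what the Hu invariants used by matchShapes are built from.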

If we match the shapes, we will see something like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/07453c2f-478b-444c-802b-aeeccf6fedee.png]

Let's take a look at the code to do this:

import sys
import cv2 
import numpy as np 

# Extract all the contours from the image 
def get_all_contours(img): 
    ref_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) 
    ret, thresh = cv2.threshold(ref_gray, 127, 255, 0) 
    # Find all the contours in the thresholded image. The values 
    # for the second and third parameters are restricted to a 
    # certain number of possible values.
    # OpenCV 3.x returns (image, contours, hierarchy) while OpenCV 4.x
    # returns (contours, hierarchy); taking [-2] works with both.
    contours = cv2.findContours(thresh.copy(), cv2.RETR_LIST,
        cv2.CHAIN_APPROX_SIMPLE)[-2]
    return contours

# Extract reference contour from the image 
def get_ref_contour(img): 
    contours = get_all_contours(img)

    # Extract the relevant contour based on area ratio. We use the 
    # area ratio because the main image boundary contour is 
    # extracted as well and we don't want that. This area ratio 
    # threshold will ensure that we only take the contour inside the image. 
    for contour in contours: 
        area = cv2.contourArea(contour) 
        img_area = img.shape[0] * img.shape[1] 
        if 0.05 < area/float(img_area) < 0.8: 
            return contour 

if __name__=='__main__': 
    # Boomerang reference image 
    img1 = cv2.imread(sys.argv[1]) 

    # Input image containing all the different shapes 
    img2 = cv2.imread(sys.argv[2]) 

    # Extract the reference contour 
    ref_contour = get_ref_contour(img1)

    # Extract all the contours from the input image 
    input_contours = get_all_contours(img2) 

    closest_contour = None
    min_dist = None
    contour_img = img2.copy()
    cv2.drawContours(contour_img, input_contours, -1, color=(0,0,0), thickness=3) 
    cv2.imshow('Contours', contour_img)
    # Finding the closest contour 
    for i, contour in enumerate(input_contours): 
        # Matching the shapes and taking the closest one using 
        # comparison method CV_CONTOURS_MATCH_I3 (third argument = 3)
        ret = cv2.matchShapes(ref_contour, contour, 3, 0.0)
        print("Contour %d matches in %f" % (i, ret))
        if min_dist is None or ret < min_dist:
            min_dist = ret 
            closest_contour = contour

    cv2.drawContours(img2, [closest_contour], 0 , color=(0,0,0), thickness=3) 
    cv2.imshow('Best Matching', img2)
    cv2.waitKey()

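The result of matchShapes can vary with the comparison method you choose (CV_CONTOURS_MATCH_I1, I2, or I3, all based on Hu moment invariants); different methods may pick a different best-matching shape depending on the size, orientation, or rotation of the contours. To learn more, you can check the official documentation on this page.

Approximating a contour

A lot of the contours we encounter in real life are noisy. This means that the contours don't look smooth, and our analysis suffers as a result. So how do we deal with this? One way to go about it is to take all the points on the contour and approximate them with a smooth polygon.

Let's consider the boomerang image again. If we approximate the contours using various values of the approximation factor, we will see the contours change their shape. Let's start with 0.05:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/96c6beb5-67bf-402b-98a5-d171eba96594.png]

If we reduce this factor, the approximating polygon gets more vertices and follows the original contour more closely, so the outline looks smoother. Let's make it 0.01:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/c22bcfe0-8b7e-4cf3-a238-7dbac4c06863.png]

If you make it really small, say 0.00001, it will look just like the original image:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/9db1dc72-2960-4683-a271-66d1f76cc7dc.png]

The following code shows how to approximate these contours with polygons. The factor is multiplied by each contour's perimeter (cv2.arcLength) to obtain epsilon, the maximum distance allowed between the original contour and its approximation: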

import sys 
import cv2 
import numpy as np 

if __name__=='__main__': 
    # Input image containing all the different shapes 
    img1 = cv2.imread(sys.argv[1]) 
    # Extract all the contours from the input image
    # (get_all_contours() is defined in the shape-matching example)
    input_contours = get_all_contours(img1) 

    contour_img = img1.copy()
    smoothen_contours = []
    factor = 0.05

    # Approximate each contour with a polygon; epsilon is a fraction of its perimeter
    for contour in input_contours: 
        epsilon = factor * cv2.arcLength(contour, True) 
        smoothen_contours.append(cv2.approxPolyDP(contour, epsilon, True)) 

    cv2.drawContours(contour_img, smoothen_contours, -1, color=(0,0,0), thickness=3) 
    cv2.imshow('Contours', contour_img)
    cv2.waitKey()
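OpenCV documents approxPolyDP as using the Douglas-Peucker algorithm. To show what epsilon controls, here is a small recursive sketch of that algorithm for an open polyline; OpenCV's version also handles closed curves and integer contours, and the names douglas_peucker and point_line_distance are hypothetical:

```python
import numpy as np

def point_line_distance(pts, a, b):
    """Perpendicular distance from each point in pts to the line through a and b."""
    ab = b - a
    norm = np.hypot(ab[0], ab[1])
    if norm == 0:
        return np.hypot(pts[:, 0] - a[0], pts[:, 1] - a[1])
    # |cross(ab, p - a)| / |ab|
    return np.abs(ab[0] * (pts[:, 1] - a[1]) - ab[1] * (pts[:, 0] - a[0])) / norm

def douglas_peucker(points, epsilon):
    """Simplify an open polyline, keeping endpoints and points farther than epsilon."""
    points = np.asarray(points, dtype=np.float64)
    if len(points) < 3:
        return points
    dists = point_line_distance(points[1:-1], points[0], points[-1])
    idx = int(np.argmax(dists)) + 1
    if dists[idx - 1] <= epsilon:
        # Every intermediate point is close enough: keep only the endpoints
        return points[[0, -1]]
    # Otherwise split at the farthest point and simplify both halves
    left = douglas_peucker(points[:idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return np.vstack([left[:-1], right])   # drop the duplicated split point

corner = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
noisy_line = [(0, 0), (1, 0.1), (2, -0.1), (3, 0.05), (4, 0)]
print(douglas_peucker(corner, epsilon=0.1))
print(douglas_peucker(noisy_line, epsilon=0.5))
```

A larger epsilon removes more vertices (the small wiggles of the noisy line disappear), which matches the coarser outline seen with factor 0.05.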

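Identifying a pizza with a slice taken out

The title might be a bit misleading, because we will not be talking about pizza slices. But let's say you are in a situation where you have an image containing different types of pizzas with different shapes. Now, somebody has cut a slice out of one of those pizzas. How do we automatically identify it?

We cannot take the approach we took earlier, because we don't know what the shape looks like, so we don't have a template. We are not even sure what shape we are looking for, so we cannot build a template based on any prior information. All we know is that a slice has been cut out of one of the pizzas. Let's consider the following image:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/e21f6ac3-67d3-4d4e-8f2b-132608b754bf.png]

It's not exactly a real image, but you get the idea. You know what shape we are talking about. Since we don't know what we are looking for, we need to use some property of these shapes to identify the sliced pizza. If you notice, all the other shapes are nicely closed; that is, you can pick any two points within those shapes and draw a line between them, and that line will always lie within the shape. These kinds of shapes are called convex.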
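For a simple (non-self-intersecting) polygon, convexity can be tested without drawing lines between point pairs: the polygon is convex exactly when the turn direction, the sign of the cross product of consecutive edges, never changes. The sketch below is an illustration of that test (is_convex is a hypothetical name); OpenCV offers cv2.isContourConvex for contours, while the code in this section works with the convex hull and convexity defects instead:

```python
def is_convex(polygon):
    """True if a simple polygon (vertices given in order) is convex.

    Consecutive edges must all turn the same way; collinear vertices
    (cross product == 0) are ignored.
    """
    n = len(polygon)
    sign = 0
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        x2, y2 = polygon[(i + 2) % n]
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if cross != 0:
            if sign == 0:
                sign = 1 if cross > 0 else -1
            elif (cross > 0) != (sign > 0):
                return False
    return True

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
sliced = [(0, 0), (4, 0), (4, 4), (2, 2), (0, 4)]   # a notch cut into one side
print(is_convex(square), is_convex(sliced))
```

The notch at (2, 2) makes the boundary turn the other way, which is precisely the property that gives away the sliced pizza.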

If you look at the sliced pizza shape, we can choose two points such that the line between them goes outside the shape, as shown in the following figure:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/33acdeef-c4a0-4f97-9ffd-dcc824633c8b.png]

So, all we need to do is detect the non-convex shapes in the image, and we're done. Let's go ahead and do it:

import sys 
import cv2 
import numpy as np 

if __name__=='__main__': 
    img = cv2.imread(sys.argv[1]) 

    # Iterate over the extracted contours
    # (get_all_contours() is defined in the shape-matching example)
    for contour in get_all_contours(img): 
        # Extract convex hull from the contour (as point indices)
        hull = cv2.convexHull(contour, returnPoints=False) 

        # Extract convexity defects from the above hull; a convexity
        # defect is a cavity between the contour and its convex hull
        defects = cv2.convexityDefects(contour, hull) 

        if defects is None: 
            continue 

        # Mark the deepest point of each defect with a circle
        for i in range(defects.shape[0]):
            start_defect, end_defect, far_defect, _ = defects[i,0] 
            far = tuple(int(v) for v in contour[far_defect][0]) 
            cv2.circle(img, far, 5, [128,0,0], -1) 

        # Draw the contour itself
        cv2.drawContours(img, [contour], -1, (0,0,0), 3) 

    cv2.imshow('Convexity defects',img) 
    cv2.waitKey(0) 
    cv2.destroyAllWindows()

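To learn more about how convexityDefects works, visit this page.

If you run the preceding code, you will see something like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/f4c18d7e-e61f-40fa-bef3-e949d7949a94.png]

Wait a minute, what happened here? It looks so cluttered. Did we do something wrong? As it turns out, the curves are not really smooth. If you look closely, there are tiny ridges everywhere along the curves. So, if you just run your convexity detector on them, it's not going to work.

This is where contour approximation comes in really handy. Once we have detected the contours, we need to smooth them so that the ridges don't affect the result. Let's go ahead and do that by adding the following lines at the start of the loop body, before the convex hull is computed: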

factor = 0.01
epsilon = factor * cv2.arcLength(contour, True) 
contour = cv2.approxPolyDP(contour, epsilon, True)

If we run the preceding code using the smoothed contours, the output will look like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/76ea2953-abbe-4d38-b55c-0a02934bd469.png]

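How to censor a shape?

Let's say you are dealing with an image and you want to block out a particular shape. Now, you might say that you will use shape matching to identify the shape and then block it out, right? But the problem here is that we don't have a template available. So how do we go about doing this? Shape analysis comes in many forms, and we need to build our algorithm depending on the situation. Let's consider the following image:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/a8e59706-67b1-4f7c-948f-77b109d8c0e8.png]

Let's say we want to identify all the boomerang shapes and then block them out without using any template images. As you can see, there are various other weird shapes in the image, and the boomerang shapes are not really smooth. We need to identify the property that differentiates the boomerang shapes from the other shapes present. Let's consider the convex hull. If we take the ratio of the area of each shape to the area of its convex hull, we can see that this can be a distinguishing metric. This metric is called the solidity factor in shape analysis. It has a lower value for the boomerang shapes, because they leave empty areas inside their hulls, as shown in the following figure:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/b811519f-7754-4034-bc1a-755c02303376.png]

The black boundaries represent the convex hulls. Once we compute these values for all the shapes, how do we separate them? Can we just use a fixed threshold to detect the boomerang shapes? Not really! We cannot use a fixed threshold, because you never know what kind of shape you might encounter later. So a better approach is to use K-means clustering. K-means is an unsupervised learning technique that can be used to separate the input data into K classes. You can quickly brush up on K-means here before moving on.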

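We know that we want to separate the shapes into two groups, namely boomerang shapes and other shapes. So we know what K will be in K-means. Once we cluster the solidity values with K = 2, we pick the cluster with the lowest solidity factor, and that gives us the boomerang shapes. Bear in mind that this approach works only in this particular case. If you are dealing with other kinds of shapes, you will have to use other metrics to make sure the shape detection works. As we discussed earlier, it depends heavily on the situation. If we detect the shapes and block them out, it will look like this: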
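The whole pipeline (solidity = area / hull area, then a two-cluster K-means) can be sketched without OpenCV on a few hand-made polygons. This is an illustration under stated assumptions: polygon_area (shoelace formula), convex_hull (Andrew's monotone chain) and two_means_1d are hypothetical helpers standing in for cv2.contourArea, cv2.convexHull and cv2.kmeans, and the K-means centers are initialized deterministically from the minimum and maximum values instead of randomly as with cv2.KMEANS_RANDOM_CENTERS:

```python
import numpy as np

def polygon_area(pts):
    """Shoelace formula for a simple polygon given as an (N, 2) array."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(map(tuple, pts))
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1], dtype=np.float64)

def solidity(pts):
    """Ratio of the shape's area to the area of its convex hull."""
    pts = np.asarray(pts, dtype=np.float64)
    return polygon_area(pts) / polygon_area(convex_hull(pts))

def two_means_1d(values, iters=20):
    """Plain 1-D k-means with k = 2; returns labels and the two centers."""
    values = np.asarray(values, dtype=np.float64)
    centers = np.array([values.min(), values.max()])
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = values[labels == k].mean()
    return labels, centers

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
triangle = [(0, 0), (4, 0), (0, 3)]
thin_l = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]   # boomerang-like
thin_l_big = [(2 * x, 2 * y) for x, y in thin_l]
shapes = [square, triangle, thin_l, thin_l_big]

values = [solidity(s) for s in shapes]
labels, centers = two_means_1d(values)
low = int(np.argmin(centers))            # cluster with the lower solidity
selected = [i for i, lab in enumerate(labels) if lab == low]
print([round(v, 3) for v in values], selected)
```

Solidity is scale invariant, so both L shapes land in the low-solidity cluster regardless of size, while the convex shapes score 1.0.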

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/2214870b-706c-4cae-956f-cb954255a2d7.png]

The following is the code to do it:

import sys 
import cv2 
import numpy as np 

if __name__=='__main__': 
    # Input image containing all the shapes 
    img = cv2.imread(sys.argv[1]) 

    img_orig = np.copy(img) 
    # get_all_contours() is defined in the shape-matching example
    input_contours = get_all_contours(img) 
    solidity_values = [] 

    # Compute solidity factors of all the contours 
    for contour in input_contours: 
        area_contour = cv2.contourArea(contour) 
        convex_hull = cv2.convexHull(contour) 
        area_hull = cv2.contourArea(convex_hull) 
        solidity = float(area_contour)/area_hull 
        solidity_values.append(solidity) 

    # Clustering using KMeans 
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0) 
    flags = cv2.KMEANS_RANDOM_CENTERS 
    solidity_values = np.array(solidity_values).reshape((len(solidity_values),1)).astype('float32') 
    compactness, labels, centers = cv2.kmeans(solidity_values, 2, None, criteria, 10, flags) 

    # The cluster with the lower center holds the low-solidity shapes
    closest_class = np.argmin(centers) 
    output_contours = [] 
    for index, label in enumerate(labels.ravel()): 
        if label == closest_class: 
            output_contours.append(input_contours[index]) 

    cv2.drawContours(img, output_contours, -1, (0,0,0), 3) 
    cv2.imshow('Output', img) 

    # Censoring 
    for contour in output_contours: 
        rect = cv2.minAreaRect(contour) 
        box = cv2.boxPoints(rect) 
        # Integer corner coordinates (np.int0 no longer exists in NumPy 2.x)
        box = np.int32(box) 
        cv2.drawContours(img_orig, [box], 0, (0,0,0), -1) 

    cv2.imshow('Censored', img_orig) 
    cv2.waitKey()

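What is image segmentation?

Image segmentation is the process of separating an image into its constituent parts. It is an important step in many real-world computer vision applications. There are many different ways of segmenting an image. When we segment an image, we separate the regions based on various metrics, such as color, texture, location, and so on. All the pixels within each region have something in common, depending on the metric we are using. Let's take a look at some of the popular approaches here.

First, we will look at a technique called GrabCut. It is an image segmentation method based on a more general approach called graph cuts. In the graph-cut method, we treat the entire image as a graph and then segment the graph based on the strength of its edges. We construct the graph by treating each pixel as a node and connecting neighboring nodes with edges, where the edge weight is a function of the pixel values of those two nodes. Wherever there is a boundary, neighboring pixel values differ strongly, so the edge weight across the boundary is low, which makes the boundary a cheap place to cut the graph. The graph is then segmented by minimizing its Gibbs energy. This is similar to finding the maximum entropy segmentation. You can refer to the original paper on this page to learn more.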
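To make the edge-weight idea concrete, this sketch computes contrast-sensitive weights between horizontal neighbors in the form used by the GrabCut paper's smoothness term, w = exp(-beta * (z_p - z_q)^2) with beta = 1 / (2 * mean((z_p - z_q)^2)), omitting the paper's constant factor and the vertical and diagonal neighbors. The helper name horizontal_nlink_weights is hypothetical; this is not OpenCV's internal code:

```python
import numpy as np

def horizontal_nlink_weights(gray):
    """Contrast-sensitive weights between each pixel and its right neighbor."""
    z = gray.astype(np.float64)
    diff2 = (z[:, 1:] - z[:, :-1]) ** 2
    mean_diff2 = diff2.mean()
    # beta adapts the weights to the overall contrast of the image
    beta = 1.0 / (2.0 * mean_diff2) if mean_diff2 > 0 else 0.0
    return np.exp(-beta * diff2)

# Two flat regions (around 10 and around 200) separated by one boundary
gray = np.array([[10, 12, 200, 202]], dtype=np.uint8)
w = horizontal_nlink_weights(gray)
print(np.round(w, 3))
```

Edges inside each flat region get weights close to 1, while the edge crossing the boundary gets a much smaller weight, so a minimum cut prefers to separate the regions there.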

Let's consider the following image:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/47a34ddd-a2db-4d61-8ecc-13cf848a4199.jpg]

Let's select the region of interest:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/a85b7791-5704-4486-95e5-fbdf7f6fdbe5.png]

Once the image has been segmented, it will look like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/b62e10ec-35e8-489a-b978-b7e42ac8633c.png]

The following is the code to do this:

import sys
import cv2 
import numpy as np 

# Draw rectangle based on the input selection 
def draw_rectangle(event, x, y, flags, params): 
    global x_init, y_init, drawing, top_left_pt, bottom_right_pt, img_orig 

    # Detecting mouse button down event 
    if event == cv2.EVENT_LBUTTONDOWN: 
        drawing = True 
        x_init, y_init = x, y 

    # Detecting mouse movement 
    elif event == cv2.EVENT_MOUSEMOVE: 
        if drawing: 
            top_left_pt, bottom_right_pt = (x_init,y_init), (x,y) 
            img[y_init:y, x_init:x] = 255 - img_orig[y_init:y, x_init:x]
            cv2.rectangle(img, top_left_pt, bottom_right_pt, (0,255,0), 2) 

    # Detecting mouse button up event 
    elif event == cv2.EVENT_LBUTTONUP: 
        drawing = False 
        top_left_pt, bottom_right_pt = (x_init,y_init), (x,y) 
        img[y_init:y, x_init:x] = 255 - img[y_init:y, x_init:x] 
        cv2.rectangle(img, top_left_pt, bottom_right_pt, (0,255,0), 2)
        rect_final = (x_init, y_init, x-x_init, y-y_init) 

        # Run Grabcut on the region of interest 
        run_grabcut(img_orig, rect_final) 

# Grabcut algorithm 
def run_grabcut(img_orig, rect_final): 
    # Initialize the mask 
    mask = np.zeros(img_orig.shape[:2],np.uint8) 

    # Extract the rectangle and set the region of 
    # interest in the above mask 
    x,y,w,h = rect_final 
    mask[y:y+h, x:x+w] = 1 

    # Initialize background and foreground models 
    bgdModel = np.zeros((1,65), np.float64) 
    fgdModel = np.zeros((1,65), np.float64) 

    # Run Grabcut algorithm 
    cv2.grabCut(img_orig, mask, rect_final, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_RECT) 

    # Extract new mask: pixels labeled background (0) or probable
    # background (2) become 0, everything else becomes 1
    mask2 = np.where((mask==2)|(mask==0),0,1).astype('uint8') 

    # Apply the above mask to the image 
    img_orig = img_orig*mask2[:,:,np.newaxis] 

    # Display the image 
    cv2.imshow('Output', img_orig) 

if __name__=='__main__': 
    drawing = False 
    top_left_pt, bottom_right_pt = (-1,-1), (-1,-1) 

    # Read the input image 
    img_orig = cv2.imread(sys.argv[1]) 
    img = img_orig.copy() 

    cv2.namedWindow('Input') 
    cv2.setMouseCallback('Input', draw_rectangle) 

    while True: 
        cv2.imshow('Input', img) 
        c = cv2.waitKey(1) 
        if c == 27: 
            break 

    cv2.destroyAllWindows()

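How does it work?

We start with the seed points specified by the user: the bounding box that contains the object we are interested in. Under the hood, the algorithm estimates the color distributions of the object and the background. The algorithm represents the color distribution of the image as a Gaussian Mixture Markov Random Field (GMMRF). You can refer to the detailed paper on this page to learn more about GMMRF. We need the color distributions of both the object and the background, because we will use this knowledge to separate the object. This information is used to find the maximum entropy segmentation by applying the min-cut algorithm to the Markov random field. Once we have this, we use the graph-cut optimization method to infer the labels.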

Watershed algorithm

OpenCV comes with a default implementation of the watershed algorithm. The algorithm is based on the idea that any grayscale image can be viewed as a topographic surface, where high intensities denote peaks and hills and low intensities denote valleys. This algorithm is quite famous, and there are many implementations of it.

Consider the following image:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/c4cadede-a133-4d49-b6e1-d554234b5e00.jpg]

Let's select the regions based on their topographic surface:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/db6b57e3-0f29-4b16-9b6f-847d0141d710.png]

If we run the watershed algorithm on this, the output will look like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/cb16da87-c963-4d04-abc5-eafc8f827eed.png]

The sample code, along with many other applications of the watershed algorithm, can be found at the link given earlier.
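In OpenCV, the marker-based version is called as cv2.watershed(image, markers), which floods from the user-supplied markers and sets boundary pixels to -1. To show only the flooding idea, here is a minimal priority-flood sketch in the spirit of Meyer's algorithm; the function watershed_flood is hypothetical, works on a single-channel "height" array, and unlike cv2.watershed it does not mark boundary pixels:

```python
import heapq
import numpy as np

def watershed_flood(surface, markers):
    """Minimal marker-based flooding on a 2-D grid (4-connectivity).

    surface: 2-D array of heights (e.g. a gradient-magnitude image).
    markers: int array with >0 for seed labels and 0 for unknown pixels.
    Pixels where two floods meet keep the label that reached them first.
    """
    labels = markers.copy()
    rows, cols = surface.shape
    heap, counter = [], 0          # counter breaks ties in insertion order
    for r, c in zip(*np.nonzero(markers)):
        heapq.heappush(heap, (surface[r, c], counter, r, c))
        counter += 1
    while heap:
        # Always expand from the lowest pixel reached so far
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr, nc] == 0:
                labels[nr, nc] = labels[r, c]
                heapq.heappush(heap, (surface[nr, nc], counter, nr, nc))
                counter += 1
    return labels

# Two valleys separated by a ridge of height 9 in column 3
surface = np.tile(np.array([0, 1, 2, 9, 2, 1, 0]), (3, 1))
markers = np.zeros((3, 7), dtype=np.int32)
markers[:, 0] = 1
markers[:, 6] = 2
labels = watershed_flood(surface, markers)
print(labels)
```

Each basin is claimed by the marker sitting in it, and the two floods meet on the ridge, which is where a watershed line would be drawn.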

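Summary

In this chapter, we learned about contour analysis and image segmentation. We learned how to match shapes based on a template. We learned about various properties of shapes and how we can use them to identify different kinds of shapes. We discussed image segmentation and how we can use graph-based methods to segment regions in an image. We also briefly discussed the watershed transform.

In the next chapter, we are going to discuss how to track an object in a live video.

8. Object Tracking

In this chapter, we are going to learn about tracking an object in a live video. We will discuss the different characteristics that can be used to track an object. We will also learn about the different methods and techniques for object tracking.

By the end of this chapter, you will know:

How to use frame differencing
How to use colorspaces to track colored objects
How to build an interactive object tracker
How to build a feature tracker
How to build a video surveillance system

Frame differencing

This is probably the simplest technique we can use to see which parts of a video are moving. When we consider a live video stream, the difference between consecutive frames gives us a lot of information. The concept is fairly straightforward! We just take the difference between consecutive frames and display it. The frame_diff() function in this section goes one step further: it computes the difference between the previous and current frames and between the current and next frames, and combines the two with a bitwise AND.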
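Why combine two differences? The following NumPy sketch (three_frame_diff is hypothetical) mimics the three-frame idea on a one-row "video" in which a bright pixel moves one step per frame. It adds an explicit threshold, which frame_diff() does not apply, so that the result is a clean binary mask:

```python
import numpy as np

def three_frame_diff(prev_frame, cur_frame, next_frame, threshold=25):
    """Motion mask: pixels that changed both prev->cur and cur->next."""
    # Widen to int16 so subtracting uint8 values cannot wrap around
    p, c, n = (f.astype(np.int16) for f in (prev_frame, cur_frame, next_frame))
    d1 = np.abs(n - c)
    d2 = np.abs(c - p)
    return ((d1 > threshold) & (d2 > threshold)).astype(np.uint8) * 255

prev_frame = np.array([[0, 255, 0, 0, 0]], dtype=np.uint8)
cur_frame = np.array([[0, 0, 255, 0, 0]], dtype=np.uint8)
next_frame = np.array([[0, 0, 0, 255, 0]], dtype=np.uint8)
mask = three_frame_diff(prev_frame, cur_frame, next_frame)
print(mask)
```

Each single difference lights up two positions (where the object was and where it is), but the AND keeps only the position present in both, the object's location in the current frame.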

If I move quickly from left to right, we will see something like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/3fb8de3d-4303-4ae1-8dbc-71a00d7f4a3b.png]

As you can see in the preceding image, only the moving parts of the video get highlighted. This gives us a good starting point to see which areas of the video are moving. Here is the code to do this:

import cv2 

# Compute the frame difference 
def frame_diff(prev_frame, cur_frame, next_frame): 
    # Absolute difference between current frame and next frame 
    diff_frames1 = cv2.absdiff(next_frame, cur_frame) 

    # Absolute difference between current frame and previous frame 
    diff_frames2 = cv2.absdiff(cur_frame, prev_frame) 

    # Return the result of bitwise 'AND' between the 
    # above two resultant images to obtain a mask where
    # only the areas with white pixels are shown
    return cv2.bitwise_and(diff_frames1, diff_frames2) 

# Capture the frame from webcam 
def get_frame(cap, scaling_factor): 
    # Capture the frame 
    ret, frame = cap.read() 

    # Resize the image 
    frame = cv2.resize(frame, None, fx=scaling_factor, 
            fy=scaling_factor, interpolation=cv2.INTER_AREA) 

    return frame

if __name__=='__main__': 
    cap = cv2.VideoCapture(0) 
    scaling_factor = 0.5

    cur_frame, prev_frame, next_frame = None, None, None
    while True:
        frame = get_frame(cap, scaling_factor)
        prev_frame = cur_frame 
        cur_frame = next_frame
        # Convert frame to grayscale image (webcam frames are BGR)
        next_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        if prev_frame is not None:
            cv2.imshow("Object Movement", frame_diff(prev_frame, cur_frame, next_frame)) 

        key = cv2.waitKey(delay=10) 
        if key == 27: 
            break 

    cap.release()
    cv2.destroyAllWindows()

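A delay of 10 milliseconds (cv2.waitKey) is used so that there is enough time between frames to produce real, noticeable differences.

Colorspace-based tracking

Frame differencing gives us some useful information, but we cannot use it to build anything meaningful. In order to build a good object tracker, we need to understand which characteristics can be used to make our tracking robust and accurate. So, let's take a step in that direction and see how we can use colorspaces to come up with a good tracker. As we discussed in previous chapters, the HSV colorspace is very informative when it comes to human perception. We can convert an image to the HSV space and then use colorspace thresholding to track a given object.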
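To see what the threshold actually tests, here is a small sketch of the per-pixel logic using the standard library's colorsys. It assumes OpenCV's 8-bit HSV convention (hue stored as degrees divided by 2, so 0 to 179; saturation and value 0 to 255); rounding may differ slightly from cv2.cvtColor, and bgr_to_opencv_hsv and in_range are hypothetical helpers, with in_range mirroring what cv2.inRange computes:

```python
import colorsys
import numpy as np

def bgr_to_opencv_hsv(bgr):
    """Approximate OpenCV's 8-bit HSV for one BGR pixel (H 0-179, S and V 0-255)."""
    b, g, r = (ch / 255.0 for ch in bgr)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)      # all three in [0, 1]
    # OpenCV stores hue as degrees / 2 so that it fits in 8 bits
    return np.array([round(h * 180) % 180, round(s * 255), round(v * 255)])

def in_range(hsv_pixels, lower, upper):
    """Like cv2.inRange: 255 where every channel lies within [lower, upper]."""
    hsv_pixels = np.asarray(hsv_pixels)
    inside = np.all((hsv_pixels >= lower) & (hsv_pixels <= upper), axis=-1)
    return inside.astype(np.uint8) * 255

pixels_bgr = [(255, 0, 0), (0, 0, 255), (0, 255, 0)]   # blue, red, green
hsv = np.array([bgr_to_opencv_hsv(p) for p in pixels_bgr])
mask = in_range(hsv, np.array([100, 100, 100]), np.array([130, 255, 255]))
print(hsv.tolist(), mask.tolist())
```

Pure blue lands at hue 120 and green at 60 on this scale, so a narrow range such as 100 to 130 isolates blue, while the broader 60 to 180 range used in the next example also admits greens and violets.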

Consider the following frame in a video:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/92761528-d91a-4ead-810a-475dd56eec7d.png]

If you run it through the colorspace filter and track the object, you will see something like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/8eff3765-8cb3-4e1c-aee7-c4fab8339208.png]

As we can see here, our tracker recognizes a particular object in the video based on its color characteristics. In order to use this tracker, we need to know the color distribution of our target object. The following is the code:

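import cv2 
import numpy as np 

# Capture a frame from the webcam and resize it
# (same helper as in the frame differencing example)
def get_frame(cap, scaling_factor):
    ret, frame = cap.read()
    frame = cv2.resize(frame, None, fx=scaling_factor,
            fy=scaling_factor, interpolation=cv2.INTER_AREA)
    return frame
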
if __name__=='__main__': 
    cap = cv2.VideoCapture(0) 
    scaling_factor = 0.5 

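    # Define the HSV range to keep. OpenCV hue runs from 0 to 179, so hue 60-180
    # spans green, cyan, blue (about 120) and magenta; narrow it to isolate blue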
    lower = np.array([60,100,100]) 
    upper = np.array([180,255,255]) 

    while True: 
        frame = get_frame(cap, scaling_factor) 

        # Convert to the HSV color space 
        hsv_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV) 

        # Threshold the HSV image to keep only pixels inside [lower, upper] 
        mask = cv2.inRange(hsv_frame, lower, upper) 

        # Bitwise-AND mask and original image 
        res = cv2.bitwise_and(frame, frame, mask=mask) 
        res = cv2.medianBlur(res, ksize=5) 

        cv2.imshow('Original image', frame) 
        cv2.imshow('Color Detector', res) 

        # Check if the user pressed ESC key 
        c = cv2.waitKey(delay=10) 
        if c == 27: 
            break 

    cap.release()
    cv2.destroyAllWindows()

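Building an interactive object tracker

A color-space-based tracker gives us the freedom to track colored objects, but it also restricts us to predefined colors. What if we just want to pick an object at random? How do we build an object tracker that learns the characteristics of the selected object and tracks it automatically? This is where the CAMShift algorithm, which stands for Continuously Adaptive Meanshift, comes into the picture. It is basically an improved version of the Meanshift algorithm.

The concept of Meanshift is actually simple. Suppose we select a region of interest and we want our object tracker to track that object. In this region, we select a bunch of points based on the color histogram and compute the centroid. If the centroid lies at the center of this region, we know that the object hasn't moved. But if the centroid is not at the center of this region, then we know that the object is moving in some direction. The movement of the centroid controls the direction in which the object moves. So, we move the bounding box to a new location so that the new centroid becomes the center of this bounding box. That is why this algorithm is called mean shift: the mean (that is, the centroid) is shifting. This way, we keep ourselves updated with the current location of the object.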
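The iteration just described (compute the weighted centroid inside the window, re-center the window on it, repeat until it stops moving) can be sketched in pure Python. This is a hypothetical illustration on a made-up weight map, not OpenCV's cv2.meanShift implementation; in the tracker, the weights come from the histogram back projection.

```python
import math


def window_centroid(weights, x, y, w, h):
    """Weighted centroid (cx, cy) of the pixels inside the window, or None if empty."""
    total = sx = sy = 0.0
    for row in range(y, y + h):
        for col in range(x, x + w):
            p = weights[row][col]
            total += p
            sx += p * col
            sy += p * row
    if total == 0:
        return None
    return sx / total, sy / total


def mean_shift(weights, window, max_iter=10):
    """Shift a fixed-size window (x, y, w, h) until its center sits on the centroid."""
    x, y, w, h = window
    rows, cols = len(weights), len(weights[0])
    for _ in range(max_iter):
        c = window_centroid(weights, x, y, w, h)
        if c is None:          # nothing to follow inside the window
            break
        # Top-left corner that centers the window on the centroid, kept inside the image
        new_x = min(max(math.floor(c[0] - (w - 1) / 2 + 0.5), 0), cols - w)
        new_y = min(max(math.floor(c[1] - (h - 1) / 2 + 0.5), 0), rows - h)
        if (new_x, new_y) == (x, y):   # converged: the mean no longer shifts
            break
        x, y = new_x, new_y
    return x, y, w, h


# Made-up 10x10 weight map with a 3x3 blob at rows/cols 5..7
weights = [[1 if 5 <= r <= 7 and 5 <= c <= 7 else 0 for c in range(10)] for r in range(10)]
print(mean_shift(weights, (3, 3, 3, 3)))   # (5, 5, 3, 3)
```

Note that the window size never changes here, which is exactly the limitation discussed next.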

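However, the problem with Meanshift is that it doesn't allow the size of the bounding box to change. When you move an object away from the camera, it appears smaller to the human eye, but Meanshift doesn't take that into account. The size of the bounding box stays the same throughout the tracking session. This is why we need CAMShift. The advantage of CAMShift is that it can adapt the size of the bounding box to the size of the object. In addition, it can also keep track of the object's orientation.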
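The adaptive size and orientation come from image moments of the probability image. The following pure-Python sketch shows the general idea: it derives a blob's center, orientation and extent from its zeroth-, first- and second-order moments. The helper blob_ellipse and the data are hypothetical, and cv2.CamShift applies its own scaling to these quantities, so treat this as an illustration rather than OpenCV's exact implementation.

```python
import math


def blob_ellipse(weights):
    """Centroid, orientation (degrees) and axis lengths of a weighted blob,
    computed from its zeroth-, first- and second-order image moments."""
    m00 = m10 = m01 = 0.0
    for r, row in enumerate(weights):
        for c, p in enumerate(row):
            m00 += p
            m10 += p * c
            m01 += p * r
    cx, cy = m10 / m00, m01 / m00
    mu20 = mu02 = mu11 = 0.0
    for r, row in enumerate(weights):
        for c, p in enumerate(row):
            mu20 += p * (c - cx) ** 2
            mu02 += p * (r - cy) ** 2
            mu11 += p * (c - cx) * (r - cy)
    a, b, d = mu20 / m00, mu02 / m00, mu11 / m00   # covariance of the blob
    angle = math.degrees(0.5 * math.atan2(2 * d, a - b))
    # Eigenvalues of the covariance matrix: spread along the major and minor axes
    half_gap = math.sqrt(((a - b) / 2) ** 2 + d ** 2)
    lam_major, lam_minor = (a + b) / 2 + half_gap, (a + b) / 2 - half_gap
    axes = (2 * math.sqrt(lam_major), 2 * math.sqrt(max(lam_minor, 0.0)))
    return (cx, cy), angle, axes


# A horizontal bar and a vertical bar (made-up data)
horizontal = [[1 if r == 2 else 0 for c in range(9)] for r in range(5)]
vertical = [[1 if c == 2 else 0 for c in range(5)] for r in range(9)]
print(blob_ellipse(horizontal))   # angle close to 0, major axis much longer than minor
print(blob_ellipse(vertical))     # angle close to 90
```

The angle is measured from the x axis in image coordinates, where y points down.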

Let's consider the following frame, where the object is highlighted in orange (the box in my hand):

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/59da5971-702e-4a60-9a44-74dc732af878.png]

Now that we have selected the object, the algorithm computes the histogram back projection and extracts all the information. Let's move the object and see how it's being tracked:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/1cc303f9-9af3-47eb-9226-104d71abfed8.png]

It looks like the object is being tracked fairly well. Let's change the orientation and see if the tracking is maintained:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/17a86061-be3a-4a84-9425-990cb3fbfce8.png]

As we can see, the bounding ellipse has changed both its location and its orientation. Let's change the perspective of the object and see if it's still able to track it:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/ba67d9bd-590e-49ca-af1d-4b57f80ed501.png]

We are still good! The bounding ellipse has changed its aspect ratio to reflect the fact that the object now looks skewed (because of the perspective transformation).

Here is the code:

import sys 
import cv2 
import numpy as np 

class ObjectTracker(): 
    def __init__(self): 
        # Initialize the video capture object 
        # 0 -> indicates that frame should be captured 
        # from webcam 
        self.cap = cv2.VideoCapture(0) 

        # Capture the frame from the webcam 
        ret, self.frame = self.cap.read() 

        # Downsampling factor for the input frame 
        self.scaling_factor = 0.8 
        self.frame = cv2.resize(self.frame, None, fx=self.scaling_factor, fy=self.scaling_factor, interpolation=cv2.INTER_AREA) 

        cv2.namedWindow('Object Tracker') 
        cv2.setMouseCallback('Object Tracker', self.mouse_event) 

        self.selection = None 
        self.drag_start = None 
        self.tracking_state = 0 

    # Method to track mouse events 
    def mouse_event(self, event, x, y, flags, param): 
        x, y = np.int16([x, y]) 

        # Detecting the mouse button down event 
        if event == cv2.EVENT_LBUTTONDOWN: 
            self.drag_start = (x, y) 
            self.tracking_state = 0 

        if self.drag_start:
            if event == cv2.EVENT_MOUSEMOVE:
                h, w = self.frame.shape[:2] 
                xo, yo = self.drag_start 
                x0, y0 = np.maximum(0, np.minimum([xo, yo], [x, y])) 
                x1, y1 = np.minimum([w, h], np.maximum([xo, yo], [x, y])) 
                self.selection = None 

                if x1-x0 > 0 and y1-y0 > 0:
                    self.selection = (x0, y0, x1, y1) 

            elif event == cv2.EVENT_LBUTTONUP:
                self.drag_start = None 
                if self.selection is not None: 
                    self.tracking_state = 1 

    # Method to start tracking the object 
    def start_tracking(self): 
        # Iterate until the user presses the Esc key 
        while True: 
            # Capture the frame from webcam 
            ret, self.frame = self.cap.read() 
            # Resize the input frame 
            self.frame = cv2.resize(self.frame, None, fx=self.scaling_factor, fy=self.scaling_factor, interpolation=cv2.INTER_AREA) 

            vis = self.frame.copy() 

            # Convert to HSV color space 
            hsv = cv2.cvtColor(self.frame, cv2.COLOR_BGR2HSV) 

            # Create the mask based on predefined thresholds. 
            mask = cv2.inRange(hsv, np.array((0., 60., 32.)), np.array((180., 255., 255.))) 

            if self.selection: 
                x0, y0, x1, y1 = self.selection 
                self.track_window = (x0, y0, x1-x0, y1-y0) 
                hsv_roi = hsv[y0:y1, x0:x1] 
                mask_roi = mask[y0:y1, x0:x1] 

                # Compute the histogram 
                hist = cv2.calcHist( [hsv_roi], [0], mask_roi, [16], [0, 180] ) 

                # Normalize and reshape the histogram 
                cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX) 
                self.hist = hist.reshape(-1) 

                vis_roi = vis[y0:y1, x0:x1] 
                cv2.bitwise_not(vis_roi, vis_roi) 
                vis[mask == 0] = 0 

            if self.tracking_state == 1:
                self.selection = None 

                # Compute the histogram back projection 
                prob = cv2.calcBackProject([hsv], [0], self.hist, [0, 180], 1) 

                prob &= mask 
                term_crit = ( cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1 ) 

                # Apply CAMShift on 'prob' 
                track_box, self.track_window = cv2.CamShift(prob, self.track_window, term_crit) 

                # Draw an ellipse around the object 
                cv2.ellipse(vis, track_box, (0, 255, 0), 2) 

            cv2.imshow('Object Tracker', vis) 

            c = cv2.waitKey(delay=5) 
            if c == 27: 
                break 

        self.cap.release()
        cv2.destroyAllWindows() 

if __name__ == '__main__': 
    ObjectTracker().start_tracking()
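In start_tracking, cv2.calcHist builds a 16-bin hue histogram of the selected region, cv2.normalize scales it to 0..255, and cv2.calcBackProject turns each new frame into a map of how well every pixel's hue matches that histogram; CAMShift then climbs that map. Here is a pure-Python sketch of the histogram and back-projection steps. It assumes hue values in OpenCV's 0..179 range and ignores the saturation/value mask; the helper names are hypothetical.

```python
def hue_histogram(roi_hues, bins=16, hue_range=180):
    """Hue histogram of a region, min-max scaled to 0..255 like cv2.normalize(..., NORM_MINMAX)."""
    hist = [0] * bins
    for h in roi_hues:                      # hues assumed to lie in 0..179
        hist[h * bins // hue_range] += 1
    lo, hi = min(hist), max(hist)
    if hi == lo:
        return [0.0] * bins
    return [255.0 * (v - lo) / (hi - lo) for v in hist]


def back_project(frame_hues, hist, bins=16, hue_range=180):
    """Replace every pixel's hue with the (scaled) histogram count of its bin."""
    return [[hist[h * bins // hue_range] for h in row] for row in frame_hues]


roi = [95, 96, 97, 100]              # made-up hues of the selected object (all in bin 8)
hist = hue_histogram(roi)
frame = [[95, 10],
         [170, 100]]
print(back_project(frame, hist))     # [[255.0, 0.0], [0.0, 255.0]]
```

Pixels whose hue falls in the object's dominant bins become bright, and everything else goes dark, which is the probability image passed to cv2.CamShift.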

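Feature-based tracking

Feature-based tracking refers to tracking individual feature points across successive frames of a video. We use a technique called optical flow to track these features. Optical flow is one of the most popular techniques in computer vision. We select a bunch of feature points and track them through the video stream.

When we detect the feature points, we compute the displacement vectors and show the motion of those keypoints between consecutive frames. These vectors are called motion vectors. There are many ways to do this, but the Lucas-Kanade method is perhaps the most popular of them all. You can learn more about it in the official OpenCV documentation.

We start the process by extracting the feature points. For each feature point, we create a 3x3 patch with the feature point at its center. The assumption here is that all the points within each patch have a similar motion. We can adjust the size of this window depending on the problem at hand.

For each feature point in the current frame, we take the surrounding 3x3 patch as our reference. For this patch, we search its neighborhood in the previous frame for the best match. This neighborhood is usually bigger than 3x3, because we want to find the patch that is closest to the one under consideration. The path from the center pixel of the matched patch in the previous frame to the center pixel of the patch under consideration in the current frame then becomes the motion vector. We do this for all the feature points and extract all the motion vectors.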
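The patch search just described can be sketched as an exhaustive block match that scores candidates with the sum of squared differences. This is a hypothetical illustration of the idea on tiny made-up frames. It is not the Lucas-Kanade solver inside cv2.calcOpticalFlowPyrLK, which estimates the displacement from image gradients instead of trying every offset.

```python
def ssd(a, b):
    """Sum of squared differences between two equally sized patches."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def patch(img, cx, cy, r=1):
    """(2r+1) x (2r+1) patch centered on column cx, row cy; r=1 gives 3x3."""
    return [row[cx - r:cx + r + 1] for row in img[cy - r:cy + r + 1]]


def motion_vector(prev_img, cur_img, cx, cy, search=2, r=1):
    """Best match of the current patch in the previous frame's neighborhood;
    returns (dx, dy) from the matched center to (cx, cy)."""
    ref = patch(cur_img, cx, cy, r)
    rows, cols = len(prev_img), len(prev_img[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = cx + dx, cy + dy
            if px - r < 0 or py - r < 0 or px + r >= cols or py + r >= rows:
                continue                      # candidate patch would leave the image
            cost = ssd(ref, patch(prev_img, px, py, r))
            if best is None or cost < best[0]:
                best = (cost, px, py)
    _, px, py = best
    return cx - px, cy - py


# Made-up 7x7 frames: a bright pixel moves 2 columns to the right
prev_img = [[0] * 7 for _ in range(7)]
cur_img = [[0] * 7 for _ in range(7)]
prev_img[3][2] = 255
cur_img[3][4] = 255
print(motion_vector(prev_img, cur_img, cx=4, cy=3))   # (2, 0)
```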

Let's consider the following frame:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/0b1086b0-d018-4c8e-9cf0-4a21bcfef236.png]

If I move in the horizontal direction, you will see motion vectors along the horizontal direction:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/c9b177e8-d398-414d-b996-c9c15b74e9db.png]

If I move away from the webcam, you will see something like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/47033fa1-7766-48e7-9d3b-ab92f50e93b8.png]

First, we implement a function that uses the previous frame to compute the new positions of the tracked feature points in the current frame (and therefore their motion vectors):

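# Note: 'tracking_params' is the dict created in the __main__ block below;
# it is a module-level variable, so this function can read it directly
def compute_feature_points(tracking_paths, prev_img, current_img):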
    feature_points = [tp[-1] for tp in tracking_paths]
    # Vector of 2D points for which the flow needs to be found
    feature_points_0 = np.float32(feature_points).reshape(-1, 1, 2) 

    feature_points_1, status_1, err_1 = cv2.calcOpticalFlowPyrLK(prev_img, current_img, \
        feature_points_0, None, **tracking_params) 
    feature_points_0_rev, status_2, err_2 = cv2.calcOpticalFlowPyrLK(current_img, prev_img, \
        feature_points_1, None, **tracking_params)

    # Compute the difference of the feature points 
    diff_feature_points = abs(feature_points_0-feature_points_0_rev).reshape(-1, 2).max(-1) 

    # threshold and keep only the good points 
    good_points = diff_feature_points < 1
    return feature_points_1.reshape(-1, 2), good_points
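compute_feature_points runs cv2.calcOpticalFlowPyrLK twice, from the previous frame to the current one and back again, and keeps only the points whose round trip ends less than 1 pixel from where it started. Here is that filtering step written on plain tuples, using a hypothetical helper and made-up coordinates:

```python
def forward_backward_filter(points_0, points_0_rev, max_error=1.0):
    """Keep a point only if tracking it forward and then backward brings it
    back within max_error pixels of its start (largest error over x and y)."""
    flags = []
    for (x0, y0), (xr, yr) in zip(points_0, points_0_rev):
        flags.append(max(abs(x0 - xr), abs(y0 - yr)) < max_error)
    return flags


# Made-up positions: the first point returns almost home, the second drifts 4 px
start = [(10.0, 10.0), (20.0, 5.0)]
after_round_trip = [(10.3, 9.8), (24.0, 5.0)]
print(forward_backward_filter(start, after_round_trip))   # [True, False]
```

A point that cannot be followed reliably (for example because it was occluded or lies in a texture-less region) usually fails this round trip, which makes the check a cheap way to drop bad tracks.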

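Now we can implement the tracking method. Given the region of interest, and based on the feature points obtained with the preceding function, it displays the motion vectors (tracking paths):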

# Extract area of interest based on the tracking_paths
# In case there is none, entire frame is used
def calculate_region_of_interest(frame, tracking_paths):
    mask = np.zeros_like(frame) 
    mask[:] = 255 
    for x, y in [np.int32(tp[-1]) for tp in tracking_paths]: 
        cv2.circle(mask, (x, y), 6, 0, -1) 
    return mask

def add_tracking_paths(frame, tracking_paths):
    mask = calculate_region_of_interest(frame, tracking_paths)

    # Extract good features to track. You can learn more 
    # about the parameters here: http://goo.gl/BI2Kml 
    feature_points = cv2.goodFeaturesToTrack(frame, mask = mask, maxCorners = 500, \
        qualityLevel = 0.3, minDistance = 7, blockSize = 7) 

    if feature_points is not None: 
        for x, y in np.float32(feature_points).reshape(-1, 2): 
            tracking_paths.append([(x, y)])

def start_tracking(cap, scaling_factor, num_frames_to_track, num_frames_jump, tracking_params): 
    tracking_paths = [] 
    frame_index = 0 

    # Iterate until the user presses the ESC key 
    while True: 
        # read the input frame 
        ret, frame = cap.read() 

        # downsample the input frame 
        frame = cv2.resize(frame, None, fx=scaling_factor, fy=scaling_factor,
                           interpolation=cv2.INTER_AREA) 

        frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) 
        output_img = frame.copy() 

        if len(tracking_paths) > 0: 
            prev_img, current_img = prev_gray, frame_gray
            # Compute feature points using optical flow. You can 
            # refer to the documentation to learn more about the 
            # parameters here: http://goo.gl/t6P4SE
            feature_points, good_points = compute_feature_points(tracking_paths, \
                prev_img, current_img)

            new_tracking_paths = []
            for tp, (x, y), good_points_flag in \
                zip(tracking_paths, feature_points, good_points): 
                if not good_points_flag: continue 

                tp.append((x, y)) 

                # Using the queue structure i.e. first in, first out 
                if len(tp) > num_frames_to_track: del tp[0] 

                new_tracking_paths.append(tp) 

                # draw green circles on top of the output image 
                cv2.circle(output_img, (int(x), int(y)), 3, (0, 255, 0), -1)  # center must be integer pixels

            tracking_paths = new_tracking_paths 

            # draw green lines on top of the output image 
            point_paths = [np.int32(tp) for tp in tracking_paths]
            cv2.polylines(output_img, point_paths, False, (0, 150, 0)) 

        # Detect new feature points only on every 'n'th frame 
        if not frame_index % num_frames_jump: 
            add_tracking_paths(frame_gray, tracking_paths)

        frame_index += 1 
        prev_gray = frame_gray 

        cv2.imshow('Optical Flow', output_img) 

        # Check if the user pressed the ESC key 
        c = cv2.waitKey(1) 
        if c == 27: 
            break
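start_tracking keeps each tracking path as a list and deletes tp[0] once the path grows beyond num_frames_to_track. The same first-in, first-out buffer can be expressed with collections.deque and its maxlen argument. This is an alternative sketch with made-up positions, not what the code above uses; if you swap it in, convert the deque with list() before handing it to NumPy or OpenCV.

```python
from collections import deque

num_frames_to_track = 5
# A deque with maxlen drops its oldest entry automatically when it is full
path = deque(maxlen=num_frames_to_track)
for frame_index in range(8):
    path.append((2.0 * frame_index, 10.0))   # made-up positions of one feature point
print(list(path))
# [(6.0, 10.0), (8.0, 10.0), (10.0, 10.0), (12.0, 10.0), (14.0, 10.0)]
```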

Here is how to run optical-flow-based tracking with the preceding code:

import cv2 
import numpy as np 

if __name__ == '__main__': 
    # Capture the input frame 
    cap = cv2.VideoCapture(1) 

    # Downsampling factor for the image 
    scaling_factor = 0.5 

    # Number of frames to keep in the buffer when you 
    # are tracking. If you increase this number, 
    # feature points will have more "inertia" 
    num_frames_to_track = 5 

    # Look for new feature points only every 'n' frames; running the
    # detector on every frame would be slower
    num_frames_jump = 2 

    # 'winSize' refers to the size of each patch. These patches 
    # are the smallest blocks on which we operate and track 
    # the feature points. You can read more about the parameters 
    # here: http://goo.gl/ulwqLk 
    tracking_params = dict(winSize = (11, 11), maxLevel = 2, \
        criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03)) 

    start_tracking(cap, scaling_factor, num_frames_to_track, \
        num_frames_jump, tracking_params) 
    cv2.destroyAllWindows()

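So, if you want to play around with it, you can let the user select a region of interest in the input video (as we did earlier). You can then extract feature points from this region of interest and track the object by drawing a bounding box around it. It will be a fun exercise!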

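Background subtraction

Background subtraction is very useful in video surveillance. Basically, the background subtraction technique performs really well in cases where we have to detect moving objects in a static scene. As the name indicates, this algorithm works by detecting the background and subtracting it from the current frame to obtain the foreground, that is, the moving objects.

To detect moving objects, we first need to build a model of the background. This is not the same as frame differencing, because we are actually modeling the background and using this model to detect moving objects. This makes it perform much better than the simple frame differencing technique. The technique tries to detect the static parts of the scene and then include them in the background model, so it is an adaptive technique that adjusts according to the scene.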
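The idea of a background model that slowly absorbs whatever stays still can be illustrated with a simple running average. This sketch is an assumption made for illustration only; it is not the Gaussian mixture model that MOG2 uses. Here learning_rate = 1/history mirrors the learningRate=1.0/history argument in the MOG2 code of this section, and the pixel values are made up.

```python
def update_background(background, frame, learning_rate):
    """Running average: each update moves the model a small step toward the frame."""
    return [(1 - learning_rate) * b + learning_rate * f for b, f in zip(background, frame)]


def foreground_mask(background, frame, threshold=30):
    """255 where the frame differs clearly from the background model, else 0."""
    return [255 if abs(f - b) > threshold else 0 for b, f in zip(background, frame)]


history = 100
learning_rate = 1.0 / history
background = [50.0, 50.0, 50.0]        # made-up 3-pixel scene
frame = [50, 200, 50]                  # a new object appears at pixel 1

print(foreground_mask(background, frame))   # [0, 255, 0]
for _ in range(500):                        # the object then stays still
    background = update_background(background, frame, learning_rate)
print(foreground_mask(background, frame))   # [0, 0, 0]
```

A larger history means a smaller learning rate, so a stationary object takes longer to fade into the background.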

Let's consider the following image:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/c6d577b8-7af8-4968-9f1b-ec11780fb541.png]

Now, as we gather more frames of this scene, every part of the image gradually becomes part of the background model. This is what we discussed earlier. If the scene is static, the model adapts itself to make sure the background model stays up to date. This is how it looks at the beginning:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/36084281-6bcb-443e-a3dd-58a5a4013b06.png]

Notice how my face has already become part of the background model (the blackened region). The following screenshot shows what we will see after a few seconds. If we keep going, everything eventually becomes part of the background model:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/682921d7-7142-4ecc-ba10-c04085f19e04.png]

Now, if we introduce a new moving object, it will be detected clearly, as shown here:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/54696675-ad75-42ad-ad04-313733a1f4a5.png]

Here is the code to do this:

import cv2 
import numpy as np 

# Capture the input frame 
def get_frame(cap, scaling_factor=0.5): 
    ret, frame = cap.read() 

    # Resize the frame 
    frame = cv2.resize(frame, None, fx=scaling_factor, 
            fy=scaling_factor, interpolation=cv2.INTER_AREA) 

    return frame 

if __name__=='__main__': 
    # Initialize the video capture object 
    cap = cv2.VideoCapture(1) 

    # Create the background subtractor object 
    bgSubtractor = cv2.createBackgroundSubtractorMOG2()

    # This factor controls the learning rate of the algorithm. 
    # The learning rate refers to the rate at which your model 
    # will learn about the background. Higher value for 
    # 'history' indicates a slower learning rate. You 
    # can play with this parameter to see how it affects 
    # the output. 
    history = 100 

    # Iterate until the user presses the ESC key 
    while True: 
        frame = get_frame(cap, 0.5) 

        # Apply the background subtraction model to the input frame 
        mask = bgSubtractor.apply(frame, learningRate=1.0/history)

        # Convert the single-channel mask to 3-channel BGR so it can be
        # combined with the color frame 
        mask = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR) 

        cv2.imshow('Input frame', frame)
        cv2.imshow('Moving Objects MOG', mask & frame)

        # Check if the user pressed the ESC key 
        c = cv2.waitKey(delay=30) 
        if c == 27: 
            break 

    cap.release() 
    cv2.destroyAllWindows()

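In the preceding example, we used the background subtractor BackgroundSubtractorMOG2, a background/foreground segmentation algorithm based on a mixture of Gaussians. In this algorithm, every background pixel is modeled by a mixture of Gaussian distributions, and each color gets a weight that represents how long it has stayed in the scene. In this way, the colors that remain static are used to define the background. The next example uses a different subtractor, BackgroundSubtractorGMG, and cleans its output with a morphological opening: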

if __name__=='__main__': 
    # Initialize the video capture object 
    cap = cv2.VideoCapture(1) 

    # Create the background subtractor object (the bgsegm module
    # ships with the opencv-contrib-python package)
    bgSubtractor = cv2.bgsegm.createBackgroundSubtractorGMG()
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, ksize=(3,3))

    # Iterate until the user presses the ESC key 
    while True: 
        frame = get_frame(cap, 0.5) 

        # Apply the background subtraction model to the input frame
        mask = bgSubtractor.apply(frame)
        # Removing noise from background
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

        cv2.imshow('Input frame', frame)
        cv2.imshow('Moving Objects', mask)

        # Check if the user pressed the ESC key 
        c = cv2.waitKey(delay=30) 
        if c == 27: 
            break 

    cap.release() 
    cv2.destroyAllWindows()

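Other options may work better in some situations, for example when image noise has to be removed, as in the BackgroundSubtractorGMG example. If you want to learn more about them, visit this page.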
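The cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel) call in the GMG example removes isolated foreground specks. Opening is an erosion followed by a dilation. Here is a pure-Python sketch on a made-up binary mask. It assumes a square kernel (the code above uses a 3x3 elliptical one) and simply ignores out-of-bounds neighbors; the helper names are hypothetical.

```python
def _neighbours(mask, y, x, r):
    """Values of the in-bounds pixels in the (2r+1)x(2r+1) square around (y, x)."""
    h, w = len(mask), len(mask[0])
    return [mask[y + dy][x + dx]
            for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if 0 <= y + dy < h and 0 <= x + dx < w]


def erode(mask, r=1):
    """A pixel stays set only if all its in-bounds neighbors are set."""
    return [[255 if all(_neighbours(mask, y, x, r)) else 0 for x in range(len(mask[0]))]
            for y in range(len(mask))]


def dilate(mask, r=1):
    """A pixel becomes set if any in-bounds neighbor is set."""
    return [[255 if any(_neighbours(mask, y, x, r)) else 0 for x in range(len(mask[0]))]
            for y in range(len(mask))]


def opening(mask, r=1):
    """Erosion followed by dilation: removes specks smaller than the kernel."""
    return dilate(erode(mask, r), r)


# Made-up 7x7 mask: a 3x3 moving object plus one isolated noise pixel
mask = [[0] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        mask[y][x] = 255
mask[0][6] = 255
cleaned = opening(mask)
print(cleaned[0][6], cleaned[3][3])   # 0 255
```

The erosion wipes out the single noise pixel and shrinks the object to its core, and the dilation then grows the core back to its original 3x3 shape.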

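Summary

In this chapter, we learned about object tracking. We learned how to get motion information using frame differencing, and how it can be limiting when we want to track different types of objects. We learned about color space thresholding and how it can be used to track colored objects. We discussed clustering techniques for object tracking and how to build an interactive object tracker using the CAMShift algorithm. We discussed how to track features in a video and how optical flow can be used to do this. We learned about background subtraction and how it can be used for video surveillance.

In the next chapter, we are going to discuss object recognition and how to build a visual search engine.

9. Object Recognition

In this chapter, we are going to learn about object recognition and how to use it to build a visual search engine. We will discuss feature detection, building feature vectors, and using machine learning to build a classifier. We will learn how to put these different blocks together to build an object recognition system.

By the end of this chapter, you will know:

The difference between object detection and object recognition
What a dense feature detector is
What a visual dictionary is
How to build a feature vector
What supervised and unsupervised learning are
What Support Vector Machines are and how to use them to build a classifier
How to recognize an object in an unknown image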

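Object detection versus object recognition

Before we proceed, we need to understand what we are going to discuss in this chapter. You have probably heard the terms "object detection" and "object recognition" many times, and they are often mistaken for the same thing. There is a very clear distinction between the two.

Object detection refers to detecting the presence of a particular object in a given scene. We don't know what the object might be. For instance, we discussed face detection in Chapter 4, Detecting and Tracking Different Body Parts. During that discussion, we only detected whether a face was present in the given image. We didn't recognize the person! The reason we didn't recognize the person is that we didn't care about that in our discussion; our goal was to find the location of the face in the given image. Commercial face recognition systems use both face detection and face recognition to identify a person. First, we need to locate the face, and then we run the face recognizer on the cropped face.

Object recognition is the process of identifying an object in a given image. For instance, an object recognition system can tell you whether a given image contains a dress or a pair of shoes. In fact, we can train an object recognition system to identify many different objects. The problem is that object recognition is a really difficult problem to solve. It has eluded computer vision researchers for decades and has become the holy grail of computer vision. Humans can identify a wide variety of objects very easily. We do it every day, effortlessly, but computers are not able to do it with that kind of accuracy.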

Let's consider the following image of a latte cup:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/3d4c1e20-0ed4-43e4-9227-31c43a428800.jpg]

An object detector will give you the following information:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/3c60d162-056c-4d90-a6f3-a8fa9e7da033.png]

Now, consider the following image of a teacup:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/da49384a-e097-4a97-b357-306c423f7379.jpg]

If you run it through an object detector, you will see the following result:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/a9377f64-c517-48ac-80ba-d08ed618cb20.png]

As you can see, the object detector detects the presence of the teacup, and nothing more. If you train an object recognizer, it will give you the following information, as shown in the following image:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/31acf73f-234f-45a0-a5ae-69da042cb882.png]

If you consider the second image, it will give you the following information:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/649f4c71-894d-49ac-855d-75651974035c.png]

As you can see, a perfect object recognizer would give you all the information associated with the object. An object recognizer works more accurately if it knows where the object is located. If you have a big image and the cup is just a small part of it, the object recognizer might not be able to recognize it. Hence, the first step is to detect the object and get the bounding box. Once we have that, we can run the object recognizer to extract more information.

What is a dense feature detector?

To extract a meaningful amount of information from images, we need to make sure our feature extractor extracts features from all parts of a given image. Consider the following image:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/22ec105f-d998-40b3-82da-16c4dfbfa94e.jpg]

If you extract features using a feature extractor, as we did in Chapter 5, Extracting Features from an Image, it will look like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/8605e054-e604-451e-88cc-add76ab9b2aa.png]

Unfortunately, if you have ever used the cv2.FeatureDetector_create("Dense") detector, it has been removed from OpenCV 3.2 onwards, so we need to implement our own method that walks over a grid and collects the keypoints:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/9571e6b1-88ad-41f7-91c2-7229d2bd4c75.png]

We can control the density as well. Let's make it sparse:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/63713862-03b3-4d2a-bb62-5b9adb2f8674.png]

By doing this, we make sure that every single part of the image gets processed. Here is the code to do it:

import sys
import cv2 
import numpy as np 

class DenseDetector(): 
    def __init__(self, step_size=20, feature_scale=20, img_bound=20): 
        # Create a dense feature detector 
        self.initXyStep = step_size
        self.initFeatureScale = feature_scale
        self.initImgBound = img_bound

    def detect(self, img):
        keypoints = []
        rows, cols = img.shape[:2]
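        # x runs along the columns and y along the rows; points are spaced
        # by the step size and all share the same feature scale (diameter)
        for x in range(self.initImgBound, cols, self.initXyStep):
            for y in range(self.initImgBound, rows, self.initXyStep):
                keypoints.append(cv2.KeyPoint(float(x), float(y), self.initFeatureScale))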
        return keypoints 

class SIFTDetector():
    def __init__(self):
        # OpenCV 3.x contrib API; from OpenCV 4.4 on, cv2.SIFT_create() also works
        self.detector = cv2.xfeatures2d.SIFT_create()

    def detect(self, img):
        # Convert to grayscale 
        gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) 
        # Detect keypoints using SIFT 
        return self.detector.detect(gray_image, None) 

if __name__=='__main__': 
    input_image = cv2.imread(sys.argv[1]) 
    input_image_dense = np.copy(input_image)
    input_image_sift = np.copy(input_image)

    keypoints = DenseDetector(20,20,5).detect(input_image)
    # Draw keypoints on top of the input image 
    input_image_dense = cv2.drawKeypoints(input_image_dense, keypoints, None,
        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS) 
    # Display the output image 
    cv2.imshow('Dense feature detector', input_image_dense) 

    keypoints = SIFTDetector().detect(input_image)
    # Draw SIFT keypoints on the input image 
    input_image_sift = cv2.drawKeypoints(input_image_sift, keypoints, None,\
        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS) 
    # Display the output image 
    cv2.imshow('SIFT detector', input_image_sift) 

    # Wait until user presses a key 
    cv2.waitKey()

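This gives us tight control over the amount of information that gets extracted. When we use a SIFT detector, some parts of the image are ignored. That works well when we want to deal with prominent features, but when we are building an object recognizer, we need to evaluate all parts of the image. Hence, we use a dense detector and then extract features from those keypoints.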
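Stripped of OpenCV, the grid logic of a dense detector is just a regular lattice of keypoint centers. The helper dense_grid below is hypothetical; a real detector wraps each point in a cv2.KeyPoint of fixed size. Halving the step roughly quadruples the number of keypoints, which is how the density is controlled.

```python
def dense_grid(rows, cols, step=20, bound=5):
    """(x, y) centers of a regular grid: x along the columns, y along the rows."""
    return [(x, y)
            for y in range(bound, rows, step)
            for x in range(bound, cols, step)]


points = dense_grid(rows=60, cols=100, step=20, bound=5)
print(len(points), points[0], points[-1])            # 15 (5, 5) (85, 45)
print(len(dense_grid(rows=60, cols=100, step=10)))   # 60
```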

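What is a visual dictionary?

We will use the Bag of Words model to build our object recognizer. Each image is represented as a histogram of visual words. These visual words are basically the N centroids built from the features of all the keypoints extracted from the training images. The pipeline is shown in the following image:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/02a1f092-a5f4-4525-8229-9ec2e4b10ec6.png]

From each training image, we detect a set of keypoints and extract features for each of those keypoints. Every image gives rise to a different number of keypoints. To train a classifier, each image must be represented by a fixed-length feature vector. This feature vector is simply a histogram, where each bin corresponds to a visual word.

Once we have extracted the features of all the keypoints in the training images, we run k-means clustering on them and extract N centroids. N is the length of the feature vector of a given image. Each image is now represented as a histogram, where each bin corresponds to one of the N centroids. For simplicity, let's say that N is set to four. Now, from a given image, we extract K keypoints. Out of these K keypoints, some will be closest to the first centroid, some will be closest to the second centroid, and so on. So, we build a histogram based on the closest centroid to each keypoint. This histogram becomes our feature vector. This process is called vector quantization.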
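Here is a toy, pure-Python version of the quantization step with N = 4 made-up 2-D centroids and K = 5 made-up descriptors (real SIFT descriptors are 128-D). It does what the Quantizer class later does with kmeans.predict and its normalize method; the helper names are hypothetical.

```python
def nearest_centroid(vec, centroids):
    """Index of the centroid with the smallest squared Euclidean distance to vec."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, centroids[i])))


def bow_histogram(descriptors, centroids):
    """Fixed-length visual-word histogram, normalized to sum to 1."""
    hist = [0] * len(centroids)
    for d in descriptors:
        hist[nearest_centroid(d, centroids)] += 1
    total = sum(hist)
    return [count / total for count in hist] if total else hist


centroids = [(0, 0), (10, 0), (0, 10), (10, 10)]          # N = 4 visual words
descriptors = [(1, 1), (0, 2), (9, 1), (11, 9), (2, 0)]    # K = 5 keypoints in one image
print(bow_histogram(descriptors, centroids))   # [0.6, 0.2, 0.0, 0.2]
```

However many descriptors an image produces, the result always has N entries, which is what lets images with different keypoint counts share one classifier.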

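To understand vector quantization, let's consider an example. Assume that we have an image and we've extracted a certain number of feature points from it. Our goal is to represent this image in the form of a feature vector. Consider the following image:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/9cc32392-4e42-4882-a2a9-41ae9bb11312.png]

As you can see, we have four centroids. Bear in mind that the points shown in the figure represent the feature space, not the actual geometric positions of those feature points in the image. They are shown this way in the preceding figure so that they are easy to visualize. Points from many different geometric locations in an image can be close to each other in the feature space. Our goal is to represent this image as a histogram, where each bin corresponds to one of these centroids. This way, no matter how many feature points we extract from an image, it will always be converted to a feature vector of fixed length. So, we assign each feature point to its nearest centroid, as shown in the following image:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/334229e1-9573-400b-b12a-85f78f630aab.png]

If you build a histogram for this image, it will look like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/facf508d-ff1b-42b5-9259-6de794cd8f5b.png]

Now, if you consider a different image with a different distribution of feature points, it will look like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/40e15180-084d-4e6c-8faa-abb4d67a95ad.png]

The clusters look like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/2ed0f7bf-a08c-40de-8884-9542321d3dfa.png]

The histogram looks like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/9bdd86ca-35ba-45b4-879f-0225e28da747.png]

As you can see, the histograms of the two images are very different, even though the points seem to be randomly distributed. This is a very powerful technique, and it's widely used in computer vision and signal processing. There are many different ways to do this, and the accuracy depends on how fine-grained you want it to be. If you increase the number of centroids, you will be able to represent the image better, which increases the uniqueness of your feature vector. Having said that, it's important to mention that you cannot just keep increasing the number of centroids indefinitely. If you do, the representation becomes too noisy and loses its discriminative power.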

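What are supervised and unsupervised learning?

If you are familiar with the basics of machine learning, you certainly know what supervised and unsupervised learning are.

As a quick refresher, supervised learning refers to building a function based on labeled samples. For example, if we are building a system to separate images of clothing from images of footwear, we first need to build a database and label it. We need to tell our algorithm which images correspond to clothing and which correspond to footwear. Based on this data, the algorithm learns how to identify clothing and footwear, so that when an unknown image comes in, it can recognize what's in that image.

Unsupervised learning is the opposite of what we just discussed: there is no labeled data. Let's say we have a bunch of images, and we just want to separate them into three groups. We don't know what the criteria will be. So, an unsupervised learning algorithm will try to separate the given dataset into three groups in the best possible way. The reason we are discussing this is that we will combine supervised and unsupervised learning to build our object recognition system.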
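k-means, which this chapter uses (through sklearn.cluster.KMeans) to build the visual dictionary, is the unsupervised part of the pipeline. Here is a minimal pure-Python sketch of Lloyd's algorithm on made-up, unlabeled 2-D points. The deterministic farthest-point initialization is a simplification chosen for reproducibility. It is not what scikit-learn does: scikit-learn uses k-means++ initialization by default and restarts several times (n_init).

```python
def _sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Lloyd's algorithm: assign points to the nearest center, move centers to the mean."""
    # Deterministic farthest-point start (a simple stand-in for k-means++)
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(_sq_dist(p, c) for c in centres)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: _sq_dist(p, centres[j]))
            clusters[nearest].append(p)
        new_centres = [tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centres[j]
                       for j, c in enumerate(clusters)]
        if new_centres == centres:     # assignments stopped changing
            break
        centres = new_centres
    return centres


# Made-up, unlabeled 2-D points that happen to form three groups
points = [(0, 0), (1, 0), (0, 1),
          (10, 10), (11, 10), (10, 11),
          (20, 0), (21, 0), (20, 1)]
print(sorted((round(x, 2), round(y, 2)) for x, y in kmeans(points, 3)))
# [(0.33, 0.33), (10.33, 10.33), (20.33, 0.33)]
```

No labels were given, yet the algorithm finds the three groups; in the object recognizer, the resulting centers become the visual words.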

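What are Support Vector Machines?

Support Vector Machines (SVMs) are supervised learning models that are very popular in the field of machine learning. SVMs are really good at analyzing labeled data and detecting patterns. Given a bunch of data points and their associated labels, an SVM builds the separating hyperplane in the best possible way.

Wait a minute, what is a hyperplane? To understand that, let's consider the following figure:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/f0e71a05-27cb-46f3-893a-1e8d16b86928.png]

As you can see, the points are separated by a line boundary that is equidistant from the closest points of the two classes. This is easy to visualize in two dimensions. In three dimensions, the separator would be a plane. When we build features for images, the length of the feature vectors can be in the six-digit range. So, when we go into such a high-dimensional space, the equivalent of a line is a hyperplane. Once the hyperplane is formulated, we use this mathematical model to classify unknown data based on which side of the hyperplane it falls on.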
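Once a hyperplane (w, b) is known, classifying a point only needs the sign of w·x + b. The same formula applies in 2-D or in a space with hundreds of thousands of dimensions. During training, an SVM chooses w and b so that the closest points of both classes lie as far from the hyperplane as possible. The sketch below uses a hand-picked hyperplane rather than a trained one, and the helper names are hypothetical.

```python
import math


def decision(w, b, x):
    """Signed score w.x + b of a linear classifier; its sign gives the side."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def distance_to_hyperplane(w, b, x):
    """Geometric (signed) distance from x to the hyperplane w.x + b = 0."""
    return decision(w, b, x) / math.sqrt(sum(wi * wi for wi in w))


# Hand-picked (not trained) hyperplane: the line x0 + x1 = 10 in 2-D
w, b = (1.0, 1.0), -10.0
for point in [(8, 7), (1, 2)]:
    label = 1 if decision(w, b, point) >= 0 else -1
    print(point, label, round(distance_to_hyperplane(w, b, point), 3))
# (8, 7) 1 3.536
# (1, 2) -1 -4.95
```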

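What if we cannot separate the data with a simple straight line?

We use something called the kernel trick in SVMs. Consider the following figure:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/edb9192f-ac2c-4ea3-9b22-2ae1736be64a.png]

As we can see, we cannot draw a simple straight line to separate the red points from the blue points. Coming up with a nicely curved boundary that satisfies all the points is prohibitively expensive. SVMs are really good at drawing straight lines, so what's our answer here? The good thing about SVMs is that they can draw these straight lines in any number of dimensions. So technically, if you project these points into a higher-dimensional space where they can be separated by a simple hyperplane, the SVM will come up with an exact boundary. Once we have that boundary, we can project it back to the original space. The projection of this hyperplane onto our original, lower-dimensional space looks curved, as we can see in the following figure:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/86bbb177-e521-4c63-aa4f-d3b964e8b836.png]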
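A kernel SVM never computes the lifted coordinates explicitly (avoiding that is the point of the kernel trick), but an explicit map makes the idea easy to check. The sketch below uses a hand-picked feature map and made-up points; it is an illustration, not how OpenCV or scikit-learn implement kernels.

```python
def lift(point):
    """Explicit feature map (x, y) -> (x, y, x^2 + y^2)."""
    x, y = point
    return (x, y, x * x + y * y)


# Made-up data: one class near the origin, the other on a ring around it.
# The origin lies inside the triangle (3, 0), (0, 3), (-2, -2), so no straight
# line in 2-D can separate the two sets.
inner = [(0, 0), (1, 0), (0, -1), (0.5, 0.5)]
outer = [(3, 0), (0, 3), (-2, -2), (2.5, 1)]

# In 3-D, the flat plane z = 4 separates them ...
print(all(lift(p)[2] < 4 for p in inner), all(lift(p)[2] > 4 for p in outer))   # True True
# ... and that plane, mapped back to 2-D, is the curved boundary x^2 + y^2 = 4 (a circle)
```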

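The topic of SVMs is really deep, and we won't be able to discuss it in detail here. If you are really interested, there is a ton of material available online, and working through a simple tutorial will help you understand it better. The official OpenCV documentation also contains examples that can help you understand SVMs.

How do we actually implement this?

We have now reached the core. The preceding discussion was necessary because it gives you the background needed to build an object detection and recognition system. Now, let's build an object recognizer that can recognize whether a given image contains a dress, a pair of shoes, or a bag. We can easily extend this system to detect any number of items. We are starting with three distinct items so that you can experiment with it later.

Before we start, we need to make sure that we have a set of training images. There are many databases available online where the images are already arranged into groups. Caltech256 is perhaps one of the most popular databases for object recognition. You can download it from here. Create a folder called images, and inside it create three subfolders: dress, footwear, and bag. In each subfolder, add 20 images of the corresponding item (the loader used later only picks up .jpg files). You can simply download these images from the internet, but make sure they have a clean background.

For example, a dress image would look like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/8068a975-e341-451d-b46d-91d62850089b.jpg]

A footwear image would look like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/8f7380f7-a91b-453d-865f-ac42a6016527.jpg]

A bag image would look like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/17dc3e2a-d046-43c1-afff-86d1fafd6a11.png]

Now that we have 60 training images, we are ready to start. As a side note, object recognition systems actually need tens of thousands of training images to perform well in the real world. Since we are building an object recognizer for only three types of objects, we will take just 20 training images per object. Adding more training images will increase the accuracy and robustness of our system.

The first step is to extract feature vectors from all the training images and build the visual dictionary (also known as the codebook).

First, we reuse our previous DenseDetector class, together with a SIFT feature extractor: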

class SIFTExtractor():
    def __init__(self):
        self.extractor = cv2.xfeatures2d.SIFT_create()

    def compute(self, image, kps): 
        if image is None: 
            raise TypeError("Not a valid image")

        gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) 
        # Compute SIFT descriptors at the given (dense) keypoints instead of
        # letting SIFT pick its own keypoints
        kps, des = self.extractor.compute(gray_image, kps) 
        return kps, des

Then, our Quantizer class performs the vector quantization and builds the feature vector:

from sklearn.cluster import KMeans 

# Vector quantization 
class Quantizer(object): 
    def __init__(self, num_clusters=32): 
        self.num_dims = 128 
        self.extractor = SIFTExtractor() 
        self.num_clusters = num_clusters 
        self.num_retries = 10 

    def quantize(self, datapoints): 
        # Create KMeans object 
        kmeans = KMeans(self.num_clusters, 
                        n_init=max(self.num_retries, 1), 
                        max_iter=10, tol=1.0) 

        # Run KMeans on the datapoints 
        res = kmeans.fit(datapoints) 

        # Extract the centroids of those clusters 
        centroids = res.cluster_centers_ 

        return kmeans, centroids 

    def normalize(self, input_data): 
        sum_input = np.sum(input_data) 
        if sum_input > 0: 
            return input_data / sum_input 
        else: 
            return input_data 

    # Extract feature vector from the image 
    def get_feature_vector(self, img, kmeans, centroids): 
        kps = DenseDetector().detect(img) 
        kps, fvs = self.extractor.compute(img, kps) 
        labels = kmeans.predict(fvs) 
        fv = np.zeros(self.num_clusters) 

        for i, item in enumerate(fvs): 
            fv[labels[i]] += 1 

        fv_image = np.reshape(fv, ((1, fv.shape[0]))) 
        return self.normalize(fv_image)

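Reusing the previous implementations, the other class we need is FeatureExtractor, which extracts the features of each image and computes the centroids from them: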

class FeatureExtractor(object): 
    def extract_image_features(self, img): 
        # Dense feature detector 
        kps = DenseDetector().detect(img) 

        # SIFT feature extractor 
        kps, fvs = SIFTExtractor().compute(img, kps) 

        return fvs 

    # Extract the centroids from the feature points 
    def get_centroids(self, input_map, num_samples_to_fit=10): 
        kps_all = [] 

        count = 0 
        cur_label = '' 
        for item in input_map: 
            if count >= num_samples_to_fit: 
                if cur_label != item['label']: 
                    count = 0 
                else: 
                    continue 

            count += 1 

            if count == num_samples_to_fit: 
                print("Built centroids for", item['label'])

            cur_label = item['label'] 
            img = cv2.imread(item['image']) 
            img = resize_to_size(img, 150) 

            num_dims = 128 
            fvs = self.extract_image_features(img) 
            kps_all.extend(fvs) 

        kmeans, centroids = Quantizer().quantize(kps_all) 
        return kmeans, centroids 

    def get_feature_vector(self, img, kmeans, centroids): 
        return Quantizer().get_feature_vector(img, kmeans, centroids)

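The following script builds the codebook (visual dictionary) and the feature map that we will use to classify images later:

########################
# create_features.py
# Assumes DenseDetector, SIFTExtractor, Quantizer and FeatureExtractor
# from above are defined in (or imported into) this file
########################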

import os 
import sys 
import argparse 
import json 

import cv2 
import numpy as np 

import pickle 
# In case of Python 2.7 use:
# import cPickle as pickle

def build_arg_parser(): 
    parser = argparse.ArgumentParser(description='Creates features for given images')
    parser.add_argument("--samples", dest="cls", nargs="+", action="append", required=True,\
        help="Folders containing the training images.\nThe first element needs to be the class label.") 
    parser.add_argument("--codebook-file", dest='codebook_file', required=True, 
        help="Base file name to store the codebook") 
    parser.add_argument("--feature-map-file", dest='feature_map_file', required=True,\
        help="Base file name to store the feature map") 

    return parser 

# Loading the images from the input folder 
def load_input_map(label, input_folder): 
    combined_data = [] 

    if not os.path.isdir(input_folder): 
        raise IOError("The folder " + input_folder + " doesn't exist") 

    # Parse the input folder and assign the labels 
    for root, dirs, files in os.walk(input_folder): 
        for filename in (x for x in files if x.endswith('.jpg')): 
            combined_data.append({'label': label, 'image': 
             os.path.join(root, filename)}) 

    return combined_data 

def extract_feature_map(input_map, kmeans, centroids): 
    feature_map = [] 

    for item in input_map: 
        temp_dict = {} 
        temp_dict['label'] = item['label'] 

        print("Extracting features for", item['image'])
        img = cv2.imread(item['image']) 
        img = resize_to_size(img, 150) 

        temp_dict['feature_vector'] = FeatureExtractor().get_feature_vector(img, kmeans, centroids) 

        if temp_dict['feature_vector'] is not None: 
            feature_map.append(temp_dict) 

    return feature_map 

# Resize the shorter dimension to 'new_size' 
# while maintaining the aspect ratio 
def resize_to_size(input_image, new_size=150): 
    h, w = input_image.shape[0], input_image.shape[1] 
    ds_factor = new_size / float(h) 

    if w < h: 
        ds_factor = new_size / float(w) 

    new_size = (int(w * ds_factor), int(h * ds_factor)) 
    return cv2.resize(input_image, new_size) 

if __name__=='__main__': 
    args = build_arg_parser().parse_args() 

    input_map = [] 
    for cls in args.cls:
        assert len(cls) >= 2, "Format for classes is `<label> file`" 
        label = cls[0] 
        input_map += load_input_map(label, cls[1]) 

    # Building the codebook 
    print("===== Building codebook =====")
    kmeans, centroids = FeatureExtractor().get_centroids(input_map) 
    if args.codebook_file: 
        with open(args.codebook_file, 'wb') as f: 
            print('kmeans', kmeans)
            print('centroids', centroids)
            pickle.dump((kmeans, centroids), f) 

    # Input data and labels 
    print("===== Building feature map =====")
    feature_map = extract_feature_map(input_map, kmeans, centroids)
    if args.feature_map_file: 
        with open(args.feature_map_file, 'wb') as f: 
            pickle.dump(feature_map, f)

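What's happening inside the code?

The first thing we need to do is extract the centroids. This is how we build our visual dictionary, and the get_centroids method in the FeatureExtractor class is designed to do it. We keep collecting the image features extracted from keypoints until we have enough of them. Since we are using a dense detector, 10 images should be enough: each image produces a large number of features, and adding more feature points would not change the centroids much.

Once we've extracted the centroids, we can move on to the next step, feature extraction. The set of centroids is our visual dictionary. The extract_feature_map function extracts a feature vector from each image and associates it with the corresponding label. We need this mapping to train our classifier: the classifier needs a set of data points, and each data point must be associated with a label. So we start from an image, extract its feature vector, and associate it with the corresponding label (such as bag, dress, or footwear).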

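The Quantizer class is designed to perform vector quantization and build the feature vector. For each keypoint extracted from an image, the get_feature_vector method finds the closest visual word in our dictionary. By doing this, we end up building a histogram over our visual dictionary, so each image is represented as a combination of visual words. Hence the name, bag of words.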
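To make the quantization step concrete, here is a minimal NumPy sketch. It is not the book's Quantizer or FeatureExtractor code: it assigns each descriptor to its nearest centroid by Euclidean distance and counts the assignments into a normalized histogram. The function name bow_histogram and the toy data are hypothetical.

```python
import numpy as np

def bow_histogram(descriptors, centroids):
    """Map each descriptor to its nearest visual word and return a normalized histogram.

    descriptors: (n, d) array of local features from one image
    centroids:   (k, d) array of visual words (the codebook)
    """
    descriptors = np.asarray(descriptors, dtype=np.float64)
    centroids = np.asarray(centroids, dtype=np.float64)
    # Squared Euclidean distance between every descriptor and every centroid: shape (n, k)
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Index of the closest visual word for each descriptor
    words = dists.argmin(axis=1)
    # Count how often each visual word occurs in the image
    hist = np.bincount(words, minlength=len(centroids)).astype(np.float64)
    # Normalize so that images with different numbers of keypoints are comparable
    return hist / max(hist.sum(), 1.0)

# Toy example: 3 visual words in 2D and 5 descriptors
centroids = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
descriptors = np.array([[1, 1], [9, 1], [11, -1], [0, 9], [0.5, 0.2]])
print(bow_histogram(descriptors, centroids))  # → [0.4 0.4 0.2]
```

The resulting fixed-length vector has one bin per visual word, which is what lets images with different numbers of keypoints be fed to the same classifier.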

The next step is to train a classifier using these features. To do this, we implement another class:

import numpy as np
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC
from sklearn import preprocessing

# To train the classifier
class ClassifierTrainer(object):
    def __init__(self, X, label_words):
        # Encoding the labels (words to numbers)
        self.le = preprocessing.LabelEncoder()

        # Initialize One versus One Classifier using a linear kernel
        self.clf = OneVsOneClassifier(LinearSVC(random_state=0))

        y = self._encodeLabels(label_words)
        X = np.asarray(X)
        self.clf.fit(X, y)

    # Predict the output class (as a number) for the input datapoints
    def _predict(self, X):
        X = np.asarray(X)
        return self.clf.predict(X)

    # Encode the labels (convert words to numbers)
    def _encodeLabels(self, labels_words):
        self.le.fit(labels_words)
        return np.array(self.le.transform(labels_words), dtype=np.float32)

    # Classify the input datapoints and return their label words
    def classify(self, X):
        labels_nums = self._predict(X)
        labels_words = self.le.inverse_transform([int(x) for x in labels_nums])
        return labels_words

Now, using the feature map we built earlier, we train the SVM and save it to a file:

###############
# training.py
###############

import os 
import sys 
import argparse 

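import _pickle as pickle
import numpy as np

# The ClassifierTrainer class shown above must be defined in this file as well:
# classify_data.py imports it with "from training import ClassifierTrainer"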

def build_arg_parser(): 
    parser = argparse.ArgumentParser(description='Trains the classifier models')
    parser.add_argument("--feature-map-file", dest="feature_map_file", required=True,\
        help="Input pickle file containing the feature map") 
    parser.add_argument("--svm-file", dest="svm_file", required=False,\
        help="Output file where the pickled SVM model will be stored") 
    return parser 

if __name__=='__main__': 
    args = build_arg_parser().parse_args() 
    feature_map_file = args.feature_map_file 
    svm_file = args.svm_file 

    # Load the feature map 
    with open(feature_map_file, 'rb') as f: 
        feature_map = pickle.load(f) 

    # Extract feature vectors and the labels 
    labels_words = [x['label'] for x in feature_map] 

    # Here, 0 refers to the first element in the 
    # feature_map, and 1 refers to the second 
    # element in the shape vector of that element 
    # (which gives us the size) 
    dim_size = feature_map[0]['feature_vector'].shape[1] 

    X = [np.reshape(x['feature_vector'], (dim_size,)) for x in feature_map] 

    # Train the SVM 
    svm = ClassifierTrainer(X, labels_words) 
    if args.svm_file: 
        with open(args.svm_file, 'wb') as f: 
            pickle.dump(svm, f)

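Note that we write and read the pickle files in binary mode, which is why we open them with the rb and wb modes.

How did we build the trainer?

We use the scikit-learn package to build the SVM model, along with scipy, which provides the mathematical optimization tools. You can install them as follows: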

    $ pip install scikit-learn scipy

We start with the labeled data and feed it to a OneVsOneClassifier that wraps a LinearSVC. The classify method then classifies an input image and associates a label with it.
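OneVsOneClassifier trains one binary classifier per pair of classes, k(k-1)/2 in total, and predicts the class that wins the most pairwise votes. The NumPy sketch below illustrates that voting scheme, together with the word-to-integer encoding that LabelEncoder performs (sorted unique labels). It is not scikit-learn's implementation: each pairwise classifier here is a hypothetical nearest-class-mean rule standing in for LinearSVC, and in this sketch ties go to the lowest class index.

```python
import itertools
import numpy as np

def encode_labels(words):
    # Like LabelEncoder: sorted unique labels, each word replaced by its index
    classes, y = np.unique(words, return_inverse=True)
    return classes, y

def fit_one_vs_one(X, y, n_classes):
    # One "binary classifier" per pair of classes; here simply the two class means
    models = {}
    for a, b in itertools.combinations(range(n_classes), 2):
        models[(a, b)] = (X[y == a].mean(axis=0), X[y == b].mean(axis=0))
    return models

def predict_one_vs_one(models, X, n_classes):
    votes = np.zeros((len(X), n_classes), dtype=int)
    for (a, b), (mean_a, mean_b) in models.items():
        # Each pairwise classifier votes for the closer class mean
        closer_to_a = (np.linalg.norm(X - mean_a, axis=1)
                       <= np.linalg.norm(X - mean_b, axis=1))
        votes[closer_to_a, a] += 1
        votes[~closer_to_a, b] += 1
    # argmax returns the first maximum, so ties go to the lower class index
    return votes.argmax(axis=1)

words = ["footwear", "bag", "dress", "bag", "footwear", "dress"]
X = np.array([[0, 5], [5, 0], [5, 5], [6, 0], [0, 6], [5, 6]], dtype=float)
classes, y = encode_labels(words)
models = fit_one_vs_one(X, y, len(classes))
pred = predict_one_vs_one(models, np.array([[5.5, 0.5], [0.2, 5.8]]), len(classes))
# Indexing classes with the predictions plays the role of inverse_transform
print(classes[pred])  # → ['bag' 'footwear']
```

With three classes (bag, dress, footwear), this builds three pairwise models, which matches the number of estimators a one-vs-one wrapper trains.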

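Let's give it a try, shall we? Make sure you have a folder called images that contains the training images for the three classes. Create a folder called models, where the learned models will be stored. Run the following commands in the terminal to create the features and train the classifier: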

    $ python create_features.py --samples bag images/bag/ --samples dress images/dress/ \
        --samples footwear images/footwear/ --codebook-file models/codebook.pkl \
        --feature-map-file models/feature_map.pkl

    $ python training.py --feature-map-file models/feature_map.pkl --svm-file models/svm.pkl

Now that the classifier is trained, we just need a module that classifies an input image and detects the object in it:

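import create_features as cf
# Importing ClassifierTrainer is required: pickle needs the class definition
# to rebuild the trained object stored in svm.pkl
from training import ClassifierTrainer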

# Classifying an image 
class ImageClassifier(object): 
    def __init__(self, svm_file, codebook_file): 
        # Load the SVM classifier 
        with open(svm_file, 'rb') as f: 
            self.svm = pickle.load(f) 

        # Load the codebook 
        with open(codebook_file, 'rb') as f: 
            self.kmeans, self.centroids = pickle.load(f) 

    # Method to get the output image tag 
    def getImageTag(self, img): 
        # Resize the input image 
        img = cf.resize_to_size(img) 

        # Extract the feature vector 
        feature_vector = cf.FeatureExtractor().get_feature_vector(img, self.kmeans,
            self.centroids)

        # Classify the feature vector and get the output tag 
        image_tag = self.svm.classify(feature_vector) 

        return image_tag

The following script classifies the data. It tags an image based on our earlier training:

###############
# classify_data.py
###############
import os 
import sys 
import argparse 
import _pickle as pickle

import cv2 
import numpy as np 

def build_arg_parser():
    parser = argparse.ArgumentParser(description='Extracts features from the input image and classifies it')
    parser.add_argument("--input-image", dest="input_image", required=True,\
        help="Input image to be classified")
    parser.add_argument("--svm-file", dest="svm_file", required=True,\
        help="File containing the trained SVM model")
    parser.add_argument("--codebook-file", dest="codebook_file", required=True,\
        help="File containing the codebook")
    return parser

if __name__=='__main__':
    args = build_arg_parser().parse_args()
    svm_file = args.svm_file
    codebook_file = args.codebook_file
    input_image = cv2.imread(args.input_image)
    # cv2.imread returns None instead of raising when the file cannot be read
    if input_image is None:
        raise IOError("Unable to read the image " + args.input_image)

    tag = ImageClassifier(svm_file, codebook_file).getImageTag(input_image)
    print("Output class:", tag)

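We're all set! We simply extract the feature vector from the input image and use it as the input to the classifier. Let's go ahead and see whether this works. Download a random footwear image from the internet and make sure it has a clean background. Run the following command, replacing new_image.jpg with the right filename: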

    $ python classify_data.py --input-image new_image.jpg --svm-file models/svm.pkl \
        --codebook-file models/codebook.pkl

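We can use the same technique to build a visual search engine. A visual search engine looks at an input image and shows a set of images that are similar to it. We can reuse the object recognition framework to build one: extract the feature vector from the input image, compare it with all the feature vectors in the training dataset, pick the top matches, and display the results. This is a simple way of doing it!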
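The search step above can be sketched with NumPy as a brute-force nearest-neighbor query. This minimal illustration assumes that every image has already been reduced to a fixed-length feature vector, such as the bag-of-words histograms built earlier. The function name top_matches and the choice of Euclidean distance are hypothetical.

```python
import numpy as np

def top_matches(query, database, k=3):
    """Return the indices and distances of the k database vectors closest to the query."""
    database = np.asarray(database, dtype=np.float64)
    query = np.asarray(query, dtype=np.float64)
    # Euclidean distance from the query to every stored feature vector
    dists = np.linalg.norm(database - query, axis=1)
    # Sorting scans every image: fine for a demo, but linear in the database size
    order = np.argsort(dists, kind="stable")
    return order[:k], dists[order[:k]]

database = np.array([[0.4, 0.4, 0.2],
                     [0.0, 0.1, 0.9],
                     [0.5, 0.3, 0.2],
                     [0.1, 0.8, 0.1]])
idx, d = top_matches([0.48, 0.32, 0.2], database, k=2)
print(idx)  # → [2 0]
```

Because every query compares against every stored vector, this approach does not scale to very large collections.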

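In the real world, we have to deal with billions of images, so you cannot afford to search through every single image before displaying the output. Many algorithms are used to keep this efficient and fast at that scale. Deep learning is used extensively in this field and has shown a lot of promise in recent years. It is a branch of machine learning that focuses on learning optimal representations of data, so that it becomes easier for machines to learn new tasks. You can learn more about it at the following URL.

Summary

In this chapter, we learned how to build an object recognition system. We discussed the differences between object detection and object recognition in detail. We learned about the dense feature detector, visual dictionaries, and vector quantization, and how to use these concepts to build a feature vector. We discussed the concepts of supervised and unsupervised learning. We talked about support vector machines and how to use them to build a classifier. We learned how to recognize objects in an unknown image and how to extend that concept to build a visual search engine.

In the next chapter, we will discuss stereo imaging and 3D reconstruction. We will talk about how to build a depth map and extract 3D information from a given scene.

10. Augmented Reality

In this chapter, you will learn about augmented reality and how you can use it to build cool applications. We will discuss pose estimation and plane tracking. You will learn how to map coordinates from 3D to 2D and how to overlay graphics on top of a live video.

By the end of this chapter, you will know: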

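The premise of augmented reality
What pose estimation is
How to track a planar object
How to map coordinates from 3D to 2D
How to overlay graphics on top of a live video

What is the premise of augmented reality?

Before we jump into all the fun stuff, let's understand what augmented reality means. You have probably seen the term augmented reality used in a variety of contexts. So, before we start discussing implementation details, we should understand its premise. Augmented reality refers to superimposing computer-generated input, such as images, sounds, graphics, and text, on top of the real world.

Augmented reality tries to blur the line between what's real and what's computer-generated by seamlessly merging the information and enhancing what we see and feel. It is closely related to a concept called mediated reality, in which a computer modifies our view of reality. As a result, the technology works by enhancing our current perception of reality. The challenge is to make this look seamless to the user. It's easy to simply overlay something on top of the input video, but we need it to look like part of the video. The user should feel that the computer-generated input closely mirrors the real world. This is what we want to achieve when we build an augmented reality system.

In this context, computer vision research explores how to apply computer-generated imagery to live video streams so that we can enhance our perception of the real world. Augmented reality has a wide variety of applications, including, but not limited to, head-mounted displays, automobiles, data visualization, gaming, and architecture. Now that we have powerful smartphones and smarter machines, we can easily build high-end augmented reality applications.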

What does an augmented reality system look like?

Let's consider the following figure:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/daae4765-0d39-4752-9791-8653cf7415de.png]

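As we can see here, the camera captures real-world video to get the reference point. The graphics system generates the virtual objects that need to be overlaid on top of the video. The video-merging block is where all the magic happens: it has to be smart enough to work out how to overlay the virtual objects on top of the real world in the best way possible.

Geometric transformations for augmented reality

The results of augmented reality are amazing, but there is a lot of math going on underneath. Augmented reality uses many geometric transformations and the associated mathematical functions to make sure everything looks smooth. When dealing with live video, we need to register the virtual objects precisely on top of the real world. To understand this better, think of it as aligning two cameras: the real one, through which we see the world, and a virtual one, which projects the computer-generated graphical objects.

To build an augmented reality system, the following geometric transformations need to be established:

Object-to-scene: This transformation takes the 3D coordinates of a virtual object and expresses them in the coordinate frame of our real-world scene. This ensures that we place the virtual object in the right location.
Scene-to-camera: This transformation refers to the pose of the camera in the real world. By pose, we mean the orientation and position of the camera. We need to estimate the camera's point of view so that we know how to overlay the virtual object.
Camera-to-image: This refers to the calibration parameters of the camera. They define how we project a 3D object onto the 2D image plane, which is the image we actually end up seeing.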
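The three transformations can be chained as matrix products. The following NumPy sketch assumes a simple pinhole camera with no lens distortion. The matrices object_to_scene, scene_to_camera, and K, and all their values, are made up purely for illustration.

```python
import numpy as np

def rigid(R, t):
    # Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Object-to-scene: place the virtual object 2 units along x in the scene frame
object_to_scene = rigid(np.eye(3), [2.0, 0.0, 0.0])
# Scene-to-camera (the camera pose): the scene origin sits 10 units in front of the camera
scene_to_camera = rigid(np.eye(3), [0.0, 0.0, 10.0])
# Camera-to-image: intrinsic matrix with focal length 500 px and principal point (320, 240)
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

def project(point_object):
    p = np.append(np.asarray(point_object, dtype=np.float64), 1.0)  # homogeneous coordinates
    p_cam = scene_to_camera @ object_to_scene @ p                    # object -> scene -> camera
    uvw = K @ p_cam[:3]                                              # camera -> image plane
    return uvw[:2] / uvw[2]                                          # perspective division

print(project([0.0, 0.0, 0.0]))  # → [420. 240.]
```

Getting any one of the three links wrong misplaces the projected point, which is why each transformation has to be established separately.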

Consider the following figure:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/5bf66d42-f187-4231-b576-33c314740f34.png]

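As we can see here, the car is trying to fit into the scene, but it looks very artificial. If we don't transform the coordinates in the right way, the car looks unnatural. This is what we mean by the object-to-scene transformation! Once we have transformed the 3D coordinates of the virtual object into the coordinate frame of the real world, we need to estimate the pose of the camera: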

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/fd87fab1-cfd0-4f15-98c4-c2ce0b96a641.png]

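We need to know the position and rotation of the camera, because that determines what the user sees. Once we have estimated the camera pose, we are ready to put the 3D scene onto the 2D image: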

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/87feb898-9fa2-45fe-847e-504bda802234.png]

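Once we have established these transformations, we can build the complete system.

What is pose estimation?

Before we proceed, we need to understand how to estimate the camera pose. This is a critical step in an augmented reality system, and we need to get it right if we want the experience to be seamless. In augmented reality, we overlay graphics on top of objects in real time. To do that, we need to know the position and orientation of the camera, and we need to work them out quickly. This is where pose estimation becomes very important. If the pose is not tracked correctly, the overlaid graphics will look unnatural.

Consider the following figure: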

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/057708a9-f82f-40e2-b915-2c151287ab9c.png]

The arrow represents the surface normal. Let's say the object changes its orientation:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/b5f3ae4d-2d3d-4b95-8d30-d2805997565c.png]

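Now, even though the position is the same, the orientation has changed. We need to capture this information so that the overlaid graphics look natural, which means making sure the graphics are aligned with both this orientation and this position.

How to track planar objects

Now that you understand what pose estimation is, let's see how to use it to track planar objects. Let's consider the following planar object: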

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/5b9449b3-07dd-4afa-bed0-b784deb15782.jpg]

Now, if we extract feature points from this image, we will see something like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/6408a06c-8847-45d0-8d71-7b51c3eaa7e0.png]

Let's tilt the cardboard box:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/5c41d6f7-f680-45bf-af3b-d7155c386eeb.jpg]

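As we can see, the cardboard box is tilted in this image. If we want our virtual object to be overlaid on top of this surface, we need to gather this information about the tilt of the plane. One way to do it is to use the relative positions of the feature points. If we extract the feature points from the preceding image, it looks like this: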

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/0eee5729-143e-4fde-8669-6d4a9bb0233b.png]

As you can see, the feature points at the far end of the plane are closer together horizontally than the ones at the near end:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/1ae7a72a-092f-437e-9ccc-37deef66f6e5.png]

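So we can use this information to extract the orientation from the image. If you recall, we discussed perspective transformations in detail when we covered geometric transformations and panoramic imaging. All we need to do is take these two sets of points and extract the homography matrix, which tells us how the cardboard box has rotated.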
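Estimating a homography from point correspondences is what cv2.findHomography does, with RANSAC on top to reject outliers. To illustrate only the underlying math, here is a minimal direct linear transform (DLT) in NumPy. It has no coordinate normalization and no outlier handling, and the function names homography_dlt and apply_h are hypothetical.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H (3x3) with dst ~ H @ src from >= 4 point pairs via the DLT."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear equations in the 9 entries of H
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=np.float64)
    # The solution is the right singular vector with the smallest singular value
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_h(H, pts):
    # Map 2D points through H using homogeneous coordinates
    pts = np.asarray(pts, dtype=np.float64)
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

src = np.array([[0, 0], [10, 0], [10, 10], [0, 10]], dtype=np.float64)
H_true = np.array([[1.2, 0.1, 5.0], [0.05, 0.9, -3.0], [0.01, 0.02, 1.0]])
dst = apply_h(H_true, src)
H = homography_dlt(src, dst)
print(np.round(H, 3))
```

Four correspondences determine the eight degrees of freedom of a homography exactly. In practice, the many matched feature points and RANSAC make the estimate robust to bad matches.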

Consider the following figure:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/424365b2-49f6-4616-8fe0-bacda1a248eb.png]

First, we select the region of interest using the ROISelector class, and then we pass its coordinates to the PoseEstimator:

class ROISelector(object): 
    def __init__(self, win_name, init_frame, callback_func): 
        self.callback_func = callback_func 
        self.selected_rect = None 
        self.drag_start = None 
        self.tracking_state = 0
        event_params = {"frame": init_frame}
        cv2.namedWindow(win_name)
        cv2.setMouseCallback(win_name, self.mouse_event, event_params)

    def mouse_event(self, event, x, y, flags, param):
        x, y = np.int16([x, y]) 

        # Detecting the mouse button down event 
        if event == cv2.EVENT_LBUTTONDOWN: 
            self.drag_start = (x, y) 
            self.tracking_state = 0 

        if self.drag_start:
            if event == cv2.EVENT_MOUSEMOVE:
                h, w = param["frame"].shape[:2] 
                xo, yo = self.drag_start 
                x0, y0 = np.maximum(0, np.minimum([xo, yo], [x, y])) 
                x1, y1 = np.minimum([w, h], np.maximum([xo, yo], [x, y])) 
                self.selected_rect = None 

                if x1-x0 > 0 and y1-y0 > 0:
                    self.selected_rect = (x0, y0, x1, y1) 

            elif event == cv2.EVENT_LBUTTONUP:
                self.drag_start = None 
                if self.selected_rect is not None: 
                    self.callback_func(self.selected_rect)
                    self.selected_rect = None
                    self.tracking_state = 1

    def draw_rect(self, img, rect):
        if not rect: return False
        # The corners are NumPy int16 values; pass plain Python ints to OpenCV
        x_start, y_start, x_end, y_end = [int(v) for v in rect]
        cv2.rectangle(img, (x_start, y_start), (x_end, y_end), (0, 255, 0), 2)
        return True

In the following image, the region of interest is the green rectangle:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/44243ae0-4961-427e-8178-bd72becce637.png]

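We then extract the feature points from this region of interest. Since we are tracking planar objects, the algorithm assumes that the region of interest is a plane. That's obvious, but it's better to state it explicitly! So make sure you are holding a cardboard box when you select the region of interest. It also helps if the box has plenty of patterns and distinctive points, so that its feature points are easy to detect and track.

The PoseEstimator class receives the region of interest through its add_target() method and extracts the feature points inside it, which lets us track the movement of the object: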

class PoseEstimator(object): 
    def __init__(self): 
        # Use locality sensitive hashing algorithm 
        flann_params = dict(algorithm = 6, table_number = 6, key_size = 12, multi_probe_level = 1) 

        self.min_matches = 10 
        self.cur_target = namedtuple('Current', 'image, rect, keypoints, descriptors, data')
        self.tracked_target = namedtuple('Tracked', 'target, points_prev, points_cur, H, quad') 

        self.feature_detector = cv2.ORB_create()
        self.feature_detector.setMaxFeatures(1000)
        self.feature_matcher = cv2.FlannBasedMatcher(flann_params, {}) 
        self.tracking_targets = [] 

    # Function to add a new target for tracking 
    def add_target(self, image, rect, data=None): 
        x_start, y_start, x_end, y_end = rect 
        keypoints, descriptors = [], [] 
        for keypoint, descriptor in zip(*self.detect_features(image)): 
            x, y = keypoint.pt 
            if x_start <= x <= x_end and y_start <= y <= y_end: 
                keypoints.append(keypoint) 
                descriptors.append(descriptor) 

        descriptors = np.array(descriptors, dtype='uint8') 
        self.feature_matcher.add([descriptors]) 
        target = self.cur_target(image=image, rect=rect, keypoints=keypoints, descriptors=descriptors, data=data)
        self.tracking_targets.append(target) 

    # To get a list of detected objects 
    def track_target(self, frame): 
        self.cur_keypoints, self.cur_descriptors = self.detect_features(frame) 

        if len(self.cur_keypoints) < self.min_matches: return []
        try: matches = self.feature_matcher.knnMatch(self.cur_descriptors, k=2)
        except Exception as e:
            print('Invalid target, please select another with features to extract')
            return []
        matches = [match[0] for match in matches if len(match) == 2 and match[0].distance < match[1].distance * 0.75] 
        if len(matches) < self.min_matches: return [] 

        matches_using_index = [[] for _ in range(len(self.tracking_targets))] 
        for match in matches: 
            matches_using_index[match.imgIdx].append(match) 

        tracked = [] 
        for image_index, matches in enumerate(matches_using_index): 
            if len(matches) < self.min_matches: continue 

            target = self.tracking_targets[image_index] 
            points_prev = [target.keypoints[m.trainIdx].pt for m in matches]
            points_cur = [self.cur_keypoints[m.queryIdx].pt for m in matches]
            points_prev, points_cur = np.float32((points_prev, points_cur))
            H, status = cv2.findHomography(points_prev, points_cur, cv2.RANSAC, 3.0)
            # findHomography returns None when it cannot estimate a homography
            if H is None: continue
            status = status.ravel() != 0

            if status.sum() < self.min_matches: continue 

            points_prev, points_cur = points_prev[status], points_cur[status] 

            x_start, y_start, x_end, y_end = target.rect 
            quad = np.float32([[x_start, y_start], [x_end, y_start], [x_end, y_end], [x_start, y_end]])
            quad = cv2.perspectiveTransform(quad.reshape(1, -1, 2), H).reshape(-1, 2)
            track = self.tracked_target(target=target, points_prev=points_prev, points_cur=points_cur, H=H, quad=quad) 
            tracked.append(track) 

        tracked.sort(key = lambda x: len(x.points_prev), reverse=True) 
        return tracked 

    # Detect features in the selected ROIs and return the keypoints and descriptors 
    def detect_features(self, frame): 
        keypoints, descriptors = self.feature_detector.detectAndCompute(frame, None) 
        if descriptors is None: descriptors = [] 
        return keypoints, descriptors 

    # Function to clear all the existing targets 
    def clear_targets(self): 
        self.feature_matcher.clear() 
        self.tracking_targets = []

Let the tracking begin! Let's move the cardboard box around and see what happens:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/3a30807a-d43f-4524-9bb8-262be261d6be.png]

As you can see, the feature points are being tracked inside the region of interest. Let's tilt the box and see what happens:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/907a2a06-e376-4b41-8723-630d3903d897.png]

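The feature points appear to be tracked correctly. As we can see, the overlaid rectangle changes its orientation to follow the surface of the cardboard box.

Here is the code that does this: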

import sys 
from collections import namedtuple 

import cv2 
import numpy as np

class VideoHandler(object): 
    def __init__(self, capId, scaling_factor, win_name): 
        self.cap = cv2.VideoCapture(capId)
        self.pose_tracker = PoseEstimator() 
        self.win_name = win_name
        self.scaling_factor = scaling_factor

        ret, frame = self.cap.read()
        if not ret:
            raise IOError("Unable to read a frame from the video source")
        self.rect = None
        self.frame = cv2.resize(frame, None, fx=scaling_factor, fy=scaling_factor, interpolation=cv2.INTER_AREA)
        self.roi_selector = ROISelector(win_name, self.frame, self.set_rect) 

    def set_rect(self, rect): 
        self.rect = rect
        self.pose_tracker.add_target(self.frame, rect) 

    def start(self):
        paused = False
        while True:
            if not paused or self.frame is None:
                ret, frame = self.cap.read()
                # Check the read before resizing: frame is None when it fails
                if not ret: break
                scaling_factor = self.scaling_factor
                frame = cv2.resize(frame, None, fx=scaling_factor, fy=scaling_factor, interpolation=cv2.INTER_AREA)
                self.frame = frame.copy()

            img = self.frame.copy()
            if not paused and self.rect is not None:
                tracked = self.pose_tracker.track_target(self.frame)
                for item in tracked:
                    cv2.polylines(img, [np.int32(item.quad)], True, (255, 255, 255), 2)
                    for (x, y) in np.int32(item.points_cur):
                        # Pass plain Python ints as the circle center
                        cv2.circle(img, (int(x), int(y)), 2, (255, 255, 255))

            self.roi_selector.draw_rect(img, self.rect)
            cv2.imshow(self.win_name, img)
            ch = cv2.waitKey(1)
            if ch == ord(' '): paused = not paused
            if ch == ord('c'):
                # Forget all targets and the drawn rectangle
                self.pose_tracker.clear_targets()
                self.rect = None
            if ch == 27: break

        self.cap.release()
        cv2.destroyAllWindows()

if __name__ == '__main__': 
    VideoHandler(0, 0.8, 'Tracker').start()

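What's happening inside the code?

First, we have a PoseEstimator class that does all the heavy lifting. We need something to detect the features in an image and something to match the features of the target against each new frame. So we use the ORB feature detector, and the FLANN feature matcher for fast nearest-neighbor searches among the extracted features. The FLANN index is configured for locality-sensitive hashing (algorithm = 6, that is, FLANN_INDEX_LSH), which suits ORB's binary descriptors. As you can see, we initialize the class with these parameters in the constructor.

Whenever we select a region of interest, we call the add_target method to add it to our list of tracking targets. This method simply extracts the features from that region of interest and stores them in a class variable. Now that we have a target, we are ready to track it!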

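The track_target method handles all the tracking. We take the current frame and extract all its keypoints. However, we are not really interested in all the keypoints in the current frame; we only want the ones that belong to our target object. So our job now is to find which keypoints in the current frame match the target's keypoints. The code does this with knnMatch (k=2) and keeps a match only if its distance is less than 0.75 times the distance of the second-best match.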
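The sketch below reproduces that two-nearest-neighbor ratio test (often called Lowe's ratio test). It uses a brute-force search in place of FLANN's LSH index and assumes ORB-style binary descriptors, stored as uint8 rows and compared by Hamming distance. The helper names hamming_matrix and ratio_test_matches are hypothetical.

```python
import numpy as np

def hamming_matrix(query, train):
    # XOR every query row with every train row, then count the differing bits
    xor = np.bitwise_xor(query[:, None, :], train[None, :, :])
    return np.unpackbits(xor, axis=2).sum(axis=2)

def ratio_test_matches(query, train, ratio=0.75):
    """Return (query_idx, train_idx) pairs that pass the ratio test (needs >= 2 train rows)."""
    d = hamming_matrix(query, train)
    order = np.argsort(d, axis=1, kind="stable")[:, :2]   # two nearest neighbors per query
    rows = np.arange(len(query))
    best, second = d[rows, order[:, 0]], d[rows, order[:, 1]]
    # Keep a match only if it is clearly better than the runner-up
    keep = best < ratio * second
    return [(int(q), int(order[q, 0])) for q in rows[keep]]

train = np.array([[0x00] * 4, [0xFF] * 4, [0x0F] * 4], dtype=np.uint8)
query = np.array([[0x00, 0x00, 0x00, 0x01],   # 1 bit away from train row 0
                  [0x07] * 4,                  # 4 bits from row 2, 12 from row 0
                  [0x03, 0x03, 0xFC, 0xFC]],   # 16 bits from every row: ambiguous
                 dtype=np.uint8)
print(ratio_test_matches(query, train))  # → [(0, 0), (1, 2)]
```

The third query is equally far from every training descriptor, so the ratio test rejects it. This is exactly the kind of ambiguous match the 0.75 threshold is meant to discard.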

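We now have one set of keypoints from the current frame and another set from the target object, taken from the frame in which the target was selected. The next step is to extract the homography matrix from these matched points; the code uses cv2.findHomography with RANSAC and a reprojection threshold of 3.0 pixels to discard outlier matches. This homography matrix tells us how to transform the overlaid rectangle so that it aligns with the surface of the cardboard box. We just need to apply it to the corners of the rectangle to get their new positions.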
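In track_target, applying the homography to the rectangle's corners is done by cv2.perspectiveTransform. Each corner is lifted to homogeneous coordinates, multiplied by H, and divided by its third component. The NumPy equivalent below is written only to show that arithmetic; the function name warp_quad and the example matrices are hypothetical.

```python
import numpy as np

def warp_quad(rect, H):
    """Map the corners of (x_start, y_start, x_end, y_end) through homography H."""
    x_start, y_start, x_end, y_end = rect
    # Same corner order as the quad built in track_target
    quad = np.float64([[x_start, y_start], [x_end, y_start], [x_end, y_end], [x_start, y_end]])
    homog = np.hstack([quad, np.ones((4, 1))])   # (x, y) -> (x, y, 1)
    mapped = homog @ np.asarray(H, dtype=np.float64).T
    return mapped[:, :2] / mapped[:, 2:3]        # divide by the third coordinate

# A pure translation by (+30, -10) just shifts the rectangle
H_shift = np.array([[1.0, 0.0, 30.0], [0.0, 1.0, -10.0], [0.0, 0.0, 1.0]])
print(warp_quad((10, 20, 110, 80), H_shift))

# A non-zero bottom row adds perspective: points further along x shrink toward the origin
H_persp = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.001, 0.0, 1.0]])
print(warp_quad((0, 0, 100, 50), H_persp))
```

The division by the third coordinate is what turns an axis-aligned rectangle into the tilted quadrilateral drawn over the box.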

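How to augment our reality

Now that we know how to track planar objects, let's see how to overlay 3D objects on top of the real world. The objects are 3D, but the video on our screen is 2D. So the first step is to understand how to map those 3D objects onto 2D surfaces so that they look realistic. We just need to project these 3D points onto a plane.

Mapping coordinates from 3D to 2D

Once we have estimated the pose, we project the points from 3D to 2D. Consider the following image: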

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/4bd6b0ae-9b07-4896-b700-063662c4bba2.jpg]

As we can see here, the TV remote is a 3D object, but we see it on a 2D plane. Now, if we move it around, it looks like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/f0f3b54e-d053-4b36-93ee-0929474896cf.jpg]

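This 3D object is still on a 2D plane. The object has moved to a different location, and its distance from the camera has changed too. How do we compute these coordinates? We need a mechanism to map this 3D object onto the 2D surface, and this is where 3D-to-2D projection becomes really important.

We just need to estimate the initial camera pose. For now, let's assume that the camera's intrinsic parameters are already known. We can then use the solvePnP function in OpenCV to estimate the camera pose. This function estimates an object's pose from a set of correspondences between 3D object points and their 2D image projections; its C++ signature is shown here. You can read more about it on this page:

bool solvePnP(InputArray objectPoints, InputArray imagePoints, InputArray cameraMatrix, InputArray distCoeffs, OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess = false, int flags = SOLVEPNP_ITERATIVE)

In Python, the outputs are returned instead: retval, rvec, tvec = cv2.solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs).

Once we have the pose, we need to project the 3D points onto the 2D image plane. We use OpenCV's projectPoints function for this: given the 3D points, rvec, tvec, the camera matrix, and the distortion coefficients, it computes their projections on the image plane.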
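solvePnP returns the rotation as a Rodrigues vector rvec (the rotation axis scaled by the angle) plus a translation tvec. projectPoints then rotates, translates, and projects the 3D points with the camera matrix. Below is a NumPy sketch of that projection path, assuming zero distortion coefficients, as the code later in this chapter does. It is an illustration only, not OpenCV's implementation, which also models lens distortion.

```python
import numpy as np

def rodrigues(rvec):
    """Convert an axis-angle vector to a 3x3 rotation matrix (Rodrigues' formula)."""
    rvec = np.asarray(rvec, dtype=np.float64).ravel()
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    kx, ky, kz = rvec / theta
    Kx = np.array([[0, -kz, ky], [kz, 0, -kx], [-ky, kx, 0]])  # cross-product matrix of the axis
    return np.eye(3) + np.sin(theta) * Kx + (1 - np.cos(theta)) * (Kx @ Kx)

def project_points(points_3d, rvec, tvec, K):
    """Pinhole projection x = K (R X + t), followed by division by depth (no distortion)."""
    R = rodrigues(rvec)
    cam = np.asarray(points_3d, dtype=np.float64) @ R.T + np.asarray(tvec, dtype=np.float64).ravel()
    uvw = cam @ np.asarray(K, dtype=np.float64).T
    return uvw[:, :2] / uvw[:, 2:3]

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
pts = np.array([[0, 0, 0], [1, 0, 0]], dtype=np.float64)
# Rotate 90 degrees about the z axis, then push the points 10 units away from the camera
print(project_points(pts, [0, 0, np.pi / 2], [0, 0, 10], K))  # → [[320. 240.] [320. 320.]]
```

The point on the x axis ends up below the image center, because the 90-degree rotation about z maps it onto the y axis before projection.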

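How to overlay 3D objects on a video

Now that we have all the building blocks, we are ready to build the final system. Let's say we want to overlay a pyramid on top of our cardboard box, as shown here: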

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/6ed6118c-d277-4c3d-9d24-e0234012ae01.png]

Let's tilt the cardboard box to see what happens:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/9228ec7d-7308-409a-9bb2-8c5f517d4e00.png]

The pyramid seems to follow the surface. Let's add a second target:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/234230d9-0a60-438d-8121-77e40c18e4ac.png]

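You can keep adding more targets, and all of the pyramids will be tracked nicely. Let's see how to do this with OpenCV and Python. Make sure to save the previous file as pose_estimation.py, because we will import a couple of classes from it: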

import cv2 
import numpy as np 

from pose_estimation import PoseEstimator, ROISelector 

class Tracker(object): 
    def __init__(self, capId, scaling_factor, win_name): 
        self.cap = cv2.VideoCapture(capId) 
        self.rect = None
        self.win_name = win_name
        self.scaling_factor = scaling_factor
        self.tracker = PoseEstimator() 

        ret, frame = self.cap.read()
        if not ret:
            raise IOError("Unable to read a frame from the video source")
        self.frame = cv2.resize(frame, None, fx=scaling_factor, fy=scaling_factor, interpolation=cv2.INTER_AREA)

        self.roi_selector = ROISelector(win_name, self.frame, self.set_rect)
        self.overlay_vertices = np.float32([[0, 0, 0], [0, 1, 0], [1, 1, 0], [1, 0, 0], [0.5, 0.5, 4]]) 
        self.overlay_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0,4), (1,4), (2,4), (3,4)] 
        self.color_base = (0, 255, 0) 
        self.color_lines = (0, 0, 0) 

    def set_rect(self, rect): 
        self.rect = rect
        self.tracker.add_target(self.frame, rect) 

    def start(self):
        paused = False
        while True:
            if not paused or self.frame is None:
                ret, frame = self.cap.read()
                # Check the read before resizing: frame is None when it fails
                if not ret: break
                scaling_factor = self.scaling_factor
                frame = cv2.resize(frame, None, fx=scaling_factor, fy=scaling_factor,
                    interpolation=cv2.INTER_AREA)

                self.frame = frame.copy()

            img = self.frame.copy()
            # Only track once a region of interest has been selected
            if not paused and self.rect is not None:
                tracked = self.tracker.track_target(self.frame)
                for item in tracked:
                    cv2.polylines(img, [np.int32(item.quad)], True, self.color_lines, 2)
                    for (x, y) in np.int32(item.points_cur):
                        cv2.circle(img, (int(x), int(y)), 2, self.color_lines)

                    self.overlay_graphics(img, item)

            self.roi_selector.draw_rect(img, self.rect)
            cv2.imshow(self.win_name, img)
            ch = cv2.waitKey(1)
            # Toggle the local 'paused' flag (self.paused is never defined)
            if ch == ord(' '): paused = not paused
            if ch == ord('c'):
                self.tracker.clear_targets()
                self.rect = None
            if ch == 27: break

        self.cap.release()
        cv2.destroyAllWindows()

    def overlay_graphics(self, img, tracked):
        x_start, y_start, x_end, y_end = tracked.target.rect 
        quad_3d = np.float32([[x_start, y_start, 0], [x_end, 
         y_start, 0], 
                    [x_end, y_end, 0], [x_start, y_end, 0]]) 
        h, w = img.shape[:2] 
        K = np.float64([[w, 0, 0.5*(w-1)], 
                        [0, w, 0.5*(h-1)], 
                        [0, 0, 1.0]]) 
        dist_coef = np.zeros(4) 
        ret, rvec, tvec = cv2.solvePnP(objectPoints=quad_3d, imagePoints=tracked.quad,
                                       cameraMatrix=K, distCoeffs=dist_coef)
        verts = self.overlay_vertices * \
            [(x_end-x_start), (y_end-y_start), -(x_end-x_start)*0.3] + (x_start, y_start, 0) 
        verts = cv2.projectPoints(verts, rvec, tvec, cameraMatrix=K, distCoeffs=dist_coef)[0].reshape(-1, 2)

        verts_floor = np.int32(verts).reshape(-1,2) 
        cv2.drawContours(img, contours=[verts_floor[:4]], contourIdx=-1, color=self.color_base, thickness=-3)
        cv2.drawContours(img, contours=[np.vstack((verts_floor[:2], verts_floor[4:5]))], contourIdx=-1, color=(0,255,0), thickness=-3)
        cv2.drawContours(img, contours=[np.vstack((verts_floor[1:3], verts_floor[4:5]))], contourIdx=-1, color=(255,0,0), thickness=-3)
        cv2.drawContours(img, contours=[np.vstack((verts_floor[2:4], verts_floor[4:5]))], contourIdx=-1, color=(0,0,150), thickness=-3)
        cv2.drawContours(img, contours=[np.vstack((verts_floor[3:4], verts_floor[0:1], verts_floor[4:5]))], contourIdx=-1, color=(255,255,0), thickness=-3)

        for i, j in self.overlay_edges: 
            (x_start, y_start), (x_end, y_end) = verts[i], verts[j]
            cv2.line(img, (int(x_start), int(y_start)), (int(x_end), int(y_end)), self.color_lines, 2) 

if __name__ == '__main__': 
    Tracker(0, 0.8, 'Augmented Reality').start()

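Let's take a look at the code

The Tracker class performs all the computations here. We initialize it with the pyramid structure, defined by its vertices and edges. The logic we use to track the surface is the same as before, because we are using the same class. We just need solvePnP and projectPoints to map the 3D pyramid onto the 2D surface. Note that the camera is not calibrated here: overlay_graphics approximates the intrinsic matrix K with a focal length equal to the image width and the principal point at the image center, and assumes zero distortion.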
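The two pieces of geometry that overlay_graphics builds before calling solvePnP and projectPoints can be checked in isolation. The first is the uncalibrated camera matrix. The second is the unit pyramid stretched over the tracked rectangle, whose apex is pushed to negative z, that is, out of the plane of the box. The helper names and the apex_height parameter below are hypothetical; the original code hard-codes the height as 4.

```python
import numpy as np

def approx_camera_matrix(h, w):
    # Uncalibrated guess used in overlay_graphics: focal length = image width,
    # principal point at the image center
    return np.float64([[w, 0, 0.5 * (w - 1)],
                       [0, w, 0.5 * (h - 1)],
                       [0, 0, 1.0]])

def pyramid_vertices(rect, apex_height=4.0):
    """Scale the unit pyramid to the ROI rect, as overlay_graphics does."""
    x_start, y_start, x_end, y_end = rect
    # Four base corners in the z = 0 plane, then the apex
    unit = np.float32([[0, 0, 0], [0, 1, 0], [1, 1, 0], [1, 0, 0], [0.5, 0.5, apex_height]])
    # x and y are stretched to the ROI; z is scaled by -0.3 * ROI width
    scale = [(x_end - x_start), (y_end - y_start), -(x_end - x_start) * 0.3]
    return unit * scale + (x_start, y_start, 0)

K = approx_camera_matrix(480, 640)
verts = pyramid_vertices((100, 50, 300, 150))
print(verts[4])  # apex: center of the ROI, lifted out of the plane
```

With an ROI 200 pixels wide, the apex sits 4 × 0.3 × 200 = 240 units off the plane, which is why the pyramid's apparent height scales with the size of the selected rectangle.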

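Let's add some movement

Now that we know how to add a virtual pyramid, let's see whether we can add some movement, starting with changing the pyramid's height dynamically. When you start, the pyramid looks like this: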

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/fe2e0da8-0424-4630-beeb-8d7836cb050b.png]

If you wait for a while, the pyramid gets taller and looks like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/8ed72801-e512-417c-b8df-58a3d0b8a739.png]

Let's see how to do this in OpenCV Python. In the augmented reality code we just discussed, add the following snippet at the end of the __init__ method of the Tracker class:

self.overlay_vertices = np.float32([[0, 0, 0], [0, 1, 0], [1, 1, 0], [1, 0, 0], [0.5, 0.5, 4]]) 
self.overlay_edges = [(0, 1), (1, 2), (2, 3), (3, 0), 
            (0,4), (1,4), (2,4), (3,4)] 
self.color_base = (0, 255, 0) 
self.color_lines = (0, 0, 0) 

self.graphics_counter = 0 
self.time_counter = 0

Now that we have the structure, we need to add the code that changes the height dynamically. Replace the overlay_graphics() method with the following:

    def overlay_graphics(self, img, tracked):
        x_start, y_start, x_end, y_end = tracked.target.rect
        quad_3d = np.float32([[x_start, y_start, 0], [x_end, y_start, 0],
                    [x_end, y_end, 0], [x_start, y_end, 0]])
        h, w = img.shape[:2]
        K = np.float64([[w, 0, 0.5*(w-1)],
                        [0, w, 0.5*(h-1)],
                        [0, 0, 1.0]])
        dist_coef = np.zeros(4)
        ret, rvec, tvec = cv2.solvePnP(objectPoints=quad_3d, imagePoints=tracked.quad,
                                       cameraMatrix=K, distCoeffs=dist_coef)

        # Animate the height: every 20 calls (one call per tracked target per frame),
        # raise the apex by one unit, cycling through 0..7
        self.time_counter += 1
        if not self.time_counter % 20:
            self.graphics_counter = (self.graphics_counter + 1) % 8
        self.overlay_vertices = np.float32([[0, 0, 0], [0, 1, 0], [1, 1, 0], [1, 0, 0],
                                            [0.5, 0.5, self.graphics_counter]])

        verts = self.overlay_vertices * \
            [(x_end-x_start), (y_end-y_start), -(x_end-x_start)*0.3] + (x_start, y_start, 0)
        verts = cv2.projectPoints(verts, rvec, tvec, cameraMatrix=K,
                                  distCoeffs=dist_coef)[0].reshape(-1, 2)

        verts_floor = np.int32(verts).reshape(-1,2)
        cv2.drawContours(img, contours=[verts_floor[:4]],
             contourIdx=-1, color=self.color_base, thickness=-3)
        cv2.drawContours(img, contours=[np.vstack((verts_floor[:2],
            verts_floor[4:5]))], contourIdx=-1, color=(0,255,0), thickness=-3)
        cv2.drawContours(img, contours=[np.vstack((verts_floor[1:3],
            verts_floor[4:5]))], contourIdx=-1, color=(255,0,0), thickness=-3)
        cv2.drawContours(img, contours=[np.vstack((verts_floor[2:4],
            verts_floor[4:5]))], contourIdx=-1, color=(0,0,150), thickness=-3)
        cv2.drawContours(img, contours=[np.vstack((verts_floor[3:4],
            verts_floor[0:1], verts_floor[4:5]))], contourIdx=-1, color=(255,255,0), thickness=-3)

        for i, j in self.overlay_edges:
            (x_start, y_start), (x_end, y_end) = verts[i], verts[j]
            cv2.line(img, (int(x_start), int(y_start)), (int(x_end), int(y_end)),
                self.color_lines, 2)

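Now that we know how to change the height, let's make the pyramid dance for us. We can make the tip of the pyramid oscillate periodically. When you start, it looks like this: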

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/732ee17c-a040-4564-b545-644c64c6c6a4.png]

If you wait for a while, it looks like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/61f100ee-5d20-4fc8-b8d9-c2e5901ac88d.png]

You can look at augmented_reality_motion.py for the implementation details.
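The contents of augmented_reality_motion.py are not reproduced here. One hypothetical way to get a periodic oscillation is to offset the apex of the unit pyramid by a sine of a frame counter before the vertices are scaled to the region of interest. The function names, amplitude, and period below are assumptions.

```python
import numpy as np

def oscillating_apex(frame_idx, amplitude=0.3, period=60, height=4.0):
    """Apex of the unit pyramid swaying left and right around (0.5, 0.5), one cycle per `period` frames."""
    phase = 2 * np.pi * (frame_idx % period) / period
    return np.float32([0.5 + amplitude * np.sin(phase), 0.5, height])

def pyramid_at(frame_idx):
    base = np.float32([[0, 0, 0], [0, 1, 0], [1, 1, 0], [1, 0, 0]])
    # Same vertex order as overlay_vertices: four base corners, then the apex
    return np.vstack([base, oscillating_apex(frame_idx)])

for frame in (0, 15, 30, 45):
    print(frame, pyramid_at(frame)[4])
```

Inside overlay_graphics, a counter such as self.time_counter could drive this, for example self.overlay_vertices = pyramid_at(self.time_counter), leaving the rest of the projection code unchanged.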

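In the next experiment, we will make the whole pyramid move around the region of interest. We can make it move any way we want. Let's start by adding a linear diagonal movement around the selected region of interest. When you start, it looks like this: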

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/8b7a6d15-04a4-41ae-95d4-5aff79501411.png]

After a while, it looks like this:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/fc62fd35-7143-46c3-ba6b-d51a7d0b423e.png]

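Refer to augmented_reality_dancing.py to see how the overlay_graphics() method is changed to make the pyramid dance. Next, let's see whether we can make the pyramid rotate around our region of interest. When you start, it looks like this: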

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/762dc997-aea2-4b7e-8ce3-a027799710d2.png]

After a while, it moves to a new position:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/7c171210-b56f-4f74-93f5-754afda77d43.png]

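You can refer to augmented_reality_circular_motion.py to see how this is implemented. You can make the pyramid do anything you want: you just need to come up with the right mathematical formula, and it will dance to your tune! You can also try out other virtual objects to see what you can do with them. These examples provide a good reference point on which you can build many interesting augmented reality applications.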
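Similarly, without reproducing augmented_reality_dancing.py or augmented_reality_circular_motion.py, here is one hypothetical set of formulas for the two motions described above. Each function shifts the whole unit pyramid within the plane of the region of interest, in units of the ROI's width and height, before the vertices are scaled and projected. All names and parameters are assumptions.

```python
import numpy as np

UNIT_PYRAMID = np.float32([[0, 0, 0], [0, 1, 0], [1, 1, 0], [1, 0, 0], [0.5, 0.5, 4]])

def circular_offset(frame_idx, radius=0.5, period=120):
    """(dx, dy) on a circle of `radius`, one full lap every `period` frames."""
    angle = 2 * np.pi * (frame_idx % period) / period
    return radius * np.cos(angle), radius * np.sin(angle)

def diagonal_offset(frame_idx, span=1.0, period=120):
    """Back-and-forth movement along the ROI diagonal (a triangle wave)."""
    t = (frame_idx % period) / period          # 0 .. 1 within each period
    # +span/2 at t = 0, -span/2 at t = 0.5, back to +span/2 as t approaches 1
    s = span * (2 * abs(t - 0.5) - 0.5)
    return s, s

def moved_pyramid(offset):
    dx, dy = offset
    # Shift only x and y; the base stays in the z = 0 plane of the tracked surface
    return UNIT_PYRAMID + np.float32([dx, dy, 0])

print(moved_pyramid(circular_offset(30))[4])
print(moved_pyramid(diagonal_offset(60))[4])
```

Using the same dx for every vertex keeps the pyramid rigid, so only its position on the surface changes from frame to frame.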

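Summary

In this chapter, you learned about the premise of augmented reality and what an augmented reality system looks like. We discussed the geometric transformations that augmented reality requires. You learned how to use those transformations to estimate the camera pose and how to track planar objects. We discussed how to add virtual objects on top of the real world, and you learned how to modify the virtual objects in different ways to add cool effects.

In the next chapter, we will learn how to apply machine learning techniques with artificial neural networks, which will help us build on the knowledge we gained in Chapter 9, Object Recognition.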

十一、通过人工神经网络的机器学习

在本章中，您将学习如何构建 ANN 并对其进行训练以执行图像分类和对象识别。人工神经网络是机器学习的子集之一，我们将特别讨论 MLP 网络，它是模式识别范围内最常见的神经网络类型。

在本章的最后，我们将介绍以下内容：

机器学习（ML）和人工神经网络（ANN）之间的区别
多层感知器（MLP）网络
如何定义和实现 MLP 网络
评估和改善我们的人工神经网络
如何使用经过训练的 ANN 识别图像中的对象

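Machine learning (ML) versus artificial neural networks (ANN)

As mentioned earlier, ANNs are a subset of ML. Artificial neural networks are inspired by human understanding; they work like the brain, consisting of different layers of interconnected neurons. Each neuron receives information from the previous layer, processes it, and sends it on to the next layer until the final output is produced. In supervised learning, this output is compared with labeled outputs; in unsupervised learning, the output can come from certain matching criteria.

What makes artificial neural networks distinctive? Machine learning is defined as the field of computer science that focuses on finding patterns in datasets, whereas ANNs focus more on emulating how the human brain is wired to do that job, splitting pattern detection across multiple layers of neurons (nodes).

Meanwhile, other machine learning algorithms, such as support vector machines (SVM), are also increasingly popular for object pattern recognition and classification. SVMs have one of the best accuracy rates among machine learning algorithms. ANNs have a wider range of applications: they can detect patterns in most kinds of data structures (SVMs are mainly used with feature vectors), and they can be parameterized further to achieve different goals within the same implementation.

Another advantage of ANNs over other ML strategies, such as SVMs, is that an ANN is a probabilistic classifier that allows multiclass classification. This means it can detect several objects in one image. An SVM, on the other hand, is a non-probabilistic binary classifier.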

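When is an ANN useful? Imagine that we have implemented an object recognizer trained to recognize backpacks and shoes, and we are given the following image:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/a0881f1d-6f6a-41cf-8e13-b695cd53cd57.jpg]

We run a feature detector on it and obtain the following result:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/4e67239e-7bcf-4fad-bf4a-bbd498945939.png]

As the previous image shows, our feature detector has extracted feature vectors from both the girl's backpack and her shoes. So if we ran the SVM classifier from Chapter 9, Object Recognition, on this image, it would tend to detect only the backpack, because it uses a linear classifier, even though the image also contains footwear.

SVMs can also perform non-linear classification using the so-called kernel trick, which implicitly maps the inputs into a high-dimensional feature space.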

How does an ANN work?

In this section, we will look at the elements involved in an ANN-MLP. First, here is the general shape of an ANN-MLP, showing its input, hidden, and output layers and how information flows between them:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/318a67e3-513c-4ee1-aefb-5e2ffc0d72d9.jpg]

An MLP network consists of at least three layers:

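Input layer: every MLP always has one of these. It is a passive layer, meaning that it does not modify the data; it receives information from the outside world and passes it into the network. The number of nodes (neurons) in this layer depends on how many features or pieces of descriptive information we want to extract from the image. For example, when using a feature vector, there is one node per column of the vector.
Hidden layer: this is where all the real work happens. It transforms the input into something the output layer, or another hidden layer (there can be more than one), can use. It acts as a black box, detecting patterns in the inputs it receives and weighting each input. Its behavior is defined by the equation of its activation function.
Output layer: this layer is also always present, and its number of nodes is determined by the chosen neural network; in our example, it could have three neurons. The output layer can consist of a single node, for example in linear regression, or when we only want to know whether the image contains a backpack. For multiclass classification, however, the layer contains several nodes, one for each object that can be recognized. By default, each node produces a value in the range [-1, 1] that expresses how likely it is that the object is present, which allows multiclass detection on a single input image.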
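To make the difference between picking a single class and detecting several concrete, here is a minimal sketch that decodes one output-layer vector whose values lie in [-1, 1]. The function name `decode_outputs`, the threshold of 0.0 and the sample scores are assumptions chosen for illustration, not part of OpenCV.

```python
import numpy as np

def decode_outputs(scores, labels, threshold=0.0):
    """Decode one output-layer vector (one value per class, in [-1, 1]).

    Returns the single best label (argmax) and every label whose score
    exceeds `threshold`, which allows several objects per image.
    """
    scores = np.asarray(scores, dtype=np.float32)
    best = labels[int(np.argmax(scores))]
    detected = [label for label, s in zip(labels, scores) if s > threshold]
    return best, detected
```

For example, `decode_outputs([0.9, -0.8, 0.4], ['bag', 'dress', 'footwear'])` picks `'bag'` as the best match but reports both `'bag'` and `'footwear'` as present, which is the behavior the backpack-and-shoes image calls for.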

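Suppose we want to build a three-layer neural network with one layer of each kind: input, hidden, and output. The number of nodes in the input layer is determined by the dimensionality of our data, and the number of nodes in the output layer by the number of models (classes) we have. For the hidden layer, the number of nodes, and even of layers, depends on the complexity of the problem and the accuracy we want the network to reach. A larger hidden layer can improve accuracy but also increases the computational cost. Another decision for the hidden layer is the activation function, which lets the network fit non-linear hypotheses and detect patterns in the provided data more effectively. A common choice is the sigmoid function, which is used by default and produces outputs that can be read in terms of probability, but there are other options, such as tanh or ReLU.

Looking more closely at each neuron in the hidden layer, they all behave similarly. Each takes the values from the previous layer (the input nodes), computes their weighted sum using its own weights, and adds a bias term. The sum is then transformed by an activation function f, which may also differ between neurons, as shown in the following diagram:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/c54aca33-cf6d-48db-b56e-c0a841f4652c.png]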
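The computation performed by one hidden neuron fits in a few lines of NumPy. This sketch uses the symmetric sigmoid as documented for OpenCV's ANN_MLP_SIGMOID_SYM, f(x) = β(1 − e^(−αx)) / (1 + e^(−αx)); the weights, bias and input values are made up for illustration. With α = 2 and β = 1, the values this chapter later passes to setActivationFunction(), f(x) is exactly tanh(x).

```python
import numpy as np

def symmetric_sigmoid(x, alpha=2.0, beta=1.0):
    # f(x) = beta * (1 - exp(-alpha*x)) / (1 + exp(-alpha*x)), output in (-beta, beta)
    e = np.exp(-alpha * x)
    return beta * (1.0 - e) / (1.0 + e)

def neuron_output(inputs, weights, bias, alpha=2.0, beta=1.0):
    # Weighted sum of the previous layer's values plus this neuron's bias term
    z = np.dot(weights, inputs) + bias
    # The activation function turns the sum into the neuron's output
    return symmetric_sigmoid(z, alpha, beta)

x = np.array([0.5, -1.0, 2.0])   # values coming from the previous layer
w = np.array([0.4, 0.3, -0.1])   # this neuron's weights (illustrative)
print(neuron_output(x, w, bias=0.1))
```

Here the weighted sum plus bias is -0.2, so the neuron outputs tanh(-0.2); a whole layer is the same computation with a weight matrix instead of a weight vector.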

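How to define a multilayer perceptron (MLP)

The MLP is a branch of ANNs that is widely used for pattern recognition because of its ability to recognize patterns in noisy or unexpected environments. MLPs can be used to implement both supervised and unsupervised learning (both were discussed in Chapter 9, Object Recognition). In addition, MLPs can be used for other kinds of learning, such as reinforcement learning, which is inspired by behavioral psychology and uses reward/punishment behavior to adjust how the network learns.

Defining an ANN-MLP means deciding the structure of the layers that make up the network and how many nodes each layer has. First, we need to decide what the goal of the network is. For example, we can implement an object recognizer; in that case, the number of nodes in the output layer equals the number of different objects we want to recognize. Following the example from Chapter 9, Object Recognition, to recognize handbags, footwear, and dresses, the output layer would have three nodes, whose values are mapped to tuples of probabilities rather than to fixed tuples such as [1,0,0], [0,1,0], and [0,0,1]. This makes it possible to recognize more than one class in the same image, for example, a girl wearing a backpack and slippers.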

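Once the network's outputs are defined, we need to decide what meaningful information about each object we can feed into the network so that it can recognize those objects in unknown images. There are several ways to build feature descriptors for an image. We can use the histogram of oriented gradients (HOG), which counts occurrences of gradient orientations in local regions of the image; a color histogram, which represents the distribution of colors in the image; or a dense feature detector with the SIFT or SURF algorithms to extract image features. Because the number of descriptors fed into the input layer must be the same for every image, we will use the bag-of-words strategy, collecting all the descriptor sets into a single histogram of visual words, as we did for the SVM recognizer in Chapter 9, Object Recognition. The histogram looks like the following, where each bar value is linked to one node in the input layer:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/be7835c3-47a8-409e-872e-e2ed62b0f408.png]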
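The quantization step that turns a variable number of descriptors into a fixed-length histogram can be sketched with NumPy alone: each descriptor votes for its nearest centroid. This illustrates the idea only and is not the Quantizer class from Chapter 9; the function name, the L2 normalization and the toy data in the check below are assumptions.

```python
import numpy as np

def bow_histogram(descriptors, centroids, normalize=True):
    """Build a bag-of-visual-words histogram with one bin per centroid."""
    descriptors = np.asarray(descriptors, dtype=np.float32)
    centroids = np.asarray(centroids, dtype=np.float32)
    # Squared Euclidean distance from every descriptor to every centroid
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Each descriptor votes for its nearest centroid (visual word)
    nearest = np.argmin(dists, axis=1)
    hist = np.bincount(nearest, minlength=len(centroids)).astype(np.float32)
    if normalize and hist.sum() > 0:
        hist /= np.linalg.norm(hist)
    # Shape (1, k): one row per image, one input-layer node per bin
    return hist.reshape(1, -1)
```

However many descriptors an image produces, the result always has as many bins as there are centroids, which is what lets every image feed the same number of input nodes.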

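Finally, we come to the hidden layer. This layer has no strictly defined structure, so designing it is a complex decision, and researchers have debated at length how to choose the number of hidden layers and the number of nodes in them. It all depends on the complexity of the problem to be solved and on finding a balance between performance and accuracy: more nodes or layers can improve accuracy, but at the cost of performance. Moreover, too many nodes can make the network overfit, which not only degrades performance but also lowers accuracy. A simple object recognizer with only three models does not need more than one hidden layer, and for the number of nodes in it we can follow Heaton's research, which sets the following rules: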

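The number of hidden neurons should be between the size of the input layer and the size of the output layer
The number of hidden neurons should be two-thirds of the size of the input layer plus the size of the output layer
The number of hidden neurons should be less than twice the size of the input layer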
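The three rules of thumb can be turned into a quick sanity check. This is a minimal sketch; `heaton_hidden_size` and `check_heaton_rules` are hypothetical helpers, and the sizes in the example (a 32-bin histogram and three classes) are assumptions chosen for illustration.

```python
def heaton_hidden_size(input_size, output_size):
    # Rule 2: two-thirds of the input layer size plus the output layer size
    return int(input_size * 2 / 3) + output_size

def check_heaton_rules(hidden_size, input_size, output_size):
    """Report whether a hidden layer size satisfies rules 1 and 3."""
    low, high = sorted((input_size, output_size))
    return {
        # Rule 1: between the output layer size and the input layer size
        "between_input_and_output": low <= hidden_size <= high,
        # Rule 3: less than twice the input layer size
        "less_than_twice_input": hidden_size < 2 * input_size,
    }

input_size, output_size = 32, 3
hidden = heaton_hidden_size(input_size, output_size)
print(hidden, check_heaton_rules(hidden, input_size, output_size))
```

With 32 inputs and 3 outputs, rule 2 gives 21 + 3 = 24 hidden neurons, which also satisfies the other two rules.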

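How to implement an ANN-MLP classifier?

After all this theory on how artificial neural networks work, we will now implement one ourselves. To do so, we will download the training images from the same source we used for the SVM classifier. We will start with a few classes, which can easily be extended to other projects: create a folder named images with one subfolder for each category we want to classify, namely dress, footwear, and bagpack. We will collect a batch of images for each of them; around 20 to 25 images per category should be enough for training. Most importantly, we will also include another set of sample images, which we will use to evaluate the accuracy of the network after training.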

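As mentioned earlier, we need to use bag of words (BOW) to make the number of descriptors the same for every image. To do so, we first extract the feature vectors of each image at the keypoints supplied by a dense feature detector, and then pass those vectors to k-means clustering to extract the centroids, from which we finally build the BOW histogram.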
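A dense feature detector simply places keypoints on a regular grid instead of searching for corners or blobs. The sketch below only generates the grid coordinates; the function name and the step and margin values are assumptions. In OpenCV, each (x, y) pair would typically be wrapped in a cv2.KeyPoint(x, y, size) and passed to a descriptor extractor's compute() method.

```python
def dense_grid(width, height, step=20, margin=10):
    """Return (x, y) keypoint centres on a regular grid inside the image.

    Points start `margin` pixels from the top-left corner, are spaced
    `step` pixels apart, and stay at least `margin` pixels from every border.
    """
    return [(float(x), float(y))
            for y in range(margin, height - margin + 1, step)
            for x in range(margin, width - margin + 1, step)]
```

Because the grid depends only on the image size, resizing every image to the same size first (as the classifier later does before extracting features) gives every image the same set of keypoint locations.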

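[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/d7433038-0ce3-4c7f-a3ad-b672ae44a354.png]

As the previous image shows, this is the same process we implemented for the SVM classifier. To save time and code, we will reuse the create_features.py file we created earlier to extract all the feature descriptors that will serve as inputs to the MLP network.

Running the following command produces each of the map files we need for the next step: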

$ python create_features.py --samples bag images/bagpack/ --samples dress images/dress/ --samples footwear images/footwear/ --codebook-file models/codebook.pkl --feature-map-file models/feature_map.pkl

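The feature_map.pkl file contains the feature vector of every image that will take part in training. First, we create a class for the ANN classifier, in which we set the sizes of the network layers: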

from sklearn import preprocessing
import numpy as np
import cv2
import random  # used when splitting the data set randomly

class ClassifierANN(object):
    def __init__(self, feature_vector_size, label_words):
        self.ann = cv2.ml.ANN_MLP_create()
        self.label_words = label_words
        # Number of centroids used to build the feature vectors
        input_size = feature_vector_size
        # Number of models to recognize
        output_size = len(label_words)
        # Heaton rule: 2/3 of the input layer size plus the output layer size
        hidden_size = int(input_size * 2 / 3) + output_size
        # Layer sizes must be integers; int32 avoids the overflow uint8 causes above 255
        nn_config = np.array([input_size, hidden_size, output_size], dtype=np.int32)
        self.ann.setLayerSizes(nn_config)
        # Symmetrical sigmoid as activation function (alpha=2, beta=1)
        self.ann.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM, 2, 1)
        # Map models to one-hot tuples such as [1,0,0], [0,1,0], [0,0,1]
        self.le = preprocessing.LabelBinarizer()
        self.le.fit(label_words)  # e.g. ['bag', 'dress', 'footwear']; classes are stored sorted

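For the output, we decided to encode each class as a binary tuple, [0,0,1], [0,1,0], or [1,0,0], with the aim of obtaining multiclass detection. As the activation function we use the symmetric sigmoid (ANN_MLP_SIGMOID_SYM), the default choice for an MLP, whose outputs lie in the range [-1,1]. In this way, the outputs generated by our network act as per-class scores, read as probabilities, rather than a single classification result, which makes it possible to recognize two or three objects in the same sample image.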
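What LabelBinarizer does here can be reproduced in a few lines of NumPy, which makes the encoding explicit. This sketch mirrors LabelBinarizer's behavior for three or more classes (sorted class names, one column per class, 1 for the true class and 0 elsewhere); with only two classes LabelBinarizer produces a single column instead. The helper names are hypothetical, and the sketch is not a replacement for the sklearn class. Decoding takes the column with the largest value, so it also works on raw network outputs in [-1, 1].

```python
import numpy as np

def fit_classes(labels):
    # Sorted unique class names, as LabelBinarizer stores them in classes_
    return sorted(set(labels))

def one_hot(labels, classes):
    # One row per label, one column per class: 1.0 for the true class, 0.0 elsewhere
    index = {c: i for i, c in enumerate(classes)}
    out = np.zeros((len(labels), len(classes)), dtype=np.float32)
    for row, label in enumerate(labels):
        out[row, index[label]] = 1.0
    return out

def decode(outputs, classes):
    # Pick the class whose column holds the greatest value in each row
    return [classes[i] for i in np.argmax(np.asarray(outputs), axis=1)]
```

For example, with the classes `['bag', 'dress', 'footwear']`, the network output `[0.2, -0.9, 0.7]` decodes to `'footwear'`.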

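Before training, we split the dataset into two different sets: training and testing. We define a ratio for the split (most examples suggest using 75% for the training set, but it can be adjusted until the best accuracy is obtained) and select the items randomly to avoid bias. Training itself works like this: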
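The training.py script later in this chapter calls a split_feature_map() helper whose body is not listed. A minimal sketch of what it needs to do, assuming the feature map is a list of dicts with 'label' and 'feature_vector' keys (as train() expects), is shown below; the optional `seed` parameter is an addition for reproducibility.

```python
import random

def split_feature_map(feature_map, ratio, seed=None):
    """Randomly split a feature map into (training_set, testing_set).

    `ratio` is the fraction of items used for training, e.g. 0.75.
    """
    items = list(feature_map)
    # Shuffling first means neither set depends on the order the images were loaded in
    random.Random(seed).shuffle(items)
    cut = int(len(items) * ratio)
    return items[:cut], items[cut:]
```

This simple version does not stratify by class, so with only 20 to 25 images per class one class may end up under-represented in the testing set; shuffling and splitting each class separately would avoid that.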

class ClassifierANN(object):
...
    def train(self, training_set):
        label_words = [ item['label'] for item in training_set]
        dim_size = training_set[0]['feature_vector'].shape[1]
        train_samples = np.asarray(
            [np.reshape(x['feature_vector'], (dim_size,)) for x in training_set]
        )
        # Convert item labels into encoded binary tuples
        train_response = np.array(self.le.transform(label_words), dtype=np.float32)
        self.ann.train(np.array(train_samples, 
            dtype=np.float32),cv2.ml.ROW_SAMPLE,
            np.array(train_response, dtype=np.float32)
        )

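In this case, we used the same weight for every node of the input layer (the default behavior), but we could give more weight to the columns of the feature vector that carry more important information.

Evaluating the trained network

To evaluate the robustness and accuracy of our trained MLP network, we will compute the confusion matrix (also known as the error matrix), which describes the performance of our classification model. Each row of the confusion matrix represents the instances of an actual class, while each column represents the instances of a predicted class (some texts use the opposite convention); the get_confusion_matrix() method below stores it as confusion_matrix[expected][predicted]. To fill in the matrix, we evaluate the network on the testing set: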

from collections import OrderedDict

import numpy as np

class ClassifierANN(object):
...
    def _init_confusion_matrix(self, label_words):
        confusion_matrix = OrderedDict()
        for label in label_words:
            confusion_matrix[label] = OrderedDict()
            for label2 in label_words:
                confusion_matrix[label][label2] = 0
        return confusion_matrix

    # Chooses the class with the greatest value, only one, in the tuple (encoded_word)
    def classify(self, encoded_word, threshold=0.5):
        models = self.le.inverse_transform(np.asarray([encoded_word]), threshold)
        return models[0]

    # Calculate the confusion matrix from the given testing data set
    def get_confusion_matrix(self, testing_set):
        label_words = [item['label'] for item in testing_set]
        dim_size = testing_set[0]['feature_vector'].shape[1]
        # ANN_MLP.predict expects a float32 sample matrix
        test_samples = np.asarray(
            [np.reshape(x['feature_vector'], (dim_size,)) for x in testing_set],
            dtype=np.float32
        )
        expected_outputs = np.array(self.le.transform(label_words), dtype=np.float32)
        # Use every known class, so a prediction absent from the test labels cannot raise KeyError
        confusion_matrix = self._init_confusion_matrix(self.le.classes_)
        retval, test_outputs = self.ann.predict(test_samples)
        for expected_output, test_output in zip(expected_outputs, test_outputs):
            expected_model = self.classify(expected_output)
            predicted_model = self.classify(test_output)
            confusion_matrix[expected_model][predicted_model] += 1
        return confusion_matrix

As a sample confusion matrix, for a testing set of 30 elements, we might obtain the following result:

Actual \ Predicted    Footwear   Backpack   Dress
Footwear                     8          2       0
Backpack                     2          7       1
Dress                        2          2       6

Given the previous matrix, we can calculate the accuracy of the trained network with the following formula:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/02113d23-f371-47c3-b086-8d2bab56864e.jpg]

In this formula, accuracy = (TP + TN) / (TP + TN + FP + FN), where TP, TN, FP, and FN denote the true positives, true negatives, false positives, and false negatives. For footwear, for example, the accuracy is 80%:

[Image: https://gitcode.net/apachecn/apachecn-cv-zh/-/raw/master/docs/opencv-3x-py-example/img/ca95db47-227f-4a1c-9c72-eebe21e923fa.jpg]
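Working through the footwear row and column of the 30-element sample matrix makes the 80% figure explicit. Taking the rows as the actual class and treating footwear as the positive class, the off-diagonal counts in its row are false negatives, the off-diagonal counts in its column are false positives, and every remaining sample is a true negative:

```latex
\begin{aligned}
TP &= 8, \qquad FN = 2 + 0 = 2, \qquad FP = 2 + 2 = 4,\\
TN &= 30 - (TP + FN + FP) = 30 - 14 = 16,\\
\text{Accuracy}_{\text{footwear}} &= \frac{TP + TN}{TP + TN + FP + FN} = \frac{8 + 16}{30} = 0.8
\end{aligned}
```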

The code that implements this formula is as follows:

def calculate_accuracy(confusion_matrix):
    acc_models = OrderedDict()
    for model in confusion_matrix.keys():
        acc_models[model] = {'TP': 0, 'TN': 0, 'FP': 0, 'FN': 0}
    total = 0
    for expected_model, predicted_models in confusion_matrix.items():
        for predicted_model, value in predicted_models.items():
            total += value
            if predicted_model == expected_model:
                acc_models[expected_model]['TP'] += value
            else:
                acc_models[expected_model]['FN'] += value
                acc_models[predicted_model]['FP'] += value

    for model, rep in acc_models.items():
        # Every sample that is not a TP, FN or FP of this class is a true negative
        rep['TN'] = total - rep['TP'] - rep['FN'] - rep['FP']
        acc = (rep['TP'] + rep['TN']) / total
        print('%s \t %f' % (model, acc))
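As a cross-check independent of the dictionary-based code, the same per-class accuracy can be computed from the sample matrix held in a NumPy array. This is a sketch; `per_class_accuracy` is a hypothetical helper, and it assumes rows are actual classes and columns predicted classes, in the order footwear, backpack, dress.

```python
import numpy as np

def per_class_accuracy(cm):
    """Per-class accuracy (TP + TN) / total for a square confusion matrix.

    Rows are actual classes and columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=np.float64)
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp   # row sums minus the diagonal
    fp = cm.sum(axis=0) - tp   # column sums minus the diagonal
    tn = total - tp - fn - fp
    return (tp + tn) / total

sample = [[8, 2, 0],   # footwear
          [2, 7, 1],   # backpack
          [2, 2, 6]]   # dress
print(per_class_accuracy(sample))
```

This gives 24/30 for footwear, 23/30 for backpack and 25/30 for dress. Note that the overall accuracy, the diagonal sum over the total (21/30 = 0.7), is lower than every per-class figure, because per-class accuracy also credits the true negatives.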

With the code blocks in this section, the ClassifierANN class is complete. The following training.py script puts it to use:

###############
# training.py
###############

import argparse
import pickle

import numpy as np

# ClassifierANN (with its confusion-matrix methods) and calculate_accuracy are the
# code blocks shown above; split_feature_map() is not listed in this section and is
# expected to split the feature map randomly according to the given ratio.

def build_arg_parser():
    parser = argparse.ArgumentParser(description='Trains an ANN-MLP classifier from a feature map')
    parser.add_argument("--feature-map-file", dest="feature_map_file", required=True,
        help="Input pickle file containing the feature map")
    parser.add_argument("--training-set", dest="training_set", required=True,
        help="Fraction of the data used for training, e.g. 0.75")
    parser.add_argument("--ann-file", dest="ann_file", required=False,
        help="Output file where the ANN will be stored")
    parser.add_argument("--le-file", dest="le_file", required=False,
        help="Output file where the LabelBinarizer will be stored")
    return parser

if __name__ == '__main__':
    args = build_arg_parser().parse_args()

    # Load the feature map
    with open(args.feature_map_file, 'rb') as f:
        feature_map = pickle.load(f)

    training_set, testing_set = split_feature_map(feature_map, float(args.training_set))
    # Take the labels from the whole map so every class is known to the LabelBinarizer
    label_words = np.unique([item['label'] for item in feature_map])
    classifier = ClassifierANN(len(feature_map[0]['feature_vector'][0]), label_words)
    classifier.train(training_set)
    print("===== Confusion Matrix =====")
    confusion_matrix = classifier.get_confusion_matrix(testing_set)
    print(confusion_matrix)
    print("===== ANN Accuracy =====")
    calculate_accuracy(confusion_matrix)

    # Both output paths are optional; save only when both were given
    if args.ann_file and args.le_file:
        print("===== Saving ANN =====")
        # ANN_MLP serializes itself, so OpenCV writes this file directly
        classifier.ann.save(args.ann_file)
        # The fitted LabelBinarizer is a plain Python object, so pickle it
        with open(args.le_file, 'wb') as f:
            pickle.dump(classifier.le, f)
        print('Saved in: ', args.ann_file)

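You may have noticed that we saved the ANN in two separate files. This is because the ANN_MLP class has its own save and load methods, while we also need to save the LabelBinarizer (le) fitted on the label_words used to train the network. Pickle lets us serialize and deserialize object structures and save them to and load them from disk, except for structures such as ann that have their own implementation.

Run the following command to obtain the model files. The confusion matrix and the accuracy will be displayed as well: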

$ python training.py --feature-map-file models/feature_map.pkl --training-set 0.8 --ann-file models/ann.yaml --le-file models/le.pkl

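To obtain a well-trained network, we can repeat this as many times as needed until we get a good accuracy. Results vary from run to run because the training and testing sets are drawn at random, so we should keep the run with the best result.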
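Repeating the run by hand and keeping the best result can also be automated. The sketch below assumes a hypothetical `train_once()` callable that performs one random split, trains a classifier and returns `(model, accuracy)`; it is not part of training.py.

```python
def keep_best(train_once, n_runs=10):
    """Call train_once() n_runs times and keep the (model, accuracy) pair with the highest accuracy."""
    best_model, best_acc = None, float("-inf")
    for _ in range(n_runs):
        model, acc = train_once()
        if acc > best_acc:
            best_model, best_acc = model, acc
    return best_model, best_acc
```

Keep in mind that the best of several random splits, each scored on its own testing set, is an optimistic estimate; checking the chosen model on a separate set of images gives a more trustworthy accuracy.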

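Image classification

To implement our ANN classifier, we need to reuse a method of the FeatureExtractor class in the create_features.py file from Chapter 9, Object Recognition, which lets us compute the feature vector of the image we want to evaluate: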

class FeatureExtractor(object):
    def get_feature_vector(self, img, kmeans, centroids):
        return Quantizer().get_feature_vector(img, kmeans, centroids)

Make sure the create_features.py file is in the same folder. Now we are ready to implement the classifier:

###############
# classify_data.py
###############

import argparse
import pickle

import cv2
import numpy as np

import create_features as cf

# Classifying an image
class ImageClassifier(object):
    def __init__(self, ann_file, le_file, codebook_file):
        # ANN_MLP has its own loader, so the file does not need to be opened first
        self.ann = cv2.ml.ANN_MLP_load(ann_file)
        with open(le_file, 'rb') as f:
            self.le = pickle.load(f)

        # Load the codebook
        with open(codebook_file, 'rb') as f:
            self.kmeans, self.centroids = pickle.load(f)

    # Decode the network output into the label with the greatest value
    def classify(self, encoded_word, threshold=None):
        models = self.le.inverse_transform(np.asarray(encoded_word), threshold)
        return models[0]

    # Method to get the output image tag
    def getImageTag(self, img):
        # Resize the input image
        img = cf.resize_to_size(img)
        # Extract the feature vector
        feature_vector = cf.FeatureExtractor().get_feature_vector(img, self.kmeans, self.centroids)
        # ANN_MLP.predict expects a float32 sample matrix
        feature_vector = np.asarray(feature_vector, dtype=np.float32)
        # Classify the feature vector and get the output tag
        retval, image_tag = self.ann.predict(feature_vector)
        return self.classify(image_tag)

def build_arg_parser():
    parser = argparse.ArgumentParser(
        description='Extracts features from the input image and classifies it')
    parser.add_argument("--input-image", dest="input_image", required=True,
        help="Input image to be classified")
    parser.add_argument("--codebook-file", dest="codebook_file", required=True,
        help="File containing the codebook")
    parser.add_argument("--ann-file", dest="ann_file", required=True,
        help="File containing the trained ANN")
    parser.add_argument("--le-file", dest="le_file", required=True,
        help="File containing the LabelBinarizer")
    return parser

if __name__ == '__main__':
    args = build_arg_parser().parse_args()
    codebook_file = args.codebook_file
    input_image = cv2.imread(args.input_image)
    # cv2.imread returns None instead of raising when the file cannot be read
    if input_image is None:
        raise SystemExit('Could not read image: %s' % args.input_image)

    tag = ImageClassifier(args.ann_file, args.le_file, codebook_file).getImageTag(input_image)
    print("Output class:", tag)

Run the following command to classify an image:

$ python classify_data.py --codebook-file models/codebook.pkl --ann-file models/ann.yaml --le-file models/le.pkl --input-image img/test.png

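Summary

In this chapter, you learned the concept of an ANN. You also learned that one of its uses in the field of object recognition is through an MLP implementation, including the advantages and disadvantages of MLPs compared with other machine learning strategies such as SVMs. Regarding ANN-MLPs, you learned which layers make up their structure and how to define and implement them to build an image classifier, and then how to evaluate a trained MLP by measuring its robustness and accuracy. In the last section, we implemented an example MLP that detects objects in unknown images.

Remember that the world of computer vision is full of endless possibilities! This book aims to teach you the skills you need to get started on a variety of projects. Now it is up to you and your imagination to use the skills you have gained here to build something unique and interesting.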