Detecting Multiple Objects in Images and Tracking Them in Videos

Adam婷

Categories: Computer Vision, Algorithms, Deep Learning

Article link: https://blog.csdn.net/weixin_41697507/article/details/116640493

In this story I'll show you how to use a pre-trained classifier to detect multiple objects in an image, and then track them across a video.

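What is the difference between image classification (recognition) and object detection? In classification, you identify the main object in the image, and the whole image is assigned a single class. In detection, multiple objects are identified in the image, each is classified, and its location is determined as a bounding box.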
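To make the distinction concrete, here is a minimal sketch of what the two kinds of result might look like as data. The `Detection` class and all the values are hypothetical and only illustrate the shape of the output. They are not produced by any model in this article.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One detected object: class label, confidence, and bounding box."""
    label: str
    confidence: float
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


# Classification: a single label describes the whole image.
classification_result: str = "airplane"

# Detection: a list of labelled boxes, one per object found.
# All values below are made up for illustration.
detection_result: List[Detection] = [
    Detection("airplane", 0.97, (34.0, 50.0, 210.0, 140.0)),
    Detection("airplane", 0.93, (250.0, 80.0, 420.0, 170.0)),
    Detection("person", 0.88, (15.0, 300.0, 60.0, 410.0)),
]

# Distinct classes present among the detections.
labels_found = sorted({d.label for d in detection_result})
```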

Object Detection in Images

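There are several object detection algorithms, the most popular being YOLO and SSD. For this story I'll use YOLOv3. I won't go into the technical details of how YOLO (You Only Look Once) works, since that is covered elsewhere. Instead I'll focus on how to use it in your own application.
So let's jump into the code! The YOLO detection code here is based on Erik Lindernoren's implementation of the paper by Joseph Redmon and Ali Farhadi. The snippets below come from a Jupyter Notebook that you can find in the GitHub repo. Before running it, you need to run the download_weights.sh script in the config folder to download the YOLO weights file. We start by importing the required modules: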

from models import *
from utils import *
import os, sys, time, datetime, random
import numpy as np  # used below for np.linspace, np.array and np.where
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torch.autograd import Variable
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image

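Next we load the pre-trained configuration and weights, along with the class names of the COCO dataset on which the Darknet model was trained. As always in PyTorch, don't forget to put the model in evaluation mode after loading it.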

config_path='config/yolov3.cfg'
weights_path='config/yolov3.weights'
class_path='config/coco.names'
img_size=416
conf_thres=0.8
nms_thres=0.4
# Load model and weights
model = Darknet(config_path, img_size=img_size)
model.load_weights(weights_path)
model.cuda()
model.eval()
classes = utils.load_classes(class_path)
Tensor = torch.cuda.FloatTensor

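The snippet above also sets a few predefined values: the image size (a 416-pixel square), the confidence threshold, and the non-maximum suppression threshold.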
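To show what these two thresholds control, here is a minimal pure-Python sketch of greedy non-maximum suppression. It is a simplified illustration and not the repository's `non_max_suppression` helper. The names `iou` and `simple_nms` are hypothetical, and the sketch ignores class labels entirely.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def simple_nms(boxes, scores, conf_thres=0.8, nms_thres=0.4):
    """Return indices of the boxes that survive, highest score first."""
    # 1. Drop boxes whose confidence is below the threshold.
    candidates = [i for i, s in enumerate(scores) if s >= conf_thres]
    # 2. Visit the remaining boxes from most to least confident.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    for i in candidates:
        # 3. Keep a box only if it does not overlap a kept box too much.
        if all(iou(boxes[i], boxes[k]) <= nms_thres for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 10, 10)]
scores = [0.90, 0.85, 0.95, 0.50]
kept = simple_nms(boxes, scores)
```

In this example the last box is dropped by the confidence threshold, and the second box is suppressed because it overlaps the first, more confident one. Raising `nms_thres` would let more overlapping boxes through.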
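Below is the basic function that returns the detections for a given image. Note that it expects a Pillow (PIL) image as input. Most of the code deals with resizing the image to a 416px square while keeping its aspect ratio and padding the remainder. The actual detection happens in the last four lines.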

def detect_image(img):
    # scale and pad image
    ratio = min(img_size/img.size[0], img_size/img.size[1])
    imw = round(img.size[0] * ratio)
    imh = round(img.size[1] * ratio)
    img_transforms=transforms.Compose([transforms.Resize((imh,imw)),
         transforms.Pad((max(int((imh-imw)/2),0), 
              max(int((imw-imh)/2),0), max(int((imh-imw)/2),0),
              max(int((imw-imh)/2),0)), (128,128,128)),
         transforms.ToTensor(),
         ])
    # convert image to Tensor
    image_tensor = img_transforms(img).float()
    image_tensor = image_tensor.unsqueeze_(0)
    input_img = Variable(image_tensor.type(Tensor))
    # run inference on the model and get detections
    with torch.no_grad():
        detections = model(input_img)
        detections = utils.non_max_suppression(detections, 80, 
                        conf_thres, nms_thres)
    return detections[0]
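The scale-and-pad arithmetic inside `detect_image` can be easier to follow in isolation. The sketch below reproduces only that arithmetic in pure Python, without torchvision. The function name `letterbox_geometry` is hypothetical and not part of the repository.

```python
def letterbox_geometry(width, height, img_size=416):
    """Return (new_w, new_h, pad_left_right, pad_top_bottom) in pixels."""
    # Scale so that the longer side becomes img_size.
    ratio = min(img_size / width, img_size / height)
    new_w = round(width * ratio)
    new_h = round(height * ratio)
    # Pad the shorter side equally on both ends to approach a square.
    pad_left_right = max((new_h - new_w) // 2, 0)
    pad_top_bottom = max((new_w - new_h) // 2, 0)
    return new_w, new_h, pad_left_right, pad_top_bottom


# A 1280x720 frame is scaled to 416x234, then padded by 91 px above and below.
landscape = letterbox_geometry(1280, 720)
# A 500x1000 portrait image is scaled to 208x416, then padded by 104 px left and right.
portrait = letterbox_geometry(500, 1000)
```

One thing to be aware of: because the padding is an integer applied to both sides, an odd difference between the scaled width and height leaves the padded image one pixel short of 416.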

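Finally, we put it all together by loading an image, getting the detections, and displaying the image with bounding boxes around the detected objects. Again, most of the code here deals with scaling and padding the image, and with picking a different color for each detected class.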

# load image and get detections
img_path = "images/blueangels.jpg"
prev_time = time.time()
img = Image.open(img_path)
detections = detect_image(img)
inference_time = datetime.timedelta(seconds=time.time() - prev_time)
print ('Inference Time: %s' % (inference_time))
# Get bounding-box colors
cmap = plt.get_cmap('tab20b')
colors = [cmap(i) for i in np.linspace(0, 1, 20)]
img = np.array(img)
plt.figure()
fig, ax = plt.subplots(1, figsize=(12,9))
ax.imshow(img)
pad_x = max(img.shape[0] - img.shape[1], 0) * (img_size / max(img.shape))
pad_y = max(img.shape[1] - img.shape[0], 0) * (img_size / max(img.shape))
unpad_h = img_size - pad_y
unpad_w = img_size - pad_x
if detections is not None:
    unique_labels = detections[:, -1].cpu().unique()
    n_cls_preds = len(unique_labels)
    bbox_colors = random.sample(colors, n_cls_preds)
    # browse detections and draw bounding boxes
    for x1, y1, x2, y2, conf, cls_conf, cls_pred in detections:
        box_h = ((y2 - y1) / unpad_h) * img.shape[0]
        box_w = ((x2 - x1) / unpad_w) * img.shape[1]
        y1 = ((y1 - pad_y // 2) / unpad_h) * img.shape[0]
        x1 = ((x1 - pad_x // 2) / unpad_w) * img.shape[1]
        color = bbox_colors[int(np.where(
             unique_labels == int(cls_pred))[0])]
        bbox = patches.Rectangle((x1, y1), box_w, box_h,
             linewidth=2, edgecolor=color, facecolor='none')
        ax.add_patch(bbox)
        plt.text(x1, y1, s=classes[int(cls_pred)], 
                color='white', verticalalignment='top',
                bbox={'color': color, 'pad': 0})
plt.axis('off')
# save image
plt.savefig(img_path.replace(".jpg", "-det.jpg"),        
                  bbox_inches='tight', pad_inches=0.0)
plt.show()
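The box coordinates returned by the model refer to the padded 416px square, so the display code has to map them back onto the original image. Here is a minimal sketch of that mapping as a standalone function. The name `to_original_coords` is hypothetical, and it divides the padding by 2 exactly, whereas the snippet above uses floor division (`//`).

```python
def to_original_coords(box, img_w, img_h, img_size=416):
    """Map an (x1, y1, x2, y2) box from the padded square to the original image.

    Returns (left, top, width, height) in original-image pixels.
    """
    x1, y1, x2, y2 = box
    scale = img_size / max(img_w, img_h)
    # Total padding that was added along each axis, in network pixels.
    pad_x = max(img_h - img_w, 0) * scale
    pad_y = max(img_w - img_h, 0) * scale
    # Size of the real image content inside the padded square.
    unpad_w = img_size - pad_x
    unpad_h = img_size - pad_y
    box_w = (x2 - x1) / unpad_w * img_w
    box_h = (y2 - y1) / unpad_h * img_h
    left = (x1 - pad_x / 2) / unpad_w * img_w
    top = (y1 - pad_y / 2) / unpad_h * img_h
    return left, top, box_w, box_h


# For a 1280x720 image, the unpadded content spans rows 91..325 of the square,
# so this box should map back to (approximately) the whole original image.
left, top, box_w, box_h = to_original_coords((0, 91, 416, 325), 1280, 720)
```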

You can put these snippets together to run the code, or download the notebook from GitHub.

Object Tracking in Videos

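So now you know how to detect different objects in an image. If you run the detection and visualization frame by frame on a video, you will see the bounding boxes moving around. But if there are several objects in those frames, how do you know whether an object in one frame is the same as one in the previous frame? That is called object tracking, and it uses multiple detections over time to identify a specific object.
There are several algorithms that do this, and I decided to use SORT, which is very easy to use and very fast. SORT (Simple Online and Realtime Tracking) is a 2017 paper by Alex Bewley, Zongyuan Ge, Lionel Ott, Fabio Ramos, and Ben Upcroft. It proposes using a Kalman filter to predict the trajectory of previously identified objects and matching those predictions against new detections. The author Alex Bewley also wrote a versatile Python implementation, which I'll use in this story. Make sure you download the version of Sort from my GitHub repo, since I had to make a few small changes to integrate it into my project.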
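To give a feel for the prediction step, here is a minimal sketch of a constant-velocity Kalman filter for a single coordinate, written in pure Python. It is a deliberately reduced illustration and not the filter from the Sort implementation, which tracks a full bounding box rather than one coordinate. The class name and the noise values are assumptions chosen for the example.

```python
class ConstantVelocityKalman1D:
    """Tracks position and velocity of one coordinate from noisy positions."""

    def __init__(self, position, process_var=1e-2, measurement_var=1.0):
        self.pos = float(position)
        self.vel = 0.0
        # 2x2 covariance of (pos, vel); large values mean "very uncertain".
        self.p = [[10.0, 0.0], [0.0, 10.0]]
        self.q = process_var      # assumed process noise
        self.r = measurement_var  # assumed measurement noise

    def predict(self, dt=1.0):
        """Advance the state by dt and return the predicted position."""
        (a, b), (c, d) = self.p
        self.pos += self.vel * dt
        # P = F P F^T + Q, with F = [[1, dt], [0, 1]] and Q = q * I.
        self.p = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.pos

    def update(self, measured_pos):
        """Correct the state with a new position measurement."""
        (a, b), (c, d) = self.p
        residual = measured_pos - self.pos
        s = a + self.r             # innovation variance
        k0, k1 = a / s, c / s      # Kalman gain for pos and vel
        self.pos += k0 * residual
        self.vel += k1 * residual
        # P = (I - K H) P, with H = [1, 0].
        self.p = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


# An object whose x-center moves 2 px per frame: 0, 2, 4, 6, 8, 10.
kf = ConstantVelocityKalman1D(position=0.0)
for z in [2.0, 4.0, 6.0, 8.0, 10.0]:
    kf.predict()
    kf.update(z)
next_position = kf.predict()  # should land close to 12
```

After a few frames the filter has learned the velocity, so its prediction for the next frame lands close to where the object will actually be. A tracker can then match that prediction against the new detections.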

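Now on to the code. The first three snippets are the same as in the single-image detection, since they deal with running YOLO detection on a single frame. The difference is in the final part: for each set of detections we call the update function of the Sort object to get references to the objects in the image. Instead of the plain detections from the previous example (the bounding-box coordinates and the class prediction), we get tracked objects, which also include an object ID. We then display them in almost the same way, but add that ID and use different colors so you can easily follow the objects across the video frames.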
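The idea of an update call that turns per-frame boxes into boxes with stable IDs can be sketched with a much simpler tracker. The example below matches each new box to the previous frame's box with the highest overlap. It is a simplified stand-in and not the Sort algorithm: it has no Kalman prediction, matches greedily, and drops a track as soon as it misses one frame. The names `GreedyIouTracker` and `iou` are hypothetical.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}        # object ID -> box seen in the previous frame
        self._ids = count(1)    # source of fresh object IDs

    def update(self, boxes):
        """Return a list of (x1, y1, x2, y2, obj_id) for the current frame."""
        unmatched = dict(self.tracks)
        new_tracks, tracked = {}, []
        for box in boxes:
            best_id, best_iou = None, self.iou_threshold
            for obj_id, prev_box in unmatched.items():
                overlap = iou(box, prev_box)
                if overlap >= best_iou:
                    best_id, best_iou = obj_id, overlap
            if best_id is None:
                best_id = next(self._ids)   # no match: start a new track
            else:
                del unmatched[best_id]      # each track is matched only once
            new_tracks[best_id] = box
            tracked.append((*box, best_id))
        self.tracks = new_tracks
        return tracked


tracker = GreedyIouTracker()
frame1 = tracker.update([(0, 0, 10, 10), (100, 100, 120, 120)])
# Both objects move slightly and arrive in a different order.
frame2 = tracker.update([(102, 101, 122, 121), (1, 0, 11, 10)])
# A new object appears far away from the others.
frame3 = tracker.update([(300, 300, 310, 310)])
```

Each object keeps its ID from `frame1` to `frame2` even though the detection order changed, which is the property the display code relies on when it picks a color per object ID.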

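I also use OpenCV to read the video and display the video frames. Note that Jupyter notebooks are quite slow at processing video. You can use the notebook for testing and simple visualizations, but I also provide a standalone Python script that reads the source video and writes a copy with the tracked objects drawn on it. Playing OpenCV video in a notebook is not easy, so you may want to keep this code for other experiments.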

videopath = 'video/intersection.mp4'
%pylab inline 
import cv2
from IPython.display import clear_output
cmap = plt.get_cmap('tab20b')
colors = [cmap(i)[:3] for i in np.linspace(0, 1, 20)]
# initialize Sort object and video capture
from sort import *
vid = cv2.VideoCapture(videopath)
mot_tracker = Sort()
#while(True):
for ii in range(40):
    ret, frame = vid.read()
    if not ret:
        # stop when the video ends or a frame cannot be read
        break
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    pilimg = Image.fromarray(frame)
    detections = detect_image(pilimg)
    img = np.array(pilimg)
    # parentheses are needed so each expression can span two lines
    pad_x = (max(img.shape[0] - img.shape[1], 0) *
             (img_size / max(img.shape)))
    pad_y = (max(img.shape[1] - img.shape[0], 0) *
             (img_size / max(img.shape)))
    unpad_h = img_size - pad_y
    unpad_w = img_size - pad_x
    if detections is not None:
        tracked_objects = mot_tracker.update(detections.cpu())
        unique_labels = detections[:, -1].cpu().unique()
        n_cls_preds = len(unique_labels)
        for x1, y1, x2, y2, obj_id, cls_pred in tracked_objects:
            box_h = int(((y2 - y1) / unpad_h) * img.shape[0])
            box_w = int(((x2 - x1) / unpad_w) * img.shape[1])
            y1 = int(((y1 - pad_y // 2) / unpad_h) * img.shape[0])
            x1 = int(((x1 - pad_x // 2) / unpad_w) * img.shape[1])
            color = colors[int(obj_id) % len(colors)]
            color = [i * 255 for i in color]
            cls = classes[int(cls_pred)]
            cv2.rectangle(frame, (x1, y1), (x1+box_w, y1+box_h),
                         color, 4)
            cv2.rectangle(frame, (x1, y1-35), (x1+len(cls)*19+60,
                         y1), color, -1)
            cv2.putText(frame, cls + "-" + str(int(obj_id)), 
                        (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 
                        1, (255,255,255), 3)
    fig=figure(figsize=(12, 8))
    title("Video Stream")
    imshow(frame)
    show()
    clear_output(wait=True)

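After playing with the notebook, you can use the regular Python script both for live processing (it can take input from a camera) and for saving videos. The author generated an example video with this program.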

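That's it. You can now try detecting multiple objects in images on your own, and tracking those objects across video frames.
If you want to detect and track your own objects on a custom image dataset, you can read my next story, Training Yolo for Object Detection on a Custom Dataset.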
