Bounding Box Prediction from Scratch Using PyTorch

This post covers how to predict bounding boxes from scratch using PyTorch, describing how to approach object detection within a deep learning framework.

Object detection is a very popular task in Computer Vision, where, given an image, you predict (usually rectangular) boxes around objects present in the image and also recognize the types of objects. There could be multiple objects in your image and there are various state-of-the-art techniques and architectures to tackle this problem like Faster-RCNN and YOLO v3.

This article talks about the case when there is only one object of interest present in an image. The focus here is more on how to read an image and its bounding box, resize and perform augmentations correctly, rather than on the model itself. The goal is to have a good grasp of the fundamental ideas behind object detection, which you can extend to get a better understanding of the more complex techniques.

Here’s a link to the notebook consisting of all the code I’ve used for this article: https://jovian.ml/aakanksha-ns/road-signs-bounding-box-prediction

If you’re new to Deep Learning or PyTorch, or just need a refresher, this might interest you:

Problem Statement

Given an image consisting of a road sign, predict a bounding box around the road sign and identify the type of road sign.

There are four distinct classes these signs could belong to:

  • Traffic Light
  • Stop
  • Speed Limit
  • Crosswalk

This is called a multi-task learning problem because it involves performing two tasks: 1) regression to find the bounding box coordinates, and 2) classification to identify the type of road sign.


Dataset

I’ve used the Road Sign Detection Dataset from Kaggle:

It consists of 877 images. It’s a pretty imbalanced dataset, with most images belonging to the speed limit class, but since we’re more focused on the bounding box prediction, we can ignore the imbalance.

Loading the Data

The annotations for each image were stored in separate XML files. I followed these steps to create the training dataframe (a code sketch follows the list):

  • Walk through the training directory to get a list of all the .xml files.
  • Parse each .xml file using xml.etree.ElementTree.
  • Create a dictionary consisting of filepath, width, height, the bounding box coordinates (xmin, xmax, ymin, ymax), and class for each image, and append the dictionary to a list.
  • Create a pandas dataframe from the list of dictionaries of image stats.
  • Label encode the class column.
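
To make these steps concrete, here's a minimal sketch of the parsing code. The directory layout (road_signs/annotations and road_signs/images), the parse_annotation helper, and the use of scikit-learn's LabelEncoder are illustrative assumptions rather than the notebook's exact code.

import os
import xml.etree.ElementTree as ET

import pandas as pd
from sklearn.preprocessing import LabelEncoder


def parse_annotation(xml_path):
    # Read one Pascal VOC-style XML file into a flat dictionary.
    root = ET.parse(xml_path).getroot()
    size = root.find("size")
    obj = root.find("object")   # a single road sign per image is assumed
    box = obj.find("bndbox")
    return {
        # assumed images directory
        "filepath": os.path.join("road_signs/images", root.find("filename").text),
        "width": int(size.find("width").text),
        "height": int(size.find("height").text),
        "class": obj.find("name").text,
        "xmin": int(float(box.find("xmin").text)),
        "ymin": int(float(box.find("ymin").text)),
        "xmax": int(float(box.find("xmax").text)),
        "ymax": int(float(box.find("ymax").text)),
    }


# Walk the annotations directory, parse every .xml file, build a dataframe,
# and label encode the class column.
anno_dir = "road_signs/annotations"
records = [parse_annotation(os.path.join(anno_dir, f))
           for f in sorted(os.listdir(anno_dir)) if f.endswith(".xml")]
df = pd.DataFrame(records)
df["class"] = LabelEncoder().fit_transform(df["class"])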

Resizing Images and Bounding Boxes

Since training a computer vision model needs images to be of the same size, we need to resize our images and their corresponding bounding boxes. Resizing an image is straightforward but resizing the bounding box is a little tricky because each box is relative to an image and its dimensions.

Here’s how resizing a bounding box works:

  • Convert the bounding box into an image (called a mask) of the same size as the image it corresponds to. This mask has 0 for the background and 1 for the area covered by the bounding box.

[Images: the original image and the mask of its bounding box]

  • Resize the mask to the required dimensions.
  • Extract the bounding box coordinates from the resized mask.

The notebook includes helper functions to create a mask from a bounding box and to extract bounding box coordinates from a mask, plus a function to resize an image, write it to a new path, and return the resized bounding box coordinates.
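
Here is one way those helpers could look, sketched with OpenCV; the function names, the 400x300 target size, and the use of cv2 are assumptions for illustration, not necessarily what the notebook does.

import cv2
import numpy as np


def create_mask(bb, height, width):
    # Binary mask of shape (height, width): 1 inside the box, 0 elsewhere.
    ymin, xmin, ymax, xmax = map(int, bb)
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[ymin:ymax, xmin:xmax] = 1
    return mask


def mask_to_bb(mask):
    # Recover (ymin, xmin, ymax, xmax) from a binary mask.
    rows, cols = np.nonzero(mask)
    if len(rows) == 0:
        return np.zeros(4, dtype=np.float32)
    return np.array([rows.min(), cols.min(), rows.max(), cols.max()],
                    dtype=np.float32)


def resize_image_bb(read_path, write_path, bb, target_w=400, target_h=300):
    # Resize the image, save it to a new path, and return the resized box.
    img = cv2.imread(read_path)
    mask = create_mask(bb, img.shape[0], img.shape[1])
    resized_img = cv2.resize(img, (target_w, target_h), interpolation=cv2.INTER_AREA)
    resized_mask = cv2.resize(mask, (target_w, target_h), interpolation=cv2.INTER_NEAREST)
    cv2.imwrite(write_path, resized_img)
    return mask_to_bb(resized_mask)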

Data Augmentation

Data augmentation is a technique for helping a model generalize better by creating new training images from different variations of the existing ones. We have only around 800 images in our current training set, so data augmentation is very important to ensure our model doesn't overfit.

For this problem, I've used flip, rotation, center crop, and random crop. I've talked about various data augmentation techniques in a separate article.

The only thing to remember here is to ensure that the bounding box is transformed in the same way as the image. To do this, we follow the same approach as for resizing: convert the bounding box to a mask, apply the same transformations to the mask as to the original image, and extract the bounding box coordinates.

The notebook defines helper functions to center crop and random crop an image, a routine for transforming the image and mask together, and a function to display the bounding box.
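
Below is an illustrative sketch of such transforms (crop and horizontal flip only; rotation is omitted). The crop sizes and flip probability are assumptions; the key point is that the image and its mask receive exactly the same transformation, after which mask_to_bb recovers the new box.

import random

import numpy as np


def center_crop(arr, crop_h=280, crop_w=380):
    h, w = arr.shape[:2]
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    return arr[top:top + crop_h, left:left + crop_w]


def random_crop(arr, crop_h, crop_w, top, left):
    # The caller picks (top, left) so the image and mask are cropped identically.
    return arr[top:top + crop_h, left:left + crop_w]


def transform_image_and_mask(img, mask, crop_h=280, crop_w=380, train=True):
    if train:
        top = random.randint(0, img.shape[0] - crop_h)
        left = random.randint(0, img.shape[1] - crop_w)
        img = random_crop(img, crop_h, crop_w, top, left)
        mask = random_crop(mask, crop_h, crop_w, top, left)
        if random.random() < 0.5:          # random horizontal flip
            img, mask = np.fliplr(img).copy(), np.fliplr(mask).copy()
    else:
        img, mask = center_crop(img, crop_h, crop_w), center_crop(mask, crop_h, crop_w)
    return img, mask                       # new box: mask_to_bb(mask)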

PyTorch Dataset

Now that we have our data augmentations in place, we can do the train-validation split and create our PyTorch dataset. We normalize the images using ImageNet stats because we're using a pre-trained ResNet model, and we apply the data augmentations within the dataset during training.

This involves the train-valid split, creating the train and valid datasets, and setting the batch size and creating the data loaders.
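
A sketch of this setup is shown below. It reuses the helpers and dataframe from the earlier sketches and assumes the dataframe already holds the resized image paths and boxes in columns named new_path, ymin, xmin, ymax, and xmax; those names, the 80/20 split, and the batch size of 64 are illustrative assumptions.

import cv2
import numpy as np
import torch
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader, Dataset

# Standard ImageNet statistics, since we fine-tune an ImageNet-pretrained ResNet.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


class RoadSignDataset(Dataset):
    def __init__(self, paths, bbs, labels, train=False):
        self.paths, self.bbs, self.labels, self.train = paths, bbs, labels, train

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = cv2.cvtColor(cv2.imread(self.paths[idx]), cv2.COLOR_BGR2RGB)
        mask = create_mask(self.bbs[idx], img.shape[0], img.shape[1])
        if self.train:
            img, mask = transform_image_and_mask(img, mask, train=True)  # augmentations
        bb = mask_to_bb(mask)
        img = (img.astype(np.float32) / 255.0 - IMAGENET_MEAN) / IMAGENET_STD
        img = torch.from_numpy(img).permute(2, 0, 1)                     # HWC -> CHW
        return img, torch.tensor(self.labels[idx]), torch.tensor(bb, dtype=torch.float32)


train_df, valid_df = train_test_split(df, test_size=0.2, random_state=42)


def make_ds(frame, train):
    return RoadSignDataset(frame["new_path"].values,
                           frame[["ymin", "xmin", "ymax", "xmax"]].values,
                           frame["class"].values, train=train)


train_ds, valid_ds = make_ds(train_df, True), make_ds(valid_df, False)
train_dl = DataLoader(train_ds, batch_size=64, shuffle=True)
valid_dl = DataLoader(valid_ds, batch_size=64)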

PyTorch Model

For the model, I've used a very simple pre-trained ResNet-34. Since we have two tasks to accomplish here, there are two final layers: the bounding box regressor and the image classifier.
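
A sketch of such a two-headed model: a pre-trained ResNet-34 backbone with its final pooling and fully connected layers removed, pooled to a 512-dimensional feature vector that feeds a 4-class classifier and a 4-value box regressor. The head structure here is an assumption, not necessarily the notebook's exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision


class BB_model(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        # ImageNet-pretrained backbone (use pretrained=True on older torchvision).
        resnet = torchvision.models.resnet34(weights="IMAGENET1K_V1")
        # Keep everything up to the last convolutional block (drop avgpool and fc).
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        self.classifier = nn.Sequential(nn.BatchNorm1d(512), nn.Linear(512, num_classes))
        self.bb = nn.Sequential(nn.BatchNorm1d(512), nn.Linear(512, 4))

    def forward(self, x):
        x = self.backbone(x)                         # (N, 512, H/32, W/32)
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)   # (N, 512)
        return self.classifier(x), self.bb(x)        # class logits, box coordinates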

Training

For the loss, we need to take into account both the classification loss and the bounding box regression loss, so we use a combination of cross-entropy and L1 loss (the sum of all the absolute differences between the true and predicted coordinates). I've scaled the L1 loss down by a factor of 1000 to keep the classification and regression losses in a similar range. Apart from this, it's a standard PyTorch training loop (using a GPU):
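
A sketch of that loop, building on the model and data loaders sketched above; the learning rate, epoch count, and the constant C = 1000 used to rescale the L1 term are illustrative values.

import torch
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = BB_model().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)


def train_epochs(model, train_dl, epochs=15, C=1000):
    model.train()
    for epoch in range(epochs):
        total, total_loss = 0, 0.0
        for x, y_class, y_bb in train_dl:
            x, y_class, y_bb = x.to(device), y_class.to(device), y_bb.to(device)
            out_class, out_bb = model(x)
            loss_class = F.cross_entropy(out_class, y_class)
            loss_bb = F.l1_loss(out_bb, y_bb, reduction="sum")  # sum of absolute errors
            loss = loss_class + loss_bb / C                     # bring both losses to a similar scale
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += x.size(0)
            total_loss += loss.item() * x.size(0)
        print(f"epoch {epoch + 1}: train loss {total_loss / total:.3f}")


train_epochs(model, train_dl)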

Prediction on Test Images

Now that we’re done with training, we can pick a random image and test our model on it. Even though we had a fairly small number of training images, we end up getting a pretty decent prediction on our test image.

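For example, a single prediction could look like the sketch below, using an item from the validation set defined earlier (a brand-new photo would first need the same resize and normalization steps). This is an illustrative snippet, not the notebook's exact code.

model.eval()
with torch.no_grad():
    img, label, bb = valid_ds[0]                       # an already resized, normalized image
    out_class, out_bb = model(img.unsqueeze(0).to(device))
    pred_class = out_class.argmax(dim=1).item()
    pred_bb = out_bb.squeeze(0).cpu().numpy()          # (ymin, xmin, ymax, xmax)
print("predicted class:", pred_class, "predicted box:", pred_bb)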

It'll be a fun exercise to take a real photo with your phone and test out the model. Another interesting experiment would be to train the model without any data augmentations and compare the two models.

Conclusion

Now that we’ve covered the fundamentals of object detection and implemented it from scratch, you can extend these ideas to the multi-object case and try out more complex models like RCNN and YOLO! Also, check out this super cool library called albumentations to perform data augmentations easily.

Translated from: https://towardsdatascience.com/bounding-box-prediction-from-scratch-using-pytorch-a8525da51ddc
