More good news for PyTorch: torchvision 0.3 is out!

ronghuaiyang

Article link: https://blog.csdn.net/u011984148/article/details/99440217

Author: Francisco Massa

Translated by: ronghuaiyang

Overview

torchvision 0.3 brings more segmentation and detection models, more datasets, and a number of other new features. If you use PyTorch, this release is worth a look!

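PyTorch libraries such as torchvision provide convenient access to common datasets and models that can be used to quickly build state-of-the-art baselines. They also provide common abstractions that reduce the boilerplate code users would otherwise have to write over and over. The torchvision 0.3 release brings several new features, including models for semantic segmentation, object detection, instance segmentation and person keypoint detection, as well as custom C++/CUDA ops specific to computer vision.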

New features include:

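Reference training/evaluation scripts: torchvision now provides, under the references/ folder, scripts for training and evaluating models on the following tasks: classification, semantic segmentation, object detection, instance segmentation and person keypoint detection. These scripts serve as a record of how a specific model was trained, and they provide baseline training and evaluation code for quickly bootstrapping research.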

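torchvision ops: torchvision now contains custom C++/CUDA operators. These operators are specific to computer vision and make it easier to build object detection models. They do not yet support PyTorch script mode, but support is planned for the next release. Some of the supported ops are: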

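roi_pool (and its module version, RoIPool)
roi_align (and its module version, RoIAlign)
nms, for non-maximum suppression of bounding boxes
box_iou, for computing the intersection over union between two sets of bounding boxes
box_area, for computing the area of a set of bounding boxes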

Here are a few examples of using torchvision ops:

import torch
import torchvision
# create 10 random boxes
boxes = torch.rand(10, 4) * 100
# they need to be in [x0, y0, x1, y1] format
boxes[:, 2:] += boxes[:, :2]
# create a random image
image = torch.rand(1, 3, 200, 200)
# extract regions in `image` defined in `boxes`, rescaling
# them to have a size of 3x3
pooled_regions = torchvision.ops.roi_align(image, [boxes], output_size=(3, 3))
# check the size
print(pooled_regions.shape)
# torch.Size([10, 3, 3, 3])
# or compute the intersection over union between
# all pairs of boxes
print(torchvision.ops.box_iou(boxes, boxes).shape)
# torch.Size([10, 10])

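New models and datasets: torchvision now supports object detection, instance segmentation and person keypoint detection models. Several popular datasets have also been added. Note: this API is currently experimental and may change in future versions of torchvision. The new models include: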

Segmentation models

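The 0.3 release also contains models for dense pixelwise prediction on images. It adds the FCN and DeepLabV3 segmentation models, with ResNet50 and ResNet101 backbones. Pretrained weights are available for the ResNet101 backbone; they were trained on a subset of COCO train2017 that contains the same 20 categories as Pascal VOC.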

The pretrained models give the following results on the subset of COCO val2017 that contains the same 20 categories as Pascal VOC:

Network	mean IoU (%)	global pixelwise acc (%)
FCN ResNet101	63.7	91.9
DeepLabV3 ResNet101	67.4	92.4
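
Both metrics in this table can be read off a per-pixel confusion matrix. The sketch below assumes the usual definitions (global pixelwise accuracy = correctly labeled pixels / all pixels; per-class IoU = TP / (TP + FP + FN), then averaged over classes). It is an illustration of the metrics, not necessarily the exact evaluation code behind the numbers above, and `segmentation_metrics` is a hypothetical helper.

```python
import torch


def segmentation_metrics(target: torch.Tensor, pred: torch.Tensor, num_classes: int):
    # Hypothetical helper. Confusion matrix: rows = ground truth, columns = prediction.
    idx = target.flatten() * num_classes + pred.flatten()
    conf = torch.bincount(idx, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes).double()
    tp = conf.diag()
    # Global pixelwise accuracy: correctly labeled pixels over all pixels
    global_acc = tp.sum() / conf.sum()
    # IoU_c = TP / (TP + FP + FN) = diag / (row sum + column sum - diag).
    # A class absent from both target and prediction would give 0/0 = nan here.
    iou = tp / (conf.sum(dim=1) + conf.sum(dim=0) - tp)
    return global_acc.item(), iou.mean().item()


# A toy 2x3 "image" with 3 classes
target = torch.tensor([[0, 0, 1], [1, 2, 2]])
pred = torch.tensor([[0, 1, 1], [1, 2, 0]])

acc, miou = segmentation_metrics(target, pred, num_classes=3)
print(round(acc, 4), round(miou, 4))  # 0.6667 0.5
```

Here 4 of the 6 pixels are correct, and the per-class IoUs are 1/3, 2/3 and 1/2, whose mean is 0.5.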

Detection models

Network	box AP	mask AP	keypoint AP
Faster R-CNN ResNet-50 FPN trained on COCO	37.0	-	-
Mask R-CNN ResNet-50 FPN trained on COCO	37.9	34.6	-
Keypoint R-CNN ResNet-50 FPN trained on COCO	54.6	-	65.0

The implementations of the object detection, instance segmentation and keypoint detection models are fast, especially during training.

The results in the following table were measured on 8 V100 GPUs with CUDA 10.0 and cuDNN 7.4. For training we use a batch size of 2 per GPU; for testing, a batch size of 1.

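The reported test time covers model evaluation and post-processing (including pasting the masks into the image), but not the time needed to compute precision-recall.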

Network	train time (s / it)	test time (s / it)	memory (GB)
Faster R-CNN ResNet-50 FPN	0.2288	0.0590	5.2
Mask R-CNN ResNet-50 FPN	0.2728	0.0903	5.4
Keypoint R-CNN ResNet-50 FPN	0.3789	0.1242	6.8
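
To put the training times in more familiar units, the arithmetic below converts seconds per iteration into images per second. It rests on an assumption the text does not state outright: that one iteration processes the full global batch of 8 GPUs × 2 images = 16 images. Treat the results as rough estimates derived from the table, not as measured figures.

```python
# Per-iteration training times from the table above, in seconds.
train_time_s_per_it = {
    "Faster R-CNN ResNet-50 FPN": 0.2288,
    "Mask R-CNN ResNet-50 FPN": 0.2728,
    "Keypoint R-CNN ResNet-50 FPN": 0.3789,
}

# Assumption: one iteration = 8 GPUs x 2 images per GPU = 16 images.
IMAGES_PER_ITERATION = 8 * 2

throughput = {
    name: IMAGES_PER_ITERATION / seconds
    for name, seconds in train_time_s_per_it.items()
}

for name, images_per_second in throughput.items():
    print(f"{name}: ~{images_per_second:.1f} images/s")
# Faster R-CNN ResNet-50 FPN: ~69.9 images/s
# Mask R-CNN ResNet-50 FPN: ~58.7 images/s
# Keypoint R-CNN ResNet-50 FPN: ~42.2 images/s
```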

You can load and use the pretrained detection and segmentation models with just a few lines of code:

import torch
import torchvision
from PIL import Image

# torchvision 0.3 requests pretrained weights with pretrained=True;
# newer releases (0.13+) use the weights= argument instead
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
# set it to evaluation mode, as the model behaves differently
# during training and during evaluation
model.eval()
image = Image.open('/path/to/an/image.jpg').convert('RGB')
image_tensor = torchvision.transforms.functional.to_tensor(image)
# pass a list of (potentially different sized) tensors
# to the model, in 0-1 range. The model will take care of
# batching them together and normalizing
with torch.no_grad():
    output = model([image_tensor])
# output is a list of dict, containing the postprocessed predictions
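
Each element of `output` is a dictionary; for Mask R-CNN its keys are `boxes`, `labels`, `scores` and `masks`, where the masks are soft, with values in the 0-1 range. A common post-processing step is to drop low-confidence detections and threshold the masks. The sketch below does this on a hand-made dictionary that stands in for real model output, so it runs without downloading any weights; the helper `filter_detections` and its 0.5 thresholds are illustrative assumptions, not part of torchvision.

```python
import torch


def filter_detections(prediction: dict, score_threshold: float = 0.5,
                      mask_threshold: float = 0.5) -> dict:
    # Hypothetical helper: keep confident detections and binarize their soft masks.
    keep = prediction["scores"] >= score_threshold
    result = {key: prediction[key][keep] for key in ("boxes", "labels", "scores")}
    if "masks" in prediction:
        # Soft masks are in the 0-1 range; thresholding yields boolean masks.
        result["masks"] = prediction["masks"][keep] >= mask_threshold
    return result


# A hand-made stand-in for one element of the model's output list
fake_prediction = {
    "boxes": torch.tensor([[10.0, 10.0, 50.0, 50.0], [0.0, 0.0, 5.0, 5.0]]),
    "labels": torch.tensor([1, 3]),
    "scores": torch.tensor([0.92, 0.30]),
    "masks": torch.rand(2, 1, 60, 60),
}

filtered = filter_detections(fake_prediction)
print(filtered["labels"], filtered["masks"].shape)  # tensor([1]) torch.Size([1, 1, 60, 60])
```

With real output, the same call would be `filter_detections(output[0])`.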