6.torchvision

Latest recommended article published on 2023-03-21 17:40:00

beebabo

Category: Pytorch

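This article introduces the Torchvision library, including popular datasets such as MNIST, COCO and LSUN, as well as data loaders and transforms. It also covers pretrained models such as AlexNet and ResNet. Torchvision provides a range of image transforms for computer vision tasks, such as random cropping, brightness adjustment and normalization. The article uses sample code to show how to apply these tools to image recognition.

Summary generated automatically by CSDN.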

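Reference: the official PyTorch website
Recommended: a detailed introduction to the modules of PIL, the Python image processing library
I am feeling rather restless today. After the ZH episode I went through a whole tangle of emotions: self-doubt, finding the problem, realizing most of it was not my fault, being annoyed and *, talking myself out of it, and finally seeing through it all. I got so stressed that I broke out in two blisters, and then decided it is not worth letting other people's words affect my state of mind. Apart from a lack of confidence and a slightly prickly pride, I have a fairly objective view of myself. So none of this is worth getting worked up over. Better to calm down, keep studying, and write something.
I used the torchvision library for image recognition. My understanding of it had been fairly muddled, so I am writing it down and summarizing it alongside the official documentation, both to improve myself and to help others. The documentation text is excerpted from the official docs, with my own understanding and notes added.
Let's get started.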

TORCHVISION

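So what is TorchVision?
The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.
Put simply, it is a treasure chest. It contains the following goodies (I am not skilled enough, and too lazy, to build a table of contents, so please bear with this):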
Package Reference

torchvision.datasets

MNIST
Fashion-MNIST
KMNIST
EMNIST
FakeData
COCO
LSUN
ImageFolder
DatasetFolder
Imagenet-12
CIFAR
STL10
SVHN
PhotoTour
SBU
Flickr
VOC
Cityscapes

torchvision.models

Alexnet
VGG
ResNet
SqueezeNet
DenseNet
Inception v3
GoogLeNet

torchvision.transforms

Transforms on PIL Image
Transforms on torch.*Tensor
Conversion Transforms
Generic Transforms
Functional Transforms

torchvision.utils

torchvision.get_image_backend()

Gets the name of the package used to load images

torchvision.set_image_backend(backend)

Specifies the package used to load images.

Parameters: backend (string) – Name of the image backend. one of {‘PIL’, ‘accimage’}. The accimage package uses the Intel IPP library. It is generally faster than PIL, but does not support as many operations.

TORCHVISION.DATASETS

All datasets are subclasses of torch.utils.data.Dataset, i.e. they have __getitem__ and __len__ methods implemented. Hence, they can all be passed to a torch.utils.data.DataLoader, which can load multiple samples in parallel using torch.multiprocessing workers.
For example:

import torch
import torchvision

imagenet_data = torchvision.datasets.ImageFolder('path/to/imagenet_root/')
data_loader = torch.utils.data.DataLoader(imagenet_data,
                                          batch_size=4,
                                          shuffle=True,
                                          num_workers=4)  # worker processes; the original passed args.nThreads here

All the datasets have almost similar API. They all have two common arguments: transform and target_transform to transform the input and target respectively.
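
To make the protocol concrete, here is a minimal sketch of a dataset-like class in plain Python. `ToyDataset` and its sample data are hypothetical and exist only for illustration. A real dataset would subclass torch.utils.data.Dataset, but the two methods and the two transform arguments play the same roles.

```python
class ToyDataset:
    """Hypothetical dataset that follows the __getitem__/__len__ protocol."""

    def __init__(self, samples, transform=None, target_transform=None):
        self.samples = samples                    # list of (input, target) pairs
        self.transform = transform                # applied to the input
        self.target_transform = target_transform  # applied to the target

    def __getitem__(self, index):
        x, y = self.samples[index]
        if self.transform is not None:
            x = self.transform(x)
        if self.target_transform is not None:
            y = self.target_transform(y)
        return x, y

    def __len__(self):
        return len(self.samples)


data = ToyDataset([(1, "cat"), (2, "dog")],
                  transform=lambda x: x * 10,
                  target_transform=str.upper)
print(len(data), data[1])
```

Anything that answers `len()` and integer indexing this way is what a DataLoader iterates over when it builds batches.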

1. MNIST/Fashion-MNIST/KMNIST

These three are used in exactly the same way.

CLASS torchvision.datasets.MNIST(root, train=True, transform=None, target_transform=None, download=False)

CLASS torchvision.datasets.FashionMNIST(root, train=True, transform=None, target_transform=None, download=False)

CLASS torchvision.datasets.KMNIST(root, train=True, transform=None, target_transform=None, download=False)

Dataset

Parameters:

root (string) – Root directory of dataset where MNIST/processed/training.pt and MNIST/processed/test.pt exist.
train (bool, optional) – If True, creates dataset from training.pt otherwise from test.pt.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

P.S. Parameters that recur later, such as transform and target_transform, will not be explained again. They behave the same everywhere.

2.EMNIST

Same as above, with one extra argument: split

CLASS torchvision.datasets.EMNIST(root, split, **kwargs)

Note:

split (string) – The dataset has 6 different splits: byclass, bymerge, balanced, letters, digits and mnist. This argument specifies which one to use.

3.FakeData

CLASS torchvision.datasets.FakeData(size=1000, image_size=(3, 224, 224), num_classes=10, transform=None, target_transform=None, random_offset=0)

A fake dataset that returns randomly generated images and returns them as PIL images

Parameters:

size (int, optional) – Size of the dataset. Default: 1000 images
image_size (tuple, optional) – Size of the returned images. Default: (3, 224, 224)
num_classes (int, optional) – Number of classes in the dataset. Default: 10
random_offset (int) – Offsets the index-based random seed used to generate each image. Default: 0
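
The random_offset parameter is easier to understand with a sketch. The code below is not torchvision's implementation. It is a plain-Python illustration, using a hypothetical `fake_pixels` helper, of the idea described above: each index seeds its own generator, so the same index always gives the same sample, and the offset shifts which seed each index uses.

```python
import random

def fake_pixels(index, n=8, random_offset=0):
    """Return n pseudo-random pixel values for a given sample index."""
    rng = random.Random(index + random_offset)  # seed depends on index and offset
    return [rng.randrange(256) for _ in range(n)]

# The same index is reproducible across calls.
print(fake_pixels(3) == fake_pixels(3))
# Index 3 with offset 1 uses the same seed as index 4 with offset 0.
print(fake_pixels(3, random_offset=1) == fake_pixels(4))
```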

4.COCO

These require the COCO API to be installed

Includes: Captions and Detection

Captions

CLASS torchvision.datasets.CocoCaptions(root, annFile, transform=None, target_transform=None)

MS Coco Captions Dataset.
Parameters:

root (string) – Root directory where images are downloaded to.
annFile (string) – Path to json annotation file.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.ToTensor

Example

```
import torchvision.datasets as dset
import torchvision.transforms as transforms
cap = dset.CocoCaptions(root = 'dir where images are',
                        annFile = 'json annotation file',
                        transform=transforms.ToTensor())

print('Number of samples: ', len(cap))
img, target = cap[3] # load 4th sample

print("Image Size: ", img.size())
print(target)
```
**Output:**
Number of samples: 82783
Image Size: (3L, 427L, 640L)
[u'A plane emitting smoke stream flying over a mountain.',
u'A plane darts across a bright blue sky behind a mountain covered in snow',
u'A plane leaves a contrail above the snowy mountain top.',
u'A mountain that has a plane flying overheard in the distance.',
u'A mountain view with a plume of smoke in the background']

Note:

__getitem__(index)

Parameters:	index (int) – Index
Returns:  Tuple (image, target). target is a list of captions for the image.
Return type: tuple

Detection

CLASS torchvision.datasets.CocoDetection(root, annFile, transform=None, target_transform=None)

MS Coco Detection Dataset.
Parameters:

Same as CocoCaptions.

5.LSUN

CLASS torchvision.datasets.LSUN(root, classes='train', transform=None, target_transform=None)

Parameters:

root (string) – Root directory for the database files.
classes (string or list) – One of {‘train’, ‘val’, ‘test’} or a list of categories to load, e.g. [‘bedroom_train’, ‘church_train’].
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

__getitem__(index)

Parameters: index (int) – Index
Returns: Tuple (image, target) where target is the index of the target category.
Return type: tuple

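There are too many to organize and their usage is much the same, so I will stop here. See the official documentation for more datasets.
Two more are worth covering: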

ImageFolder

This blog post has some points worth referring to: "PyTorch: ImageFolder / reading image data with a custom class"

CLASS torchvision.datasets.ImageFolder(root, transform=None, target_transform=None, loader=<function default_loader>)

A generic data loader where the images are arranged in this way:

root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png

root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png

Parameters:

root (string) – Root directory path.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
loader – A function to load an image given its path.
```
__getitem__(index)
```
Parameters: index (int) – Index
Returns: (sample, target) where target is class_index of the target class.
Return type: tuple
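
Here is a sketch of how a folder layout like the one above turns into (path, class_index) pairs. It is a simplified stand-in written with the standard library, not ImageFolder's actual code. `scan_image_folder` is a hypothetical helper, and it assumes class indices follow the sorted order of the sub-directory names.

```python
import os
import tempfile

def scan_image_folder(root):
    """Map each class sub-directory to an index and list (path, index) pairs."""
    classes = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        folder = os.path.join(root, name)
        for fname in sorted(os.listdir(folder)):
            samples.append((os.path.join(folder, fname), class_to_idx[name]))
    return class_to_idx, samples

# Build a tiny throwaway tree: root/dog/xxx.png and root/cat/123.png
with tempfile.TemporaryDirectory() as root:
    for cls, fname in [("dog", "xxx.png"), ("cat", "123.png")]:
        os.makedirs(os.path.join(root, cls), exist_ok=True)
        open(os.path.join(root, cls, fname), "w").close()
    class_to_idx, samples = scan_image_folder(root)

print(class_to_idx)
```

The target returned by `__getitem__` is this integer index, not the folder name, so keep the mapping around if you need to print readable labels.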

DatasetFolder

CLASS torchvision.datasets.DatasetFolder(root, loader, extensions, transform=None, target_transform=None)

A generic data loader where the samples are arranged in this way:

root/class_x/xxx.ext
root/class_x/xxy.ext
root/class_x/xxz.ext

root/class_y/123.ext
root/class_y/nsdf3.ext
root/class_y/asd932_.ext

Parameters:

root (string) – Root directory path.
loader (callable) – A function to load a sample given its path.
extensions (list[string]) – A list of allowed extensions.
transform (callable, optional) – A function/transform that takes in a sample and returns a transformed version. E.g, transforms.RandomCrop for images.
target_transform – A function/transform that takes in the target and transforms it.
```
__getitem__(index)
```

Parameters: index (int) – Index
Returns: (sample, target) where target is class_index of the target class.
Return type: tuple

TORCHVISION.MODELS

The models subpackage contains definitions for the following model architectures:

AlexNet
VGG
ResNet
SqueezeNet
DenseNet
Inception v3
GoogLeNet

You can construct a model with random weights by calling its constructor:

import torchvision.models as models
resnet18 = models.resnet18()
alexnet = models.alexnet()
vgg16 = models.vgg16()
squeezenet = models.squeezenet1_0()
densenet = models.densenet161()
inception = models.inception_v3()
googlenet = models.googlenet()

We provide pre-trained models, using the PyTorch torch.utils.model_zoo. These can be constructed by passing pretrained=True:

import torchvision.models as models
resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
squeezenet = models.squeezenet1_0(pretrained=True)
vgg16 = models.vgg16(pretrained=True)
densenet = models.densenet161(pretrained=True)
inception = models.inception_v3(pretrained=True)
googlenet = models.googlenet(pretrained=True)

Instantiating a pre-trained model will download its weights to a cache directory. This directory can be set using the TORCH_MODEL_ZOO environment variable. See torch.utils.model_zoo.load_url() for details.

Some models use modules which have different training and evaluation behavior, such as batch normalization. To switch between these modes, use model.train() or model.eval() as appropriate. See train() or eval() for details.

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. You can use the following transform to normalize:

from torchvision import transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

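For more detail about the models, see the official documentation.
All of the networks are called in the same way. See the links at the top of this post for details.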

TORCHVISION.TRANSFORMS

Transforms are common image transformations. They can be chained together using Compose. Additionally, there is the torchvision.transforms.functional module. Functional transforms give fine-grained control over the transformations. This is useful if you have to build a more complex transformation pipeline (e.g. in the case of segmentation tasks).

CLASS torchvision.transforms.Compose(transforms)

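Composes several transforms together. In other words, Compose acts as a container that chains together the various functions used to modify and convert the images in a dataset.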

Parameters: transforms (list of Transform objects) – list of transforms to compose.
Example

>>> transforms.Compose([
>>>     transforms.CenterCrop(10),
>>>     transforms.ToTensor(),
>>> ])

Transforms on PIL Image

CLASS torchvision.transforms.CenterCrop(size)

Crops the given PIL Image at the center.
Parameters: size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made.

CLASS torchvision.transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)

Randomly change the brightness, contrast and saturation of an image.

Parameters:

brightness (float or tuple of python:float (min, max)) – How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness] or the given [min, max]. Should be non negative numbers.
contrast (float or tuple of python:float (min, max)) – How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or the given [min, max]. Should be non negative numbers.
saturation (float or tuple of python:float (min, max)) – How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation] or the given [min, max]. Should be non negative numbers.
hue (float or tuple of python:float (min, max)) – How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue] or the given [min, max]. Should have 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5.
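
The sampling ranges in the parameter list can be worked out by hand. The sketch below only reproduces the interval arithmetic described above and then draws one factor. `jitter_range` is a hypothetical helper, not a torchvision function.

```python
import random

def jitter_range(value, center=1.0, clip_at_zero=True):
    """Turn a single jitter strength into the [min, max] sampling range."""
    low, high = center - value, center + value
    if clip_at_zero:
        low = max(0.0, low)   # brightness/contrast/saturation factors cannot go below 0
    return low, high

brightness_range = jitter_range(0.4)   # [max(0, 1 - 0.4), 1 + 0.4]
contrast_range = jitter_range(1.5)     # the lower bound is clipped to 0
hue_range = jitter_range(0.1, center=0.0, clip_at_zero=False)  # [-hue, hue]

# One random draw, as would happen once per image.
brightness_factor = random.uniform(*brightness_range)
print(brightness_range, contrast_range, hue_range)
```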

CLASS torchvision.transforms.FiveCrop(size)

Crop the given PIL Image into four corners and the central crop

NOTE
This transform returns a tuple of images and there may be a mismatch in the number of inputs and targets your Dataset returns. See below for an example of how to deal with this.

Parameters: size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop of size (size, size) is made.
Example:

>>> transform = Compose([
>>>    FiveCrop(size), # this is a list of PIL Images
>>>    Lambda(lambda crops: torch.stack([ToTensor()(crop) for crop in crops])) # returns a 4D tensor
>>> ])
>>> #In your test loop you can do the following:
>>> input, target = batch # input is a 5d tensor, target is 2d
>>> bs, ncrops, c, h, w = input.size()
>>> result = model(input.view(-1, c, h, w)) # fuse batch size and ncrops
>>> result_avg = result.view(bs, ncrops, -1).mean(1) # avg over crops
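
To see which regions the five crops cover, here is a plain-Python sketch that computes the crop boxes. `five_crop_boxes` is a hypothetical helper, and the way the center offset is rounded (floor division) is an assumption that may differ slightly from torchvision.

```python
def five_crop_boxes(img_h, img_w, crop_h, crop_w):
    """Return (top, left, height, width) for four corners plus the center."""
    if crop_h > img_h or crop_w > img_w:
        raise ValueError("crop size must not exceed the image size")
    top_c = (img_h - crop_h) // 2
    left_c = (img_w - crop_w) // 2
    return [
        (0, 0, crop_h, crop_w),                            # top-left
        (0, img_w - crop_w, crop_h, crop_w),               # top-right
        (img_h - crop_h, 0, crop_h, crop_w),               # bottom-left
        (img_h - crop_h, img_w - crop_w, crop_h, crop_w),  # bottom-right
        (top_c, left_c, crop_h, crop_w),                   # center
    ]

boxes = five_crop_boxes(img_h=100, img_w=200, crop_h=50, crop_w=50)
for box in boxes:
    print(box)
```

Because one input image becomes five crops, the batch gains an extra ncrops dimension, which is why the test loop above reshapes before and after calling the model.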

CLASS torchvision.transforms.Pad(padding, fill=0, padding_mode='constant')

Pad the given PIL Image on all sides with the given “pad” value.

Parameters:

padding (int or tuple) – Padding on each border. If a single int is provided this is used to pad all borders. If tuple of length 2 is provided this is the padding on left/right and top/bottom respectively. If a tuple of length 4 is provided this is the padding for the left, top, right and bottom borders respectively.
fill (int or tuple) – Pixel fill value for constant fill. Default is 0. If a tuple of length 3, it is used to fill R, G, B channels respectively. This value is only used when the padding_mode is constant
padding_mode (str) –
Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant.

constant: pads with a constant value, this value is specified with fill
edge: pads with the last value at the edge of the image
reflect: pads with reflection of image without repeating the last value on the edge
For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2]
symmetric: pads with reflection of image repeating the last value on the edge
For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3]
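
The difference between the modes is easiest to check on a one-dimensional list. The following is a plain-Python sketch, not the library's implementation. `pad_1d` is a hypothetical helper and assumes the pad width is smaller than the list length.

```python
def pad_1d(values, pad, mode):
    """Pad a list on both sides in 'edge', 'reflect', or 'symmetric' mode."""
    n = len(values)
    if mode == "edge":           # repeat the outermost value
        left = [values[0]] * pad
        right = [values[-1]] * pad
    elif mode == "reflect":      # mirror without repeating the edge value
        left = [values[i] for i in range(pad, 0, -1)]
        right = [values[n - 1 - i] for i in range(1, pad + 1)]
    elif mode == "symmetric":    # mirror and repeat the edge value
        left = [values[i] for i in range(pad - 1, -1, -1)]
        right = [values[n - i] for i in range(1, pad + 1)]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return left + values + right

print(pad_1d([1, 2, 3, 4], 2, "reflect"))    # [3, 2, 1, 2, 3, 4, 3, 2]
print(pad_1d([1, 2, 3, 4], 2, "symmetric"))  # [2, 1, 1, 2, 3, 4, 4, 3]
```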

CLASS torchvision.transforms.Resize(size, interpolation=2)

Resize the input PIL Image to the given size.

Parameters:

size (sequence or int) – Desired output size. If size is a sequence like (h, w), output size will be matched to this. If size is an int, smaller edge of the image will be matched to this number. i.e, if height > width, then image will be rescaled to (size * height / width, size)
interpolation (int, optional) – Desired interpolation. Default is PIL.Image.BILINEAR
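
The rule for an int size is a small piece of arithmetic. The sketch below computes only the output shape and does not resize anything. `resized_shape` is a hypothetical helper, and truncating the longer edge with int() is an assumption about the rounding.

```python
def resized_shape(height, width, size):
    """Output (height, width) when `size` is an int: match the smaller edge."""
    if height > width:
        # width is the smaller edge, so it becomes `size`
        return int(size * height / width), size
    # height is the smaller edge (or the image is square)
    return size, int(size * width / height)

print(resized_shape(600, 400, 256))  # portrait image
print(resized_shape(400, 600, 256))  # landscape image
```

Either way the aspect ratio stays the same, which is why Resize(256) is often followed by a crop to get a fixed square input.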

Transforms on torch.*Tensor

CLASS torchvision.transforms.Normalize(mean, std, inplace=False)

Normalize a tensor image with mean and standard deviation. Given mean: (M1,…,Mn) and std: (S1,…,Sn) for n channels, this transform will normalize each channel of the input torch.*Tensor i.e. input[channel] = (input[channel] - mean[channel]) / std[channel]

NOTE
This transform acts out of place, i.e., it does not mutate the input tensor.

Parameters:

mean (sequence) – Sequence of means for each channel.
std (sequence) – Sequence of standard deviations for each channel.
```
__call__(tensor)
```
- Parameters: tensor (Tensor) – Tensor image of size (C, H, W) to be normalized.
- Returns: Normalized Tensor image.
- Return type: Tensor
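
The per-channel formula can be checked with nested lists in place of a tensor. This is a sketch of the arithmetic only. The `normalize` function here is a hypothetical plain-Python stand-in, and the mean and std values are the ones quoted for the pre-trained models earlier in this post.

```python
def normalize(image, mean, std):
    """image is a nested list indexed as image[channel][row][col]."""
    return [
        [[(px - m) / s for px in row] for row in channel]
        for channel, m, s in zip(image, mean, std)
    ]

mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
# A 3-channel, 1x2 image whose values are already scaled to [0, 1].
image = [[[0.485, 1.0]], [[0.456, 0.0]], [[0.406, 0.5]]]
out = normalize(image, mean, std)
print(out)
```

A pixel equal to the channel mean maps to 0, and the list comprehension builds a new structure, which mirrors the out-of-place behaviour mentioned in the note above.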

Conversion Transforms

CLASS torchvision.transforms.ToPILImage(mode=None)

Convert a tensor or an ndarray to PIL Image.

Converts a torch.*Tensor of shape C x H x W or a numpy ndarray of shape H x W x C to a PIL Image while preserving the value range.

Parameters:
mode (PIL.Image mode) –
color space and pixel depth of input data (optional). If mode is None (default) there are some assumptions made about the input data:

If the input has 4 channels, the mode is assumed to be RGBA.
If the input has 3 channels, the mode is assumed to be RGB.
If the input has 2 channels, the mode is assumed to be LA.
If the input has 1 channel, the mode is determined by the data type (i.e. int, float, short).
```
__call__(pic)
```
- Parameters: pic (Tensor or numpy.ndarray) – Image to be converted to PIL Image.
- Returns: Image converted to PIL Image.
- Return type: PIL Image

CLASS torchvision.transforms.ToTensor

Convert a PIL Image or numpy.ndarray to tensor.

Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has dtype = np.uint8

In the other cases, tensors are returned without scaling.

__call__(pic)

Parameters: pic (PIL Image or numpy.ndarray) – Image to be converted to tensor.
Returns: Converted image.
Return type: Tensor
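
ToTensor does two things at once: it reorders the axes and rescales the values. The sketch below shows both steps on nested lists. `to_chw_float` is a hypothetical helper that assumes 8-bit input and is not the library's code.

```python
def to_chw_float(image_hwc):
    """Convert an H x W x C nested list of 0-255 ints to C x H x W floats."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [
        [[image_hwc[r][c][ch] / 255.0 for c in range(width)]
         for r in range(height)]
        for ch in range(channels)
    ]

# One row, two RGB pixels: pure red and mid-gray.
image = [[[255, 0, 0], [128, 128, 128]]]
chw = to_chw_float(image)
print(len(chw), len(chw[0]), len(chw[0][0]))  # C, H, W
```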

Functional Transforms

Functional transforms give you fine-grained control of the transformation pipeline. As opposed to the transformations above, functional transforms don’t contain a random number generator for their parameters. That means you have to specify/generate all parameters, but you can reuse the functional transform. For example, you can apply a functional transform to multiple images like this:
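In other words, although the functional transforms overlap with many of the classes above, the difference is that they do not generate random numbers themselves. You have to generate the random parameters by hand, but once generated they can be reused. The example below makes this clear: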

import torchvision.transforms.functional as TF
import random

def my_segmentation_transforms(image, segmentation):
    if random.random() > 0.5:
        angle = random.randint(-30, 30)
        image = TF.rotate(image, angle)
        segmentation = TF.rotate(segmentation, angle)
    # more transforms ...
    return image, segmentation

Commonly used functions:

torchvision.transforms.functional.adjust_brightness(img, brightness_factor)

Adjust brightness of an Image.

Parameters:

img (PIL Image) – PIL Image to be adjusted.
brightness_factor (float) – How much to adjust the brightness. Can be any non negative number. 0 gives a black image, 1 gives the original image while 2 increases the brightness by a factor of 2.

Returns:
Brightness adjusted image.

Return type:
PIL Image
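
The three reference points for brightness_factor (0, 1 and 2) can be reproduced on a handful of 8-bit pixel values. This is a rough sketch of the scaling idea only. `adjust_brightness_pixels` is a hypothetical helper, and truncating with int() is an assumption about rounding.

```python
def adjust_brightness_pixels(pixels, brightness_factor):
    """Scale 8-bit pixel values and clamp the result to the 0-255 range."""
    if brightness_factor < 0:
        raise ValueError("brightness_factor must be non-negative")
    return [min(255, int(p * brightness_factor)) for p in pixels]

pixels = [0, 50, 100, 200]
print(adjust_brightness_pixels(pixels, 0))  # all black
print(adjust_brightness_pixels(pixels, 1))  # unchanged
print(adjust_brightness_pixels(pixels, 2))  # doubled, bright values saturate at 255
```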

torchvision.transforms.functional.crop(img, i, j, h, w)

Crop the given PIL Image.

Parameters:

img (PIL Image) – Image to be cropped.
i – Upper pixel coordinate.
j – Left pixel coordinate.
h – Height of the cropped image.
w – Width of the cropped image.

Returns:
Cropped image.

Return type:
PIL Image
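
The (i, j, h, w) convention and the idea of reusing one set of random parameters can be shown together. The sketch below works on nested lists. Both helpers are hypothetical stand-ins, written so that an image and its segmentation mask receive exactly the same crop.

```python
import random

def crop(image, i, j, h, w):
    """Return rows i..i+h-1 and columns j..j+w-1 of a nested-list image."""
    return [row[j:j + w] for row in image[i:i + h]]

def paired_random_crop(image, mask, h, w, rng=random):
    """Draw one (i, j) and apply the same crop to the image and its mask."""
    i = rng.randint(0, len(image) - h)     # upper pixel coordinate
    j = rng.randint(0, len(image[0]) - w)  # left pixel coordinate
    return crop(image, i, j, h, w), crop(mask, i, j, h, w)

# 4x4 image whose value encodes its position, and a mask marking the diagonal.
image = [[r * 10 + c for c in range(4)] for r in range(4)]
mask = [[1 if r == c else 0 for c in range(4)] for r in range(4)]

print(crop(image, 1, 2, 2, 2))
img_c, mask_c = paired_random_crop(image, mask, 2, 2, random.Random(0))
```

If the image and the mask each drew their own random offsets, the labels would no longer line up with the pixels, which is the problem functional transforms avoid.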

torchvision.transforms.functional.normalize(tensor, mean, std, inplace=False)

Normalize a tensor image with mean and standard deviation.

NOTE
This transform acts out of place by default, i.e., it does not mutate the input tensor.

See Normalize for more details. (It does the same thing as the Normalize class above.)

Parameters:

tensor (Tensor) – Tensor image of size (C, H, W) to be normalized.
mean (sequence) – Sequence of means for each channel.
std (sequence) – Sequence of standard deviations for each channel.

Returns:
Normalized Tensor image.

Return type:
Tensor

torchvision.transforms.functional.pad(img, padding, fill=0, padding_mode='constant')

Pad the given PIL Image on all sides with specified padding mode and fill value.

Parameters:

img (PIL Image) – Image to be padded.
padding (int or tuple) – Padding on each border. If a single int is provided this is used to pad all borders. If tuple of length 2 is provided this is the padding on left/right and top/bottom respectively. If a tuple of length 4 is provided this is the padding for the left, top, right and bottom borders respectively.
fill – Pixel fill value for constant fill. Default is 0. If a tuple of length 3, it is used to fill R, G, B channels respectively. This value is only used when the padding_mode is constant
padding_mode –
Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant.
- constant: pads with a constant value, this value is specified with fill
- edge: pads with the last value on the edge of the image
- reflect: pads with reflection of image (without repeating the last value on the edge)
  padding [1, 2, 3, 4] with 2 elements on both sides in reflect
  mode will result in [3, 2, 1, 2, 3, 4, 3, 2]
- symmetric: pads with reflection of image (repeating the last value on the edge)
  padding [1, 2, 3, 4] with 2 elements on both sides in symmetric
  mode will result in [2, 1, 1, 2, 3, 4, 4, 3]

Returns:
Padded image.

Return type:
PIL Image

torchvision.transforms.functional.resize(img, size, interpolation=2)

Resize the input PIL Image to the given size.

Parameters:

img (PIL Image) – Image to be resized.
size (sequence or int) – Desired output size. If size is a sequence like $(h, w)$, the output size will be matched to this. If size is an int, the smaller edge of the image will be matched to this number, maintaining the aspect ratio, i.e. if height > width, then the image will be rescaled to $\left(\text{size} \times \frac{\text{height}}{\text{width}}, \text{size} \right)$
interpolation (int, optional) – Desired interpolation. Default is PIL.Image.BILINEAR

Returns:
Resized image.

Return type:
PIL Image

torchvision.transforms.functional.rotate(img, angle, resample=False, expand=False, center=None)

Rotate the image by angle.

Parameters:

img (PIL Image) – PIL Image to be rotated.
angle (float or int) – Rotation angle in degrees, counter-clockwise.
resample (PIL.Image.NEAREST or PIL.Image.BILINEAR or PIL.Image.BICUBIC, optional) – An optional resampling filter. See filters for more information. If omitted, or if the image has mode “1” or “P”, it is set to PIL.Image.NEAREST.
expand (bool, optional) – Optional expansion flag. If true, expands the output image to make it large enough to hold the entire rotated image. If false or omitted, make the output image the same size as the input image. Note that the expand flag assumes rotation around the center and no translation.
center (2-tuple, optional) – Optional center of rotation. Origin is the upper left corner. Default is the center of the image.

torchvision.transforms.functional.to_grayscale(img, num_output_channels=1)

Convert image to grayscale version of image.

Parameters: img (PIL Image) – Image to be converted to grayscale.
Returns:
Grayscale version of the image:
- if num_output_channels = 1 : returned image is single channel
- if num_output_channels = 3 : returned image is 3 channel with r = g = b

Return type: PIL Image
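
As a rough illustration of the two output options, here is a per-pixel sketch. `to_grayscale_pixel` is a hypothetical helper. The weights are the ITU-R 601-2 luma weights that Pillow documents for RGB to L conversion, and the integer rounding used here is a simplification.

```python
def to_grayscale_pixel(r, g, b, num_output_channels=1):
    """Convert one RGB pixel to gray using ITU-R 601-2 luma weights."""
    luma = (r * 299 + g * 587 + b * 114) // 1000
    if num_output_channels == 1:
        return (luma,)
    if num_output_channels == 3:
        return (luma, luma, luma)   # r = g = b
    raise ValueError("num_output_channels must be 1 or 3")

print(to_grayscale_pixel(255, 255, 255))
print(to_grayscale_pixel(255, 0, 0, num_output_channels=3))
```

The 3-channel form carries no extra information. It is useful when a model expects RGB-shaped input but you want to feed it gray images.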

torchvision.transforms.functional.to_pil_image(pic, mode=None)

Convert a tensor or an ndarray to PIL Image.

See ToPILImage for more details.

Parameters:

pic (Tensor or numpy.ndarray) – Image to be converted to PIL Image.
mode (PIL.Image mode) – color space and pixel depth of input data (optional).

Returns: Image converted to PIL Image.
Return type: PIL Image

torchvision.transforms.functional.to_tensor(pic)

Convert a PIL Image or numpy.ndarray to tensor.

See ToTensor for more details.

Parameters: pic (PIL Image or numpy.ndarray) – Image to be converted to tensor.
Returns: Converted image.
Return type: Tensor

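In short, many of these functions overlap, so I picked out the more common ones to organize. For the rest, it is better to read the official documentation directly, which is linked at the top of this post.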
