Understanding and Learning torchvision

Reposted from https://blog.csdn.net/tsq292978891/article/details/79403617

Backed up for my own reference.

References: the torchvision documentation on PyPI and the PyTorch 0.3.0 Chinese documentation.

Overview: the torchvision package serves the PyTorch deep learning framework. It provides data loaders for image and video datasets, as well as definitions of popular model architectures and their pre-trained weights.
torchvision consists of the following four parts:
1. torchvision.datasets: data loaders for popular vision datasets
2. torchvision.models: definitions of popular model architectures, such as AlexNet, VGG, and ResNet, together with pre-trained models
3. torchvision.transforms: common image transformations such as random crop, rotations, etc.
4. torchvision.utils: utilities such as saving a tensor (3 x H x W) to disk as an image, or creating a grid of images from a mini-batch

Each part is described in turn below.

Part 1: torchvision.datasets
The datasets in torchvision.datasets are all subclasses of torch.utils.data.Dataset, so they can be loaded in parallel worker processes with torch.utils.data.DataLoader (which uses Python multiprocessing).
For example:

torch.utils.data.DataLoader(coco_cap, batch_size=args.batchSize, shuffle=True, num_workers=args.nThreads)
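
Here coco_cap is a dataset instance built beforehand. A rough sketch of constructing it (the root and annFile paths below are placeholders for a local COCO download):

import torchvision.datasets as dset
import torchvision.transforms as transforms

# placeholder paths: point these at your own COCO images and caption annotations
coco_cap = dset.CocoCaptions(root='path/to/coco/train2014',
                             annFile='path/to/coco/annotations/captions_train2014.json',
                             transform=transforms.ToTensor())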

Most datasets in torchvision.datasets accept optional transform and target_transform arguments, explained as follows (a small usage sketch follows the two definitions):

transform - a function that takes in an image and returns a transformed version; common examples are ToTensor, RandomCrop, etc., and they can be composed together with transforms.Compose (see the transforms section below).
target_transform - a function that takes in the target and returns a transformed version; for example, it may take in a caption string and return a tensor of word indices.
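
A minimal sketch of passing both arguments, using MNIST; the one-hot target_transform is just an illustrative choice, not something the dataset requires:

import torch
import torchvision.datasets as dset
import torchvision.transforms as transforms

mnist = dset.MNIST(root='./data', train=True, download=True,
                   transform=transforms.ToTensor(),               # PIL.Image -> FloatTensor in [0, 1]
                   target_transform=lambda y: torch.eye(10)[y])   # int label -> one-hot vector
img, target = mnist[0]
print(img.size(), target.size())   # torch.Size([1, 28, 28]) torch.Size([10])
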
torchvision.datasets includes the following datasets:
MNIST 
COCO (Captioning and Detection) 
LSUN Classification 
ImageFolder 
Imagenet-12 
CIFAR10 and CIFAR100 
STL10 
SVHN 
PhotoTour 
Among these, ImageFolder is a generic data loader for images arranged on disk as follows:
root/dog/xxx.png 
root/dog/xxy.png 
root/dog/xxz.png

root/cat/123.png 
root/cat/nsdf3.png 
root/cat/asd932_.png   # images of each class are stored in their own subfolder

dset.ImageFolder(root="root folder path", [transform, target_transform])
An ImageFolder instance then exposes three member attributes (a short access sketch follows the list):
(1) self.classes - the class names, as a list
(2) self.class_to_idx - a dict mapping each class name to its class index
(3) self.imgs - a list of (image path, class index) tuples
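
A small access sketch, assuming 'path/to/root' stands in for a folder laid out as above:

import torchvision.datasets as dset
import torchvision.transforms as transforms

dataset = dset.ImageFolder(root='path/to/root', transform=transforms.ToTensor())
print(dataset.classes)        # e.g. ['cat', 'dog']
print(dataset.class_to_idx)   # e.g. {'cat': 0, 'dog': 1}
print(dataset.imgs[:3])       # e.g. [('path/to/root/cat/123.png', 0), ...]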

Part 2: torchvision.models
torchvision.models contains definitions of the following model architectures:

AlexNet: AlexNet variant from the “One weird trick” paper.
VGG: VGG-11, VGG-13, VGG-16, VGG-19 (with and without batch normalization)
ResNet: ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152
SqueezeNet: SqueezeNet 1.0, and SqueezeNet 1.1
Usage 1: construct a model with randomly initialized weights:

import torchvision.models as models
resnet18 = models.resnet18()
alexnet = models.alexnet()
vgg16 = models.vgg16()
squeezenet = models.squeezenet1_0()
Usage 2: construct a model initialized with pre-trained weights.
Pre-trained models are provided for the ResNet variants, SqueezeNet 1.0 and 1.1, and AlexNet, served through the PyTorch model zoo (torch.utils.model_zoo). They are constructed by passing pretrained=True:

import torchvision.models as models
resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
squeezenet = models.squeezenet1_0(pretrained=True)
Note: these pre-trained models expect input images in the following format:
1. Pixel values in the range [0, 1], normalized with mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225].
2. Mini-batches of RGB images of shape (3 x H x W), with H and W at least 224 (i.e. batches laid out as NCHW).

An example of the normalization recommended for ImageNet:

# Data loading code
traindir = os.path.join(args.data, 'train')
valdir = os.path.join(args.data, 'val')
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

train_loader = torch.utils.data.DataLoader(
    datasets.ImageFolder(traindir, transforms.Compose([
        transforms.RandomSizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ])),
    batch_size=args.batch_size, shuffle=True,
    num_workers=args.workers, pin_memory=True)

val_loader = torch.utils.data.DataLoader(
    datasets.ImageFolder(valdir, transforms.Compose([
        transforms.Scale(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        normalize,
    ])),
    batch_size=args.batch_size, shuffle=False,
    num_workers=args.workers, pin_memory=True)
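
A batch loaded and normalized this way can then be fed directly to a pre-trained model. A minimal inference sketch (the random batch below merely stands in for real, normalized images, and the code assumes a reasonably recent PyTorch):

import torch
import torchvision.models as models

resnet18 = models.resnet18(pretrained=True)
resnet18.eval()                          # inference mode: fixes dropout / batch-norm statistics

batch = torch.randn(8, 3, 224, 224)      # N x C x H x W, standing in for normalized images
with torch.no_grad():                    # no gradient bookkeeping needed for inference
    logits = resnet18(batch)             # shape (8, 1000): one score per ImageNet class
preds = logits.argmax(dim=1)             # predicted class index for each image
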
Part 3: torchvision.transforms
torchvision.transforms contains the common image transformation (preprocessing) operations. Individual transforms can be chained together with torchvision.transforms.Compose.
The transforms fall into the following categories:
I: Transforms on PIL.Image (a composition sketch follows this list)
1. Scale(size, interpolation=Image.BILINEAR) 
2. CenterCrop(size) - center-crops the image to the given size 
3. RandomCrop(size, padding=0) 
4. RandomHorizontalFlip() 
5. RandomSizedCrop(size, interpolation=Image.BILINEAR) 
6. Pad(padding, fill=0)
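
A short sketch composing several of the PIL-level transforms above; the parameter values are arbitrary examples:

import torchvision.transforms as transforms

pil_pipeline = transforms.Compose([
    transforms.Pad(4, fill=0),            # pad 4 pixels on each side with black
    transforms.RandomCrop(224),           # take a random 224 x 224 crop
    transforms.RandomHorizontalFlip(),    # flip left-right with probability 0.5
])
# out_img = pil_pipeline(pil_img)         # pil_img: any PIL.Image larger than about 224 x 224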

II: Transforms on torch.*Tensor
1. Normalize(mean, std) 
Purpose: given mean (R, G, B) and std (R, G, B), normalizes each channel of a torch.*Tensor, i.e. channel = (channel - mean) / std.
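
For example, with mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225], a red-channel value of 0.5 becomes (0.5 - 0.485) / 0.229 ≈ 0.066. A minimal sketch on a random tensor:

import torch
import torchvision.transforms as transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
img = torch.rand(3, 224, 224)   # C x H x W tensor with values in [0, 1]
out = normalize(img)            # per channel: (channel - mean) / std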

III: Conversion Transforms (data format conversions)
1. ToTensor() 
Purpose: converts a PIL.Image (RGB) or numpy.ndarray (H x W x C) with values in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) with values in the range [0.0, 1.0].
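
A small sketch of the conversion (the image path is a placeholder):

from PIL import Image
import torchvision.transforms as transforms

pil_img = Image.open('path/to/image.png').convert('RGB')   # H x W image, 3 channels, values 0-255
tensor_img = transforms.ToTensor()(pil_img)                 # 3 x H x W FloatTensor in [0.0, 1.0]
print(tensor_img.size())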

IV: Generic Transforms
1. Lambda(lambda) - wraps a user-defined Python lambda, applies it to the input image, and returns the result.
Example: transforms.Lambda(lambda x: x.add(10))  # adds 10 to every pixel value

Part 4: torchvision.utils
torchvision.utils provides a couple of helper functions; at the moment there are only two.
1. torchvision.utils.make_grid(tensor, nrow=8, padding=2, normalize=False, range=None, scale_each=False)
Purpose: takes a 4D mini-batch tensor of shape (B x C x H x W), or a list of images all of the same size, and tiles them into one large image (one grid cell per input image).
normalize=True shifts the image into the range (0, 1) by subtracting the minimum and dividing by the maximum pixel value.
If range=(min, max) is given, where min and max are numbers, those values are used for the normalization instead.
scale_each=True scales each image in the batch separately, rather than computing a single (min, max) over all images.

An example:

import torchvision.transforms as transforms
from torchvision.utils import make_grid
import matplotlib.pyplot as plt
import numpy as np
import scipy.misc

# Jupyter magic: show plots inline in the notebook
%matplotlib inline

def show(img):
    # make_grid returns a C x H x W tensor; matplotlib expects H x W x C
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)), interpolation='nearest')

face = scipy.misc.face()             # H x W x C uint8 test image shipped with SciPy
img = transforms.ToTensor()(face)    # C x H x W FloatTensor in [0, 1]
print(img.size())                    # torch.Size([3, 768, 1024])

imglist = [img, img, img, img.clone().fill_(-10)]  # the last image is a constant -10

show(make_grid(imglist, padding=100))
show(make_grid(imglist, padding=100, normalize=True))
show(make_grid(imglist, padding=100, normalize=True, range=(0, 1)))
show(make_grid(imglist, padding=100, normalize=True, range=(0, 0.5)))
show(make_grid(imglist, padding=100, normalize=True, scale_each=True))
show(make_grid(imglist, padding=100, normalize=True, range=(0, 0.5), scale_each=True))

2. torchvision.utils.save_image(tensor, filename, nrow=8, padding=2, normalize=False, range=None, scale_each=False)
Purpose: saves the given tensor as an image file. If the tensor is a mini-batch, a grid of images is saved. All options after filename are passed through to make_grid.
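
A minimal save_image sketch (the output filename is arbitrary):

import torch
from torchvision.utils import save_image

batch = torch.rand(16, 3, 64, 64)                  # a fake mini-batch of 16 RGB images
save_image(batch, 'grid.png', nrow=4, padding=2)   # writes a 4 x 4 grid of images to disk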
