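Reference site: the official PyTorch website
Recommended site: "Python图像处理PIL各模块详细介绍" (a detailed introduction to each module of PIL, the Python image-processing library)
I was in a rather agitated mood today. After the ZH episode I went through a whole complicated sequence: doubting myself, spotting the problem, realizing that most of the problem wasn't mine, getting annoyed and *, talking myself down, and finally seeing through it all. I got so worked up that I broke out in two blisters, and then decided it isn't worth letting other people's words affect my state of mind. Apart from being unconfident and a little proud, I have a fairly objective view of myself, so none of this is worth getting upset over. Better to calm down, keep studying, and write something.
I used the torchvision library for image recognition. My understanding of it had been rather muddled, so I'm writing it down and summarizing it alongside the official documentation, both to improve myself and to help others. The documentation text is excerpted from the official docs, with my own understanding and notes added.
Let's get started.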
TORCHVISION
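So what is TorchVision?
The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.
Put simply, it's a treasure chest. It contains the following goodies (I'm not skilled enough, and too lazy, to build a proper table of contents, so please make do with this list):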
Package Reference
torchvision.datasets
- MNIST
- Fashion-MNIST
- KMNIST
- EMNIST
- FakeData
- COCO
- LSUN
- ImageFolder
- DatasetFolder
- Imagenet-12
- CIFAR
- STL10
- SVHN
- PhotoTour
- SBU
- Flickr
- VOC
- Cityscapes
torchvision.models
- AlexNet
- VGG
- ResNet
- SqueezeNet
- DenseNet
- Inception v3
- GoogLeNet
torchvision.transforms
- Transforms on PIL Image
- Transforms on torch.*Tensor
- Conversion Transforms
- Generic Transforms
- Functional Transforms
torchvision.utils
torchvision.get_image_backend()
Gets the name of the package used to load images
torchvision.set_image_backend(backend)
Specifies the package used to load images.
Parameters: backend (string) – Name of the image backend, one of {'PIL', 'accimage'}. The accimage package uses the Intel IPP library. It is generally faster than PIL, but does not support as many operations.
TORCHVISION.DATASETS
All datasets are subclasses of torch.utils.data.Dataset, i.e., they have __getitem__ and __len__ methods implemented. Hence, they can all be passed to a torch.utils.data.DataLoader, which can load multiple samples in parallel using torch.multiprocessing workers.
For example:
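import torch
import torchvision

# ImageFolder expects one sub-directory per class under the root
imagenet_data = torchvision.datasets.ImageFolder('path/to/imagenet_root/')
data_loader = torch.utils.data.DataLoader(imagenet_data,
                                          batch_size=4,
                                          shuffle=True,
                                          num_workers=4)  # was args.nThreads (from argparse)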
All the datasets have a nearly identical API. They all have two common arguments, transform and target_transform, which transform the input and the target respectively.
1. MNIST/Fashion-MNIST/KMNIST
These three are used in the same way.
CLASS torchvision.datasets.MNIST(root, train=True, transform=None, target_transform=None, download=False)
CLASS torchvision.datasets.FashionMNIST(root, train=True, transform=None, target_transform=None, download=False)
CLASS torchvision.datasets.KMNIST(root, train=True, transform=None, target_transform=None, download=False)
Dataset
Parameters:
- root (string) – Root directory of dataset where MNIST/processed/training.pt and MNIST/processed/test.pt exist.
- train (bool, optional) – If True, creates dataset from training.pt otherwise from test.pt.
- download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
- transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version, e.g. transforms.RandomCrop
- target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
P.S. From here on, parameters that repeat (transform, target_transform, etc.) will not be explained again. They work the same way everywhere.
2.EMNIST
Same as above, with one extra argument: split.
CLASS torchvision.datasets.EMNIST(root, split, **kwargs)
Note:
- split (string) – The dataset has 6 different splits: byclass, bymerge, balanced, letters, digits and mnist. This argument specifies which one to use.
3.FakeData
CLASS torchvision.datasets.FakeData(size=1000, image_size=(3, 224, 224), num_classes=10, transform=None, target_transform=None, random_offset=0)
A fake dataset that returns randomly generated images and returns them as PIL images
Parameters:
- size (int, optional) – Size of the dataset. Default: 1000 images
- image_size (tuple, optional) – Size of the returned images. Default: (3, 224, 224)
- num_classes (int, optional) – Number of classes in the dataset. Default: 10
- random_offset (int) – Offsets the index-based random seed used to generate each image. Default: 0
4.COCO
These require the COCO API to be installed
It includes two datasets: Captions and Detection.
Captions
CLASS torchvision.datasets.CocoCaptions(root, annFile, transform=None, target_transform=None)
MS Coco Captions Dataset.
Parameters:
- root (string) – Root directory where images are downloaded to.
- annFile (string) – Path to json annotation file.
- transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version, e.g. transforms.ToTensor
Example
```
import torchvision.datasets as dset
import torchvision.transforms as transforms
cap = dset.CocoCaptions(root = 'dir where images are',
annFile = 'json annotation file',
transform=transforms.ToTensor())
print('Number of samples: ', len(cap))
img, target = cap[3] # load 4th sample
print("Image Size: ", img.size())
print(target)
```
**Output:**
Number of samples: 82783
Image Size: (3L, 427L, 640L)
[u'A plane emitting smoke stream flying over a mountain.',
u'A plane darts across a bright blue sky behind a mountain covered in snow',
u'A plane leaves a contrail above the snowy mountain top.',
u'A mountain that has a plane flying overheard in the distance.',
u'A mountain view with a plume of smoke in the background']
Note:
__getitem__(index)
Parameters: index (int) – Index
Returns: Tuple (image, target). target is a list of captions for the image.
Return type: tuple
Detection
CLASS torchvision.datasets.CocoDetection(root, annFile, transform=None, target_transform=None)
MS Coco Detection Dataset.
Parameters:
- Same as CocoCaptions
5.LSUN
CLASS torchvision.datasets.LSUN(root, classes='train', transform=None, target_transform=None)
Parameters:
- root (string) – Root directory for the database files.
- classes (string or list) – One of {'train', 'val', 'test'} or a list of categories to load, e.g. ['bedroom_train', 'church_train'].
- transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version, e.g. transforms.RandomCrop
- target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
__getitem__(index)
- Parameters: index (int) – Index
- Returns: Tuple (image, target) where target is the index of the target category.
- Return type: tuple
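There are too many datasets to go through them all, and their contents and usage are all much the same. More datasets are listed in the official documentation.
Two more are worth covering: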
ImageFolder
This blog post has some points worth referring to: "PyTorch—ImageFolder/自定义类 读取图片数据" (reading image data with ImageFolder or a custom class)
CLASS torchvision.datasets.ImageFolder(root, transform=None, target_transform=None, loader=<function default_loader>)
A generic data loader where the images are arranged in this way:
root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png
root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png
Parameters:
- root (string) – Root directory path.
- transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version, e.g. transforms.RandomCrop
- target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
- loader – A function to load an image given its path.
__getitem__(index)
Parameters: index (int) – Index
Returns: (sample, target) where target is class_index of the target class.
Return type: tuple
DatasetFolder
CLASS torchvision.datasets.DatasetFolder(root, loader, extensions, transform=None, target_transform=None)
A generic data loader where the samples are arranged in this way:
root/class_x/xxx.ext
root/class_x/xxy.ext
root/class_x/xxz.ext
root/class_y/123.ext
root/class_y/nsdf3.ext
root/class_y/asd932_.ext
Parameters:
- root (string) – Root directory path.
- loader (callable) – A function to load a sample given its path.
- extensions (list[string]) – A list of allowed extensions.
- transform (callable, optional) – A function/transform that takes in a sample and returns a transformed version, e.g. transforms.RandomCrop for images.
- target_transform – A function/transform that takes in the target and transforms it.
__getitem__(index)
Parameters: index (int) – Index
Returns: (sample, target) where target is class_index of the target class.
Return type: tuple
TORCHVISION.MODELS
The models subpackage contains definitions for the following model architectures:
- AlexNet
- VGG
- ResNet
- SqueezeNet
- DenseNet
- Inception v3
- GoogLeNet
You can construct a model with random weights by calling its constructor:
import torchvision.models as models
resnet18 = models.resnet18()
alexnet = models.alexnet()
vgg16 = models.vgg16()
squeezenet = models.squeezenet1_0()
densenet = models.densenet161()
inception = models.inception_v3()
googlenet = models.googlenet()
We provide pre-trained models, using the PyTorch torch.utils.model_zoo. These can be constructed by passing pretrained=True:
import torchvision.models as models
resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
squeezenet = models.squeezenet1_0(pretrained=True)
vgg16 = models.vgg16(pretrained=True)
densenet = models.densenet161(pretrained=True)
inception = models.inception_v3(pretrained=True)
googlenet = models.googlenet(pretrained=True)
Instantiating a pre-trained model will download its weights to a cache directory. This directory can be set using the TORCH_MODEL_ZOO environment variable. See torch.utils.model_zoo.load_url() for details.
Some models use modules which have different training and evaluation behavior, such as batch normalization. To switch between these modes, use model.train() or model.eval() as appropriate. See train() or eval() for details.
All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225].
You can use the following transform to normalize:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
For more details about the models, see the official torchvision.models documentation. All of the networks are called in the same way.
TORCHVISION.TRANSFORMS
Transforms are common image transformations. They can be chained together using Compose. Additionally, there is the torchvision.transforms.functional module. Functional transforms give fine-grained control over the transformations. This is useful if you have to build a more complex transformation pipeline (e.g. in the case of segmentation tasks).
CLASS torchvision.transforms.Compose(transforms)
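Composes several transforms together. In other words, this class acts as a container that chains together the various functions used to modify and convert the images in a dataset.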
Parameters: transforms (list of Transform objects) – list of transforms to compose.
Example
>>> transforms.Compose([
>>> transforms.CenterCrop(10),
>>> transforms.ToTensor(),
>>> ])
Transforms on PIL Image
CLASS torchvision.transforms.CenterCrop(size)
Crops the given PIL Image at the center.
Parameters: size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made.
CLASS torchvision.transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)
Randomly change the brightness, contrast and saturation of an image.
Parameters:
- brightness (float or tuple of float (min, max)) – How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness] or the given [min, max]. Should be non negative numbers.
- contrast (float or tuple of float (min, max)) – How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or the given [min, max]. Should be non negative numbers.
- saturation (float or tuple of float (min, max)) – How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation] or the given [min, max]. Should be non negative numbers.
- hue (float or tuple of float (min, max)) – How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue] or the given [min, max]. Should have 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5.
CLASS torchvision.transforms.FiveCrop(size)
Crop the given PIL Image into four corners and the central crop
NOTE
This transform returns a tuple of images and there may be a mismatch in the number of inputs and targets your Dataset returns. See below for an example of how to deal with this.
Parameters: size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop of size (size, size) is made.
Example:
>>> transform = Compose([
>>> FiveCrop(size), # this is a list of PIL Images
>>> Lambda(lambda crops: torch.stack([ToTensor()(crop) for crop in crops])) # returns a 4D tensor
>>> ])
>>> #In your test loop you can do the following:
>>> input, target = batch # input is a 5d tensor, target is 2d
>>> bs, ncrops, c, h, w = input.size()
>>> result = model(input.view(-1, c, h, w)) # fuse batch size and ncrops
>>> result_avg = result.view(bs, ncrops, -1).mean(1) # avg over crops
CLASS torchvision.transforms.Pad(padding, fill=0, padding_mode='constant')
Pad the given PIL Image on all sides with the given “pad” value.
Parameters:
- padding (int or tuple) – Padding on each border. If a single int is provided this is used to pad all borders. If tuple of length 2 is provided this is the padding on left/right and top/bottom respectively. If a tuple of length 4 is provided this is the padding for the left, top, right and bottom borders respectively.
- fill (int or tuple) – Pixel fill value for constant fill. Default is 0. If a tuple of length 3, it is used to fill R, G, B channels respectively. This value is only used when the padding_mode is constant
- padding_mode (str) – Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant.
  - constant: pads with a constant value, this value is specified with fill
  - edge: pads with the last value at the edge of the image
  - reflect: pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2]
  - symmetric: pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3]
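The four padding modes are easiest to compare on a one-dimensional row. The sketch below uses NumPy's np.pad as a stand-in rather than torchvision itself; its modes of the same names reproduce the documented examples:

```python
import numpy as np

row = np.array([1, 2, 3, 4])

# reflect: mirror around the edge value without repeating it
reflect = np.pad(row, 2, mode='reflect')
# symmetric: mirror around the edge, repeating the edge value
symmetric = np.pad(row, 2, mode='symmetric')
# edge: repeat the last value at the edge
edge = np.pad(row, 2, mode='edge')
# constant: fill with a fixed value (0 here)
constant = np.pad(row, 2, mode='constant', constant_values=0)

print(reflect.tolist())    # [3, 2, 1, 2, 3, 4, 3, 2]
print(symmetric.tolist())  # [2, 1, 1, 2, 3, 4, 4, 3]
print(edge.tolist())       # [1, 1, 1, 2, 3, 4, 4, 4]
print(constant.tolist())   # [0, 0, 1, 2, 3, 4, 0, 0]
```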
CLASS torchvision.transforms.Resize(size, interpolation=2)
Resize the input PIL Image to the given size.
Parameters:
- size (sequence or int) – Desired output size. If size is a sequence like (h, w), output size will be matched to this. If size is an int, smaller edge of the image will be matched to this number. i.e, if height > width, then image will be rescaled to (size * height / width, size)
- interpolation (int, optional) – Desired interpolation. Default is PIL.Image.BILINEAR
Transforms on torch.*Tensor
CLASS torchvision.transforms.Normalize(mean, std, inplace=False)
Normalize a tensor image with mean and standard deviation. Given mean: (M1,…,Mn) and std: (S1,…,Sn) for n channels, this transform will normalize each channel of the input torch.*Tensor i.e. input[channel] = (input[channel] - mean[channel]) / std[channel]
NOTE
This transform acts out of place by default, i.e., it does not mutate the input tensor.
Parameters:
- mean (sequence) – Sequence of means for each channel.
- std (sequence) – Sequence of standard deviations for each channel.
__call__(tensor)
- Parameters: tensor (Tensor) – Tensor image of size (C, H, W) to be normalized.
- Returns: Normalized Tensor image.
- Return type: Tensor
Conversion Transforms
CLASS torchvision.transforms.ToPILImage(mode=None)
Convert a tensor or an ndarray to PIL Image.
Converts a torch.*Tensor of shape C x H x W or a numpy ndarray of shape H x W x C to a PIL Image while preserving the value range.
Parameters: mode (PIL.Image mode) – color space and pixel depth of input data (optional). If mode is None (default) there are some assumptions made about the input data:
- If the input has 4 channels, the mode is assumed to be RGBA.
- If the input has 3 channels, the mode is assumed to be RGB.
- If the input has 2 channels, the mode is assumed to be LA.
- If the input has 1 channel, the mode is determined by the data type (i.e. int, float, short).
__call__(pic)
- Parameters: pic (Tensor or numpy.ndarray) – Image to be converted to PIL Image.
- Returns: Image converted to PIL Image.
- Return type: PIL Image
CLASS torchvision.transforms.ToTensor
Convert a PIL Image or numpy.ndarray to tensor.
Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has dtype = np.uint8
In the other cases, tensors are returned without scaling.
__call__(pic)
- Parameters: pic (PIL Image or numpy.ndarray) – Image to be converted to tensor.
- Returns: Converted image.
- Return type: Tensor
Functional Transforms
Functional transforms give you fine-grained control of the transformation pipeline. As opposed to the transformations above, functional transforms don’t contain a random number generator for their parameters. That means you have to specify/generate all parameters, but you can reuse the functional transform. For example, you can apply a functional transform to multiple images like this:
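In other words, although the functional transforms do much the same things as many of the classes above, the difference is that they do not generate random numbers themselves. You have to set the random parameters by hand, but once chosen, the same values can be reused. The following example makes this clear: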
import torchvision.transforms.functional as TF
import random

def my_segmentation_transforms(image, segmentation):
    # with probability 0.5, rotate both inputs by the same random angle
    if random.random() > 0.5:
        angle = random.randint(-30, 30)
        image = TF.rotate(image, angle)
        segmentation = TF.rotate(segmentation, angle)
    # more transforms ...
    return image, segmentation
Commonly used functions in detail:
torchvision.transforms.functional.adjust_brightness(img, brightness_factor)
Adjust brightness of an Image.
Parameters:
- img (PIL Image) – PIL Image to be adjusted.
- brightness_factor (float) – How much to adjust the brightness. Can be any non negative number. 0 gives a black image, 1 gives the original image while 2 increases the brightness by a factor of 2.
Returns:
Brightness adjusted image.
Return type:
PIL Image
torchvision.transforms.functional.crop(img, i, j, h, w)
Crop the given PIL Image.
Parameters:
- img (PIL Image) – Image to be cropped.
- i – Upper pixel coordinate.
- j – Left pixel coordinate.
- h – Height of the cropped image.
- w – Width of the cropped image.
Returns:
Cropped image.
Return type:
PIL Image
torchvision.transforms.functional.normalize(tensor, mean, std, inplace=False)
Normalize a tensor image with mean and standard deviation.
NOTE
This transform acts out of place by default, i.e., it does not mutate the input tensor.
See Normalize for more details. (It does the same thing as the Normalize class above.)
Parameters:
- tensor (Tensor) – Tensor image of size (C, H, W) to be normalized.
- mean (sequence) – Sequence of means for each channel.
- std (sequence) – Sequence of standard deviations for each channel.
Returns:
Normalized Tensor image.
Return type:
Tensor
torchvision.transforms.functional.pad(img, padding, fill=0, padding_mode='constant')
Pad the given PIL Image on all sides with specified padding mode and fill value.
Parameters:
- img (PIL Image) – Image to be padded.
- padding (int or tuple) – Padding on each border. If a single int is provided this is used to pad all borders. If tuple of length 2 is provided this is the padding on left/right and top/bottom respectively. If a tuple of length 4 is provided this is the padding for the left, top, right and bottom borders respectively.
- fill – Pixel fill value for constant fill. Default is 0. If a tuple of length 3, it is used to fill R, G, B channels respectively. This value is only used when the padding_mode is constant
- padding_mode – Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant.
  - constant: pads with a constant value, this value is specified with fill
  - edge: pads with the last value on the edge of the image
  - reflect: pads with reflection of image (without repeating the last value on the edge). Padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2]
  - symmetric: pads with reflection of image (repeating the last value on the edge). Padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3]
Returns:
Padded image.
Return type:
PIL Image
torchvision.transforms.functional.resize(img, size, interpolation=2)
Resize the input PIL Image to the given size.
Parameters:
- img (PIL Image) – Image to be resized.
- size (sequence or int) – Desired output size. If size is a sequence like (h, w), the output size will be matched to this. If size is an int, the smaller edge of the image will be matched to this number maintaining the aspect ratio. i.e., if height > width, then image will be rescaled to (size * height / width, size)
- interpolation (int, optional) – Desired interpolation. Default is PIL.Image.BILINEAR
Returns:
Resized image.
Return type:
PIL Image
torchvision.transforms.functional.rotate(img, angle, resample=False, expand=False, center=None)
Rotate the image by angle.
Parameters:
- img (PIL Image) – PIL Image to be rotated.
- angle (float or int) – Rotation angle in degrees, counter-clockwise.
- resample (PIL.Image.NEAREST or PIL.Image.BILINEAR or PIL.Image.BICUBIC, optional) – An optional resampling filter. See filters for more information. If omitted, or if the image has mode “1” or “P”, it is set to PIL.Image.NEAREST.
- expand (bool, optional) – Optional expansion flag. If true, expands the output image to make it large enough to hold the entire rotated image. If false or omitted, make the output image the same size as the input image. Note that the expand flag assumes rotation around the center and no translation.
- center (2-tuple, optional) – Optional center of rotation. Origin is the upper left corner. Default is the center of the image.
torchvision.transforms.functional.to_grayscale(img, num_output_channels=1)
Convert image to grayscale version of image.
Parameters: img (PIL Image) – Image to be converted to grayscale.
Returns:
Grayscale version of the image:
- if num_output_channels = 1 : returned image is single channel
- if num_output_channels = 3 : returned image is 3 channel with r = g = b
Return type: PIL Image
torchvision.transforms.functional.to_pil_image(pic, mode=None)
Convert a tensor or an ndarray to PIL Image.
See ToPILImage for more details.
Parameters:
- pic (Tensor or numpy.ndarray) – Image to be converted to PIL Image.
- mode (PIL.Image mode) – color space and pixel depth of input data (optional).
Returns: Image converted to PIL Image.
Return type: PIL Image
torchvision.transforms.functional.to_tensor(pic)
Convert a PIL Image or numpy.ndarray to tensor.
See ToTensor for more details.
Parameters: pic (PIL Image or numpy.ndarray) – Image to be converted to tensor.
Returns: Converted image.
Return type: Tensor
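In short, many of these functions overlap in what they do, so I picked out the more common ones to summarize. For the rest, it's best to go straight to the official documentation, which is linked at the top.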