torchvision.transforms: the image preprocessing package

不负初见

Last modified: 2023-09-04 20:15:03

First published: 2023-09-04 20:10:51

Article link: https://blog.csdn.net/qq_53102746/article/details/132654629

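I. Cropping

1. Center crop: transforms.CenterCrop(size)

2. Random crop: transforms.RandomCrop(size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant')

size: the desired output size of the crop

padding: the amount of padding added to each border. One value (int) pads all four borders, two values pad [left/right, top/bottom], and four values pad [left, top, right, bottom]

pad_if_needed: bool. If True, the image is padded when it is smaller than size, so that the crop does not raise an error

fill: the fill value, used only when padding_mode is 'constant'

padding_mode: the padding mode
'constant': pad with a constant value (fill)
'edge': pad with the pixels at the edge of the image
'reflect': pad by reflection, without repeating the edge value: [1, 2, 3, 4] -> [3, 2, 1, 2, 3, 4, 3, 2]
'symmetric': pad by reflection, repeating the edge value: [1, 2, 3, 4] -> [2, 1, 1, 2, 3, 4, 4, 3]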

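3. Random resized crop (random area and aspect ratio): transforms.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(3. / 4., 4. / 3.), interpolation=InterpolationMode.BILINEAR)

size: the desired output size

scale: the range of the crop's area, as a fraction of the original image's area, before resizing

ratio: the range of the crop's aspect ratio (width / height) before resizing

interpolation: the interpolation method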

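4. Four-corner and center crop: transforms.FiveCrop(size)

Returns a tuple of 5 images: top-left, top-right, bottom-left, bottom-right, and center.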

from torchvision import transforms
from PIL import Image
import matplotlib.pyplot as plt
img = Image.open('a.jpg')
img0 = transforms.Resize((220,220))(img)
# img0.show()
img1 = transforms.FiveCrop((110,110))(img0)
axs = plt.figure().subplots(1, 6)
axs[0].imshow(img0);axs[0].set_title('src');axs[0].axis('off')
axs[1].imshow(img1[0]);axs[1].set_title('1');axs[1].axis('off')
axs[2].imshow(img1[1]);axs[2].set_title('2');axs[2].axis('off')
axs[3].imshow(img1[2]);axs[3].set_title('3');axs[3].axis('off')
axs[4].imshow(img1[3]);axs[4].set_title('4');axs[4].axis('off')
axs[5].imshow(img1[4]);axs[5].set_title('5');axs[5].axis('off')
plt.show()

Output (the source image, then the five crops):

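5. Four-corner and center crop, plus the flipped version of each: transforms.TenCrop(size, vertical_flip=False)

Returns a tuple of 10 images. The flip is horizontal by default and vertical when vertical_flip=True.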

from torchvision import transforms
from PIL import Image
import matplotlib.pyplot as plt
img = Image.open('a.jpg')
img0 = transforms.Resize((220,220))(img)
# img0.show()
# img1 = transforms.FiveCrop((110,110))(img0)
img1 = transforms.TenCrop((110,110))(img0)
axs = plt.figure().subplots(2, 6)
axs[0,0].imshow(img0);axs[0,0].set_title('src');axs[0,0].axis('off')
axs[0,1].imshow(img1[0]);axs[0,1].set_title('1');axs[0,1].axis('off')
axs[0,2].imshow(img1[1]);axs[0,2].set_title('2');axs[0,2].axis('off')
axs[0,3].imshow(img1[2]);axs[0,3].set_title('3');axs[0,3].axis('off')
axs[0,4].imshow(img1[3]);axs[0,4].set_title('4');axs[0,4].axis('off')
axs[0,5].imshow(img1[4]);axs[0,5].set_title('5');axs[0,5].axis('off')
axs[1,0].axis('off')
axs[1,1].imshow(img1[5]);axs[1,1].set_title('6');axs[1,1].axis('off')
axs[1,2].imshow(img1[6]);axs[1,2].set_title('7');axs[1,2].axis('off')
axs[1,3].imshow(img1[7]);axs[1,3].set_title('8');axs[1,3].axis('off')
axs[1,4].imshow(img1[8]);axs[1,4].set_title('9');axs[1,4].axis('off')
axs[1,5].imshow(img1[9]);axs[1,5].set_title('10');axs[1,5].axis('off')
plt.show()

Output (the source image, then the ten crops):

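II. Flipping and rotation

1. Horizontal flip with probability p: transforms.RandomHorizontalFlip(p=0.5). Flips the image horizontally at random with the given probability

2. Vertical flip with probability p: transforms.RandomVerticalFlip(p=0.5)

3. Random rotation: transforms.RandomRotation(degrees, interpolation=InterpolationMode.NEAREST, expand=False, center=None, fill=0)

degrees: the range of rotation angles. A single number such as 30 means the angle is drawn from (-30, +30); a sequence such as (30, 60) means the angle is drawn from 30 to 60 degrees

expand: optional expansion flag. If True, the output is enlarged so that it holds the whole rotated image. If False or omitted, the output has the same size as the input.

center: optional center of rotation, (x, y), with the origin at the top-left corner. The default is the center of the image

fill: the fill value for the area outside the rotated image. Defaults to 0. If a single number is given, it is used for all bands.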

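III. Image transforms

1. Resize the input image to the given size: transforms.Resize(size, interpolation=InterpolationMode.BILINEAR, max_size=None)
size: the desired output size. If size is a sequence like (h, w), the output size matches it. If size is an int, the shorter edge of the image is matched to this number and the aspect ratio is kept.
interpolation: the interpolation method, an enum value defined by torchvision.transforms.InterpolationMode. The default is InterpolationMode.BILINEAR (older signatures show this as the integer 2, the PIL code for bilinear). If the input is a tensor, only a subset of the InterpolationMode values is supported.
max_size: the maximum allowed length of the longer edge of the resized image. If the longer edge is greater than max_size after resizing according to size, the image is resized again so that the longer edge equals max_size. As a result, size may be overruled, i.e. the shorter edge may end up shorter than size. This is only supported when size is an int (or a sequence of length 1 in torchscript mode).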

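2. Convert to a tensor and scale to [0, 1]: transforms.ToTensor()

Converts a PIL Image or numpy.ndarray to a tensor. It only scales the data to [0, 1] (that is, divides 8-bit values by 255); it does not standardize. It also changes the layout from H x W x C to C x H x W. The input is expected in (h, w, c) layout with the channels in RGB order.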
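3. Standardize the data: transforms.Normalize(mean, std)

Standardizes a tensor image channel by channel: subtract the mean, then divide by the standard deviation (not the variance). With the dataset's per-channel statistics, this gives each channel roughly zero mean and unit variance, which usually speeds up convergence; it shifts and rescales the distribution but does not make it Gaussian. mean and std are sequences holding the mean and the standard deviation of each channel. For channel c: output[c] = (input[c] - mean[c]) / std[c]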

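4. Padding: transforms.Pad(padding, fill=0, padding_mode='constant')
padding: an int or a sequence with 1, 2, or 4 elements. 1: the same padding on all four borders; 2: left/right and top/bottom; 4: left, top, right, bottom
fill: a single int or an (r, g, b) tuple. fill only takes effect when padding_mode is 'constant'
padding_mode: the padding mode
'constant': pad with a constant value (fill)
'edge': pad with the pixels at the edge of the image
'reflect': pad by reflection, without repeating the edge value: [1, 2, 3, 4] -> [3, 2, 1, 2, 3, 4, 3, 2]
'symmetric': pad by reflection, repeating the edge value: [1, 2, 3, 4] -> [2, 1, 1, 2, 3, 4, 4, 3]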

from torchvision import transforms
from PIL import Image
import matplotlib.pyplot as plt
img = Image.open('a.jpg')
img0 = transforms.Resize((220,220))(img)
# Try the four padding modes
img1=transforms.Pad([105,105],fill=(0,0,0),padding_mode='constant')(img0)
img2=transforms.Pad([105,105],padding_mode='edge')(img0)
img3=transforms.Pad([105,105],padding_mode='reflect')(img0)
img4=transforms.Pad([105,105],padding_mode='symmetric')(img0)
axs = plt.figure().subplots(1, 5)
axs[0].imshow(img0);axs[0].set_title('src');axs[0].axis('off')
axs[1].imshow(img1);axs[1].set_title('constant');axs[1].axis('off')
axs[2].imshow(img2);axs[2].set_title('edge');axs[2].axis('off')
axs[3].imshow(img3);axs[3].set_title('reflect');axs[3].axis('off')
axs[4].imshow(img4);axs[4].set_title('symmetric');axs[4].axis('off')
plt.show()
# Save img1 so that its size can be checked
img1.save('img1.jpg')

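Output (the source image, then the constant, edge, reflect, and symmetric results):

The saved img1 has a size of 430x430.

padding has two elements here, meaning left/right and top/bottom padding, so each side becomes 220 + 105 x 2 = 430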

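5. Change brightness, contrast, saturation, and hue (random photometric distortion): transforms.ColorJitter(brightness=(0.9, 1.2), contrast=(0.9, 1.2), saturation=(0.8, 1.3), hue=0.3)

brightness=0.3 means the brightness is scaled by a factor drawn at random from 0.7 to 1.3; brightness=(0.7, 1.3) is equivalent. The values must be non-negative. contrast and saturation are set the same way as brightness.

hue: the hue factor, which controls how far the hue is shifted. hue=0.3 means the hue shift is drawn from -0.3 to 0.3, and hue=(-0.3, 0.3) is equivalent. It must satisfy 0 <= hue <= 0.5 for a single value, or -0.5 <= min <= max <= 0.5 for a (min, max) pair.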

img1_1 = transforms.ColorJitter(hue=0.5)(img0)
img1_1.save('img1.jpg')

Output (the saved image, with its hue shifted):

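6. Convert the image to grayscale: transforms.Grayscale(num_output_channels=1)

num_output_channels: 1 gives an ordinary single-channel grayscale image; 3 gives a 3-channel image with r == g == b. The pixel values are the same either way. Note that the (H, W, 3) shape printed below comes from cv2.imread, which loads every file as a 3-channel BGR image by default, so it does not show how many channels were saved.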

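import cv2
from PIL import Image
from torchvision import transforms

img = Image.open('a.jpg')
img0 = transforms.Resize((220,220))(img)
# 1 output channel: single-channel grayscale image (PIL mode 'L')
img1_1 = transforms.Grayscale(num_output_channels=1)(img0)
img1_1.save('img1.jpg')
# 3 output channels with r == g == b (only 1 and 3 are valid values)
img1_2 = transforms.Grayscale(num_output_channels=3)(img0)
img1_2.save('img2.jpg')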

a = cv2.imread('img1.jpg')
print(a)
print(a.shape)
b = cv2.imread('img2.jpg')
print(b)
print(b.shape)

# Output
[[[166 166 166]
  [179 179 179]
  [181 181 181]
  ...
  [182 182 182]
  [179 179 179]
  [170 170 170]]
...
 [[154 154 154]
  [159 159 159]
  [162 162 162]
  ...
  [166 166 166]
  [164 164 164]
  [157 157 157]]]
(220, 220, 3)
[[[166 166 166]
  [179 179 179]
  [181 181 181]
  ...
  [182 182 182]
  [179 179 179]
  [170 170 170]]
...

 [[154 154 154]
  [159 159 159]
  [162 162 162]
  ...
  [166 166 166]
  [164 164 164]
  [157 157 157]]]
(220, 220, 3)


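This differs from cv2.cvtColor(): after conversion with cv2.COLOR_BGR2GRAY the array is 2-D, with shape (H, W), whereas the files above were read back by cv2.imread as (H, W, 3)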

img = Image.open('a.jpg')
img0 = transforms.Resize((220,220))(img)
img0.save('b.jpg')
c = cv2.imread('b.jpg')
c1 = cv2.cvtColor(c,cv2.COLOR_BGR2GRAY)   # convert to a grayscale image
print(c1)
print(c1.shape)

# Output
[[167 179 181 ... 182 179 170]
 [180 194 195 ... 195 191 183]
 [184 198 199 ... 198 194 186]
 ...
 [180 188 198 ... 182 182 177]
 [190 195 200 ... 193 193 188]
 [156 159 162 ... 166 164 157]]
(220, 220)


7. Linear transformation: transforms.LinearTransformation(transformation_matrix, mean_vector). Applies a linear transformation to the flattened image tensor; it can be used for whitening

transformation_matrix (Tensor): tensor [D x D], D = C x H x W
mean_vector (Tensor): tensor [D], D = C x H x W

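8. Affine transform: transforms.RandomAffine(degrees, translate=None, scale=None, shear=None, interpolation=InterpolationMode.NEAREST, fill=0). A random affine transform that keeps the image center invariant. Older torchvision releases named the last two parameters resample and fillcolor

degrees: the range of rotation angles to select from. If degrees is a number rather than a sequence like (min, max), the range is (-degrees, +degrees). Set it to 0 to disable rotation

translate (tuple, optional): the maximum absolute fractions for the horizontal and vertical translations. For example, with translate=(a, b) the horizontal shift is sampled at random from -img_width * a < dx < img_width * a and the vertical shift from -img_height * b < dy < img_height * b. No translation is applied by default

scale: the scaling factor interval. For example, with (a, b) the scale is sampled at random from a <= scale <= b. The original scale is kept by default.

shear: the range of shear angles to select from. If shear is a number rather than a sequence like (min, max), a shear parallel to the x axis in the range (-shear, +shear) is applied. No shear is applied by default

interpolation (formerly resample, one of PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC; optional): the resampling filter.

fill (formerly fillcolor): optional fill color for the area outside the transformed image.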

img1_3 = transforms.RandomAffine(degrees=30,translate =(0.5,0))(img0)
img1_3.save('img1.jpg')

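The saved image (rotated and shifted horizontally):

When only the degrees argument is given, this transform is in effect equivalent to transforms.RandomRotation()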

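9. Convert to grayscale with probability p: transforms.RandomGrayscale(p=0.1). If the input has 3 channels, the grayscale output also has 3 channels with r = g = b

10. Convert a tensor or ndarray to a PIL Image: transforms.ToPILImage(mode=None)
mode: when None, the mode is inferred from the input. For 1 channel it is determined by the data type; 3 channels are assumed to be RGB; 4 channels are assumed to be RGBA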

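11. Define a custom transform: transforms.Lambda(lambd), where the argument is a user-defined function (typically a lambda)

When the built-in transforms do not cover what is needed, a custom transform can be defined with transforms.Lambda. For example, to crop an image at a fixed position rather than at a random one, write a cropping function and wrap it with transforms.Lambda, as follows: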

def _crop(img, pos, size):
    '''
    :param img: the input image (PIL Image)
    :param pos: top-left corner of the crop, a tuple (x, y)
    :param size: side length of the square crop
    :return: the cropped image
    '''
    ow, oh = img.size
    x1, y1 = pos
    tw = th = size
    # Crop only when the image is larger than the target in some dimension
    if (ow > tw or oh > th):
        return img.crop((x1, y1, x1 + tw, y1 + th))
    return img

data_transforms = transforms.Compose([
    transforms.Lambda(lambda img: _crop(img, (5, 5), 224)),
    transforms.ToTensor(),
])

# Original source: https://blog.csdn.net/qq_38406029/article/details/115129401

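IV. Operations on transforms, for more flexible data augmentation

1. Pick one transform at random from a given list and apply it: transforms.RandomChoice()

For example: transforms.RandomChoice([transforms.RandomVerticalFlip(p=1), transforms.RandomHorizontalFlip(p=1)])

2. Attach a probability to a list of transforms, so that they are applied with that probability: transforms.RandomApply(transforms, p=0.5)

For example: transforms.RandomApply([transforms.RandomAffine(degrees=0),transforms.Grayscale(num_output_channels=3)], p=0.5)

3. Apply the given transforms in a random order: transforms.RandomOrder()

For example: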

transforms.RandomOrder([transforms.RandomRotation(15),
transforms.Pad(padding=32),
transforms.RandomAffine(degrees=0, translate=(0.01, 0.1), scale=(0.9, 1.1))])
