Image Data Loading and Data Augmentation

Latest recommended article published on 2022-11-01 20:30:23

但愿此生，从未邂逅

Categories: Artificial Intelligence, Notes, Data Processing. Tags: python, deep learning, pytorch

Article link: https://blog.csdn.net/qq_56551150/article/details/126265352

Contents

Preface
1. PyTorch's built-in dataset loading
2. Custom datasets and how to load them
3. Data augmentation
4. Additions welcome

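Preface

When I work on machine learning or deep learning, what gives me the most trouble is loading the many different kinds of datasets and augmenting the data. This post tries to collect some common methods in one place for easy reference. My own knowledge is limited, so I hope more experienced readers will fill in whatever is missing. Many thanks.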

1. PyTorch's built-in dataset loading

Example: downloading and loading CIFAR10.

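import torch
import torchvision
# dataset_dir is the folder where the dataset is stored; set download=True to fetch it on first use.
# ToTensor is passed as the transform because DataLoader cannot batch raw PIL images.
train_data = torchvision.datasets.CIFAR10(dataset_dir, train=True, transform=torchvision.transforms.ToTensor(), target_transform=None, download=False)
train_loader = torch.utils.data.DataLoader(train_data,
                                           batch_size=2,
                                           shuffle=True,
                                           num_workers=4)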

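2. Custom datasets and how to load them

Note: this post mainly collects common methods rather than following one worked example, so it may read as somewhat disorganized.
Image data ➡ image index file ➡ build the dataset with Dataset ➡ read the data with DataLoader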

Creating a folder

import os
# data_path is the folder name (path); exist_ok=True makes this a no-op if it already exists
os.makedirs(data_path, exist_ok=True)

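Building a Dataset

from torch.utils.data import Dataset

class MyDataset(Dataset):  # inherit from the Dataset class
    def __init__(self):
        # Initialize the image file paths, the list of image file names, etc.
        pass

    def __getitem__(self, index):
        # 1. Read one sample from file by its index (e.g. with numpy.fromfile, PIL.Image.open, cv2.imread)
        # 2. Preprocess the sample (e.g. with torchvision.transforms)
        # 3. Return the data pair (e.g. image and label)
        pass

    def __len__(self):
        return count  # number of samples; count is a placeholder to be replaced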

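Reading with ImageFolder

The ImageFolder class in the torchvision package quickly builds a dataset from a directory organized as one subfolder per class (root/class_name/image file).

# root is the root directory that contains the class subfolders
train_dataset = torchvision.datasets.ImageFolder(root=train_root, transform=train_transform)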

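Reading common data types

import glob
import json
import os
import pandas as pd

# Index the files directly into a list
train_path = glob.glob('../../../dataset/tianchi_SVHN/train/*.png')
train_path = os.listdir(path)  # alternative: returns file names only, not full paths
# txt and csv files are usually read with pandas
train_data = pd.read_csv('...')
# json
train_json = json.load(open('...'))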

Note: the methods above generally only give you the file paths; the files still have to be opened inside the dataset:

img = Image.open(self.img_path[index]).convert('RGB')
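
One way to turn the pieces above into an index is to pair every image path with its label. The sketch below assumes a hypothetical layout of my own, a folder of PNG files plus a labels.json that maps file name to label, and uses empty placeholder files since only the paths matter at this stage.

```python
import glob
import json
import os
import tempfile

# Hypothetical layout: a folder of images plus a JSON file mapping file name -> label.
data_dir = tempfile.mkdtemp()
for name in ('000.png', '001.png', '002.png'):
    open(os.path.join(data_dir, name), 'wb').close()  # empty placeholder files
with open(os.path.join(data_dir, 'labels.json'), 'w') as f:
    json.dump({'000.png': 5, '001.png': 2, '002.png': 9}, f)

# Sort the paths: glob makes no promise about ordering.
train_path = sorted(glob.glob(os.path.join(data_dir, '*.png')))
with open(os.path.join(data_dir, 'labels.json')) as f:
    train_json = json.load(f)

# Look each label up by file name so paths and labels stay aligned.
train_label = [train_json[os.path.basename(p)] for p in train_path]
print(list(zip(train_path, train_label)))
```

The two lists can then be handed to a Dataset, which opens each file on demand.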

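3. Data augmentation

The pipeline below is the starting point; the rest of this section goes through the individual transforms.

from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((64, 128)),
    transforms.RandomCrop((60, 120)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])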

Cropping

(1) Center crop

torchvision.transforms.CenterCrop(size)
# size (sequence or int) - output size after cropping. A sequence means (h, w); an int means (size, size).

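(2) Random crop

torchvision.transforms.RandomCrop(size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant')
'''
Parameters:
size (sequence or int) - output size after cropping. A sequence means (h, w); an int means (size, size).
padding (int or sequence, optional) - number of pixels to pad the image with. Default None, no padding. An int pads all four sides by that many pixels. A sequence of two values pads left/right by the first and top/bottom by the second. A sequence of four values gives the left, top, right and bottom padding.
pad_if_needed (bool) - pad the image when it is smaller than the requested crop size.
fill - the fill value, used only in constant padding mode. Default 0. An int fills every channel with that value; a tuple of length 3 gives the fill values for the R, G and B channels.
padding_mode - padding mode. ● constant: pad with a fixed value; ● edge: pad with the value at the image edge; ● reflect: mirror the image without repeating the edge value; ● symmetric: mirror the image, repeating the edge value.
'''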

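(3) Random resized crop (random area and aspect ratio)

torchvision.transforms.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=2)
'''
Parameters:
size - expected output image size
scale - range for the area of the random crop, default (0.08, 1.0), meaning the crop covers between 0.08 and 1.0 times the area of the original image.
ratio - range for the random aspect ratio, default (3/4, 4/3).
interpolation - interpolation method, default PIL.Image.BILINEAR (bilinear interpolation), whose integer code is 2. Newer torchvision releases take an InterpolationMode value here instead.
'''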

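Flipping and rotation

(1) Horizontal flip with a given probability

torchvision.transforms.RandomHorizontalFlip(p=0.5)
'''
Parameters:
p (float) - probability of flipping, default 0.5.
'''

(2) Vertical flip with a given probability

torchvision.transforms.RandomVerticalFlip(p=0.5)
'''
Parameters:
p (float) - probability of flipping, default 0.5.
'''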

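(3) Random rotation

torchvision.transforms.RandomRotation(degrees, resample=False, expand=False, center=None, fill=None)
'''
Parameters:
degrees (sequence or float or int) - range of rotation angles to choose from. A single number means a random rotation within (-degrees, +degrees); a sequence like (min, max) means a random rotation between the given minimum and maximum angles.
resample ({PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC}, optional) - resampling method. Newer torchvision releases name this parameter interpolation.
expand (bool, optional) - whether to enlarge the output to fit the rotated image. If True, the output is expanded to hold the whole rotated image; if False or omitted, the output has the same size as the input.
center (2-tuple, optional) - center of rotation as (x, y), with the origin at the top-left corner. Defaults to the center of the image.
fill (n-tuple or int or float) - fill value for the area outside the rotated image. This option requires Pillow >= 5.2.0.
'''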

Other image transforms

(1) Converting to a tensor

torchvision.transforms.ToTensor()
'''
Converts a PIL Image, or a numpy.ndarray (H×W×C) with values in [0, 255], into a torch.FloatTensor of shape (C×H×W) with values in [0.0, 1.0].
'''

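(2) Converting to a PIL Image

torchvision.transforms.ToPILImage(mode=None)
'''
Converts a tensor (C×H×W) or a numpy.ndarray (H×W×C) into a PIL Image while preserving the value range.
Parameters:
mode (PIL.Image mode) - color space and pixel depth of the input data. If None (the default), the following is assumed: with 1 channel, the mode is determined by the data type; with 2 channels, the mode is LA; with 3 channels, the mode is RGB; with 4 channels, the mode is RGBA.
'''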

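(3) Padding

torchvision.transforms.Pad(padding, fill=0, padding_mode='constant')
'''
Pads the given PIL Image on all sides with the given fill value.
Parameters:
padding (int or tuple) - number of pixels to pad the image with. An int pads all four sides by that many pixels. A tuple of two values pads left/right by the first and top/bottom by the second. A tuple of four values gives the left, top, right and bottom padding.
fill - the fill value, used only in constant padding mode. Default 0. An int fills every channel with that value; a tuple of length 3 gives the fill values for the R, G and B channels.
padding_mode - padding mode. ● constant: pad with a fixed value; ● edge: pad with the value at the image edge; ● reflect: mirror the image without repeating the edge value; ● symmetric: mirror the image, repeating the edge value.
'''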

(4) Resize

torchvision.transforms.Resize(size, interpolation=2)
'''
Resizes a PIL Image.
Parameters:
size (sequence or int) - desired output size. A sequence like (h, w) gives an output of height h and width w. An int matches the shorter edge of the image to size and keeps the aspect ratio: if height > width, the image is resized to (size * height / width, size).
interpolation (int, optional) - interpolation method, default PIL.Image.BILINEAR
'''

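(5) Normalization

torchvision.transforms.Normalize(mean, std, inplace=False)
'''
Normalizes a tensor image. Given the means (mean[1],...,mean[n]) and standard deviations (std[1],...,std[n]) of the n channels, each channel of the output is computed as output[channel] = (input[channel] - mean[channel]) / std[channel]
Parameters:
mean (sequence) - sequence with the mean of each channel
std (sequence) - sequence with the standard deviation of each channel
inplace (bool, optional) - whether to modify the input tensor in place
'''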