[PyTorch Study (4)] The data loading workflow, using a facial pose dataset as an example

Daylight..

Article link: https://blog.csdn.net/qq_21754773/article/details/125133692

The PyTorch data loading workflow

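Deep learning involves processing large amounts of data, and loading that data is usually done with PyTorch's own tools.
PyTorch provides many utilities that make data loading easier and keep the code simple. This article uses a small dataset to show how to load and preprocess/augment data.
Packages required:

scikit-image: image I/O and transforms
pandas: easier handling of CSV files

First, import the relevant packages: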

from __future__ import print_function, division
import os
import torch
import pandas as pd
from skimage import io, transform
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils

# Ignore warnings
import warnings
warnings.filterwarnings("ignore")
plt.ion()   # interactive mode

<matplotlib.pyplot._IonContext at 0x2061b38fe50>

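The data is a facial pose dataset, that is, faces annotated with landmark points.
Dataset download: https://download.pytorch.org/tutorial/faces.zip
Place the downloaded dataset under 'data/faces/'.
The CSV file has the following format: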

image_name,part_0_x,part_0_y,part_1_x,part_1_y,part_2_x, ... ,part_67_x,part_67_y

Read the CSV and store the annotations in an (N, 2) array, where N is the number of landmarks.

landmarks_frame = pd.read_csv('data/faces/face_landmarks.csv')

n = 65
img_name = landmarks_frame.iloc[n, 0]
landmarks = landmarks_frame.iloc[n, 1:].values
landmarks = landmarks.astype('float').reshape(-1, 2)

print('Image name: {}'.format(img_name))
print('Landmarks shape: {}'.format(landmarks.shape))
print('First 4 Landmarks: {}'.format(landmarks[:4]))

Output:

Image name: person-7.jpg
Landmarks shape: (68, 2)
First 4 Landmarks: [[32. 65.]
 [33. 76.]
 [34. 86.]
 [34. 97.]]
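
To see what the `reshape(-1, 2)` step does in isolation, here is a minimal sketch on a made-up CSV held in memory. The file contents, the three-landmark layout, and the file names are assumptions for illustration only; the real file has 68 landmarks per row.

```python
import io

import pandas as pd

# Hypothetical two-row CSV with only 3 landmarks per face (the real file has 68)
csv_text = """image_name,part_0_x,part_0_y,part_1_x,part_1_y,part_2_x,part_2_y
a.jpg,10,20,30,40,50,60
b.jpg,1,2,3,4,5,6
"""
frame = pd.read_csv(io.StringIO(csv_text))

row = 0
img_name = frame.iloc[row, 0]                        # first column: file name
flat = frame.iloc[row, 1:].values.astype('float')    # x0, y0, x1, y1, ...
landmarks = flat.reshape(-1, 2)                      # one (x, y) pair per row

print(img_name)
print(landmarks.shape)
print(landmarks)
```

Passing `-1` lets NumPy infer the number of rows from the row length, so the same line works for any number of landmarks as long as the coordinates come in x, y pairs.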

Show one of the images together with its landmarks:

def show_landmarks(image, landmarks):
    """Show image with landmarks"""
    plt.imshow(image)
    plt.scatter(landmarks[:, 0], landmarks[:, 1], s=5, marker='.', c='y')
    plt.pause(0.01)  # pause a bit so that plots are updated

plt.figure()
show_landmarks(io.imread(os.path.join('data/faces/', img_name)),
               landmarks)

Output:
[figure: the image with its landmarks overlaid]

Dataset class

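torch.utils.data.Dataset is an abstract class representing a dataset. Your custom dataset class should inherit from Dataset and override the following methods:

__len__ (two underscores on each side), so that len(dataset) returns the size of the dataset.
__getitem__, to support indexing, so that dataset[i] returns the i-th sample.

Next, create a class for the dataset used in this article. The CSV file is read in __init__, while the images are read in __getitem__. (This saves memory: an image is read only when it is needed, instead of all images being held in memory from the start.)
Each sample is organized as a dict {'image': image, 'landmarks': landmarks}.
The dataset class takes an optional argument transform so that samples can be preprocessed.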

class FaceLandmarksDataset(Dataset):
    """Face Landmarks dataset."""

    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.landmarks_frame)

    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir,
                                self.landmarks_frame.iloc[idx, 0])
        image = io.imread(img_name)
        landmarks = self.landmarks_frame.iloc[idx, 1:].values
        landmarks = landmarks.astype('float').reshape(-1, 2)
        sample = {'image': image, 'landmarks': landmarks}
        
        if self.transform:
            sample = self.transform(sample)

        return sample
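
The two-method protocol described above can also be tried without any image files. The following is a minimal sketch, assuming a made-up `ToyLandmarksDataset` that generates random arrays on demand; the class name, image size, and random data are hypothetical and only stand in for the real files.

```python
import numpy as np
from torch.utils.data import Dataset


class ToyLandmarksDataset(Dataset):
    """Hypothetical stand-in: samples are generated on demand, not stored."""

    def __init__(self, num_samples, num_landmarks=68):
        self.num_samples = num_samples
        self.num_landmarks = num_landmarks

    def __len__(self):
        # Called by len(dataset)
        return self.num_samples

    def __getitem__(self, idx):
        # Called by dataset[idx]; the "image" is only created here
        if not 0 <= idx < self.num_samples:
            raise IndexError(idx)
        rng = np.random.default_rng(idx)
        image = rng.random((32, 32, 3))
        landmarks = rng.random((self.num_landmarks, 2)) * 32
        return {'image': image, 'landmarks': landmarks}


toy = ToyLandmarksDataset(num_samples=5)
print(len(toy))
sample = toy[2]
print(sample['image'].shape, sample['landmarks'].shape)
```

Nothing is allocated in `__init__` beyond two integers, which mirrors the memory argument made for reading images lazily in `__getitem__`.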

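Next, instantiate this class and iterate over a few samples. The code below prints the shapes of the first four samples and shows their landmarks.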

face_dataset = FaceLandmarksDataset(csv_file='data/faces/face_landmarks.csv',
                                    root_dir='data/faces/')
print(face_dataset)
fig = plt.figure()

for i in range(len(face_dataset)):
    sample = face_dataset[i]

    print(i, sample['image'].shape, sample['landmarks'].shape)

    ax = plt.subplot(1, 4, i + 1)
    plt.tight_layout()
    ax.set_title('Sample #{}'.format(i))
    ax.axis('off')
    show_landmarks(**sample)

    if i == 3:
        plt.show()
        break

Output:

<__main__.FaceLandmarksDataset object at 0x00000206280BCAC0>
0 (324, 215, 3) (68, 2)
1 (500, 333, 3) (68, 2)
2 (250, 258, 3) (68, 2)
3 (434, 290, 3) (68, 2)

[figure: samples 0 to 3 with their landmarks]

Transforms

Official documentation for torchvision.transforms: https://pytorch.org/vision/stable/transforms.html

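The images in the example above are not all the same size, but most neural networks expect inputs of a fixed size, so the data needs some preprocessing.
We create three common transforms:

Rescale: scale the image
RandomCrop: crop the image at random. This is a form of data augmentation
ToTensor: convert numpy images to torch images (the axes need to be swapped)

We write them as callable classes rather than plain functions, so that the parameters of a transform do not have to be passed on every call. Each class only needs to implement __call__ and, where necessary, __init__. A transform can then be used like this: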

tsfm = Transform(params)
transformed_sample = tsfm(sample)
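
As a concrete instance of this pattern, here is a minimal sketch of a callable transform. `ShiftLandmarks` is a hypothetical class invented for illustration, not one of the three transforms defined in this article; it stores its parameters once in `__init__` and is afterwards called with the sample alone.

```python
import numpy as np


class ShiftLandmarks:
    """Hypothetical transform: shift every landmark by a fixed (dx, dy)."""

    def __init__(self, dx, dy):
        # Parameters are stored once, at construction time
        self.offset = np.array([dx, dy], dtype=float)

    def __call__(self, sample):
        # The instance is then used like a function of the sample only
        return {'image': sample['image'],
                'landmarks': sample['landmarks'] + self.offset}


tsfm = ShiftLandmarks(dx=5, dy=-2)
sample = {'image': np.zeros((4, 4, 3)),
          'landmarks': np.array([[1.0, 1.0], [2.0, 3.0]])}
shifted = tsfm(sample)
print(shifted['landmarks'])
```

Because the transform returns a new dict instead of modifying its input, the original sample stays usable for other transforms.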

Note below how each transform (rescaling, cropping, converting to tensors) is applied to both the image and the landmarks:

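class Rescale(object):
    """Rescale the image in a sample to a given size.

    Args:
        output_size (tuple or int): Desired output size. If tuple, the output
            is matched to output_size. If int, the smaller image edge is
            matched to output_size, keeping the aspect ratio the same.
    """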

    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        self.output_size = output_size

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']

        h, w = image.shape[:2]
        if isinstance(self.output_size, int):
            if h > w:
                new_h, new_w = self.output_size * h / w, self.output_size
            else:
                new_h, new_w = self.output_size, self.output_size * w / h
        else:
            new_h, new_w = self.output_size

        new_h, new_w = int(new_h), int(new_w)

        img = transform.resize(image, (new_h, new_w))

        # h and w are swapped for landmarks because for images,
        # x and y axes are axis 1 and 0 respectively
        landmarks = landmarks * [new_w / w, new_h / h]

        return {'image': img, 'landmarks': landmarks}

class RandomCrop(object):
    """Crop randomly the image in a sample.

    Args:
        output_size (tuple or int): Desired output size. If int, square crop
            is made.
    """

    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        if isinstance(output_size, int):
            self.output_size = (output_size, output_size)
        else:
            assert len(output_size) == 2
            self.output_size = output_size

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']

        h, w = image.shape[:2]
        new_h, new_w = self.output_size

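        # +1 because randint excludes the upper bound; this also avoids an
        # error when the image is exactly as large as the crop
        top = np.random.randint(0, h - new_h + 1)
        left = np.random.randint(0, w - new_w + 1)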

        image = image[top: top + new_h,
                      left: left + new_w]

        landmarks = landmarks - [left, top]

        return {'image': image, 'landmarks': landmarks}

class ToTensor(object):
    """Convert ndarrays in sample to Tensors."""

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']

        # swap color axis because
        # numpy image: H x W x C
        # torch image: C X H X W
        image = image.transpose((2, 0, 1))
        return {'image': torch.from_numpy(image),
                'landmarks': torch.from_numpy(landmarks)}
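
The size arithmetic in Rescale and the coordinate shift in RandomCrop are easy to check by hand. The sketch below repeats only that arithmetic with NumPy, without touching any image; `rescaled_size` is a helper name introduced here, and the landmark and crop offset are made-up values.

```python
import numpy as np


def rescaled_size(h, w, output_size):
    """Return (new_h, new_w) using the same rule as Rescale.__call__."""
    if isinstance(output_size, int):
        if h > w:
            new_h, new_w = output_size * h / w, output_size
        else:
            new_h, new_w = output_size, output_size * w / h
    else:
        new_h, new_w = output_size
    return int(new_h), int(new_w)


h, w = 324, 215                      # shape of sample 0 printed earlier
new_h, new_w = rescaled_size(h, w, 256)
print(new_h, new_w)

# Landmarks are (x, y), so x scales with the width and y with the height
landmarks = np.array([[107.5, 162.0]])           # the image centre
scaled = landmarks * [new_w / w, new_h / h]

# A crop whose top-left corner is at (left, top) just shifts the origin
left, top = 16, 80                               # hypothetical crop offset
cropped = scaled - [left, top]
print(scaled, cropped)
```

The shorter side (the width, 215) becomes exactly 256, while the longer side is scaled by the same factor and truncated to an integer, which is why the landmark factors are computed from the final integer sizes rather than from 256 directly.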

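Compose transforms

Now combine the transforms created above and apply them to a sample.
Suppose we want to rescale the shorter side of the image to 256 and then randomly crop a 224 x 224 square from it. This means composing the Rescale and RandomCrop transforms.
torchvision.transforms.Compose is a simple callable class that does exactly this.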

scale = Rescale(256)
crop = RandomCrop(128)
composed = transforms.Compose([Rescale(256),
                               RandomCrop(224)])

# Apply each of the above transforms on sample.
fig = plt.figure()
sample = face_dataset[65]
for i, tsfrm in enumerate([scale, crop, composed]):
    transformed_sample = tsfrm(sample)

    ax = plt.subplot(1, 3, i + 1)
    plt.tight_layout()
    ax.set_title(type(tsfrm).__name__)
    show_landmarks(**transformed_sample)

plt.show()

Result of the transforms:
[figure: the sample after Rescale, RandomCrop, and the composed transform]
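
Conceptually, composing transforms just means feeding the output of each one into the next. The sketch below shows that idea with a hypothetical `SimpleCompose` class and two toy functions; it illustrates the principle and is not torchvision's actual source.

```python
class SimpleCompose:
    """Hypothetical re-implementation of the idea behind Compose."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, sample):
        # Feed the output of each transform into the next one
        for t in self.transforms:
            sample = t(sample)
        return sample


def add_one(x):
    return x + 1


def double(x):
    return x * 2


pipeline = SimpleCompose([add_one, double])
print(pipeline(3))            # (3 + 1) * 2

reversed_pipeline = SimpleCompose([double, add_one])
print(reversed_pipeline(3))   # 3 * 2 + 1
```

The two pipelines give different results, which is a reminder that order matters: Rescale has to come before RandomCrop so that a 224 crop always fits inside the rescaled image.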

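Iterating through the dataset

Next, create a dataset with the composed transforms. Every time this dataset is sampled:

An image is read from file
The composed transforms are applied to the image
Because one of the transforms (RandomCrop) is random, the data is augmented

For comparison, we first iterate over the created dataset with a plain for loop.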

transformed_dataset = FaceLandmarksDataset(csv_file='data/faces/face_landmarks.csv',
                                           root_dir='data/faces/',
                                           transform=transforms.Compose([
                                               Rescale(256),
                                               RandomCrop(224),
                                               ToTensor()
                                           ]))
print(transformed_dataset)
print(len(transformed_dataset))
for i in range(len(transformed_dataset)):
    sample = transformed_dataset[i]

    print(i, sample['image'].size(), sample['landmarks'].size())

    if i == 3:
        break

Output:

<__main__.FaceLandmarksDataset object at 0x0000020626EF51C0>
69
0 torch.Size([3, 224, 224]) torch.Size([68, 2])
1 torch.Size([3, 224, 224]) torch.Size([68, 2])
2 torch.Size([3, 224, 224]) torch.Size([68, 2])
3 torch.Size([3, 224, 224]) torch.Size([68, 2])

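In terms of cost and features, however, a plain for loop falls short in several ways:

It is slow, since every sample is loaded and transformed sequentially
It does not batch the data
It does not shuffle the data
It cannot load data in parallel with multiple worker processes

torch.utils.data.DataLoader is an iterable that addresses all of these shortcomings.
Its collate_fn parameter determines how individual samples are combined into a batch.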

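# num_workers > 0 loads data in worker processes; on Windows this requires the
# script body to sit under `if __name__ == '__main__':` (use num_workers=0 otherwise).
dataloader = DataLoader(transformed_dataset, batch_size=4,
                        shuffle=True, num_workers=4)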

# Helper function to show a batch
def show_landmarks_batch(sample_batched):
    """Show image with landmarks for a batch of samples."""
    images_batch, landmarks_batch = \
            sample_batched['image'], sample_batched['landmarks']
    batch_size = len(images_batch)
    im_size = images_batch.size(2)

    grid = utils.make_grid(images_batch)
    plt.imshow(grid.numpy().transpose((1, 2, 0)))

    for i in range(batch_size):
        plt.scatter(landmarks_batch[i, :, 0].numpy() + i * im_size,
                    landmarks_batch[i, :, 1].numpy(),
                    s=10, marker='.', c='r')

        plt.title('Batch from dataloader')

for i_batch, sample_batched in enumerate(dataloader):
    print(i_batch, sample_batched['image'].size(),
          sample_batched['landmarks'].size())

    # observe 4th batch and stop.
    if i_batch == 3:
        plt.figure()
        show_landmarks_batch(sample_batched)
        plt.axis('off')
        plt.ioff()
        plt.show()
        break

Output:
[figure: a batch of four images with their landmarks]

0 torch.Size([4, 3, 224, 224]) torch.Size([4, 68, 2])
1 torch.Size([4, 3, 224, 224]) torch.Size([4, 68, 2])
2 torch.Size([4, 3, 224, 224]) torch.Size([4, 68, 2])
3 torch.Size([4, 3, 224, 224]) torch.Size([4, 68, 2])
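
The collate_fn parameter mentioned above is easiest to understand on a small in-memory example. The sketch below assumes six made-up samples with the same dict layout as in this article; the tensor sizes and the `keep_as_list` function are hypothetical. It compares the default collation with a custom one.

```python
import torch
from torch.utils.data import DataLoader

# Hypothetical in-memory samples with the same dict layout as above
samples = [{'image': torch.zeros(3, 8, 8) + i,
            'landmarks': torch.full((68, 2), float(i))}
           for i in range(6)]

# Default collate: stacks each dict entry along a new batch dimension
loader = DataLoader(samples, batch_size=4, shuffle=False, num_workers=0)
shapes = [(b['image'].shape, b['landmarks'].shape) for b in loader]
print(shapes)


def keep_as_list(batch):
    """Custom collate_fn: return the samples unchanged, as a list."""
    return batch


list_loader = DataLoader(samples, batch_size=4, collate_fn=keep_as_list)
first = next(iter(list_loader))
print(type(first), len(first))
```

With six samples and batch_size=4 the last batch holds only two samples. A custom collate_fn like the one here is mainly useful when samples cannot be stacked, for example images of different sizes.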

We have now built and used a dataset class, transforms, and a dataloader. The torchvision package provides common dataset classes and transforms, so with torchvision you often do not need to write these classes yourself.

torchvision

torchvision is the PyTorch package for image tasks. It includes the following modules:

torchvision.datasets
torchvision.models
torchvision.transforms
torchvision.utils
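
torchvision.datasets is used for loading data.
torchvision.models provides ready-made model architectures, optionally with pretrained weights, that can be used directly once the data is loaded.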

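import torchvision.models as models
# Builds the AlexNet architecture with randomly initialized weights;
# pass weights=models.AlexNet_Weights.DEFAULT (torchvision >= 0.13) to load pretrained ones.
alexnet = models.alexnet()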

torchvision also has a commonly used dataset class, ImageFolder. It assumes that the dataset is laid out as follows:

root/ants/xxx.png
root/ants/xxy.jpeg
root/ants/xxz.png
root/bees/123.jpg
root/bees/nsdf3.png
root/bees/asd932_.png

Here 'ants', 'bees', etc. are the class labels.
Generic transforms that operate on PIL.Image, such as RandomHorizontalFlip and Resize (formerly Scale), are available as well. With these you can create a dataloader as follows:

import torch
from torchvision import transforms, datasets

data_transform = transforms.Compose([
        transforms.RandomResizedCrop(224),  # RandomSizedCrop is the old, deprecated name
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])
hymenoptera_dataset = datasets.ImageFolder(root='hymenoptera_data/train',
                                           transform=data_transform)
dataset_loader = torch.utils.data.DataLoader(hymenoptera_dataset,
                                             batch_size=4, shuffle=True,
                                             num_workers=4)
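
How the directory names above turn into integer labels can be sketched without torchvision. The `find_classes` helper below is a simplified stand-in written for illustration; it assumes, as ImageFolder does, that class names are sorted alphabetically before being numbered, and it uses a temporary directory in place of a real dataset.

```python
import os
import tempfile


def find_classes(root):
    """Map each sub-directory name to an integer label, in sorted order."""
    classes = sorted(e.name for e in os.scandir(root) if e.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


with tempfile.TemporaryDirectory() as root:
    # Created in reverse order on purpose: the mapping does not depend on it
    for name in ('bees', 'ants'):
        os.makedirs(os.path.join(root, name))
    class_to_idx = find_classes(root)

print(class_to_idx)
```

On a real ImageFolder instance the same mapping is available as the `class_to_idx` attribute, which is useful for turning predicted indices back into label names.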