torchvision.datasets

torchvision

Introduction

torchvision is part of the PyTorch project. It consists of popular datasets, model architectures, and common image transformations for computer vision. In short, it provides

torchvision.datasets

torchvision.models

torchvision.transforms

There are other modules as well, but this article covers the first of these: the datasets.

torchvision.datasets

This module ships with many built-in datasets, all of which are subclasses of torch.utils.data.Dataset, so each of them can be loaded through torch.utils.data.DataLoader:

import torch 
import torchvision
imagenet_data = torchvision.datasets.ImageNet('path/to/imagenet_root/')  # root: the directory holding the dataset
data_loader = torch.utils.data.DataLoader(imagenet_data,
                                          batch_size=4,
                                          shuffle=True,
                                          num_workers=0)

All the datasets share a similar API. Each takes two optional arguments, transform and target_transform, which transform the input and the target respectively; you can also build your own dataset on top of the provided base classes.
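The base-class contract is small: a dataset only needs `__len__` and `__getitem__`. As a minimal sketch (plain Python standing in for a `torch.utils.data.Dataset` subclass; the toy data here is invented purely for illustration):

```python
class SquaresDataset:
    """Toy dataset: item i is the pair (i, i**2). A real torchvision-style
    dataset would subclass torch.utils.data.Dataset and return (image, target),
    but the transform/target_transform plumbing is the same."""
    def __init__(self, n, transform=None, target_transform=None):
        self.n = n
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        x, y = index, index ** 2
        if self.transform:
            x = self.transform(x)
        if self.target_transform:
            y = self.target_transform(y)
        return x, y

ds = SquaresDataset(5, transform=lambda v: v * 10)
print(len(ds), ds[3])  # 5 (30, 9)
```

Because the class supports `len()` and integer indexing, a DataLoader could batch it exactly like the built-in datasets.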

Usage

MNIST handwritten-digit recognition

torchvision.datasets.MNIST

Parameters:

  • root: the root directory for the dataset; if download is True, the files are downloaded here
  • train: if True, build the dataset from the training files (the training set); otherwise build it from the test files (the test set)
  • download: if True, download the files into root (files already present are not downloaded again); the mirrors and files used are:
 mirrors = [
        "http://yann.lecun.com/exdb/mnist/",
        "https://ossci-datasets.s3.amazonaws.com/mnist/",
    ]

    resources = [
        ("train-images-idx3-ubyte.gz", "f68b3c2dcbeaaa9fbdd348bbdeb94873"),  # training set
        ("train-labels-idx1-ubyte.gz", "d53e105ee54ea40749a09fcbcd1e9432"),
        ("t10k-images-idx3-ubyte.gz", "9fb629c4189551a2d022fa330f9573f3"),  # test set
        ("t10k-labels-idx1-ubyte.gz", "ec29112dd5afa0611ce80d1b7f02629c"),
    ]
  • transform (optional): a function/transform applied to the PIL image, returning the transformed image
  • target_transform (optional): a function/transform applied to the target; here the target is the class label
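A common use of target_transform is converting the integer class label into a one-hot vector. A framework-free sketch of such a transform (num_classes=10 matches MNIST; with torchvision you would pass the function as target_transform):

```python
def one_hot(label, num_classes=10):
    """Map an integer class label to a one-hot list.
    Sketch only: a torch version would build a tensor instead of a list."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec

print(one_hot(3))  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```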
Dataset creation

Dataset information

  • Training set: 60,000 samples of size 1×28×28
  • Test set: 10,000 samples of size 1×28×28
  • Classes: the 10 handwritten digits 0-9
from torchvision import datasets
from torchvision.transforms import ToTensor
train_data=datasets.MNIST(
    root='data',
    train=True,
    transform=ToTensor(),
    download=True
)
test_data=datasets.MNIST(
    root='data',
    train=False,
    transform=ToTensor(),
    download=True
)
DataLoader creation
from torch.utils.data import DataLoader

loaders = {
    'train':DataLoader(train_data,
                       batch_size=100,
                       shuffle=True,
                       num_workers=1),

    'test':DataLoader(test_data,
                       batch_size=100,
                       shuffle=False,  # no need to shuffle for evaluation
                       num_workers=1)
}
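With batch_size=100, one pass over each split is a fixed number of batches; the arithmetic can be checked without touching torch (the sample counts come from the dataset information above):

```python
import math

train_samples, test_samples, batch_size = 60_000, 10_000, 100
# DataLoader yields ceil(n / batch_size) batches per epoch
train_batches = math.ceil(train_samples / batch_size)
test_batches = math.ceil(test_samples / batch_size)
print(train_batches, test_batches)  # 600 100
```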
Model definition
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class CNN(nn.Module):
  def __init__(self):
    super(CNN,self).__init__()
    self.conv1 = nn.Conv2d(1,10,kernel_size=5) 
    self.conv2 = nn.Conv2d(10,20,kernel_size=5)
    self.conv2_drop = nn.Dropout2d()
    self.fc1 = nn.Linear(320, 50)
    self.fc2 = nn.Linear(50,10)

  def forward(self, x):
    #x [B, C, H, W]->[100, 1, 28, 28]
    x = F.relu(F.max_pool2d(self.conv1(x),2))                 #x->[100, 10, 12, 12]
    x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)),2))#x->[100, 20, 4, 4]
    x = x.view(-1,320)                                        #x->[100, 320]
    x = F.relu(self.fc1(x))                                   #x->[100, 50]
    x = F.dropout(x, training=self.training)                  #x->[100, 50]
    x = self.fc2(x)                                           #x->[100, 10]

    return x  # return raw logits; nn.CrossEntropyLoss applies log-softmax internally
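The shape comments in forward can be verified by hand: a 5×5 convolution without padding shrinks each side by 4, and a 2×2 max-pool halves it. A quick sketch of that arithmetic, assuming the 1×28×28 MNIST input:

```python
def conv_out(size, kernel):   # no padding, stride 1
    return size - kernel + 1

def pool_out(size, pool=2):   # non-overlapping max-pool
    return size // pool

s = 28
s = pool_out(conv_out(s, 5))  # conv1 + pool: 28 -> 24 -> 12
s = pool_out(conv_out(s, 5))  # conv2 + pool: 12 -> 8 -> 4
flattened = 20 * s * s        # 20 channels after conv2
print(flattened)  # 320, matching nn.Linear(320, 50)
```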
Training
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = CNN().to(device)

optimizer = optim.Adam(model.parameters(), lr=0.01)

loss_fn = nn.CrossEntropyLoss()

def train(epoch):
    model.train()

    for batch_idx, (data, target) in enumerate(loaders['train']):
        data,target = data.to(device), target.to(device)

        optimizer.zero_grad()
        output = model(data)
        loss = loss_fn(output,target)


        loss.backward()
        optimizer.step()

        if batch_idx % 20 == 0:
            print(f"Train Epoch: {epoch} [{batch_idx * len(data)}/{len(loaders['train'].dataset)} "
                  f"({100. * batch_idx / len(loaders['train']):.0f}%)]\tLoss: {loss.item():.6f}")

for epoch in range(10):
    train(epoch)
Testing
def test():
    model.eval()
    test_loss = 0
    correct = 0

    with torch.inference_mode():
        for data,target in loaders['test']:
            data,target = data.to(device),target.to(device)

            output = model(data)
            test_loss += loss_fn(output,target).item()
            pred = output.argmax(dim=1, keepdim=True)
            correct +=pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(loaders['test'])  # loss_fn already averages within each batch
    print(f"\nTest set: Average loss: {test_loss:.4f}, Accuracy: {correct}/{len(loaders['test'].dataset)} ({100. * correct/len(loaders['test'].dataset):.0f}%)\n")
    
test()
Results visualization
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.gridspec as gridspec

# Create 3x3 subplots
gs = gridspec.GridSpec(3, 3)
fig = plt.figure()
model.eval()
for i in range(9):
    data, target = test_data[np.random.randint(test_data.data.shape[0])]

    data = data.unsqueeze(0).to(device)

    output = model(data)

    prediction = output.argmax(dim=1,keepdim=True).item()



    image = data.squeeze(0).squeeze(0).cpu().numpy()

    ax = fig.add_subplot(gs[i])
    ax.set_title(f"Prediction: {prediction}")
    ax.imshow(image,cmap='gray')
    ax.axis('off')
plt.show()

(figure: 3×3 grid of test digits with their predicted labels)

OxfordIIITPet cat and dog recognition

torchvision.datasets.OxfordIIITPet

Parameters:

  • root: the root directory where the dataset is stored
  • split: which split to load; supports trainval and test
  • target_types: the dataset carries several annotation types, such as class labels and pixel-level segmentation
    • category: one of the 37 pet classes
    • segmentation: foreground/background segmentation of the image
  • transform: transform applied to the image
  • target_transform: transform applied to the target
  • download: whether to download the dataset
Dataset creation
import datetime
import numpy as np
from tqdm import tqdm
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split

# the training split
raw_train_dataset = torchvision.datasets.OxfordIIITPet(root='./data/oxford-pets', download=True)

# the test split
raw_test_dataset = torchvision.datasets.OxfordIIITPet(root='./data/oxford-pets', split='test', download=True)

print(len(raw_train_dataset))
print(len(raw_test_dataset))

class preprocessDataset(torch.utils.data.Dataset):
    def __init__(self, dataset, transform):
        self.dataset = dataset
        self.transform = transform

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        image, target = self.dataset[index]
        augmented_image = self.transform(image)
        return augmented_image, target

# The backbone will be the built-in resnet34, so take its default pretrained weights
weights = torchvision.models.resnet.ResNet34_Weights.DEFAULT
# the preprocessing transforms that match those weights
preprocess = weights.transforms()
# wrap the raw datasets with the preprocessing
train_dataset = preprocessDataset(raw_train_dataset, preprocess)
test_dataset = preprocessDataset(raw_test_dataset, preprocess)
# carve a validation split out of the training data
train_dataset, val_dataset = train_test_split(train_dataset, test_size=0.2, random_state=0)
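train_test_split works on the wrapped dataset because it only needs `len()` and indexing. The same 80/20 split can be sketched manually with shuffled indices (plain Python standing in for the sklearn call; 3680 is the size of the Oxford-IIIT Pet trainval split, and with a real dataset you would then wrap the index lists with torch.utils.data.Subset):

```python
import random

def split_indices(n, val_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and split them into train/val lists,
    mirroring train_test_split(..., test_size=0.2, random_state=0) in spirit."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_val = int(n * val_fraction)
    return idx[n_val:], idx[:n_val]

train_idx, val_idx = split_indices(3680)
print(len(train_idx), len(val_idx))  # 2944 736
```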
DataLoader creation
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=64, shuffle=False)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)
Model definition
class ResNet_Classifier(torch.nn.Module):
    def __init__(self, weights, freeze_weights, dropout):
        super(ResNet_Classifier, self).__init__()

        # use the built-in resnet34
        resnet = torchvision.models.resnet34(weights=weights)     
        out_features = 512
    
        # freeze the pretrained parameters
        if freeze_weights:
            for param in resnet.parameters():
                param.requires_grad = False

        # drop the final (classification) layer
        base_model = nn.Sequential(*list(resnet.children())[:-1])

        self.layers = nn.Sequential( 
                                        base_model,
                                        nn.Flatten(),
                                        nn.Linear(out_features, 512),
                                        nn.ReLU(),
                                        nn.Dropout(dropout),
                                        nn.Linear(512, 37)
                                    )

    def forward(self, x):
        outputs = self.layers(x)
        
        return outputs
Trainer
def save_model(model,epoch, name=''):
    now = datetime.datetime.now()
    now = now + datetime.timedelta(hours=5, minutes=30)
    date_time = now.strftime("%Y-%m-%d_%H-%M-%S")

    torch.save(model.state_dict(), f'model_{name}_epoch[{epoch}]_{date_time}.pt')


def evaluate_model(model, dataloader):
    model.eval()  # Set the model to evaluation mode
    
    total_loss = 0.0
    correct_predictions = 0
    total_samples = 0
    
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    
    with torch.no_grad():
        for inputs, labels in dataloader:
            inputs, labels = inputs.to(device), labels.to(device)

            # forward pass
            outputs = model(inputs)
            loss = criterion(outputs, labels)

            # update the running metrics
            total_loss += loss.item()
            _, predicted = torch.max(outputs, 1)
            correct_predictions += (predicted == labels).sum().item()
            total_samples += labels.size(0)

    average_loss = total_loss / len(dataloader)
    accuracy = correct_predictions / total_samples
    
    return average_loss, accuracy


def early_stop(val_loss, val_history, patience):
    if val_loss < val_history['best']:
        val_history['best'] = val_loss
        val_history['no_improvement'] = 0
    else:
        val_history['no_improvement'] += 1

    if val_history['no_improvement'] >= patience:
        return True

    return False
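The patience counter can be exercised on a toy loss sequence to confirm exactly when it fires (the function is repeated here so the snippet runs standalone; the loss values are invented):

```python
def early_stop(val_loss, val_history, patience):
    """Same logic as above: reset the counter on improvement,
    otherwise count stalled epochs and stop once patience is exhausted."""
    if val_loss < val_history['best']:
        val_history['best'] = val_loss
        val_history['no_improvement'] = 0
    else:
        val_history['no_improvement'] += 1
    return val_history['no_improvement'] >= patience

history = {'best': 1e9, 'no_improvement': 0}
losses = [0.9, 0.7, 0.71, 0.72, 0.73]  # improves twice, then stalls
stops = [early_stop(l, history, patience=3) for l in losses]
print(stops)  # [False, False, False, False, True]
```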

def train_model(model, dataloader, num_epochs, train_transform, device,early_stop_patience, history, val_history, data_augment=False):
    
    current_lr = optimizer.param_groups[0]['lr']
    best_val = 0

    # training loop
    for epoch in range(num_epochs):
        
        total_loss = 0
        correct_predictions = 0
        total_samples = 0
        
        for images, labels in dataloader:
            # move the batch onto the GPU
            images, labels = images.to(device), labels.to(device)

            # random data augmentation
            if data_augment:
                images = train_transform(images)

            # zero the gradients
            optimizer.zero_grad()

            # forward pass
            outputs = model(images)

            # backward pass
            loss = criterion(outputs, labels)
            loss.backward()

            # update the weights
            optimizer.step()

            total_loss += loss.item()
            _, predicted = torch.max(outputs, 1)
            correct_predictions += (predicted == labels).sum().item()
            total_samples += labels.size(0)

        accuracy = correct_predictions / total_samples
        total_loss = total_loss/len(dataloader)
        
        val_loss, val_accuracy = evaluate_model(model, val_loader)
        
        # save the best model so far
        if best_val < val_accuracy and val_accuracy > 0.92:
            best_val = val_accuracy
            save_model(model,epoch+1, str(int(10000* val_accuracy)))

        if (epoch+1)%1 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], LR: {current_lr}, ', end='')
            print(f'train_loss: {total_loss:.5f}, train_acc: {accuracy:.5f}, ', end='')
            print(f'val_loss: {val_loss:.5f}, val_acc: {val_accuracy:.5f}')
                    
        if early_stop(val_loss, val_history, early_stop_patience):
            print(f"Stopped due to no improvement for {val_history['no_improvement']} epochs")
            save_model(model,epoch+1)
            break
                
        model.train(True) # Switch back to training mode
        
        # Update the learning rate
        scheduler.step(val_loss) 
        current_lr = optimizer.param_groups[0]['lr']

        history['train_loss'].append(total_loss)
        history['val_loss'].append(val_loss)
        history['train_acc'].append(accuracy)
        history['val_acc'].append(val_accuracy)

        if (epoch+1) % 50 == 0:
            save_model(model, epoch+1)
Training
num_epochs = 500           # number of epochs
learning_rate = 0.015      # initial learning rate
dropout = 0.4
data_augment = True
early_stop_patience = 20   # early-stopping patience, in epochs
lr_factor = 0.4
lr_scheduler_patience = 4  # scheduler patience, in epochs

# data augmentation
train_transform = T.RandomChoice([
                                    T.RandomRotation(20),
                                    T.ColorJitter(brightness=0.2, hue=0.1,saturation = 0.1),
                                    T.RandomHorizontalFlip(0.2),
                                    T.RandomPerspective(distortion_scale=0.2)
                                 ], 
                                    [0.3, 0.3, 0.3, 0.1] )
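T.RandomChoice with the weight list [0.3, 0.3, 0.3, 0.1] picks exactly one transform per call, with the rotation/jitter/flip choices three times as likely as the perspective warp. The selection semantics can be sketched with the standard library (the string-appending functions are stand-ins for the real transforms):

```python
import random

# stand-in "transforms" that just tag the input so we can see which one ran
transforms = [
    lambda x: x + ':rot',
    lambda x: x + ':jitter',
    lambda x: x + ':flip',
    lambda x: x + ':perspective',
]
weights = [0.3, 0.3, 0.3, 0.1]

def random_choice_apply(x, rng=random):
    # pick ONE transform per call, weighted, like T.RandomChoice
    t = rng.choices(transforms, weights=weights, k=1)[0]
    return t(x)

print(random_choice_apply('img'))  # e.g. 'img:flip'
```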

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = ResNet_Classifier(weights=weights, freeze_weights=False, dropout=dropout)
model = model.to(device)
model = torch.nn.DataParallel(model)

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
criterion = torch.nn.CrossEntropyLoss()
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=lr_factor, patience=lr_scheduler_patience)

history = {'train_loss':[],'val_loss':[], 'train_acc':[], 'val_acc':[]}
val_history = {'best': 1e9, 'no_improvement':0}

train_model(model, train_loader, num_epochs, train_transform, device,early_stop_patience, history, val_history,data_augment)
Testing
loss, acc = evaluate_model(model, test_loader)
print(f'test_loss: {loss}, test_acc: {acc*100:.3f}%')

Dataset categories

Image classification datasets

The full list is in Datasets — Torchvision 0.16 documentation (pytorch.org).

Dataset (partial) and description:

  • CIFAR10: 10-class classification dataset
  • FashionMNIST: fashion-article dataset
  • Flickr8k: image captioning dataset
  • Flowers102: 102-class flower dataset
  • MNIST: handwritten-digit dataset
  • StanfordCars: car dataset
  • Food101: 101-class food dataset

Image detection and segmentation datasets

Dataset (partial) and description:

  • CocoDetection: MS COCO detection dataset
  • VOCSegmentation: Pascal VOC segmentation dataset
  • VOCDetection: Pascal VOC detection dataset

Optical flow datasets

Dataset (partial) and description:

  • HD1K: autonomous-driving optical flow dataset
  • FlyingChairs: FlyingChairs optical flow dataset

Stereo matching datasets

Dataset (partial) and description:

  • CarlaStereo: Carla simulator data linked in the CREStereo GitHub repo
  • Kitti2012Stereo: KITTI dataset from the 2012 stereo evaluation benchmark
  • SceneFlowStereo: dataset interface for the Scene Flow datasets
  • SintelStereo: Sintel stereo dataset
  • InStereo2k: InStereo2k dataset
  • ETH3DStereo: ETH3D low-res two-view dataset

Image captioning datasets

Dataset and description:

  • CocoCaptions: MS COCO Captions dataset

Video classification datasets

Dataset (partial) and description:

  • HMDB51: human action dataset
  • UCF101: action recognition dataset

Video prediction datasets

Dataset (partial) and description:

  • MovingMNIST: moving handwritten-digit dataset

Related Links

torchvision — Torchvision 0.16 documentation (pytorch.org)

nithinbadi/handwritten-digits-predictor: Predicting handwritten digits using PyTorch on Google Colab. Using the MNIST dataset loaded from torchvision (github.com)

limalkasadith/OxfordIIITPet-classification: This repository contains a PyTorch implementation for classifying the Oxford IIIT Pet Dataset using KNN and ResNet. The goal is to differentiate the results obtained using these two approaches. (github.com)
