from __future__ import print_function
from __future__ import division
import torch, torchvision
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
from torchvision import datasets, models, transforms
import time, os, copy
def train_model(model, dataloaders, criterion, optimizer, num_epochs=25, is_inception=False):
    """Train and validate `model`, returning the best weights seen on 'val'.

    Args:
        model: network to train (already moved to the global `device`).
        dataloaders: dict with 'train' and 'val' DataLoaders.
        criterion: loss function.
        optimizer: optimizer over the trainable parameters.
        num_epochs: number of epochs to run.
        is_inception: if True, add the Inception v3 auxiliary loss in training.

    Returns:
        (model, val_acc_history): model loaded with the best validation
        weights, and the per-epoch validation accuracies.
    """
    since = time.time()

    val_acc_history = []
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {} / {}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch runs a training pass followed by a validation pass.
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()
            else:
                model.eval()

            running_loss = 0.0
            running_corrects = 0

            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # Zero the parameter gradients.
                optimizer.zero_grad()

                # Forward pass; track gradient history only in training.
                with torch.set_grad_enabled(phase == 'train'):
                    if is_inception and phase == 'train':
                        # Inception has an auxiliary classifier: in training
                        # the loss is final-output loss + 0.4 * aux loss; in
                        # evaluation only the final output is used.
                        # From https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                        outputs, aux_outputs = model(inputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4 * loss2
                    else:
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    # Backward + optimize only in the training phase.
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # Accumulate statistics (loss is a per-sample mean, so scale
                # back up by the batch size).
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # Keep a deep copy of the best-performing validation weights.
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # Load the best model weights before returning.
    model.load_state_dict(best_model_wts)
    return model, val_acc_history
def set_parameter_requires_grad(model, feature_extracting):
    """Freeze every parameter of `model` when feature extracting.

    When `feature_extracting` is False this is a no-op, so later-created
    layers (the new head) keep requires_grad=True by default.
    """
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False


def initialize_model(model_name, num_classes, feature_extract, use_pretrained=True):
    """Create a torchvision model with its classifier head reshaped.

    Args:
        model_name: one of "resnet", "alexnet", "vgg", "squeezenet",
            "densenet", "inception".
        num_classes: output size of the new classification head.
        feature_extract: if True, freeze the pretrained weights so only the
            newly created head remains trainable.
        use_pretrained: load ImageNet-pretrained weights.

    Returns:
        (model_ft, input_size): the model and the square image size it expects.

    Raises:
        ValueError: if `model_name` is not a supported name.
    """
    # These variables are set in the model-specific branches below.
    model_ft = None
    input_size = 0

    if model_name == "resnet":
        # Resnet18
        model_ft = models.resnet18(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)
        input_size = 224
    elif model_name == "alexnet":
        # Alexnet
        model_ft = models.alexnet(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs, num_classes)
        input_size = 224
    elif model_name == "vgg":
        # VGG11_bn
        model_ft = models.vgg11_bn(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs, num_classes)
        input_size = 224
    elif model_name == "squeezenet":
        # Squeezenet: the head is a 1x1 conv rather than a Linear layer.
        model_ft = models.squeezenet1_0(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        model_ft.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=(1, 1), stride=(1, 1))
        model_ft.num_classes = num_classes
        input_size = 224
    elif model_name == "densenet":
        # Densenet
        model_ft = models.densenet121(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier.in_features
        model_ft.classifier = nn.Linear(num_ftrs, num_classes)
        input_size = 224
    elif model_name == "inception":
        # Inception v3: expects (299, 299) images and has an auxiliary output.
        model_ft = models.inception_v3(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        # Handle the auxiliary net.
        num_ftrs = model_ft.AuxLogits.fc.in_features
        model_ft.AuxLogits.fc = nn.Linear(num_ftrs, num_classes)
        # Handle the primary net.
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)
        input_size = 299
    else:
        # BUGFIX: was `print("Invalid model name, exiting..."); exit()`, which
        # kills the whole interpreter. Raising lets callers handle the error.
        raise ValueError("Invalid model name: {!r}".format(model_name))

    return model_ft, input_size
if __name__ == '__main__':
    data_dir = './Dataset/hymenoptera_data'
    model_name = 'squeezenet'
    num_classes = 2
    batch_size = 8
    num_epochs = 15
    # True: train only the reshaped head; False: fine-tune the whole network.
    feature_extract = True

    # Detect if we have a GPU available.
    device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")

    # Initialize the model for this run and move it to the device.
    model_ft, input_size = initialize_model(model_name, num_classes,
                                            feature_extract, use_pretrained=True)
    model_ft = model_ft.to(device)

    # Data augmentation + normalization for training; plain resize/crop for val.
    data_transforms = {
        'train': transforms.Compose([
            transforms.RandomResizedCrop(input_size),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
        'val': transforms.Compose([
            transforms.Resize(input_size),
            transforms.CenterCrop(input_size),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
    }

    print("Initializing Datasets and Dataloaders...")
    # Create training and validation datasets and dataloaders.
    image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                              data_transforms[x])
                      for x in ['train', 'val']}
    dataloaders_dict = {x: torch.utils.data.DataLoader(image_datasets[x],
                                                       batch_size=batch_size, shuffle=True, num_workers=4)
                        for x in ['train', 'val']}

    # Gather the parameters to be optimized/updated in this run. When feature
    # extracting, only the newly created head still has requires_grad=True.
    params_to_update = model_ft.parameters()
    print("Params to learn:")
    if feature_extract:
        params_to_update = []
        for name, param in model_ft.named_parameters():
            if param.requires_grad == True:
                params_to_update.append(param)
                print("\t", name)
    else:
        for name, param in model_ft.named_parameters():
            if param.requires_grad == True:
                print("\t", name)

    # Observe that all gathered parameters are being optimized.
    optimizer_ft = optim.SGD(params_to_update, lr=0.001, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    # Train and evaluate the transfer-learning model.
    model_ft, hist = train_model(model_ft, dataloaders_dict, criterion,
                                 optimizer_ft, num_epochs=num_epochs, is_inception=(model_name == "inception"))

    # Train the non-pretrained version of the same architecture for comparison.
    scratch_model, _ = initialize_model(model_name, num_classes, feature_extract=False, use_pretrained=False)
    scratch_model = scratch_model.to(device)
    scratch_optimizer = optim.SGD(scratch_model.parameters(), lr=0.001, momentum=0.9)
    scratch_criterion = nn.CrossEntropyLoss()
    _, scratch_hist = train_model(scratch_model, dataloaders_dict, scratch_criterion, scratch_optimizer, num_epochs=num_epochs, is_inception=(model_name == "inception"))

    # Plot validation accuracy vs. epochs for both runs.
    ohist = [h.cpu().numpy() for h in hist]
    shist = [h.cpu().numpy() for h in scratch_hist]

    plt.title("Validation Accuracy vs. Number of Training Epochs")
    plt.xlabel("Training Epochs")
    plt.ylabel("Validation Accuracy")
    plt.plot(range(1, num_epochs + 1), ohist, label="Pretrained")
    plt.plot(range(1, num_epochs + 1), shist, label="Scratch")
    plt.ylim((0, 1.))
    plt.xticks(np.arange(1, num_epochs + 1, 1.0))
    plt.legend()
    # BUGFIX: save BEFORE plt.show(). The original called savefig after show(),
    # which (with a blocking show) writes out an empty figure once the window
    # is closed.
    plt.savefig('Compare.png')
    plt.show()
# NOTE(review): a duplicate `from __future__ import print_function` stood here.
# Future statements are only legal at the very top of a module, so mid-file it
# is a SyntaxError; the file already has the same import on its first line,
# hence it was removed with no behavior change.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from PIL import Image
import matplotlib.pyplot as plt
import torchvision.transforms as transforms
import torchvision.models as models
import copy
def image_loader(image_name):
    """Load an image file and return it as a batched float tensor on `device`."""
    img = Image.open(image_name)
    # Add a fake batch dimension so the tensor matches the network's
    # expected (B, C, H, W) input layout.
    batched = loader(img).unsqueeze(0)
    return batched.to(device, torch.float)
unloader = transforms.ToPILImage()  # reconvert tensors back into PIL images


def imshow(tensor, title=None):
    """Display a (1, C, H, W) image tensor with matplotlib."""
    img = tensor.cpu().clone()  # clone so the displayed copy never mutates the original
    img = img.squeeze(0)        # remove the fake batch dimension
    img = unloader(img)
    plt.imshow(img)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated


class ContentLoss(nn.Module):
    """Pass-through module that records the content loss against a fixed target.

    Despite its name this is not a true PyTorch loss function: it is spliced
    into the model, forwards its input unchanged, and stores the MSE against
    the target activations in `self.loss`. (Defining it as a real loss would
    require a custom autograd function with a hand-written backward.)
    """

    def __init__(self, target,):
        super(ContentLoss, self).__init__()
        # Detach the target so it is a constant, not a node of the graph used
        # to compute gradients; otherwise forward() would raise.
        self.target = target.detach()

    def forward(self, input):
        self.loss = F.mse_loss(input, self.target)
        return input


def gram_matrix(input):
    """Return the normalized Gram matrix of a (a, b, c, d) feature tensor.

    a = batch size (1), b = number of feature maps, (c, d) = map dimensions.
    """
    a, b, c, d = input.size()
    features = input.view(a * b, c * d)  # reshape F_XL into \hat F_XL
    G = torch.mm(features, features.t())  # the gram product
    # Normalize by the element count of each feature map so layers with large
    # maps do not dominate the style loss.
    return G.div(a * b * c * d)


class StyleLoss(nn.Module):
    """Pass-through module that records the style loss against a fixed target."""

    def __init__(self, target_feature):
        super(StyleLoss, self).__init__()
        self.target = gram_matrix(target_feature).detach()

    def forward(self, input):
        G = gram_matrix(input)
        self.loss = F.mse_loss(G, self.target)
        return input


class Normalization(nn.Module):
    """Normalize an input image so it can be the first step of a nn.Sequential."""

    def __init__(self, mean, std):
        super(Normalization, self).__init__()
        # .view the mean and std as (C, 1, 1) so they broadcast directly over
        # an image tensor of shape (B, C, H, W).
        self.mean = torch.tensor(mean).view(-1, 1, 1)
        self.std = torch.tensor(std).view(-1, 1, 1)

    def forward(self, img):
        return (img - self.mean) / self.std
# Desired depth layers at which to compute the style / content losses.
content_layers_default = ['conv_4']
style_layers_default = ['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5']


def get_style_model_and_losses(cnn, normalization_mean, normalization_std,
                               style_img, content_img,
                               content_layers=content_layers_default,
                               style_layers=style_layers_default):
    """Rebuild `cnn` as a Sequential with loss modules spliced in.

    Returns (model, style_losses, content_losses) where the loss lists hold
    the inserted modules so the optimization loop can read their `.loss`.
    """
    cnn = copy.deepcopy(cnn)

    # Normalization module goes first so raw images can be fed directly.
    normalization = Normalization(normalization_mean, normalization_std).to(device)

    # Keep iterable handles on the inserted content/style loss modules.
    content_losses = []
    style_losses = []

    # Assuming `cnn` is an nn.Sequential, rebuild it layer by layer so each
    # loss module can be placed immediately after the layer it watches.
    model = nn.Sequential(normalization)

    conv_idx = 0  # incremented every time we see a convolution
    for layer in cnn.children():
        if isinstance(layer, nn.Conv2d):
            conv_idx += 1
            name = 'conv_{}'.format(conv_idx)
        elif isinstance(layer, nn.ReLU):
            name = 'relu_{}'.format(conv_idx)
            # The in-place version doesn't play nicely with the ContentLoss
            # and StyleLoss modules inserted below, so swap in an
            # out-of-place copy.
            layer = nn.ReLU(inplace=False)
        elif isinstance(layer, nn.MaxPool2d):
            name = 'pool_{}'.format(conv_idx)
        elif isinstance(layer, nn.BatchNorm2d):
            name = 'bn_{}'.format(conv_idx)
        else:
            raise RuntimeError('Unrecognized layer: {}'.format(layer.__class__.__name__))

        model.add_module(name, layer)

        if name in content_layers:
            # Add a content loss fed by this layer's response to content_img.
            target = model(content_img).detach()
            content_loss = ContentLoss(target)
            model.add_module("content_loss_{}".format(conv_idx), content_loss)
            content_losses.append(content_loss)

        if name in style_layers:
            # Add a style loss fed by this layer's response to style_img.
            target_feature = model(style_img).detach()
            style_loss = StyleLoss(target_feature)
            model.add_module("style_loss_{}".format(conv_idx), style_loss)
            style_losses.append(style_loss)

    # Trim off the layers after the last content/style loss — they contribute
    # nothing to the objective.
    for last in range(len(model) - 1, -1, -1):
        if isinstance(model[last], (ContentLoss, StyleLoss)):
            break
    model = model[:(last + 1)]

    return model, style_losses, content_losses
def get_input_optimizer(input_img):
    """Return an L-BFGS optimizer whose only parameter is the input image."""
    # Marking the image as requiring grad makes it the optimized tensor.
    input_img.requires_grad_()
    return optim.LBFGS([input_img])
def run_style_transfer(cnn, normalization_mean, normalization_std,
                       content_img, style_img, input_img, num_steps=300,
                       style_weight=1000000, content_weight=1):
    """Run the style transfer."""
    print('Building the style transfer model..')
    model, style_losses, content_losses = get_style_model_and_losses(cnn,
        normalization_mean, normalization_std, style_img, content_img)
    optimizer = get_input_optimizer(input_img)

    print('Optimizing..')
    run = [0]  # one-element list so the closure below can mutate the counter
    while run[0] <= num_steps:

        def closure():
            # L-BFGS may step outside [0, 1]; clamp back to valid pixel values.
            input_img.data.clamp_(0, 1)

            optimizer.zero_grad()
            model(input_img)

            # Collect the losses recorded by the spliced-in modules.
            style_score = style_weight * sum(sl.loss for sl in style_losses)
            content_score = content_weight * sum(cl.loss for cl in content_losses)

            loss = style_score + content_score
            loss.backward()  # backprop through the combined objective

            run[0] += 1
            if run[0] % 50 == 0:
                print("run {}:".format(run))
                print('Style Loss : {:4f} Content Loss: {:4f}'.format(
                    style_score.item(), content_score.item()))
                print()

            return style_score + content_score

        optimizer.step(closure)

    # A last correction so the returned tensor is a valid image.
    input_img.data.clamp_(0, 1)

    return input_img
if __name__ == '__main__':
    # 1. Prepare: device, output size and the image-loading pipeline.
    device = torch.device('cuda:1' if torch.cuda.is_available() else 'cpu')
    # Desired size of the output image; use a small size if there is no GPU.
    imsize = 512 if torch.cuda.is_available() else 128
    loader = transforms.Compose([
        transforms.Resize(imsize),  # scale the imported image
        transforms.ToTensor()])     # transform it into a torch tensor

    style_img = image_loader("./Dataset/neural-style/picasso.jpg")
    content_img = image_loader("./Dataset/neural-style/dancing.jpg")
    assert style_img.size() == content_img.size(), \
        "we need to import style and content images of the same size"

    # 2. Model: only VGG19's feature extractor, frozen in eval mode.
    cnn = models.vgg19(pretrained=True).features.to(device).eval()
    cnn_normalization_mean = torch.tensor([0.485, 0.456, 0.406]).to(device)
    cnn_normalization_std = torch.tensor([0.229, 0.224, 0.225]).to(device)

    # 3. Input: start from a copy of the content image (white noise works too).
    input_img = content_img.clone()
    # input_img = torch.randn(content_img.data.size(), device=device)

    # 4. Train.
    output = run_style_transfer(cnn, cnn_normalization_mean, cnn_normalization_std,
                                content_img, style_img, input_img)

    plt.figure()
    imshow(output, title='Output Image')
    # BUGFIX: save BEFORE plt.show(). The original called savefig after a
    # blocking show(), which wrote an empty figure once the window closed.
    plt.savefig('transferImg.png')
    plt.ioff()
    plt.show()