Using dropout in PyTorch to prevent overfitting

陨星落云

Original source: https://github.com/jyao8112/deep-learning-v2-pytorch-master/tree/master/intro-to-pytorch

Inference and Validation

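Now that you have a trained network, you can use it to make predictions. This is typically called inference, a term borrowed from statistics. However, neural networks tend to perform too well on the training data and fail to generalize to data they have never seen before. This is called overfitting, and it impairs inference performance. To detect overfitting during training, we measure performance not on the training set but on a separate validation set. We avoid overfitting through regularization such as dropout while monitoring the validation performance during training. I'll show you how to do this in PyTorch.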

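As before, let's start by loading the dataset through torchvision. You'll learn more about torchvision and loading data in a later part. This time we'll take advantage of the test set, which you can get by setting train=False here: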

testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=False, transform=transform)

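The test set contains images just like the training set. Typically you'll see 10-20% of the original dataset held out for testing and validation, with the rest used for training.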
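When a dataset does not ship with its own test split, a hold-out set can be carved out of the training data. The following is a minimal sketch using torch.utils.data.random_split; the synthetic tensors, the 20% fraction, and the seed are illustrative assumptions rather than part of the original notebook.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Synthetic stand-in for an image dataset: 1000 fake 28x28 images with labels 0-9.
images = torch.randn(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
full_dataset = TensorDataset(images, labels)

# Hold out one fifth (20%) for validation; the fraction is an illustrative choice.
n_val = len(full_dataset) // 5
n_train = len(full_dataset) - n_val

# A seeded generator makes the split reproducible.
generator = torch.Generator().manual_seed(0)
train_subset, val_subset = random_split(
    full_dataset, [n_train, n_val], generator=generator
)

print(len(train_subset), len(val_subset))  # 800 200
```

Both subsets can be wrapped in a DataLoader exactly like trainset and testset.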

import torch
from torchvision import datasets, transforms

# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
# Download and load the training data
trainset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

# Download and load the test data
testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)

Here I'll create an ordinary neural network model, as before.

from torch import nn, optim
import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)
        
    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)
        
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.log_softmax(self.fc4(x), dim=1)
        
        return x

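The goal of validation is to measure the model's performance on data that is not part of the training set. Typically this is just accuracy, the percentage of classes the network predicted correctly. Other options include precision, recall, and top-5 error rate. We'll focus on accuracy here. First I'll do a forward pass with one batch from the test set.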

model = Classifier()

images, labels = next(iter(testloader))
# Get the class probabilities
ps = torch.exp(model(images))
# Make sure the shape is appropriate, we should get 10 class probabilities for 64 examples
print(ps.shape)

torch.Size([64, 10])

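With the probabilities, we can get the most likely class using the ps.topk method. This returns the k highest values. Since we just want the most likely class, we can use ps.topk(1). This returns a tuple of the top-k values and the top-k indices. If the highest value is the fifth element, we'll get back 4 as the index.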

top_p, top_class = ps.topk(1,dim=1)
# Look at the most likely classes for the first 5 examples
print(top_class[:5,:])

tensor([[0],
        [4],
        [4],
        [4],
        [6]])
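To make the return values of topk concrete, here is a small sketch on a hand-written probability tensor. The numbers are made up for illustration; only the topk and argmax calls mirror what the text describes.

```python
import torch

# Two examples, four classes; each row sums to 1.
ps = torch.tensor([[0.1, 0.2, 0.6, 0.1],
                   [0.7, 0.1, 0.1, 0.1]])

# topk returns (values, indices); with k=1 and dim=1 both have shape (2, 1).
top_p, top_class = ps.topk(1, dim=1)
print(top_class.shape)               # torch.Size([2, 1])
print(top_class.squeeze(1).tolist()) # [2, 0]

# argmax gives the same indices as a 1D tensor of shape (2,).
print(ps.argmax(dim=1).tolist())     # [2, 0]
```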

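Now we can check whether the predicted classes match the labels. This is simple to do by equating top_class and labels, but we have to be careful about the shapes. Here top_class is a 2D tensor with shape (64, 1), while labels is 1D with shape (64). For the equality to work the way we want, top_class and labels must have the same shape.

If we do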

equals = top_class == labels

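then equals will have shape (64, 64); try it yourself. What happens is that the single element in each row of top_class is compared with every element of labels, which returns 64 True/False values for each row. Reshaping labels to match top_class gives the comparison we actually want: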

equals = top_class == labels.view(*top_class.shape)
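The shape pitfall is easier to see on a tiny batch. This sketch uses four made-up predictions and labels instead of 64, so the broadcast result is (4, 4) rather than (64, 64).

```python
import torch

top_class = torch.tensor([[1], [0], [2], [1]])  # shape (4, 1)
labels = torch.tensor([1, 2, 2, 0])             # shape (4,)

# Broadcasting compares every prediction with every label.
wrong = top_class == labels
# Reshaping labels to (4, 1) compares element by element.
right = top_class == labels.view(*top_class.shape)

print(wrong.shape)                # torch.Size([4, 4])
print(right.shape)                # torch.Size([4, 1])
print(right.squeeze(1).tolist())  # [True, False, True, False]
```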

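Now we need to calculate the percentage of correct predictions. The values in equals are either 0 or 1. This means that if we sum all the values and divide by the number of values, we get the fraction of correct predictions. This is the same operation as taking the mean, so we can get the accuracy with a call to torch.mean. If only it were that simple. If you try torch.mean(equals), you'll get an error: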

RuntimeError: mean is not implemented for type torch.ByteTensor

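This happens because equals has type torch.ByteTensor, and torch.mean isn't implemented for tensors of that type. (Recent PyTorch releases return a torch.bool tensor from comparisons and word the error differently, but the fix is the same.) So we need to convert equals to a float tensor. Note that torch.mean returns a scalar tensor; to get the actual value as a float we need to call accuracy.item().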

accuracy = torch.mean(equals.type(torch.FloatTensor))
print(f'Accuracy: {accuracy.item()*100}%')

Accuracy: 10.9375%
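The steps above can be collected into a small helper. This is a sketch, not the notebook's code: the name batch_accuracy is hypothetical, and it swaps in argmax and .float() for the topk and .type(torch.FloatTensor) calls used in the text, which give the same result for the top-1 class.

```python
import torch

def batch_accuracy(log_ps, labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    # argmax over the class dimension returns a 1D tensor, so no reshape is needed.
    top_class = log_ps.argmax(dim=1)
    equals = top_class == labels
    # Booleans must be cast to float before taking the mean.
    return equals.float().mean().item()

# Made-up probabilities for four examples and three classes.
log_ps = torch.log(torch.tensor([[0.7, 0.2, 0.1],
                                 [0.1, 0.8, 0.1],
                                 [0.3, 0.3, 0.4],
                                 [0.6, 0.3, 0.1]]))
labels = torch.tensor([0, 1, 0, 0])

print(batch_accuracy(log_ps, labels))  # 0.75
```

Because the logarithm is monotonic, the most likely class can be read directly from the log-probabilities without calling torch.exp first.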

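The network is untrained, so it's making random guesses and we should see an accuracy of around 10%. Now let's train the network and include a validation pass so we can measure how well the network performs on the test set. Since we're not updating parameters in the validation pass, we can speed up the code by turning off gradients using torch.no_grad():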

# turn off gradients
with torch.no_grad():
    # validation pass here
    for images, labels in testloader:
        ...
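A quick way to confirm what torch.no_grad() does is to check requires_grad on a result computed inside and outside the context. The tensor w below is a made-up stand-in for a model parameter.

```python
import torch

w = torch.ones(3, requires_grad=True)

# Outside no_grad, autograd records the operation.
y = (w * 2).sum()
print(y.requires_grad)  # True

# Inside no_grad, no graph is built, which saves memory and time.
with torch.no_grad():
    z = (w * 2).sum()
print(z.requires_grad)  # False
```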

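Exercise: Implement the validation loop below and print out the total accuracy after the loop. You should be able to get an accuracy above 80%.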

model = Classifier()
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.003)

epochs = 30
steps = 0

train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        
        optimizer.zero_grad()
        
        log_ps = model(images)
        loss = criterion(log_ps, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        
    else:
        test_loss = 0
        accuracy = 0
        
        # Turn off gradients for validation, saves memory and computations
        with torch.no_grad():
            for images, labels in testloader:
                log_ps = model(images)
                test_loss += criterion(log_ps, labels).item()
                
                ps = torch.exp(log_ps)
                top_p, top_class = ps.topk(1, dim=1)
                equals = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor))
                
        train_losses.append(running_loss/len(trainloader))
        test_losses.append(test_loss/len(testloader))

        print("Epoch: {}/{}.. ".format(e+1, epochs),
              "Training Loss: {:.3f}.. ".format(running_loss/len(trainloader)),
              "Test Loss: {:.3f}.. ".format(test_loss/len(testloader)),
              "Test Accuracy: {:.3f}".format(accuracy/len(testloader)))

Epoch: 1/30..  Training Loss: 0.514..  Test Loss: 0.420..  Test Accuracy: 0.845
Epoch: 2/30..  Training Loss: 0.392..  Test Loss: 0.422..  Test Accuracy: 0.844
Epoch: 3/30..  Training Loss: 0.354..  Test Loss: 0.387..  Test Accuracy: 0.862
''''''
Epoch: 28/30..  Training Loss: 0.190..  Test Loss: 0.467..  Test Accuracy: 0.876
Epoch: 29/30..  Training Loss: 0.186..  Test Loss: 0.442..  Test Accuracy: 0.875
Epoch: 30/30..  Training Loss: 0.178..  Test Loss: 0.455..  Test Accuracy: 0.883

%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import matplotlib.pyplot as plt

plt.plot(train_losses, label='Training loss')
plt.plot(test_losses, label='Validation loss')
plt.legend(frameon=False)

[Figure: training loss and validation loss per epoch, without dropout (output_15_1.png)]

Overfitting

If we look at the training and validation losses as we train the network, we can see a phenomenon known as overfitting.

[Figure: overfitting (overfitting.png)]

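The network learns the training set better and better, which lowers the training loss. However, it starts to have trouble generalizing to data outside the training set, which makes the validation loss rise. The ultimate goal of any deep learning model is to make predictions on new data, so we should strive for the lowest validation loss possible. One option is to use the version of the model with the lowest validation loss, here the one at around 8-10 training epochs. This strategy is called early stopping. In practice, you'd save the model frequently as you train and then choose the checkpoint with the lowest validation loss.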
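The "keep the checkpoint with the lowest validation loss" idea can be sketched as follows. The tiny nn.Linear model and the list fake_val_losses are hypothetical placeholders that stand in for a real training run; only the bookkeeping pattern is the point.

```python
import copy
from torch import nn

model = nn.Linear(4, 2)  # tiny stand-in for the real classifier

# Hypothetical per-epoch validation losses, used only to drive the example.
fake_val_losses = [0.50, 0.42, 0.39, 0.41, 0.45]

best_loss = float('inf')
best_epoch = None
best_state = None

for epoch, val_loss in enumerate(fake_val_losses, start=1):
    # ... one epoch of training and a validation pass would run here ...
    if val_loss < best_loss:
        best_loss = val_loss
        best_epoch = epoch
        # deepcopy so later optimizer steps cannot overwrite the snapshot
        best_state = copy.deepcopy(model.state_dict())

# Restore the weights from the best epoch.
model.load_state_dict(best_state)
print(best_epoch, best_loss)  # 3 0.39
```

In a long run the snapshot would usually be written to disk with torch.save instead of being kept in memory.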

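The most common method for reducing overfitting is dropout, where we randomly drop input units. This forces the network to share information between weights, which increases its ability to generalize to new data. Adding dropout in PyTorch is straightforward with the nn.Dropout module.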

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)
        
        # Dropout module with 0.2 drop probability
        self.dropout = nn.Dropout(p=0.2)
        
    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)
        
        # Now with dropout
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.dropout(F.relu(self.fc3(x)))
        
        # output so no dropout here
        x = F.log_softmax(self.fc4(x), dim=1)
        
        return x
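To see what nn.Dropout does on its own, the sketch below applies it to a tensor of ones. In training mode roughly a fraction p of the elements is zeroed and the survivors are scaled by 1/(1-p), so the expected sum is unchanged; in evaluation mode the input passes through untouched. The tensor size and seed are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.2)
x = torch.ones(1, 10000)

# Training mode: elements are either dropped (0) or scaled by 1/(1-0.2) = 1.25.
dropout.train()
out = dropout(x)
vals = out.unique()
drop_fraction = (out == 0).float().mean().item()
print(vals)           # two distinct values: 0 and approximately 1.25
print(drop_fraction)  # close to 0.2

# Evaluation mode: dropout is the identity.
dropout.eval()
unchanged = torch.equal(dropout(x), x)
print(unchanged)      # True
```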

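During training we want to use dropout to prevent overfitting, but during inference we want to use the entire network. So we need to turn off dropout during validation, testing, and whenever we use the network to make predictions. To do this, use model.eval(). This sets the model to evaluation mode, where the dropout probability is 0. You can turn dropout back on by setting the model to training mode with model.train(). In general, the pattern for the validation loop looks like this: turn off gradients, set the model to evaluation mode, calculate the validation loss and metric, then set the model back to training mode.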

# turn off gradients
with torch.no_grad():
    
    # set model to evaluation mode
    model.eval()
    
    # validation pass here
    for images, labels in testloader:
        ...

# set model back to train mode
model.train()
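The pattern above can be wrapped in a reusable function. This is a sketch under stated assumptions: the name evaluate, the synthetic data, and the small nn.Sequential model are all hypothetical. Unlike the per-batch averaging used in this article, it weights each batch by its size so that a smaller final batch does not skew the result.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, criterion):
    """Return (mean loss, accuracy) over loader with dropout disabled."""
    was_training = model.training
    model.eval()
    total_loss, total_correct, total_seen = 0.0, 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            log_ps = model(inputs)
            # Weight the batch-mean loss by batch size.
            total_loss += criterion(log_ps, labels).item() * labels.size(0)
            total_correct += (log_ps.argmax(dim=1) == labels).sum().item()
            total_seen += labels.size(0)
    # Restore whichever mode the model was in before the call.
    model.train(was_training)
    return total_loss / total_seen, total_correct / total_seen

# Synthetic data and a tiny model with dropout, for illustration only.
torch.manual_seed(0)
data = TensorDataset(torch.randn(100, 8), torch.randint(0, 3, (100,)))
loader = DataLoader(data, batch_size=32)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Dropout(0.2),
                      nn.Linear(16, 3), nn.LogSoftmax(dim=1))

loss, acc = evaluate(model, loader, nn.NLLLoss())
print(f"loss={loss:.3f} acc={acc:.3f}")
print(model.training)  # True: training mode is restored after evaluation
```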

Exercise: Add dropout to your model and train it on Fashion-MNIST again. See if you can get a lower validation loss or higher accuracy.

## TODO: Define your model with dropout added
from torch import nn,optim
import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784,256)
        self.fc2 = nn.Linear(256,128)
        self.fc3 = nn.Linear(128,64)
        self.fc4 = nn.Linear(64,10)
        self.dropout = nn.Dropout(p=0.3)
    
    def forward(self,x):
        x = x.view(x.shape[0],-1)
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.dropout(F.relu(self.fc3(x)))
        x = F.log_softmax(self.fc4(x),dim=1)
        
        return x

## TODO: Train your model with dropout, and monitor the training progress with the validation loss and accuracy
model = Classifier()
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(),lr=0.001)
epochs = 30
steps = 0

train_losses,test_losses = [],[]
        
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        optimizer.zero_grad()
        
        log_ps = model(images)
        loss = criterion(log_ps,labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        
    else:
        test_loss = 0
        accuracy = 0
        
        with torch.no_grad():
            model.eval()
            for images,labels in testloader:
                log_ps = model(images)
                test_loss += criterion(log_ps,labels).item()
                
                ps = torch.exp(log_ps)
                top_p,top_class = ps.topk(1,dim=1)
                equals = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor))
        model.train()
        train_losses.append(running_loss/len(trainloader))
        test_losses.append(test_loss/len(testloader))

        print("Epoch: {}/{}.. ".format(e+1, epochs),
              "Training Loss: {:.3f}..".format(running_loss/len(trainloader)),
              "Test Loss: {:.3f}.. ".format(test_loss/len(testloader)),
              "Test Accuracy: {:.3f}".format(accuracy/len(testloader)))

Epoch: 1/30..  Training Loss: 0.653..  Test Loss: 0.469..  Test Accuracy: 0.827
Epoch: 2/30..  Training Loss: 0.477..  Test Loss: 0.434..  Test Accuracy: 0.845
Epoch: 3/30..  Training Loss: 0.436..  Test Loss: 0.397..  Test Accuracy: 0.850
Epoch: 4/30..  Training Loss: 0.408..  Test Loss: 0.396..  Test Accuracy: 0.857
''''''
Epoch: 28/30..  Training Loss: 0.276..  Test Loss: 0.324..  Test Accuracy: 0.886
Epoch: 29/30..  Training Loss: 0.275..  Test Loss: 0.352..  Test Accuracy: 0.882
Epoch: 30/30..  Training Loss: 0.271..  Test Loss: 0.341..  Test Accuracy: 0.884

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

plt.plot(train_losses, label='Training loss')
plt.plot(test_losses, label='Validation loss')
plt.legend(frameon=False)

[Figure: training loss and validation loss per epoch, with dropout (output_21_1.png)]

Inference

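Now that the model is trained, we can use it for inference. We've done this before, but now we need to remember to set the model to evaluation mode with model.eval(). You'll also want to turn off autograd with torch.no_grad().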

# Import helper module (should be in the repo)
import helper

# Test out your network!
model.eval()

dataiter = iter(testloader)
images, labels = next(dataiter)
img = images[0]
# Convert 2D image to 1D vector
img = img.view(1, 784)

# Calculate the class probabilities (softmax) for img
with torch.no_grad():
    output = model.forward(img)

ps = torch.exp(output)

# Plot the image and probabilities
helper.view_classify(img.view(1, 28, 28), ps, version='Fashion')

[Figure: a test image and its predicted class probabilities (output_23_0.png)]
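If the helper module is not at hand, the prediction can also be read off as text. The sketch below assumes a model that returns log-probabilities, as the Classifier above does; the predict function and the untrained stand_in model are hypothetical, and FASHION_CLASSES lists the ten Fashion-MNIST labels in index order.

```python
import torch
from torch import nn

FASHION_CLASSES = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                   'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

def predict(model, img):
    """Return (class name, probability) for a single 28x28 image."""
    model.eval()  # disable dropout
    with torch.no_grad():  # no gradients needed for inference
        ps = torch.exp(model(img.view(1, -1)))
    prob, idx = ps.max(dim=1)
    return FASHION_CLASSES[idx.item()], prob.item()

# Untrained stand-in model and a random image, only to exercise the function.
torch.manual_seed(0)
stand_in = nn.Sequential(nn.Linear(784, 10), nn.LogSoftmax(dim=1))
name, prob = predict(stand_in, torch.randn(1, 28, 28))
print(name, round(prob, 3))
```

With the trained model from this article, predict(model, images[0]) would return the label the network considers most likely for the first test image.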