Implementing L1/L2 Regularization and Dropout in PyTorch (with code screenshots for reference)
- Understand the principle behind Dropout
- Implement regularization in code (L1, L2, Dropout)
- NumPy implementation of Dropout
- Implementing dropout in PyTorch
- Reference: PyTorch 中文文档 (the Chinese PyTorch documentation)
Homework
Dropout
Why Dropout was introduced
When training deep neural networks, two major problems always come up: the model easily overfits, and training is time-consuming. Dropout is an effective way to mitigate overfitting, and to some extent it acts as a form of regularization.
What is Dropout
During the forward pass, each neuron's activation is dropped (made to stop working) with probability p. This makes the model generalize better, because it can no longer rely too heavily on particular local features.
Dropout workflow and usage
- First, randomly (and temporarily) delete half of the hidden neurons in the network, while keeping the input and output neurons unchanged (the dashed neurons in Figure 3 of the source article are the temporarily deleted ones).
- Then propagate the input x forward through the modified network and back-propagate the resulting loss through the same modified network. After a mini-batch of training samples has gone through this process, update the parameters (w, b) of the neurons that were not deleted using stochastic gradient descent.
- Then keep repeating this process (see the toy sketch after this list):
  - Restore the deleted neurons (the deleted neurons keep their original parameters, while the neurons that were not deleted have already been updated).
  - Randomly select a new half-sized subset of hidden neurons to delete temporarily (backing up the parameters of the deleted neurons).
  - For the next mini-batch of training samples, run the forward pass, back-propagate the loss, and update the parameters (w, b) with stochastic gradient descent (only the parameters of the neurons that were not deleted are updated; the deleted neurons keep the parameters they had before deletion).
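A rough NumPy sketch of this cycle is given below; the layer size, learning rate, and the faked gradient are invented purely to show the update pattern (a real implementation would obtain the gradient by back-propagating through the thinned network):

import numpy as np

rng = np.random.default_rng(0)
n_hidden = 8
W = rng.normal(size=(n_hidden, 4))   # weights feeding one hidden layer
lr = 0.1

for step in range(3):                # one iteration = one mini-batch
    # temporarily "delete" roughly half of the hidden neurons
    keep = rng.random(n_hidden) > 0.5
    # a real forward/backward pass over the thinned network would go here;
    # we fake the resulting gradient just to illustrate the update pattern
    grad = rng.normal(size=W.shape)
    # only the surviving neurons' parameters are updated;
    # the deleted neurons keep the parameters they had before
    W[keep] -= lr * grad[keep]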
Using Dropout in a neural network
The corresponding formulas change as follows:
- Forward computation of a standard network without Dropout:
  z_i^(l+1) = w_i^(l+1) · y^(l) + b_i^(l+1)
  y_i^(l+1) = f(z_i^(l+1))
- Forward computation of the same network with Dropout:
  r_j^(l) ~ Bernoulli(1 - p)   (each r_j^(l) is 1 with probability 1 - p and 0 with probability p)
  ỹ^(l) = r^(l) * y^(l)        (element-wise masking of the layer's outputs)
  z_i^(l+1) = w_i^(l+1) · ỹ^(l) + b_i^(l+1)
  y_i^(l+1) = f(z_i^(l+1))

Here f is the activation function and r^(l) is a vector of independent Bernoulli random variables that masks the outputs y^(l) of layer l before they are fed into layer l+1.
At the code level, making a neuron stop working with probability p simply means setting its activation value to 0 with probability p. For example, suppose a layer has 1000 neurons whose activation outputs are y1, y2, y3, …, y1000, and we choose a dropout rate of 0.4. After dropout, roughly 400 of those 1000 activation values will be set to 0.
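The objectives above also call for a NumPy implementation of Dropout. The sketch below implements the "inverted dropout" variant (the convention PyTorch's F.dropout also follows), in which surviving activations are rescaled by 1/(1-p) during training so that nothing has to change at test time; the function name dropout_forward and the shapes are made up for this example:

import numpy as np

def dropout_forward(y, p_drop=0.4, training=True):
    # Inverted dropout: zero each activation with probability p_drop and
    # rescale the survivors by 1 / (1 - p_drop) so that the expected value
    # of the activations is unchanged; at test time the input is returned
    # untouched.
    if not training:
        return y
    mask = (np.random.rand(*y.shape) > p_drop) / (1.0 - p_drop)
    return y * mask

# a layer with 1000 activations: roughly 400 of them get zeroed
y = np.random.randn(1, 1000)
y_dropped = dropout_forward(y, p_drop=0.4)
print(int((y_dropped == 0).sum()))   # ≈ 400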
Implementing L1 and L2 regularization in code
import torch
from torch.nn import functional as F

class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)

    def forward(self, x):
        out1 = F.relu(self.linear1(x))
        out2 = F.relu(self.linear2(out1))
        out = F.relu(self.linear3(out2))
        # return the intermediate activations so they can be regularized
        return out, out1, out2

def l1_penalty(var):
    # L1 penalty: sum of absolute values
    return torch.abs(var).sum()

def l2_penalty(var):
    # L2 penalty: Euclidean norm
    return torch.sqrt(torch.pow(var, 2).sum())

batchsize = 4
lambda1, lambda2 = 0.5, 0.01

# create the model and optimizer once, outside the training loop
model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

for i in range(1000):
    inputs = torch.rand(batchsize, 128)
    targets = torch.ones(batchsize).long()

    optimizer.zero_grad()
    outputs, out1, out2 = model(inputs)
    cross_entropy_loss = F.cross_entropy(outputs, targets)
    # add L1 and L2 penalty terms on the hidden activations
    l1_regularization = lambda1 * l1_penalty(out1)
    l2_regularization = lambda2 * l2_penalty(out2)
    loss = cross_entropy_loss + l1_regularization + l2_regularization
    print(i, loss.item())

    loss.backward()
    optimizer.step()
0 10.036213874816895
1 8.866266250610352
2 12.449080467224121
3 9.355453491210938
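Note that the example above penalizes the hidden activations out1 and out2. The more common form of L1/L2 regularization penalizes the weights instead; here is a minimal sketch reusing the MLP class defined above (the lambda values are arbitrary):

model = MLP()
# L2 regularization of the weights is usually applied via the optimizer's
# built-in weight decay term
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, weight_decay=1e-4)

# L1 regularization of the weights has to be added to the loss by hand
l1_lambda = 1e-4
inputs = torch.rand(4, 128)
targets = torch.ones(4).long()

optimizer.zero_grad()
outputs, _, _ = model(inputs)
loss = F.cross_entropy(outputs, targets)
loss = loss + l1_lambda * sum(p.abs().sum() for p in model.parameters())
loss.backward()
optimizer.step()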
Implementing Dropout in PyTorch
Import modules
import torchvision
import torchvision.datasets as datasets  # used below to load the MNIST data
import torch
import torch.nn.functional as F
import numpy as np
import random

# fix the random seeds for reproducibility
random.seed(1)
np.random.seed(1)
torch.manual_seed(1)
Download the data
train_mnist = datasets.MNIST(root = './data', train = True, download = True)   # training set
test_mnist = datasets.MNIST(root = './data', train = False, download = True)   # test set
train_X, train_Y = train_mnist.data, train_mnist.targets
test_X, test_Y = test_mnist.data, test_mnist.targets
train_X = train_X.float()
test_X = test_X.float()
train_X = train_X / 255.0  # normalize pixel values to [0, 1]
test_X = test_X / 255.0
Define the network architecture
dim_in = 28 * 28
dim_hid = 128
dim_out = 10

class TwoLayerNet(torch.nn.Module):
    def __init__(self, dim_in, dim_hid, dim_out):
        super(TwoLayerNet, self).__init__()
        # define the model architecture
        self.fc1 = torch.nn.Linear(dim_in, dim_hid, bias=True)
        self.fc2 = torch.nn.Linear(dim_hid, dim_out, bias=True)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = F.relu(x)
        # functional dropout needs the training flag, otherwise it stays
        # active even when the model is put into eval() mode
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

# instantiate the model
model = TwoLayerNet(dim_in, dim_hid, dim_out)

# define the loss function and the optimizer
loss_fun = torch.nn.NLLLoss(reduction='sum')
eta = 1e-2
optimizer = torch.optim.Adam(model.parameters(), lr=eta)
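An alternative to F.dropout is to register a torch.nn.Dropout module in __init__, which is enabled and disabled automatically by model.train() and model.eval(). A minimal sketch, with the made-up class name TwoLayerNetDropout:

class TwoLayerNetDropout(torch.nn.Module):
    def __init__(self, dim_in, dim_hid, dim_out, p=0.5):
        super().__init__()
        self.fc1 = torch.nn.Linear(dim_in, dim_hid)
        self.drop = torch.nn.Dropout(p=p)  # follows model.train()/model.eval()
        self.fc2 = torch.nn.Linear(dim_hid, dim_out)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = self.drop(x)
        return F.log_softmax(self.fc2(x), dim=1)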
Model training
model.train()  # make sure dropout is active during training
for i in range(100):
    # forward pass (the whole training set is used as a single batch here)
    y_pred = model(train_X)
    # loss
    loss = loss_fun(y_pred, train_Y)
    if i % 10 == 0:
        print(i, loss.item())

    optimizer.zero_grad()
    # backward pass
    loss.backward()
    # update the model parameters
    optimizer.step()
0 2804.633056640625
10 2753.31005859375
20 2680.813232421875
30 2742.895263671875
40 2675.33984375
50 2608.1005859375
60 2621.881591796875
70 2586.460205078125
80 2539.307861328125
90 2451.8271484375
Model prediction
model.eval()  # switch off dropout for inference
with torch.no_grad():
    y_pred = model(test_X)
pred = y_pred.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
correct = pred.eq(test_Y.view_as(pred)).sum().item()
acc = correct / test_X.shape[0]
print('Accuracy: {}%'.format(acc * 100))
Accuracy: 95.91%