PyTorch Study Notes

Latest recommended article published on 2023-04-04 17:10:52

qq_41802245

Article tags: pytorch, deep learning, machine learning

Article link: https://blog.csdn.net/qq_41802245/article/details/107220924

Under torch: nn, autograd, mm, optim
Under nn: functional, Parameter, BCEWithLogitsLoss, Sequential, Module

A simple process for building a neural network

# Import commonly used libraries
import numpy as np
import torch
from torch import nn
from torch.autograd import Variable  # deprecated since PyTorch 0.4; plain tensors can be used directly
import torch.nn.functional as F
import matplotlib.pyplot as plt

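# Generate data: two classes of 2-D points
np.random.seed(1)
m = 400          # total number of samples
N = int(m/2)     # samples per class
D = 2            # input dimension
x = np.zeros((m,D))
y = np.zeros((m,1),dtype='uint8')
a = 4            # scale of the radius

for j in range(2):
    ix = range(N*j, N*(j+1))
    t = np.linspace(j*3.12,(j+1)*3.12,N) + np.random.randn(N)*0.2   # angle
    r = a*np.sin(4*t) + np.random.randn(N)*0.2                      # radius
    x[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
    y[ix] = j    # class label

# Convert to float32 tensors for training
x = torch.from_numpy(x).float()
y = torch.from_numpy(y).float()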

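Method 1: build the network by defining a function
# Parameters of the 2-4-1 network (assumed: small random weights, zero biases)
w1 = nn.Parameter(torch.randn(2, 4) * 0.01)
b1 = nn.Parameter(torch.zeros(4))
w2 = nn.Parameter(torch.randn(4, 1) * 0.01)
b2 = nn.Parameter(torch.zeros(1))

def func_net(x):
    x1 = torch.mm(x, w1) + b1
    x1 = torch.tanh(x1)   # torch.tanh replaces the deprecated F.tanh
    x2 = torch.mm(x1, w2) + b2
    return x2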

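# SGD over the four hand-defined parameters, learning rate 0.01
optim_1 = torch.optim.SGD([w1, w2, b1, b2], lr=0.01)
# Sigmoid and binary cross-entropy combined in one loss; expects raw logits
criterion = nn.BCEWithLogitsLoss()

for e in range(10000):
    out = func_net(x)           # forward pass
    loss = criterion(out, y)
    optim_1.zero_grad()         # clear gradients from the previous step
    loss.backward()             # backpropagate
    optim_1.step()              # update the parameters
    if (e + 1) % 1000 == 0:
        print('epoch:{},loss:{}'.format(e + 1, loss.item()))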
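
The training loop above uses nn.BCEWithLogitsLoss, which takes raw logits (the network has no final sigmoid) and combines the sigmoid with binary cross-entropy. As a rough sketch of why that combination is useful, the plain-Python snippet below compares the textbook formula with an algebraically equivalent, numerically stable form. The helper names `bce_naive` and `bce_with_logits` are made up for this illustration and are not PyTorch functions, and the sample values are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_naive(logit, target):
    # Textbook form: -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(logit)
    p = sigmoid(logit)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def bce_with_logits(logit, target):
    # Equivalent stable form: max(z, 0) - z*y + log(1 + exp(-|z|))
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# Arbitrary (logit, target) pairs for comparison
samples = [(-2.0, 0.0), (0.5, 1.0), (3.0, 1.0), (1.5, 0.0)]
for z, y in samples:
    print(z, y, bce_naive(z, y), bce_with_logits(z, y))

# For an extreme logit the stable form still returns a finite value,
# whereas bce_naive(800.0, 0.0) would fail on log(0).
large = bce_with_logits(800.0, 0.0)
print(large)
```

For moderate logits the two forms agree; they differ only in how they behave when the sigmoid saturates.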


Method 2: build with PyTorch's Sequential module
seq_net = nn.Sequential(
    nn.Linear(2, 4),
    nn.Tanh(),
    nn.Linear(4, 1)
)

# parameters() yields every weight and bias registered in the Sequential
optim_2 = torch.optim.SGD(seq_net.parameters(), lr=0.01)
criterion = nn.BCEWithLogitsLoss()

for e in range(10000):
    out = seq_net(x)            # forward pass
    loss = criterion(out, y)
    optim_2.zero_grad()
    loss.backward()
    optim_2.step()
    if (e + 1) % 1000 == 0:
        print('epoch:{},loss:{}'.format(e + 1, loss.item()))

Method 3: build with PyTorch's Module class

class module_net(nn.Module):
    def __init__(self,num1,num2,num3):
        super(module_net,self).__init__()
        self.layer1 = nn.Linear(num1,num2)
        self.layer2 = nn.Tanh()
        self.layer3 = nn.Linear(num2,num3)
    def forward(self,x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        return x
    
mod_net = module_net(2,4,1)

optim_3 = torch.optim.SGD(mod_net.parameters(), 0.01)
criterion = nn.BCEWithLogitsLoss()

for e in range(10000):
    out = mod_net(x)            # forward pass
    loss = criterion(out, y)
    optim_3.zero_grad()
    loss.backward()
    optim_3.step()
    if (e + 1) % 1000 == 0:
        print('epoch:{},loss:{}'.format(e + 1, loss.item()))

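Of the three methods, methods 2 and 3 tend to give better results. PyTorch's built-in modules are more stable than the hand-written version, which is related to how the parameters are initialized.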

Accessing the model

mod_net.layer1
》
Linear(in_features=2, out_features=4, bias=True)


mod_net.layer1.weight
》
Parameter containing:
tensor([[-0.0118, -1.9755],
        [ 1.3713,  1.3738],
        [ 1.4056, -1.5453],
        [-0.0824, -0.0446]], requires_grad=True)

for i in mod_net.children():
	print(i)
》
Linear(in_features=2, out_features=4, bias=True)
Tanh()
Linear(in_features=4, out_features=1, bias=True)


for i in mod_net.modules():
	print(i)
》
module_net(
  (layer1): Linear(in_features=2, out_features=4, bias=True)
  (layer2): Tanh()
  (layer3): Linear(in_features=4, out_features=1, bias=True)
)
Linear(in_features=2, out_features=4, bias=True)
Tanh()
Linear(in_features=4, out_features=1, bias=True)
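
Besides children() and modules(), the weights themselves can be listed with named_parameters() and parameters(). The sketch below is only an illustration: it uses a stand-in nn.Sequential with the same 2-4-1 layers as the networks above so that it runs on its own, and the variable names are hypothetical.

```python
import torch
from torch import nn

# Stand-in network with the same layer sizes as the examples above
net = nn.Sequential(nn.Linear(2, 4), nn.Tanh(), nn.Linear(4, 1))

# Each learnable tensor with its name (in a Sequential, names are index-based)
for name, param in net.named_parameters():
    print(name, tuple(param.shape))

num_children = len(list(net.children()))   # direct sub-layers only
num_modules = len(list(net.modules()))     # the container itself plus its sub-layers
num_params = sum(p.numel() for p in net.parameters())
print(num_children, num_modules, num_params)
```

children() stops at the direct sub-layers, while modules() also yields the container itself, which is why the second count is one larger here.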

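Saving and loading the model

1) Save the parameters and the model together
torch.save(seq_net, 'save_seq_net.pth')
# Load (on PyTorch 2.6+, a fully pickled model needs torch.load(..., weights_only=False))
net_ = torch.load('save_seq_net.pth')
# Layers of an nn.Sequential are accessed by index
net_[0].weight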
》
Parameter containing:
tensor([[-0.0118, -1.9755],
        [ 1.3713,  1.3738],
        [ 1.4056, -1.5453],
        [-0.0824, -0.0446]], requires_grad=True)
2) Save only the parameters
torch.save(seq_net.state_dict(), 'save_seq_net_params.pth')
seq_net2 = nn.Sequential(
    nn.Linear(2, 4),
    nn.Tanh(),
    nn.Linear(4, 1)
)

seq_net2.load_state_dict(torch.load('save_seq_net_params.pth'))

The second approach is generally recommended because it is more portable.
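
To illustrate the parameters-only approach end to end, here is a small round-trip sketch. It is written under a few assumptions: an in-memory io.BytesIO buffer stands in for the .pth file so that nothing is written to disk, and `make_net`, `src` and `dst` are names invented for this example.

```python
import io
import torch
from torch import nn

def make_net():
    # The architecture must be rebuilt in code; only the tensors are saved
    return nn.Sequential(nn.Linear(2, 4), nn.Tanh(), nn.Linear(4, 1))

src = make_net()
buffer = io.BytesIO()                      # in-memory stand-in for a .pth file
torch.save(src.state_dict(), buffer)
buffer.seek(0)

dst = make_net()                           # starts with different random weights
dst.load_state_dict(torch.load(buffer))    # copy the saved tensors into dst

x = torch.randn(5, 2)
same = torch.equal(src(x), dst(x))         # identical weights give identical outputs
print(list(src.state_dict().keys()), same)
```

Because only tensors keyed by parameter name are stored, the file does not depend on the class definition being importable at load time, which is one reason this form tends to be easier to move between projects.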

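Parameter initialization
Xavier initialization
Source: http://proceedings.mlr.press/v9/glorot10a.html
Formula:
$\omega \sim \mathrm{Uniform}\left(-\frac{\sqrt{6}}{\sqrt{n_{j}+n_{j+1}}},\ \frac{\sqrt{6}}{\sqrt{n_{j}+n_{j+1}}}\right)$
$n_{j}$ and $n_{j+1}$ are the numbers of inputs and outputs of the layer, respectively.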
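
To make the formula concrete, the following plain-Python sketch computes the sampling bound for the two layer sizes used in these notes (2 to 4 and 4 to 1) and draws one weight matrix. The function names are hypothetical, and the optional `gain` multiplier is an assumption that mirrors the `gain` argument of nn.init.xavier_uniform_.

```python
import math
import random

def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    # Bound of the uniform distribution: gain * sqrt(6) / sqrt(n_j + n_{j+1})
    return gain * math.sqrt(6.0) / math.sqrt(fan_in + fan_out)

def xavier_uniform(fan_in, fan_out, gain=1.0, seed=0):
    # Draw a (fan_out x fan_in) weight matrix uniformly from [-bound, bound]
    rng = random.Random(seed)
    bound = xavier_uniform_bound(fan_in, fan_out, gain)
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]

bound_first = xavier_uniform_bound(2, 4)    # layer with 2 inputs and 4 outputs
bound_second = xavier_uniform_bound(4, 1)   # layer with 4 inputs and 1 output
weights = xavier_uniform(2, 4)

print(bound_first, bound_second)
for row in weights:
    print(row)
```

The bound shrinks as the layer gets wider, which keeps the variance of the activations roughly constant from layer to layer.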

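# gain scales the standard deviation of the initialized parameters to match a specific activation function
for layer in mod_net.modules():
    if isinstance(layer, nn.Linear):
        # Initialize the weight tensor in place (the trailing underscore means in-place)
        nn.init.xavier_uniform_(layer.weight, gain=nn.init.calculate_gain('relu'))

# A generic initialization approach:
for layer in mod_net.modules():
    if isinstance(layer, nn.Linear):
        param_shape = layer.weight.shape
        # Draw from a normal distribution (mean 0, std 0.5); cast to float32 to match the layer
        layer.weight.data = torch.from_numpy(np.random.normal(0, 0.5, size=param_shape)).float()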

batch_size
The smaller the batch_size, the more random (noisier) the gradient; the larger the batch_size, the more stable the gradient.
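
A quick simulation can show how the spread of a mini-batch gradient estimate depends on batch_size. This is only a toy sketch under stated assumptions: the "per-sample gradients" are synthetic numbers (a true gradient of 1.0 plus unit Gaussian noise), and `batch_gradient_std` is a helper invented for this example.

```python
import random
import statistics

rng = random.Random(0)
# Synthetic per-sample gradients: true gradient 1.0 plus unit Gaussian noise
per_sample_grads = [1.0 + rng.gauss(0.0, 1.0) for _ in range(10000)]

def batch_gradient_std(batch_size, num_batches=500):
    # Standard deviation of the mini-batch mean gradient across many batches
    means = []
    for _ in range(num_batches):
        batch = rng.sample(per_sample_grads, batch_size)
        means.append(sum(batch) / batch_size)
    return statistics.stdev(means)

std_small = batch_gradient_std(4)
std_large = batch_gradient_std(64)
print(std_small, std_large)
```

Averaging over more samples cancels more of the per-sample noise, so the estimate from the larger batch varies less from step to step.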

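Gradient-based optimization algorithms
1) torch.optim.SGD()
2) torch.optim.SGD(momentum=0.9): each parameter update also takes the previous velocity into account. How far a parameter moves in a given direction depends not only on the current gradient but also on whether past gradients agreed in that direction. If the gradient keeps pointing the same way, the updates grow larger and larger; if the gradient keeps changing direction, the updates are damped. This makes it possible to use a larger learning rate and converge faster, while directions with large gradients (which tend to oscillate) get smaller updates because of the momentum.
3) torch.optim.Adagrad(): addresses the fact that SGD and the momentum method use the same learning rate for every parameter. However, as the squared gradients keep accumulating, the effective learning rate becomes smaller and smaller, and convergence stalls.
4) torch.optim.RMSprop(): as noted above, Adagrad has a problem: the variable s in the denominator of the learning rate keeps growing, so the learning rate ends up divided by a large number and becomes very small, which makes it hard to reach the final optimum. RMSProp was proposed to solve this problem.
5) torch.optim.Adadelta(): Adadelta is an extension of Adagrad. Like RMSProp, it was designed to fix Adagrad's ever-shrinking learning rate. RMSProp does this with an exponentially weighted moving average, and Adadelta is another way to do it. Interestingly, in its original formulation it does not need a learning-rate parameter.
6) torch.optim.Adam(): Adam is an optimization algorithm that combines the momentum method and RMSProp, and it has the advantages of both.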
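
The behaviours described in the list above can be reproduced with a few lines of plain Python. The sketch below tracks only the size of each update for a single scalar parameter. The update rules follow the usual textbook forms, while the hyperparameters (lr=0.1, mu=0.9, alpha=0.9) and the two synthetic gradient sequences are assumptions chosen for illustration, not values taken from these notes.

```python
import math

def momentum_steps(grads, lr=0.1, mu=0.9):
    v, steps = 0.0, []
    for g in grads:
        v = mu * v + g            # velocity accumulates past gradients
        steps.append(lr * v)      # the parameter would change by -lr * v
    return steps

def adagrad_steps(grads, lr=0.1, eps=1e-10):
    s, steps = 0.0, []
    for g in grads:
        s += g * g                # running sum of squared gradients
        steps.append(lr * g / (math.sqrt(s) + eps))
    return steps

def rmsprop_steps(grads, lr=0.1, alpha=0.9, eps=1e-8):
    s, steps = 0.0, []
    for g in grads:
        s = alpha * s + (1 - alpha) * g * g   # moving average of squared gradients
        steps.append(lr * g / (math.sqrt(s) + eps))
    return steps

consistent = [1.0] * 100                                    # gradient always points the same way
oscillating = [1.0 if i % 2 == 0 else -1.0 for i in range(100)]  # gradient flips sign every step

m_cons = momentum_steps(consistent)
m_osc = momentum_steps(oscillating)
ada = adagrad_steps(consistent)
rms = rmsprop_steps(consistent)

print('momentum, consistent :', m_cons[0], m_cons[-1])
print('momentum, oscillating:', m_osc[0], m_osc[-1])
print('Adagrad              :', ada[0], ada[-1])
print('RMSProp              :', rms[0], rms[-1])
```

With a consistent gradient the momentum step grows well beyond its first value, and with a sign-flipping gradient it is damped. The Adagrad step keeps shrinking as the sum of squares grows, whereas the RMSProp step levels off near the learning rate because its denominator is an average rather than a sum.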
A detailed introduction to several gradient descent algorithms