PyTorch Learning: Task 2-3, Gradient Computation and Gradient Descent, and the Principle and Use of PyTorch Fully Connected Layers

奶油松果

Category: PyTorch Learning. Tags: python, pytorch

Article link: https://blog.csdn.net/qq_36930921/article/details/121441634

1. Learning automatic differentiation
- Learning the principle of gradient descent
2. Learning fully connected layers

1. Learning automatic differentiation

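torch.autograd is PyTorch's built-in tool for computing gradients.
Backpropagation (backward): the parameters (model weights) are adjusted according to the gradient of the loss function with respect to each parameter.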

# input: x    parameters: w, b
import torch
x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5,3,requires_grad=True)  # the loss gradient is needed for the parameters, so set requires_grad=True
b = torch.randn(3,requires_grad=True)
z = torch.matmul(x,w) + b  # z = x @ w + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z,y)

# Print grad_fn to see the backward function attached to each tensor
print('Gradient function for z =', z.grad_fn)
print('Gradient function for loss =', loss.grad_fn)

# To optimize the network's weights we need the gradients of the loss
# with respect to the parameters: ∂loss/∂w and ∂loss/∂b
# Call loss.backward(), then read them from w.grad and b.grad
loss.backward(retain_graph=True)
print(w.grad)
print(b.grad)

'''
Note:
1) The grad attribute is only available for leaf nodes of the computational graph
   that have requires_grad=True; it is not available for the other nodes.
2) backward can only be run once on a given graph. To make several backward calls
   on the same graph, pass retain_graph=True.
'''

# If no backward pass is needed, wrap the computation in a torch.no_grad() block
z = torch.matmul(x,w)+b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x,w) + b
print(z.requires_grad)


# Another way is to call detach() on the tensor
z = torch.matmul(x,w) + b
z_det = z.detach()
print(z_det.requires_grad)

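So when would you not need backward, that is, when would you want to disable gradient tracking?

When some parameters of your neural network are frozen parameters.
When you only run the forward pass and want to speed up the computation.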
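The frozen-parameter case can be sketched as follows. This is a minimal illustration, not a recipe: the two-layer `model` and its layer sizes are made up for the example. It freezes the first layer with `requires_grad_(False)` and checks that only the second layer receives gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical two-layer model; the sizes are arbitrary.
model = nn.Sequential(nn.Linear(5, 4), nn.ReLU(), nn.Linear(4, 3))

# Freeze the first layer: autograd stops tracking its weight and bias.
for p in model[0].parameters():
    p.requires_grad_(False)

x = torch.ones(2, 5)
loss = model(x).sum()
loss.backward()

frozen_grads = [p.grad for p in model[0].parameters()]
trainable_grads = [p.grad for p in model[2].parameters()]
print(frozen_grads)                         # the frozen layer has no gradients
print([g.shape for g in trainable_grads])   # the last layer does
```

An optimizer built afterwards would normally be given only the parameters that still require gradients.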

'''
Gradients accumulate automatically across backward calls,
so you have to zero them yourself.
'''
inp = torch.eye(5, requires_grad=True)
out = (inp+1).pow(2)
out.backward(torch.ones_like(inp), retain_graph=True)
print("First call\n", inp.grad)
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nSecond call\n", inp.grad)
inp.grad.zero_()
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nCall after zeroing gradients\n", inp.grad)
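Accumulation is not only a pitfall. As a small sketch with made-up tensors (`batch_a` and `batch_b` are hypothetical mini-batches), two backward calls without zeroing give the same gradient as one backward call over both batches together:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
batch_a = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
batch_b = torch.tensor([[2.0, 2.0]])

# Two backward calls without zeroing: the gradients add up in w.grad.
(batch_a @ w).sum().backward()
(batch_b @ w).sum().backward()
accumulated = w.grad.clone()

# Zero the gradient, then run one backward pass over both batches at once.
w.grad.zero_()
(torch.cat([batch_a, batch_b]) @ w).sum().backward()
combined = w.grad.clone()

print(accumulated, combined)
```

In a training loop the usual place for the reset is right before `loss.backward()`, for example via `optimizer.zero_grad()`.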

Learning the principle of gradient descent

Study link

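1. Linear regression

$h(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots$

2. Loss function

$J(\theta) = \frac{1}{2}\,(h(x) - y)^2$. Least squares finds the optimal $\theta$, the one that minimizes the loss $J$.

3. Least mean squares (LMS)

4. Gradient descent and stochastic gradient descent (SGD)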
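The four items above can be tied together in a short sketch. Differentiating the squared-error loss for a single sample gives the LMS update rule, theta_j <- theta_j - alpha * (h(x) - y) * x_j, and applying it one randomly chosen sample at a time is stochastic gradient descent. The toy data, the learning rate `alpha` and the number of epochs below are assumptions chosen only for illustration.

```python
import torch

torch.manual_seed(0)

# Hypothetical noise-free toy data: y = 1 + 2 * x1, with x1 in [0, 1).
x1 = torch.rand(200)
y = 1 + 2 * x1
X = torch.stack([torch.ones_like(x1), x1], dim=1)  # column of ones for the intercept

theta = torch.zeros(2)   # theta[0] is the intercept, theta[1] the slope
alpha = 0.1              # learning rate (assumed value)

for epoch in range(20):
    for i in torch.randperm(len(x1)):   # visit the samples in random order
        features = X[i]
        h = theta @ features            # prediction h(x) for this sample
        # LMS update: theta_j <- theta_j - alpha * (h(x) - y) * x_j
        theta = theta - alpha * (h - y[i]) * features

print(theta)   # should end up close to the true values (1, 2)
```

No autograd is involved here: the gradient is written out by hand, which is exactly what `loss.backward()` automates in the PyTorch versions.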

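Exercise

Use numpy to create data of the form y = 10*x + 4 + noise(0,1), where x ranges from 0 to 100 as an arithmetic sequence with step 0.01.
Define w and b in PyTorch and fit the regression with stochastic gradient descent.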

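import numpy as np
import torch
import torch.nn as nn

# Data for the exercise: x from 0 to 100 in steps of 0.01, y = 10*x + 4 + noise
x = np.arange(0, 100, 0.01)
y = 10 * x + 4 + np.random.rand(x.size)  # uniform noise in [0, 1)
x = torch.from_numpy(x).float()  # float32, to match the parameters
y = torch.from_numpy(y).float()

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.tensor([7.]))
        self.b = nn.Parameter(torch.tensor([6.]))

    def forward(self, x):
        return self.w * x + self.b

net = Net()
# x reaches 100, so mean(x^2) is about 3333 and the MSE curvature is about 6667.
# Gradient descent diverges unless lr < 2/6667 (about 0.0003); lr=0.0005 blows up.
optimizer = torch.optim.SGD(net.parameters(), lr=0.0001)
loss_func = nn.MSELoss()
# Each step uses the whole dataset, so this is full-batch gradient descent
# run through the SGD optimizer.
for step in range(1000):
    output = net(x)
    loss = loss_func(output, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if step % 10 == 0:
        print(step, loss.item())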

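2. Learning fully connected layers

Step 1: the principle of the fully connected layer

Study link

1. Derivation of the fully connected layer:

Every node of a fully connected layer is connected to all nodes of the previous layer, so the layer combines the features extracted earlier in the network.

2. Forward computation of the fully connected layer: a linearly weighted sum.

Each output is obtained by multiplying every node of the previous layer by a weight coefficient from W, summing the products, and finally adding a bias b.

Small example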

The concrete network and its matrix form: [figure not available]
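Since the original figures are not available, here is a stand-in worked example. It is only a sketch with made-up numbers for a hypothetical layer with 3 inputs and 2 outputs, showing that the node-by-node weighted sums and the matrix form give the same result.

```python
import torch

# Hypothetical layer with 3 inputs and 2 outputs; the numbers are arbitrary.
W = torch.tensor([[1.0, 2.0, 3.0],
                  [0.0, -1.0, 1.0]])   # one row of weights per output node
b = torch.tensor([0.5, -0.5])
x = torch.tensor([1.0, 1.0, 2.0])

# Node by node: each output is a weighted sum of all inputs plus its bias.
a1 = W[0, 0] * x[0] + W[0, 1] * x[1] + W[0, 2] * x[2] + b[0]
a2 = W[1, 0] * x[0] + W[1, 1] * x[1] + W[1, 2] * x[2] + b[1]

# Matrix form: the same computation as one matrix-vector product.
a = W @ x + b
print(a1, a2, a)
```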

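3. Backpropagation through the fully connected layer

[figure not available]
W and b have to be updated, and the gradient also has to be passed back to the previous layer, so three partial derivatives are needed:
1) the derivative with respect to the previous layer's output
2) the derivative with respect to the weight coefficients W
3) the derivative with respect to the bias b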
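For a layer a = W x + b, these three derivatives have closed forms. Writing delta for the gradient of the loss with respect to a, they are W^T delta, the outer product of delta and x, and delta itself. The sketch below checks the hand-written formulas against autograd; the layer sizes and the squared-sum loss are arbitrary choices made for the example.

```python
import torch

torch.manual_seed(0)
x = torch.randn(3, requires_grad=True)   # output of the previous layer
W = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)

a = W @ x + b
loss = (a ** 2).sum()      # arbitrary scalar loss, used only for illustration
loss.backward()            # autograd fills x.grad, W.grad and b.grad

delta = (2 * a).detach()   # dloss/da, the gradient arriving from the next layer

grad_x = W.detach().t() @ delta           # 1) w.r.t. the previous layer's output
grad_W = torch.outer(delta, x.detach())   # 2) w.r.t. the weights W
grad_b = delta                            # 3) w.r.t. the bias b

print(torch.allclose(grad_x, x.grad),
      torch.allclose(grad_W, W.grad),
      torch.allclose(grad_b, b.grad))
```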

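Meaning of the fully connected layer
A fully connected layer is in effect a convolution whose kernel size equals the size of the previous layer's feature map. Each convolution produces a single node, which corresponds to one node of the fully connected layer.
For example, suppose the last convolutional layer outputs 7×7×512 and the fully connected layer attached to it is 1×1×4096:
1) there are 4096 groups of filters
2) each filter has 512 convolution kernels
3) each kernel has size 7×7
4) the output is therefore 1×1×4096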
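This equivalence can be checked numerically. The sketch below is scaled down (8 channels and 16 outputs instead of 512 and 4096) to keep it light, and it uses PyTorch's [batch, channels, height, width] layout. It copies the filters of an `nn.Conv2d` with a 7×7 kernel into an `nn.Linear` and compares the outputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
C, H, Wd, OUT = 8, 7, 7, 16   # scaled down from 512 channels and 4096 outputs

conv = nn.Conv2d(C, OUT, kernel_size=(H, Wd))   # kernel covers the whole feature map
fc = nn.Linear(C * H * Wd, OUT)

# Copy the conv filters into the linear layer so both hold the same parameters.
with torch.no_grad():
    fc.weight.copy_(conv.weight.view(OUT, -1))
    fc.bias.copy_(conv.bias)

feat = torch.randn(1, C, H, Wd)    # [batch, channels, height, width]
out_conv = conv(feat)              # shape [1, OUT, 1, 1]
out_fc = fc(feat.flatten(1))       # shape [1, OUT]

print(out_conv.shape, out_fc.shape)
print(torch.allclose(out_conv.flatten(1), out_fc, atol=1e-4))
```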

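Step 2: implementing a fully connected layer with matrix multiplication in PyTorch

# Implement a fully connected layer with matrix multiplication in PyTorch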
import torch.nn as nn
import torch
class FC(nn.Module):
    def __init__(self,input_dim,output_dim):
        super(FC,self).__init__()
        self.w = torch.nn.Parameter(torch.randn(input_dim,output_dim))
        self.b = torch.nn.Parameter(torch.randn(output_dim))
        
    def forward(self,x):
        print("self.w  ",self.w)
        print("self.w.t()  ",self.w.t())
        y = torch.matmul(self.w.t(),x) + self.b  # w.t() is the transpose of w; this form assumes a 1-D input x
        return y
linear = FC(3,2)
x = torch.ones(3)
y = linear(x)
print(y)
print(y.shape)

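Step 3: using the nn.Linear layer in PyTorch

Reference link:

Usage and parameters

CLASS torch.nn.Linear(in_features,out_features,bias=True)
in_features: the size of the input 2-D tensor, that is, the size in an input of shape [batch_size,size]
out_features: the size of the output 2-D tensor. The output has shape [batch_size,out_features], and this value is also the number of neurons in the fully connected layer.

In other words, the layer takes a tensor of shape [batch_size,in_features] and returns a tensor of shape [batch_size,out_features].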
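A quick shape check makes these parameters concrete. The sizes below are arbitrary; the sketch also confirms that the layer computes `x @ weight.T + bias`, with the weight stored as [out_features, in_features].

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(in_features=4, out_features=2)   # sizes chosen arbitrarily

print(layer.weight.shape)   # stored as [out_features, in_features]
print(layer.bias.shape)

x = torch.randn(3, 4)       # [batch_size, in_features]
out = layer(x)              # [batch_size, out_features]

# The same result computed by hand.
manual = x @ layer.weight.t() + layer.bias
print(out.shape, torch.allclose(out, manual, atol=1e-6))
```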

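Example:

# Learning to use the nn.Linear layer
import torch as t
import torch.nn as nn
'''
PyTorch's nn.Linear() sets up a fully connected layer in a network.
Note that in 2-D image tasks the input and output of a fully connected layer
are usually 2-D tensors of shape [batch_size, size],
unlike convolutional layers, which expect 4-D input and output tensors.
'''

# in_features is determined by the input shape; out_features determines the output shape
connected_layer = nn.Linear(in_features=64*64*3,out_features=1)
# Assume the input image has shape [64,64,3]
inputs = t.randn(1,64,64,3)
print(inputs)
print(inputs.shape)
# The 4-D tensor must be reshaped to 2-D before it can be fed to the fully connected layer
inputs = inputs.view(1,64*64*3)
print(inputs)
print(inputs.shape)
output = connected_layer(inputs)  # call the fully connected layer
print(output)
print(output.shape)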
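The manual reshape can also be folded into the model. As one possible variant (not part of the original example), `nn.Flatten` flattens every dimension except the batch dimension, so the same layer accepts a batch of any size; the `model` and `images` names are made up for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical model: flatten everything except the batch dimension, then apply the layer.
model = nn.Sequential(
    nn.Flatten(),                   # [N, 64, 64, 3] -> [N, 64*64*3]
    nn.Linear(64 * 64 * 3, 1),
)

images = torch.randn(2, 64, 64, 3)  # batch of 2 with the same per-image shape as above
output = model(images)
print(output.shape)
```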