Understanding torch.nn.MSELoss() and torch.optim.SGD


小饼干超人


Article link: https://blog.csdn.net/m0_37586991/article/details/88371251


A simple example

import torch
import torch.nn as nn

x = torch.randn(10, 3)
y = torch.randn(10, 2)
# Build a fully connected layer.
linear = nn.Linear(3, 2)

# Build loss function and optimizer.
criterion = nn.MSELoss()

# Use stochastic gradient descent (SGD) as the optimizer, with a learning rate of 0.01.
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

# Forward pass.
pred = linear(x)

# Compute loss.
loss = criterion(pred, y)
print('loss:', loss.item())

# Backward pass.
loss.backward()
print('dL/dw: ', linear.weight.grad)
print('dL/db: ', linear.bias.grad)

# 1-step gradient descent.
optimizer.step()

# Print out the loss after 1-step gradient descent.
pred = linear(x)
loss = criterion(pred, y)
print('loss after 1 step:', loss.item())
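
The example above performs a single update. As a rough sketch of how the same pieces are usually combined for several updates, the loop below repeats the forward pass, backward pass, and step. The fixed seed, the 100 iterations, and the `losses` list are assumptions added for illustration and are not part of the original example. Note the call to `optimizer.zero_grad()`: PyTorch accumulates gradients across `backward()` calls, so they are cleared before each new backward pass.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # assumption: fixed seed so repeated runs behave the same

x = torch.randn(10, 3)
y = torch.randn(10, 2)
linear = nn.Linear(3, 2)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

losses = []  # hypothetical helper list, used only to track progress
for step in range(100):
    optimizer.zero_grad()      # clear gradients left over from the previous step
    pred = linear(x)           # forward pass
    loss = criterion(pred, y)  # compute loss
    loss.backward()            # backward pass
    optimizer.step()           # one SGD update
    losses.append(loss.item())

print('first loss:', losses[0])
print('last loss: ', losses[-1])
```

Because the whole dataset is used in every iteration, this is strictly full-batch gradient descent; the loss should shrink from the first iteration to the last.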

An intuitive implementation of MSELoss

# MSELoss() with its default reduction ('mean') is equivalent to:
def mseLoss(pred, y):
    return ((pred - y) ** 2).mean()

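$MSE=\frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i-y_i)^2$

Here m is the total number of elements in the tensor (10 × 2 = 20 in the example above), because .mean() averages over every element.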
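
To check the equivalence numerically, the sketch below compares `nn.MSELoss()` with the hand-written mean of squared differences on random tensors. The function name `mse_loss`, the seed, and the tensor shapes are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

def mse_loss(pred, y):
    # Mean of the element-wise squared differences.
    return ((pred - y) ** 2).mean()

torch.manual_seed(0)  # assumption: fixed seed for repeatability
pred = torch.randn(10, 2)
y = torch.randn(10, 2)

builtin = nn.MSELoss()(pred, y)  # default reduction='mean'
manual = mse_loss(pred, y)

# With reduction='sum' the squared differences are summed instead;
# dividing by the number of elements recovers the mean.
summed = nn.MSELoss(reduction='sum')(pred, y)
from_sum = summed / pred.numel()

print(torch.allclose(builtin, manual))
print(torch.allclose(builtin, from_sum))
```

The comparison uses `torch.allclose` rather than `==` because the two computations may differ by floating-point rounding.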

An intuitive implementation of SGD

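# optimizer.step() performs the parameter update. With plain SGD (no momentum,
# no weight decay), that call is equivalent to the two lines below. Run one or
# the other, not both, or the update is applied twice.
# optimizer.step()
linear.weight.data.sub_(0.01 * linear.weight.grad.data)
linear.bias.data.sub_(0.01 * linear.bias.grad.data)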

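Here 0.01 is the learning rate (lr), and sub_() subtracts in place, just as t_() transposes in place.
The initial values of w and b are chosen randomly and then updated according to the following rules, where $\eta$ is the learning rate:
$w^1\leftarrow w^0-\eta\frac{dL}{dw}|_{w=w^0,b=b^0}$

$b^1\leftarrow b^0-\eta\frac{dL}{db}|_{w=w^0,b=b^0}$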
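
The update rule can be checked against `optimizer.step()` directly. The sketch below is one way to do that, under a few assumptions made for illustration: two identical copies of the layer (`model_a`, `model_b`) created with `copy.deepcopy`, a fixed seed, and plain SGD without momentum or weight decay. It applies the in-place subtraction inside `torch.no_grad()` instead of going through `.data`, which is the form PyTorch currently recommends for manual parameter updates.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)  # assumption: fixed seed for repeatability
x = torch.randn(10, 3)
y = torch.randn(10, 2)
lr = 0.01

model_a = nn.Linear(3, 2)
model_b = copy.deepcopy(model_a)  # same initial w and b as model_a
criterion = nn.MSELoss()

# Model A: let the optimizer apply the update.
optimizer = torch.optim.SGD(model_a.parameters(), lr=lr)
criterion(model_a(x), y).backward()
optimizer.step()

# Model B: apply w <- w - lr * dL/dw and b <- b - lr * dL/db by hand.
criterion(model_b(x), y).backward()
with torch.no_grad():
    model_b.weight.sub_(lr * model_b.weight.grad)
    model_b.bias.sub_(lr * model_b.bias.grad)

same_weight = torch.allclose(model_a.weight, model_b.weight)
same_bias = torch.allclose(model_a.bias, model_b.bias)
print(same_weight, same_bias)
```

If momentum or weight decay were enabled in `torch.optim.SGD`, the two updates would no longer match, since the optimizer would then add extra terms to the step.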

References

pytorch-tutorials
