08 Linear Regression + Basic Optimization Algorithms

yonuyeung

Last modified: 2023-01-15 02:10:40

Category: Dive into Deep Learning (动手学深度学习). Tags: artificial intelligence

First published: 2023-01-06 16:57:31

Article link: https://blog.csdn.net/qq_59414507/article/details/128581434

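P2 Basic Optimization Algorithms

1. The most common optimization algorithm is gradient descent. It is used when a model has no closed-form (explicit) solution. Linear regression does have one, but such ideal cases are rare in practice.

2. How gradient descent works: solve for the parameters by repeatedly updating them along the negative gradient direction.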

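Notes:
Hyperparameter: a value that must be specified by hand rather than learned through training.
Negative gradient direction: on a contour plot of the loss, it points from the outer contours toward the inner ones, that is, toward lower loss.
Step size: for example, the distance from W0 to W1.

Learning rate: the hyperparameter that controls the step size.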
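
To make the update rule concrete, here is a minimal sketch in plain Python rather than PyTorch. The toy data, the function names, and the learning rate of 0.1 are assumptions chosen for illustration; they do not come from the lecture. The sketch runs gradient descent on a one-feature linear regression and compares the result with the closed-form least-squares solution mentioned in item 1.

```python
# Toy data for y = 2x + 1 (made up for illustration, noise-free).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
n = len(xs)


def gradients(w, b):
    """Gradient of the mean squared loss (1 / 2n) * sum((w*x + b - y)^2)."""
    grad_w = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum((w * x + b - y) for x, y in zip(xs, ys)) / n
    return grad_w, grad_b


def gradient_descent(lr, steps):
    """Start from (0, 0) and repeatedly step against the gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w, grad_b = gradients(w, b)
        w -= lr * grad_w  # the learning rate lr scales the step
        b -= lr * grad_b
    return w, b


# Closed-form least-squares solution for a single feature.
x_mean = sum(xs) / n
y_mean = sum(ys) / n
w_closed = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
            / sum((x - x_mean) ** 2 for x in xs))
b_closed = y_mean - w_closed * x_mean

w_gd, b_gd = gradient_descent(lr=0.1, steps=2000)
print("closed form:     ", w_closed, b_closed)
print("gradient descent:", round(w_gd, 6), round(b_gd, 6))
```

Both approaches should agree on this toy problem. The point of gradient descent is that it still applies when no closed-form solution is available.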

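3. The common variant of gradient descent: minibatch stochastic gradient descent
Method: sample b examples and use them to approximate the loss.
Batch size: b, which is also a hyperparameter; it should be neither too large nor too small.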
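
The idea of approximating the loss with b samples can be sketched as follows, again in plain Python. The dataset, the helper names `grad_w` and `minibatch_sgd`, and the values batch size 10 and learning rate 0.1 are assumptions for illustration only. Each update uses the gradient of the loss averaged over one small batch instead of over all 1000 samples.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical toy dataset: y = 3x plus a little Gaussian noise.
xs = [random.uniform(-1, 1) for _ in range(1000)]
ys = [3 * x + random.gauss(0, 0.01) for x in xs]


def grad_w(w, indices):
    """Gradient of the mean squared loss over the chosen samples only."""
    return sum((w * xs[i] - ys[i]) * xs[i] for i in indices) / len(indices)


def minibatch_sgd(batch_size, lr, epochs):
    """Update w using one small batch of samples per step."""
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        random.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            w -= lr * grad_w(w, batch)
    return w


full_grad = grad_w(0.0, range(len(xs)))                  # uses all 1000 samples
batch_grad = grad_w(0.0, random.sample(range(1000), 10))  # uses only b = 10 samples
print("full gradient:     ", full_grad)
print("minibatch estimate:", batch_grad)

w_hat = minibatch_sgd(batch_size=10, lr=0.1, epochs=5)
print("estimated w:", w_hat)
```

The minibatch gradient is a noisy but cheap estimate of the full gradient. The estimate gets less noisy as b grows, while each step costs more computation, which is one way to read the remark that b should be neither too large nor too small.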

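Summary:
1. Gradient descent solves for the parameters by repeatedly updating them along the negative gradient direction.
2. Minibatch stochastic gradient descent is the default solver in deep learning.
3. The two important hyperparameters are batch size and learning rate.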

P3 Linear Regression Implementation from Scratch

# 3.2 Linear regression implementation from scratch

import random
import torch
from d2l import torch as d2l
#import d2l

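Question 1: Why can't we simply write "import d2l" instead of "from d2l import torch as d2l"? And what does this statement mean?

"from xx import yy" imports the name yy (a submodule, class, or function) from the package xx. "import xx as yy" imports xx but binds it to the name yy, which is commonly used to shorten long package names. So "from d2l import torch as d2l" imports the torch submodule of the d2l package, which holds the PyTorch versions of the helper functions, and binds it to the name d2l. A plain "import d2l" only imports the top-level package, so helpers such as d2l.set_figsize() are not directly available under that name.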

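The two import forms can be tried out with standard-library modules alone. The sketch below uses `os.path` and `json` as stand-ins (an assumption for illustration, since they need no installation); `from os import path as p` has the same shape as `from d2l import torch as d2l`.

```python
import os
import json

# "from xx import yy as zz": take the submodule os.path and bind it to the name p
from os import path as p

# "import xx as yy": import the module json and bind it to the name j
import json as j

print(p is os.path)  # True: p is just another name for os.path
print(j is json)     # True: j is just another name for json
print(p.join("a", "b") == os.path.join("a", "b"))  # True
```
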
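1. Generating the dataset

# 3.2.1 Generate the dataset
def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, (num_examples, len(w)))  # features drawn from N(0, 1)
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)  # add Gaussian noise with std 0.01
    return X, y.reshape((-1, 1))  # return the labels as a column vector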

true_w = torch.tensor([2,-3.4])
true_b = 4.2
features, labels = synthetic_data(true_w,true_b,1000)

print('features:', features[0],'\nlabel:', labels[0])
# Plot the second feature against the labels
d2l.set_figsize()
d2l.plt.scatter(features[:, 1].detach().numpy(), labels.detach().numpy(), 1)
d2l.plt.show()

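1. torch.normal() and torch.matmul()

① torch.normal(0, 1, (num_examples, len(w)))

Generates a tensor of random numbers drawn from a normal distribution with mean 0 and standard deviation 1. Note that the second argument is the standard deviation, not the variance.

Output shape: num_examples rows and len(w) columns.

Detailed explanation: "Usage of the torch.normal function"

② torch.matmul(X, w)

Computes the matrix product of X and w. Here X has shape (num_examples, 2) and w has shape (2,), so the result is a vector of length num_examples.

Detailed explanation: "[Pytorch] torch.matmul()"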
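
The shapes described above can be checked directly. This is a small sketch that assumes PyTorch is installed; the sample counts and the seed are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable

w = torch.tensor([2.0, -3.4])

# Mean 0, standard deviation 1, shape (num_examples, len(w))
X = torch.normal(0, 1, (1000, len(w)))
print(X.shape)  # torch.Size([1000, 2])

# The second argument is the standard deviation, so the sample
# standard deviation below should be close to 0.01, not 0.1.
noise = torch.normal(0, 0.01, (10000,))
print(noise.std())

# (1000, 2) matrix times a length-2 vector gives a length-1000 vector
y = torch.matmul(X, w)
print(y.shape)  # torch.Size([1000])

# Each entry of y is the dot product of one row of X with w
print(torch.allclose(y, (X * w).sum(dim=1), atol=1e-5))  # True
```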

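2. reshape((-1, 1))

Fixes the number of columns of the output at 1 and lets the number of rows be inferred automatically from the number of elements.

Detailed explanation: "Usage of reshape in Python: reshape(1, -1)"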
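
A short sketch of how -1 is resolved, assuming PyTorch is installed. The six-element tensor is made up for illustration. The last two lines show one reason the labels are kept as a column and why squared_loss reshapes y to the shape of y_hat: subtracting a shape (6,) tensor from a shape (6, 1) tensor broadcasts to (6, 6) instead of subtracting element by element.

```python
import torch

y = torch.arange(6.0)      # shape (6,)

col = y.reshape((-1, 1))   # 1 column, rows inferred: shape (6, 1)
row = y.reshape((1, -1))   # 1 row, columns inferred: shape (1, 6)
grid = y.reshape((-1, 3))  # 3 columns, rows inferred: shape (2, 3)
print(col.shape, row.shape, grid.shape)

# Mismatched shapes broadcast silently instead of raising an error
print((col - y).shape)                     # torch.Size([6, 6])
print((col - y.reshape(col.shape)).shape)  # torch.Size([6, 1])
```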

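3. When working in PyCharm, remember to add d2l.plt.show() if you want to see the figure.

There are many similar cases where code typed interactively in Anaconda Prompt shows its results, while the same code run in PyCharm shows nothing. In PyCharm, remember to add the statements that produce the output explicitly.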

# 3.2.2 Read the dataset
def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    # The examples are read in random order, with no particular ordering
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        batch_indices = torch.tensor(
            indices[i: min(i + batch_size, num_examples)])
        yield features[batch_indices], labels[batch_indices]
# Read and print the first minibatch
batch_size = 10

for X, y in data_iter(batch_size, features, labels):
    print(X, '\n', y)
    break
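
The slicing logic in data_iter can be checked without any tensors. The sketch below is a stripped-down, index-only version (the name `batch_indices` and the numbers 25 and 10 are assumptions for illustration). It shows that every example is visited exactly once per pass and that the last batch is shorter when num_examples is not a multiple of batch_size.

```python
import random


def batch_indices(num_examples, batch_size):
    """Yield shuffled lists of indices; the final list may be shorter."""
    indices = list(range(num_examples))
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        yield indices[i: min(i + batch_size, num_examples)]


batches = list(batch_indices(num_examples=25, batch_size=10))
sizes = [len(batch) for batch in batches]
print(sizes)  # [10, 10, 5]

# Every index from 0 to 24 appears exactly once across the batches
seen = sorted(i for batch in batches for i in batch)
print(seen == list(range(25)))  # True
```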

# 3.2.3 Initialize the model parameters
w = torch.normal(0, 0.01, size=(2,1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# 3.2.4 Define the model
def linreg(X, w, b):  #@save
    """Linear regression model."""
    return torch.matmul(X, w) + b

# 3.2.5 Define the loss function
def squared_loss(y_hat, y):  #@save
    """Squared loss."""
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

# 3.2.6 Define the optimization algorithm
def sgd(params, lr, batch_size):  #@save
    """Minibatch stochastic gradient descent."""
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()
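
The division by batch_size in sgd pairs with the l.sum().backward() call in the training loop: summing the per-example losses gives a summed gradient, and dividing by the batch size turns it into the gradient of the mean loss. The sketch below checks this equivalence on a made-up batch of 4 examples, assuming PyTorch is installed; the variable names are chosen for illustration.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable
batch_size = 4
X = torch.normal(0, 1, (batch_size, 2))
y = torch.normal(0, 1, (batch_size, 1))
w = torch.zeros((2, 1), requires_grad=True)

# Path 1: sum the per-example losses, then divide the gradient by batch_size
l = (torch.matmul(X, w) - y) ** 2 / 2
l.sum().backward()
grad_from_sum = w.grad.clone() / batch_size

# Gradients accumulate across backward() calls, so reset before the second pass
w.grad.zero_()

# Path 2: average the per-example losses directly
l = (torch.matmul(X, w) - y) ** 2 / 2
l.mean().backward()
grad_from_mean = w.grad.clone()

print(torch.allclose(grad_from_sum, grad_from_mean))  # True
```

The w.grad.zero_() call in the sketch plays the same role as param.grad.zero_() in sgd: without it, the second backward pass would add its gradient to the first.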

# 3.2.7 Training
lr = 0.03
num_epochs = 3
net = linreg
loss = squared_loss

for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
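        l = loss(net(X, w, b), y)  # minibatch loss on X and y
        # l has shape (batch_size, 1) rather than being a scalar, so all of its
        # elements are summed and the gradients w.r.t. [w, b] are computed from that sum
        l.sum().backward()
        sgd([w, b], lr, batch_size)  # update the parameters using their gradients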
    with torch.no_grad():
        train_l = loss(net(features, w, b), labels)
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')
# Print the estimation errors
print(f'estimation error of w: {true_w - w.reshape(true_w.shape)}')
print(f'estimation error of b: {true_b - b}')