Weekly Report 240705 - CSDN Blog

Article link: https://blog.csdn.net/qq_66248905/article/details/140086139

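Topics covered

1. NumPy's broadcasting mechanism

2. Stochastic gradient descent

3. The normal distribution and squared loss

4. Linear regression implemented from scratch

5. Softmax regression

Study period

2024.6.27~2024.7.5

Study notes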

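NumPy's broadcasting mechanism

Purpose: allow element-wise operations between arrays of different shapes.

Rules:

Right-align the shapes of the two arrays, then compare the sizes in each corresponding dimension.

If, in every dimension, the two sizes are equal, or one of them is 1, or one array has no such dimension, the arrays can be broadcast,

and the output size in each dimension is the larger of the two. Otherwise the operation fails.

Example: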
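The rules above can be illustrated with a small sketch. The array names and values here are made up for illustration and are not taken from the original post.

```python
import numpy as np

a = np.arange(3).reshape(3, 1)    # shape (3, 1)
b = np.arange(2).reshape(1, 2)    # shape (1, 2)
c = a + b                         # sizes (3 vs 1) and (1 vs 2) -> result shape (3, 2)
print(c.shape)
print(c)

# A missing leading dimension is treated like a dimension of size 1.
m = np.ones((4, 3))               # shape (4, 3)
v = np.array([10.0, 20.0, 30.0])  # shape (3,), aligned on the right with (4, 3)
print((m + v).shape)

# Incompatible shapes: the trailing sizes 3 and 2 differ and neither is 1.
try:
    np.ones((4, 3)) + np.ones((2,))
except ValueError as err:
    print("cannot broadcast:", err)
```

In the first case each array is stretched along the dimension where its size is 1, so no explicit copying or looping is needed.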

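Stochastic gradient descent

Goal: find a set of parameters that minimizes the total loss over all training samples.

The discussion below considers only how the loss depends on b.

When b stops moving downhill, the derivative of the loss with respect to b is 0, i.e. b has reached a minimum point.

The loss function is defined for a single sample. Averaging the loss over all samples gives a single value, and that function is called the cost function (the function g below).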
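To make the distinction between loss and cost concrete, here is a minimal sketch that assumes the squared loss used later in these notes. The targets and predictions are hypothetical values chosen only for illustration.

```python
import numpy as np

def squared_loss(y_hat, y):
    """Per-sample loss: one value for each example."""
    return (y_hat - y) ** 2 / 2

def cost(y_hat, y):
    """Cost: the per-sample losses averaged over all examples."""
    return squared_loss(y_hat, y).mean()

y = np.array([1.0, 2.0, 3.0])      # hypothetical targets
y_hat = np.array([1.0, 2.5, 2.0])  # hypothetical predictions

per_sample = squared_loss(y_hat, y)  # array with one loss per sample
total = cost(y_hat, y)               # single scalar
print(per_sample, total)
```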

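Learning rate: the step size of each parameter update (a poorly chosen value hurts convergence; for example, too large a value makes the loss oscillate back and forth).

To reduce the computation per update and the memory overhead, each update uses a small random subset of the samples; this is minibatch stochastic gradient descent.

(To avoid oscillating around the lowest point and to make the descent smoother, the previous update direction is kept and combined with the current one; this is stochastic gradient descent with momentum.)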
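The momentum idea can be sketched on a toy problem. This is only an illustration under stated assumptions: the loss f(b) = (b - 3)^2, the learning rate, and the momentum coefficient are all made up here, and the exact gradient is used instead of a minibatch estimate.

```python
def grad(b):
    """Gradient of the toy loss f(b) = (b - 3)^2, whose minimum is at b = 3."""
    return 2.0 * (b - 3.0)

lr = 0.1        # learning rate (assumed value)
beta = 0.9      # momentum coefficient (assumed value)
b = 0.0         # initial parameter
velocity = 0.0  # running combination of past gradients

for step in range(300):
    # Keep a fraction of the previous direction and add the new gradient.
    velocity = beta * velocity + grad(b)
    b -= lr * velocity

print(f"b after 300 steps: {b:.6f}")
```

Setting beta to 0 recovers plain gradient descent, which makes it easy to compare the two behaviors on the same problem.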

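How can the learning rate be adjusted automatically?

Use a larger learning rate at the start of training to help converge quickly, then reduce it as gradient descent proceeds to prevent oscillation.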
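One simple way to get this behavior is a decay schedule. The sketch below uses inverse-time decay, which is only one possible choice; the notes do not say which schedule is meant, and the function name and constants are assumptions for illustration.

```python
def inverse_time_decay(lr0, decay, step):
    """Return the learning rate at a given step: large at first, smaller later."""
    return lr0 / (1.0 + decay * step)

# Hypothetical settings: initial rate 0.1, decay factor 0.5
schedule = [inverse_time_decay(0.1, 0.5, t) for t in range(5)]
for t, lr in enumerate(schedule):
    print(f"step {t}: lr = {lr:.4f}")
```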

Vectorization for speed

Omitted (replacing for loops with vector operations is more efficient).
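A rough sketch of the comparison, assuming NumPy arrays and an arbitrary array length. The timings depend on the machine, so none are quoted here.

```python
import time
import numpy as np

n = 100_000
a = np.ones(n)
b = np.ones(n)

# Element-by-element addition with a Python for loop
start = time.perf_counter()
c_loop = np.zeros(n)
for i in range(n):
    c_loop[i] = a[i] + b[i]
loop_seconds = time.perf_counter() - start

# The same addition as a single vectorized operation
start = time.perf_counter()
c_vec = a + b
vec_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.5f} s, vectorized: {vec_seconds:.5f} s")
```

Both versions produce the same result; the vectorized one hands the loop to optimized library code instead of the Python interpreter.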

The normal distribution and squared loss

Visualizing the normal distribution


import math
import numpy as np
import matplotlib.pyplot as plt  # stands in for d2l.plot, assumed to be a matplotlib wrapper

# Replacement for d2l.plot, built directly on matplotlib.pyplot
def plot_normal(x, ys, xlabel, ylabel, figsize, legend):
    plt.figure(figsize=figsize)
    for y, l in zip(ys, legend):
        plt.plot(x, y, label=l)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.legend()
    plt.show()

# Probability density of the normal distribution
def normal(x, mu, sigma):
    p = 1 / (math.sqrt(2 * math.pi) * sigma)
    return p * np.exp(-0.5 * ((x - mu) / sigma)**2)

# Array of x values
x = np.arange(-7, 7, 0.01)

# (mean, standard deviation) pairs
params = [(0, 1), (0, 2), (3, 1)]

# Evaluate normal() for each pair and plot the curves with plot_normal
ys = [normal(x, mu, sigma) for mu, sigma in params]
plot_normal(x, ys, xlabel='x', ylabel='p(x)', figsize=(4.5, 2.5),
            legend=[f'mean {mu}, std {sigma}' for mu, sigma in params])

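Changing the mean shifts the distribution along the x axis; increasing the variance spreads the distribution out and lowers its peak.

The following uses maximum likelihood estimation to show why least squares is a justified choice:

Assume the observations contain noise and that the noise is normally distributed, as in the following model:

$y = \mathbf{w}^\top \mathbf{x} + b + \epsilon$,

where $\epsilon \sim \mathcal{N}(0, \sigma^2)$.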
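The argument can be written out as a short derivation. This sketch assumes the linear model $y = \mathbf{w}^\top \mathbf{x} + b + \epsilon$ with the Gaussian noise stated above; the sample count $n$ and the superscript $(i)$ indexing samples are notation introduced here for illustration.

```latex
% Likelihood of one observation under Gaussian noise
P(y \mid \mathbf{x}) = \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left(-\frac{1}{2\sigma^2}\,(y - \mathbf{w}^\top \mathbf{x} - b)^2\right)

% Likelihood of the whole dataset, assuming independent samples
P(\mathbf{y} \mid \mathbf{X}) = \prod_{i=1}^{n} P\big(y^{(i)} \mid \mathbf{x}^{(i)}\big)

% Negative log-likelihood
-\log P(\mathbf{y} \mid \mathbf{X}) = \sum_{i=1}^{n}
  \left[ \frac{1}{2}\log(2\pi\sigma^2)
  + \frac{1}{2\sigma^2}\big(y^{(i)} - \mathbf{w}^\top \mathbf{x}^{(i)} - b\big)^2 \right]
```

For a fixed $\sigma$ the first term does not depend on $\mathbf{w}$ or $b$, and the second term is the squared error scaled by the constant $1/\sigma^2$. Maximizing the likelihood is therefore equivalent to minimizing the squared loss.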

Linear regression implemented from scratch

Generating a synthetic dataset

Our synthetic dataset is a matrix $\mathbf{X} \in \mathbb{R}^{1000 \times 2}$ (1000 is the number of samples, 2 is the number of features).

import random
import torch
import matplotlib.pyplot as plt

def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)

print('features:', features[0], '\nlabel:', labels[0])

# Set the figure size directly through matplotlib's rcParams
plt.rcParams['figure.figsize'] = (6.0, 4.0)

# Scatter plot of the second feature against the labels
plt.scatter(features[:, 1].detach().numpy(), labels.detach().numpy(), s=10)  # marker size 10
plt.xlabel('Feature')  # x-axis label
plt.ylabel('Label')    # y-axis label
plt.title('Synthetic Data Scatter Plot')  # figure title
plt.show()  # display the figure

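Notes:

1. torch.normal(mean, std, size) is a PyTorch function that draws random numbers from a normal (Gaussian) distribution with the given mean and standard deviation (std). It also accepts keyword arguments such as requires_grad.
X = torch.normal(0, 1, (num_examples, len(w)))

In this example, mean=0 and std=1 mean that the values are drawn from the standard normal distribution (mean 0, standard deviation 1).
The third argument, (num_examples, len(w)), is the size, i.e. the shape of the generated tensor. Here num_examples is the number of samples and len(w) is the number of features per sample (the length of the weight vector w). The resulting tensor X is therefore two-dimensional, with num_examples rows and len(w) columns.

2. In the arguments to reshape, -1 has a special meaning: it tells PyTorch (as in NumPy) to infer the size of that dimension so that the total number of elements stays the same. This is useful when you know the sizes of the other dimensions but not this one.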
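A small sketch of how -1 is resolved, using a made-up tensor of six elements:

```python
import torch

y = torch.arange(6)        # shape (6,)
col = y.reshape((-1, 1))   # -1 is inferred as 6, giving shape (6, 1)
grid = y.reshape((2, -1))  # -1 is inferred as 3, giving shape (2, 3)
print(col.shape, grid.shape)
```

This is the same pattern as y.reshape((-1, 1)) in synthetic_data, which turns the label vector into a column.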

Resulting plot:

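Reading the dataset

def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    # The samples are read in random order
    random.shuffle(indices)
    # The loop runs from 0 to num_examples in steps of batch_size.
    # The last batch may be shorter when num_examples is not a multiple of batch_size,
    # so min(i + batch_size, num_examples) caps the slice at the number of samples.
    for i in range(0, num_examples, batch_size):
        # Build the index tensor for the current batch, then use it to select
        # the matching rows from features and labels.
        batch_indices = torch.tensor(
            indices[i: min(i + batch_size, num_examples)])
        # Each iteration of the for loop yields the next batch of features and labels
        # until all the data has been visited.
        yield features[batch_indices], labels[batch_indices]
# Number of samples per batch
batch_size = 10

for X, y in data_iter(batch_size, features, labels):
    print(X, '\n', y)
    # Print only the first batch of features and labels
    break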

Initializing model parameters, defining the model, defining the loss function, defining the optimization algorithm, and training


import random
import torch
# Repeated from above ------------------------------------------------------------------------------
def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))
# Repeated from above
true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)

def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    # The samples are read in random order
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        batch_indices = torch.tensor(
            indices[i:min(i + batch_size, num_examples)], dtype=torch.long)
        yield features[batch_indices], labels[batch_indices]
# --------------------------------------------------------------------------------------------------



# Initialize the model parameters
w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
# Define the model
def linreg(X, w, b):
    """Linear regression model."""
    return torch.matmul(X, w) + b

def squared_loss(y_hat, y):
    """Squared loss."""
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

def sgd(params, lr, batch_size):
    """Minibatch stochastic gradient descent."""
    # torch.no_grad() is a context manager that temporarily disables gradient tracking.
    # Tensors computed inside it record no gradient information, which saves memory
    # and speeds up the computation.
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            # Reset the gradient to zero after updating the parameter
            param.grad.zero_()

lr = 0.03
num_epochs = 3

for epoch in range(num_epochs):
    for X, y in data_iter(batch_size=10, features=features, labels=labels):
        l = squared_loss(linreg(X, w, b), y)
        l.sum().backward()
        sgd([w, b], lr, len(X))  # pass the size of the current batch, len(X), as batch_size
    with torch.no_grad():
        train_l = squared_loss(linreg(features, w, b), labels)
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')

print(f'error in estimating w: {true_w - w.reshape(true_w.shape)}')
print(f'error in estimating b: {true_b - b.item()}')