Implementing Linear Regression from Scratch in Python

yhl1001

Last modified 2023-07-03 21:51:34

Tags: python, linear regression, machine learning

First published 2023-07-02 23:40:28

Article link: https://blog.csdn.net/yhl1001/article/details/131506975

I. Linear Regression in Brief

$y = w_{1}x_{1} + w_{2}x_{2} +\cdots + w_{n}x_{n} + b$
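
As a quick worked example of the formula, the sketch below evaluates the model for a single sample in plain Python, including the bias b that the code in this post adds to the weighted sum. The helper name `predict` and the sample values are illustrative assumptions; the weights and bias match the ones used for the synthetic data later on.

```python
def predict(x, w, b):
    """Return w_1*x_1 + ... + w_n*x_n + b for one sample."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

w = [2.0, -3.4]   # weights
b = 4.2           # bias
x = [1.0, 2.0]    # one sample with two features (made-up values)

y = predict(x, w, b)
print(round(y, 6))  # 2*1 + (-3.4)*2 + 4.2
```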

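II. Implementation Outline

1: If a real dataset is available, load it; otherwise generate a synthetic dataset

2: Build the neural network

3: Define the loss function

4: Build the optimizer

5: Train the model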

III. Code Implementation

1. Import the required packages

import torch
import random

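2. Build the synthetic dataset

Generate the synthetic dataset y = Xw + b + e, where the noise e is drawn from a normal distribution with mean 0 and standard deviation 0.001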

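def synthetic_data(w, b, examples_num):
    # Generate a synthetic dataset y = Xw + b + e
    # The noise e is normal with mean 0 and standard deviation 0.001
    x = torch.normal(0, 1, (examples_num, len(w)))       # features drawn from N(0, 1), shape (examples_num, len(w))
    y = torch.matmul(x, w) + b                           # noise-free y = Xw + b, shape (examples_num,)
    e = torch.normal(0, 0.001, y.shape)                  # random noise e
    y = y + e                                            # final y
    return x, y.reshape(-1, 1)                           # return x and y, with y reshaped to a column of shape (examples_num, 1)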

true_w = torch.tensor([2, -3.4])                         # true weights
true_b = torch.tensor([4.2])                             # true bias
examples_num = 1000
features, labels = synthetic_data(true_w, true_b, examples_num)
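
A quick sanity check on the generated data can catch shape mistakes early. The sketch below repeats the generator so that it runs on its own, prints the shapes, and measures how far the labels sit from the noise-free line. The fixed seed and the name `true_b` for the true bias are assumptions made for this illustration.

```python
import torch

def synthetic_data(w, b, examples_num):
    # Same generator as in the post: y = Xw + b + e
    x = torch.normal(0, 1, (examples_num, len(w)))
    y = torch.matmul(x, w) + b
    y = y + torch.normal(0, 0.001, y.shape)
    return x, y.reshape(-1, 1)

torch.manual_seed(0)  # assumption: fixed seed so the check is repeatable
true_w = torch.tensor([2, -3.4])
true_b = torch.tensor([4.2])
features, labels = synthetic_data(true_w, true_b, 1000)

# Shapes of the feature matrix and the label column
print(features.shape, labels.shape)

# Residual between the labels and the noise-free line; it should be on the order of 0.001
residual = labels - (torch.matmul(features, true_w) + true_b).reshape(-1, 1)
print(float(residual.abs().max()))
```

The features should have shape (1000, 2) and the labels shape (1000, 1), which is the layout the rest of the code expects.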

3. Build the mini-batches used for each SGD step

def data_iter(batch_size, features, labels):
    # Generator that yields one mini-batch at a time for the gradient updates
    examples_num = len(labels)                    # the number of samples equals the number of labels
    indices = list(range(examples_num))           # list of sample indices
    random.shuffle(indices)                       # shuffle so that samples are drawn in random order
    for i in range(0, examples_num, batch_size):      # step through the shuffled indices one batch at a time
        batch_indices = indices[i:i+batch_size]   # positions i through i+batch_size-1 (0-based); the last batch may be shorter
        yield features[batch_indices], labels[batch_indices]    # features and labels of the current batch
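
To see how the slicing behaves when the sample count is not a multiple of the batch size, here is a minimal stdlib-only sketch of the index logic alone. The helper `batch_indices` and the small sizes are hypothetical and chosen only for illustration.

```python
import random

def batch_indices(examples_num, batch_size):
    """Yield shuffled index lists, batch_size at a time (hypothetical helper)."""
    indices = list(range(examples_num))
    random.shuffle(indices)
    for i in range(0, examples_num, batch_size):
        yield indices[i:i + batch_size]

random.seed(0)  # assumption: fixed seed for a repeatable shuffle
batches = list(batch_indices(examples_num=10, batch_size=4))
sizes = [len(batch) for batch in batches]
print(sizes)  # [4, 4, 2]: the last batch holds the leftover samples
```

Every index appears exactly once per pass. With 1000 samples and a batch size of 10, as used in this post, every batch is full.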

4. Define the neural network (that is, the linear regression model)

def linreg(x, w, b):                   # linear regression model
    return torch.matmul(x, w) + b          # output of the linear model

5. Define the loss function

def squard_loss(y_hat, y):                                        # squared loss
    return (y_hat-y.reshape(y_hat.shape))**2/2                    # per-sample loss; not averaged here because the batch size is not known inside this function
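
For reference, the per-sample loss and the gradients that autograd computes from it can be written out by hand. This is a short derivation for a single sample with prediction $\hat{y} = \mathbf{w}^{\top}\mathbf{x} + b$; the vector notation is introduced here for convenience.

$$
\ell(\hat{y}, y) = \frac{1}{2}\,(\hat{y} - y)^{2},
\qquad
\frac{\partial \ell}{\partial \mathbf{w}} = (\hat{y} - y)\,\mathbf{x},
\qquad
\frac{\partial \ell}{\partial b} = \hat{y} - y
$$

The factor 1/2 in the loss cancels the 2 that comes from differentiating the square, which keeps the gradient expressions free of constants.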

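6. Define the optimizer

def sgd(params, lr, batch_size):
    with torch.no_grad():                                       # disable gradient tracking; this is required, because an in-place update of a leaf tensor that requires grad raises an error otherwise
        for param in params:                                    # update w and b in turn
            param -= lr * param.grad / batch_size               # gradient descent step, averaged over the batch
            param.grad.zero_()                                   # reset the gradient so it does not accumulate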
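
One update can be checked by hand on a toy problem. The sketch below repeats `sgd` so that it runs on its own and applies a single step to one weight with two samples. All numbers are made up for illustration.

```python
import torch

def sgd(params, lr, batch_size):
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()

# Toy problem (made-up numbers): one weight, two samples, no bias
w = torch.tensor([1.0], requires_grad=True)
x = torch.tensor([1.0, 2.0])
y = torch.tensor([3.0, 6.0])        # generated by y = 3x

loss = ((w * x - y) ** 2 / 2).sum()
loss.backward()
grad = w.grad.clone()               # copy the gradient before sgd() zeroes it
sgd([w], lr=0.1, batch_size=2)

print(grad)   # sum of (w*x - y)*x = (-2)(1) + (-4)(2) = -10
print(w)      # 1.0 - 0.1 * (-10) / 2 = 1.5
```

The weight moves from 1.0 toward the true value 3.0, and the stored gradient is zero again after the step, so the next `backward()` call starts from a clean slate.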

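7. Train the model

lr = 0.03
epochs = 3
net = linreg
loss = squard_loss
batch_size = 10
w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)       # trainable weights (small random initial values are one reasonable choice)
b = torch.zeros(1, requires_grad=True)                           # trainable bias, initialized to 0

for epoch in range(epochs):                                       # make epochs passes over the data
    for x, y in data_iter(batch_size, features, labels):
        y_hat = net(x, w, b)                                      # forward pass
        all_loss = loss(y_hat, y)                                 # per-sample loss, arguments in (y_hat, y) order
        all_loss.sum().backward()                                 # sum over the batch and backpropagate
        sgd((w, b), lr, batch_size)                               # update the parameters

    with torch.no_grad():
        train_l  = loss(net(features, w, b), labels)              # loss over the whole dataset
        print(f'epoch{epoch + 1},loss {float(train_l.mean()):f}') # report the mean loss after this epoch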
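
Because the data is synthetic, the learned parameters can be compared directly with the ones used to generate it. The following is a compact, self-contained sketch of the whole pipeline with that comparison at the end. The fixed seeds, the parameter initialization, and the names `true_b`, `w_error` and `b_error` are assumptions made for this illustration.

```python
import random
import torch

torch.manual_seed(0)   # assumption: seeds fixed only to make the run repeatable
random.seed(0)

# Synthetic data, generated as in step 2
true_w, true_b = torch.tensor([2, -3.4]), torch.tensor([4.2])
features = torch.normal(0, 1, (1000, 2))
labels = (torch.matmul(features, true_w) + true_b).reshape(-1, 1)
labels = labels + torch.normal(0, 0.001, labels.shape)

# Trainable parameters (the initial values are an assumption)
w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

lr, epochs, batch_size = 0.03, 3, 10
for epoch in range(epochs):
    indices = list(range(len(labels)))
    random.shuffle(indices)
    for i in range(0, len(indices), batch_size):
        batch = indices[i:i + batch_size]
        x, y = features[batch], labels[batch]
        # Squared loss summed over the batch
        batch_loss = ((torch.matmul(x, w) + b - y) ** 2 / 2).sum()
        batch_loss.backward()
        with torch.no_grad():
            for param in (w, b):
                param -= lr * param.grad / batch_size
                param.grad.zero_()

# Difference between the true and the learned parameters
w_error = true_w - w.detach().reshape(true_w.shape)
b_error = true_b - b.detach()
print(f'error in w: {w_error}')
print(f'error in b: {b_error}')
```

If training has converged, both errors should be small compared with the parameter values themselves.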