A Simple Linear Regression Model (Gradient Descent)

路过的阿烨

Last modified: 2024-02-25 23:00:06


Tags: linear regression, algorithm, regression

First published: 2024-02-25 22:50:46

Article link: https://blog.csdn.net/qq_62176884/article/details/136284216


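This article walks through using gradient descent to fit a linear regression model, covering full-batch and mini-batch gradient descent, and how shuffling and a dynamically adjusted learning rate can make training more efficient.

(Summary generated automatically by CSDN.)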

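Most model training in AI today relies on gradient descent. It suits large datasets and high-dimensional feature spaces because each iteration only needs a gradient, and in the mini-batch variant that gradient is computed from a subset of the data rather than the whole dataset. The closed-form (analytic) solution covered in the previous post, by contrast, becomes very time-consuming when the dataset is large.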

I. Full-batch gradient descent

import numpy as np

#np.random.seed(1)
x = np.random.rand(100, 1)
y = 4 + 3 * x + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), x]  

learning_rate = 0.001
n_iterations = 10000
theta = np.random.randn(2, 1)  

for _ in range(n_iterations):
    gradients = X_b.T.dot(X_b.dot(theta) - y) 
    theta = theta - learning_rate * gradients

print(theta)

1. Import the library

import numpy as np

numpy is used for the numerical computation.

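2. Generate the data

#np.random.seed(1)
x = np.random.rand(100, 1)
y = 4 + 3 * x + np.random.randn(100, 1)

np.random.seed(1): sets the random seed. It is commented out here; uncomment it while testing to generate the same data on every run.

x = np.random.rand(100, 1): generates 100 random numbers drawn uniformly from [0, 1) to form x.

y = 4 + 3 * x + np.random.randn(100, 1): builds y from x as the line 4 + 3x plus standard normal noise.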
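To see what the seed does, here is a minimal sketch. The helper name make_data is hypothetical and not part of the original post; it only wraps the two data-generation lines above.

```python
import numpy as np

def make_data(seed):
    # Hypothetical helper (not in the original post): seeds NumPy's global
    # generator, then draws x and y the same way the post does.
    np.random.seed(seed)
    x = np.random.rand(100, 1)                 # uniform on [0, 1)
    y = 4 + 3 * x + np.random.randn(100, 1)    # line plus standard normal noise
    return x, y

x1, y1 = make_data(1)
x2, y2 = make_data(1)
same = np.array_equal(x1, x2) and np.array_equal(y1, y2)
print(same)  # True: the same seed reproduces the same data
```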

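3. Build the design matrix

X_b = np.c_[np.ones((100, 1)), x]

Prepend a column of ones to the independent variable x so that the intercept term is handled by the same matrix product as the slope.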
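A small sketch of why the column of ones works, using three made-up inputs and the true parameters 4 and 3 (both chosen here only for illustration): the matrix product X_b.dot(theta) gives the same result as writing out intercept + slope * x.

```python
import numpy as np

x = np.array([[0.0], [0.5], [1.0]])           # three sample inputs (illustrative values)
X_b = np.c_[np.ones((3, 1)), x]               # prepend the column of ones
theta = np.array([[4.0], [3.0]])              # intercept 4, slope 3

pred_matrix = X_b.dot(theta)                  # one matrix product
pred_manual = 4.0 + 3.0 * x                   # intercept + slope * x, written out
print(X_b.shape)                              # (3, 2)
print(np.allclose(pred_matrix, pred_manual))  # True
```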

4. Set the hyperparameters

learning_rate = 0.001
n_iterations = 10000

learning_rate = 0.001: the learning rate, i.e. the step size of each update.

n_iterations = 10000: the number of iterations.

5. Initialize the parameters

theta = np.random.randn(2, 1)

Randomly initialize the parameter vector theta. It holds the intercept term and the slope term.

6. Gradient descent

for _ in range(n_iterations):
    gradients = X_b.T.dot(X_b.dot(theta) - y) 
    theta = theta - learning_rate * gradients

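The detailed derivation of the gradient formula is not covered here.

The loop runs n_iterations times.

gradients = X_b.T.dot(X_b.dot(theta) - y): computes the gradient. X_b.dot(theta) is the vector of predictions.

X_b.dot(theta) - y is the error between the predictions and the true values.

That error is a column vector of shape (100, 1). X_b.T, the transposed design matrix, has shape (2, 100), so multiplying it by the error gives a gradient vector of shape (2, 1). This is the gradient of half the sum of squared errors; it is not divided by the number of samples, so the factor of 100 is effectively folded into the learning rate.

theta = theta - learning_rate * gradients: moves the current parameter vector theta a small step in the direction of the negative gradient to update the parameters.

The loop then repeats with the updated theta.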
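For readers who want the skipped derivation, here is a short version. It assumes the loss is half the sum of squared errors, which the post does not state but which is the loss whose gradient matches the code exactly. Here x_i^T denotes row i of X_b, m = 100 is the number of samples, and η stands for learning_rate.

```latex
\begin{aligned}
J(\theta) &= \tfrac{1}{2}\,\lVert X_b\theta - y \rVert^2
           = \tfrac{1}{2}\sum_{i=1}^{m}\bigl(x_i^{\top}\theta - y_i\bigr)^2 \\
\nabla_\theta J(\theta) &= \sum_{i=1}^{m}\bigl(x_i^{\top}\theta - y_i\bigr)\,x_i
           = X_b^{\top}\,(X_b\theta - y) \\
\theta &\leftarrow \theta - \eta\,\nabla_\theta J(\theta)
\end{aligned}
```

With the mean squared error instead, the gradient would pick up a factor of 2/m, which only rescales the learning rate.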

7. Print the result
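One way to sanity-check the printed theta is to compare it with a least-squares fit of the same data. The sketch below is not from the original post: it fixes the seed so the run is repeatable and uses np.linalg.lstsq only as a reference answer.

```python
import numpy as np

np.random.seed(1)                               # fixed seed so the run is repeatable
x = np.random.rand(100, 1)
y = 4 + 3 * x + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), x]

theta = np.random.randn(2, 1)
for _ in range(10000):                          # same update loop as in the post
    gradients = X_b.T.dot(X_b.dot(theta) - y)
    theta = theta - 0.001 * gradients

# Closed-form least-squares fit, used here only as a reference answer.
theta_ls, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print(theta.ravel())                            # gradient descent result
print(theta_ls.ravel())                         # least-squares result
print(np.allclose(theta, theta_ls, atol=1e-6))  # True once the loop has converged
```

Both estimates land near the true values 4 and 3 only up to the noise in y, so expect them to agree with each other more closely than with the true line.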

II. Mini-batch gradient descent


import numpy as np

X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X] 

learning_rate = 0.001
n_epochs = 1000  
m = 100
batch_size = 10
num_batches = int(m / batch_size)
theta = np.random.randn(2, 1)

for epoch in range(n_epochs):
    for _ in range(num_batches):
        random_index = np.random.randint(m)
        x_batch = X_b[random_index: random_index + batch_size]
        y_batch = y[random_index: random_index + batch_size]
        gradients = x_batch.T.dot(x_batch.dot(theta) - y_batch)
        theta = theta - learning_rate * gradients

print(theta)

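1. Import the library

2. Generate the data

3. Build the design matrix

These steps are essentially the same as before and are skipped. The only difference is that X is now drawn from [0, 2).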

4. Set the hyperparameters

learning_rate = 0.001
n_epochs = 1000  
m = 100
batch_size = 10
num_batches = int(m / batch_size)
theta = np.random.randn(2, 1)

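Note two settings here:

batch_size = 10: the number of samples in each batch.

num_batches = int(m / batch_size): the number of batches the dataset is split into, here 100 / 10 = 10.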
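A quick sketch of the batch count. The uneven size 105 is a hypothetical value, not from the post, used only to show what happens when batch_size does not divide m.

```python
import math

m, batch_size = 100, 10
count_post = int(m / batch_size)       # the expression used in the post
count_floor = m // batch_size          # integer division gives the same count
print(count_post, count_floor)         # 10 10

m_uneven = 105                         # hypothetical size that batch_size does not divide
full_batches = m_uneven // batch_size              # 10 full batches, 5 samples left over
all_batches = math.ceil(m_uneven / batch_size)     # 11 if the short last batch should count
print(full_batches, all_batches)       # 10 11
```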

5. Initialize the parameters

6. Iterative training

for epoch in range(n_epochs):
    for _ in range(num_batches):
        random_index = np.random.randint(m)
        x_batch = X_b[random_index: random_index + batch_size]
        y_batch = y[random_index: random_index + batch_size]
        gradients = x_batch.T.dot(x_batch.dot(theta) - y_batch)
        theta = theta - learning_rate * gradients

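There are two nested loops.

The outer loop controls the number of epochs, i.e. how many rounds of training are run.

The inner loop performs num_batches updates per epoch, one per batch.

For each batch, a random index random_index is drawn and the slice of up to batch_size samples starting there is taken from the dataset. A start index within the last batch_size - 1 rows yields a shorter batch, because the slice stops at the end of the array.

The gradient gradients for the current batch is computed, and theta is updated by the gradient descent rule theta = theta - learning_rate * gradients.

Compared with full-batch gradient descent, each update uses only part of the samples, so it is better suited to large datasets.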
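The random-start slicing has two side effects worth seeing in numbers. The sketch below uses a plain index array as a stand-in for the rows of X_b; it is an illustration, not part of the original code.

```python
import numpy as np

m, batch_size = 100, 10
data = np.arange(m)                        # stand-in for the rows of X_b

# A start index near the end yields a short batch, because slicing stops at m.
print(len(data[95:95 + batch_size]))       # 5
print(len(data[99:99 + batch_size]))       # 1

# Row 0 is only covered when the start index is exactly 0, while a middle row
# such as 50 is covered by any start index from 41 to 50.
starts = np.arange(m)                      # every value np.random.randint(m) can return
covers_row_0 = int(np.sum((starts <= 0) & (0 < starts + batch_size)))
covers_row_50 = int(np.sum((starts <= 50) & (50 < starts + batch_size)))
print(covers_row_0, covers_row_50)         # 1 10
```

So rows near the start of the array are sampled far less often than rows in the middle, and some batches are much smaller than batch_size.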

7. Optimization

for epoch in range(n_epochs):
    arr = np.arange(len(X_b))  
    np.random.shuffle(arr)
    X_b = X_b[arr]  
    y = y[arr]  
    for i in range(num_batches):
        x_batch = X_b[i * batch_size: (i + 1) * batch_size] 
        y_batch = y[i * batch_size: (i + 1) * batch_size]  
        gradients = x_batch.T.dot(x_batch.dot(theta) - y_batch)
        theta = theta - learning_rate * gradients

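Compared with the first mini-batch loop, this version keeps the randomness but removes the case where some samples are repeatedly never picked: every sample is used exactly once per epoch.

arr = np.arange(len(X_b)) creates an array of all row indices, and np.random.shuffle(arr) shuffles it in place.

X_b and y are then reordered with the shuffled indices and sliced in order into batches of batch_size samples each.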
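A variant of the same idea, sketched under the assumption that leaving X_b and y in their original order is preferable: np.random.permutation returns shuffled indices, and each batch is picked by indexing with a slice of them. The list seen exists only to show that every row is visited once.

```python
import numpy as np

m, batch_size = 100, 10
num_batches = m // batch_size

perm = np.random.permutation(m)            # shuffled copy of 0..m-1
seen = []
for i in range(num_batches):
    idx = perm[i * batch_size:(i + 1) * batch_size]   # row indices of batch i
    seen.extend(idx.tolist())
    # In training, X_b[idx] and y[idx] would be the batch; the arrays stay unshuffled.

print(len(seen), len(set(seen)))           # 100 100: every row appears exactly once
```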

8. Adjust the learning rate dynamically as training proceeds

import numpy as np

X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]

t0, t1 = 5, 500

def learning_rate_schedule(t):
    return t0 / (t + t1)  

learning_rate = 0.001
n_epochs = 1000
m = 100
batch_size = 10
num_batches = int(m / batch_size)
theta = np.random.randn(2, 1)

for epoch in range(n_epochs):
    arr = np.arange(len(X_b))  
    np.random.shuffle(arr)
    X_b = X_b[arr]
    y = y[arr]
    for i in range(num_batches):
        x_batch = X_b[i * batch_size: (i + 1) * batch_size]
        y_batch = y[i * batch_size: (i + 1) * batch_size]
        gradients = x_batch.T.dot(x_batch.dot(theta) - y_batch)
        learning_rate = learning_rate_schedule(epoch * m + i)  
        theta = theta - learning_rate * gradients

print(theta)

def learning_rate_schedule(t):
    return t0 / (t + t1)

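The schedule t0 / (t + t1) starts at t0 / t1 = 0.01 and shrinks as the step count t grows. A larger learning rate early in training helps the model converge faster, while the gradually smaller rate later lets the parameters settle. Note that the fixed learning_rate = 0.001 set before the loop is overwritten by the schedule on the first update.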
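To get a feel for how fast the rate decays, the sketch below evaluates the schedule at the first and last step indices that the training loop passes in (t = epoch * m + i), plus one point in between. The variable names first_step and last_step are introduced here for illustration.

```python
t0, t1 = 5, 500

def learning_rate_schedule(t):
    return t0 / (t + t1)

m, n_epochs, num_batches = 100, 1000, 10
first_step = 0 * m + 0                                # epoch 0, batch 0
last_step = (n_epochs - 1) * m + (num_batches - 1)    # epoch 999, batch 9

print(learning_rate_schedule(first_step))   # 0.01
print(learning_rate_schedule(500))          # 0.005, half the initial rate
print(learning_rate_schedule(last_step))    # about 4.98e-05
```

Larger t1 makes the decay slower, and t0 scales the whole curve.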
