Gradient descent is an iterative algorithm for solving unconstrained optimization problems and is widely used in machine learning and artificial intelligence. Its basic idea is to update the variables along the direction opposite to the gradient of the function, so as to gradually approach a minimum point. Below we derive the gradient descent formulas in detail.
1. Problem Definition
Suppose we have a differentiable convex function $f:\mathbb{R}^n\to\mathbb{R}$. We want to find a point $x^*\in\mathbb{R}^n$ such that $f(x^*)$ is the minimum value of $f$.
2. Definition of the Gradient
The gradient of $f$ at a point $x$ is a vector, denoted $\nabla f(x)$, whose $i$-th component is the partial derivative of $f$ with respect to $x_i$:

$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)^T$$
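For example, for the simple quadratic $f(x) = x_1^2 + 2x_2^2$ the gradient is

$$\nabla f(x) = (2x_1,\ 4x_2)^T,$$

so at $x = (1, 1)^T$ we have $\nabla f(x) = (2, 4)^T$, and the direction of steepest descent is $-(2, 4)^T$.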
3. The Basic Idea of Gradient Descent
The basic idea of gradient descent is: at each iteration, move from the current point $x$ one step in the direction opposite to the gradient, in the hope of decreasing the function value. That is:

$$x_{k+1} = x_k - \alpha_k \nabla f(x_k)$$

where $\alpha_k > 0$ is the step size (also called the learning rate) at step $k$; it controls how far we move along the gradient direction.
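A minimal numerical sketch of this update rule, reusing the quadratic $f(x) = x_1^2 + 2x_2^2$ from the gradient example above with an illustrative fixed step size (both the function and the step size are assumptions for demonstration, not part of the derivation):

import numpy as np

def grad_f(x):
    # Gradient of f(x) = x1^2 + 2*x2^2
    return np.array([2 * x[0], 4 * x[1]])

x_k = np.array([1.0, 1.0])            # current point x_k
alpha = 0.1                           # fixed step size (learning rate)
x_next = x_k - alpha * grad_f(x_k)    # x_{k+1} = x_k - alpha * grad f(x_k)
print(x_next)                         # [0.8 0.6]; f drops from 3.0 to 1.36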
4. Choosing the Step Size
The choice of the step size $\alpha_k$ has a major impact on the convergence speed and effectiveness of the algorithm. Common choices include:
- Fixed step size: use a single fixed step size $\alpha$ throughout the iterations.
- Line search: at each iteration, determine the optimal step size $\alpha_k$ by line search, i.e. choose $\alpha_k$ so that $f(x_k - \alpha_k \nabla f(x_k))$ is minimized.
- Adaptive step size: adjust the step size dynamically according to some rule (such as the Armijo rule); a backtracking sketch follows this list.
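As an illustration of the adaptive case, here is a minimal sketch of backtracking line search with the Armijo condition. The function name backtracking_step, the shrink factor rho = 0.5, and the constant c = 1e-4 are illustrative assumptions (these are common textbook defaults, not values from this article):

import numpy as np

def backtracking_step(f, grad_f, x, alpha0=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until the Armijo sufficient-decrease condition holds."""
    g = grad_f(x)
    alpha = alpha0
    # Armijo condition for the step x - alpha * g:
    #   f(x - alpha * g) <= f(x) - c * alpha * ||g||^2
    while f(x - alpha * g) > f(x) - c * alpha * np.dot(g, g):
        alpha *= rho
    return alpha

# Example on f(x) = x1^2 + 2*x2^2
f = lambda x: x[0] ** 2 + 2 * x[1] ** 2
grad_f = lambda x: np.array([2 * x[0], 4 * x[1]])
x = np.array([1.0, 1.0])
print(backtracking_step(f, grad_f, x))  # step size accepted by the Armijo test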
5. Algorithm Steps
- Initialization: choose an initial point $x_0$ and an initial step size $\alpha_0$.
- Iteration: for $k = 0, 1, 2, \ldots$:
  - Compute the gradient $\nabla f(x_k)$.
  - Choose the step size $\alpha_k$.
  - Update the point: $x_{k+1} = x_k - \alpha_k \nabla f(x_k)$.
- Termination: stop when some termination condition is satisfied (e.g. the norm of the gradient falls below a threshold, or the maximum number of iterations is reached); a sketch of the complete loop follows this list.
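Putting these steps together, a minimal generic sketch with a fixed step size and a gradient-norm stopping rule (the function name, tolerance, and default values are illustrative assumptions):

import numpy as np

def gradient_descent_generic(grad_f, x0, alpha=0.1, tol=1e-6, max_iter=1000):
    """Minimize f by following -grad_f(x) with a fixed step size alpha."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:   # termination: gradient norm small enough
            break
        x = x - alpha * g             # update: x_{k+1} = x_k - alpha * grad f(x_k)
    return x

# Example: minimize f(x) = x1^2 + 2*x2^2, whose minimizer is (0, 0)
x_star = gradient_descent_generic(lambda x: np.array([2 * x[0], 4 * x[1]]),
                                  x0=[1.0, 1.0])
print(x_star)  # close to [0, 0]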
6. Example
- As in the least squares derivation article, we take fitting a straight line $f = wx + b$ as the example; the difference is that here we solve the problem with gradient descent.
- In linear regression, when fitting the parameters $w, b$ from sample data, we need a loss function to measure the error between the prediction $f(x_i)$ and the true value $y_i$; gradient descent then shrinks this error step by step. The loss function is:

$$\mathcal{L} = \frac{1}{2m} \sum_{i=0}^{m-1} (f(x_i) - y_i)^2$$

- Compute the gradient by taking the partial derivatives of the loss function:

$$\begin{aligned} \frac{\partial \mathcal{L}}{\partial w} &= \frac{1}{m} \sum_{i=0}^{m-1} (f(x_i) - y_i)\,x_i \\ \frac{\partial \mathcal{L}}{\partial b} &= \frac{1}{m} \sum_{i=0}^{m-1} (f(x_i) - y_i) \end{aligned}$$

- Update the parameters $w, b$ in the direction opposite to the gradient, using a fixed learning rate $\alpha$:

$$\begin{aligned} w &= w - \alpha \frac{\partial \mathcal{L}}{\partial w} \\ b &= b - \alpha \frac{\partial \mathcal{L}}{\partial b} \end{aligned}$$

- The code is as follows:
import numpy as np
import matplotlib.pyplot as plt

def compute_prediction(x, w, b):
    """Compute the predictions f(x_i) = w * x_i + b for all samples."""
    m = len(x)
    f_wb = np.zeros(m)
    for i in range(m):
        f_wb[i] = w * x[i] + b
    return f_wb

def compute_loss(x, y, w, b):
    """Loss L = (1 / 2m) * sum((f(x_i) - y_i)^2)."""
    m = x.shape[0]
    cost_sum = 0
    f_wb = compute_prediction(x, w, b)
    for i in range(m):
        cost = (f_wb[i] - y[i]) ** 2
        cost_sum += cost
    total = cost_sum / (2 * m)
    return total

def compute_gradient(x, y, w, b):
    """Compute the partial derivatives dL/dw and dL/db."""
    m = x.shape[0]
    dl_dw = 0
    dl_db = 0
    for i in range(m):
        f_wb = w * x[i] + b
        dl_dw_i = (f_wb - y[i]) * x[i]
        dl_db_i = f_wb - y[i]
        dl_dw += dl_dw_i
        dl_db += dl_db_i
    dl_dw = dl_dw / m
    dl_db = dl_db / m
    return dl_dw, dl_db

def gradient_descent(x, y, w, b, alpha, num_iter):
    """Run num_iter gradient descent updates with a fixed learning rate alpha."""
    w_new = w
    b_new = b
    for i in range(num_iter):
        dl_dw, dl_db = compute_gradient(x, y, w_new, b_new)
        w_new = w_new - alpha * dl_dw
        b_new = b_new - alpha * dl_db
        loss = compute_loss(x, y, w_new, b_new)
        if i % 100 == 0:
            print(f"iteration {i}, loss = {loss}, w_new = {w_new}, b_new = {b_new}")
    return w_new, b_new

if __name__ == '__main__':
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
    y = np.array([1.5, 2.6, 3.7, 4.1, 5.1, 6.2, 7.9])
    w = 0
    b = 0
    iterations = 3000
    alpha = 0.02
    w_new, b_new = gradient_descent(x, y, w, b, alpha, iterations)
    print(f"w_new = {w_new}, b_new = {b_new}")
    plt.scatter(x, y, marker='x', c='r', label='samples')
    _y = w_new * x + b_new
    plt.plot(x, _y, c='b', label='fitted line')
    plt.legend()
    plt.show()
- Execution log:
iteration 0, loss = 4.083270897959183, w_new = 0.4348571428571429, b_new = 0.08885714285714286
iteration 100, loss = 0.045664329045427686, w_new = 1.0284820572555182, b_new = 0.2950160966273745
iteration 200, loss = 0.04399190943292208, w_new = 1.0170929790163885, b_new = 0.35141414029427887
iteration 300, loss = 0.04321788689124178, w_new = 1.0093449229474887, b_new = 0.389782056936218
iteration 400, loss = 0.0428596568809325, w_new = 1.0040738754064713, b_new = 0.41588397345192607
iteration 500, loss = 0.04269386230341597, w_new = 1.0004879511523623, b_new = 0.4336412588461576
iteration 600, loss = 0.04261712992158636, w_new = 0.9980484260568011, b_new = 0.44572164322645447
iteration 700, loss = 0.042581616948186136, w_new = 0.9963888029665887, b_new = 0.4539399988779966
iteration 800, loss = 0.042565180976179944, w_new = 0.9952597517399471, b_new = 0.4595309939563808
iteration 900, loss = 0.042557574145494485, w_new = 0.9944916516334235, b_new = 0.46333458037827735
iteration 1000, loss = 0.04255405358260628, w_new = 0.9939691086701637, b_new = 0.4659221823106715
iteration 1100, loss = 0.042552424209801445, w_new = 0.9936136196575556, b_new = 0.4676825429329474
iteration 1200, loss = 0.042551670110010785, w_new = 0.9933717784177523, b_new = 0.4688801264876919
iteration 1300, loss = 0.04255132110056829, w_new = 0.9932072524178053, b_new = 0.4696948495350295
iteration 1400, loss = 0.04255115957341695, w_new = 0.9930953244245458, b_new = 0.4702491103556911
iteration 1500, loss = 0.04255108481606821, w_new = 0.9930191791607023, b_new = 0.4706261772108932
iteration 1600, loss = 0.0425510502171717, w_new = 0.9929673770954908, b_new = 0.47088269798232785
iteration 1700, loss = 0.042551034204249084, w_new = 0.9929321358496834, b_new = 0.471057210552896
iteration 1800, loss = 0.042551026793212446, w_new = 0.9929081610254951, b_new = 0.47117593246814665
iteration 1900, loss = 0.04255102336326615, w_new = 0.9928918508133643, b_new = 0.47125669967657685
iteration 2000, loss = 0.04255102177583213, w_new = 0.992880754881329, b_new = 0.4713116460775329
iteration 2100, loss = 0.04255102104114237, w_new = 0.9928732062543251, b_new = 0.4713490264332082
iteration 2200, loss = 0.04255102070111606, w_new = 0.9928680708795352, b_new = 0.4713744565062632
iteration 2300, loss = 0.04255102054374656, w_new = 0.9928645772542416, b_new = 0.47139175673264894
iteration 2400, loss = 0.0425510204709134, w_new = 0.992862200520665, b_new = 0.4714035261771313
iteration 2500, loss = 0.04255102043720508, w_new = 0.9928605836150096, b_new = 0.47141153299856486
iteration 2600, loss = 0.04255102042160434, w_new = 0.992859483624702, b_new = 0.47141698008568605
iteration 2700, loss = 0.042551020414384015, w_new = 0.9928587352949075, b_new = 0.47142068577068896
iteration 2800, loss = 0.04255102041104237, w_new = 0.9928582262018023, b_new = 0.47142320676971744
iteration 2900, loss = 0.0425510204094958, w_new = 0.9928578798628064, b_new = 0.47142492181999723
w_new = 0.9928576461815167, b_new = 0.4714260789959605
- The resulting plot of the sample points and the fitted line is shown in the figure produced by the script.
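As a side note, the per-sample Python loops above can be expressed with vectorized NumPy operations; a minimal sketch of an equivalent compute_gradient (same formulas, only the implementation style differs) could look like this:

import numpy as np

def compute_gradient_vectorized(x, y, w, b):
    """Vectorized version of compute_gradient: same dL/dw and dL/db."""
    err = (w * x + b) - y        # prediction error f(x_i) - y_i for all samples
    dl_dw = np.mean(err * x)     # (1/m) * sum((f(x_i) - y_i) * x_i)
    dl_db = np.mean(err)         # (1/m) * sum(f(x_i) - y_i)
    return dl_dw, dl_db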
Summary
Gradient descent is a simple yet effective optimization algorithm: by updating the variables in the direction opposite to the gradient, it gradually approaches a minimum point of the function. Its success depends on a suitable choice of step size and on the properties of the function. In practice, gradient descent is widely used to train machine learning models such as linear regression, logistic regression, and neural networks.