Gradient Descent: A Code Implementation

ZN_daydayup

Tags: machine learning

Article link: https://blog.csdn.net/zn961018/article/details/116333668

Abstract

This article implements and verifies the gradient descent algorithm in code.

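How gradient descent works

Gradient descent (GD from here on) is an iterative algorithm that approaches the minimum of a loss function one step at a time. The basic idea can be summarized as follows:

(1) Pick an initial value at random.

(2) Compute the gradient of the loss function.

(3) Choose a step size (the learning rate). It determines how fast and how accurately gradient descent converges.

(4) Subtract the step size times the gradient from the current value to update it, then repeat this process.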
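Step (4) can be written as a single update rule. The symbols are notation chosen here for illustration, not taken from the article: theta is the value being optimized, eta the step size, J the loss function, and k the iteration counter.

```latex
\theta_{k+1} = \theta_k - \eta \, \nabla J(\theta_k), \qquad k = 0, 1, 2, \dots
```

As a worked step, take the quadratic used later in the article, J(theta) = (theta - 2.5)^2 - 1, whose derivative is 2(theta - 2.5). Assuming a start at theta = 0 and eta = 0.1 (both assumed values), one update gives 0 - 0.1 * (-5) = 0.5.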

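A simple implementation and check of gradient descent on a quadratic function

import numpy as np
import matplotlib.pyplot as plt

# Plot the loss curve J(theta) = (theta - 2.5)^2 - 1 on [-1, 6]
plot_x = np.linspace(-1, 6, 141)
plot_y = (plot_x - 2.5)**2 - 1
plt.plot(plot_x, plot_y)
plt.show()

# Derivative of the loss with respect to theta
def dJ(theta):
    return 2*(theta - 2.5)

# Loss function; returns inf if the value overflows (e.g. when theta diverges)
def J(theta):
    try:
        return (theta - 2.5)**2 - 1
    except OverflowError:
        return float('inf')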
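The article does not list the loop that produced the results it reports next, so the following is only a minimal sketch of what such a loop might look like. The start value 0.0, the step size 0.1, the tolerance 1e-8, and the counter `n_updates` are all assumptions; they were picked because they lead to the same first theta values (0.5, 0.9, 1.22, ...) and the same final theta as the reported output. The print format is illustrative, not the author's.

```python
def J(theta):
    """Loss: a parabola with its minimum at theta = 2.5, where J = -1."""
    return (theta - 2.5) ** 2 - 1


def dJ(theta):
    """Derivative of J with respect to theta."""
    return 2 * (theta - 2.5)


# Assumed settings: none of these three values appears in the article's code.
eta = 0.1        # step size (learning rate)
epsilon = 1e-8   # stop when the loss changes by less than this
theta = 0.0      # starting point

n_updates = 0
while True:
    last_theta = theta
    theta = theta - eta * dJ(theta)   # move against the gradient
    n_updates += 1
    if abs(J(theta) - J(last_theta)) < epsilon:
        break
    print("theta=", theta, "J=", J(theta), "after update", n_updates)

print("final theta=", theta)
print("final J=", J(theta))
```

The stopping test compares the loss before and after a step rather than theta itself, which matches the condition used in the article's later listings.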

Results:

theta= 0.5
function value= 3.0
gradient descent step 0.....
theta= 0.9
function value= 1.5600000000000005
gradient descent step 1.....
theta= 1.2200000000000002
function value= 0.6383999999999994
gradient descent step 2.....
theta= 1.4760000000000002
function value= 0.04857599999999951
gradient descent step 3.....
theta= 1.6808
function value= -0.3289113600000001
......
gradient descent step 42.....
theta= 2.4998638870532313
function value= -0.9999999814732657
gradient descent step 43.....
final theta= 2.499891109642585
final function value= -0.99999998814289

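Plotting the gradient descent path

# eta and epsilon are not defined in the original listing; 0.1 and 1e-8 are
# assumed here because they are consistent with the results shown above
eta = 0.1
epsilon = 1e-8

theta = 0.0
theta_history = [theta]
while True:
    gradient = dJ(theta)
    last_theta = theta
    theta = theta - eta * gradient
    theta_history.append(theta)
    # Stop once the loss barely changes between two steps
    if abs(J(theta) - J(last_theta)) < epsilon:
        break

# Loss curve with the visited theta values marked in red
plt.plot(plot_x, J(plot_x))
plt.plot(np.array(theta_history), J(np.array(theta_history)), color="r", marker="+")
plt.show()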
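The learning-rate experiments in the next section call `gradient_descent(initial_theta, eta)` and `plot_theta_history()`, but the article never lists these two helpers. The sketch below is one possible reconstruction, assuming they simply wrap the loop and the plotting code above and record every theta in a module-level `theta_history` list. The default values for `n_iters` and `epsilon` are assumptions borrowed from the linear regression listing later in the article.

```python
def J(theta):
    """Loss; returns inf if a diverging theta overflows a Python float."""
    try:
        return (theta - 2.5) ** 2 - 1
    except OverflowError:
        return float("inf")


def dJ(theta):
    """Derivative of J with respect to theta."""
    return 2 * (theta - 2.5)


theta_history = []  # filled by gradient_descent, read by plot_theta_history


def gradient_descent(initial_theta, eta, n_iters=1e4, epsilon=1e-8):
    """Hypothetical helper: run GD and append each theta to theta_history."""
    theta = initial_theta
    theta_history.append(theta)
    i_iter = 0
    while i_iter < n_iters:
        last_theta = theta
        theta = theta - eta * dJ(theta)
        theta_history.append(theta)
        if abs(J(theta) - J(last_theta)) < epsilon:
            break
        i_iter += 1


def plot_theta_history():
    """Hypothetical helper: draw the loss curve and the visited thetas."""
    # Imported here so the numeric part runs without a plotting backend.
    import numpy as np
    import matplotlib.pyplot as plt

    plot_x = np.linspace(-1, 6, 141)
    history = np.array(theta_history)
    plt.plot(plot_x, J(plot_x))
    plt.plot(history, J(history), color="r", marker="+")
    plt.show()


gradient_descent(0.0, 0.1)
print("number of recorded thetas:", len(theta_history))
print("last theta:", theta_history[-1])
```

The `n_iters` cap matters for step sizes that never satisfy the stopping test: without it, a diverging run would loop forever.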

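How the learning rate affects the speed of gradient descent

Plot for learning rate eta = 0.001

# gradient_descent(initial_theta, eta, ...) and plot_theta_history() are not
# listed in the original article; they are assumed to wrap the loop and the
# plotting code above, appending each theta to the global theta_history
initial_theta = 0
eta = 0.001
theta_history = []
gradient_descent(initial_theta, eta)
plot_theta_history()

Plot for learning rate eta = 0.8

initial_theta = 0
eta = 0.8
theta_history = []
gradient_descent(initial_theta, eta)
plot_theta_history()

Plot for learning rate eta = 1.1

# n_iters=10 caps the number of steps: with eta = 1.1 theta moves farther
# from 2.5 on every step, so the stopping test is never met
initial_theta = 0
eta = 1.1
theta_history = []
gradient_descent(initial_theta, eta, n_iters=10)
plot_theta_history()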
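For this particular quadratic the three behaviours can be checked by hand. Substituting the derivative 2(theta - 2.5) into the update gives theta_new - 2.5 = (1 - 2*eta) * (theta - 2.5), so every step multiplies the distance to the minimum by |1 - 2*eta|: about 0.998 for eta = 0.001 (very slow), 0.6 for eta = 0.8 (fast, but jumping across the minimum because 1 - 2*eta is negative), and 1.2 for eta = 1.1 (the distance grows). The sketch below checks this numerically; `distance_after` is a hypothetical helper written for this illustration.

```python
def distance_after(eta, n_steps, theta0=0.0, minimum=2.5):
    """Distance |theta - minimum| after n_steps GD updates on (theta - 2.5)^2 - 1."""
    theta = theta0
    for _ in range(n_steps):
        theta = theta - eta * 2 * (theta - minimum)
    return abs(theta - minimum)


for eta in (0.001, 0.8, 1.1):
    factor = abs(1 - 2 * eta)          # per-step shrink (or growth) factor
    predicted = 2.5 * factor ** 10     # closed-form distance after 10 steps
    print("eta =", eta, "factor =", factor,
          "simulated =", distance_after(eta, 10), "predicted =", predicted)
```

This argument is specific to a one-dimensional quadratic; for other loss functions the usable range of eta has to be found differently.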

Using gradient descent in a linear regression model

import numpy as np
import matplotlib.pyplot as plt

# Synthetic data: y = 3x + 4 plus standard normal noise
np.random.seed(666)
x = 2 * np.random.random(size=100)
y = x * 3. + 4. + np.random.normal(size=100)
X = x.reshape(-1, 1)
print(X.shape)  # (100, 1)
plt.scatter(x, y)
plt.show()

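Training with gradient descent

# Mean squared error of the linear model X_b.dot(theta)
def J(theta, X_b, y):
    try:
        return np.sum((y - X_b.dot(theta))**2) / len(X_b)
    except OverflowError:
        # Fall back to inf if the computation raises an overflow error
        return float('inf')

# Gradient of the mean squared error, one component at a time
def dJ(theta, X_b, y):
    res = np.empty(len(theta))
    # Component 0 belongs to the intercept (column 0 of X_b is all ones)
    res[0] = np.sum(X_b.dot(theta) - y)
    for i in range(1, len(theta)):
        res[i] = (X_b.dot(theta) - y).dot(X_b[:,i])
    return res * 2 / len(X_b)

def gradient_descent(X_b, y, initial_theta, eta, n_iters = 1e4, epsilon=1e-8):

    theta = initial_theta
    cur_iter = 0

    # Stop after n_iters steps or once the loss barely changes
    while cur_iter < n_iters:
        gradient = dJ(theta, X_b, y)
        last_theta = theta
        theta = theta - eta * gradient
        if abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon:
            break

        cur_iter += 1

    return theta

# Prepend a column of ones so that theta[0] acts as the intercept
X_b = np.hstack([np.ones((len(x), 1)), x.reshape(-1,1)])
initial_theta = np.zeros(X_b.shape[1])
eta = 0.01
theta = gradient_descent(X_b, y, initial_theta, eta)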
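The component-by-component gradient in dJ above can also be written as one matrix product, (2/m) * X_b.T.dot(X_b.dot(theta) - y), where m is the number of samples. The sketch below compares the two forms on made-up data; the names `dJ_loop` and `dJ_vectorized`, the random seed, and the test point `theta` are all chosen for this illustration and do not come from the article.

```python
import numpy as np


def dJ_loop(theta, X_b, y):
    """Gradient of the mean squared error, computed per component."""
    res = np.empty(len(theta))
    res[0] = np.sum(X_b.dot(theta) - y)
    for i in range(1, len(theta)):
        res[i] = (X_b.dot(theta) - y).dot(X_b[:, i])
    return res * 2 / len(X_b)


def dJ_vectorized(theta, X_b, y):
    """Same gradient as a single matrix product."""
    return X_b.T.dot(X_b.dot(theta) - y) * 2.0 / len(X_b)


# Made-up data with the same shape as the article's example
# (different seed and generator, so the numbers will differ).
rng = np.random.default_rng(0)
x = 2 * rng.random(100)
y = 3.0 * x + 4.0 + rng.normal(size=100)
X_b = np.hstack([np.ones((len(x), 1)), x.reshape(-1, 1)])

theta = np.array([1.0, -2.0])  # arbitrary point at which to compare
same = np.allclose(dJ_loop(theta, X_b, y), dJ_vectorized(theta, X_b, y))
print("loop and vectorized gradients agree:", same)
```

The two agree because column 0 of X_b is all ones, so the first row of X_b.T times the residual is just the sum of the residuals.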

Result: array([4.02145786, 3.00706277])

That is, the intercept is b = 4.02145786 and the slope is a = 3.00706277, which roughly match the generating function y = x * 3. + 4. + np.random.normal(size=100).
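One way to check a gradient descent result like this is to compare it with the closed-form least-squares solution. The sketch below does so with `np.linalg.lstsq` on made-up data. It uses a different random generator and seed from the article, so it will not reproduce the numbers above, and the vectorized gradient is a substitution for the article's loop version.

```python
import numpy as np


def J(theta, X_b, y):
    """Mean squared error of the linear model X_b.dot(theta)."""
    return np.sum((y - X_b.dot(theta)) ** 2) / len(X_b)


def dJ(theta, X_b, y):
    """Gradient of J, written as one matrix product."""
    return X_b.T.dot(X_b.dot(theta) - y) * 2.0 / len(X_b)


def gradient_descent(X_b, y, initial_theta, eta, n_iters=10000, epsilon=1e-8):
    """Plain gradient descent with the same stopping test as the article."""
    theta = initial_theta
    for _ in range(n_iters):
        last_theta = theta
        theta = theta - eta * dJ(theta, X_b, y)
        if abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon:
            break
    return theta


# Made-up data; the seed is an arbitrary choice for this sketch.
rng = np.random.default_rng(0)
x = 2 * rng.random(100)
y = 3.0 * x + 4.0 + rng.normal(size=100)
X_b = np.hstack([np.ones((len(x), 1)), x.reshape(-1, 1)])

theta_gd = gradient_descent(X_b, y, np.zeros(X_b.shape[1]), eta=0.01)
theta_ls, *_ = np.linalg.lstsq(X_b, y, rcond=None)  # closed-form reference

print("gradient descent:", theta_gd)
print("least squares:   ", theta_ls)
```

The two estimates should agree closely but not exactly, since gradient descent stops as soon as the loss changes by less than epsilon.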