Gradient descent algorithm

Advsance

Last modified: 2024-07-27 20:14:01

Tags: algorithms

First published: 2024-07-27 18:15:13

Article link: https://blog.csdn.net/Advsance/article/details/140738551

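Definition: gradient descent is an optimization algorithm, also known as the steepest descent algorithm. Its main purpose is to find the minimum of an objective function by iteration, or at least to converge towards it.
In plain terms, it looks for an extremum of a function: a minimum, or, if you step along the gradient instead of against it, a maximum.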

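The algorithm has a few hyperparameters:
Learning rate η (eta in the code), also called the step size. η affects how quickly the optimum is reached; with an unsuitable value the algorithm may never reach the optimum.
Threshold: when the difference in function value between two consecutive steps falls below the threshold, the iteration stops.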
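
The effect of the learning rate can be seen with a small experiment. The sketch below is illustrative only: it assumes the quadratic $y = (x-2.5)^2 - 1$ used in the example later in this article, and the helper name `run_steps` and the fixed count of 20 steps are hypothetical choices, not part of the original code.

```python
def dJ(theta):
    # derivative of (theta - 2.5)**2 - 1
    return 2 * (theta - 2.5)

def run_steps(eta, start=1.0, steps=20):
    # hypothetical helper: apply the update x = x - eta * derivative a fixed number of times
    theta = start
    for _ in range(steps):
        theta = theta - eta * dJ(theta)
    return theta

for eta in (0.1, 0.9, 1.1):
    print(eta, run_steps(eta))
```

For this particular function, a small η approaches x = 2.5 smoothly, an η close to 1 overshoots and oscillates around the minimum while still converging, and an η above 1 moves further away with every step.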

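Solution steps

1. Choose an initial point x, a threshold, and a learning rate.
2. Compute the derivative of the function at that point.
3. Apply the gradient descent update to get the next point: x = x - learning rate * derivative.
4. Compute the difference between the function values before and after the update.
5. If the difference is smaller than the threshold, the extremum has been found; otherwise repeat steps 2-5.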

For example, use gradient descent to find the extremum of the function $y = (x-2.5)^2 - 1$.
First, construct the data:

import numpy as np
import matplotlib.pyplot as plt

# Sample 141 evenly spaced points on [-1, 6] and plot the curve
plot_x = np.linspace(-1, 6, 141)
plot_y = (plot_x - 2.5) ** 2 - 1
plt.plot(plot_x, plot_y)


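def J(theta):  # objective function
    return (theta - 2.5) ** 2 - 1

def dJ(theta):  # derivative of J
    return 2 * (theta - 2.5)

def gradient_descent(xs, x, eta, epsilon):
    # xs collects every visited point so the path can be plotted afterwards
    theta = x
    xs.append(x)
    while True:
        gradient = dJ(theta)
        last_theta = theta
        theta = theta - eta * gradient  # step against the gradient
        xs.append(theta)

        # stop once the function value changes by less than epsilon
        if abs(J(theta) - J(last_theta)) < epsilon:
            break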

eta = 0.0001  # learning rate (step size factor)
xs = []
epsilon = 1e-8  # stopping threshold
gradient_descent(xs, 1, eta, epsilon)


plt.plot(plot_x, J(plot_x))
plt.plot(np.array(xs), J(np.array(xs)), color="r", marker="+")
print(xs[-1])

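Output: 2.495000939618705

The result stops about 0.005 short of the true minimum at x = 2.5: with a learning rate this small, the change in J between two steps drops below the threshold before x gets any closer.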
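
To see how the learning rate and the threshold interact, the sketch below reruns the same quadratic and also counts iterations. The helper `descend` and its `max_iter` guard are illustrative assumptions, not part of the code above.

```python
def J(theta):
    return (theta - 2.5) ** 2 - 1

def dJ(theta):
    return 2 * (theta - 2.5)

def descend(start, eta, epsilon, max_iter=1_000_000):
    """Return (theta, iterations); max_iter is a hypothetical safety guard."""
    theta = start
    for n in range(1, max_iter + 1):
        last_theta = theta
        theta = theta - eta * dJ(theta)
        # same stopping rule as above: change in J below the threshold
        if abs(J(theta) - J(last_theta)) < epsilon:
            return theta, n
    return theta, max_iter

for eta in (0.0001, 0.01, 0.1):
    theta, n = descend(1.0, eta, 1e-8)
    print(f"eta={eta}: theta={theta:.6f} after {n} iterations")
```

With the same threshold, a larger learning rate should need far fewer iterations here and end closer to x = 2.5, because each step changes J by more and so the stopping rule triggers later in terms of distance to the minimum.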

We can also start from the other side, for example x = 5.

eta = 0.0001  # learning rate (step size factor)
xs = []
epsilon = 1e-8  # stopping threshold
gradient_descent(xs, 5, eta, epsilon)


plt.plot(plot_x, J(plot_x))
plt.plot(np.array(xs), J(np.array(xs)), color="r", marker="+")
print(xs[-1])


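Next, find the extremum of $y = -((x-2.5)^2 - 1)$. This parabola opens downward, so the extremum is a maximum, and the update steps along the gradient instead of against it (gradient ascent).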

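def J(theta):  # objective function
    return -((theta - 2.5) ** 2 - 1)

def dJ(theta):  # derivative of J
    return -2 * (theta - 2.5)

def gradient_descent(xs, x, eta, epsilon):
    # despite the name, this version climbs: it adds eta * gradient
    theta = x
    xs.append(x)
    while True:
        gradient = dJ(theta)
        last_theta = theta
        theta = theta + eta * gradient  # step along the gradient
        xs.append(theta)

        # stop once the function value changes by less than epsilon
        if abs(J(theta) - J(last_theta)) < epsilon:
            break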

eta = 0.0001  # learning rate (step size factor)
xs = []
epsilon = 1e-8  # stopping threshold
gradient_descent(xs, 1, eta, epsilon)


plt.plot(plot_x, J(plot_x))
plt.plot(np.array(xs), J(np.array(xs)), color="r", marker="+")
print(xs[-1])

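
Flipping the sign of the update is one way to find a maximum. An equivalent view, sketched below, is to keep the ordinary descent update and minimize -J instead. The helper `minimize`, its `max_iter` guard, and the larger learning rate 0.1 are assumptions made for this illustration.

```python
def J(theta):
    return -((theta - 2.5) ** 2 - 1)

def dJ(theta):
    return -2 * (theta - 2.5)

def minimize(f, df, start, eta, epsilon, max_iter=100_000):
    # plain gradient descent on f; max_iter is a hypothetical safety guard
    theta = start
    for _ in range(max_iter):
        last_theta = theta
        theta = theta - eta * df(theta)
        if abs(f(theta) - f(last_theta)) < epsilon:
            break
    return theta

# Maximizing J is the same as minimizing -J
x_max = minimize(lambda t: -J(t), lambda t: -dJ(t), start=1.0, eta=0.1, epsilon=1e-8)
print(x_max, J(x_max))
```

Both formulations produce the same sequence of points, since subtracting eta times the derivative of -J is the same as adding eta times the derivative of J.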

Using gradient descent to fit the simplest linear model

Suppose we have two sets of data:

x = np.array([55, 71, 68, 87, 101, 87, 75, 78, 93, 73])
y = np.array([91, 101, 87, 109, 129, 98, 95, 101, 104, 93])

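The loss function of the linear model is:

$f(w_0, w_1) = \sum_{i=1}^{n} (y_i - (w_0 + w_1 x_i))^2$

Here w0 and w1 are the values we want to find; they are the two coefficients of the linear equation (intercept and slope).

Take the partial derivatives with respect to w0 and w1:

$\frac{\partial f}{\partial w_0} = -2\sum_{i=1}^{n}(y_i-(w_0+w_1x_i))$

$\frac{\partial f}{\partial w_1} = -2\sum_{i=1}^{n}x_i(y_i-(w_0+w_1x_i))$

Note the difference: the derivative with respect to w1 carries an extra factor xi.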
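
One way to gain confidence in hand-derived partial derivatives is to compare them with central finite differences of the loss. The sketch below does this for the data above; the function names, the test point (1.0, 0.5), and the step `h` are arbitrary choices made for illustration.

```python
import numpy as np

x = np.array([55, 71, 68, 87, 101, 87, 75, 78, 93, 73], dtype=float)
y = np.array([91, 101, 87, 109, 129, 98, 95, 101, 104, 93], dtype=float)

def loss(w0, w1):
    # sum of squared residuals
    return np.sum((y - (w0 + w1 * x)) ** 2)

def analytic_grad(w0, w1):
    # the two partial derivatives derived by hand
    residual = y - (w0 + w1 * x)
    return np.array([-2 * np.sum(residual), -2 * np.sum(x * residual)])

def numeric_grad(w0, w1, h=1e-4):
    # central differences in each coordinate
    d0 = (loss(w0 + h, w1) - loss(w0 - h, w1)) / (2 * h)
    d1 = (loss(w0, w1 + h) - loss(w0, w1 - h)) / (2 * h)
    return np.array([d0, d1])

w0, w1 = 1.0, 0.5  # arbitrary test point
print(analytic_grad(w0, w1))
print(numeric_grad(w0, w1))
```

If the two printed vectors agree to several digits, the formulas are very likely correct; a sign error or a missing factor xi would show up immediately.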

Plugging these into the update rule x = x - learning rate * derivative, the gradients in code are:

w0_gradient = -2 * np.sum(y - y_hat)
w1_gradient = -2 * np.sum(x * (y - y_hat))

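def ols_gradient_descent(x, y, lr, num_iter):
    '''
    x -- independent variable
    y -- dependent variable
    lr -- learning rate (step size)
    num_iter -- number of iterations

    Returns:
    w1 -- slope of the linear equation
    w0 -- intercept of the linear equation
    '''

    w1 = 0
    w0 = 0

    for i in range(num_iter):
        y_hat = (w1 * x) + w0
        # partial derivatives of the squared-error loss
        w0_gradient = -2 * np.sum(y - y_hat)
        w1_gradient = -2 * np.sum(x * (y - y_hat))
        w1 -= lr * w1_gradient
        w0 -= lr * w0_gradient

    return w1, w0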

x = np.array([55, 71, 68, 87, 101, 87, 75, 78, 93, 73])
y = np.array([91, 101, 87, 109, 129, 98, 95, 101, 104, 93])

lr = 0.00001  # learning rate (step size)
num_iter = 500  # number of iterations
w1, w0 = ols_gradient_descent(x, y, lr=lr, num_iter=num_iter)

print(w1, w0)
xs = np.array([50, 100])
ys = xs * w1 + w0

plt.plot(xs, ys, color = "r")
plt.scatter(x, y)
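
As a sanity check, the gradient descent result can be compared with a closed-form least-squares fit. The sketch below uses `np.polyfit` for that comparison, which is not something the original code does; the function is repeated in compact form only so the block runs on its own.

```python
import numpy as np

x = np.array([55, 71, 68, 87, 101, 87, 75, 78, 93, 73])
y = np.array([91, 101, 87, 109, 129, 98, 95, 101, 104, 93])

def ols_gradient_descent(x, y, lr, num_iter):
    # compact copy of the function above
    w1, w0 = 0.0, 0.0
    for _ in range(num_iter):
        y_hat = w1 * x + w0
        w0_gradient = -2 * np.sum(y - y_hat)
        w1_gradient = -2 * np.sum(x * (y - y_hat))
        w1 -= lr * w1_gradient
        w0 -= lr * w0_gradient
    return w1, w0

w1_gd, w0_gd = ols_gradient_descent(x, y, lr=0.00001, num_iter=500)

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line
w1_ls, w0_ls = np.polyfit(x, y, 1)

print("gradient descent:", w1_gd, w0_gd)
print("least squares:   ", w1_ls, w0_ls)
```

On this unscaled data the two answers can differ noticeably after only 500 iterations, because the intercept moves very slowly at this learning rate. Many more iterations, or rescaling x before fitting, would be needed to close the gap.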