Machine Learning Basics: Simulating Gradient Descent

Latest recommended article published on 2022-05-22 22:05:18

小夭。


Article link: https://blog.csdn.net/m0_47146037/article/details/120711665


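This article explains the concept of gradient descent in detail, including its role in finding local optima, demonstrates with an example how to apply gradient descent to a quadratic function, and discusses how adjusting the learning rate affects convergence. The author also shows how to implement the gradient descent algorithm in code and how to handle infinite values and cap the number of iterations.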


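As before, I am writing this up because of a homework assignment from my teacher, so there are bound to be rough spots; experts, feel free to skip it.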

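Concepts

Here is an explanation of gradient descent quoted from another blog:
Let's start with an intuitive picture of gradient descent. Suppose we are standing somewhere on a large mountain and do not know the way down, so we decide to take it one step at a time. At each position we compute the gradient there and take one step along the negative gradient, that is, in the direction of steepest descent from where we stand. Then we compute the gradient at the new position and again step in the steepest downhill direction. We keep going step by step until we believe we have reached the foot of the mountain. Of course, walking this way we may never reach the foot of the mountain and may instead end up in some local low point among the peaks.

As this explanation shows, gradient descent is not guaranteed to find the global optimum; it may return only a local one. If the loss function is convex, however, the solution gradient descent converges to is guaranteed to be the global optimum.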


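Introduction to Gradient Descent

To minimize a loss function here, we use gradient descent. In the previous post we minimized the loss function analytically: we solved directly for the parameters that minimize it. In practice, however, many machine learning models have no such closed-form solution, so we turn to a search-based optimization method to find the optimum, and gradient descent is probably the most common one.

We move against the sign of the derivative, the direction in which the loss decreases; as we approach a minimum, the derivative shrinks toward zero.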
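Written as a formula, one step of this procedure for a one-variable loss looks as follows. The symbols are chosen to match the code later in this post and are otherwise an assumption: theta is the parameter, alpha (α) is the learning rate or step size, and E is the loss function.

```latex
\theta_{t+1} = \theta_t - \alpha \, \frac{dE}{d\theta}\bigg|_{\theta = \theta_t}
```

When the derivative is positive the update moves theta to the left, and when it is negative the update moves theta to the right, so in both cases the step goes downhill.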

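In fact, for a function of one variable we can think purely in terms of the derivative. (For a function of several variables we take the partial derivative with respect to each variable; the vector of these partial derivatives is the gradient.)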
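To make "one partial derivative per variable" concrete, here is a minimal sketch. The two-variable function f and the helper names are hypothetical and chosen only for illustration; the numerical check uses central differences.

```python
def f(x, y):
    # Hypothetical two-variable loss, chosen only for illustration.
    return (x - 1.0) ** 2 + 2.0 * (y + 2.0) ** 2

def grad_f(x, y):
    # Analytic gradient: one partial derivative per variable.
    return (2.0 * (x - 1.0), 4.0 * (y + 2.0))

def numeric_grad(func, x, y, h=1e-6):
    # Central differences approximate each partial derivative separately,
    # holding the other variable fixed.
    dfdx = (func(x + h, y) - func(x - h, y)) / (2.0 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2.0 * h)
    return (dfdx, dfdy)

analytic = grad_f(0.0, 0.0)
approx = numeric_grad(f, 0.0, 0.0)
print(analytic)
print(approx)
```

The two printed vectors should agree closely; the negative of that vector is the direction gradient descent would step in from the point (0, 0).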

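The function we dealt with earlier was a quadratic, which clearly has a unique extremum; that is why, as the previous post described, we could obtain the solution analytically. But not every function has a unique extremum.
The same thing happens on surfaces in higher dimensions.

If we apply gradient descent here, following the downhill slope leads us to the first local minimum we encounter. But that point is only optimal locally; viewed globally, it is not the minimum.

To address this, we can randomize the starting point and run the search several times, which gives a chance of finding a better solution.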
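The idea of random starting points can be sketched as follows. The non-convex loss, the step size, the fixed iteration count and all names here are assumptions made for illustration, not part of the original assignment.

```python
import random

def loss(x):
    # Hypothetical non-convex loss with two local minima (not from the original post).
    return x ** 4 - 3.0 * x ** 2 + x

def d_loss(x):
    # Derivative of the loss above.
    return 4.0 * x ** 3 - 6.0 * x + 1.0

def descend(x0, alpha=0.01, n_iters=2000):
    """Plain gradient descent with a fixed number of steps."""
    x = x0
    for _ in range(n_iters):
        x = x - alpha * d_loss(x)
    return x

rng = random.Random(0)  # fixed seed so the run is repeatable
starts = [rng.uniform(-2.0, 2.0) for _ in range(10)]
candidates = [descend(x0) for x0 in starts]
best_x = min(candidates, key=loss)  # keep the end point with the lowest loss

print(descend(2.0))   # ends in the shallower minimum on the right
print(descend(-2.0))  # ends in the deeper minimum on the left
print(best_x, loss(best_x))
```

Which minimum a single run finds depends only on where it starts, so keeping the best of several runs improves the odds of landing in the deeper one, although it still gives no guarantee.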
For the homework my teacher assigned, though, we apply gradient descent to a linear regression problem, whose loss function has a unique optimum.
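As a small illustration of that uniqueness, the sketch below fits a one-variable linear regression by gradient descent from two very different starting points. The toy data, the mean squared error loss, the learning rate and the function names are all assumptions made for this example.

```python
# Hypothetical toy data generated from y = 2x + 1 (not from the original post).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
n = len(xs)

def gradients(w, b):
    # Partial derivatives of the mean squared error with respect to w and b.
    dw = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db

def fit(w0, b0, alpha=0.05, n_iters=5000):
    # Gradient descent on both parameters at once.
    w, b = w0, b0
    for _ in range(n_iters):
        dw, db = gradients(w, b)
        w, b = w - alpha * dw, b - alpha * db
    return w, b

# Two very different starting points end at the same solution.
w1, b1 = fit(0.0, 0.0)
w2, b2 = fit(-10.0, 25.0)
print(w1, b1)
print(w2, b2)
```

Both runs should recover a slope close to 2 and an intercept close to 1, which is also what the analytic least-squares solution gives for this data.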

Implementation

Below I define a quadratic loss function of my own and then search for its minimum.
Code

import numpy as np
import matplotlib.pyplot as plt

# Sample points for plotting the loss function
plot_x = np.linspace(-1, 6, 141)
plot_y = (plot_x - 2.5) ** 2 - 1  # a quadratic function defined by hand

# Plot the loss curve
plt.plot(plot_x, plot_y)
plt.show()

# Derivative of the loss function
def dEwb(theta):
    return 2 * (theta - 2.5)

# Loss function
def E_function(theta):
    return (theta - 2.5) ** 2 - 1

theta = 0.0
α = 0.1  # learning rate (alpha)
epsilon = 1e-8  # tolerance used to decide that the optimum has been reached
history_theta = [theta]  # record every theta for later comparison
# Iterate until the loss stops changing
while True:
    gradient = dEwb(theta)  # compute the gradient
    last_theta = theta  # remember the previous value before updating theta
    theta = theta - α * gradient  # take a step scaled by α against the gradient
    history_theta.append(theta)  # update history_theta
    # Stop once the loss changes by less than epsilon between consecutive points
    if abs(E_function(theta) - E_function(last_theta)) < epsilon:
        break

plt.plot(plot_x, E_function(plot_x))
plt.plot(np.array(history_theta), E_function(np.array(history_theta)), color='r', marker='+')
plt.show()

Results
First, the quadratic function we defined is plotted.
The descent path is marked with red plus signs.

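Changing the Learning Rate

To make the learning rate easy to change, we first wrap the gradient descent loop and the plotting code from above into functions.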

import numpy as np
import matplotlib.pyplot as plt

# Sample points for plotting the loss function
plot_x = np.linspace(-1, 6, 141)
plot_y = (plot_x - 2.5) ** 2 - 1  # a quadratic function defined by hand

# Plot the loss curve
plt.plot(plot_x, plot_y)
plt.show()

# Derivative of the loss function
def dEwb(theta):
    return 2 * (theta - 2.5)

# Loss function
def E_function(theta):
    return (theta - 2.5) ** 2 - 1

# Gradient descent: iterate until the loss stops changing
def gradient_descent(initial_theta, α, epsilon=1e-8):
    theta = initial_theta
    history_theta.append(theta)

    while True:
        gradient = dEwb(theta)  # compute the gradient
        last_theta = theta  # remember the previous value before updating theta
        theta = theta - α * gradient  # take a step scaled by α against the gradient
        history_theta.append(theta)  # update history_theta
        # Stop once the loss changes by less than epsilon between consecutive points
        if abs(E_function(theta) - E_function(last_theta)) < epsilon:
            break

def plot_theta_history():
    plt.plot(plot_x, E_function(plot_x))
    plt.plot(np.array(history_theta), E_function(np.array(history_theta)), color='r', marker='+')
    plt.show()

# Change the parameters and run
α = 0.1  # learning rate (alpha)
# epsilon, the convergence tolerance, keeps its default value of 1e-8
history_theta = []  # record every theta for later comparison
gradient_descent(0., α)
plot_theta_history()

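Now change the learning rate α (currently 0.1; suppose we set it to 0.01).
The red plus signs become much denser: with a lower learning rate each step is smaller, so more iterations are needed to reach the minimum.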
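The effect can also be measured without a plot by counting updates. This is a minimal sketch that uses the same quadratic and stopping rule as the post; the names dE, E, count_steps and max_iters are hypothetical, and max_iters is only a safety cap added for this example.

```python
def dE(theta):
    # Derivative of the loss (theta - 2.5)**2 - 1
    return 2.0 * (theta - 2.5)

def E(theta):
    # The quadratic loss used throughout the post
    return (theta - 2.5) ** 2 - 1.0

def count_steps(alpha, initial_theta=0.0, epsilon=1e-8, max_iters=100_000):
    """Return how many updates run before the loss change drops below epsilon."""
    theta = initial_theta
    for step in range(1, max_iters + 1):
        last_theta = theta
        theta = theta - alpha * dE(theta)
        if abs(E(theta) - E(last_theta)) < epsilon:
            return step
    return max_iters

steps = {alpha: count_steps(alpha) for alpha in (0.01, 0.1)}
for alpha, n in steps.items():
    print(f"alpha={alpha}: {n} steps")
```

The smaller learning rate needs many more updates to satisfy the same stopping rule, which is what the denser markers on the plot reflect.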

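Next, increase the learning rate α (from 0.1 to 0.8).
In this case theta jumps from the left half of the curve to the right half, but the loss still decreases at every step, and we still end up at the optimum. (Does that mean that, as long as theta does not exceed some limit and the loss keeps decreasing, we will always reach the optimum?)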
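For the particular quadratic used in this post, the question can be answered with a short derivation. It is a sketch that applies only to E(θ) = (θ − 2.5)² − 1 with derivative 2(θ − 2.5); other loss functions have different thresholds.

```latex
\begin{aligned}
\theta_{t+1} &= \theta_t - \alpha \cdot 2(\theta_t - 2.5) \\
\theta_{t+1} - 2.5 &= (1 - 2\alpha)\,(\theta_t - 2.5) \\
\theta_t - 2.5 &= (1 - 2\alpha)^{t}\,(\theta_0 - 2.5)
\end{aligned}
```

The distance to the minimum shrinks exactly when |1 − 2α| < 1, that is, when 0 < α < 1. With α = 0.8 the factor is −0.6: theta changes sides at every step but gets closer each time. With α > 1 the factor exceeds 1 in magnitude, so theta moves farther from 2.5 at every step and the loss grows.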

When we increase the learning rate further, Python raises an error saying the result is too large.
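The error is most likely Python's OverflowError, which float exponentiation raises once the result no longer fits in a double. The snippet below is a minimal sketch of that behaviour; the value 1e200 is a hypothetical stand-in for the magnitude theta reaches when the iteration diverges.

```python
big = 1e200  # hypothetical magnitude that theta can reach when the iteration diverges

# Squaring with ** raises OverflowError for Python floats ...
try:
    big ** 2
    overflow_raised = False
except OverflowError:
    overflow_raised = True

# ... whereas multiplication quietly overflows to infinity.
product = big * big

print(overflow_raised)
print(product)
```

This difference matters here because the loss is computed with ** 2, so it raises, while the update of theta uses only multiplication and subtraction, so it does not.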
We can start by catching the exception inside the loss function.

# Loss function
def E_function(theta):
    try:
        return (theta - 2.5) ** 2 - 1
    except OverflowError:
        return float('inf')  # positive infinity, not the largest finite float

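But this only silences the error. The loop above is a while True loop, so we now have an infinite loop: the test abs(E_function(theta) - E_function(last_theta)) < epsilon can never end it, because infinity minus infinity is defined as NaN, and any comparison with NaN is false. So we also need to cap the number of iterations.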
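The NaN behaviour can be checked in isolation with a few lines of standard-library Python; the variable names are only illustrative.

```python
import math

inf = float('inf')
difference = abs(inf - inf)  # what the stopping test computes once both losses are inf

is_nan = math.isnan(difference)
would_stop = difference < 1e-8

print(is_nan)      # True
print(would_stop)  # False
```

Because the comparison is always False, the break statement is never reached once both loss values have become infinite.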

# n_iters caps the number of iterations (10,000 by default)
def gradient_descent(initial_theta, α, n_iters=1e4, epsilon=1e-8):
    theta = initial_theta
    history_theta.append(theta)
    i_iters = 0

    while i_iters < n_iters:
        gradient = dEwb(theta)  # compute the gradient
        last_theta = theta  # remember the previous value before updating theta
        theta = theta - α * gradient  # take a step scaled by α against the gradient
        history_theta.append(theta)  # update history_theta
        # Stop once the loss changes by less than epsilon between consecutive points
        if abs(E_function(theta) - E_function(last_theta)) < epsilon:
            break
        i_iters += 1

With the iteration cap in place, the program terminates and plots the result.
Clearly the learning rate is too large in this case.
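One possible refinement, sketched here as an assumption rather than as part of the original code, is to stop as soon as the loss is no longer finite and to return the history instead of appending to a global list. The names gradient_descent_safe, dE, E and alpha are hypothetical.

```python
import math

def dE(theta):
    # Derivative of the loss (theta - 2.5)**2 - 1
    return 2.0 * (theta - 2.5)

def E(theta):
    # Loss; returns positive infinity if squaring overflows
    try:
        return (theta - 2.5) ** 2 - 1.0
    except OverflowError:
        return float('inf')

def gradient_descent_safe(initial_theta, alpha, n_iters=10_000, epsilon=1e-8):
    """Return the list of visited theta values; stop early on divergence."""
    theta = initial_theta
    history = [theta]
    for _ in range(n_iters):
        last_theta = theta
        theta = theta - alpha * dE(theta)
        history.append(theta)
        loss, last_loss = E(theta), E(last_theta)
        if not math.isfinite(loss):
            break  # the loss overflowed: the iteration has diverged
        if abs(loss - last_loss) < epsilon:
            break  # converged
    return history

converged = gradient_descent_safe(0.0, 0.1)
diverged = gradient_descent_safe(0.0, 1.1)
print(len(converged), converged[-1])
print(len(diverged), math.isfinite(E(diverged[-1])))
```

With a learning rate of 0.1 the run ends near 2.5, while with 1.1 it stops on the divergence check well before the iteration cap is reached.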

The complete code follows.

import numpy as np
import matplotlib.pyplot as plt

# Sample points for plotting the loss function
plot_x = np.linspace(-1, 6, 141)
plot_y = (plot_x - 2.5) ** 2 - 1  # a quadratic function defined by hand

# Plot the loss curve
plt.plot(plot_x, plot_y)
plt.show()

# Derivative of the loss function
def dEwb(theta):
    return 2 * (theta - 2.5)

# Loss function
def E_function(theta):
    try:
        return (theta - 2.5) ** 2 - 1
    except OverflowError:
        return float('inf')  # positive infinity, not the largest finite float

# Gradient descent: iterate until the loss stops changing
# n_iters: maximum number of iterations; defaults to 10,000 when not passed
def gradient_descent(initial_theta, α, n_iters=1e4, epsilon=1e-8):
    theta = initial_theta
    history_theta.append(theta)
    i_iters = 0

    while i_iters < n_iters:
        gradient = dEwb(theta)  # compute the gradient
        last_theta = theta  # remember the previous value before updating theta
        theta = theta - α * gradient  # take a step scaled by α against the gradient
        history_theta.append(theta)  # update history_theta
        # Stop once the loss changes by less than epsilon between consecutive points
        if abs(E_function(theta) - E_function(last_theta)) < epsilon:
            break
        i_iters += 1

def plot_theta_history():
    plt.plot(plot_x, E_function(plot_x))
    plt.plot(np.array(history_theta), E_function(np.array(history_theta)), color='r', marker='+')
    plt.show()

# Change the parameters and run
α = 1.1  # learning rate (alpha), deliberately too large
# epsilon, the convergence tolerance, keeps its default value of 1e-8
history_theta = []  # record every theta for later comparison
gradient_descent(0, α, n_iters=10)
plot_theta_history()