Simulating Gradient Descent (Part 2)

_卷心菜_

Original article: https://blog.csdn.net/Thumb_/article/details/110469720

Code simulation

First, plot the curve:

import numpy as np
import matplotlib.pyplot as plt

plot_x = np.linspace(-1,6,141) # 141 evenly spaced points from -1 to 6
plot_y = (plot_x - 2.5) ** 2 - 1
plt.plot(plot_x,plot_y)   # draw the curve
plt.show()

[Figure: the parabola (x - 2.5) ** 2 - 1 plotted for x from -1 to 6]

Next, define the functions dJ and J:

def dJ(theta):
    return 2 * (theta - 2.5)  # derivative of J (the plot_y expression) with respect to theta

def J(theta):
    return (theta - 2.5) ** 2 - 1

The gradient descent procedure:

eta = 0.1   # learning rate (the η in the update formula)
epsilon = 1e-8   # threshold: if J changes by less than this between two steps, treat it as the minimum
theta = 0.0
while True:
    gradient = dJ(theta)
    last_theta = theta
    theta = theta - eta * gradient   # step against the gradient, scaled by eta

    if(abs(J(theta) - J(last_theta)) < epsilon):  # J barely changed between steps, so stop
        break
print(theta)
print(J(theta))

Output:

2.499891109642585
-0.99999998814289

That is, the search stops at theta = 2.499891109642585, where J = -0.99999998814289. This is very close to the exact minimum, J = -1 at theta = 2.5.
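
The update theta = theta - eta * gradient can be worked out by hand for this particular J. The following is a short derivation sketch (not part of the original post; it writes J' for dJ and θ_new for the value of theta after one update):

```latex
\begin{aligned}
\theta_{\text{new}} &= \theta - \eta\, J'(\theta), \qquad J'(\theta) = 2(\theta - 2.5) \\
\theta_{\text{new}} - 2.5 &= (1 - 2\eta)(\theta - 2.5) \\
J(\theta_{\text{new}}) - J(\theta) &= \big((1 - 2\eta)^2 - 1\big)(\theta - 2.5)^2 = -\eta(1 - \eta)\, J'(\theta)^2
\end{aligned}
```

So for this J, a step lowers J whenever 0 < eta < 1, and the distance from theta to 2.5 is multiplied by |1 - 2 * eta| at every step. Moving against the gradient is what makes the change in J negative.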

Plot the path theta takes, starting from 0, until the minimum is found:

theta = 0.0
theta_history = [theta]   # list that records every theta value visited
while True:
    gradient = dJ(theta)
    last_theta = theta
    theta = theta - eta * gradient   # step against the gradient, scaled by eta
    theta_history.append(theta)   # record the new theta
    if(abs(J(theta) - J(last_theta)) < epsilon):  # J barely changed between steps, so stop
        break

plt.plot(plot_x,J(plot_x))
plt.plot(np.array(theta_history),J(np.array(theta_history)),color='r',marker='+')
plt.show()

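The resulting plot:
[Figure: the curve J(theta) with the visited theta values marked in red, descending from theta = 0 toward the minimum]

The steps are large at first because the gradient is large far from the minimum, so multiplying it by eta still gives a large step.

theta_history has length 46: the starting value plus 45 updates before the minimum was found.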

len(theta_history)
46

Wrap the gradient descent loop and the plotting of the theta path into functions:

def gradient_descent(initial_theta,eta,epsilon=1e-8):
    # Appends to the module-level list theta_history; reset it before each run
    theta = initial_theta
    theta_history.append(initial_theta)

    while True:
        gradient = dJ(theta)
        last_theta = theta
        theta = theta - eta * gradient
        theta_history.append(theta)
        if(abs(J(theta) - J(last_theta)) < epsilon):
            break
            
def plot_theta_history():
    plt.plot(plot_x,J(plot_x))
    plt.plot(np.array(theta_history),J(np.array(theta_history)),color='r',marker='+')
    plt.show()
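
The function above relies on a module-level theta_history list that has to be reset before every call. As an alternative, here is a minimal sketch that returns the history instead. The name gradient_descent_history is made up for this illustration and is not from the original post; dJ and J are repeated so the block runs on its own.

```python
def dJ(theta):
    return 2 * (theta - 2.5)

def J(theta):
    return (theta - 2.5) ** 2 - 1

def gradient_descent_history(initial_theta, eta, epsilon=1e-8):
    """Run gradient descent and return the list of every theta visited."""
    theta = initial_theta
    history = [theta]
    while True:
        last_theta = theta
        theta = theta - eta * dJ(theta)   # same update rule as above
        history.append(theta)
        if abs(J(theta) - J(last_theta)) < epsilon:
            break
    return history

history = gradient_descent_history(0.0, 0.1)
print(len(history))   # 46
print(history[-1])
```

Returning the list means each run gets a fresh history, so there is nothing to forget to reset between experiments.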

Try changing eta and watch what happens. For example, set eta to 0.01:

eta = 0.01
theta_history = []
gradient_descent(0.,eta)
plot_theta_history()

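The resulting plot:
[Figure: the curve J(theta) with many closely spaced red markers descending from theta = 0 toward the minimum, eta = 0.01]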

len(theta_history)
424

theta_history now has length 424. So the smaller eta is, the smaller each step is, and the more iterations the search needs.
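
To see the trend across several values, here is a small sketch that counts the updates needed for a few choices of eta. The helper name steps_needed and the particular eta values are assumptions made for this illustration; the stopping rule is the same epsilon test used above.

```python
def dJ(theta):
    return 2 * (theta - 2.5)

def J(theta):
    return (theta - 2.5) ** 2 - 1

def steps_needed(eta, initial_theta=0.0, epsilon=1e-8, n_iters=100000):
    """Count the updates performed before the change in J drops below epsilon."""
    theta = initial_theta
    for step in range(1, n_iters + 1):
        last_theta = theta
        theta = theta - eta * dJ(theta)
        if abs(J(theta) - J(last_theta)) < epsilon:
            return step
    return n_iters   # cap reached without meeting the epsilon test

for eta in (0.01, 0.05, 0.1, 0.3):
    print(eta, steps_needed(eta))
```

The count is one less than len(theta_history), because the history also stores the starting value.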

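With eta = 0.8:
[Figure: the curve J(theta) with red markers jumping from one side of the minimum to the other while closing in on it, eta = 0.8]
Here theta overshoots the minimum on every step and bounces between the two sides, yet it still reaches the minimum. So the descent does not have to creep down from one side only.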
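
The back-and-forth behaviour can also be seen in numbers. This is a small sketch, assuming the same dJ as above and a fixed 10 updates (the fixed count is just for illustration), that prints the signed distance of theta from the minimum at 2.5:

```python
def dJ(theta):
    return 2 * (theta - 2.5)

eta = 0.8
theta = 0.0
offsets = [theta - 2.5]   # signed distance from the minimum at 2.5
for _ in range(10):
    theta = theta - eta * dJ(theta)
    offsets.append(theta - 2.5)

for offset in offsets:
    print(f"{offset:+.6f}")
```

The sign flips on every line while the magnitude keeps shrinking, which is the zigzag seen in the plot.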

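However, eta cannot be too large either. With eta = 1.1, every step overshoots further than the one before, so theta climbs up the curve instead of going down, and the run fails with an OverflowError, "Result too large". One way to deal with this is to catch the error inside J: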

def J(theta):
    try:
        return (theta - 2.5) ** 2 - 1.  # on success, return the value
    except OverflowError:
        return float('inf')  # on overflow, return positive infinity

Now running

eta = 1.1
theta_history = []
gradient_descent(0.,eta)

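loops forever: J(theta) - J(last_theta) becomes inf - inf, which is nan, and any comparison with nan is False, so the break is never reached. To fix this, add a parameter n_iters to gradient_descent that caps the number of iterations: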

def gradient_descent(initial_theta,eta,n_iters = 1e4,epsilon=1e-8):
    theta = initial_theta
    theta_history.append(initial_theta)
    i_iters = 0    # iteration counter, starts at 0
    
    while i_iters < n_iters:
        gradient = dJ(theta)
        last_theta = theta
        theta = theta - eta * gradient  
        theta_history.append(theta)   
        if(abs(J(theta) - J(last_theta)) < epsilon):
            break
        
        i_iters += 1

Run it again:

eta = 1.1
theta_history = []
gradient_descent(0.,eta)

Result:

len(theta_history)
10001

That is, the 10000 iterations plus the initial value, 10001 values in total.
The last value is now:

theta_history[-1]
nan
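
The nan comes from ordinary floating-point arithmetic. Here is a small standalone sketch in plain Python (no NumPy involved, and the values 1e200 and 1e308 are arbitrary choices for illustration) showing the pieces involved:

```python
import math

inf = float("inf")
diff = inf - inf
print(diff)                 # nan
print(abs(diff) < 1e-8)     # False, so a test like this never triggers a break
print(math.isnan(diff))     # True

# The ** operator on plain Python floats raises OverflowError when it overflows
x = 1e200
try:
    x ** 2
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)           # True

# Multiplication overflows silently to inf instead of raising,
# and inf - inf is nan, which is how theta itself can end up as nan
big = 1e308
big = big * 10
print(big)                  # inf
print(big - big)            # nan
```

Once theta is nan, every later update stays nan, which is why the last entry of theta_history is nan.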

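Plot of the path for eta = 1.1, with n_iters = 10:
[Figure: the curve J(theta) with red markers moving further up the curve on alternating sides with each step, eta = 1.1, n_iters = 10]

Summary: the right value of eta depends on the loss function, and in particular on how large its derivative is at the points visited. To be on the safe side, eta = 0.01 is a reasonable starting value for many functions. If something goes wrong, plot theta_history to see how theta is changing.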
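
Instead of waiting for the iteration cap, a run can also report divergence as soon as it shows up. The following is only a sketch of that idea and not the method used in this post: the function name gradient_descent_guarded, the status strings, and the rule "stop if J went up or is no longer finite" are all assumptions made for illustration.

```python
import math

def dJ(theta):
    return 2 * (theta - 2.5)

def J(theta):
    return (theta - 2.5) ** 2 - 1

def gradient_descent_guarded(initial_theta, eta, n_iters=10000, epsilon=1e-8):
    """Return (history, status); status is 'converged', 'diverged' or 'max_iters'."""
    theta = initial_theta
    history = [theta]
    for _ in range(n_iters):
        last_theta = theta
        theta = theta - eta * dJ(theta)
        history.append(theta)
        try:
            change = J(theta) - J(last_theta)
        except OverflowError:
            return history, "diverged"    # J overflowed
        if not math.isfinite(change) or change > 0:
            return history, "diverged"    # J went up or is no longer finite
        if abs(change) < epsilon:
            return history, "converged"
    return history, "max_iters"

for eta in (0.1, 0.8, 1.1):
    history, status = gradient_descent_guarded(0.0, eta)
    print(eta, status, len(history))
```

For this J the guard fires on the very first step when eta = 1.1, since J already increases there. On a loss function that is not a simple parabola, J can rise briefly even when the run would still converge, so this rule may be too strict in general.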
