Chapter 6: Gradient Descent Study Notes (Part 1), Using Gradient Descent in SLAM - CSDN Blog

Article link: https://blog.csdn.net/moonlightpeng/article/details/106546774

6-1 What Is Gradient Descent

6-2 Simulating Gradient Descent

6-3 Gradient Descent in Linear Regression

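6-1 What Is Gradient Descent

With the negative sign in front of the derivative term, J decreases.

In the figure the derivative is negative, so J increases toward the negative x direction, that is, as theta decreases. We want to find the minimum, so we should move in the direction in which J decreases. Adding the negative sign makes the update move that way.

The size of the move is the derivative multiplied by eta, the learning rate.

As long as the derivative is not 0, keep computing the derivative and moving in the direction opposite to it.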
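The steps above can be written as a single update rule. This is the standard one-dimensional form, given here as a reference sketch since the original figure with the formula is not reproduced in these notes:

```latex
\theta \leftarrow \theta - \eta \, \frac{dJ}{d\theta}
```

Here \eta (eta) is the learning rate and dJ/d\theta is the derivative of the loss at the current theta.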

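For a function of several variables, we take the derivative along every direction. The vector of these partial derivatives is the gradient.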
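In symbols, assuming theta has n components (the index n is introduced here only for illustration), the gradient and the corresponding update can be sketched as:

```latex
\nabla J(\theta) = \left( \frac{\partial J}{\partial \theta_1}, \ldots, \frac{\partial J}{\partial \theta_n} \right),
\qquad
\theta \leftarrow \theta - \eta \, \nabla J(\theta)
```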

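The same holds in the figure above: the derivative is positive, which points in the direction of increasing J. The negative sign turns the move toward decreasing J, so the formula stays the same.

Gradient descent is like simulating a ball rolling down a slope.

If eta is too large, each jump is too big and the loss function increases instead of decreasing.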
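A small numeric check of this effect, using the parabola from section 6-2. The value eta = 1.1 is an illustrative assumption chosen to be too large. For this particular J, each step multiplies the distance to the minimum by |1 - 2*eta| = 1.2, so the loss grows.

```python
def J(theta):
    # Loss function from section 6-2, minimum at theta = 2.5
    return (theta - 2.5) ** 2 - 1.0

def dJ(theta):
    # Derivative of J
    return 2 * (theta - 2.5)

eta = 1.1      # deliberately too large (illustrative value)
theta = 0.0
losses = [J(theta)]
for _ in range(5):
    theta = theta - eta * dJ(theta)
    losses.append(J(theta))

print(losses)  # every entry is larger than the one before it
```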

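Not every function has a unique extremum.

Gradient descent may therefore find only a local optimum, and which one it finds depends on the choice of the initial point.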
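To see the dependence on the starting point, here is a minimal sketch on a hypothetical non-convex function. The function f, its derivative df, the helper descend, and the values of eta and n_iters are all assumptions made for this illustration and do not come from the course material.

```python
def f(x):
    # Hypothetical function with two minima of different depth
    return x**4 - 3 * x**2 + x

def df(x):
    # Derivative of f
    return 4 * x**3 - 6 * x + 1

def descend(x, eta=0.01, n_iters=1000):
    # Plain gradient descent with a fixed number of steps
    for _ in range(n_iters):
        x = x - eta * df(x)
    return x

left = descend(-2.0)    # starts on the left slope
right = descend(2.0)    # starts on the right slope
print(left, f(left))
print(right, f(right))
```

Starting from -2.0 the run settles in the deeper minimum near x = -1.30, while starting from 2.0 it stops in the shallower one near x = 1.13. A common remedy is to run several times from different initial points and keep the best result.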

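In this section we do not need to rerun from several starting points to find a good one, because the example function has a single minimum.

6-2 Simulating Gradient Descent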

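def J(theta):
    # Loss function: a parabola with its minimum at theta = 2.5
    return (theta-2.5)**2 - 1.

def dJ(theta):
    # Derivative of J with respect to theta
    return 2*(theta-2.5)

eta = 0.1        # learning rate (assumed value; not given in these notes)
epsilon = 1e-8   # convergence threshold (assumed; same as the default used later)

theta = 0.0
while True:
    gradient = dJ(theta)
    last_theta = theta
    theta = theta - eta * gradient

    # Stop once the loss barely changes between two consecutive steps
    if abs(J(theta) - J(last_theta)) < epsilon:
        break

print(theta)
print(J(theta))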

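import numpy as np
import matplotlib.pyplot as plt

# x values for drawing the curve (assumed range; plot_x is not defined in these notes)
plot_x = np.linspace(-1., 6., 141)

theta = 0.0
theta_history = [theta]   # record every theta visited
while True:
    gradient = dJ(theta)
    last_theta = theta
    theta = theta - eta * gradient
    theta_history.append(theta)

    if abs(J(theta) - J(last_theta)) < epsilon:
        break

# Draw the loss curve, then mark the visited points in red
plt.plot(plot_x, J(plot_x))
plt.plot(np.array(theta_history), J(np.array(theta_history)), color="r", marker='+')
plt.show()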

# Global record of visited thetas; reset it to [] before each new run
theta_history = []

def gradient_descent(initial_theta, eta, epsilon=1e-8):
    theta = initial_theta
    theta_history.append(initial_theta)

    while True:
        gradient = dJ(theta)
        last_theta = theta
        theta = theta - eta * gradient
        theta_history.append(theta)

        # Converged: the loss changed by less than epsilon
        if abs(J(theta) - J(last_theta)) < epsilon:
            break

def plot_theta_history():
    # Loss curve plus the path taken by gradient descent
    plt.plot(plot_x, J(plot_x))
    plt.plot(np.array(theta_history), J(np.array(theta_history)), color="r", marker='+')
    plt.show()
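One way to use this loop is to compare how many steps different learning rates need. The sketch below is self-contained and skips plotting. The helper name run and the three eta values are assumptions for illustration. Unlike gradient_descent above, run returns its own history list instead of appending to a global one.

```python
def J(theta):
    return (theta - 2.5) ** 2 - 1.0

def dJ(theta):
    return 2 * (theta - 2.5)

def run(initial_theta, eta, epsilon=1e-8):
    # Same loop as gradient_descent, but with a local history list
    theta = initial_theta
    history = [theta]
    while True:
        last_theta = theta
        theta = theta - eta * dJ(theta)
        history.append(theta)
        if abs(J(theta) - J(last_theta)) < epsilon:
            break
    return history

for eta in (0.01, 0.1, 0.8):
    history = run(0.0, eta)
    # learning rate, number of recorded points, final theta
    print(eta, len(history), history[-1])
```

For this parabola a smaller eta needs more steps to reach the same threshold, and all three values end close to theta = 2.5.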

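If the learning rate is too large, theta grows without bound and computing J eventually overflows, so we introduce exception handling.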
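A minimal sketch of such exception handling, assuming the error to guard against is the OverflowError that Python raises when squaring a very large float. Returning infinity in that case is one possible choice, not necessarily the only one.

```python
def J(theta):
    try:
        return (theta - 2.5) ** 2 - 1.0
    except OverflowError:
        # Squaring a huge float overflows; report the loss as infinite
        return float("inf")

print(J(0.0))     # 5.25
print(J(1e200))   # inf
```

This guard applies to plain Python floats. With NumPy arrays an overflow typically produces inf together with a warning rather than an exception.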

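But then the program falls into an infinite loop: once J returns inf or nan, the difference between two consecutive losses is never smaller than epsilon, so the stopping condition is never met. The fix is to add a limit on the number of iterations.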

def gradient_descent(initial_theta, eta, n_iters = 1e4, epsilon=1e-8):
    # n_iters caps the number of update steps so the loop always terminates
    theta = initial_theta
    i_iter = 0
    theta_history.append(initial_theta)

    while i_iter < n_iters:
        gradient = dJ(theta)
        last_theta = theta
        theta = theta - eta * gradient
        theta_history.append(theta)

        # Converged: the loss changed by less than epsilon
        if abs(J(theta) - J(last_theta)) < epsilon:
            break

        i_iter += 1

    return
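The effect of the iteration cap can be checked with a self-contained sketch. It is a variant of the function above, not the original: it returns a local history list (an assumption made so the example does not depend on a global), and the calls use illustrative values for eta and n_iters.

```python
def J(theta):
    try:
        return (theta - 2.5) ** 2 - 1.0
    except OverflowError:
        return float("inf")

def dJ(theta):
    return 2 * (theta - 2.5)

def gradient_descent(initial_theta, eta, n_iters=10_000, epsilon=1e-8):
    theta = initial_theta
    history = [theta]
    for _ in range(n_iters):          # hard cap on the number of steps
        last_theta = theta
        theta = theta - eta * dJ(theta)
        history.append(theta)
        if abs(J(theta) - J(last_theta)) < epsilon:
            break                     # converged before reaching the cap
    return history

converged = gradient_descent(0.0, eta=0.1)
diverged = gradient_descent(0.0, eta=1.1, n_iters=10)
print(len(converged), converged[-1])   # stops early, close to 2.5
print(len(diverged), diverged[-1])     # stops only because of the cap
```

With eta = 1.1 the run never converges, yet it still returns after 10 steps, so the history can be plotted to inspect how the values move away from the minimum.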