Gradient Descent Algorithm and the Taylor Expansion

Latest recommended article published on 2024-07-21 17:05:22

weixin_39583521

Article tags: gradient descent algorithm

Article link: https://blog.csdn.net/weixin_39583521/article/details/111391705

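This article walks through implementations of gradient descent, covering the batch, stochastic, and mini-batch variants, and discusses the role of the Taylor expansion in machine learning, particularly in function approximation and gradient iteration.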

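The previous post, 《机器学习1》 (Machine Learning 1), introduced the gradient descent algorithm and listed its algebraic expression. Let's look at the algebraic implementation.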
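The expression itself is not reproduced in this copy of the post. Judging from the code that follows, it is presumably the mean-squared-error gradient and update rule below. The symbols are assumptions chosen to match the code's `X_b`, `theta`, `eta`, and `m`.

```latex
\mathrm{MSE}(\theta) = \frac{1}{m}\,\lVert X_b\theta - y \rVert^2,
\qquad
\nabla_\theta\,\mathrm{MSE}(\theta) = \frac{2}{m}\,X_b^{\top}\,(X_b\theta - y),
\qquad
\theta \leftarrow \theta - \eta\,\nabla_\theta\,\mathrm{MSE}(\theta)
```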

Next we move it into Python 3 and implement gradient descent in code.

import numpy as np
import matplotlib.pyplot as plt

X=2*np.random.rand(100,1)   # generate training data (features)
y=4+3*X+np.random.randn(100,1)   # generate training data (labels)
plt.plot(X,y,'b.')  # scatter plot of the data
plt.xlabel('$x_1$',fontsize=18)
plt.ylabel('$y$',rotation=0,fontsize=18)
plt.axis([0,2,0,15])
plt.savefig('generated_data_plot.png')  # save the figure
plt.show()

1. Batch gradient descent

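Building on the code above, we still need to add a new feature and create the test data, then set the step size, the number of samples, and the number of iterations.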
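The setup code for this step is not shown in the post. A minimal sketch of what it might look like follows, assuming that the "new feature" is a constant column x0 = 1 (so that `theta[0]` acts as the intercept) and that the test data are the two endpoints of the plotted x range. The names `X_b`, `X_new`, and `X_new_b` are taken from the function below; the seed and the value of `eta` are illustrative assumptions.

```python
import numpy as np

# Same synthetic data as above, regenerated so this block runs on its own.
np.random.seed(42)  # assumption: fixed seed for repeatability
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Assumed "new feature": a column of ones prepended to X (bias term x0 = 1).
X_b = np.c_[np.ones((100, 1)), X]

# Assumed test data: the two ends of the x axis, enough to draw a fitted line.
X_new = np.array([[0], [2]])
X_new_b = np.c_[np.ones((2, 1)), X_new]

eta = 0.1            # step size (illustrative value)
m = len(X_b)         # number of samples
n_iterations = 1000  # number of iterations, as used in the function below
```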

def plot_gradient_descent(theta,eta,theta_path=None):
    # expects X, y, X_b, X_new and X_new_b to be defined beforehand
    m=len(X_b)            # number of samples
    plt.plot(X,y,'b.')
    n_iterations=1000     # number of iterations
    for iteration in range(n_iterations):
        if iteration <10:    # draw the fitted line for the first 10 iterations
            y_predict=X_new_b.dot(theta)
            style='b-'
            plt.plot(X_new,y_predict,style)
        gradients=2/m * X_b.T.dot(X_b.dot(theta)-y)   # gradient of the MSE over the full data set
        theta = theta-eta * gradients                  # gradient step with step size eta
        if theta_path is not None:
            theta_path.append(theta)
    plt.xlabel('$x_1$',fontsize=18)
    plt.axis([0,2,0,15])
    plt.title(r'$\eta={}$'.format(eta),fontsize=16)

Fitting with three different step sizes:
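The three step sizes and the resulting plots are not shown in this copy of the post. As a rough, plot-free illustration of how the step size affects batch gradient descent, the sketch below records the mean squared error per iteration. The values 0.02, 0.1, and 0.5, the zero starting point, the seed, and the helper name `batch_gd_mse_history` are all assumptions made for this example.

```python
import numpy as np

np.random.seed(42)  # assumption: fixed seed for repeatability
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]  # assumed bias column x0 = 1
m = len(X_b)

def batch_gd_mse_history(eta, n_iterations=1000):
    """Run batch gradient descent from theta = 0 and return the MSE after each step."""
    theta = np.zeros((2, 1))  # fixed start so the step sizes are comparable
    history = []
    for _ in range(n_iterations):
        gradients = 2 / m * X_b.T.dot(X_b.dot(theta) - y)
        theta = theta - eta * gradients
        history.append(float(np.mean((X_b.dot(theta) - y) ** 2)))
    return history

# Hypothetical step sizes: small, moderate, large.
histories = {eta: batch_gd_mse_history(eta) for eta in (0.02, 0.1, 0.5)}
for eta, h in histories.items():
    print(f"eta={eta}: MSE after 10 steps = {h[9]:.3g}, after 1000 steps = {h[-1]:.3g}")
```

With data generated this way, a small step size tends to approach the minimum slowly, a moderate one settles quickly, and a step size that is too large can make the error grow instead of shrink.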

2. Stochastic gradient descent

n_epochs=50
m=len(X_b)           # number of samples
theta_path_sgd=[]    # records theta after every update
eta=0.1              # fixed step size

theta=np.random.randn(2,1)     # random initialization

for epoch in range(n_epochs):
    for i in range(m):
        if epoch==0 and i<20:      # draw the fitted line for the first 20 updates
            y_predict=X_new_b.dot(theta)
            style='b-'
            plt.plot(X_new,y_predict,style)
        random_index=np.random.randint(m)     # pick one sample at random
        xi=X_b[random_index:random_index+1]
        yi=y[random_index:random_index+1]
        gradients=2* xi.T.dot(xi.dot(theta)-yi)   # gradient from that single sample
        theta=theta-eta*gradients
        theta_path_sgd.append(theta)

plt.plot(X,y,'b.')
plt.xlabel('$x_1$',fontsize=18)
plt.ylabel('$y$',rotation=0,fontsize=18)
plt.axis([0,2,0,15])
plt.savefig('sgd_plot.png')   # save the figure
plt.show()
print(theta)

3. Mini-batch gradient descent

theta_path_mgd=[]    # records theta after every update

n_iterations=50      # number of passes over the data (epochs)
minibatch_size=20
m=len(X_b)           # number of samples
eta=0.1              # fixed step size

np.random.seed(42)
theta=np.random.randn(2,1)     # random initialization

for epoch in range(n_iterations):
    shuffled_indices=np.random.permutation(m)   # reshuffle the data every epoch
    X_b_shuffled=X_b[shuffled_indices]
    y_shuffled=y[shuffled_indices]
    for i in range(0,m,minibatch_size):
        xi=X_b_shuffled[i:i+minibatch_size]
        yi=y_shuffled[i:i+minibatch_size]
        gradients=2/minibatch_size * xi.T.dot(xi.dot(theta)-yi)   # gradient over one mini-batch
        theta=theta-eta*gradients
        theta_path_mgd.append(theta)
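One way to sanity-check a loop like the one above is to compare its result with an ordinary least-squares fit of the same data. The sketch below is self-contained and rests on several assumptions: the data generation and the bias column repeat the earlier setup, the helper name `minibatch_gd` is hypothetical, and `np.linalg.lstsq` is used only as a reference answer, not as part of the post's method.

```python
import numpy as np

np.random.seed(42)  # assumption: fixed seed for repeatability
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]  # assumed bias column x0 = 1
m = len(X_b)

def minibatch_gd(batch_size, eta=0.1, n_epochs=50):
    """Mini-batch gradient descent on the MSE loss; returns the final theta."""
    theta = np.random.randn(2, 1)  # random initialization
    for _ in range(n_epochs):
        idx = np.random.permutation(m)  # reshuffle every epoch
        for i in range(0, m, batch_size):
            xi = X_b[idx[i:i + batch_size]]
            yi = y[idx[i:i + batch_size]]
            gradients = 2 / len(xi) * xi.T.dot(xi.dot(theta) - yi)
            theta = theta - eta * gradients
    return theta

theta_mgd = minibatch_gd(batch_size=20)
theta_ls = np.linalg.lstsq(X_b, y, rcond=None)[0]  # least-squares reference

print("mini-batch GD:", theta_mgd.ravel())
print("least squares:", theta_ls.ravel())
```

Because the step size is fixed, the mini-batch estimate keeps jittering around the least-squares solution rather than landing on it exactly. In this sketch, setting `batch_size` to `m` or to 1 would give batch-style or stochastic-style updates respectively.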

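Taylor's formula (Taylor expansion)

Taylor's formula describes the values of a function near a point using information about the function at that point. If the function is smooth enough and its derivatives of every order at a point are known, Taylor's formula uses those derivative values as coefficients to build a polynomial that approximates the function in a neighborhood of that point.

So what is Taylor's formula used for?

Put simply, it approximates a given function with a polynomial (that is, it makes the polynomial's graph fit the given function's graph as closely as possible). Note that the approximation is always expanded around a specific point on the function's graph. If a function is very complicated and its value at some point cannot be computed directly, Taylor's formula can be used to approximate that value; this is one of its applications. In machine learning, Taylor's formula is mainly used in gradient iteration: a gradient descent step follows from the first-order expansion of the loss function.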
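As a small numerical illustration of approximating a function with a polynomial expanded at one point, the sketch below builds Taylor polynomials of the exponential function. The choice of `exp`, the expansion point 0, the evaluation point 0.5, and the helper name `taylor_exp` are assumptions made for this example only.

```python
import math

def taylor_exp(x, a=0.0, order=4):
    """Taylor polynomial of exp of the given order, expanded at a, evaluated at x."""
    # Every derivative of exp at a equals exp(a), so the k-th coefficient is exp(a) / k!.
    return sum(math.exp(a) * (x - a) ** k / math.factorial(k) for k in range(order + 1))

x = 0.5
for order in (1, 2, 4, 8):
    approx = taylor_exp(x, order=order)
    error = abs(approx - math.exp(x))
    print(f"order {order}: approximation = {approx:.8f}, error = {error:.2e}")
```

Close to the expansion point, each additional term shrinks the error. The order-1 case is the linear approximation that gradient-based updates rely on.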

1. Taylor expansion in one variable