Gradient Descent: A From-Scratch Implementation

Author: 争取不掉头发的我

Category: 底层实现 (from-scratch implementations). Tags: gradient descent, machine learning

Article link: https://blog.csdn.net/weixin_43698739/article/details/99870544

Gradient descent

First, what is a gradient, and what is gradient descent?

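In calculus, taking the partial derivative of a multivariable function with respect to each of its variables and writing the results as a vector gives the gradient. For a function f(x, y), the gradient is the vector (∂f/∂x, ∂f/∂y)^T, written grad f(x, y) or ∇f(x, y). Its value at a specific point (x0, y0) is (∂f/∂x, ∂f/∂y)^T evaluated at (x0, y0), written ∇f(x0, y0). With three variables the gradient is (∂f/∂x, ∂f/∂y, ∂f/∂z)^T, and so on.

What does the gradient vector mean? Geometrically, it points in the direction in which the function increases fastest. Concretely, for f(x, y) at the point (x0, y0), the direction of ∇f(x0, y0) is the direction in which f increases fastest, so moving along the gradient is the quickest way toward larger values and, eventually, a (local) maximum. Conversely, the opposite direction, −∇f(x0, y0), is the direction in which f decreases fastest, so moving against the gradient is the quickest way toward a (local) minimum.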
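To make the definition concrete, here is a minimal sketch that approximates a gradient with central differences and compares it with the hand-derived one. The function f(x, y) = x^2 + 3y^2 and the helper names `analytic_gradient` and `numerical_gradient` are assumptions chosen for illustration; they do not appear in the original post.

```python
import numpy as np

def f(p):
    # Illustrative function (an assumption, not from the post): f(x, y) = x^2 + 3*y^2
    x, y = p
    return x**2 + 3 * y**2

def analytic_gradient(p):
    # Partial derivatives worked out by hand: (2x, 6y)
    x, y = p
    return np.array([2 * x, 6 * y])

def numerical_gradient(func, p, h=1e-6):
    # Central-difference approximation of each partial derivative
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = h
        grad[i] = (func(p + step) - func(p - step)) / (2 * h)
    return grad

point = np.array([1.0, 2.0])
g_exact = analytic_gradient(point)
g_approx = numerical_gradient(f, point)
print(g_exact, g_approx)

# A small step against the gradient lowers f; a small step along it raises f
small = 1e-3
f_down = f(point - small * g_exact)
f_up = f(point + small * g_exact)
print(f_down < f(point) < f_up)
```

The two printed vectors should agree closely, and the final comparison illustrates why descending means stepping against the gradient.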

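In short: [figure missing from the source]
As the figure shows, our goal is to make the loss function smaller.
Working that through gives the update rule that the code below implements: theta = theta - eta * dJ/dtheta, where eta is the step size (learning rate).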

Now for the code:

Simple gradient descent

import numpy as np
import matplotlib.pyplot as plt

plot_x=np.linspace(-1,6,141)
plot_y=(plot_x-2.5)**2-1

plt.plot(plot_x,plot_y)
plt.show()

This plots a simple quadratic function of one variable, y = (x - 2.5)^2 - 1.

Below are its derivative and its value:

def dJ(theta):
    return 2*(theta-2.5)

def J(theta):
    return (theta-2.5)**2-1.

Now implement the gradient descent loop, which is simply the update formula above written as code:

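theta=0.0
# step size
eta=0.1
# convergence tolerance: stop once J changes by less than this between steps
epsilon=1e-8

while True:
    gradient=dJ(theta)
    lastTheta=theta
    # move against the derivative
    theta=theta-eta*gradient

    if abs(J(theta)-J(lastTheta))<epsilon:
        break

print(theta)
print(J(theta))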
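For this particular J, the loop can be checked against a closed form. Substituting dJ(theta) = 2*(theta - 2.5) into the update gives theta_new - 2.5 = (1 - 2*eta) * (theta - 2.5), so every step multiplies the distance to the minimum at 2.5 by the same factor. The sketch below compares a fixed number of updates with that formula; the step count `n_steps` is an arbitrary value chosen for illustration.

```python
def dJ(theta):
    # Derivative of J(theta) = (theta - 2.5)**2 - 1, as in the post
    return 2 * (theta - 2.5)

eta = 0.1
theta0 = 0.0
n_steps = 20  # illustrative step count (an assumption, not from the post)

# Apply the update rule a fixed number of times
theta = theta0
for _ in range(n_steps):
    theta = theta - eta * dJ(theta)

# Closed form: each step multiplies the distance to 2.5 by (1 - 2*eta)
theta_closed = 2.5 + (1 - 2 * eta) ** n_steps * (theta0 - 2.5)

print(theta, theta_closed)
```

The same factor explains the role of the step size for this function: the distance shrinks only when |1 - 2*eta| < 1, that is, for 0 < eta < 1.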

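If that is unclear, look at the plot produced below. All we do is compute the derivative and keep stepping so that it shrinks toward zero, which eventually brings us to an extremum, i.e. the optimal solution.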

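theta=0.0
# learning rate
eta=0.1
# convergence tolerance: stop once J changes by less than this between steps
epsilon=1e-8
# record every theta visited so the path can be plotted
theta_history=[theta]

while True:
    gradient=dJ(theta)
    lastTheta=theta
    theta=theta-eta*gradient
    theta_history.append(theta)
    if abs(J(theta)-J(lastTheta))<epsilon:
        break

# draw the curve, then the visited points on top of it in red
plt.plot(plot_x,plot_y)
plt.plot(np.array(theta_history),J(np.array(theta_history)),color='r',marker='+')
plt.show()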

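[Figure: the parabola with the visited theta values marked in red]
That completes simple gradient descent. Run the code below to get a clearer picture of how the step size affects it.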

# wrap the loop above in a function
def gradient_descent_2(theta,eta,n_iters=1e4,epsilon=1e-8):
    '''
    n_iters: maximum number of iterations
    '''
    theta_history=[theta]
    i_iters=0

    while i_iters<n_iters:
        gradient=dJ(theta)
        lastTheta=theta
        theta=theta-eta*gradient
        theta_history.append(theta)

        if abs(J(theta)-J(lastTheta))<epsilon:
            break
        i_iters+=1
    # plot the curve and the path taken
    plt.plot(plot_x,plot_y)
    plt.plot(np.array(theta_history),J(np.array(theta_history)),color='r',marker='+')
    plt.show()


# small step size
eta=0.01
gradient_descent_2(theta=0.0,eta=eta)

# large step size
eta=0.8
gradient_descent_2(theta=0.0,eta=eta)

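The code above makes the effect of the step size visible: with eta=0.01 the descent takes many small steps, while with eta=0.8 each step overshoots the minimum and theta oscillates around 2.5 before settling.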

eta=1.1
gradient_descent_2(theta=0.0,eta=eta,n_iters=10)
# The values blow up: the learning rate matters a great deal in gradient descent.
# As a rule of thumb, I find that eta = 0.01 works for most functions.
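The effect of the learning rate can also be measured without plots. The following is a sketch under stated assumptions: `run_descent` is a hypothetical plot-free variant of the loop in this post that also returns the number of updates, and the list of eta values is chosen only for illustration.

```python
def J(theta):
    # Same loss as in the post
    return (theta - 2.5) ** 2 - 1.0

def dJ(theta):
    # Same derivative as in the post
    return 2 * (theta - 2.5)

def run_descent(eta, theta=0.0, n_iters=10_000, epsilon=1e-8):
    # Hypothetical plot-free variant of the loop in the post.
    # Returns the final theta and the number of updates performed.
    for i in range(n_iters):
        last_theta = theta
        theta = theta - eta * dJ(theta)
        if abs(J(theta) - J(last_theta)) < epsilon:
            return theta, i + 1
    return theta, n_iters

results = {}
for eta in (0.01, 0.1, 0.8):
    results[eta] = run_descent(eta)
    print(eta, results[eta])

# With eta = 1.1 each update multiplies the distance to 2.5 by -1.2,
# so theta moves further away on every step instead of converging
theta_diverged, _ = run_descent(1.1, n_iters=10)
print(theta_diverged)
```

Smaller values of eta reach the same minimum but need more updates, which is the trade-off behind picking a learning rate.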

You now know simple gradient descent.
Next, gradient descent on a real problem:

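Batch gradient descent: multiple linear regression

The theory is the same as above, but the parameter theta is now a vector rather than a single number.

Create an example dataset: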

import numpy as np
import matplotlib.pyplot as plt

x = 2 * np.random.random(size=100)
y = x * 3. + 4. +np.random.normal(size=100)

X=x.reshape(-1,1)
X.shape
plt.scatter(x,y)
plt.show()

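Look carefully at the formula above again.
Rewritten for linear regression: [figure missing from the source]
After some linear algebra, the loss and its gradient take the vectorized form used in the code below, with m the number of samples: J(theta) = (1/m) * sum((y - X_b·theta)^2) and grad J(theta) = (2/m) * X_b^T (X_b·theta - y).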

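def J(theta, x_b ,y):
    # mean squared error; fall back to infinity if the computation fails
    try:
        return np.sum((y - x_b.dot(theta))**2 )/len(x_b)
    except Exception:
        return float('inf')


def DJ(theta,x_b,y):
    # gradient of J with respect to theta, in vectorized form
    return x_b.T.dot(x_b.dot(theta)-y)*2./len(x_b)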
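To see where the one-line gradient comes from, it can be compared with the unvectorized definition, in which the j-th partial derivative is (2/m) times the sum over samples of (x_b[i]·theta - y[i]) * x_b[i, j]. The sketch below is illustrative: `DJ_loop`, the fixed seed, the sample size, and the test value of theta are assumptions, not part of the original post.

```python
import numpy as np

def DJ(theta, x_b, y):
    # Vectorized gradient from the post
    return x_b.T.dot(x_b.dot(theta) - y) * 2. / len(x_b)

def DJ_loop(theta, x_b, y):
    # Hypothetical reference version: accumulate one partial derivative at a time
    m, n = x_b.shape
    grad = np.zeros(n)
    for j in range(n):
        for i in range(m):
            residual = x_b[i].dot(theta) - y[i]
            grad[j] += residual * x_b[i, j]
    return grad * 2. / m

rng = np.random.default_rng(0)  # fixed seed, an assumption for repeatability
x = 2 * rng.random(50)
y = 3. * x + 4. + rng.normal(size=50)
x_b = np.hstack([np.ones((len(x), 1)), x.reshape(-1, 1)])
theta = np.array([0.5, -1.0])  # arbitrary point at which to compare

g_vec = DJ(theta, x_b, y)
g_loop = DJ_loop(theta, x_b, y)
print(g_vec, g_loop)
```

Both versions compute the same numbers; the matrix form simply lets NumPy do the two nested loops in compiled code.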

Implementation:

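def gradient_descent_2(x_b,y,theta,eta,n_iters=1e4,epsilon=1e-8):
    '''
    n_iters: maximum number of iterations
    '''
    i_iters=0

    while i_iters<n_iters:
        # gradient over the whole dataset (this is what makes it "batch")
        gradient=DJ(theta,x_b,y)
        lastTheta=theta
        theta=theta-eta*gradient

        # stop once the loss changes by less than epsilon
        if abs(J(theta,x_b,y)-J(lastTheta,x_b,y))<epsilon:
            break
        i_iters+=1
    return theta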

x_b=np.hstack([np.ones((len(X),1)),X])
initial_theta=np.zeros(x_b.shape[1])
eta=0.01

theta=gradient_descent_2(x_b,y,initial_theta,eta)
theta
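One way to sanity-check the result is to compare it with a direct least-squares solve of the same problem. This is a minimal sketch, not part of the original post: it restates J and DJ so that it runs on its own, uses a hypothetical `gradient_descent` helper equivalent to the loop above, fixes a random seed for repeatability, and relies on NumPy's `np.linalg.lstsq` for the reference answer.

```python
import numpy as np

def J(theta, x_b, y):
    # Mean squared error, as in the post
    return np.sum((y - x_b.dot(theta)) ** 2) / len(x_b)

def DJ(theta, x_b, y):
    # Vectorized gradient, as in the post
    return x_b.T.dot(x_b.dot(theta) - y) * 2. / len(x_b)

def gradient_descent(x_b, y, theta, eta, n_iters=10_000, epsilon=1e-8):
    # Hypothetical helper equivalent to the batch loop in the post
    for _ in range(n_iters):
        last_theta = theta
        theta = theta - eta * DJ(theta, x_b, y)
        if abs(J(theta, x_b, y) - J(last_theta, x_b, y)) < epsilon:
            break
    return theta

rng = np.random.default_rng(42)  # fixed seed, an assumption for repeatability
x = 2 * rng.random(100)
y = 3. * x + 4. + rng.normal(size=100)
x_b = np.hstack([np.ones((len(x), 1)), x.reshape(-1, 1)])

theta_gd = gradient_descent(x_b, y, np.zeros(x_b.shape[1]), eta=0.01)

# Reference: direct least-squares solution of x_b @ theta ≈ y
theta_ls, *_ = np.linalg.lstsq(x_b, y, rcond=None)
print(theta_gd, theta_ls)
```

The two estimates should be close to each other; both approximate the intercept 4 and slope 3 used to generate the data, up to the added noise.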

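This batch approach is accurate, because every step uses the gradient computed over the whole dataset, but for the same reason each iteration is computationally expensive.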
