Stochastic Gradient Descent

Latest recommended article published on 2024-07-31 14:47:45

_卷心菜_

Category: Machine Learning. Tags: Machine Learning

Article link: https://blog.csdn.net/Thumb_/article/details/110739939

Batch Gradient Descent

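In the batch gradient expression (its image is not preserved here), pick any single value of i; this gives the single-sample form:
[Formula image not preserved in the source]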
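
As a hedged reconstruction of the missing formulas, inferred from the dJ and dJ_sgd functions in the code further down: the symbols X_b, theta and y follow that code, while the notation X_b^{(i)} for row i of X_b is an assumption. The batch gradient of the mean squared error and its single-sample counterpart can then be written as:

```latex
% Batch gradient over all m samples (matches dJ in the code)
\[
\nabla J(\theta)
  = \frac{2}{m}\, X_b^{T}\,(X_b\theta - y)
  = \frac{2}{m} \sum_{i=1}^{m} \bigl(X_b^{(i)}\theta - y^{(i)}\bigr)\,\bigl(X_b^{(i)}\bigr)^{T}
\]

% Fixing a single index i and dropping the average (matches dJ_sgd in the code)
\[
2\,\bigl(X_b^{(i)}\theta - y^{(i)}\bigr)\,\bigl(X_b^{(i)}\bigr)^{T}
\]
```

The second expression is a search direction rather than the true gradient, since it depends on which sample i was drawn.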
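The path taken by stochastic gradient descent can be pictured as in the figure below.
[Figure not preserved in the source]

As the figure shows, stochastic gradient descent cannot guarantee that every step moves in a direction that decreases the loss, let alone in the direction of steepest descent. The choice of the learning rate eta therefore becomes very important. Borrowing the idea of simulated annealing, we let it decay with the iteration count t:

eta = t0 / (t + t1)

where t0 and t1 are constants (the code below uses t0 = 5 and t1 = 50).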
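
A minimal sketch of such a decaying schedule, assuming the form eta = t0 / (t + t1) with t0 = 5 and t1 = 50 as in the sgd code below. The helper name learning_rate mirrors that code, and the sampled iteration numbers are chosen only for illustration:

```python
def learning_rate(t, t0=5, t1=50):
    """Decaying step size: larger steps early on, smaller steps later."""
    return t0 / (t + t1)

# Sample the schedule at a few iteration counts (illustrative values).
schedule = {t: learning_rate(t) for t in (0, 50, 950, 99950)}
for t, eta in schedule.items():
    print(f"t={t:>6d}  eta={eta:.6f}")
```

The constant t1 keeps the first steps from being too large (without it, eta would be undefined at t = 0), while t0 sets the overall scale.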

Code implementation

First, solve the problem with the earlier method, batch gradient descent:

import numpy as np
import matplotlib.pyplot as plt

m = 100000

x = np.random.normal(size=m)
X = x.reshape(-1,1)
y = 4. * x + 3. + np.random.normal(0,3,size=m)

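def J(theta, X_b, y):
    # Mean squared error of the linear model X_b.dot(theta)
    try:
        return np.sum((y - X_b.dot(theta)) ** 2) / len(y)
    except Exception:
        # Treat a failed computation as an infinite loss
        return float('inf')

def dJ(theta, X_b, y):
    # Gradient of J, computed over all m samples
    return X_b.T.dot(X_b.dot(theta) - y) * 2. / len(X_b)

def gradient_descent(X_b, y, initial_theta, eta, n_iters=1e4, epsilon=1e-8):
    theta = initial_theta
    cur_iter = 0    # iteration counter, starts at 0

    while cur_iter < n_iters:
        gradient = dJ(theta, X_b, y)
        last_theta = theta
        theta = theta - eta * gradient
        # Stop once the loss changes by less than epsilon
        if abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon:
            break

        cur_iter += 1

    return theta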

%%time
X_b = np.hstack([np.ones((len(X),1)),X])   # build the X_b matrix (prepend a column of ones)
initial_theta = np.zeros(X_b.shape[1])
eta = 0.01
theta = gradient_descent(X_b,y,initial_theta,eta)
print(theta)

Result:

Wall time: 3 s
array([2.98848846, 3.99278167])

That is, with m = 100000 samples, the computation takes about 3 s.

Now apply stochastic gradient descent to the same data, modifying the corresponding functions:

def dJ_sgd(theta, X_b_i, y_i):
    # Gradient of the squared error for a single sample (row X_b_i, target y_i)
    return X_b_i.T.dot(X_b_i.dot(theta) - y_i) * 2.

def sgd(X_b, y, initial_theta, n_iters):
    # Constants of the decaying learning-rate schedule
    t0 = 5
    t1 = 50

    def learning_rate(t):
        return t0 / (t + t1)

    theta = initial_theta
    for cur_iter in range(n_iters):
        # Pick one sample at random (with replacement)
        rand_i = np.random.randint(len(X_b))
        gradient = dJ_sgd(theta, X_b[rand_i], y[rand_i])
        theta = theta - learning_rate(cur_iter) * gradient

    return theta

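%%time
X_b = np.hstack([np.ones((len(X),1)),X])   # build the X_b matrix (prepend a column of ones)
initial_theta = np.zeros(X_b.shape[1])
# len(X_b)//3 floor-divides the sample count; len(X_b//3) would still equal len(X_b)
theta = sgd(X_b,y,initial_theta,n_iters=len(X_b)//3)   # // is floor division
print(theta)

Result:

Wall time: 1.12 s
array([2.99102922, 3.96841072])

With n_iters=len(X_b)//3, that is, a number of single-sample updates equal to only one third of the dataset size, stochastic gradient descent already recovers an intercept and slope close to the true values 3 and 4, and the computation takes only 1.12 s.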
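
One way to see why noisy single-sample steps still make progress: averaged over all samples, the single-sample gradient equals the batch gradient. The sketch below checks this numerically. The smaller m, the fixed seed and the comparison point theta are assumptions chosen for illustration, while dJ and dJ_sgd repeat the definitions used above.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, illustrative only
m = 1000                                # smaller than the article's m, for speed
x = rng.normal(size=m)
y = 4.0 * x + 3.0 + rng.normal(0, 3, size=m)
X_b = np.hstack([np.ones((m, 1)), x.reshape(-1, 1)])

def dJ(theta, X_b, y):
    """Batch gradient of the mean squared error."""
    return X_b.T.dot(X_b.dot(theta) - y) * 2.0 / len(X_b)

def dJ_sgd(theta, X_b_i, y_i):
    """Gradient of the squared error of a single sample."""
    return X_b_i.T.dot(X_b_i.dot(theta) - y_i) * 2.0

theta = np.array([1.0, 1.0])            # arbitrary point at which to compare
per_sample = np.array([dJ_sgd(theta, X_b[i], y[i]) for i in range(m)])
mean_of_samples = per_sample.mean(axis=0)
batch = dJ(theta, X_b, y)

# The two agree up to floating-point rounding.
print(np.allclose(mean_of_samples, batch))
```

Each individual row of per_sample can point far away from batch, which is why the decaying learning rate matters: it damps the effect of any single noisy step.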
