Stochastic Gradient Descent - CSDN Blog

Article link: https://blog.csdn.net/Bonjour_h/article/details/116794738

Batch Gradient Descent
Every step of this method computes over all of the samples at once, so when m (the number of samples) is very large, each step is very time-consuming.
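
As a sketch consistent with the `J` and `dJ` functions in the code that follows, the mean-squared-error loss and its gradient can be written as below, where $X_b$ is the sample matrix with a leading column of ones and $X_b^{(i)}$ is its $i$-th row:

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)} - X_b^{(i)}\theta\right)^2, \qquad \nabla J(\theta) = \frac{2}{m}\,X_b^{T}\left(X_b\theta - y\right)$$

Every evaluation of $\nabla J$ touches all $m$ rows of $X_b$, which is why the cost of one step grows with the sample size.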

Batch gradient descent
import numpy as np
import matplotlib.pyplot as plt

m = 100000  # number of samples

# Synthetic data: y = 4x + 3 plus Gaussian noise with standard deviation 3
x = np.random.normal(size=m)
X = x.reshape(-1,1)
y = 4.*x + 3. + np.random.normal(0,3,size=m)

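def J(theta,X_b,y):
    # Mean squared error of the linear model X_b.dot(theta) against y
    try:
        return np.sum((y - X_b.dot(theta)) ** 2) / len(y)
    except Exception:
        # Treat a failed evaluation as an infinitely bad loss
        return float('inf')

def dJ(theta,X_b,y):
    # Gradient of J with respect to theta, computed from all samples
    return X_b.T.dot(X_b.dot(theta) - y) * 2. / len(y)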

def gradient_descent(X_b,y,initial_theta,eta,n_iters=1e4,epsilon=1e-8):
    theta = initial_theta
    cur_iter = 0
    while cur_iter < n_iters:
        # Full gradient over all m samples
        gradient = dJ(theta,X_b,y)
        last_theta = theta
        theta = theta - eta * gradient
        # Stop once the loss changes by less than epsilon between steps
        if abs(J(theta,X_b,y) - J(last_theta,X_b,y)) < epsilon:
            break

        cur_iter += 1
    return theta

%%time
X_b = np.hstack([np.ones((len(X),1)),X])
initial_theta = np.zeros(X_b.shape[1])
eta = 0.01
theta = gradient_descent(X_b,y,initial_theta,eta)
Output: Wall time: 2.11 s

theta
Output:
array([2.99768835, 3.97898632])

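Stochastic Gradient Descent
Each step uses the gradient of a single randomly chosen sample. The learning rate $\eta$ decreases as the number of iterations of the descent loop grows.

If $\eta$ is simply the reciprocal of the iteration count, it drops by 50% when the count goes from 1 to 2 but barely changes when the count goes from 10000 to 10001. The early and late decreases differ too much, so a constant is added to the denominator.
A numerator fixed at 1 sometimes does not give the desired effect either, so the numerator is also set to a constant, and $\eta$ becomes:
$$\eta = \frac{t_0}{t + t_1}$$
where $t$ is the current iteration count and $t_0$, $t_1$ are constants, named as in the `learning_rate` function in the code that follows.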
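
To see why the constants matter, the short sketch below compares how quickly the learning rate shrinks under two schedules: the plain reciprocal $1/t$ and the shifted form $t_0/(t+t_1)$. The helper names `inverse_rate`, `shifted_rate`, and `relative_drop` are made up for this illustration; the defaults 5 and 50 are the values used in the `sgd` code in the next section.

```python
def inverse_rate(t):
    """Plain reciprocal schedule: eta = 1 / t, with t starting at 1."""
    return 1.0 / t


def shifted_rate(t, t0=5.0, t1=50.0):
    """Schedule with a constant in numerator and denominator: eta = t0 / (t + t1)."""
    return t0 / (t + t1)


def relative_drop(rate, t):
    """Fraction by which the learning rate shrinks from step t to step t + 1."""
    return 1.0 - rate(t + 1) / rate(t)


# Early in training the plain reciprocal halves the rate in a single step,
# while the shifted schedule only loses about 2% of it.
early_inverse = relative_drop(inverse_rate, 1)
early_shifted = relative_drop(shifted_rate, 1)

# Late in training both schedules change very little per step.
late_inverse = relative_drop(inverse_rate, 10000)
late_shifted = relative_drop(shifted_rate, 10000)

print("step 1 -> 2:         ", early_inverse, early_shifted)
print("step 10000 -> 10001: ", late_inverse, late_shifted)
```

The constant in the denominator evens out the early steps, and the constant in the numerator scales the whole schedule up or down without changing its shape.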

Stochastic gradient descent

def dJ_sgd(theta,X_b_i,y_i):
    # Gradient of the squared error for a single sample (X_b_i, y_i)
    return X_b_i.T.dot(X_b_i.dot(theta) - y_i) * 2

def sgd(X_b,y,initial_theta,n_iters):  # stochastic gradient descent
    t0 = 5
    t1 = 50

    def learning_rate(t):
        return t0 / (t + t1)

    theta = initial_theta
    for cur_iter in range(n_iters):
        rand_i = np.random.randint(len(X_b))  # index of one randomly chosen sample
        gradient = dJ_sgd(theta,X_b[rand_i],y[rand_i])
        theta = theta - learning_rate(cur_iter) * gradient
    return theta


%%time
X_b = np.hstack([np.ones((len(x),1)),X])
initial_theta = np.zeros(X_b.shape[1])  # start with theta all zeros
theta = sgd(X_b,y,initial_theta,n_iters = len(X_b)//3)  # only m/3 iterations, so at most a third of the samples are visited
Output: Wall time: 456 ms (less time than batch gradient descent took)

theta
Output:
array([2.99452091, 4.04013433])
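
One way to see why a single-sample gradient is a reasonable stand-in for the full one: averaged over all samples, the per-sample gradient equals the batch gradient exactly. The sketch below checks this numerically. It repeats `dJ` and `dJ_sgd` so that it runs on its own; the smaller `m`, the fixed seed, and the test point `theta` are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Smaller synthetic data set with the same form as above: y = 4x + 3 + noise
m = 1000
x = rng.normal(size=m)
X_b = np.hstack([np.ones((m, 1)), x.reshape(-1, 1)])
y = 4.0 * x + 3.0 + rng.normal(0, 3, size=m)


def dJ(theta, X_b, y):
    """Batch gradient: uses every sample."""
    return X_b.T.dot(X_b.dot(theta) - y) * 2.0 / len(y)


def dJ_sgd(theta, X_b_i, y_i):
    """Gradient contribution of one sample."""
    return X_b_i.T.dot(X_b_i.dot(theta) - y_i) * 2.0


theta = np.array([1.0, -2.0])  # arbitrary point at which to compare gradients

full_gradient = dJ(theta, X_b, y)
mean_single = np.mean([dJ_sgd(theta, X_b[i], y[i]) for i in range(m)], axis=0)

# The two agree up to floating-point rounding
gradients_match = np.allclose(full_gradient, mean_single)
print(gradients_match)
```

Any individual `dJ_sgd` call can point well away from the batch gradient, which is why the decaying learning rate is needed to damp the noise as the iterations go on.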