I. Overview of Gradient Descent
Here η is a fixed step size (the learning rate) and dJ/dθ is the derivative of the loss, so each update is θ = θ - η * dJ/dθ. When the derivative is negative, the function is decreasing to the right, and we must move right to find the point where the derivative is zero; the minus sign in front of the gradient makes each step point in that descent direction. The more negative the derivative, the farther we move to the right; as its magnitude shrinks, the steps get shorter, until the derivative approaches zero and we have found a local minimum.
import numpy as np
import matplotlib.pyplot as plt

# loss function
def J(theta):
    return (theta - 2.5) ** 2 - 1

# derivative of the loss function
def dJ(theta):
    return 2 * (theta - 2.5)

eta = 0.1                # learning rate
epsilon = 1e-8           # stop when the loss changes by less than this
theta = 0.0
theta_history = [theta]  # record theta after each step
while True:
    gradient = dJ(theta)
    last_theta = theta
    theta = theta - eta * gradient
    theta_history.append(theta)
    if abs(J(theta) - J(last_theta)) < epsilon:
        break

plot_x = np.linspace(-1., 6., 141)  # x range covering the minimum at theta = 2.5
plt.plot(plot_x, J(plot_x))
plt.plot(np.array(theta_history), J(np.array(theta_history)), color='r', marker='+')
plt.show()
II. Gradient Descent for Multiple Linear Regression
def fit_gd(self, X_train, y_train, eta=0.01, n_iters=1e4):
    """Train this Linear Regression model on X_train, y_train using batch gradient descent."""
    assert X_train.shape[0] == y_train.shape[0], \
        "the size of X_train must be equal to the size of y_train"

    def J(theta, X_b, y):
        # mean squared error; report inf if the computation overflows (e.g. eta too large)
        try:
            return np.sum((y - X_b.dot(theta)) ** 2) / len(y)
        except Exception:
            return float('inf')

    def dJ(theta, X_b, y):
        # element-by-element version, kept for reference:
        # res = np.empty(len(theta))
        # res[0] = np.sum(X_b.dot(theta) - y)
        # for i in range(1, len(theta)):
        #     res[i] = (X_b.dot(theta) - y).dot(X_b[:, i])
        # return res * 2 / len(X_b)
        # vectorized form of the loop above: grad = (2/m) * X_b^T . (X_b . theta - y)
        return X_b.T.dot(X_b.dot(theta) - y) * 2. / len(X_b)

    def gradient_descent(X_b, y, initial_theta, eta, n_iters=1e4, epsilon=1e-8):
        theta = initial_theta
        cur_iter = 0
        while cur_iter < n_iters:
            gradient = dJ(theta, X_b, y)
            last_theta = theta
            theta = theta - eta * gradient
            if abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon:
                break
            cur_iter += 1
        return theta

    X_b = np.hstack([np.ones((len(X_train), 1)), X_train])  # prepend a column of ones for the intercept
    initial_theta = np.zeros(X_b.shape[1])
    self._theta = gradient_descent(X_b, y_train, initial_theta, eta, n_iters)
    self.intercept_ = self._theta[0]
    self.coef_ = self._theta[1:]
    return self
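A quick usage sketch on synthetic data (the LinearRegression class holding fit_gd is our own, and the generating line y = 3x + 4 is an illustrative assumption):

import numpy as np

np.random.seed(666)
x = 2. * np.random.random(size=100)
y = x * 3. + 4. + np.random.normal(size=100)
X = x.reshape(-1, 1)

lin_reg = LinearRegression()   # our own class containing fit_gd above
lin_reg.fit_gd(X, y)
print(lin_reg.coef_)           # expected to be close to [3.]
print(lin_reg.intercept_)      # expected to be close to 4.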
III. Advantages of Gradient Descent
As the number of features grows, the normal equation becomes expensive: it works with an (n+1)×(n+1) matrix whose solution cost grows roughly cubically with n, so gradient descent scales better in high dimensions. Batch gradient descent has its own cost, however: every sample participates in every gradient computation, which makes it slow when the sample count m is large. An improvement for that case is stochastic gradient descent, introduced next; a rough timing comparison of the two solvers appears in the sketch below.
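A rough, self-contained timing sketch of this tradeoff (the matrix sizes, learning rate, and step count are arbitrary assumptions):

import numpy as np
import time

m, n = 5000, 1000                  # sample count and feature count
X = np.random.normal(size=(m, n))
X_b = np.hstack([np.ones((m, 1)), X])
true_theta = np.random.uniform(0., 100., size=n + 1)
y = X_b.dot(true_theta) + np.random.normal(0., 10., size=m)

start = time.time()
theta_ne = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)   # normal equation: invert an (n+1)x(n+1) matrix
print("normal equation: %.3fs" % (time.time() - start))

start = time.time()
theta = np.zeros(n + 1)
for _ in range(100):               # each gradient step is only a few matrix-vector products
    theta = theta - 0.01 * X_b.T.dot(X_b.dot(theta) - y) * 2. / m
print("100 GD steps:    %.3fs" % (time.time() - start))

# the inversion cost grows roughly cubically in n, while each gradient step stays O(m*n)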
IV. Stochastic Gradient Descent
As m grows, each step of batch gradient descent gets more expensive, because all m samples enter every gradient computation; instead we can randomly pick a single sample out of the m for each update.
The learning rate is gradually decreased as the number of updates grows, so early steps are large and later ones settle down.
The basic version below performs only len(X_b)//3 updates, touching roughly a third of the samples, so it is certainly faster than full batch passes.
def dJ_sgd(theta, X_b_i, y_i):
    # gradient estimated from a single sample (X_b_i, y_i)
    return 2 * X_b_i.T.dot(X_b_i.dot(theta) - y_i)

def sgd(X_b, y, initial_theta, n_iters):
    t0, t1 = 5, 50

    def learning_rate(t):
        # decaying learning rate, large at first and settling as t grows
        return t0 / (t + t1)

    theta = initial_theta
    for cur_iter in range(n_iters):
        rand_i = np.random.randint(len(X_b))   # pick one sample at random
        gradient = dJ_sgd(theta, X_b[rand_i], y[rand_i])
        theta = theta - learning_rate(cur_iter) * gradient
    return theta

n_iters = len(X_b) // 3   # e.g. only about a third as many updates as samples
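A minimal sketch of driving sgd on synthetic data (the generating line y = 4x + 3 and the noise level are illustrative assumptions):

import numpy as np

m = 100000
np.random.seed(666)
x = np.random.normal(size=m)
X = x.reshape(-1, 1)
y = 4. * x + 3. + np.random.normal(0., 3., size=m)

X_b = np.hstack([np.ones((len(X), 1)), X])
initial_theta = np.zeros(X_b.shape[1])
theta = sgd(X_b, y, initial_theta, n_iters=len(X_b) // 3)
print(theta)   # expected to land near [3., 4.]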
The improved stochastic gradient descent (here the parameter n_iters is the number of full passes over the training set; several passes improve accuracy):
def fit_sgd(self, X_train, y_train, n_iters=5, t0=5, t1=50):
    """Train this Linear Regression model on X_train, y_train using stochastic gradient descent."""
    assert X_train.shape[0] == y_train.shape[0], \
        "the size of X_train must be equal to the size of y_train"
    assert n_iters >= 1

    def dJ_sgd(theta, X_b_i, y_i):
        return X_b_i * (X_b_i.dot(theta) - y_i) * 2.

    def sgd(X_b, y, initial_theta, n_iters, t0=5, t1=50):

        def learning_rate(t):
            return t0 / (t + t1)

        theta = initial_theta
        m = len(X_b)
        for cur_iter in range(n_iters):          # n_iters passes over the whole data set
            indexes = np.random.permutation(m)   # shuffle so every sample is seen once per pass
            X_b_new = X_b[indexes]
            y_new = y[indexes]
            for i in range(m):
                gradient = dJ_sgd(theta, X_b_new[i], y_new[i])
                theta = theta - learning_rate(cur_iter * m + i) * gradient
        return theta

    X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
    initial_theta = np.random.randn(X_b.shape[1])
    self._theta = sgd(X_b, y_train, initial_theta, n_iters, t0, t1)
    self.intercept_ = self._theta[0]
    self.coef_ = self._theta[1:]
    return self
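A hypothetical usage sketch. Like any SGD-style method, fit_sgd benefits from standardized features, so the data is scaled first (the LinearRegression class holding fit_sgd is our own, and score() is assumed to be defined on it):

from sklearn.preprocessing import StandardScaler

standardScaler = StandardScaler()
standardScaler.fit(X_train)
X_train_standard = standardScaler.transform(X_train)
X_test_standard = standardScaler.transform(X_test)

lin_reg = LinearRegression()             # our own class containing fit_sgd above
lin_reg.fit_sgd(X_train_standard, y_train, n_iters=5)
lin_reg.score(X_test_standard, y_test)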
V. How to Verify the Gradient Computation: Debugging Gradient Descent
def dJ_debug(theta, X_b, y, epsilon=0.01):
    # approximate each partial derivative with a two-sided difference:
    # dJ/dtheta_i ≈ (J(theta + eps*e_i) - J(theta - eps*e_i)) / (2 * eps)
    res = np.empty(len(theta))
    for i in range(len(theta)):
        theta_1 = theta.copy()
        theta_1[i] += epsilon
        theta_2 = theta.copy()
        theta_2[i] -= epsilon
        res[i] = (J(theta_1, X_b, y) - J(theta_2, X_b, y)) / (2 * epsilon)
    return res
dJ_debug works, but it is very slow: it evaluates J twice per parameter for every gradient computation.
The dJ_debug algorithm does not depend on the specific form of J, so it can be reused with any loss function; the analytic gradient dJ_math, by contrast, applies only to this one problem. The usual workflow is to use dJ_debug to confirm that dJ_math returns the correct values, then train with the fast dJ_math.
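A sketch of that check; J and dJ_math are written out so the snippet is self-contained (dJ_math is the same analytic gradient used in fit_gd above, and the test data is an arbitrary assumption):

import numpy as np

def J(theta, X_b, y):
    return np.sum((y - X_b.dot(theta)) ** 2) / len(y)

def dJ_math(theta, X_b, y):
    return X_b.T.dot(X_b.dot(theta) - y) * 2. / len(X_b)

np.random.seed(666)
X = np.random.random(size=(1000, 10))
X_b = np.hstack([np.ones((len(X), 1)), X])
true_theta = np.arange(1., 12.)
y = X_b.dot(true_theta) + np.random.normal(size=len(X_b))

theta = np.random.random(size=X_b.shape[1])
print(np.allclose(dJ_debug(theta, X_b, y), dJ_math(theta, X_b, y)))   # True when dJ_math is correct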
VI. Gradient Descent in sklearn
Three methods for linear regression have been covered; sklearn exposes the following two.
1. The normal equation
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
lin_reg.coef_         # fitted coefficients
lin_reg.intercept_    # fitted intercept
lin_reg.score(X_test, y_test)
2. Stochastic gradient descent
from sklearn.linear_model import SGDRegressor   # note: the class is SGDRegressor, not SGDRegression
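A minimal usage sketch. SGDRegressor is sensitive to feature scale, so the data is standardized first (X_train, y_train, X_test, y_test are assumed to exist as in the previous example):

from sklearn.preprocessing import StandardScaler

standardScaler = StandardScaler()
standardScaler.fit(X_train)
X_train_standard = standardScaler.transform(X_train)
X_test_standard = standardScaler.transform(X_test)

sgd_reg = SGDRegressor(max_iter=100)    # max_iter plays the role of n_iters above
sgd_reg.fit(X_train_standard, y_train)
sgd_reg.score(X_test_standard, y_test)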