This post recommends what is probably the most accessible explanation of gradient descent I have seen; the code below is taken from that article. It covers:
- The scenario behind gradient descent (the descending-a-mountain analogy)
- Gradients
- A mathematical explanation of the gradient descent algorithm
- A worked example of gradient descent
- An implementation of gradient descent
- Further reading
The cost function:
$$
J(\theta) = \frac{1}{2m}\,(X\theta-\overrightarrow{y})^T(X\theta-\overrightarrow{y})
$$
Differentiating with respect to θ gives:
$$
\nabla J(\theta) = \frac{1}{m}\,X^T(X\theta-\overrightarrow{y})
$$
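If the differentiation step is not obvious: expand the quadratic form, note that the two cross terms are equal scalars, and apply the identities $\nabla_\theta(\theta^T A \theta) = 2A\theta$ (for symmetric $A$) and $\nabla_\theta(b^T\theta) = b$:

$$
J(\theta) = \frac{1}{2m}\left(\theta^T X^T X \theta - 2\,\overrightarrow{y}^{T} X \theta + \overrightarrow{y}^{T}\overrightarrow{y}\right)
\;\Longrightarrow\;
\nabla J(\theta) = \frac{1}{2m}\left(2\,X^T X \theta - 2\,X^T \overrightarrow{y}\right) = \frac{1}{m}\,X^T(X\theta-\overrightarrow{y})
$$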
First, we define the dataset and the learning rate:
```python
import numpy as np

# Size of the points dataset.
m = 20

# Points' x-coordinates and a dummy intercept feature (x0, x1).
X0 = np.ones((m, 1))
X1 = np.arange(1, m + 1).reshape(m, 1)
X = np.hstack((X0, X1))

# Points' y-coordinates.
y = np.array([
    3, 4, 5, 5, 2, 4, 7, 8, 11, 8, 12,
    11, 13, 13, 16, 17, 18, 17, 19, 21
]).reshape(m, 1)

# The learning rate alpha.
alpha = 0.01
```
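As a quick sanity check on the design matrix (assuming the snippet above has run), the first column is the all-ones intercept feature x0 and the second holds the x-coordinates:

```python
print(X.shape)  # (20, 2): an intercept column plus one feature column
print(X[:3])    # [[1. 1.]
                #  [1. 2.]
                #  [1. 3.]]
```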
Next, we define the cost function and its gradient in matrix-vector form:
```python
def error_function(theta, X, y):
    '''Error function J definition.'''
    diff = np.dot(X, theta) - y
    return (1. / (2 * m)) * np.dot(np.transpose(diff), diff)

def gradient_function(theta, X, y):
    '''Gradient of the function J definition.'''
    diff = np.dot(X, theta) - y
    return (1. / m) * np.dot(np.transpose(X), diff)
```

Note the parenthesization in `error_function`: `1./2*m` would evaluate to `m/2`, not the `1/(2m)` factor in the formula above.
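To verify that `gradient_function` really is the gradient of `error_function`, one can compare it with a central finite-difference approximation. A minimal sketch; `numerical_gradient` is a helper introduced here, not part of the original post:

```python
def numerical_gradient(theta, X, y, eps=1e-6):
    '''Approximate the gradient of J by central finite differences.'''
    grad = np.zeros_like(theta, dtype=float)
    for i in range(theta.shape[0]):
        step = np.zeros_like(theta, dtype=float)
        step[i] = eps  # perturb one component of theta at a time
        grad[i] = (error_function(theta + step, X, y)[0, 0]
                   - error_function(theta - step, X, y)[0, 0]) / (2 * eps)
    return grad

theta0 = np.array([1., 1.]).reshape(2, 1)
print(np.allclose(gradient_function(theta0, X, y),
                  numerical_gradient(theta0, X, y)))  # should print True
```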
Finally, the core of the algorithm: the iterative gradient descent update.
```python
def gradient_descent(X, y, alpha):
    '''Perform gradient descent.'''
    theta = np.array([1, 1]).reshape(2, 1)
    gradient = gradient_function(theta, X, y)
    while not np.all(np.absolute(gradient) <= 1e-5):
        theta = theta - alpha * gradient
        gradient = gradient_function(theta, X, y)
    return theta
```
Once every component of the gradient falls below 1e-5, θ has reached a fairly flat region, like the floor of a valley, where further iterations change very little, so we can exit the loop.
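One caveat: if alpha is set too large, the iterates can diverge and the while loop above will never terminate. A common safeguard, sketched here as a variant (the iteration cap is an addition, not part of the original code), is to bound the number of updates:

```python
def gradient_descent_capped(X, y, alpha, max_iter=100000, tol=1e-5):
    '''Gradient descent with an iteration cap to guard against divergence.'''
    theta = np.array([1, 1]).reshape(2, 1)
    for _ in range(max_iter):
        gradient = gradient_function(theta, X, y)
        if np.all(np.absolute(gradient) <= tol):  # flat enough: stop early
            break
        theta = theta - alpha * gradient
    return theta
```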
The complete code:
```python
import numpy as np

# Size of the points dataset.
m = 20

# Points' x-coordinates and a dummy intercept feature (x0, x1).
X0 = np.ones((m, 1))
X1 = np.arange(1, m + 1).reshape(m, 1)
X = np.hstack((X0, X1))

# Points' y-coordinates.
y = np.array([
    3, 4, 5, 5, 2, 4, 7, 8, 11, 8, 12,
    11, 13, 13, 16, 17, 18, 17, 19, 21
]).reshape(m, 1)

# The learning rate alpha.
alpha = 0.01

def error_function(theta, X, y):
    '''Error function J definition.'''
    diff = np.dot(X, theta) - y
    return (1. / (2 * m)) * np.dot(np.transpose(diff), diff)

def gradient_function(theta, X, y):
    '''Gradient of the function J definition.'''
    diff = np.dot(X, theta) - y
    return (1. / m) * np.dot(np.transpose(X), diff)

def gradient_descent(X, y, alpha):
    '''Perform gradient descent.'''
    theta = np.array([1, 1]).reshape(2, 1)
    gradient = gradient_function(theta, X, y)
    while not np.all(np.absolute(gradient) <= 1e-5):
        theta = theta - alpha * gradient
        gradient = gradient_function(theta, X, y)
    return theta

optimal = gradient_descent(X, y, alpha)
print('optimal:', optimal)
print('error function:', error_function(optimal, X, y)[0, 0])
```
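Since J(θ) is ordinary least squares, the result can be cross-checked against the closed-form normal-equation solution θ = (XᵀX)⁻¹Xᵀy; gradient descent should land very close to it. A sketch reusing the X and y above:

```python
# Closed-form least-squares solution; assumes X^T X is invertible.
theta_exact = np.linalg.solve(X.T.dot(X), X.T.dot(y))
print('normal equation:', theta_exact.ravel())
```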
Stochastic gradient descent in scikit-learn
Note that, unlike the batch method above, stochastic gradient descent does not compute the gradient over the whole dataset at every step; each update uses only a randomly chosen sample (or a small random subset), as sketched below.
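A minimal numpy sketch of this idea, reusing the X and y defined above (`sgd_sketch` and its hyperparameters are illustrative choices made here, not from the original post):

```python
def sgd_sketch(X, y, alpha=0.001, epochs=100, seed=0):
    '''Stochastic gradient descent: one randomly chosen sample per update.'''
    rng = np.random.default_rng(seed)
    theta = np.ones((X.shape[1], 1))
    for _ in range(epochs):
        for i in rng.permutation(X.shape[0]):  # visit samples in random order
            xi = X[i:i + 1]                    # a single row, shape (1, 2)
            gradient = xi.T.dot(xi.dot(theta) - y[i:i + 1])  # per-sample gradient
            theta = theta - alpha * gradient
    return theta
```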
In scikit-learn this is available as `SGDRegressor`. Here is a runnable version, with the train/test split and standardization filled in (the original snippet left them implicit, and the parameter it called `n_inter` is spelled `max_iter` in current scikit-learn):

```python
# Stochastic gradient descent with scikit-learn.
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Note: SGDRegressor only fits linear models, which is why it lives in sklearn.linear_model.
X_train, X_test, y_train, y_test = train_test_split(X1, y.ravel(), random_state=0)
scaler = StandardScaler().fit(X_train)
X_train_standard = scaler.transform(X_train)  # the features need to be standardized
X_test_standard = scaler.transform(X_test)

sgd_reg = SGDRegressor(max_iter=100)  # pass the number of iterations here; 100 is only an example
sgd_reg.fit(X_train_standard, y_train)
sgd_reg.score(X_test_standard, y_test)
```
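Here `score` returns the R² coefficient of determination on the test set. Standardization matters because SGD shares one learning rate across all features: if their scales differ widely, the same step size is too aggressive in some directions and too timid in others.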
Finally, I recommend a linear algebra book, Immersive Math, which you can read online directly; it teaches linear algebra through interactive, animated illustrations.
Link: Immersive Math