Closed-Form and Gradient Descent Regression Explained with Python

Machine Learning, Programming

Introduction

Regression is a kind of supervised learning algorithm within machine learning. It is an approach to model the relationship between the dependent variable (or target, response), y, and explanatory variables (or inputs, predictors), X. Its objective is to predict a quantity for the target variable, for example, predicting a stock price. This differs from a classification problem, where we want to predict the label of the target, for example, predicting the direction of a stock (up or down).

Moreover, a regression can be used to answer whether and how several variables are related or influence each other, for example, to determine if and to what extent work experience or age impacts salaries.

In this article, I will focus mainly on linear regression and its approaches.

Different approaches to Linear Regression

The goal of OLS (ordinary least squares) is to find the best-fitting line (hyperplane) that minimizes the vertical offsets, which can be the mean squared error (MSE) or another error metric (MAE, RMSE) between the target variable and the predicted output.

We can implement a linear regression model using the following approaches:

  1. Solving the model parameters (closed-form equations)
  2. Using an optimization algorithm (gradient descent, stochastic gradient descent, etc.)

Please note that OLS regression estimates are the best linear unbiased estimator (BLUE, in short). In other forms of regression, the parameter estimates may be biased; for example, ridge regression is sometimes used to reduce the variance of estimates when there is collinearity in the data. However, the discussion of bias and variance is not in the scope of this article (please refer to this great article related to bias and variance).

Closed-form equation

Let’s assume we have inputs X of size n and a target variable; we can write the following equation to represent the linear regression model.

[Figure: Simple form of linear regression (where i = 1, 2, …, n)]
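The equation itself appears only as an image in the original post, so its exact content is not recoverable from the text; the standard form it presumably shows, with p predictors, n observations, and a noise term, is:

y_i = \beta_0 x_{i0} + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \epsilon_i, \qquad i = 1, 2, \dots, n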

The equation assumes we have the intercept X0 = 1. There is also a model without an intercept, where B0 = 0, but this is based on the hypothesis that the line will always pass through the origin (there’s a lot of discussion on this topic, which you can read more about here and here).

From the equation above, we can compute the regression parameters using the computation below.

[Figure: Matrix formulation of the multiple regression model]
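The matrix computation is also only shown as an image; the closed-form (normal equation) solution it refers to is presumably the standard one:

\hat{\beta} = (X^\top X)^{-1} X^\top y

where X is the n-by-(p+1) design matrix (including the column of ones for the intercept) and y is the vector of targets.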

Now, let’s implement this in Python. There are three ways we can do this: manual matrix multiplication, the statsmodels library, and the sklearn library.
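The original code was shown as screenshots; below is a minimal sketch of the three approaches on synthetic single-feature data. The variable names and the generated data are assumptions for illustration, not the article's exact code.

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

# Hypothetical single-feature data (replace with your own X and y)
rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 0.8 + 1.4 * X[:, 0] + rng.normal(scale=0.3, size=100)

# 1) Manual matrix multiplication: beta = (X'X)^-1 X'y
X_b = np.c_[np.ones((len(X), 1)), X]            # prepend the intercept column X0 = 1
beta_manual = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# 2) statsmodels OLS
beta_sm = sm.OLS(y, X_b).fit().params

# 3) scikit-learn LinearRegression
lr = LinearRegression().fit(X, y)
beta_sklearn = np.r_[lr.intercept_, lr.coef_]

print(beta_manual, beta_sm, beta_sklearn)       # all three should agree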

[Figure: The model weights]

[Figure: Example of a single linear model, one input (left) and the model prediction (right)]

You can see that all three solutions give the same results, and we can then use the output to write the model equation (Y = 0.7914715 + 1.38594198X).

This approach offers a better solution for smaller data and gives an easy, quick, and explainable model.

Gradient Descent

Why do we need gradient descent if the closed-form equation can solve the regression problem? There are some situations where it is not enough:

  • There is no closed-form solution for most nonlinear regression problems.
  • Even in linear regression, there may be some cases where it is impractical to use the formula. An example is when X is a very large, sparse matrix; the solution would be too expensive to compute.

Gradient descent is a computationally cheaper (faster) option to find the solution.

Gradient descent is an optimization algorithm used to minimize some cost function by repeatedly moving in the direction of steepest descent. Hence, the model weights are updated after each epoch.
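In its basic form, the update applied at each step can be written as (with learning rate \alpha and cost function J(\theta)):

\theta := \theta - \alpha \, \nabla_{\theta} J(\theta)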

[Figure: Basic visualization of gradient descent (ideally, gradient descent tries to converge toward the global minimum)]

There are three primary types of gradient descent used in machine learning algorithms:

  1. Batch gradient descent
  2. Stochastic gradient descent
  3. Mini-batch gradient descent

Let us go through each type in more detail, along with its implementation.

Batch Gradient Descent

This approach is the most straightforward. It calculates the error for each observation within the training set and updates the model parameters only after all training observations are evaluated. This process can be called a training epoch.

The main advantages of this approach are that it is computationally efficient and produces a stable error gradient and stable convergence. However, it requires the entire training set in memory, and the stable error gradient can sometimes result in a model that is not the best (it converges to a local-minimum trap instead of attempting to find the best global minimum).

Let’s look at the Python implementation of the regression problem.
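The article's implementation was shown as screenshots; the following is a minimal sketch of batch gradient descent for linear regression. The function name, learning rate, and epoch count are assumptions for illustration.

import numpy as np

def batch_gd_regressor(X, y, learning_rate=0.05, n_epochs=500):
    # Cost: J(theta) = 1/(2m) * sum((X @ theta - y)^2), one parameter update per epoch
    m = len(y)
    theta = np.zeros(X.shape[1])
    cost_, mse_ = [], []
    for _ in range(n_epochs):
        error = X @ theta - y                  # residuals over the whole training set
        gradient = (X.T @ error) / m           # vectorized gradient of the cost
        theta -= learning_rate * gradient      # single weight update per epoch
        cost_.append((error @ error) / (2 * m))
        mse_.append((error @ error) / m)
    return theta, cost_, mse_

# X_ is assumed to already contain the intercept column of ones
# theta, cost_, mse_ = batch_gd_regressor(X_, y, learning_rate=0.05, n_epochs=500)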

[Figure: The cost function of linear regression]

[Figure: Batch gradient descent: cost and MSE per epoch]

As we can see, the cost decreases steadily and reaches a minimum at around 150–200 epochs.

During the computation, we also use vectorization for better performance. However, if the training set is very large, performance will still be slow.

Stochastic Gradient Descent

In stochastic gradient descent, SGD (sometimes referred to as iterative or online GD), the names “stochastic” and “online GD” come from the fact that the gradient based on a single training observation is a “stochastic approximation” of the true cost gradient. Because of this, however, the path towards the global cost minimum is not direct and may go up and down before converging to the global cost minimum.

Hence:

  • This makes SGD faster than batch GD (in most cases).
  • We can view the insight and rate of improvement of the model in real time.
  • The increased model update frequency can result in faster learning.
  • The noisy updates, due to their stochastic nature, can help the model avoid local minima.

However, there are some disadvantages:

  • Due to the update frequency, this approach can be more computationally expensive and can take longer to complete than the other approaches.
  • The frequent updates result in a noisy gradient signal, which causes the model parameters and error to jump around (higher variance over training epochs).

Let’s look at how we can implement this in Python.
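The original implementation appears only as screenshots; below is a minimal sketch of an _sgd_regressor-style function that matches the call used later in the article. With batch_size=1 it behaves as pure SGD; the defaults and internal details are assumptions.

import numpy as np

def _sgd_regressor(X, y, learning_rate=0.05, n_epochs=100, batch_size=1):
    m = len(y)
    theta = np.zeros(X.shape[1])
    cost_, mse_ = [], []
    for _ in range(n_epochs):
        indices = np.random.permutation(m)          # shuffle the data each epoch
        X_shuffled, y_shuffled = X[indices], y[indices]
        for start in range(0, m, batch_size):
            X_batch = X_shuffled[start:start + batch_size]
            y_batch = y_shuffled[start:start + batch_size]
            error = X_batch @ theta - y_batch
            gradient = (X_batch.T @ error) / len(y_batch)
            theta -= learning_rate * gradient       # update after every (mini-)batch
        full_error = X @ theta - y                  # track cost and MSE on the full set
        cost_.append((full_error @ full_error) / (2 * m))
        mse_.append((full_error @ full_error) / m)
    return theta, cost_, mse_

# Pure SGD: one observation per parameter update
# theta, cost_, mse_ = _sgd_regressor(X_, y, learning_rate=0.05, n_epochs=100, batch_size=1)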

[Figure: Stochastic gradient descent: performance per epoch]

Mini-Batch Gradient Descent

Mini-batch gradient descent (MB-GD) is often the preferred method since it is a compromise between batch gradient descent and stochastic gradient descent. It separates the training set into small batches and feeds them to the algorithm. The model is updated based on these batches, and it converges more quickly than with batch GD because the weights get updated more frequently.

This method combines the efficiency of batch GD and the robustness of stochastic GD. One (small) downside is that it introduces a new hyperparameter, the batch size, which may require fine-tuning as part of model tuning/optimization.

We can imagine batch size as a slider on the learning process.

  • A small value gives a learning process that converges quickly, at the cost of noise in the training process.
  • A large value gives a learning process that converges slowly, with accurate estimates of the error gradient.

We can reuse the above function, but we need to specify the batch size such that len(training set) > batch_size > 1.

theta, _, mse_ = _sgd_regressor(X_, y, learning_rate=learning_rate, n_epochs=n_epochs, batch_size=50)

[Figure: Mini-batch gradient descent: performance over an epoch]

We can see that the model is able to converge within only the first few epochs.

SGD Regressor (scikit-learn)

In Python, we can implement a gradient descent approach to a regression problem by using sklearn.linear_model.SGDRegressor. Please refer to the documentation for more details.

Below is how we can implement a stochastic and mini-batch gradient descent method.
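The scikit-learn code was shown as screenshots; a minimal sketch is below, assuming a feature matrix X (without the intercept column) and target y. The hyperparameter values are illustrative only (note: older scikit-learn versions name the loss "squared_loss" instead of "squared_error").

import numpy as np
from sklearn.linear_model import SGDRegressor

# Stochastic gradient descent over the full training set
sgd = SGDRegressor(loss="squared_error", eta0=0.01, max_iter=1000,
                   tol=1e-3, random_state=42)
sgd.fit(X, y)
print(sgd.intercept_, sgd.coef_)

# Mini-batch style training via partial_fit on successive batches
sgd_mb = SGDRegressor(loss="squared_error", eta0=0.01, random_state=42)
batch_size = 50
for epoch in range(50):
    indices = np.random.permutation(len(y))
    for start in range(0, len(y), batch_size):
        batch = indices[start:start + batch_size]
        sgd_mb.partial_fit(X[batch], y[batch])      # incremental update per mini-batch
print(sgd_mb.intercept_, sgd_mb.coef_)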

[Figure: scikit-learn SGD model detail and performance]

[Figure: scikit-learn SGD mini-batch model detail and performance]

EndNote

In this post, I have explained linear regression with both the closed-form equation and an optimization algorithm, gradient descent, by implementing them from scratch and by using built-in libraries.

Additional reading and Github repository:

Translated from: https://medium.com/towards-artificial-intelligence/closed-form-and-gradient-descent-regression-explained-with-python-1627c9eeb60e
