Calculus in Machine Learning

I. Introduction

A machine learning algorithm (such as classification, clustering, or regression) uses a training dataset to determine weight factors that can be applied to unseen data for predictive purposes. Behind every machine learning model is an optimization algorithm that relies heavily on calculus. In this article, we discuss one such optimization algorithm, the Gradient Descent Approximation (GDA), and show how it can be used to build a simple regression estimator.

II. Optimization Using the Gradient Descent Algorithm

II.1 Derivatives and Gradients

In one dimension, we can find the maximum and minimum of a function using derivatives. Let us consider a simple quadratic function f(x), as shown below.

[Figure: Minimum of a simple function using the gradient descent algorithm. Image by Benjamin O. Tayo]

Suppose we want to find the minimum of the function f(x). Starting from some initial guess, the gradient descent method updates X according to this equation:

X_{n+1} = X_n - eta * f'(X_n)

where eta is a small positive constant called the learning rate. Note the following:

  • when X_n > X_min, f'(X_n) > 0: this ensures that X_{n+1} is less than X_n, so we step to the left, toward the minimum.

  • when X_n < X_min, f'(X_n) < 0: this ensures that X_{n+1} is greater than X_n, so we step to the right, toward X_min.

The above observations show that, for a convex function like this one, the gradient descent algorithm finds the minimum regardless of the initial guess. How many optimization steps it takes to reach X_min depends on how good the initial guess is. If the initial guess or the learning rate is not chosen carefully, however, the algorithm can step past the minimum and miss it entirely; this is often referred to as an "overshoot". Generally, one can ensure convergence by adding a convergence criterion such as:

|f(X_{n+1}) - f(X_n)| < epsilon

where epsilon is a small positive number.

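A minimal Python sketch of this update rule and convergence criterion, using the illustrative choice f(x) = x**2 (so f'(x) = 2*x):

def gradient_descent_1d(f, f_prime, x, eta=0.1, epsilon=1e-8, max_iter=1000):
    """Minimize a one-dimensional function f given its derivative f_prime."""
    for _ in range(max_iter):
        x_new = x - eta*f_prime(x)          # X_{n+1} = X_n - eta * f'(X_n)
        if abs(f(x_new) - f(x)) < epsilon:  # convergence criterion
            return x_new
        x = x_new
    return x

x_min = gradient_descent_1d(lambda x: x**2, lambda x: 2*x, x=5.0)
print(x_min)  # approximately 0, the true minimum of f(x) = x**2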

In higher dimensions, a function of several variables can be optimized (minimized) using the gradient descent algorithm as well. In this case, we use the gradient to update the vector X:

X_{n+1} = X_n - eta * ∇f(X_n)

As in one dimension, one can ensure convergence by adding a convergence criterion such as:

|f(X_{n+1}) - f(X_n)| < epsilon
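
The same sketch carries over to several variables, with the derivative replaced by the gradient; here we assume the illustrative function f(x) = x_1**2 + x_2**2, whose gradient is 2*x:

import numpy as np

def gradient_descent(f, grad_f, x, eta=0.1, epsilon=1e-8, max_iter=1000):
    """Minimize a function of several variables given its gradient grad_f."""
    x = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        x_new = x - eta*grad_f(x)           # X_{n+1} = X_n - eta * grad f(X_n)
        if abs(f(x_new) - f(x)) < epsilon:  # convergence criterion
            return x_new
        x = x_new
    return x

print(gradient_descent(lambda x: np.sum(x**2), lambda x: 2*x, x=[3.0, -4.0]))
# approximately [0, 0], the true minimum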

II.2 Case Study: Building a Simple Regression Estimator

In this subsection, we describe how a simple python estimator can be built to perform linear regression using the gradient descent method. Let’s assume we have a one-dimensional dataset containing a single feature (X) and an outcome (y), and let’s assume there are N observations in the dataset:

(X_1, y_1), (X_2, y_2), ..., (X_N, y_N)

A linear model to fit the data is given as:

y = w0 + w1*X

where w0 and w1 are the weights that the algorithm learns during training.

II.3 Gradient Descent Algorithm

If we assume that the errors in the model are independent and normally distributed, then the likelihood function is given as:

L(w0, w1) = Π_{i=1..N} (1/sqrt(2*pi*sigma^2)) * exp(-(y_i - w0 - w1*X_i)^2 / (2*sigma^2))

To maximize the likelihood function, we minimize the sum of squared errors (SSE) with respect to w0 and w1:

SSE = (1/2) * Σ_{i=1..N} (y_i - w0 - w1*X_i)^2
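
Taking the negative logarithm of the likelihood makes this equivalence explicit: the product becomes a sum, and every term that does not depend on the weights is a constant, so

-log L(w0, w1) = (1/(2*sigma^2)) * Σ_{i=1..N} (y_i - w0 - w1*X_i)^2 + const

Minimizing the right-hand side over w0 and w1 is therefore the same as minimizing the SSE.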

The objective function, i.e., our SSE function, is often minimized using the gradient descent approximation (GDA) algorithm. In the GDA method, the weights are updated according to the following procedure:

w_{n+1} = w_n - eta * ∇SSE(w_n),   where w = (w0, w1)

i.e., in the direction opposite to the gradient. Here, eta is a small positive constant referred to as the learning rate. This equation can be written in component form as:

w0 ← w0 + eta * Σ_{i=1..N} (y_i - w0 - w1*X_i)
w1 ← w1 + eta * Σ_{i=1..N} X_i * (y_i - w0 - w1*X_i)
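
Note that the Python implementation below applies these updates one data point at a time within each pass over the dataset (a stochastic variant of gradient descent), rather than summing over all N points before each step; for a small convex problem like this one, both approaches reach essentially the same weights.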

II.4 Python Implementation

import numpy as np
import matplotlib.pyplot as plt


class GradientDescent(object):
    """Gradient descent optimizer for simple linear regression.

    Parameters
    ------------
    eta : float
        Learning rate (between 0.0 and 1.0)
    n_iter : int
        Passes over the training dataset.

    Attributes
    -----------
    w_ : 1d-array
        Weights after fitting.
    errors_ : list
        Sum of squared errors in every epoch.
    """

    def __init__(self, eta=0.01, n_iter=10):
        self.eta = eta
        self.n_iter = n_iter

    def fit(self, X, y):
        """Fit the data.

        Parameters
        ----------
        X : {array-like}, shape = [n_points]
            Independent variable or predictor.
        y : array-like, shape = [n_points]
            Outcome of prediction.

        Returns
        -------
        self : object
        """
        self.w_ = np.zeros(2)
        self.errors_ = []

        for i in range(self.n_iter):
            errors = 0
            for j in range(X.shape[0]):
                # Step each weight in the direction opposite its gradient of
                # the per-point squared error 0.5*(y_j - w0 - w1*X_j)**2.
                self.w_[1:] += self.eta*X[j]*(y[j] - self.w_[0] - self.w_[1]*X[j])
                self.w_[0] += self.eta*(y[j] - self.w_[0] - self.w_[1]*X[j])
                errors += 0.5*(y[j] - self.w_[0] - self.w_[1]*X[j])**2
            self.errors_.append(errors)
        return self

    def predict(self, X):
        """Return predicted y values."""
        return self.w_[0] + self.w_[1]*X

II.5 Application of the Basic Regression Model

a) Create dataset

np.random.seed(1)
X = np.linspace(0, 1, 10)   # single feature
y = 2*X + 1                 # true line: slope 2, intercept 1
y = y + np.random.normal(0, 0.05, X.shape[0])  # add Gaussian noise

b) Fit and Predict

gda = GradientDescent(eta=0.1, n_iter=100)
gda.fit(X, y)
y_hat = gda.predict(X)

c) Plot Output

plt.figure()
plt.scatter(X, y, marker='x', c='r', alpha=0.5, label='data')
plt.plot(X, y_hat, marker='s', c='b', alpha=0.5, label='fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
[Figure: scatter plot of the data with the fitted line. Image by Benjamin O. Tayo]

d) Calculate the R-squared value

R_sq = 1 - ((y_hat - y)**2).sum()/((y - np.mean(y))**2).sum()
R_sq
0.991281901588877
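
As a quick sanity check, one could compare the weights learned by gradient descent with NumPy's closed-form least-squares fit; the two should agree closely:

w1_ls, w0_ls = np.polyfit(X, y, 1)  # closed-form least-squares slope and intercept
print(gda.w_)        # [w0, w1] learned by gradient descent
print(w0_ls, w1_ls)  # should be close to gda.w_[0] and gda.w_[1]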

III. Summary and Conclusion

In summary, we have shown how a simple linear regression estimator using the GDA algorithm can be built and implemented in Python. Behind every machine learning model is an optimization algorithm that relies heavily on calculus. If you would like to see how the GDA algorithm is used in a real machine learning classification algorithm, see the following GitHub repository.

Additional Data Science/Machine Learning Resources

How Much Math do I need in Data Science?

Data Science Curriculum

5 Best Degrees for Getting into Data Science

Theoretical Foundations of Data Science — Should I Care or Simply Focus on Hands-on Skills?

Machine Learning Project Planning

How to Organize Your Data Science Project

Productivity Tools for Large-scale Data Science Projects

A Data Science Portfolio is More Valuable than a Resume

For questions and inquiries, please email me: benjaminobi@gmail.com

Translated from: https://medium.com/towards-artificial-intelligence/calculus-in-machine-learning-2e7cddafa21f
