Calculus in Machine Learning

I. Introduction

A machine learning algorithm (such as classification, clustering, or regression) uses a training dataset to determine weight factors that can be applied to unseen data for predictive purposes. Behind every machine learning model is an optimization algorithm that relies heavily on calculus. In this article, we discuss one such optimization algorithm, the Gradient Descent Approximation (GDA), and show how it can be used to build a simple regression estimator.

II. Optimization Using the Gradient Descent Algorithm

II.1 Derivatives and Gradients

In one dimension, we can find the maximum and minimum of a function using derivatives. Let us consider a simple quadratic function f(x), as shown below.

[Figure: Minimum of a simple function using the gradient descent algorithm. Image by Benjamin O. Tayo]

Suppose we want to find the minimum of the function f(x). Starting from some initial guess, the gradient descent method updates X according to this equation:

X_{n+1} = X_n - eta * f'(X_n)

where eta is a small positive constant called the learning rate. Note the following:

  • when X_n > X_min, f'(X_n) > 0: this ensures that X_{n+1} is less than X_n, so we step to the left, toward the minimum.

  • when X_n < X_min, f'(X_n) < 0: this ensures that X_{n+1} is greater than X_n, so we step to the right, toward X_min.

The above observations show that, for a convex function like this one, the gradient descent algorithm finds the minimum regardless of the initial guess. How many optimization steps it takes to reach X_min depends on how good the initial guess is. If the initial guess or the learning rate is not chosen carefully, however, the algorithm can step past the minimum and miss it entirely; this is often referred to as an "overshoot". Generally, one can ensure convergence by adding a convergence criterion such as:

|f(X_{n+1}) - f(X_n)| < epsilon

where epsilon is a small positive number.

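A minimal Python sketch of this update rule and convergence criterion, using the illustrative choice f(x) = x**2 (so f'(x) = 2*x):

def gradient_descent_1d(f, f_prime, x, eta=0.1, epsilon=1e-8, max_iter=1000):
    """Minimize a one-dimensional function f given its derivative f_prime."""
    for _ in range(max_iter):
        x_new = x - eta*f_prime(x)          # X_{n+1} = X_n - eta * f'(X_n)
        if abs(f(x_new) - f(x)) < epsilon:  # convergence criterion
            return x_new
        x = x_new
    return x

x_min = gradient_descent_1d(lambda x: x**2, lambda x: 2*x, x=5.0)
print(x_min)  # approximately 0, the true minimum of f(x) = x**2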

In higher dimensions, a function of several variables can be optimized (minimized) using the gradient descent algorithm as well. In this case, we use the gradient to update the vector X:

X_{n+1} = X_n - eta * ∇f(X_n)

As in one dimension, one can ensure convergence by adding a convergence criterion such as:

|f(X_{n+1}) - f(X_n)| < epsilon
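
The same sketch carries over to several variables, with the derivative replaced by the gradient; here we assume the illustrative function f(x) = x_1**2 + x_2**2, whose gradient is 2*x:

import numpy as np

def gradient_descent(f, grad_f, x, eta=0.1, epsilon=1e-8, max_iter=1000):
    """Minimize a function of several variables given its gradient grad_f."""
    x = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        x_new = x - eta*grad_f(x)           # X_{n+1} = X_n - eta * grad f(X_n)
        if abs(f(x_new) - f(x)) < epsilon:  # convergence criterion
            return x_new
        x = x_new
    return x

print(gradient_descent(lambda x: np.sum(x**2), lambda x: 2*x, x=[3.0, -4.0]))
# approximately [0, 0], the true minimum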

II.2 Case Study: Building a Simple Regression Estimator

In this subsection, we describe how a simple python estimator can be built to perform linear regression using the gradient descent method. Let’s assume we have a one-dimensional dataset containing a single feature (X) and an outcome (y), and let’s assume there are N observations in the dataset:

(X_1, y_1), (X_2, y_2), ..., (X_N, y_N)

A linear model to fit the data is given as:

y = w0 + w1*X

where w0 and w1 are the weights that the algorithm learns during training.

II.3 Gradient Descent Algorithm

If we assume that the errors in the model are independent and normally distributed, then the likelihood function is given as:

L(w0, w1) = Π_{i=1..N} (1/sqrt(2*pi*sigma^2)) * exp(-(y_i - w0 - w1*X_i)^2 / (2*sigma^2))

To maximize the likelihood function, we minimize the sum of squared errors (SSE) with respect to w0 and w1:

SSE = (1/2) * Σ_{i=1..N} (y_i - w0 - w1*X_i)^2
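
Taking the negative logarithm of the likelihood makes this equivalence explicit: the product becomes a sum, and every term that does not depend on the weights is a constant, so

-log L(w0, w1) = (1/(2*sigma^2)) * Σ_{i=1..N} (y_i - w0 - w1*X_i)^2 + const

Minimizing the right-hand side over w0 and w1 is therefore the same as minimizing the SSE.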

The objective function, i.e., our SSE function, is often minimized using the gradient descent approximation (GDA) algorithm. In the GDA method, the weights are updated according to the following procedure:

w_{n+1} = w_n - eta * ∇SSE(w_n),   where w = (w0, w1)

i.e., in the direction opposite to the gradient. Here, eta is a small positive constant referred to as the learning rate. This equation can be written in component form as:

w0 ← w0 + eta * Σ_{i=1..N} (y_i - w0 - w1*X_i)
w1 ← w1 + eta * Σ_{i=1..N} X_i * (y_i - w0 - w1*X_i)
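
Note that the Python implementation below applies these updates one data point at a time within each pass over the dataset (a stochastic variant of gradient descent), rather than summing over all N points before each step; for a small convex problem like this one, both approaches reach essentially the same weights.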

II.4 Python Implementation

import numpy as np
import matplotlib.pyplot as plt


class GradientDescent(object):
    """Gradient descent optimizer for simple linear regression.

    Parameters
    ------------
    eta : float
        Learning rate (between 0.0 and 1.0)
    n_iter : int
        Passes over the training dataset.

    Attributes
    -----------
    w_ : 1d-array
        Weights after fitting.
    errors_ : list
        Sum of squared errors in every epoch.
    """

    def __init__(self, eta=0.01, n_iter=10):
        self.eta = eta
        self.n_iter = n_iter

    def fit(self, X, y):
        """Fit the data.

        Parameters
        ----------
        X : {array-like}, shape = [n_points]
            Independent variable or predictor.
        y : array-like, shape = [n_points]
            Outcome of prediction.

        Returns
        -------
        self : object
        """
        self.w_ = np.zeros(2)
        self.errors_ = []

        for i in range(self.n_iter):
            errors = 0
            for j in range(X.shape[0]):
                # Step each weight in the direction opposite its gradient of
                # the per-point squared error 0.5*(y_j - w0 - w1*X_j)**2.
                self.w_[1:] += self.eta*X[j]*(y[j] - self.w_[0] - self.w_[1]*X[j])
                self.w_[0] += self.eta*(y[j] - self.w_[0] - self.w_[1]*X[j])
                errors += 0.5*(y[j] - self.w_[0] - self.w_[1]*X[j])**2
            self.errors_.append(errors)
        return self

    def predict(self, X):
        """Return predicted y values."""
        return self.w_[0] + self.w_[1]*X

II.5 Application of the Basic Regression Model

a) Create dataset

np.random.seed(1)
X = np.linspace(0, 1, 10)   # single feature
y = 2*X + 1                 # true line: slope 2, intercept 1
y = y + np.random.normal(0, 0.05, X.shape[0])  # add Gaussian noise

b) Fit and Predict

gda = GradientDescent(eta=0.1, n_iter=100)
gda.fit(X, y)
y_hat = gda.predict(X)

c) Plot Output

plt.figure()
plt.scatter(X, y, marker='x', c='r', alpha=0.5, label='data')
plt.plot(X, y_hat, marker='s', c='b', alpha=0.5, label='fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
[Figure: scatter plot of the data with the fitted line. Image by Benjamin O. Tayo]

d) Calculate the R-squared value

R_sq = 1 - ((y_hat - y)**2).sum()/((y - np.mean(y))**2).sum()
R_sq
0.991281901588877
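
As a quick sanity check, one could compare the weights learned by gradient descent with NumPy's closed-form least-squares fit; the two should agree closely:

w1_ls, w0_ls = np.polyfit(X, y, 1)  # closed-form least-squares slope and intercept
print(gda.w_)        # [w0, w1] learned by gradient descent
print(w0_ls, w1_ls)  # should be close to gda.w_[0] and gda.w_[1]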

III. Summary and Conclusion

In summary, we have shown how a simple linear regression estimator using the GDA algorithm can be built and implemented in Python. Behind every machine learning model is an optimization algorithm that relies heavily on calculus. If you would like to see how the GDA algorithm is used in a real machine learning classification algorithm, see the following GitHub repository.

Additional Data Science/Machine Learning Resources

How Much Math do I need in Data Science?

Data Science Curriculum

5 Best Degrees for Getting into Data Science

Theoretical Foundations of Data Science — Should I Care or Simply Focus on Hands-on Skills?

Machine Learning Project Planning

How to Organize Your Data Science Project

Productivity Tools for Large-scale Data Science Projects

A Data Science Portfolio is More Valuable than a Resume

For questions and inquiries, please email me: benjaminobi@gmail.com

Translated from: https://medium.com/towards-artificial-intelligence/calculus-in-machine-learning-2e7cddafa21f
