Linear Regression from Scratch in Excel

While using Excel or Google Sheets to solve an actual machine learning problem can be a bad idea, implementing an algorithm from scratch with simple formulas and a simple dataset is very helpful for understanding how it works. Having done this for almost all the common algorithms, including neural networks, I can say it has helped me a lot.

In this article, I will share how I implemented a simple linear regression with gradient descent. You can use the link Simple linear regression with gradient descent to get the Excel/Google Sheets file.

Now let’s get our hands dirty!

Using a simple dataset

First, I use a very simple dataset with one feature. You can see the graph below showing the target variable y and the feature variable x.

[Figure: the target variable y plotted against the feature variable x]

Creating the linear model

In Google Sheets or Excel, you can add a trendline to the chart, which directly gives you the result of a linear regression.


But if you want to use the model to make predictions, you have to implement it yourself. In this case, the model is quite simple: for each new observation x, we can just create the formula y = a*x + b, where a and b are the parameters of the model.
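As a minimal sketch in Python (the parameter values here are hypothetical, just to show the shape of the model):

    def predict(x, a, b):
        # Predict y for a new observation x with the linear model y = a*x + b
        return a * x + b

    print(predict(3.0, a=2.0, b=1.0))  # -> 7.0 with these made-up parameters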


The cost function of the model

How can we obtain the parameters a and b? Well, the optimal values of a and b are those that minimize the cost function, which is the sum of the squared errors of the model. So for each data point, we can calculate the squared error.

Squared Error = (prediction - real value)² = (a*x + b - real value)²

In order to find the minimum of the cost function, we use the gradient descent algorithm.
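To make the cost concrete, here is a hedged sketch in Python (the dataset and names are illustrative):

    def squared_error(a, b, x, y):
        # Squared error of the prediction a*x + b for a single point (x, y)
        return (a * x + b - y) ** 2

    def cost(a, b, xs, ys):
        # The cost is the sum of the squared errors over the whole dataset
        return sum(squared_error(a, b, x, y) for x, y in zip(xs, ys))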


Simple gradient descent

Before implementing gradient descent for the linear regression, we can first do it for a simple function: (x-2)².

The idea is to find the minimum of this function using the following process:

  • First, we randomly choose an initial value of x.
  • Then for each step, we calculate the value of the derivative function df at the current x: df(x).
  • And the next value of x is obtained by subtracting the value of the derivative multiplied by a step size: x = x - step_size*df(x).

You can modify the two parameters of the gradient descent: the initial value of x and the step size.
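Here is a minimal sketch of this process in Python (the number of steps is an assumption for illustration):

    def gradient_descent_1d(x0, step_size, n_steps=20):
        # Minimize f(x) = (x - 2)**2, whose derivative is df(x) = 2*(x - 2)
        x = x0
        for _ in range(n_steps):
            df = 2 * (x - 2)        # derivative at the current x
            x = x - step_size * df  # the update x = x - step_size*df(x)
        return x

    print(gradient_descent_1d(x0=0.0, step_size=0.1))  # approaches the minimum at x = 2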


And in some cases, the gradient descent will not work. For example, if the step size is too big, the x value can explode.
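With the sketch above, for instance, a step size of 1.5 multiplies the distance to the minimum by |1 - 2*1.5| = 2 at every step: starting from x = 0, the iterates jump to 6, -6, 18, -30, … instead of converging.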


Gradient descent for linear regression

The principle of the gradient descent algorithm is the same for linear regression: we have to calculate the partial derivatives of the cost function with respect to the parameters a and b. Let’s denote them da and db.

Squared Error = (prediction - real value)² = (a*x + b - real value)²

da = 2(a*x + b - real value)*x

db = 2(a*x + b - real value)
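As a quick sanity check of these formulas, here is a hedged finite-difference sketch in Python (the point and parameter values are arbitrary):

    a, b, x, y = 0.5, 1.0, 3.0, 2.0   # arbitrary values for the check
    eps = 1e-6

    def err(a, b):
        return (a * x + b - y) ** 2   # squared error at a single point

    da = 2 * (a * x + b - y) * x      # analytic partial derivative w.r.t. a
    db = 2 * (a * x + b - y)          # analytic partial derivative w.r.t. b

    print(da, (err(a + eps, b) - err(a, b)) / eps)  # both ≈ 3.0
    print(db, (err(a, b + eps) - err(a, b)) / eps)  # both ≈ 1.0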

In the following graph, you can see how a and b converge towards the target value.

[Figure: a and b converging towards their target values over the iterations]

Now, in practice, we have many observations, and this has to be done for each data point. That’s where things become crazy in a Google Sheet, so we use only 10 data points.
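As an end-to-end sketch in Python (the 10 data points, step size, and iteration count below are made up for illustration; the sheet uses its own data), the per-point derivatives da and db are summed over all observations before each update:

    # Illustrative 10-point dataset, roughly following y = 2x + 1
    xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    ys = [3.1, 4.9, 7.2, 9.1, 10.8, 13.2, 15.1, 16.8, 19.2, 21.1]

    a, b = 0.0, 0.0     # initial values of the parameters
    step_size = 0.002   # small enough to keep the updates stable

    for _ in range(1000):
        # Sum the per-point derivatives over all observations
        da = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys))
        db = sum(2 * (a * x + b - y) for x, y in zip(xs, ys))
        a = a - step_size * da
        b = b - step_size * db

    print(a, b)  # approaches the trendline coefficients, here about 2.00 and 1.03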

You will see that I first created a sheet with long formulas to calculate da and db, which contain the sum of the derivatives over all the observations. Then I created another sheet to show all the details.

If you open the Google Sheet, you can play with it yourself by modifying the parameters of the gradient descent: the initial values of a and b, and the step size. Enjoy!

Now, if you want to understand other algorithms, please feel free to copy this Google Sheet and change it a little for logistic regression or even a neural network.

Translated from: https://towardsdatascience.com/linear-regression-from-scratch-in-excel-3d8192214752
