Logistic Regression with a Neural Network Mindset

This article looks at logistic regression from a neural network perspective and, through a side-by-side comparison, explains how logistic regression is implemented and applied as a neural network.


Note From Author:


This tutorial covers the foundations of computer vision and is delivered as “Lesson 8” of the series. More lessons are upcoming that go as far as building your own deep-learning-based computer vision projects. You can find the complete syllabus and table of contents here.

Target Audience: final-year college students, newcomers to a data science career, and IT professionals who want to switch to a data science career.

Takeaway: the main takeaways from this article are:

  1. Logistic Regression

  2. Approaching Logistic Regression with a Neural Network mindset

Logistic Regression

Logistic Regression is an algorithm for binary classification. In a binary classification problem the input (X) is a one-dimensional feature vector and the output label (Y) is either 1 or 0.

The logistic regression output lies in the range 0 to 1.

0 ≤ Y ≤ 1, where Y is the probability of the output label being 1 given the input X


Y = P(y=1 | x). To find Y, a learning algorithm takes two parameters, W and B, where W is the weight associated with the input feature vector X and B is the bias.

To find Y, one thing you could try that doesn’t work would be to let Y be W transpose X plus B, that is, Y = w^T X + B, a linear function of the input X. In fact, this is what you would use if you were doing linear regression.

But this isn’t a very good algorithm for binary classification

但这不是二进制分类的很好算法

Because you want Y to be the chance that y is equal to one, Y = P(y=1 | x), Y should really be between zero and one. It’s difficult to enforce that with a linear function, because W transpose X plus B can be much bigger than one, or it can even be negative, which doesn’t make sense for a probability that you want to be between zero and one.

So in logistic regression, our output is instead going to be Y equal to the sigmoid function applied to this quantity:

Y = σ(w^T X + B)

σ is the Sigmoid function to which we pass the quantity w^T X+B


A sigmoid function would look like this,

[Figure: the sigmoid function, σ(z) = 1/(1 + e^(-z))]

If Z is very large, σ(z) ≈ 1/(1 + 0) = 1.

If Z is very small (a large negative number), σ(z) ≈ 1/(1 + a big number) ≈ 0.


where Z is the quantity w^T X+B

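As a quick illustration, here is a minimal NumPy sketch of the sigmoid applied to Z = w^T X + B for a single example; the feature values and weights below are made-up assumptions, used only for the example.

import numpy as np

def sigmoid(z):
    # element-wise sigmoid: 1 / (1 + e^(-z))
    return 1 / (1 + np.exp(-z))

# toy values, assumed only for illustration: 3 input features, one example x
w = np.array([[0.2], [-0.5], [0.1]])   # weights, shape (3, 1)
b = 0.3                                # bias, a scalar
x = np.array([[1.0], [2.0], [0.5]])    # input feature vector, shape (3, 1)

z = np.dot(w.T, x) + b                 # w^T x + b
y_hat = sigmoid(z)                     # P(y=1 | x), always between 0 and 1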

The loss function is given by


L(Y, y) = −y log(Y) − (1 − y) log(1 − Y)

where Y is the predicted label and y is the ground-truth label that comes along with the training data set.

The loss function measures how well you’re doing on a single training example.

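As a small, made-up example of how this loss behaves on a single training example (continuing with the NumPy import from the sketch above):

def single_example_loss(y_hat, y):
    # cross-entropy loss: -y*log(y_hat) - (1-y)*log(1-y_hat)
    return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)

# if the ground truth y is 1, the loss is small when the prediction is close to 1
print(single_example_loss(0.9, 1))   # about 0.105
print(single_example_loss(0.1, 1))   # about 2.303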

The cost function is given by


J(W, B) = (1/m) ∑ᵢ L(Y⁽ⁱ⁾, y⁽ⁱ⁾), the average loss over all m training examples,

which measures how well you’re doing on the entire training set. So, in training your logistic regression model, we’re going to try to find parameters W and B that minimize the overall cost function J.

So, you’ve just seen the setup for the logistic regression algorithm, the loss function for a single training example, and the overall cost function for the parameters of your algorithm.

It turns out that logistic regression can be viewed as a very very small neural network.


Our goal is to find the values of parameter W that make our classifier as accurate as possible; and in order to find appropriate values of parameter W, we’ll need to apply gradient ascent/descent.


Derivative or Slope

Before understanding gradient descent, let’s try to understand what a derivative is.

Derivative means slope


From high school math, we know that slope = height/width.

Let’s take a simple straight-line function f(a) = 3a, as shown below.

[Figure: plot of the straight line f(a) = 3a]

For a = 2, f(a) = 3(2) = 6; for a = 2.001, f(a) = 3(2.001) = 6.003.


Therefore, the slope/derivative of this straight-line function is height/width = 0.003/0.001 = 3, given by

df(a)/da = 3

The derivative of a function just means its slope, and the slope of a function can be different at different points on the function. In our first example, f(a) = 3a is a straight line, so the derivative was the same everywhere: it was three everywhere.

Similarly, let’s take a more complex function, f(a) = a², where, unlike the straight-line function above, the slope can be different at different points of the function.

[Figure: plot of f(a) = a², whose slope changes from point to point]

For other functions like f(a) = a² or f(a) = log(a), the slope of the curve varies. So, the slope or the derivative can be different at different points on the curve.
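A quick numerical check of both examples, using the same small step of 0.001 as above (this is only a finite-difference approximation of the derivative):

def numerical_slope(f, a, eps=0.001):
    # approximate slope (height/width) of f at the point a
    return (f(a + eps) - f(a)) / eps

print(numerical_slope(lambda a: 3 * a, 2))    # about 3.0  (same slope everywhere)
print(numerical_slope(lambda a: 3 * a, 5))    # about 3.0
print(numerical_slope(lambda a: a ** 2, 2))   # about 4.0  (the exact derivative is 2a)
print(numerical_slope(lambda a: a ** 2, 5))   # about 10.0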

Gradient Descent

Gradient ascent and descent are very simple first-order optimization algorithms based on the derivative of an optimization function. We use gradient ascent and descent to find the local minimum/maximum of a function.


So, in order to learn the parameters W and B, it seems natural that we want to find W and B that make the cost function J(W, B) as small as possible.

In other words, with gradient descent we minimize the overall cost function J(W, B) by moving towards the global minimum of a convex function. The direction towards the global minimum is obtained by taking the derivative of the cost function, d/dw J(W, B).

[Figure: derivative of a convex cost function; gradient descent steps toward the global minimum]

So, to find a good value for the parameters, we initialize W and B to some initial value; for logistic regression almost any initialization method works, and usually you initialize the values to zero. Because the cost function is convex, no matter where you initialize, you should get to the same point, or roughly the same point. What gradient descent does is start at that initial point and then take a step in the steepest downhill direction. So after one step of gradient descent you move downhill towards the global minimum, as shown in the figure above, because it tries to take a step in the direction of steepest descent, i.e. as quickly downhill as possible. That’s one iteration of gradient descent. After two iterations of gradient descent you might step further towards the global minimum, then three iterations, and so on.

This step of updating the weight and bias happens iteratively, starting from the initial values of W and B, until convergence occurs or the global minimum is reached.

The whole process of moving towards the global minimum is driven by the learning rate α (alpha); it controls how big a step we take on each iteration of gradient descent.
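Putting the learning rate and the update step together, a minimal sketch of the gradient-descent loop could look like the following; compute_gradients is a hypothetical helper (not from the original article) that returns the gradients dw and db of the cost at the current parameters.

def gradient_descent(w, b, compute_gradients, learning_rate=0.01, num_iterations=1000):
    for _ in range(num_iterations):
        dw, db = compute_gradients(w, b)   # slope of the cost J(w, b) at the current point
        w = w - learning_rate * dw         # step downhill in w
        b = b - learning_rate * db         # step downhill in b
    return w, b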

Approaching Logistic Regression with a Neural Network Mindset

In this exercise, you will build a Logistic Regression classifier using a Neural Network mindset. The following figure explains why Logistic Regression is actually a very simple Neural Network!

[Figure: logistic regression drawn as a single-neuron neural network]

The main steps for building a Neural Network are:


  1. Initialize the model’s parameters W and B

  2. Loop: forward and backward propagation

  • Calculate the current loss (forward propagation) L

  • Calculate the current gradients (backward propagation) dw, db

  • Update the parameters (gradient descent) θ

  3. Use the learned (w, b) to predict the labels for a given set of examples

Step 1:


Initialize parameters W and B manually.


w = np.zeros((dim, 1))   # w -- initialized vector of shape (dim, 1); dim is the number of input features
b = 0.0                  # b -- initialized scalar (corresponds to the bias)

Step 2:


Forward Propagation, Backward Propagation and Optimization

We obtain the cost J(W, B) and the gradients of the loss with respect to W and B by using the formulas below for forward propagation and backward propagation:

# compute cost (forward propagation)
# A is the sigmoid activation: A = sigmoid(np.dot(X.T, w) + b)
cost = -(1/m) * np.sum(Y.T * np.log(A) + (1 - Y.T) * np.log(1 - A))

# gradients of the loss with respect to w and b (backward propagation)
dw = (1/m) * np.dot(X, (A - Y.T))
db = (1/m) * np.sum(A - Y.T)

The goal is to learn W and B by minimizing the cost function J. For a parameter θ, the update rule is


θ = θ − α dθ

where alpha is the learning rate


Optimization finds the updated parameters W and B by applying the update rule to minimize the cost function J:

W = W - learning_rate * dw   # gradient-descent update for the weights
B = B - learning_rate * db   # gradient-descent update for the bias
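Tying Step 2 together, here is a hedged sketch of the full optimization loop, using the sigmoid helper sketched earlier; the names propagate and optimize and the shape conventions (X of shape (dim, m), Y of shape (1, m), w of shape (dim, 1)) are assumptions chosen to match the code above, not the original notebook’s exact interface.

def propagate(w, b, X, Y):
    # one pass of forward and backward propagation over the training set
    m = X.shape[1]
    A = sigmoid(np.dot(X.T, w) + b)                                       # activations, shape (m, 1)
    cost = -(1/m) * np.sum(Y.T * np.log(A) + (1 - Y.T) * np.log(1 - A))   # cost J(w, b)
    dw = (1/m) * np.dot(X, (A - Y.T))                                     # gradient of the cost w.r.t. w
    db = (1/m) * np.sum(A - Y.T)                                          # gradient of the cost w.r.t. b
    return cost, dw, db

def optimize(w, b, X, Y, num_iterations=2000, learning_rate=0.005):
    for i in range(num_iterations):
        cost, dw, db = propagate(w, b, X, Y)
        w = w - learning_rate * dw   # gradient-descent update
        b = b - learning_rate * db
    return w, b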

Step 3:


Use the learned (w, b) to predict the labels for a given set of input examples. The process of computing the cost, the gradients, and the updated parameters by gradient descent over a set of input examples is called an iteration or epoch. Typically we run this step for multiple epochs or iterations to obtain the desired result.
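A minimal prediction sketch under the same assumed shape conventions, thresholding the estimated probability at 0.5:

def predict(w, b, X):
    # predict 1 when P(y=1 | x) > 0.5, otherwise 0; X has shape (dim, m)
    A = sigmoid(np.dot(X.T, w) + b)   # probabilities, shape (m, 1)
    return (A > 0.5).astype(int)      # hard 0/1 labels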

[Figure: the cost decreasing over iterations for a learning rate of 0.005]

For a learning rate alpha of 0.005, you can see the cost decreasing. It shows that the parameters are being learned. However, you see that you could train the model even more on the training set. By increasing the number of iterations you might see that the training set accuracy goes up, but the test set accuracy goes down. This is called overfitting.


In order for Gradient Descent to work you must choose the learning rate wisely. The learning rate alpha determines how rapidly we update the parameters. If the learning rate is too large we may “overshoot” the optimal value. Similarly, if it is too small we will need too many iterations to converge to the best values. That’s why it is crucial to use a well-tuned learning rate.

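In practice, one simple (purely illustrative) way to pick it is to train with a few candidate learning rates and compare the resulting costs, assuming X and Y are the training inputs and labels and reusing the sketch functions above:

for lr in (0.01, 0.005, 0.0001):                    # candidate learning rates (illustrative values)
    w, b = np.zeros((X.shape[0], 1)), 0.0           # re-initialize for each run
    w, b = optimize(w, b, X, Y, num_iterations=1500, learning_rate=lr)
    final_cost, _, _ = propagate(w, b, X, Y)
    print(lr, final_cost)                           # a smaller final cost suggests a better-tuned rate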

Summary

Logistic Regression is a simple Neural Network. The main objective of the Logistic Regression algorithm is to find the updated parameters by minimizing the cost function J, where the cost function J measures how well you’re doing on the entire training set.

Logistic Regression involves three steps. First, we initialize the parameters W and B to zeros. Next, we compute the cost of the entire training set (J) and obtain the derivatives dw and db, which are nothing but the gradients of the loss with respect to W and B. Finally, we apply the update rule to minimize the cost function and obtain the updated parameters.

We repeat this process of updating the parameters W and B for multiple iterations or epochs, taking the updated parameters from the previous iteration or epoch as the initial parameters for the current one.
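As a compact recap of the three steps, here is a hedged end-to-end sketch (the name model is illustrative) built from the helper functions sketched earlier:

def model(X_train, Y_train, num_iterations=2000, learning_rate=0.005):
    w = np.zeros((X_train.shape[0], 1))                                     # step 1: initialize w to zeros
    b = 0.0                                                                 # ... and b to zero
    w, b = optimize(w, b, X_train, Y_train, num_iterations, learning_rate)  # step 2: gradient descent
    Y_pred = predict(w, b, X_train)                                         # step 3: predict with learned (w, b)
    return w, b, Y_pred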

To read the other lessons from this course, jump to this article to find the complete syllabus and table of contents

— — — — — — — — — — -> Click Here


Translated from: https://medium.com/analytics-vidhya/logistic-regression-with-a-neural-network-mindset-f10da5033d30
