Machine Learning: Learning Andrew Ng's Logistic Regression
In the previous stories, I explained programs that implement various Regression models. As we move on to Classification, isn't it surprising that the name of this algorithm still contains the word Regression? Let us understand the mechanism of Logistic Regression and learn to build a classification model with an example.
Overview of Logistic Regression
Logistic Regression is a classification model used when the dependent variable (output) is binary, such as 0 (False) or 1 (True). Examples include predicting whether there is a tumor (1) or not (0) and whether an email is spam (1) or not (0).
The logistic function, also called the sigmoid function, was initially used by statisticians to describe properties of population growth in ecology. The sigmoid function is a mathematical function used to map the predicted values to probabilities. It has an S-shaped curve and takes values between 0 and 1 but never exactly at those limits. Its formula is 1 / (1 + e^(-value)).
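The formula above can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library; the function name `sigmoid` and the sample inputs are assumptions chosen for the example, not something taken from the original program.

```python
import math


def sigmoid(value: float) -> float:
    """Return 1 / (1 + e^(-value)), mapping any real number into (0, 1)."""
    # Branch on the sign so math.exp never receives a large positive
    # argument, which would raise OverflowError for extreme inputs.
    if value >= 0:
        return 1.0 / (1.0 + math.exp(-value))
    exp_value = math.exp(value)
    return exp_value / (1.0 + exp_value)


# Illustrative inputs (assumed): a negative score, zero, and a positive score.
for value in (-5.0, 0.0, 5.0):
    print(value, sigmoid(value))
```

An input of 0 maps to exactly 0.5, negative inputs map below 0.5, and positive inputs map above 0.5, which is what gives the curve its S shape.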
Logistic Regression is an extension of the Linear Regression model. Let us understand this with a simple example. If we want to classify whether an email is spam or not and apply a Linear Regression model, we get only continuous values such as 0.4 or 0.7, and nothing keeps those values between 0 and 1. Logistic Regression extends the linear model by passing its output through the sigmoid function and setting a threshold at 0.5: the data point is classified as spam if the output value is greater than 0.5 and as not spam if it is less than 0.5.
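The thresholding step described above could look roughly like the following sketch. The weights, bias, and feature values are hypothetical numbers picked for illustration (a real model would learn them from data), and treating an output of exactly 0.5 as not spam is an assumption, since the text only covers values above and below the threshold.

```python
import math


def sigmoid(z: float) -> float:
    # Plain form of 1 / (1 + e^(-z)); adequate for the small scores used here.
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Linear score (as in linear regression) squashed into (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def classify(probability, threshold=0.5):
    """Return 1 (spam) above the threshold, otherwise 0 (not spam)."""
    return 1 if probability > threshold else 0


# Hypothetical parameters and inputs, for illustration only.
weights = [1.5, -2.0]
bias = -0.5
for email_features in ([2.0, 0.5], [0.2, 1.0]):
    p = predict_proba(email_features, weights, bias)
    print(email_features, round(p, 3), classify(p))
```

With these assumed numbers the first email gets a linear score of 1.5 and is labelled spam, while the second gets a score of -2.2 and is labelled not spam.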
In this way, we can apply Logistic Regression to classification problems and get a class prediction for each data point.
Problem Analysis
To apply the Logistic Regression model in practical usage, let us consider a DMV Test dataset which consists of three columns. The first two columns consist of the two DMV written tests (DMV_Test_1 and DMV_Test_2) which are the independent variables and the last column consists of the dependent variable, Results which denote th