What Are Classification and Regression in ML?

weixin_26752765

Original article: https://towardsdatascience.com/what-are-classification-and-regression-3677987b9422

MACHINE LEARNING TUTORIAL

Machine learning is about extracting knowledge from data.

Machine learning is the study of algorithms that give computers the ability to learn from data and predict outcomes accurately without being explicitly programmed. It is divided into three categories: supervised learning, unsupervised learning, and reinforcement learning.

**Machine Learning Model** (image by author)

Supervised learning

As the name “supervised learning” suggests, learning here is based on examples. We have a known set of inputs (called features, x) and outputs (called labels, y). The goal of the algorithm is to train a model on the given data so that it predicts the correct value (y) for a previously unseen input (x). Supervised learning can be further divided into two categories: classification and regression.

Classification and regression are two basic concepts in supervised learning. However, the difference between the two can be confusing, and mixing them up can lead to choosing the wrong algorithm for a prediction task. Once we understand the difference and can identify which kind of algorithm a problem calls for, structuring the model becomes much easier.

Classification and regression follow the same basic idea of supervised learning, i.e. training the model on a known dataset in order to predict the outcome.

The major difference is that in a classification problem the output variable is assigned to a category or class (i.e. it is discrete), while in regression the output variable is a continuous numerical value.

Classification

In classification, the model is trained so that, given the input data, it assigns each input to one of several labels (or categories).

The algorithm maps the input data (x) to discrete labels (y).

Binary classification

If there are only two categories into which the data can be classified, the problem is called binary classification. An example is checking whether a bank transaction is fraudulent or genuine: there are only two categories (fraudulent or genuine) with which the output can be labeled.
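To make the two-category output concrete, here is a minimal sketch in Python. It is not a trained model: the rule, the `amount_limit` threshold, and the transaction values are all hypothetical and chosen only for illustration. The point is that every input ends up with exactly one of two labels.

```python
# Hypothetical transactions as (amount, hour of day). All values are made up.
def classify_transaction(amount, hour, amount_limit=1000.0):
    """Return one of exactly two labels: 'fraudulent' or 'genuine'."""
    # Illustrative hand-written rule (an assumption, not a learned model):
    # a large amount in the early hours of the morning is flagged.
    is_night = hour < 6
    if amount > amount_limit and is_night:
        return "fraudulent"
    return "genuine"


transactions = [(25.0, 14), (4200.0, 3), (1800.0, 11)]
labels = [classify_transaction(amount, hour) for amount, hour in transactions]
print(labels)  # ['genuine', 'fraudulent', 'genuine']
```

In a real system the rule would be learned from labeled transactions rather than written by hand, but the shape of the output would stay the same: one of two classes.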

Multiclass classification

In this kind of problem, the input is categorized into one class out of three or more classes.

The Iris dataset is a classic example of multiclass classification. It contains 50 samples from each of three species of iris flower (setosa, versicolor, and virginica), 150 samples in total. Each sample is described by four features (sepal length, sepal width, petal length, and petal width), which are used to classify it.

**Graphical representation of a linear discriminant model of the Iris dataset** (image by author)
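A multiclass problem like this can be sketched with a nearest-centroid classifier, which is a deliberately simple stand-in and not the linear discriminant model shown in the figure. The sketch assumes a handful of Iris-style measurements typed in by hand (sepal length, sepal width, petal length, petal width, in cm) instead of the full 150-sample dataset; the helper names `centroid` and `predict_species` are hypothetical.

```python
from math import dist

# A few Iris-style rows per species, used here only as illustrative data:
# (sepal length, sepal width, petal length, petal width) in cm.
samples = {
    "setosa": [(5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (4.7, 3.2, 1.3, 0.2)],
    "versicolor": [(7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5), (6.9, 3.1, 4.9, 1.5)],
    "virginica": [(6.3, 3.3, 6.0, 2.5), (5.8, 2.7, 5.1, 1.9), (7.1, 3.0, 5.9, 2.1)],
}


def centroid(rows):
    """Average each of the four features over the given rows."""
    return tuple(sum(column) / len(rows) for column in zip(*rows))


# "Training": one mean feature vector per class.
centroids = {species: centroid(rows) for species, rows in samples.items()}


def predict_species(flower):
    """Assign the flower to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda species: dist(flower, centroids[species]))


print(predict_species((5.0, 3.4, 1.5, 0.2)))  # setosa
print(predict_species((6.0, 2.9, 4.5, 1.5)))  # versicolor
print(predict_species((6.5, 3.0, 5.8, 2.2)))  # virginica
```

The output is always one of three labels, which is what makes this a multiclass rather than a binary problem.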

Two Kinds of Classifiers

Soft Classifier

A soft classifier predicts the labels for inputs based on probabilities. For a given input, the probability of each class (label) is calculated, and the input is assigned to the class with the highest probability. A higher probability means the model is more confident in that prediction; it does not by itself guarantee that the prediction is correct.

Because we have to predict probabilities, the sigmoid function can be used in this model: its output always lies in the interval (0, 1), which matches the range of a probability.

**Sigmoid Function** (image by author)
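The standard sigmoid is sigmoid(z) = 1 / (1 + e^(-z)). The sketch below implements it and uses it as a two-class soft classifier. The `soft_classify` helper and the class names "positive" and "negative" are assumptions made for illustration; `score` stands for whatever raw value a trained model would produce.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    # Two branches so that math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_classify(score):
    """Return the most probable class together with both class probabilities."""
    p_positive = sigmoid(score)
    probabilities = {"positive": p_positive, "negative": 1.0 - p_positive}
    label = max(probabilities, key=probabilities.get)
    return label, probabilities


for score in (-4.0, 0.5, 4.0):
    label, probabilities = soft_classify(score)
    print(score, label, round(probabilities[label], 3))
```

Besides the label, the caller gets the probability behind it, which is exactly what a hard classifier does not provide.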

Hard Classifier

Hard classifiers do not calculate probabilities for the different categories; they give the classification decision based directly on the decision boundary.
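A hard decision of this kind can be sketched as follows. The weights `w` and bias `b` are assumed values that place the decision boundary on the line x1 = x2; in practice they would come from training. The function returns only a class label and never a probability.

```python
def hard_classify(x, w=(1.0, -1.0), b=0.0):
    """Return class 1 or class 0 depending on which side of the boundary x lies."""
    # Assumed boundary: w[0]*x1 + w[1]*x2 + b = 0, i.e. the line x1 = x2.
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else 0


print(hard_classify((2.0, 1.0)))  # 1
print(hard_classify((1.0, 3.0)))  # 0
```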

Linear and Non-Linear Classifiers

**Graph A represents a linear classifier model. Graph B represents a non-linear classifier model.** (image by Sebastian Raschka on WikimediaCommons)

Linear Classification Model

When the data of two classes plotted on a graph can be separated by drawing a straight line, the two classes are called linearly separable (in graph A above, the green dots and the blue dots are completely separated by a single straight line).

There can be infinitely many lines that separate the two classes.

A classifier that determines the exact position of such a line is called a linear classifier. A few examples of linear classifiers are logistic regression, the perceptron, and naive Bayes.
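As an illustration of how a linear classifier settles on one particular line, here is a minimal perceptron sketch. The six training points are made up and linearly separable, the labels are encoded as +1 and -1, and the function names and default parameters are assumptions for this example.

```python
def train_perceptron(points, labels, epochs=1000, lr=1.0):
    """Learn w and b so that the sign of w.x + b matches the labels (+1 / -1)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            score = w[0] * x1 + w[1] * x2 + b
            if y * score <= 0:  # point is misclassified or on the boundary
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w, b


def predict(point, w, b):
    """Return +1 or -1 depending on the side of the learned line."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else -1


# Made-up, linearly separable data: one cluster near (1, 1), one near (4, 4).
points = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_perceptron(points, labels)
print("line:", w, b)
print([predict(p, w, b) for p in points])
```

The line the perceptron ends up with is just one of the many that separate the two clusters; other linear classifiers, such as logistic regression, would generally pick a different one.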

Non-Linear Classification Model

As we can see in graph B (above), the two classes cannot be separated by drawing a straight line, so this kind of problem requires a different approach. Here the model generates a non-linear boundary, and the shape of that boundary is determined by the non-linear classifier used. A few examples of non-linear classifiers are decision trees, k-nearest neighbours, and random forests.
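A k-nearest-neighbours classifier is one way to handle such data. The sketch below uses a made-up XOR-style layout, in which opposite corners share a class and no single straight line can separate the two classes. The `knn_predict` helper and the choice of k = 3 are assumptions for illustration.

```python
from collections import Counter
from math import dist

# Made-up XOR-style data: class 0 near (0, 0) and (1, 1),
# class 1 near (0, 1) and (1, 0).
training = [
    ((0.0, 0.0), 0), ((0.1, 0.1), 0), ((0.0, 0.2), 0),
    ((1.0, 1.0), 0), ((0.9, 0.9), 0), ((1.0, 0.8), 0),
    ((0.0, 1.0), 1), ((0.1, 0.9), 1), ((0.2, 1.0), 1),
    ((1.0, 0.0), 1), ((0.9, 0.1), 1), ((0.8, 0.0), 1),
]


def knn_predict(query, k=3):
    """Label the query by majority vote among its k closest training points."""
    neighbours = sorted(training, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


print(knn_predict((0.05, 0.05)))  # 0
print(knn_predict((0.95, 0.05)))  # 1
print(knn_predict((0.10, 0.95)))  # 1
print(knn_predict((0.90, 0.95)))  # 0
```

Because each prediction depends only on nearby training points, the resulting decision boundary can bend around the clusters instead of being a single line.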

Regression

Unlike a classification model, a regression model is trained to predict a continuous numerical value as its output, based on the input variables.

The algorithm maps the input data (x) to continuous numerical data (y).

There are several kinds of regression algorithms in machine learning, such as linear regression, polynomial regression, quantile regression, and lasso regression. Linear regression is the simplest of these.

Linear Regression

**Graphical Representation of Linear Regression Problem** (image by Sewaqu on WikimediaCommons)

This approach is generally used for predictive analysis. A linear relationship is set up between the x-axis feature and the y-axis feature. As you can see in the graph, the line does not pass through every point, but it does capture the overall relationship between the two.

A simple linear regression relation can be written as the equation:

y = wx + b

Here, y is the numerical output, w is the weight (slope), x is the input variable, and b is the bias (or y-intercept).
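One common way to choose w and b is ordinary least squares, which picks the line that minimizes the sum of squared vertical distances to the points. The article does not say which fitting method it has in mind, so treat this as one reasonable option. The data values and the `fit_line` name below are made up for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least-squares estimates of slope w and intercept b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    w = s_xy / s_xx
    b = mean_y - w * mean_x  # the fitted line passes through (mean_x, mean_y)
    return w, b


# Made-up observations that roughly follow y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

w, b = fit_line(xs, ys)
print(round(w, 2), round(b, 2))  # 1.97 1.09

# Unlike a classifier, the model returns a continuous number for a new input.
print(w * 6.0 + b)
```

The fitted line does not pass through every point; it is the best straight-line summary of them under the squared-error criterion.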

Regression models can be used to predict temperature, forecast trends, and analyze the effect of a change in one variable on other variables.

Conclusion

Supervised learning is arguably the most approachable sub-branch of machine learning. Identifying the correct kind of algorithm is an essential step in structuring a model, and I hope that after reading this article you understand the difference between regression and classification. Try implementing these concepts yourself for a better understanding.

If you have any questions or comments, please post them in the comment section.

Understanding Classification and Regression