Difference between Classification and Regression
I. Abstract
As we all know, both regression and classification can be used for prediction problems. After reading the paper "You Only Look Once" (YOLO), in which object detection is framed as a regression problem instead of repurposing classifiers, I wrote this blog to help myself understand the difference.
II. Definition
1. Classification
From my perspective, a classification model takes input variables (attributes such as age or gender) and outputs a label or category. The main classification algorithms are as follows:
(1) Decision Tree Classification
In this algorithm, a classification model is created by building a decision tree in which every node is a test on an attribute and each branch leaving the node corresponds to a possible value of that attribute. Here is an example:
# Load the iris dataset; use only the first two features
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

dataset=load_iris()
X=dataset.data[:,:2]
y=dataset.target
# Generate a grid of test points covering the feature space
xZero,xFirst=X[:,0],X[:,1]
xMin,xMax=xZero.min()-1,xZero.max()+1
yMin,yMax=xFirst.min()-1,xFirst.max()+1
# Grid step size
h=0.2
xx,yy=np.meshgrid(np.arange(xMin,xMax,h),
                  np.arange(yMin,yMax,h))
# Create the classifier and train it
model=DecisionTreeClassifier()
model.fit(X,y)
# Create the figure
fig,ax=plt.subplots(figsize=(5,5))
plt.subplots_adjust(wspace=0.5,hspace=0.5)
# Predict on the grid and plot the decision regions
z=model.predict(np.c_[xx.ravel(),yy.ravel()])
z=z.reshape(xx.shape)
ax.contourf(xx,yy,z,cmap=plt.cm.coolwarm,alpha=0.8)
# Plot the training samples
ax.scatter(xZero,xFirst,c=y,cmap=plt.cm.coolwarm,s=10,edgecolors='k')
ax.set_xlim(xx.min(),xx.max())
ax.set_ylim(yy.min(),yy.max())
ax.set_xlabel('sepal length')
ax.set_ylabel('sepal width')
ax.set_title('DecisionTreeClassifier')
# The three colors in the figure correspond to the three species
print(dataset['target_names'])
plt.show()
The result is shown in Fig. 1.
(2) K-Nearest Neighbors
The K-nearest neighbors algorithm assumes that similar things exist in close proximity to each other: a data point is assigned to the group that appears most often among its k nearest neighbors. Here is an example of KNN:
# Generate the data and visualize it
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# centers=5 means five clusters
data=make_blobs(n_samples=500,n_features=2,centers=5,
                cluster_std=1.0,random_state=3)
x,y=data
plt.scatter(x[:,0],x[:,1],s=80,c=y,cmap=plt.cm.spring,edgecolors='k')
# Create the classifier and fit it
clf=KNeighborsClassifier()
clf.fit(x,y)
xMin,xMax=x[:,0].min()-1,x[:,0].max()+1
yMin,yMax=x[:,1].min()-1,x[:,1].max()+1
xx,yy=np.meshgrid(np.arange(xMin,xMax,.02),np.arange(yMin,yMax,.02))
z=clf.predict(np.c_[xx.ravel(),yy.ravel()])
z=z.reshape(xx.shape)
plt.pcolormesh(xx,yy,z,cmap=plt.cm.Pastel1)
plt.scatter(x[:,0],x[:,1],s=80,c=y,cmap=plt.cm.spring,edgecolors='k')
plt.xlim(xx.min(),xx.max())
plt.ylim(yy.min(),yy.max())
plt.title("Classifier:KNN")
# Predict the class of a new point
plt.scatter(-5,5,marker="*",c='red',s=200)
res=clf.predict([[-5,5]])
plt.text(-5,5,str(res[0]))
plt.text(3.75,-13,"Score:{:.2f}".format(clf.score(x,y)))
plt.show()
The result is shown in Fig. 2.
2. Regression
Regression algorithms predict a continuous value based on the input variables. The main goal of regression problems is to estimate a mapping function from the input variables to the output variable. The main algorithms are as follows:
(1) Simple Linear Regression
With simple linear regression, you can estimate the relationship between one independent variable and one dependent variable using a straight line, given that both variables are quantitative. Here is an example:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn import linear_model
from sklearn.metrics import mean_squared_error,r2_score
# Data preparation
# load the dataset
diabeteX,diabeteY=load_diabetes(return_X_y=True)
# use one feature
diabeteX=diabeteX[:,np.newaxis,2]
# split the data into train/test sets
# from the first element up to -20, so the train size is n-20
diabeteX_train=diabeteX[:-20]
# from -20 to the end, so the test size is 20
diabeteX_test=diabeteX[-20:]
# split the labels into train/test sets
diabeteY_train=diabeteY[:-20]
diabeteY_test=diabeteY[-20:]
# Model
# create an ordinary linear regression model
reg=linear_model.LinearRegression()
# train the model using the train set
reg.fit(diabeteX_train,diabeteY_train)
# Prediction and scoring
# make predictions using the test set
diabeteY_prediction=reg.predict(diabeteX_test)
# print the parameters
print('coefficients:',reg.coef_)
print('mean squared error:{:.2f}'.format(mean_squared_error(
    diabeteY_test,diabeteY_prediction)))
print('coefficient of determination:{:.2f}'.format(r2_score(diabeteY_test,diabeteY_prediction)))
# plot the result
plt.scatter(diabeteX_test,diabeteY_test,color='black')
plt.plot(diabeteX_test,diabeteY_prediction,color='blue',linewidth=3)
plt.title('Ordinary linear regression')
plt.show()
The result is shown in Fig. 3.
(2) Multiple Linear Regression
An extension of simple linear regression, multiple regression can predict the values of a dependent variable based on the values of two or more independent variables.
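As a minimal sketch (my own extension of the diabetes example above, not part of the original post), the same dataset can be fit with all ten features instead of one, giving one coefficient per independent variable:
# A minimal sketch: multiple linear regression on all ten features
# of the diabetes dataset (extends the simple example above)
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X,y=load_diabetes(return_X_y=True)
# keep the last 20 samples as the test set, as in the simple example
X_train,X_test=X[:-20],X[-20:]
y_train,y_test=y[:-20],y[-20:]
reg=LinearRegression()
reg.fit(X_train,y_train)
# one coefficient per independent variable
print('coefficients:',reg.coef_)
print('coefficient of determination:{:.2f}'.format(r2_score(y_test,reg.predict(X_test))))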
(3) Polynomial Regression
The main aim of polynomial regression is to model or find a nonlinear relationship between dependent and independent variables.
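As a minimal sketch (the quadratic data here is synthetic, made up purely for illustration), the input can be expanded with polynomial terms so that an ordinary linear model captures the nonlinear relationship:
# A minimal sketch: polynomial regression via a polynomial feature
# expansion (the quadratic data is synthetic, for illustration)
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng=np.random.RandomState(0)
x=np.sort(rng.uniform(-3,3,50))[:,np.newaxis]
y=0.5*x.ravel()**2-x.ravel()+rng.normal(0,0.5,50)
# expand x into [1, x, x^2] so a linear model can fit the curve
poly=PolynomialFeatures(degree=2)
xPoly=poly.fit_transform(x)
reg=LinearRegression().fit(xPoly,y)
print('coefficients:',reg.coef_,'intercept:',reg.intercept_)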
III. Conclusion
Perhaps the biggest difference between regression and classification is that classification predicts discrete class labels, while regression predicts continuous quantities. Of course, there is also overlap between them:
a regression model can predict a discrete value if it is in the form of an integer quantity; for example, we can use the sigmoid function to map the outputs into the range from zero to one and then convert them to different categories. A classification model can predict a continuous value if it is in the form of a class-label probability.
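Here is a minimal sketch of that overlap (my own illustration, with made-up scores): a sigmoid squashes continuous regression-style outputs into probabilities, and a threshold converts those probabilities into class labels.
# A minimal sketch of the regression-to-classification overlap
# (the scores below are made up purely for illustration)
import numpy as np

def sigmoid(z):
    return 1.0/(1.0+np.exp(-z))

scores=np.array([-2.0,-0.3,0.1,1.5])   # continuous outputs of a regression-style model
probs=sigmoid(scores)                  # mapped into the range (0, 1)
labels=(probs>=0.5).astype(int)        # thresholded into discrete categories
print(probs,labels)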