Difference between Classification and Regression




一、Abstract

As we all know, both regression and classification can be used in prediction problems. After reading the paper "You Only Look Once" (YOLO), in which object detection is framed as a regression problem instead of repurposing classifiers, I am writing this blog to help myself understand the difference.

二、Definition

1.Classification

From my perspective, classification is a model into which you feed input variables (such as attributes like age and gender), and the model makes a decision to output a label or category. The main classification algorithms are as follows:
(1)Decision Tree Classification
In this algorithm, a classification model is created by building a decision tree in which every internal node is a test on an attribute and each branch from a node is a possible value of that attribute. An example follows.

# Imports needed for this example
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load the iris dataset and keep only the first two features
dataset = load_iris()
X = dataset.data[:, :2]
y = dataset.target
# Build a grid of test points covering the feature space
xZero, xFirst = X[:, 0], X[:, 1]
xMin, xMax = xZero.min() - 1, xZero.max() + 1
yMin, yMax = xFirst.min() - 1, xFirst.max() + 1
# Grid step size
h = 0.2
xx, yy = np.meshgrid(np.arange(xMin, xMax, h),
                     np.arange(yMin, yMax, h))
# Create the classifier and train it
model = DecisionTreeClassifier()
model.fit(X, y)
# Create the canvas
fig, ax = plt.subplots(figsize=(5, 5))
plt.subplots_adjust(wspace=0.5, hspace=0.5)
# Predict on the grid and display the decision regions
z = model.predict(np.c_[xx.ravel(), yy.ravel()])
z = z.reshape(xx.shape)
ax.contourf(xx, yy, z, cmap=plt.cm.coolwarm, alpha=0.8)
# Show the training samples
ax.scatter(xZero, xFirst, c=y, cmap=plt.cm.coolwarm, s=10, edgecolors='k')
ax.set_xlim(xx.min(), xx.max())
ax.set_ylim(yy.min(), yy.max())
ax.set_xlabel('sepal length')
ax.set_ylabel('sepal width')
ax.set_title('DecisionTreeClassifier')
# The three colors in the figure correspond to the three species
print(dataset['target_names'])
plt.show()

The result is shown in Fig. 1.
Fig. 1: decision regions of the DecisionTreeClassifier over sepal length and sepal width; the three colors correspond to the three iris species.
(2)K-nearest neighbors
The K-nearest neighbors algorithm assumes that similar things exist in close proximity to each other. The main goal of the algorithm is to determine how likely it is for a data point to be part of a specific group. An example of KNN follows.

# Imports needed for this example
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Generate data and visualize it
# centers=5 produces five clusters
data = make_blobs(n_samples=500, n_features=2, centers=5,
                  cluster_std=1.0, random_state=3)
x, y = data
plt.scatter(x[:, 0], x[:, 1], s=80, c=y, cmap=plt.cm.spring, edgecolors='k')
# Create the classifier and fit it
clf = KNeighborsClassifier()
clf.fit(x, y)
xMin, xMax = x[:, 0].min() - 1, x[:, 0].max() + 1
yMin, yMax = x[:, 1].min() - 1, x[:, 1].max() + 1  # second feature, not the first
xx, yy = np.meshgrid(np.arange(xMin, xMax, .02), np.arange(yMin, yMax, .02))
z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
z = z.reshape(xx.shape)
plt.pcolormesh(xx, yy, z, cmap=plt.cm.Pastel1)
plt.scatter(x[:, 0], x[:, 1], s=80, c=y, cmap=plt.cm.spring, edgecolors='k')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title("Classifier:KNN")
# Predict the class of a new point and mark it on the plot
plt.scatter(-5, 5, marker="*", c='red', s=200)
res = clf.predict([[-5, 5]])
plt.text(-5, 5, res)
plt.text(3.75, -13, "Score:{:.2f}".format(clf.score(x, y)))
plt.show()

The result is shown in Fig. 2.
Fig. 2: KNN decision regions with the starred test point at (-5, 5) labeled with its predicted class.

2.Regression

Regression algorithms predict a continuous value based on the input variables. The main goal of regression problems is to estimate a mapping function from the input variables to the output variable. The main algorithms are as follows:
(1)Simple Linear Regression

With simple linear regression, you can estimate the relationship between one independent variable and a dependent variable using a straight line, given that both variables are quantitative. An example follows:

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Data preparation
# Load the dataset
diabeteX, diabeteY = load_diabetes(return_X_y=True)
# Use only one feature
diabeteX = diabeteX[:, np.newaxis, 2]
# Split the data into train/test sets
# From the first element to -20: train size N-20
diabeteX_train = diabeteX[:-20]
# From -20 to the end: test size 20
diabeteX_test = diabeteX[-20:]
# Split the labels into train/test sets
diabeteY_train = diabeteY[:-20]
diabeteY_test = diabeteY[-20:]
# Modeling
# Create an ordinary linear regression model
reg = linear_model.LinearRegression()
# Train the model using the training set
reg.fit(diabeteX_train, diabeteY_train)
# Prediction and scoring
# Make predictions on the test set
diabeteY_prediction = reg.predict(diabeteX_test)
# Print the parameters
print('coefficients:', reg.coef_)
print('mean squared error:{:.2f}'.format(mean_squared_error(
    diabeteY_test, diabeteY_prediction)))
print('coefficient of determination:{:.2f}'.format(
    r2_score(diabeteY_test, diabeteY_prediction)))
# Show the fit
plt.scatter(diabeteX_test, diabeteY_test, color='black')
plt.plot(diabeteX_test, diabeteY_prediction, color='blue', linewidth=3)
plt.title('Ordinary linear regression')
plt.show()

The result is shown in Fig. 3.
Fig. 3: test samples (black) and the fitted line (blue) of the ordinary linear regression model.

(2)Multiple Linear Regression

An extension of simple linear regression, multiple regression can predict the values of a dependent variable based on the values of two or more independent variables.
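As a minimal sketch (my own addition, not from the original post), the diabetes example above extends naturally to multiple regression by keeping all ten features instead of one; the variable names mirror the single-feature example:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset, this time keeping all ten features
diabeteX, diabeteY = load_diabetes(return_X_y=True)
# Same split as before: last 20 samples held out for testing
diabeteX_train, diabeteX_test = diabeteX[:-20], diabeteX[-20:]
diabeteY_train, diabeteY_test = diabeteY[:-20], diabeteY[-20:]
# Fit one coefficient per independent variable
reg = linear_model.LinearRegression()
reg.fit(diabeteX_train, diabeteY_train)
diabeteY_prediction = reg.predict(diabeteX_test)
# reg.coef_ now holds ten coefficients, one per feature
print('coefficients:', reg.coef_)
print('mean squared error:{:.2f}'.format(
    mean_squared_error(diabeteY_test, diabeteY_prediction)))
print('coefficient of determination:{:.2f}'.format(
    r2_score(diabeteY_test, diabeteY_prediction)))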

(3)Polynomial Regression

The main aim of polynomial regression is to model or find a nonlinear relationship between dependent and independent variables.
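Here is a minimal sketch (my own addition, using scikit-learn's PolynomialFeatures on synthetic data) of the usual recipe: expand the input into polynomial terms, then fit an ordinary linear model on the expanded features:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic nonlinear data: y = 0.5*x^2 + x + 2 plus noise
rng = np.random.RandomState(3)
x = np.sort(6 * rng.rand(60, 1) - 3, axis=0)
y = 0.5 * x.ravel() ** 2 + x.ravel() + 2 + rng.randn(60) * 0.5
# degree=2 expands [x] into [1, x, x^2]; the linear model fits those terms
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print('R^2 on training data:{:.2f}'.format(model.score(x, y)))

Although the fitted curve is nonlinear in x, the model is still linear in the expanded features, which is why the same LinearRegression estimator can be reused.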

三、Conclusion

Perhaps the biggest difference between regression and classification is that classification predicts discrete class labels, while regression predicts a continuous quantity. Of course, there is also overlap between them:
a regression model can predict a discrete value if it is in the form of an integer quantity; for example, we can use a sigmoid function to map the outputs into the range from zero to one and then convert them into different categories. A classification model can predict a continuous value if it is in the form of a class-label probability.
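
To make that overlap concrete, here is a minimal sketch (my own addition) of squashing raw regression outputs through a sigmoid and thresholding them into class labels:

import numpy as np

def sigmoid(z):
    # Map any real-valued regression output into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

raw_scores = np.array([-2.0, 0.3, 1.7])  # hypothetical regression outputs
probs = sigmoid(raw_scores)
labels = (probs >= 0.5).astype(int)      # threshold into discrete classes
print(probs, labels)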
