LogisticRegression: Unknown label type: 'continuous' when using sklearn in Python...

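When trying to use sklearn's LogisticRegression, the error "Unknown label type: 'continuous'" is raised. The cause is that continuous values are passed to LogisticRegression as the classification target. The solution is to use LabelEncoder to convert the continuous values into categories so that the classification task can run correctly.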

I have the following code to test some of the most popular ML algorithms in the sklearn Python library:

import numpy as np
from sklearn import metrics, svm
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

trainingData = np.array([ [2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0] ])
trainingScores = np.array( [3.4, 7.5, 4.5, 1.6] )
predictionData = np.array([ [2.5, 2.4, 2.7], [2.7, 3.2, 1.2] ])

clf = LinearRegression()
clf.fit(trainingData, trainingScores)
print("LinearRegression")
print(clf.predict(predictionData))

clf = svm.SVR()
clf.fit(trainingData, trainingScores)
print("SVR")
print(clf.predict(predictionData))

clf = LogisticRegression()
clf.fit(trainingData, trainingScores)
print("LogisticRegression")
print(clf.predict(predictionData))

clf = DecisionTreeClassifier()
clf.fit(trainingData, trainingScores)
print("DecisionTreeClassifier")
print(clf.predict(predictionData))

clf = KNeighborsClassifier()
clf.fit(trainingData, trainingScores)
print("KNeighborsClassifier")
print(clf.predict(predictionData))

clf = LinearDiscriminantAnalysis()
clf.fit(trainingData, trainingScores)
print("LinearDiscriminantAnalysis")
print(clf.predict(predictionData))

clf = GaussianNB()
clf.fit(trainingData, trainingScores)
print("GaussianNB")
print(clf.predict(predictionData))

clf = SVC()
clf.fit(trainingData, trainingScores)
print("SVC")
print(clf.predict(predictionData))

The first two work fine, but I get the following error on the LogisticRegression call:

root@ubupc1:/home/ouhma# python stack.py
LinearRegression
[ 15.72023529 6.46666667]
SVR
[ 3.95570063 4.23426243]
Traceback (most recent call last):
  File "stack.py", line 28, in <module>
    clf.fit(trainingData, trainingScores)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/logistic.py", line 1174, in fit
    check_classification_targets(y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'

The input data is the same as in the previous calls, so what is going on here?

And by the way, why is there such a huge difference between the first predictions of LinearRegression() and SVR() (15.72 vs 3.95)?

Solution

You are passing floats to a classifier, which expects categorical values as the target vector. If you convert the target to int, it will be accepted as input, although it is questionable whether that is the right way to do it.
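To illustrate why the int conversion is questionable, here is a minimal sketch using only NumPy. The `close_scores` array is a hypothetical example that is not part of the question; it is there only to show that `astype(int)` truncates, so distinct scores can end up in the same class.

```python
import numpy as np

# Target vector from the question.
trainingScores = np.array([3.4, 7.5, 4.5, 1.6])

# astype(int) drops the fractional part; it does not round.
as_int = trainingScores.astype(int)
print(as_int)  # [3 7 4 1]

# Hypothetical scores (not from the question): 3.4 and 3.9 are
# different values but become the same class label after truncation.
close_scores = np.array([3.4, 3.9, 4.1])
collapsed = close_scores.astype(int)
print(collapsed)  # [3 3 4]
```

Whether that loss of information is acceptable depends on what the scores mean in your problem.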

It would be better to convert your training scores with scikit-learn's LabelEncoder class.

The same is true for your DecisionTreeClassifier and KNeighborsClassifier.

from sklearn import preprocessing
from sklearn import utils

# trainingScores is the float array defined in the question
lab_enc = preprocessing.LabelEncoder()
encoded = lab_enc.fit_transform(trainingScores)
# encoded is now: array([1, 3, 2, 0], dtype=int64)

print(utils.multiclass.type_of_target(trainingScores))
# prints: continuous

print(utils.multiclass.type_of_target(trainingScores.astype('int')))
# prints: multiclass

print(utils.multiclass.type_of_target(encoded))
# prints: multiclass
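Putting the pieces together, the following is a rough sketch of how the encoded labels might be used with LogisticRegression, reusing the data from the question. The `max_iter=1000` setting and the variable names `predicted_codes` and `predicted_scores` are assumptions made for this illustration, not part of the original answer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Data from the question.
trainingData = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2],
                         [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
trainingScores = np.array([3.4, 7.5, 4.5, 1.6])
predictionData = np.array([[2.5, 2.4, 2.7], [2.7, 3.2, 1.2]])

# Map each distinct score to an integer class label (0..n_classes-1).
lab_enc = LabelEncoder()
encoded = lab_enc.fit_transform(trainingScores)

# max_iter is raised here as an assumption, to give the solver room
# to converge on this tiny data set.
clf = LogisticRegression(max_iter=1000)
clf.fit(trainingData, encoded)

# Predictions are class labels; inverse_transform maps them back
# to the original score values.
predicted_codes = clf.predict(predictionData)
predicted_scores = lab_enc.inverse_transform(predicted_codes)
print(predicted_codes)
print(predicted_scores)
```

Note that with this encoding every distinct score becomes its own class, so the classifier can only ever return one of the scores seen during training.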
