SVC prediction probability: how to rank instances based on prediction probabilities in sklearn

weixin_39773239

Article link: https://blog.csdn.net/weixin_39773239/article/details/112841233

I am using sklearn's support vector machine (SVC) as shown below to get the prediction probabilities of the instances in my dataset with 10-fold cross-validation.

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict
import numpy as np

iris = datasets.load_iris()
X = iris.data
y = iris.target

clf = SVC(class_weight="balanced")
proba = cross_val_predict(clf, X, y, cv=10, method='predict_proba')

print(clf.classes_)
print(proba[:, 1])
print(np.argsort(proba[:, 1]))

My expected output for print(proba[:,1]) and print(np.argsort(proba[:,1])) is shown below. The first array holds the predicted probability of class 1 for every instance, and the second holds the instance indices ordered by that probability.

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.1 0. 0. 0.

0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.

0.2 0. 0. 0. 0. 0.1 0. 0. 0. 0. 0. 0. 0. 0. 0.9 1. 0.7 1.

1. 1. 1. 0.7 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0.9 0.9 0.1 1.

0.6 1. 1. 1. 0.9 0. 1. 1. 1. 1. 1. 0.4 0.9 0.9 1. 1. 1. 0.9

1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0.9 0.

0.1 0. 0. 0. 0. 0. 0. 0. 0.1 0. 0. 0.8 0. 0.1 0. 0.1 0. 0.1

0.3 0.2 0. 0.6 0. 0. 0. 0.6 0.4 0. 0. 0. 0.8 0. 0. 0. 0. 0.

0. 0. 0. 0. 0. 0. ]

[ 0 113 112 111 110 109 107 105 104 114 103 101 100 77 148 49 48 47

46 102 115 117 118 147 146 145 144 143 142 141 140 139 137 136 135 132

131 130 128 124 122 120 45 44 149 42 15 26 16 17 18 19 20 21

22 43 23 24 35 34 33 32 31 30 29 28 27 37 13 25 9 10

7 6 5 4 3 8 11 2 1 38 39 40 12 108 116 41 121 70

14 123 125 36 127 126 134 83 72 133 129 52 57 119 138 89 76 50

84 106 85 69 68 97 98 66 65 64 63 62 61 67 60 58 56 55

54 53 51 59 71 73 75 96 95 94 93 92 91 90 88 87 86 82

81 80 79 78 99 74]

My first question: it seems that SVC does not support predict_proba. Is it therefore correct to use proba = cross_val_predict(clf, X, y, cv=10, method='decision_function') instead?

My second question: how can I print the classes that the prediction probabilities correspond to? I tried clf.classes_, but I get the error AttributeError: 'SVC' object has no attribute 'classes_'. Is there a way to resolve this issue?

Note: I want to get the prediction probability for all the instances using cross validation.

EDIT:

The answer by @KRKirov is great. However, I do not need GridSearchCV and only want to use plain cross-validation. Therefore, I changed his code to use cross_val_score. Now I am getting the error NotFittedError: Call fit before prediction.

Is there a way to resolve this issue?

I am happy to provide more details if needed.

Solution

cross_val_predict is a function that does not return the fitted classifier (in your case the SVC) as part of its output. It fits clones of the estimator on each fold, so the clf object you passed in stays unfitted, which is why you cannot access fitted attributes such as classes_ on it.
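To make this concrete, here is a minimal sketch (not part of the original answer) showing that the estimator passed to cross_val_predict is left unfitted, and one possible way to get at fitted per-fold models. It assumes cross_validate with return_estimator=True is acceptable for your use case; the variable names are only illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_val_predict, cross_validate
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
clf = SVC(gamma="scale")

# cross_val_predict fits one clone of clf per fold and then discards the clones.
pred = cross_val_predict(clf, X, y, cv=10)

# The object we passed in was never fitted, so it has no classes_ attribute.
print(hasattr(clf, "classes_"))

# cross_validate can return the fitted per-fold estimators when asked to.
cv_result = cross_validate(clf, X, y, cv=10, return_estimator=True)
fold_classes = [est.classes_ for est in cv_result["estimator"]]
print(fold_classes[0])
```

Each entry of cv_result["estimator"] is a model trained on nine of the ten folds, so there are ten fitted classifiers rather than a single one.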

To perform cross-validation and calculate probabilities, use scikit-learn's GridSearchCV or RandomizedSearchCV. If you want just a simple cross-validation, pass a parameter dictionary with only one parameter and a single value for it. Note that the SVC must be created with probability=True for predict_proba to be available. Once you have the probabilities, you can use either pandas or numpy to sort them according to a particular class (1 in the example below).

from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn import datasets
import pandas as pd
import numpy as np

iris = datasets.load_iris()
X = iris.data
y = iris.target

# A grid with a single candidate, so only one configuration is cross-validated.
parameters = {'kernel': ['rbf']}
# probability=True is required for SVC to expose predict_proba.
svc = SVC(gamma="scale", probability=True)
clf = GridSearchCV(svc, parameters, cv=10)
clf.fit(X, y)

# One column per class, in the order given by clf.classes_.
probabilities = pd.DataFrame(clf.predict_proba(X), columns=clf.classes_)
probabilities['Y'] = y
probabilities.columns.name = 'Classes'
probabilities.head()

# Sorting in ascending order by the probability of class 1.
# Showing only the first five rows.
# Note that all information (indices, values) is in one place
probabilities.sort_values(1).head()

Out[49]:
Classes         0         1         2  Y
100      0.006197  0.000498  0.993305  2
109      0.009019  0.001023  0.989959  2
143      0.006664  0.001089  0.992248  2
105      0.010763  0.001120  0.988117  2
144      0.006964  0.001295  0.991741  2

# Alternatively using numpy
indices = np.argsort(probabilities.values[:, 1])
proba = probabilities.values[indices, :]
print(indices)

[100 109 143 105 144 122 135 118 104 107 102 140 130 117 120 136 132 131

128 124 125 108 22 148 112 13 115 14 32 37 33 114 35 40 16 4

42 103 2 0 6 36 139 19 145 38 17 47 48 28 49 15 46 129

10 21 7 27 12 39 8 11 1 3 9 45 34 116 29 137 5 31

26 30 141 43 18 111 25 20 41 44 24 23 147 134 113 101 142 110

146 121 149 83 123 127 77 119 133 126 138 70 72 106 52 76 56 86

68 63 54 98 50 84 66 85 78 91 73 51 57 58 93 55 87 75

65 79 90 64 61 60 97 74 94 59 96 81 88 53 95 99 89 80

71 82 69 92 67 62]

# Showing only the first five values of the sorted probabilities for class 1

print(proba[:5, 1])

[0.00049785 0.00102258 0.00108851 0.00112034 0.00129501]
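Regarding the EDIT in the question: the GridSearchCV example above calls predict_proba on the estimator refit on all 150 instances, so those probabilities are in-sample. If you only want plain cross-validation, the following is a minimal sketch (not from the original answer) that gets out-of-fold probabilities directly from cross_val_predict. It assumes every class appears in each training fold, so that the columns follow the sorted unique labels; random_state=0 is an arbitrary choice for reproducibility.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

# probability=True enables predict_proba on SVC.
clf = SVC(gamma="scale", class_weight="balanced", probability=True, random_state=0)

# Each row is predicted by a model that did not see that instance during training.
proba = cross_val_predict(clf, X, y, cv=10, method="predict_proba")

# Column order: sorted unique labels (assumes all classes occur in every training fold).
classes = np.unique(y)
col = int(np.where(classes == 1)[0][0])

# Instance indices from lowest to highest out-of-fold probability of class 1.
ranking = np.argsort(proba[:, col])
print(ranking[:5])
print(proba[ranking[:5], col])
```

Passing method='decision_function' instead also works without probability=True, but it returns uncalibrated scores (one column per class with the default decision_function_shape='ovr'), which can be used for ranking but should not be read as probabilities.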
