sklearn.multiclass handles both multiclass and multilabel classification problems.
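Multiclass classification
Handwritten digits fall into ten classes, 0 through 9, but suppose all we have is a binary estimator (a support vector machine, for example). How can we use it? Two common strategies are: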
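- One vs One (OvO): one classifier separates digit 0 from digit 1, another separates 0 from 2, another separates 1 from 2, and so on. N classes require N(N-1)/2 classifiers.
- One vs All (OvA), also called One vs Rest (OvR): train 10 binary classifiers, one per digit. The first separates 0 from "not 0", the second separates 1 from "not 1", and so on. N classes require N classifiers.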
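As a quick check of the counts quoted above, the following sketch enumerates the class pairs OvO would need for the ten digits and compares that with the number of OvA tasks. It uses only the standard library, and the variable names are illustrative.

```python
from itertools import combinations

classes = list(range(10))          # the ten digit classes 0-9
n = len(classes)

# OvO: one binary classifier per unordered pair of classes
ovo_pairs = list(combinations(classes, 2))
# OvA: one binary classifier per class ("k" versus "not k")
ova_tasks = [(k, f"not {k}") for k in classes]

print(len(ovo_pairs), n * (n - 1) // 2)   # both are 45
print(len(ova_tasks))                      # 10
print(ovo_pairs[:3])                       # [(0, 1), (0, 2), (0, 3)]
```

OvO trains more models, but each one only sees the samples of two classes, whereas every OvA model is trained on the full data set.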
OneVsOneClassifier:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.multiclass import OneVsOneClassifier
from sklearn.linear_model import LogisticRegression
from sklearn import metrics  # metrics computes various performance measures
digits = load_digits()
X_train, X_test, y_train, y_test= train_test_split( digits['data'], digits['target'], test_size=0.2 )
print( 'The size of X_train is ', X_train.shape )
print( 'The size of y_train is ', y_train.shape )
print( 'The size of X_test is ', X_test.shape )
print( 'The size of y_test is ', y_test.shape )
fig,axes = plt.subplots(10,10,figsize=(8,8))
fig.subplots_adjust(hspace=0.1,wspace=0.1)
for i, ax in enumerate(axes.flat):
    ax.imshow(X_train[i, :].reshape(8, 8), cmap="binary", interpolation="nearest")
    ax.text(0.05, 0.05, str(y_train[i]), transform=ax.transAxes, color="blue")
    ax.set_xticks([])
    ax.set_yticks([])
plt.show()
ovo_lr = OneVsOneClassifier(LogisticRegression(solver="lbfgs", max_iter=200))  # build a one-vs-one multiclass classifier
ovo_lr.fit(X_train, y_train)  # train the pairwise classifiers
print(len(ovo_lr.estimators_))  # number of underlying binary classifiers
print("train_OVO LR",metrics.accuracy_score(y_train,ovo_lr.predict(X_train)))
print("test_ovo LR",metrics.accuracy_score(y_test,ovo_lr.predict(X_test)))
Output:
The size of X_train is (1437, 64)
The size of y_train is (1437,)
The size of X_test is (360, 64)
The size of y_test is (360,)
45
train_OVO LR 1.0
test_ovo LR 0.9833333333333333
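To illustrate what the 45 estimators are doing, here is a rough hand-rolled version of the one-vs-one idea: train one LogisticRegression per pair of digits, then let every pairwise model vote. This is only a sketch of the strategy, not how OneVsOneClassifier is implemented internally; in particular, ties are broken here by simply taking the lowest class index. The names pair_models and votes, and the fixed random_state, are assumptions made for this example.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
# random_state is fixed only so that the sketch is repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

classes = np.unique(y_train)
pair_models = {}
for a, b in combinations(classes, 2):
    mask = (y_train == a) | (y_train == b)       # keep only samples of the two classes
    clf = LogisticRegression(solver="lbfgs", max_iter=200)
    pair_models[(a, b)] = clf.fit(X_train[mask], y_train[mask])

# Every pairwise model casts one vote per test sample
rows = np.arange(X_test.shape[0])
votes = np.zeros((X_test.shape[0], classes.size), dtype=int)
for model in pair_models.values():
    pred = model.predict(X_test)                 # each prediction is one of the two classes
    votes[rows, pred] += 1                       # digit labels 0-9 double as column indices

y_pred = classes[votes.argmax(axis=1)]           # the class with the most votes wins
manual_ovo_accuracy = (y_pred == y_test).mean()
print(len(pair_models), manual_ovo_accuracy)
```

The count printed first should match len(ovo_lr.estimators_) from the script above; the accuracy will differ slightly because the train/test split is different.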
OneVsRestClassifier:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn import metrics  # metrics computes various performance measures
from sklearn.linear_model import LogisticRegression
digits = load_digits()
X_train, X_test, y_train, y_test= train_test_split( digits['data'], digits['target'], test_size=0.2 )
ova_lr=OneVsRestClassifier(LogisticRegression(solver="lbfgs",max_iter=800))
ova_lr.fit(X_train,y_train)
print(len(ova_lr.estimators_))  # number of underlying binary classifiers
print("train_ova_lr",metrics.accuracy_score(y_train,ova_lr.predict(X_train)))
print("test_ova_lr",metrics.accuracy_score(y_test,ova_lr.predict(X_test)))
Output:
10
train_ova_lr 0.9972164231036882
test_ova_lr 0.9527777777777777
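The ten one-vs-rest estimators each produce a score for "this digit versus the rest", and the predicted class is the one with the highest score. The sketch below checks this by comparing the arg-max of decision_function with predict. The fixed random_state and the names scores, manual_pred and agreement are assumptions of this example.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_digits(return_X_y=True)
# random_state is fixed only so that the sketch is repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

ova_lr = OneVsRestClassifier(LogisticRegression(solver="lbfgs", max_iter=800))
ova_lr.fit(X_train, y_train)

scores = ova_lr.decision_function(X_test)        # one column of scores per class
print(scores.shape)                              # (360, 10)

# Pick the class whose binary classifier is most confident
manual_pred = ova_lr.classes_[scores.argmax(axis=1)]
agreement = (manual_pred == ova_lr.predict(X_test)).mean()
print(agreement)
```

Barring exact ties between scores, the agreement should be 1.0.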
Multilabel classification:
We deliberately give each digit two labels:
- Label 1: odd or even
- Label 2: less than or equal to 4, or greater than 4
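To make the two labels concrete, this small sketch builds the label matrix for the digits 0 through 9 with the same np.c_ expression used in the script for this section. Note that the first column is True for even digits and the second is True for digits up to 4. The names digits_0_to_9 and labels are illustrative.

```python
import numpy as np

digits_0_to_9 = np.arange(10)
# Column 0: is the digit even?  Column 1: is the digit <= 4?
labels = np.c_[digits_0_to_9 % 2 == 0, digits_0_to_9 <= 4]

for d, row in zip(digits_0_to_9, labels):
    print(d, row.astype(int))   # e.g. digit 6 gives [1 0]: even, and not <= 4
```

Each row carries two independent yes/no answers, which is what distinguishes a multilabel target from a multiclass one.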
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn import metrics  # metrics computes various performance measures
from sklearn.linear_model import LogisticRegression
import numpy as np
digits = load_digits()
X_train, X_test, y_train, y_test= train_test_split( digits['data'], digits['target'], test_size=0.2 )
y_train_multilabel = np.c_[ y_train%2==0, y_train<=4 ]
ova_lr =OneVsRestClassifier(LogisticRegression(solver="lbfgs" ,max_iter=800))
ova_lr.fit(X_train ,y_train_multilabel)
print(y_train_multilabel)
print(len(ova_lr.estimators_))  # number of underlying binary classifiers
print( y_test[:1] )
print( ova_lr.predict(X_test[:1,:]) )
Output:
[[False False]
 [False False]
 [ True False]
 ...
 [ True False]
 [False  True]
 [False  True]]
2
[6]
[[1 0]]
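In the output above, the test digit 6 is predicted as [1 0], meaning even and not less than or equal to 4, which is correct. The script only inspects that single sample, so the sketch below shows one possible way to score the multilabel model on the whole test set. The helper to_multilabel, the fixed random_state, and the cast to integers are assumptions of this example. For multilabel targets, metrics.accuracy_score reports exact-match accuracy, where a sample counts as correct only if both labels are right.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_digits(return_X_y=True)
# random_state is fixed only so that the sketch is repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

def to_multilabel(y):
    """Hypothetical helper: column 0 = even digit, column 1 = digit <= 4 (as 0/1 integers)."""
    return np.c_[y % 2 == 0, y <= 4].astype(int)

y_train_ml = to_multilabel(y_train)
y_test_ml = to_multilabel(y_test)

ova_lr = OneVsRestClassifier(LogisticRegression(solver="lbfgs", max_iter=800))
ova_lr.fit(X_train, y_train_ml)                  # one binary classifier per label column
y_pred_ml = ova_lr.predict(X_test)

per_label_acc = (y_pred_ml == y_test_ml).mean(axis=0)       # accuracy of each label separately
exact_match = metrics.accuracy_score(y_test_ml, y_pred_ml)  # both labels must be correct
print(per_label_acc, exact_match)
```

The exact-match score can never exceed the lower of the two per-label accuracies, so it is the stricter of the two views.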