Classification with sklearn

littletomatodonkey

Article link: https://blog.csdn.net/u012526003/article/details/79054012

Classification

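MNIST test

MNIST is a very basic dataset that can be fetched directly through sklearn. If fetch_mldata keeps failing, download the dataset into a folder of your choice first, then pass that folder as data_home when fetching.
The returned object behaves like a dictionary; its main keys are:
- DESCR: a description of the dataset
- data: the data, one sample per row and one feature per column
- target: the labels
For the download error and a copy of the data, see: http://blog.csdn.net/u014567062/article/details/78879635
The dataset is already arranged as a training set followed by a test set: the first 60000 rows are the training set and the last 10000 rows are the test set.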

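# Note: fetch_mldata was deprecated in scikit-learn 0.20 and removed in 0.22,
# so this code targets the older releases in which it is still available.
from sklearn.datasets import fetch_mldata
# data_home points at the folder that holds the downloaded dataset
mnist = fetch_mldata("MNIST original", data_home="./datasets")
print(mnist)
X, y = mnist["data"], mnist["target"]
print( X.shape )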

{'DESCR': 'mldata.org dataset: mnist-original', 'data': array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ..., 
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint8), 'COL_NAMES': ['label', 'data'], 'target': array([ 0.,  0.,  0., ...,  9.,  9.,  9.])}
(70000, 784)

%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
some_digit = X[36000]
some_digit_image = some_digit.reshape(28, 28)
plt.imshow( some_digit_image, cmap = matplotlib.cm.binary, interpolation="nearest")
plt.axis("off")
plt.show()
print("value is : " , y[36000] )
plt.plot( y )
plt.show()

value is :  5.0

SGD classification test

We start with a binary classification test.
It is advisable to shuffle the training data before training.

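from sklearn.linear_model import SGDClassifier
# The first 60000 rows are the training set, the remaining 10000 the test set
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

# Shuffle features and labels with the same permutation so they stay aligned
shuffle_idx = np.random.permutation( 60000 )
X_train, y_train = X_train[shuffle_idx], y_train[shuffle_idx]
# Binary targets: True for every 5, False for all other digits
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)
sgd_clf = SGDClassifier( )
sgd_clf.fit( X_train, y_train_5 )
print(sgd_clf.predict( [some_digit] ))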

E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)


[ True]
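
The shuffling step above only works because the same permutation is applied to both the features and the labels. As a minimal standard-library sketch of that idea (the helper name `shuffle_together` and the toy data are assumptions made for illustration, not part of the original code):

```python
import random

def shuffle_together(features, labels, seed=0):
    """Shuffle two equally long lists with one shared permutation."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)  # one permutation, used for both lists
    return [features[i] for i in indices], [labels[i] for i in indices]

# Toy data: the first value of each row equals its label,
# so it is easy to check that rows and labels stay paired.
X_toy = [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4]]
y_toy = [0, 1, 2, 3, 4]
X_shuf, y_shuf = shuffle_together(X_toy, y_toy, seed=42)
print(all(row[0] == label for row, label in zip(X_shuf, y_shuf)))
```

Shuffling the two lists independently would break the row-to-label pairing, which is why the original code indexes both arrays with the same `shuffle_idx`.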

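Measuring accuracy with cross-validation

Cross-validation is a good way to measure a model's accuracy.
StratifiedKFold performs stratified K-fold splitting, so each fold keeps roughly the same class ratio as the full dataset.
cross_val_score runs K-fold cross-validation over the whole training set in a single call.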

from sklearn.model_selection import StratifiedKFold
from sklearn.base import clone
# Split the data into three folds; each round trains on two folds and tests on the remaining one
skfolds = StratifiedKFold( n_splits=3 )

for train_idx, test_idx in skfolds.split(X_train, y_train_5):
    clone_clf = clone( sgd_clf )
    X_train_folds = X_train[train_idx]
    y_train_folds = (y_train_5[train_idx])
    X_test_fold = X_train[test_idx]
    y_test_fold = (y_train_5[test_idx])

    clone_clf.fit( X_train_folds, y_train_folds )
    y_pred = clone_clf.predict( X_test_fold )
    n_correct = sum( y_pred == y_test_fold )
    print( n_correct / len(y_pred) )

print("-------------------------------")

from sklearn.model_selection import cross_val_score
accu = cross_val_score( sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy" )
print( accu )

0.9596
0.85915
0.95505
-------------------------------
[ 0.95375  0.9636   0.9649 ]
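
To make "stratified" concrete, here is a minimal standard-library sketch that deals the indices of each class round-robin across the folds. It illustrates the idea only and is not scikit-learn's actual splitting algorithm; the function name `stratified_folds` and the toy labels are assumptions.

```python
from collections import defaultdict

def stratified_folds(labels, n_splits=3):
    """Assign each sample index to a fold so every fold keeps the class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        # Deal the indices of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds

# Toy labels: 3 positives out of 30 samples (10%), similar to the "is it a 5?" task.
toy_labels = [True] * 3 + [False] * 27
folds = stratified_folds(toy_labels, n_splits=3)
for test_idx in folds:
    positives = sum(toy_labels[i] for i in test_idx)
    print(len(test_idx), positives)  # fold size and number of positives in it
```

Every fold ends up with the same share of positives as the full toy set, which is the property that keeps per-fold scores comparable on skewed data.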

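The problem

The accuracy above looks high, but it is not a reliable verdict, because accuracy is the only metric used.
The goal is to decide whether a digit is a 5. Only about 10% of the images are 5s, so even a useless classifier that answers "not 5" for every sample reaches roughly 90% accuracy (every non-5 image is classified correctly).
When judging a model, especially on data with a skewed class distribution, more than one evaluation metric is needed.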

from sklearn.base import BaseEstimator
class Never5Classifier( BaseEstimator ):
    # Baseline that ignores its input and always predicts "not 5"
    def fit( self, X, y=None ):
        return self
    def predict( self, X ):
        return np.zeros( (len(X), 1), dtype=bool )
never_5_clf = Never5Classifier()
n5 = cross_val_score( never_5_clf, X_train, y_train_5, cv=3, scoring="accuracy" )
print( n5 )

[ 0.90835  0.91015  0.91045]

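Confusion matrix

A confusion matrix counts the model's TP, TN, FP and FN predictions, which makes it more informative than the single accuracy figure used so far.
Each row of the confusion matrix represents an actual class, and each column represents a predicted class.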

from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix
y_train_pred = cross_val_predict( sgd_clf, X_train, y_train_5, cv=3 )

cm = confusion_matrix( y_train_5, y_train_pred )
print(cm)

from sklearn.metrics import precision_score, recall_score, f1_score
p_score = precision_score( y_train_5, y_train_pred )
r_score = recall_score( y_train_5, y_train_pred )
f1 = f1_score( y_train_5, y_train_pred )
print( p_score, r_score, f1 )

0.763479511143 0.783619258439 0.773418297679
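
To show what these scores measure, the following standard-library sketch computes the four confusion-matrix counts and derives precision, recall and F1 from them. The toy labels and the helper name `confusion_counts` are assumptions for illustration; the matrix layout follows the convention described above (rows are actual classes, columns are predicted classes).

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for boolean labels."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual and predicted:
            tp += 1
        elif actual and not predicted:
            fn += 1
        elif not actual and predicted:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

# Toy example: 4 actual positives, 6 actual negatives.
y_true = [True, True, True, True, False, False, False, False, False, False]
y_pred = [True, True, True, False, True, False, False, False, False, False]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
matrix = [[tn, fp],   # row 0: actual negatives
          [fn, tp]]   # row 1: actual positives
precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that were found
f1_value = 2 * precision * recall / (precision + recall)  # harmonic mean
print(matrix, precision, recall, f1_value)
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high.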

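Choosing a suitable decision threshold

For this kind of classifier, different decision thresholds produce different predictions. The classifier's predict() does not accept a threshold, but we can first obtain the scores from decision_function and then apply a threshold by hand to get the predictions.
To improve the model we need a threshold at which precision and recall are both reasonably high. So we first plot precision and recall as functions of the threshold, then pick a suitable value.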

y_scores = sgd_clf.decision_function( [some_digit] )
print( y_scores)
thres = 0
y_some_digit_pred = (y_scores > thres)
print( y_some_digit_pred )

# Plot precision and recall against the threshold
from sklearn.metrics import precision_recall_curve
def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    plt.plot( thresholds, precisions[:-1], "b--", label="Precision" )
    plt.plot( thresholds, recalls[:-1], "g-", label="Recall" )
    plt.xlabel( "Threshold" )
    plt.legend(loc="upper left")
    plt.ylim( [0, 1] )
    plt.show()

y_scores = cross_val_predict( sgd_clf, X_train, y_train_5, cv=3, method="decision_function" )
precisions, recalls, thresholds = precision_recall_curve( y_train_5, y_scores )
plot_precision_recall_vs_threshold( precisions, recalls, thresholds )
plt.plot( recalls, precisions )
plt.show()

[ 34731.47869452]
[ True]



plt.plot( recalls, precisions )
plt.show()

y_train_pred_90 = (y_scores > 150000)
print( precision_score( y_train_5, y_train_pred_90 ) )
print( recall_score( y_train_5, y_train_pred_90 ) )

0.907714701601
0.575170632725
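
The trade-off seen above (higher precision, lower recall after raising the threshold) can be reproduced on a few hand-made scores. This is a minimal standard-library sketch; the scores, labels and the helper `precision_recall_at` are assumptions, and returning a precision of 1.0 when nothing is predicted positive is a convention chosen for this sketch.

```python
def precision_recall_at(scores, labels, threshold):
    """Apply a threshold to decision scores and return (precision, recall)."""
    predictions = [s > threshold for s in scores]
    tp = sum(p and y for p, y in zip(predictions, labels))
    fp = sum(p and not y for p, y in zip(predictions, labels))
    fn = sum((not p) and y for p, y in zip(predictions, labels))
    precision = tp / (tp + fp) if (tp + fp) else 1.0  # sketch convention
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy decision scores with their true labels (5 positives, 3 negatives).
toy_scores = [-3.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0, 4.0]
toy_labels = [False, False, True, False, True, True, True, True]

for threshold in (-2.0, 0.0, 2.5):
    p, r = precision_recall_at(toy_scores, toy_labels, threshold)
    print(threshold, round(p, 3), round(r, 3))
```

As the threshold rises, fewer samples are predicted positive, so false positives drop (precision goes up) while more true positives are missed (recall goes down).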

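ROC curve

The ROC curve is another way to evaluate a model's performance.
Its x-axis is the FPR (false positive rate), i.e. 1 minus the specificity; its y-axis is the TPR (true positive rate), i.e. the recall.
A model can be judged by its AUC, the area under the ROC curve: the larger the AUC, the better the model.
Below, a random forest (RF) is trained as well, to compare the performance of SGD and RF.
The ROC curves show directly that RF performs better than SGD here.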

from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from sklearn.ensemble import RandomForestClassifier
def plot_roc_curve( fpr, tpr, label=None):
    plt.plot( fpr, tpr, linewidth=2, label=label )
    plt.plot( [0,1], [0,1], "k--" )
    plt.axis([0,1,0,1])
    plt.xlabel( "False Positive Rate" )
    plt.ylabel( "True Positive Rate" )

sgd_score = roc_auc_score(y_train_5, y_scores)
print("auc : ", sgd_score)
fpr, tpr, thresholds = roc_curve( y_train_5, y_scores )
plot_roc_curve( fpr, tpr, "SGD" )

forest_clf = RandomForestClassifier(random_state=42)
# predict_proba returns an N x 2 array: one column of probabilities per class
y_probas_forest = cross_val_predict(forest_clf, X_train, y_train_5, cv=3, method="predict_proba")
y_scores_forest = y_probas_forest[:,1]
fpr_forest, tpr_forest, thresholds_forest = roc_curve( y_train_5, y_scores_forest )
plot_roc_curve( fpr_forest, tpr_forest, "RF" )
plt.legend(loc="lower right")
plt.show()

auc :  0.961351839086
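
To illustrate how the ROC points and the AUC come about, the sketch below sweeps the threshold over a handful of toy scores and integrates the resulting curve with the trapezoid rule. It uses only the standard library; the toy data and the helper names `roc_points` and `auc_trapezoid` are assumptions, and the result is meant as an illustration rather than a replacement for roc_curve and roc_auc_score.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) pairs obtained by sweeping the threshold over every score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [False, False, True, True]
points = roc_points(toy_scores, toy_labels)
auc_value = auc_trapezoid(points)
print(points)
print(auc_value)
```

A classifier that ranks every positive above every negative reaches an AUC of 1.0, while random scores give about 0.5, which is the dashed diagonal drawn by plot_roc_curve.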

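Multiclass classification

scikit-learn handles multiclass problems with little effort. Some classifiers, such as random forests, support multiple classes natively; strictly binary ones, such as SGD and SVM classifiers, are wrapped automatically when a multiclass target is detected (one-vs-all for most of them, one-vs-one for SVMs).
For a multiclass problem, decision_function returns a 1 x N array for a single sample, where N is the number of classes; the index of the largest score is the predicted class.
The classes_ attribute of the classifier lists the classes it was trained on.
To force the one-vs-one strategy, use OneVsOneClassifier. For a 10-class problem it trains 45 binary classifiers, one per pair of classes, and predicts the class that wins the most pairwise comparisons.
Training a multiclass random forest works the same way and is just as simple.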

from sklearn.multiclass import OneVsOneClassifier
sgd_clf.fit( X_train, y_train )
res = sgd_clf.predict( [some_digit] )
scores = sgd_clf.decision_function( [some_digit] )
print( scores )
print( np.argmax( scores ))
print( res )
print( sgd_clf.classes_ )

# The code below trains 45 binary SGD classifiers and prints the final prediction
ovo_clf = OneVsOneClassifier( SGDClassifier(random_state=42) )
ovo_clf.fit( X_train, y_train )
res = ovo_clf.predict( [some_digit] )
print(res)
print( len(ovo_clf.estimators_) )

[[ -34733.73484798 -409402.23947981 -300691.59109179 -159122.19065008
  -461813.88562589  142331.94837348 -880567.93015997 -234982.6458743
  -540463.58233764 -681560.4715936 ]]
5
[ 5.]
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
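
Two numbers in this section can be checked with a few lines of standard-library Python: the count of pairwise classifiers needed for one-vs-one on 10 classes, and the class picked by taking the largest decision score. The scores are copied from the output above; the variable names are assumptions for illustration.

```python
from itertools import combinations

classes = list(range(10))

# One-vs-one trains one binary classifier per unordered pair of classes.
pairs = list(combinations(classes, 2))
print(len(pairs))  # N * (N - 1) / 2 pairs for N classes

# Decision scores for some_digit, copied from the output above.
scores = [-34733.73484798, -409402.23947981, -300691.59109179,
          -159122.19065008, -461813.88562589, 142331.94837348,
          -880567.93015997, -234982.6458743, -540463.58233764,
          -681560.4715936]

# The predicted class is the one whose score is largest.
best_index = max(range(len(scores)), key=scores.__getitem__)
predicted_class = classes[best_index]
print(predicted_class)
```

Class 5 is the only class with a positive score here, which matches the prediction printed by the classifier.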


E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)


[ 5.]

forest_clf.fit( X_train, y_train )
res = forest_clf.predict( [some_digit] )
prob = forest_clf.predict_proba( [some_digit] )
print( res )
print( np.argmax( prob ) )

[ 5.]
5
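
The `np.argmax( prob )` call above returns a column index into the output of `predict_proba`. It matches the digit here only because the MNIST classes are 0 through 9 in sorted order. A minimal sketch on made-up toy data (the labels 3 and 8 and every `toy_*` name are assumptions for illustration, not part of the original experiment) shows how the index can be mapped back through `classes_`:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: two well-separated groups labeled 3 and 8.
toy_X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
toy_y = np.array([3, 3, 3, 8, 8, 8])

toy_forest = RandomForestClassifier(n_estimators=10, random_state=0)
toy_forest.fit(toy_X, toy_y)

toy_prob = toy_forest.predict_proba([[11.0]])  # shape (1, n_classes)
col = np.argmax(toy_prob)                      # column index, not a class label
label = toy_forest.classes_[col]               # map the index back to a label
print(col, label, toy_forest.predict([[11.0]])[0])
```

When the labels are not simply 0..K-1, indexing `classes_` with the argmax is the safer way to recover the predicted class.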

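Effect of data standardization on model accuracy

Standardizing the data improves the model's accuracy.
Standardization here means one of the two approaches mentioned earlier: rescaling each feature to the [0,1] range (min-max), or transforming it to zero mean and unit variance, N(0,1) (z-score).
On the MNIST dataset, z-score standardization gives the SGD model higher accuracy than min-max scaling.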

from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
# sgd_scores = cross_val_score( sgd_clf, X_train, y_train, cv=3, scoring="accuracy" )
# print( sgd_scores )
# z-score standardization: zero mean and unit variance for each feature
scaler = StandardScaler()
X_train_standard_scaler = scaler.fit_transform( X_train )
# cv=3 is set explicitly so that three scores are returned, as in the output below
standard_scores = cross_val_score( sgd_clf, X_train_standard_scaler, y_train, cv=3, scoring="accuracy" )
print( standard_scores )
# min-max scaling: rescale each feature to the [0, 1] range
minmax_scaler = MinMaxScaler()
X_train_minmax_scaler = minmax_scaler.fit_transform( X_train )
minmax_scores = cross_val_score( sgd_clf, X_train_minmax_scaler, y_train, cv=3, scoring="accuracy" )
print( minmax_scores )

# Visualize the confusion matrix
plt.matshow( conf_mx, cmap=plt.cm.spring )
plt.show()

E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\utils\validation.py:475: DataConversionWarning: Data with input dtype uint8 was converted to float64 by StandardScaler.
  warnings.warn(msg, DataConversionWarning)


[ 0.91096781  0.90989549  0.90543582]


E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\utils\validation.py:475: DataConversionWarning: Data with input dtype uint8 was converted to float64 by MinMaxScaler.
  warnings.warn(msg, DataConversionWarning)


[ 0.87422515  0.87924396  0.88283242]
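
To make the two kinds of scaling compared above concrete, here is a minimal sketch on a made-up single-feature array (the `toy` data is an assumption for illustration only). It checks that `MinMaxScaler` and `StandardScaler` agree with the formulas written out by hand in NumPy:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical single-feature data, chosen only for illustration.
toy = np.array([[0.0], [5.0], [10.0]])

minmax = MinMaxScaler().fit_transform(toy)    # (x - min) / (max - min)
zscore = StandardScaler().fit_transform(toy)  # (x - mean) / std

# The same quantities computed by hand, column by column.
minmax_manual = (toy - toy.min(axis=0)) / (toy.max(axis=0) - toy.min(axis=0))
zscore_manual = (toy - toy.mean(axis=0)) / toy.std(axis=0)

print(minmax.ravel())
print(zscore.ravel())
```

Min-max output always lies in [0, 1], while z-score output is centered at 0 with unit variance, which is the difference the two accuracy results above are comparing.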

For multiclass problems, the corresponding confusion matrix can also be computed.

y_train_pred = cross_val_predict( sgd_clf, X_train_standard_scaler, y_train, cv=3 )
conf_mx = confusion_matrix( y_train, y_train_pred )
print( conf_mx )

plt.matshow( conf_mx, cmap=plt.cm.spring )
plt.show()



[[5728    3   22    9    8   54   48   10   38    3]
 [   1 6476   43   25    6   43    8   12  117   11]
 [  57   44 5308  108   81   25   96   57  166   16]
 [  43   39  148 5356    2  214   35   55  133  106]
 [  18   28   43    7 5368   12   57   32   83  194]
 [  63   41   39  196   71 4634  108   29  149   91]
 [  29   27   49    2   46   97 5619    8   41    0]
 [  25   22   67   28   52   11    6 5798   14  242]
 [  49  149   74  165   15  167   59   23 4998  152]
 [  40   31   24   87  159   38    3  224   76 5267]]

# Sum each row over its columns, giving an M x 1 matrix of per-class sample counts
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = conf_mx / row_sums

np.fill_diagonal(norm_conf_mx, 0)
plt.matshow( norm_conf_mx, cmap=plt.cm.gray )
plt.show()
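
Besides plotting, the normalized error matrix can be queried directly. The sketch below copies the confusion matrix printed above into an array (named `cm` here to avoid clashing with `conf_mx`; the variable names are my own) and looks up the largest off-diagonal error rate:

```python
import numpy as np

# Confusion matrix copied from the printed output (rows: true class, columns: predicted class).
cm = np.array([
    [5728,    3,   22,    9,    8,   54,   48,   10,   38,    3],
    [   1, 6476,   43,   25,    6,   43,    8,   12,  117,   11],
    [  57,   44, 5308,  108,   81,   25,   96,   57,  166,   16],
    [  43,   39,  148, 5356,    2,  214,   35,   55,  133,  106],
    [  18,   28,   43,    7, 5368,   12,   57,   32,   83,  194],
    [  63,   41,   39,  196,   71, 4634,  108,   29,  149,   91],
    [  29,   27,   49,    2,   46,   97, 5619,    8,   41,    0],
    [  25,   22,   67,   28,   52,   11,    6, 5798,   14,  242],
    [  49,  149,   74,  165,   15,  167,   59,   23, 4998,  152],
    [  40,   31,   24,   87,  159,   38,    3,  224,   76, 5267],
])

row_sums = cm.sum(axis=1, keepdims=True)  # shape (10, 1): samples per true class
norm_cm = cm / row_sums                   # each row now sums to 1
np.fill_diagonal(norm_cm, 0)              # drop correct predictions, keep only errors

# Position of the largest remaining entry, i.e. the most frequent confusion.
true_label, pred_label = np.unravel_index(np.argmax(norm_cm), norm_cm.shape)
rate = norm_cm[true_label, pred_label]
print(true_label, pred_label, rate)
```

For this matrix the largest entry is in row 7, column 9: about 3.9% of the true 7s are predicted as 9.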


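Multilabel classification

The classification tasks so far assigned each sample to exactly one class, so the labels formed an N×1 matrix. scikit-learn can also handle multilabel targets, where the label matrix has several columns. (The multiclass classification above can be viewed as a special case of multilabel classification in which all the labels share the same physical meaning.)
In multilabel classification every label is binary: it takes the value 0 or 1.
The K-nearest-neighbors classifier can be used for this kind of multilabel problem.
When evaluating the predictions, the labels can be given different weights; `average="macro"` below gives every label the same weight.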

from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

y_train_large = (y_train >= 7)
y_train_odd = ( y_train % 2 == 1 )
y_multilabel = np.c_[y_train_large, y_train_odd]

knn_clf = KNeighborsClassifier()
knn_clf.fit( X_train, y_multilabel )

res = knn_clf.predict( [some_digit] )
print( res )

[[False  True]]


y_train_knn_pred = cross_val_predict(knn_clf, X_train, y_multilabel, cv=3)
res = f1_score( y_multilabel, y_train_knn_pred, average="macro" )
print( res )
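
To show what `average="macro"` does, the following minimal sketch uses a tiny made-up multilabel example (the `toy_true` and `toy_pred` arrays are assumptions for illustration). It computes the per-label F1 scores, then compares the unweighted mean (`macro`) with the support-weighted mean (`weighted`):

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical multilabel targets: 4 samples, 2 binary labels per sample.
toy_true = np.array([[1, 0], [1, 1], [0, 1], [1, 0]])
toy_pred = np.array([[1, 0], [1, 0], [0, 1], [0, 0]])

per_label = f1_score(toy_true, toy_pred, average=None)      # one F1 score per label column
macro = f1_score(toy_true, toy_pred, average="macro")        # plain mean over labels
weighted = f1_score(toy_true, toy_pred, average="weighted")  # mean weighted by label support

print(per_label, macro, weighted)
```

If some labels are much rarer than others, `weighted` lets the frequent labels dominate the score, while `macro` treats every label equally.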

Multioutput classification

In multioutput classification, each output label can itself be multiclass, that is, it can take more than two values.

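import numpy as np
# Add random pixel noise; the shape tuple is (number of samples, 784)
noise = np.random.randint( 0, 100, (len(X_train), 784) )
X_train_mod = X_train + noise
noise = np.random.randint( 0, 100, (len(X_test), 784) )
X_test_mod = X_test + noise
# The targets are the clean images, so each of the 784 pixels is one output with values 0-255
y_train_mod = X_train
y_test_mod = X_test

# Predict the clean pixels from the nearest noisy training samples, which amounts to denoising
knn_clf.fit( X_train_mod, y_train_mod )
clean_digit = knn_clf.predict( [X_test_mod[100]] )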
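
The same idea can be tried on a much smaller scale. In this minimal sketch (toy data and `toy_*` names are made up for illustration), `KNeighborsClassifier` is fitted on a target with two output columns, each taking three possible values, and returns one prediction per output:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: 6 points on a line, two output columns per sample.
toy_X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
# Column 0 takes values in {0, 1, 2}; column 1 takes values in {5, 6, 7}.
toy_y = np.array([[0, 5], [0, 5], [1, 6], [2, 7], [2, 7], [1, 6]])

toy_knn = KNeighborsClassifier(n_neighbors=1)
toy_knn.fit(toy_X, toy_y)

# Each query point gets the outputs of its single nearest training sample.
toy_pred = toy_knn.predict([[0.4], [11.2]])
print(toy_pred)
```

With `n_neighbors=1` each prediction is simply the full output row of the closest training point, which is the same mechanism the noisy-digit example relies on, only with 784 outputs instead of 2.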