sklearn分类

Classification

MNIST测试

  • MNIST是一个非常基本的数据集,利用sklearn可以直接得到,如果在fetch_mldata的时候一直出现问题,则可以先下载好数据集,到指定文件夹,然后在fetct时,指定一下数据所在文件夹即可
  • MNIST是一个字典型数据,里面包含3个key
    • DESCR:数据集的描述
    • data:数据,每一行是一个样本,每一列是一个特征
    • target:数据的labels
  • 出错以及下载参考链接:http://blog.csdn.net/u014567062/article/details/78879635
  • 数据集中已经将数据按照训练集和测试集排序好,其中前60000行数据是训练集,后60000行数据是测试集
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata("MNIST original", data_home="./datasets")
print(mnist)
X, y = mnist["data"], mnist["target"]
print( X.shape )
{'DESCR': 'mldata.org dataset: mnist-original', 'data': array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ..., 
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint8), 'COL_NAMES': ['label', 'data'], 'target': array([ 0.,  0.,  0., ...,  9.,  9.,  9.])}
(70000, 784)
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
some_digit = X[36000]
some_digit_image = some_digit.reshape(28, 28)
plt.imshow( some_digit_image, cmap = matplotlib.cm.binary, interpolation="nearest")
plt.axis("off")
plt.show()
print("value is : " , y[36000] )
plt.plot( y )
plt.show()

这里写图片描述

value is :  5.0

这里写图片描述

SGD分类测试

  • 可以进行二分类测试
  • 测试之前建议把训练数据打乱(洗牌)
from sklearn.linear_model import SGDClassifier
X_train,X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

shuffle_idx = np.random.permutation( 60000 )
X_train, y_train = X_train[shuffle_idx], y_train[shuffle_idx]
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)
sgd_clf = SGDClassifier( )
sgd_clf.fit( X_train, y_train_5 )
print(sgd_clf.predict( [some_digit] ))
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)


[ True]

交叉验证的准确率测试

  • 测量模型准确性比较好的方法是交叉验证
  • StratiKFold可以实现K折分层验证
  • 直接利用K折交叉验证cross_val_score可以直接实现对所有数据的交叉验证
from sklearn.model_selection import StratifiedKFold
from sklearn.base import clone
# 将数据分为三部分,每次取两部分进行训练,另一部分进行测试
skfolds = StratifiedKFold( n_splits=3 )

for train_idx, test_idx in skfolds.split(X_train, y_train_5):
    clone_clf = clone( sgd_clf )
    X_train_folds = X_train[train_idx]
    y_train_folds = (y_train_5[train_idx])
    X_test_fold = X_train[test_idx]
    y_test_fold = (y_train_5[test_idx])

    clone_clf.fit( X_train_folds, y_train_folds )
    y_pred = clone_clf.predict( X_test_fold )
    n_correct = sum( y_pred == y_test_fold )
    print( n_correct / len(y_pred) )

print("-------------------------------")

from sklearn.model_selection import cross_val_score
accu = cross_val_score( sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy" )
print( accu )
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)


0.9596


E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)


0.85915


E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)


0.95505
-------------------------------


E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)


[ 0.95375  0.9636   0.9649 ]

问题

  • 尽管上面的准确率看起来很高,但是结果并不是非常可靠,因为只是采用了一个准确率的指标。
  • 我们的目标是找出是否为5,因为测试集中大概有10%的数据是5,因此即使我们给出一个非常不靠谱的结果,对于所有测试集,均返回0,得到的准确率也有90%左右(所有结果不是5的测试集均预测正确)
  • 在对模型进行准确性判断时,尤其是对于分布有偏差的数据,需要采用更多的模型评判方法
from sklearn.base import BaseEstimator
class Never5Classifier( BaseEstimator ):
    def fit( self, X, y=None ):
        pass
    def predict(slef, X):
        return np.zeros( (len(X), 1), dtype=bool )
never_5_clf = Never5Classifier()
n5 = cross_val_score( never_5_clf, X_train, y_train_5, cv=3, scoring="accuracy" )
print( n5 )
[ 0.90835  0.91015  0.91045]

混淆矩阵

  • 可以采用confusion matrix,它可以统计模型预测的TP,TN,FP,FN的值,比之前单一的准确性判断方法要可靠一些
  • 混淆矩阵中,每一行都代表真实结果的一个分类,每一列代表预测结果的一个分类
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix
y_train_pred = cross_val_predict( sgd_clf, X_train, y_train_5, cv=3 )

cm = confusion_matrix( y_train_5, y_train_pred )
print(cm)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
from sklearn.metrics import precision_score, recall_score, f1_score
p_score = precision_score( y_train_5, y_train_pred )
r_score = recall_score( y_train_5, y_train_pred )
f1 = f1_score( y_train_5, y_train_pred )
print( p_score, r_score, f1 )
0.763479511143 0.783619258439 0.773418297679

选择合适的分类阈值

  • 对于这种分类问题,不同的分类阈值可以给出不同的输出结果,但是在sklearn中,无法直接通过直接修改阈值而输出结果,但是我们可以首先得到决策函数得到的结果,然后再手动确定阈值,得到预测的结果
  • 为了使得模型更加完善,我们需要选择合适的阈值,即使得准确率和召回率都比较大,因此在这里我们可以首先绘制出准确率和召回率随阈值的变化关系,然后再选择合适的阈值
y_scores = sgd_clf.decision_function( [some_digit] )
print( y_scores)
thres = 0
y_some_digit_pred = (y_scores > thres)
print( y_some_digit_pred )

### 绘制准确率和召回率曲线
from sklearn.metrics import precision_recall_curve
def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    plt.plot( thresholds, precisions[:-1], "b--", label="Precision" )
    plt.plot( thresholds, recalls[:-1], "g-", label="Recall" )
    plt.xlabel( "Threshold" )
    plt.legend(loc="upper left")
    plt.ylim( [0, 1] )
    plt.show()

y_scores = cross_val_predict( sgd_clf, X_train, y_train_5, cv=3, method="decision_function" )
precisions, recalls, thresholds = precision_recall_curve( y_train_5, y_scores )
plot_precision_recall_vs_threshold( precisions, recalls, thresholds )
plt.plot( recalls, precisions )
plt.show()
[ 34731.47869452]
[ True]


E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)

这里写图片描述

plt.plot( recalls, precisions )
plt.show()

这里写图片描述

y_train_pred_90 = (y_scores > 150000)
print( precision_score( y_train_5, y_train_pred_90 ) )
print( recall_score( y_train_5, y_train_pred_90 ) )
0.907714701601
0.575170632725

ROC曲线

  • 也可以用ROC曲线对模型的性能进行评价
  • ROC曲线的x轴是FPR,,即1与特异性的差值;y轴是TPR,即召回率
  • 评价一个模型的好坏可以用AUC值来判断,即ROC曲线与x轴的包络面积,AUC越大,模型越好
  • 下面同时也利用RF(随机森林)进行建模,比较SGD与RF的性能
  • 直接由ROC曲线可以看出,RF比SGD得到的模型性能要更好
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from sklearn.ensemble import RandomForestClassifier
def plot_roc_curve( fpr, tpr, label=None):
    plt.plot( fpr, tpr, linewidth=2, label=label )
    plt.plot( [0,1], [0,1], "k--" )
    plt.axis([0,1,0,1])
    plt.xlabel( "False Positive Rate" )
    plt.ylabel( "True Positive Rate" )

sgd_score = roc_auc_score(y_train_5, y_scores)
print("auc : ", s)
fpr, tpr, thresholds = roc_curve( y_train_5, y_scores )
plot_roc_curve( fpr, tpr, "SGD" )

forest_clf = RandomForestClassifier(random_state=42)
# RF的预测结果,是NX2的矩阵
y_probas_forest = cross_val_predict(forest_clf, X_train, y_train_5, cv=3, method="predict_proba")
y_scores_forest = y_probas_forest[:,1]
fpr_forest, tpr_forest, thresholds_forest = roc_curve( y_train_5, y_scores_forest )
plot_roc_curve( fpr_forest, tpr_forest, "RF" )
plt.legend(loc="lower right")
plt.show()

auc :  0.961351839086

这里写图片描述

多分类问题

  • scikit-learn能够非常方便地处理多分类问题,除了SVM分类(只能处理二分类问题),其他的可以自动识别是否为多分类问题,并且生成对应的模型
  • 对于多分类问题,decision_function返回的结果是一个1XN的矩阵,其中N是分类的类别数,取其中的最大值,对应的下标就是预测的类别
  • 要看一下具体分成了多少类,可以利用分类器的classes_属性
  • 如果要强行使用二分类的思想去解决多分类问题。比如说10分类的问题,则需要训练45个二分类器,找出得分最高的一个分类。那么可以使用OneVsOneClassifier
  • 训练多分类的随机森的方式类似,也很简单
from sklearn.multiclass import OneVsOneClassifier
sgd_clf.fit( X_train, y_train )
res = sgd_clf.predict( [some_digit] )
scores = sgd_clf.decision_function( [some_digit] )
print( scores )
print( np.argmax( scores ))
print( res )
print( sgd_clf.classes_ )

# 下面的代码会训练出45个二分类的SGD分类器,并给出最后的结果
ovo_clf = OneVsOneClassifier( SGDClassifier(random_state=42) )
ovo_clf.fit( X_train, y_train )
res = ovo_clf.predict( [some_digit] )
print(res)
print( len(ovo_clf.estimators_) )
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)


[[ -34733.73484798 -409402.23947981 -300691.59109179 -159122.19065008
  -461813.88562589  142331.94837348 -880567.93015997 -234982.6458743
  -540463.58233764 -681560.4715936 ]]
5
[ 5.]
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]


E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)


[ 5.]
forest_clf.fit( X_train, y_train )
res = forest_clf.predict( [some_digit] )
prob = forest_clf.predict_proba( [some_digit] )
print( res )
print( np.argmax( prob ) )
[ 5.]
5

数据标准化对模型准确率的影响

  • 对数据进行标准化之后,模型的精度会有升高
  • 数据标准化的话,主要是之前说的变为[0,1]区间的数据或者变为N(0,1)的数据
  • 在MNIST数据集中,相对minmax来说,使用zscore标准化能使得sgd模型的精度更高
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
# sgd_scores = cross_val_score( sgd_clf, X_train, y_train, cv=3, scoring="accuracy" )
# print( sgd_scores )
scaler = StandardScaler()
X_train_standard_scaler = scaler.fit_transform( X_train )
standard_scores = cross_val_score( sgd_clf, X_train_standard_scaler, y_train, scoring="accuracy" )
print( standard_scores )
minmax_scaler = MinMaxScaler()
X_train_minmax_scaler = minmax_scaler.fit_transform( X_train )
minmax_scores = cross_val_score( sgd_clf, X_train_minmax_scaler, y_train, scoring="accuracy" )
print( minmax_scores )

# 关于混淆矩阵的可视化
plt.matshow( conf_mx, cmap=plt.cm.spring )
plt.show()

E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\utils\validation.py:475: DataConversionWarning: Data with input dtype uint8 was converted to float64 by StandardScaler.
  warnings.warn(msg, DataConversionWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)


[ 0.91096781  0.90989549  0.90543582]


E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\utils\validation.py:475: DataConversionWarning: Data with input dtype uint8 was converted to float64 by MinMaxScaler.
  warnings.warn(msg, DataConversionWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)


[ 0.87422515  0.87924396  0.88283242]
  • 对于多分类的问题,也可以给出对应的混淆矩阵
y_train_pred = cross_val_predict( sgd_clf, X_train_standard_scaler, y_train, cv=3 )
conf_mx = confusion_matrix( y_train, y_train_pred )
print( conf_mx )

plt.matshow( conf_mx, cmap=plt.cm.spring )
plt.show()
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)
E:\InstallFolders\Anaconda342\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
  "and default tol will be 1e-3." % type(self), FutureWarning)


[[5728    3   22    9    8   54   48   10   38    3]
 [   1 6476   43   25    6   43    8   12  117   11]
 [  57   44 5308  108   81   25   96   57  166   16]
 [  43   39  148 5356    2  214   35   55  133  106]
 [  18   28   43    7 5368   12   57   32   83  194]
 [  63   41   39  196   71 4634  108   29  149   91]
 [  29   27   49    2   46   97 5619    8   41    0]
 [  25   22   67   28   52   11    6 5798   14  242]
 [  49  149   74  165   15  167   59   23 4998  152]
 [  40   31   24   87  159   38    3  224   76 5267]]
# 每一行的所有列求和,最后化为M行1列的矩阵
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = conf_mx / row_sums

np.fill_diagonal(norm_conf_mx, 0)
plt.matshow( norm_conf_mx, cmap=plt.cm.gray )
plt.show()

png

多标签的分类(multilabel)

  • 之前我们的分类任务都是进行单一分类,即所有样本都是属于同一种类别,数据的标签是NX1的矩阵。但是scikit-learn可以对多标签的结果进行分类,即数据的标签含有多列(其实之前的多目标分类也可以视为一种特殊的多标签分类问题,只是所有标签的物理含义都是相同的)
  • 多标签分类中,每个标签的取值都是0或1,即只有2中取值方法
  • K近邻分类可以用于解决这种多分类问题
  • 在进行预测的时候,我们可以给不同的标签赋予不同的权重,下面的macro指的是给所有标签赋予相同的权重
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

y_train_large = (y_train >= 7)
y_train_odd = ( y_train % 2 == 1 )
y_multilabel = np.c_[y_train_large, y_train_odd]

knn_clf = KNeighborsClassifier()
knn_clf.fit( X_train, y_multilabel )

res = knn_clf.predict( [some_digit] )
print( res )
[[False  True]]

y_train_knn_pred = cross_val_predict(knn_clf, X_train, y_multilabel, cv=3)
res = f1_score( y_multilabel, y_train_knn_pred, average="macro" )
print( res )

多输出分类( multioutput )

  • 对于多输出分类,每个输出的分类可能会包含多种的情况(多于2种分类)
import numpy as np
noise = np.random.randint( 0, 100, (len(X_train, 784)) )
X_train_mod = X_train + noise
noise = np.random.randint( 0, 100, (len(X_test, 784)) )
X_test_mod = X_test + noise
y_train_mod = y_train
y_test_mod = y_test

# 找出与其最近的样本,相当于去噪
knn_clf.fit( X_train_mod, y_train_mod )
clean_digit = knn_clf.predict( [X_test_mod[100]] )

(60000,)
1
1
  • 0
    点赞
  • 11
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

littletomatodonkey

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值