The code is as follows:

from sklearn import datasets, metrics
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
import numpy as np

def result(y_test, pred):
    print("Accuracy: ", metrics.accuracy_score(y_test, pred))
    print("F1-score: ", metrics.f1_score(y_test, pred))
    print("AUC ROC ", metrics.roc_auc_score(y_test, pred))

C = [1e-02, 1e-01, 1e00, 1e01, 1e02]
n_estimator = [10, 100, 1000]

# 2000 samples with 10 features; make_classification returns (X, y)
X, y = datasets.make_classification(n_samples=2000, n_features=10)

kf = KFold(n_splits=10)
for train_index, test_index in kf.split(X):
    x_train, x_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    print()
    print("Naive Bayes:")
    clf = GaussianNB()
    clf.fit(x_train, y_train)
    pred = clf.predict(x_test)
    result(y_test, pred)

    print()
    print("SVM:")
    for c_value in C:
        print("C = " + str(c_value) + ": ")
        clf = SVC(C=c_value, kernel='rbf')
        clf.fit(x_train, y_train)
        pred = clf.predict(x_test)
        result(y_test, pred)
        print("##########################")

    print()
    print("RandomForestClassifier:")
    for n_value in n_estimator:
        print("n_estimators = " + str(n_value) + ": ")
        clf = RandomForestClassifier(n_estimators=n_value)
        clf.fit(x_train, y_train)
        pred = clf.predict(x_test)
        result(y_test, pred)
        print("##########################")
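As a side note, if only the per-model average scores are of interest (as in the summary below), the manual KFold loop can be condensed with sklearn's cross_val_score. This is a minimal sketch, not part of the original script; the fixed random_state and the choice of C=10 and n_estimators=100 are illustrative assumptions:

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# same kind of synthetic data as above, seeded for repeatability
X, y = datasets.make_classification(n_samples=2000, n_features=10, random_state=0)

models = {
    "GaussianNB": GaussianNB(),
    "SVC(C=10)": SVC(C=10.0, kernel="rbf"),
    "RandomForest(100)": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, clf in models.items():
    # 10-fold cross-validation; scores is an array of 10 per-fold accuracies
    scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
    print(name, "mean accuracy:", scores.mean())
```

The scoring parameter can likewise be set to "f1" or "roc_auc" to average the other two metrics across folds.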
Since training runs 10 times (once per fold) and each run produces 9 sets of output, the full log is very long; only the results of the first and last folds are shown here:
Naive Bayes:
Accuracy: 0.885
F1-score: 0.8959276018099547
AUC ROC 0.8863289594140983
SVM:
C = 0.01:
Accuracy: 0.88
F1-score: 0.888888888888889
AUC ROC 0.8845488760044756
##########################
C = 0.1:
Accuracy: 0.895
F1-score: 0.9032258064516129
AUC ROC 0.8991455599633812
##########################
C = 1.0:
Accuracy: 0.905
F1-score: 0.9140271493212669
AUC ROC 0.9066727698097853
##########################
C = 10.0:
Accuracy: 0.91
F1-score: 0.9181818181818181
AUC ROC 0.9124198962465669
##########################
C = 100.0:
Accuracy: 0.88
F1-score: 0.891891891891892
AUC ROC 0.8805818329773167
##########################
RandomForestClassifier:
n_estimators = 10:
Accuracy: 0.925
F1-score: 0.9315068493150683
AUC ROC 0.9283389278811922
##########################
n_estimators = 100:
Accuracy: 0.945
F1-score: 0.9506726457399104
AUC ROC 0.94603804292544
##########################
n_estimators = 1000:
Accuracy: 0.945
F1-score: 0.9506726457399104
AUC ROC 0.94603804292544
##########################
Naive Bayes:
Accuracy: 0.865
F1-score: 0.8402366863905325
AUC ROC 0.8601591187270503
SVM:
C = 0.01:
Accuracy: 0.89
F1-score: 0.8705882352941177
AUC ROC 0.8863729090167278
##########################
C = 0.1:
Accuracy: 0.905
F1-score: 0.888888888888889
AUC ROC 0.9023867809057526
##########################
C = 1.0:
Accuracy: 0.91
F1-score: 0.8953488372093024
AUC ROC 0.9082007343941247
##########################
C = 10.0:
Accuracy: 0.925
F1-score: 0.9112426035502958
AUC ROC 0.921358629130967
##########################
C = 100.0:
Accuracy: 0.905
F1-score: 0.8875739644970415
AUC ROC 0.9009587923296614
##########################
RandomForestClassifier:
n_estimators = 10:
Accuracy: 0.92
F1-score: 0.9058823529411765
AUC ROC 0.9169726642186863
##########################
n_estimators = 100:
Accuracy: 0.93
F1-score: 0.9176470588235294
AUC ROC 0.927172582619339
##########################
n_estimators = 1000:
Accuracy: 0.935
F1-score: 0.923076923076923
AUC ROC 0.9315585475316197
##########################
Summary: SVC and RandomForestClassifier score higher than Naive Bayes on all three metrics, indicating that both classify this data more accurately than Naive Bayes. Among the three algorithms, RandomForestClassifier achieves the highest average accuracy. As for the three metrics themselves, F1 and ROC AUC come out slightly higher than accuracy in most runs; since the synthetic classes here are roughly balanced, the three metrics largely agree, and the small gaps reflect how F1 and ROC AUC weight the positive class (precision/recall and true/false positive rates) rather than a stricter evaluation standard.
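To make the metric comparison concrete, here is a tiny sketch computing all three scores on the same hard predictions. The labels below are made up purely for illustration and are not taken from the runs above:

```python
from sklearn import metrics

# toy ground truth and hard 0/1 predictions (illustrative only)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

print(metrics.accuracy_score(y_true, y_pred))  # 0.625  (5 of 8 labels correct)
print(metrics.f1_score(y_true, y_pred))        # ~0.667 (precision 0.6, recall 0.75)
print(metrics.roc_auc_score(y_true, y_pred))   # 0.625  (with hard labels: (TPR + TNR) / 2)
```

On this toy set F1 exceeds accuracy even though ROC AUC matches it, showing the three metrics can rank the same predictions differently depending on how errors split across the classes.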