Most of the steps below follow the teacher's PPT.
1. Create a classification dataset (n_samples=1000, n_features=10)

```python
from sklearn import datasets

# 1000 samples, 10 features; Y holds the binary class labels
X, Y = datasets.make_classification(n_samples=1000, n_features=10)
```
2. Split the dataset using 10-fold cross-validation
```python
from sklearn.model_selection import KFold

# Generate shuffled 10-fold train/test index splits over X
kf = KFold(n_splits=10, shuffle=True)
for train_index, test_index in kf.split(X):
    X_train, y_train = X[train_index], Y[train_index]
    X_test, y_test = X[test_index], Y[test_index]
```
3. Train the algorithms
GaussianNB
SVC (possible C values [1e-02, 1e-01, 1e00, 1e01, 1e02], RBF kernel)
RandomForestClassifier (possible n_estimators values [10, 100, 1000])
4. Evaluate the cross-validated performance
Accuracy
F1-score
AUC ROC
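The lists of candidate C and n_estimators values above suggest a hyperparameter search rather than fixing one value by hand. A minimal sketch using GridSearchCV (the grids are the ones listed above; cv=3 and random_state=0 are my own choices, not from the PPT):

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

X, Y = datasets.make_classification(n_samples=1000, n_features=10, random_state=0)

# Search the candidate C values for the RBF-kernel SVC
svc_grid = GridSearchCV(SVC(kernel='rbf'),
                        {'C': [1e-02, 1e-01, 1e00, 1e01, 1e02]},
                        cv=3, n_jobs=-1)
svc_grid.fit(X, Y)

# Search the candidate n_estimators values for the random forest
rfc_grid = GridSearchCV(RandomForestClassifier(),
                        {'n_estimators': [10, 100, 1000]},
                        cv=3, n_jobs=-1)
rfc_grid.fit(X, Y)

print(svc_grid.best_params_, rfc_grid.best_params_)
```

The best parameters found this way could then replace the hand-picked C=1e00 and n_estimators=100 used in the evaluation code.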
```python
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Inside the cross-validation loop: fit each classifier on the training
# fold and score its predictions on the test fold. The acc_for_* /
# f1_for_* / auc_for_* lists are initialized to [] before the loop.
clf = GaussianNB()
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
acc_for_NB.append(metrics.accuracy_score(y_test, pred))
f1_for_NB.append(metrics.f1_score(y_test, pred))
auc_for_NB.append(metrics.roc_auc_score(y_test, pred))

clf = SVC(C=1e00, kernel='rbf', gamma=0.1)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
acc_for_SVC.append(metrics.accuracy_score(y_test, pred))
f1_for_SVC.append(metrics.f1_score(y_test, pred))
auc_for_SVC.append(metrics.roc_auc_score(y_test, pred))

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
acc_for_RFC.append(metrics.accuracy_score(y_test, pred))
f1_for_RFC.append(metrics.f1_score(y_test, pred))
auc_for_RFC.append(metrics.roc_auc_score(y_test, pred))
```
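After the loop finishes, each list holds one score per fold; the comparison below rests on the fold averages. A small sketch of the aggregation (the scores here are illustrative placeholders, not real results):

```python
import numpy as np

def summarize(name, scores):
    # Report the mean and standard deviation of the per-fold scores
    print(f"{name}: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")

# Hypothetical per-fold values standing in for the lists filled above
acc_for_NB = [0.87, 0.90, 0.88]
summarize("GaussianNB accuracy", acc_for_NB)
```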
![](https://i-blog.csdnimg.cn/blog_migrate/0f121b2835724fd99f399d0bdda6b41c.png)
5. Summary
It is clear that RandomForestClassifier performs best, whether measured by Accuracy, F1-score, or AUC ROC.
As for the evaluation metrics themselves, F1-score and AUC are the stricter ones.
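One caveat about the AUC numbers above: roc_auc_score was fed hard 0/1 predictions, which collapses the ROC curve to a single operating point; passing the positive-class probability gives the usual ranking-based AUC. A sketch on a single split (dataset and random_state=0 are my own setup, not from the PPT):

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, Y = datasets.make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

clf = GaussianNB().fit(X_train, y_train)

# AUC from hard labels: a single ROC operating point
auc_hard = metrics.roc_auc_score(y_test, clf.predict(X_test))
# AUC from the positive-class probability: the full ROC curve
auc_proba = metrics.roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(auc_hard, auc_proba)
```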