Python Week 15 Assignment: sklearn

Assignment

In the second ML assignment you have to compare the performance of three different classification algorithms, namely Naive Bayes, SVM, and Random Forest.
For this assignment you need to generate a random binary classification problem, and then train and test the three algorithms using 10-fold cross validation. For some algorithms, inner cross validation (5-fold) is needed to choose the parameters. Then show the classification performance (per-fold and averaged) in the report, and briefly discuss the results.

Steps

1 Create a classification dataset (n_samples ≥ 1000, n_features ≥ 10)
2 Split the dataset using 10-fold cross validation
3 Train the algorithms
(1) GaussianNB
(2) SVC (possible C values [1e-02, 1e-01, 1e00, 1e01, 1e02], RBF kernel)
(3) RandomForestClassifier (possible n_estimators values [10, 100, 1000])
4 Evaluate the cross-validated performance
(1) Accuracy
(2) F1-score
(3) AUC ROC
5 Write a short report summarizing the methodology and the results
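
Steps 1, 2 and 4 above can be sketched as follows. This is only a minimal illustration for the simplest of the three models (GaussianNB), not the assignment's reference solution. The helper names `evaluate_per_fold` and `average_metrics` and the fixed `random_state=0` are assumptions made here for reproducibility. AUC is computed from the predicted class-1 probabilities, which is the usual input for `roc_auc_score`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB


def evaluate_per_fold(model, X, y, n_splits=10, seed=0):
    """Train and test `model` on every outer fold; return one metrics dict per fold."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = []
    for train_idx, test_idx in kf.split(X):
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        # Use class-1 probabilities (not hard labels) as the ranking score for AUC.
        proba = model.predict_proba(X[test_idx])[:, 1]
        results.append({
            "accuracy": accuracy_score(y[test_idx], pred),
            "f1": f1_score(y[test_idx], pred),
            "auc": roc_auc_score(y[test_idx], proba),
        })
    return results


def average_metrics(results):
    """Average each metric over all folds."""
    return {name: float(np.mean([r[name] for r in results])) for name in results[0]}


X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
folds = evaluate_per_fold(GaussianNB(), X, y)
for i, r in enumerate(folds, start=1):
    print(f"fold {i}: {r}")
print("mean:", average_metrics(folds))
```

The per-fold dictionaries and their mean correspond to the "per-fold and averaged" performance the assignment asks for.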

Code

from sklearn import datasets
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics


# Generate the dataset
data = datasets.make_classification(n_samples=1000, n_features=10, n_informative=2, n_redundant=2, n_repeated=0, n_classes=2)
# Split into 10 folds
kf = KFold(n_splits=10, shuffle=True)

# Note: the model code below sits outside this loop, so it only uses the
# train/test split of the last of the 10 folds.
for train_index, test_index in kf.split(data[0]):
    X_train, Y_train = data[0][train_index], data[1][train_index]
    X_test, Y_test = data[0][test_index], data[1][test_index]

# Naive Bayes
clf = GaussianNB()
clf.fit(X_train,Y_train)
pred = clf.predict(X_test)
acc = metrics.accuracy_score(Y_test,pred)
f1 = metrics.f1_score(Y_test, pred)
auc = metrics.roc_auc_score(Y_test,pred)
print("Naive Bayes:")
print("Accuracy:",acc)
print("F1-score:",f1)
print("AUC ROC :",auc)

#SVC
ave_acc = 0
ave_f1 = 0
ave_auc = 0
C_values = [1e-02, 1e-01, 1e00, 1e01, 1e02]
for C_value in C_values:
    clf = SVC(C=C_value, kernel='rbf', gamma ='auto')
    clf.fit(X_train, Y_train)
    pred = clf.predict(X_test)
    acc = metrics.accuracy_score(Y_test, pred)
    ave_acc+=acc
    f1 = metrics.f1_score(Y_test, pred)
    ave_f1+=f1
    auc = metrics.roc_auc_score(Y_test, pred)
    ave_auc+=auc
    print("SVC,C:",C_value)
    print("Accuracy:", acc)
    print("F1-score:", f1)
    print("AUC ROC :", auc)
ave_f1/=5
ave_acc/=5
ave_auc/=5
print("SVC average:")
print("average Accuracy:",ave_acc)
print("average F1-score:", ave_f1)
print("average AUC ROC :",ave_auc)

# Random forest; candidate n_estimators values are [10, 100, 1000]
ave_acc = 0
ave_f1 = 0
ave_auc = 0
N = [10, 100, 1000]
for n in N:
    clf = RandomForestClassifier(n_estimators=n)
    clf.fit(X_train, Y_train)
    pred_RanFor = clf.predict(X_test)
    # Score the random forest's own predictions, not the stale SVC `pred`
    acc = metrics.accuracy_score(Y_test, pred_RanFor)
    ave_acc+=acc
    f1 = metrics.f1_score(Y_test, pred_RanFor)
    ave_f1+=f1
    auc = metrics.roc_auc_score(Y_test, pred_RanFor)
    ave_auc+=auc
    print("RandomForest,N:",n)
    print("Accuracy:",acc)
    print("F1-score:",f1)
    print("AUC ROC :",auc)
ave_f1 /= 3
ave_acc /= 3
ave_auc /= 3
print("RandomForest average")
print("average Accuracy:",ave_acc)
print("average F1-score:", ave_f1)
print("average AUC ROC :",ave_auc)
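
The loops above average the metrics over every candidate parameter value instead of selecting one. The assignment asks for an inner 5-fold cross validation to choose the parameter. One possible way to do that is nested cross validation with `GridSearchCV`, sketched below for SVC. The function name `nested_cv_accuracy` and the fixed `random_state=0` are assumptions made for this illustration, and only accuracy is reported to keep the sketch short.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC


def nested_cv_accuracy(estimator, param_grid, X, y,
                       outer_splits=10, inner_splits=5, seed=0):
    """Outer K-fold for testing, inner grid search (K-fold) for parameter choice."""
    outer = KFold(n_splits=outer_splits, shuffle=True, random_state=seed)
    scores, chosen = [], []
    for train_idx, test_idx in outer.split(X):
        # The inner search only ever sees the outer training part.
        search = GridSearchCV(estimator, param_grid,
                              cv=inner_splits, scoring="accuracy")
        search.fit(X[train_idx], y[train_idx])
        # GridSearchCV refits the best setting on the full outer training part.
        pred = search.predict(X[test_idx])
        scores.append(accuracy_score(y[test_idx], pred))
        chosen.append(search.best_params_)
    return scores, chosen


C_VALUES = [1e-2, 1e-1, 1e0, 1e1, 1e2]
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
scores, chosen = nested_cv_accuracy(
    SVC(kernel="rbf", gamma="auto"), {"C": C_VALUES}, X, y)
for i, (s, p) in enumerate(zip(scores, chosen), start=1):
    print(f"fold {i}: accuracy={s:.3f}, selected {p}")
print("mean accuracy:", sum(scores) / len(scores))
```

The same function can be called with `RandomForestClassifier()` and `{"n_estimators": [10, 100, 1000]}`; that search is considerably slower because of the 1000-tree setting.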

Output

Naive Bayes:
Accuracy: 0.91
F1-score: 0.9072164948453608
AUC ROC : 0.9134460547504026
SVC,C: 0.01
Accuracy: 0.76
F1-score: 0.7931034482758621
AUC ROC : 0.7777777777777778
SVC,C: 0.1
Accuracy: 0.93
F1-score: 0.9263157894736843
AUC ROC : 0.9319645732689211
SVC,C: 1.0
Accuracy: 0.93
F1-score: 0.924731182795699
AUC ROC : 0.9303542673107892
SVC,C: 10.0
Accuracy: 0.94
F1-score: 0.9361702127659574
AUC ROC : 0.9412238325281804
SVC,C: 100.0
Accuracy: 0.92
F1-score: 0.9166666666666666
AUC ROC : 0.9227053140096618
SVC average:
average Accuracy: 0.8960000000000001
average F1-score: 0.899397459995574
average AUC ROC : 0.9008051529790662
RandomForest,N: 10
Accuracy: 0.92
F1-score: 0.9166666666666666
AUC ROC : 0.9227053140096618
RandomForest,N: 100
Accuracy: 0.92
F1-score: 0.9166666666666666
AUC ROC : 0.9227053140096618
RandomForest,N: 1000
Accuracy: 0.92
F1-score: 0.9166666666666666
AUC ROC : 0.9227053140096618
RandomForest average
average Accuracy: 0.92
average F1-score: 0.9166666666666666
average AUC ROC : 0.9227053140096618

Analysis

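In general, RandomForest appears to perform well, and the SVC parameter C has a large effect on performance: accuracy rises from 0.76 at C = 0.01 to 0.92-0.94 for C ≥ 0.1. Two caveats apply to the output above. First, the RandomForest figures are identical to the SVC figures for C = 100, because the run that produced this output computed the RandomForest metrics from `pred` (the last SVC prediction) rather than from `pred_RanFor`, so they do not actually measure the random forest. Second, all figures come from a single train/test split (the last fold), not from an average over the 10 folds.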
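
To look at the effect of C in isolation, one option is scikit-learn's `validation_curve`, which cross-validates the same estimator once per candidate value. The sketch below assumes a freshly generated dataset with `random_state=0` and 5-fold cross validation; it does not reproduce the figures reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
C_values = [1e-2, 1e-1, 1e0, 1e1, 1e2]

# One row per C value, one column per cross-validation fold.
train_scores, test_scores = validation_curve(
    SVC(kernel="rbf", gamma="auto"), X, y,
    param_name="C", param_range=C_values, cv=5, scoring="accuracy",
)

for C, row in zip(C_values, test_scores):
    print(f"C={C:g}: mean accuracy {row.mean():.3f} (std {row.std():.3f})")
```

Comparing the mean and standard deviation across C values gives a more stable picture than a single split.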
