Advanced Programming Homework, Week 15: Sklearn

Assignment

(The assignment statement was given as an image and is not reproduced here.)

First, import the libraries we need:

from sklearn import datasets, cross_validation
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics
c:\users\sunyy\appdata\local\programs\python\python36-32\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)

The import triggers a deprecation warning (cross_validation was superseded by model_selection in 0.18 and removed in 0.20); it is safe to ignore here.
Generate the dataset: at least 1000 samples (we use 2000) and at least 10 features (we use 15), split into 10 folds.

dataset = datasets.make_classification(n_samples=2000, n_features=15) # binary classification; n_classes defaults to 2, so it is omitted
data, target = dataset[0], dataset[1] # dataset is a tuple of two arrays: the sample inputs and the sample labels
# split the data into 10 folds: n_folds=10
kf = cross_validation.KFold(len(data), n_folds=10, shuffle=True) # kf yields (train_index, test_index) pairs
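Since the cross_validation module was removed in scikit-learn 0.20, here is a minimal sketch of the same split with the model_selection API (assuming scikit-learn >= 0.20; the constructor no longer takes the sample count, and the indices come from kf.split(data) instead of iterating kf directly):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold

data, target = make_classification(n_samples=2000, n_features=15)

kf = KFold(n_splits=10, shuffle=True)
# Each element of the split is a (train_index, test_index) pair.
folds = list(kf.split(data))
print(len(folds))  # 10 folds, 200 test samples each
```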

Then run cross-validation with each classifier in turn; the code follows the template from the course PDF.
To evaluate each method, the three metrics are averaged over the 10 folds (though I'm not sure this is the standard way to do it).

First, Gaussian Naive Bayes:

avr_acc = 0
avr_f1 = 0
avr_auc = 0
for train_index, test_index in kf:
    X_train, y_train = data[train_index], target[train_index]
    X_test, y_test = data[test_index], target[test_index]

    clf = GaussianNB()
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    acc = metrics.accuracy_score(y_test, pred)
    avr_acc += acc
    f1 = metrics.f1_score(y_test, pred)
    avr_f1 += f1
    auc = metrics.roc_auc_score(y_test, pred)
    avr_auc += auc

avr_acc /= 10
avr_f1 /= 10
avr_auc /= 10
print("Naive Bayes:")
print("Accuracy: %f"  % (avr_acc))
print("F1-score: %f"  % (avr_f1))
print("AUC ROC : %f"  % (avr_auc))
Naive Bayes:
Accuracy: 0.885000
F1-score: 0.885972
AUC ROC : 0.884704
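The per-fold loop above can also be written in a few lines with cross_validate, which computes several metrics in one call. A sketch assuming scikit-learn >= 0.19; the exact numbers will differ from the run above because the folds are drawn afresh:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=15, random_state=0)

# cv=10 performs the 10-fold split internally; scoring takes a list of metric names
scores = cross_validate(GaussianNB(), X, y, cv=10,
                        scoring=['accuracy', 'f1', 'roc_auc'])
print("Accuracy: %f" % scores['test_accuracy'].mean())
print("F1-score: %f" % scores['test_f1'].mean())
print("AUC ROC : %f" % scores['test_roc_auc'].mean())
```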

SVM, with an RBF kernel as required; the parameter C takes the values [1e-02, 1e-01, 1e00, 1e01, 1e02]:

for cc in [1e-02, 1e-01, 1e00, 1e01, 1e02]:
    avr_acc = 0
    avr_f1 = 0
    avr_auc = 0
    for train_index, test_index in kf:
        X_train, y_train = data[train_index], target[train_index]
        X_test, y_test = data[test_index], target[test_index]
        clf = SVC(C=cc, kernel='rbf', gamma=0.1)
        clf.fit(X_train, y_train)
        pred = clf.predict(X_test)

        acc = metrics.accuracy_score(y_test, pred)
        avr_acc += acc
        f1 = metrics.f1_score(y_test, pred)
        avr_f1 += f1
        auc = metrics.roc_auc_score(y_test, pred)
        avr_auc += auc
    avr_acc /= 10
    avr_f1 /= 10
    avr_auc /= 10
    print("SVM: C = %f" % (cc))
    print("Accuracy: %f"  % (avr_acc))
    print("F1-score: %f"  % (avr_f1))
    print("AUC ROC : %f"  % (avr_auc))
SVM: C = 0.010000
Accuracy: 0.791500
F1-score: 0.796085
AUC ROC : 0.804599
SVM: C = 0.100000
Accuracy: 0.891000
F1-score: 0.894124
AUC ROC : 0.891249
SVM: C = 1.000000
Accuracy: 0.889500
F1-score: 0.889850
AUC ROC : 0.889226
SVM: C = 10.000000
Accuracy: 0.864500
F1-score: 0.862406
AUC ROC : 0.864076
SVM: C = 100.000000
Accuracy: 0.848000
F1-score: 0.846445
AUC ROC : 0.847830
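Instead of looping over C by hand, the same search can be expressed with GridSearchCV, which cross-validates every candidate and keeps the best one. A sketch, keeping gamma=0.1 from the loop above and selecting C by mean CV accuracy:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=15, random_state=0)

grid = GridSearchCV(SVC(kernel='rbf', gamma=0.1),
                    param_grid={'C': [1e-02, 1e-01, 1e00, 1e01, 1e02]},
                    cv=10, scoring='accuracy')
grid.fit(X, y)
print(grid.best_params_)  # the C value with the highest mean CV accuracy
```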

Random forest, with n_estimators taking the values [10, 100, 1000] as required (the n_estimators=1000 run takes quite a while):

for nn in [10, 100, 1000]:
    avr_acc = 0
    avr_f1 = 0
    avr_auc = 0
    for train_index, test_index in kf:
        X_train, y_train = data[train_index], target[train_index]
        X_test, y_test = data[test_index], target[test_index]
        clf = RandomForestClassifier(n_estimators=nn)
        clf.fit(X_train, y_train)
        pred = clf.predict(X_test)

        acc = metrics.accuracy_score(y_test, pred)
        avr_acc += acc
        f1 = metrics.f1_score(y_test, pred)
        avr_f1 += f1
        auc = metrics.roc_auc_score(y_test, pred)
        avr_auc += auc
    avr_acc /= 10
    avr_f1 /= 10
    avr_auc /= 10
    print("Random forest: n_estimators = %d" % (nn))
    print("Accuracy: %f"  % (avr_acc))
    print("F1-score: %f"  % (avr_f1))
    print("AUC ROC : %f"  % (avr_auc))
Random forest: n_estimators = 10
Accuracy: 0.906500
F1-score: 0.904558
AUC ROC : 0.906317
Random forest: n_estimators = 100
Accuracy: 0.919000
F1-score: 0.918893
AUC ROC : 0.919083
Random forest: n_estimators = 1000
Accuracy: 0.916500
F1-score: 0.916685
AUC ROC : 0.916653
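One caveat about the AUC numbers throughout: roc_auc_score is fed the hard 0/1 predictions, which collapses the ROC curve to a single point. Scoring with class probabilities gives the usual AUC. A sketch using predict_proba (which RandomForestClassifier supports) on a single held-out split rather than the full 10-fold loop:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]  # probability of the positive class
print("AUC (probabilities): %f" % roc_auc_score(y_te, proba))
print("AUC (hard labels)  : %f" % roc_auc_score(y_te, clf.predict(X_te)))
```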

Comparing the three: the random forest achieves the highest accuracy; the SVM is also quite accurate when its parameters are chosen well; and Naive Bayes outperforms an SVM with poorly chosen parameters.
