Scikit-Learn

sysueric

Article link: https://blog.csdn.net/ericsysu/article/details/80738600

Assignment

In the second ML assignment you have to compare the performance of three different classification algorithms, namely Naive Bayes, SVM, and Random Forest.
For this assignment you need to generate a random binary classification problem, and then train and test the three algorithms using 10-fold cross-validation. For some algorithms, an inner 5-fold cross-validation is needed to choose the parameters. Then show the classification performance (per-fold and averaged) in the report, and briefly discuss the results.

Solution

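from sklearn import datasets
# sklearn.cross_validation was removed in scikit-learn 0.20; KFold now lives in model_selection
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

# Random binary classification problem: 1000 samples, 10 features
X, y = datasets.make_classification(n_samples=1000, n_features=10)
# Outer 10-fold cross-validation with shuffling
kf = KFold(n_splits=10, shuffle=True)

# Running sums of each metric, indexed as [GaussianNB, SVC, RandomForestClassifier]
acc_average, f1_average, auc_average = [0,0,0],[0,0,0],[0,0,0]

for train_index, test_index in kf.split(X):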
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test   = X[test_index],  y[test_index]

    # GaussianNB
    clf = GaussianNB()
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)

    acc_average[0] += metrics.accuracy_score(y_test, pred)
    f1_average[0] += metrics.f1_score(y_test, pred)
    auc_average[0] += metrics.roc_auc_score(y_test, pred)

    # SVC
    clf = SVC(C=1e-02, kernel='rbf', gamma=0.1)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)

    acc_average[1] += metrics.accuracy_score(y_test, pred)
    f1_average[1] += metrics.f1_score(y_test, pred)
    auc_average[1] += metrics.roc_auc_score(y_test, pred)

    # RandomForestClassifier
    clf = RandomForestClassifier(n_estimators=10)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)

    acc_average[2] += metrics.accuracy_score(y_test, pred)
    f1_average[2] += metrics.f1_score(y_test, pred)
    auc_average[2] += metrics.roc_auc_score(y_test, pred)

train_methods = ['GaussianNB','SVC','RandomForestClassifier']
# Divide the running sums by the number of folds (10) to get the averages
for i, method in enumerate(train_methods):
    print(method + ":\nAccuracy:%f\nF1-score:%f\nAUC ROC:%f\n" % (acc_average[i]/10, f1_average[i]/10, auc_average[i]/10))

Result

# Average
GaussianNB:
Accuracy:0.904000
F1-score:0.902537
AUC ROC:0.904208

SVC:
Accuracy:0.924000
F1-score:0.922635
AUC ROC:0.924301

RandomForestClassifier:
Accuracy:0.968000
F1-score:0.967945
AUC ROC:0.967754
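
The AUC values above are computed from the hard 0/1 predictions returned by predict, which is why they sit so close to the accuracy. roc_auc_score is usually given a continuous score instead. The sketch below shows both variants on a single hold-out split; the fixed random_state and the train_test_split call are assumptions added for illustration, not part of the original solution.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn import metrics

# Same problem size as the solution; random_state is an added assumption.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)

nb = GaussianNB().fit(X_train, y_train)
# AUC from hard labels, as in the solution
auc_from_labels = metrics.roc_auc_score(y_test, nb.predict(X_test))
# AUC from the predicted probability of the positive class
auc_from_proba = metrics.roc_auc_score(y_test, nb.predict_proba(X_test)[:, 1])

svc = SVC(C=1e-02, kernel='rbf', gamma=0.1).fit(X_train, y_train)
# SVC has no predict_proba by default; decision_function gives a usable score
auc_svc = metrics.roc_auc_score(y_test, svc.decision_function(X_test))

print("GaussianNB AUC from labels: %f" % auc_from_labels)
print("GaussianNB AUC from probabilities: %f" % auc_from_proba)
print("SVC AUC from decision_function: %f" % auc_svc)
```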

Report

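Naive Bayes

The naive Bayes classifier rests on one simple assumption: given the target value, the attributes are conditionally independent of one another. It therefore performs best when the attributes are only weakly correlated.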
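
To make the independence assumption concrete, here is a minimal sketch that predicts with per-feature Gaussian likelihoods summed in log space and compares the result with GaussianNB. The helper name naive_bayes_predict and the fixed random_state are hypothetical choices for illustration; GaussianNB also adds a tiny variance smoothing term that this sketch omits.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

def naive_bayes_predict(X_train, y_train, X_new):
    """Hypothetical helper: Gaussian naive Bayes written out by hand."""
    classes = np.unique(y_train)
    log_posteriors = []
    for c in classes:
        Xc = X_train[y_train == c]
        mean, var = Xc.mean(axis=0), Xc.var(axis=0)
        log_prior = np.log(len(Xc) / len(X_train))
        # Per-feature Gaussian log-likelihoods, one column per feature
        log_lik = -0.5 * (np.log(2 * np.pi * var) + (X_new - mean) ** 2 / var)
        # Conditional independence: the joint likelihood is the product of
        # the per-feature likelihoods, i.e. a sum in log space
        log_posteriors.append(log_prior + log_lik.sum(axis=1))
    # Pick the class with the largest (unnormalized) log posterior
    return classes[np.argmax(np.vstack(log_posteriors), axis=0)]

manual = naive_bayes_predict(X, y, X)
library = GaussianNB().fit(X, y).predict(X)
agreement = np.mean(manual == library)
print("agreement with GaussianNB: %f" % agreement)
```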

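Support Vector Machine

The basic support vector machine (SVM) model finds the separating hyperplane in feature space that maximizes the margin between the positive and negative training samples. SVM is a supervised learning algorithm for binary classification; with the kernel method it can also handle nonlinear problems.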
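
The effect of the kernel can be seen on data that no hyperplane can separate. The sketch below compares a linear and an RBF kernel on two concentric circles; the make_circles dataset and its parameters are assumptions chosen for illustration and are not part of the assignment.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)

# 5-fold cross-validated accuracy for each kernel
linear_acc = cross_val_score(SVC(kernel='linear'), X, y, cv=5).mean()
rbf_acc = cross_val_score(SVC(kernel='rbf'), X, y, cv=5).mean()

print("linear kernel accuracy: %f" % linear_acc)
print("RBF kernel accuracy: %f" % rbf_acc)
```

On this kind of data the RBF kernel should score clearly higher, since it separates the rings in an implicit higher-dimensional feature space.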

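Random Forest

As the name suggests, a random forest is a forest built in a randomized way. It consists of many decision trees that are independent of one another. Once the forest is built, each new input sample is classified by every tree in the forest (for classification tasks), and the class chosen by the most trees becomes the prediction.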
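
The voting scheme described above can be sketched by hand with bootstrap samples and DecisionTreeClassifier. This is an illustrative sketch, not how RandomForestClassifier is implemented: scikit-learn's forest averages the trees' predicted class probabilities rather than counting hard votes. The seed values and the hold-out split are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

rng = np.random.RandomState(0)
trees = []
for _ in range(10):
    # Bootstrap sample: draw training rows with replacement
    idx = rng.randint(0, len(X_train), size=len(X_train))
    # Each split considers a random subset of the features
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=rng.randint(10**6))
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# One row of predicted labels per tree: shape (n_trees, n_test)
votes = np.array([t.predict(X_test) for t in trees])
# Majority vote per sample; a tie goes to the smaller class label
forest_pred = np.apply_along_axis(
    lambda col: np.bincount(col).argmax(), 0, votes)

vote_accuracy = np.mean(forest_pred == y_test)
print("majority-vote accuracy: %f" % vote_accuracy)
```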

Algorithms setting

GaussianNB
SVC: C=1e-02, kernel = RBF, gamma = 0.1
RandomForestClassifier: n_estimators = 10
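
The settings above are fixed by hand. The assignment's inner 5-fold cross-validation for choosing parameters could be sketched with GridSearchCV nested inside the outer 10-fold loop, shown here for SVC only. The parameter grid and the random_state values are hypothetical choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

# Same problem size as the solution; random_state is an added assumption.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Hypothetical search grid around the hand-picked values
param_grid = {"C": [1e-2, 1e-1, 1e0, 1e1], "gamma": [1e-2, 1e-1]}

inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=10, shuffle=True, random_state=1)

# The grid search sees only the training part of each outer fold,
# so the outer test fold never influences the parameter choice.
search = GridSearchCV(SVC(kernel="rbf"), param_grid,
                      cv=inner_cv, scoring="accuracy")

# One accuracy value per outer fold
fold_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")

for i, score in enumerate(fold_scores, start=1):
    print("fold %2d accuracy: %f" % (i, score))
print("mean accuracy: %f" % fold_scores.mean())
```

The same pattern applies to RandomForestClassifier with a grid over n_estimators; GaussianNB needs no inner search here.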

Output

The reported values are the evaluation metrics averaged over the 10 folds.

Evaluation

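According to the results, the three algorithms rank RandomForestClassifier > SVC > GaussianNB on all three averaged metrics.
For RandomForestClassifier, increasing n_estimators lengthens the running time, but the metrics improve only slightly.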
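
The n_estimators observation can be checked with a small timing loop. This is a rough sketch: the candidate values 10, 50 and 100 and the random_state are assumptions, and wall-clock times depend on the machine.

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

results = []  # (n_estimators, mean accuracy, seconds)
for n in [10, 50, 100]:
    clf = RandomForestClassifier(n_estimators=n, random_state=0)
    start = time.perf_counter()
    # 10-fold cross-validated accuracy, as in the assignment
    mean_acc = cross_val_score(clf, X, y, cv=10, scoring="accuracy").mean()
    elapsed = time.perf_counter() - start
    results.append((n, mean_acc, elapsed))

for n, mean_acc, elapsed in results:
    print("n_estimators=%3d accuracy=%f time=%.2fs" % (n, mean_acc, elapsed))
```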
