For this assignment you need to generate a random binary classification problem, then train and test three algorithms using 10-fold cross validation. For some algorithms an inner 5-fold cross validation is needed to choose the hyper-parameters. Finally, report the classification performance (per-fold and averaged) and briefly discuss the results.
Steps:
1. Create a classification dataset (n_samples >= 1000, n_features >= 10)
2. Split the dataset using 10-fold cross validation
3. Train the algorithms
GaussianNB
SVC (possible C values [1e-02, 1e-01, 1e00, 1e01, 1e02], RBF kernel)
RandomForestClassifier (possible n_estimators values [10, 100, 1000])
4. Evaluate the cross-validated performance
Accuracy
F1-score
AUC ROC
5. Write a short report summarizing the methodology and the results
Code:
from sklearn import datasets
from sklearn.model_selection import KFold, GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

# Step 1
# Create a classification dataset (n_samples >= 1000, n_features >= 10)
X, y = datasets.make_classification(n_samples=1000, n_features=10)

# Step 2
# Split the dataset using 10-fold cross validation
kf = KFold(n_splits=10, shuffle=True)
for train_index, test_index in kf.split(X):
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test = X[test_index], y[test_index]

    # Step 3 & 4
    # Train the algorithms and evaluate the cross-validated performance

    # GaussianNB (no hyper-parameters to tune)
    clf = GaussianNB()
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    print('GaussianNB Evaluation:')
    print('Accuracy', metrics.accuracy_score(y_test, pred))
    print('F1-score', metrics.f1_score(y_test, pred))
    print('AUC ROC', metrics.roc_auc_score(y_test, pred))

    # SVC: choose C with an inner 5-fold cross validation
    param_grid = {'C': [1e-02, 1e-01, 1e00, 1e01, 1e02]}
    clf = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    print('SVC Evaluation (best C = %g):' % clf.best_params_['C'])
    print('Accuracy', metrics.accuracy_score(y_test, pred))
    print('F1-score', metrics.f1_score(y_test, pred))
    print('AUC ROC', metrics.roc_auc_score(y_test, pred))

    # RandomForestClassifier: choose n_estimators with an inner 5-fold cross validation
    param_grid = {'n_estimators': [10, 100, 1000]}
    clf = GridSearchCV(RandomForestClassifier(), param_grid, cv=5)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    print('RandomForestClassifier Evaluation (best n_estimators = %d):' % clf.best_params_['n_estimators'])
    print('Accuracy', metrics.accuracy_score(y_test, pred))
    print('F1-score', metrics.f1_score(y_test, pred))
    print('AUC ROC', metrics.roc_auc_score(y_test, pred))
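The assignment also asks for the averaged performance across the 10 folds, not only the per-fold scores. A minimal sketch of one way the averaging could be done with scikit-learn's cross_val_score (shown here for GaussianNB only; the random_state value and scoring strings are assumptions, not part of the original code):

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Same kind of dataset as in Step 1; random_state fixed only for reproducibility
X, y = datasets.make_classification(n_samples=1000, n_features=10, random_state=0)

# One cross_val_score call per metric; scoring strings follow sklearn's naming
for scoring in ['accuracy', 'f1', 'roc_auc']:
    scores = cross_val_score(GaussianNB(), X, y, cv=10, scoring=scoring)
    print(scoring, 'per fold:', scores)        # one score per fold (10 values)
    print(scoring, 'averaged:', scores.mean())  # mean over the 10 folds
```

The same pattern applies to the tuned SVC and RandomForestClassifier by passing the fitted GridSearchCV object as the estimator, which keeps the inner 5-fold parameter selection nested inside each outer fold.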
Methodology and Results:
1. Generate the classification dataset
2. Split it for 10-fold cross validation
3. Apply the learning algorithms
4. Evaluate the performance of each algorithm
GaussianNB Evaluation:
Accuracy 0.95
F1-score 0.9473684210526316
AUC ROC 0.9493797519007602
SVC Evaluation:
Accuracy 0.95
F1-score 0.9278350515463918
AUC ROC 0.9297719087635054
RandomForestClassifier Evaluation:
Accuracy 0.95
F1-score 0.9896907216494846
AUC ROC 0.9897959183673469
In the run above, the Random Forest classifier shows the best performance, with the highest F1-score and AUC ROC.