[Advanced Programming Techniques Assignment - Week 15] Scikit-Learn: Machine Learning in Python

In this ML assignment you have to compare the performance of three different classification algorithms, namely Naive Bayes, SVM, and Random Forest.

For this assignment you need to generate a random binary classification problem, then train and test the three algorithms using 10-fold cross-validation. For some algorithms an inner cross-validation (5-fold) is needed to choose the hyperparameters. Then show the classification performance (per-fold and averaged) in the report and briefly discuss the results.
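Scikit-learn's cross_validate offers a compact way to collect exactly these per-fold and averaged scores. A minimal sketch for GaussianNB ('accuracy', 'f1', and 'roc_auc' are built-in scorer names; note that the 'roc_auc' scorer uses class probabilities rather than hard predictions):

from sklearn import datasets
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

X, y = datasets.make_classification(n_samples=1000, n_features=10)

#10-fold CV, collecting all three metrics for every fold
scores = cross_validate(GaussianNB(), X, y, cv=10,
                        scoring=['accuracy', 'f1', 'roc_auc'])
print('Per-fold accuracy:', scores['test_accuracy'])
print('Averaged accuracy:', scores['test_accuracy'].mean())
print('Averaged F1-score:', scores['test_f1'].mean())
print('Averaged AUC ROC :', scores['test_roc_auc'].mean())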

Steps:
1. Create a classification dataset (n_samples >= 1000, n_features >= 10)
2. Split the dataset using 10-fold cross validation
3. Train the algorithms
    GaussianNB
    SVC (possible C values [1e-02, 1e-01, 1e00, 1e01, 1e02], RBF kernel)
    RandomForestClassifier (possible n_estimators values [10, 100, 1000])
4. Evaluate the cross-validated performance
    Accuracy
    F1-score
    AUC ROC

5. Write a short report summarizing the methodology and the results


Code:

from sklearn import datasets
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

#Step 1
#Create a classification dataset (n_samples >= 1000, n_features >= 10)
X, y = datasets.make_classification(n_samples=1000, n_features=10)

#Step 2
#Split the dataset using 10-fold cross validation
kf = KFold(n_splits=10, shuffle=True)
for train_index, test_index in kf.split(X):
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test = X[test_index], y[test_index]

#Step 3 & 4
#Train the algorithms and evaluate their performance
#(note: the code below evaluates on the last fold produced by the loop above)
#GaussianNB
clf = GaussianNB()
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
#Accuracy
acc = metrics.accuracy_score(y_test, pred)
print('GaussianNB Evaluation:')
print('Accuracy', acc)
#F1-score
f1 = metrics.f1_score(y_test, pred)
print('F1-score', f1)
#AUC ROC
auc = metrics.roc_auc_score(y_test, pred)
print('AUC ROC', auc)

#SVC
clf = SVC(C=1e-01, kernel='rbf', gamma=0.1)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print('SVC Evaluation:')
#Accuracy
acc = metrics.accuracy_score(y_test, pred)
print('Accuracy', acc)
#F1-score
f1 = metrics.f1_score(y_test, pred)
print('F1-score', f1)
#AUC ROC
auc = metrics.roc_auc_score(y_test, pred)
print('AUC ROC', auc)

#RandomForestClassifier
clf = RandomForestClassifier(n_estimators=10)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print('RandomForestClassifier Evaluation:')
#Accuracy
acc = metrics.accuracy_score(y_test, pred)
print('Accuracy', acc)
#F1-score
f1 = metrics.f1_score(y_test, pred)
print('F1-score', f1)
#AUC ROC
auc = metrics.roc_auc_score(y_test, pred)
print('AUC ROC', auc)
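The script above fixes C and n_estimators by hand, while the assignment asks for an inner 5-fold cross-validation to choose them. A minimal sketch using GridSearchCV over the candidate values from the step list, continuing from the variables defined above (it would be run inside each outer fold; gamma is left at its default here, which is an assumption):

from sklearn.model_selection import GridSearchCV

#Inner 5-fold CV to pick C for the RBF-kernel SVC
svc_search = GridSearchCV(SVC(kernel='rbf'),
                          param_grid={'C': [1e-02, 1e-01, 1e00, 1e01, 1e02]},
                          cv=5)
svc_search.fit(X_train, y_train)
print('Best C:', svc_search.best_params_['C'])

#Inner 5-fold CV to pick n_estimators for the random forest
rf_search = GridSearchCV(RandomForestClassifier(),
                         param_grid={'n_estimators': [10, 100, 1000]},
                         cv=5)
rf_search.fit(X_train, y_train)
print('Best n_estimators:', rf_search.best_params_['n_estimators'])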

Methodology and Results:

1. Load and generate datasets
2. Split them to perform cross-validation
3. Apply learning algorithms
4. Evaluate the performance of these algorithms

GaussianNB Evaluation:
Accuracy 0.95
F1-score 0.9473684210526316
AUC ROC 0.9493797519007602
SVC Evaluation:
Accuracy 0.95
F1-score 0.9278350515463918
AUC ROC 0.9297719087635054
RandomForestClassifier Evaluation:
Accuracy 0.95
F1-score 0.9896907216494846
AUC ROC 0.9897959183673469
In the run above, the Random Forest classifier has the best performance.
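One caveat on the AUC values above: roc_auc_score was given hard 0/1 predictions, so each reported AUC reflects a single operating point. A minimal sketch of computing AUC from continuous scores instead, continuing from the script's variables (predict_proba for GaussianNB and the random forest; decision_function for the SVC, since SVC exposes predict_proba only with probability=True):

#AUC from continuous scores rather than hard predictions
nb = GaussianNB().fit(X_train, y_train)
auc_nb = metrics.roc_auc_score(y_test, nb.predict_proba(X_test)[:, 1])

svc = SVC(C=1e-01, kernel='rbf', gamma=0.1).fit(X_train, y_train)
auc_svc = metrics.roc_auc_score(y_test, svc.decision_function(X_test))

rf = RandomForestClassifier(n_estimators=10).fit(X_train, y_train)
auc_rf = metrics.roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])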