scikit-learn
In the second ML assignment you have to compare the performance of three different classification algorithms: Naive Bayes, SVM, and Random Forest.
For this assignment you need to generate a random binary classification problem, then train and test the three algorithms using 10-fold cross-validation. For some algorithms an inner cross-validation (5-fold) is needed to choose the parameters. Then show the classification performance (per-fold and averaged) in the report, and briefly discuss the results.
import numpy as np
from sklearn import metrics
from sklearn import datasets
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
# Generate a random binary classification problem
X, y = datasets.make_classification(n_samples=2000, n_features=10)
clfs = [GaussianNB(),
        SVC(C=0.1, kernel='rbf', gamma=0.1),
        RandomForestClassifier(n_estimators=100)]
scoring = ['f1_micro', 'f1_macro']
for clf in clfs:
    # 10-fold cross-validation, reporting micro- and macro-averaged F1
    scores = cross_validate(clf, X, y, scoring=scoring, cv=10)
    print('###################################')
    print(str(clf))
    print()
    print('micro: ')
    print(scores['test_f1_micro'])
    print('macro: ')
    print(scores['test_f1_macro'])
    print('ave: ', np.mean(scores['test_f1_micro']), np.mean(scores['test_f1_macro']))
Output:
###################################
GaussianNB(priors=None)
micro:
[0.89552239 0.86567164 0.87064677 0.85572139 0.795 0.87
0.85929648 0.88944724 0.88944724 0.87939698]
macro:
[0.89551204 0.86561842 0.87025819 0.85570711 0.79499487 0.86994798
0.8592076 0.88944444 0.8893774 0.8793208 ]
ave: 0.8670150128753219 0.8669388865273854
###################################
SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
micro:
[0.94029851 0.90049751 0.89552239 0.90547264 0.825 0.91
0.86934673 0.90452261 0.90954774 0.93467337]
macro:
[0.94022601 0.90037669 0.89535662 0.90543519 0.82446902 0.90977444
0.86918487 0.90448402 0.90943568 0.93461393]
ave: 0.8994881497037426 0.8993356456286277
###################################
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
micro:
[0.94527363 0.95024876 0.96517413 0.92537313 0.925 0.935
0.88944724 0.91959799 0.95979899 0.93467337]
macro:
[0.94525195 0.95024752 0.96517413 0.92530658 0.92499812 0.93498537
0.88942211 0.91959596 0.95979798 0.93461393]
ave: 0.9349587239680991 0.9349393649549326
Judging by the micro and macro F1 scores, Random Forest performs best here (about 0.935 on both averages), followed by SVM and then Naive Bayes.
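The script above fixes the SVM parameters rather than selecting them with the inner 5-fold cross-validation the assignment asks for. One way to add that step is to wrap the classifier in GridSearchCV and pass the wrapped estimator to cross_validate, so each outer fold tunes the parameters on its own training portion. This is a minimal sketch; the parameter grid below is an illustrative choice, not prescribed by the assignment.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Hypothetical search grid; adjust the candidate values as needed.
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1]}

# Inner 5-fold CV picks C and gamma for each outer training split;
# the outer 10-fold CV then gives an unbiased performance estimate.
inner = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5, scoring='f1_macro')
scores = cross_validate(inner, X, y, scoring=['f1_micro', 'f1_macro'], cv=10)

print('micro: ', scores['test_f1_micro'])
print('macro: ', scores['test_f1_macro'])
print('ave: ', np.mean(scores['test_f1_micro']), np.mean(scores['test_f1_macro']))
```

The same wrapping works for RandomForestClassifier (e.g. searching over n_estimators or max_depth); GaussianNB has no parameters worth tuning here, so it can stay as a plain estimator.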