Week 15 Python Assignment: sklearn Exercises

qq_798779022zzc

Article link: https://blog.csdn.net/qq_798779022zzc/article/details/80719295

Assignment

In the second ML assignment you have to compare the performance of three different classification algorithms, namely Naive Bayes, SVM, and Random Forest. For this assignment you need to generate a random binary classification problem, and then train and test (using 10-fold cross validation) the three algorithms. For some algorithms inner cross validation (5-fold) for choosing the parameters is needed. Then, show the classification performance (per-fold and averaged) in the report, and briefly discuss the results.

Steps

Create a classification dataset (n_samples = 1000, n_features = 10)

Split the dataset using 10-fold cross validation

Train the algorithms

GaussianNB

SVC (possible C values [1e-02, 1e-01, 1e00, 1e01, 1e02], RBF kernel)

RandomForestClassifier (possible n_estimators values [10, 100, 1000])

Evaluate the cross-validated performance

Accuracy

F1-score

AUC ROC
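The inner 5-fold cross validation mentioned in the assignment can be sketched with scikit-learn's `GridSearchCV`. This is a minimal illustration, not the author's solution: it handles only the first outer fold and only the SVC, and the `random_state` values and variable names are assumptions added for reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

# Synthetic binary problem with the size given in the assignment
# (the seed is an assumption, not part of the assignment).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Candidate C values taken from the assignment text.
param_grid = {"C": [1e-2, 1e-1, 1e0, 1e1, 1e2]}

outer = KFold(n_splits=10, shuffle=True, random_state=0)
train_idx, test_idx = next(outer.split(X))  # first outer fold only, for brevity

# The inner 5-fold search sees only the outer training part,
# so the outer test fold never influences the choice of C.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X[train_idx], y[train_idx])

best_C = search.best_params_["C"]
test_accuracy = search.score(X[test_idx], y[test_idx])
print("best C:", best_C)
print("outer-fold accuracy:", test_accuracy)
```

Repeating this for every outer fold, and using `{"n_estimators": [10, 100, 1000]}` with `RandomForestClassifier`, would cover the remaining tuned model; GaussianNB has no parameter listed for tuning in the assignment.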

Code:

from sklearn import datasets
from sklearn import model_selection
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

# Generate a binary classification problem (1000 samples, 10 features)
X, y = datasets.make_classification(n_samples=1000, n_features=10,
    n_informative=2, n_redundant=2, n_repeated=0, n_classes=2)

# Split using 10-fold cross validation
# (sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection)
kf = model_selection.KFold(n_splits=10, shuffle=True)
for train_index, test_index in kf.split(X):
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test = X[test_index], y[test_index]
# Note: only the last fold's split survives the loop, so the models below
# are trained and evaluated on that single fold.

# Gaussian Naive Bayes
clf = GaussianNB()
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print("GaussianNB:")
print("pred: \n", pred)
print("y_test: \n", y_test)
# Evaluate the cross-validated performance
print("Evaluate the cross-validated performance:")
acc = metrics.accuracy_score(y_test, pred)
print("Accuracy: ", acc)
f1 = metrics.f1_score(y_test, pred)
print("F1-score: ",f1)
auc = metrics.roc_auc_score(y_test, pred)
print("AUC ROC: ", auc)
print("\n")

# SVC
clf = SVC(C=1e-01, kernel='rbf', gamma=0.1)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print("SVC: ")
print("pred: \n", pred)
print("y_test: \n", y_test)
# Evaluate the cross-validated performance
print("Evaluate the cross-validated performance:")
acc = metrics.accuracy_score(y_test, pred)
print("Accuracy: ", acc)
f1 = metrics.f1_score(y_test, pred)
print("F1-score: ",f1)
auc = metrics.roc_auc_score(y_test, pred)
print("AUC ROC: ", auc)
print("\n")

# Random Forest
clf = RandomForestClassifier(n_estimators=6)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print("RandomForestClassifier: ")
print("pred: \n", pred)
print("y_test: \n", y_test)
# Evaluate the cross-validated performance
print("Evaluate the cross-validated performance:")
acc = metrics.accuracy_score(y_test, pred)
print("Accuracy: ", acc)
f1 = metrics.f1_score(y_test, pred)
print("F1-score: ",f1)
auc = metrics.roc_auc_score(y_test, pred)
print("AUC ROC: ", auc)
print("\n")
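The script above scores each model on a single train/test split, while the assignment asks for per-fold and averaged results. A minimal sketch of one way to get them follows. The helper `evaluate_per_fold` and the fixed seeds are hypothetical names and choices introduced here for illustration, and only GaussianNB is run to keep the example short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB


def evaluate_per_fold(make_clf, X, y, n_splits=10, seed=0):
    """Fit a fresh classifier on each fold; return one metrics row per fold."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rows = []
    for train_idx, test_idx in kf.split(X):
        clf = make_clf()
        clf.fit(X[train_idx], y[train_idx])
        pred = clf.predict(X[test_idx])
        # AUC is computed from class-1 probabilities rather than hard labels.
        proba = clf.predict_proba(X[test_idx])[:, 1]
        rows.append((accuracy_score(y[test_idx], pred),
                     f1_score(y[test_idx], pred),
                     roc_auc_score(y[test_idx], proba)))
    return np.array(rows)


X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           random_state=0)
scores = evaluate_per_fold(GaussianNB, X, y)
for i, (acc, f1, auc) in enumerate(scores, start=1):
    print(f"fold {i:2d}: accuracy={acc:.3f}  f1={f1:.3f}  auc={auc:.3f}")
print("mean   : accuracy={:.3f}  f1={:.3f}  auc={:.3f}".format(*scores.mean(axis=0)))
```

Other models can be passed as factories, for example `lambda: RandomForestClassifier(n_estimators=100)`. An SVC needs `probability=True` before `predict_proba` is available.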

Output:

GaussianNB:
pred: 
 [0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 1 0 1 0 1 1 1 0 0
 1 1 1 1 0 1 1 0 0 1 1 0 1 1 0 0 0 1 0 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1 0 0 0
 1 1 1 0 0 1 1 0 1 1 0 1 0 1 1 0 1 1 0 1 1 0 0 0 0 0]
y_test: 
 [0 0 1 1 1 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 1 1 1 1 0 0 0 1 1 1 0 0
 0 1 1 1 0 1 1 0 0 1 1 0 1 1 0 0 0 1 0 1 0 0 0 1 1 1 0 1 0 0 1 1 1 1 0 0 1
 1 1 1 0 0 1 1 0 1 1 0 1 1 0 1 1 1 1 0 1 1 0 0 1 0 0]
Evaluate the cross-validated performance:
Accuracy:  0.88
F1-score:  0.8799999999999999
AUC ROC:  0.8814102564102564


SVC: 
pred: 
 [0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1 1 1 0 0
 0 1 1 1 0 1 1 0 0 1 1 0 1 1 0 0 0 1 0 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1 0 0 0
 1 1 1 0 0 1 1 0 1 1 0 1 0 1 1 0 1 1 0 1 1 0 0 0 0 0]
y_test: 
 [0 0 1 1 1 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 1 1 1 1 0 0 0 1 1 1 0 0
 0 1 1 1 0 1 1 0 0 1 1 0 1 1 0 0 0 1 0 1 0 0 0 1 1 1 0 1 0 0 1 1 1 1 0 0 1
 1 1 1 0 0 1 1 0 1 1 0 1 1 0 1 1 1 1 0 1 1 0 0 1 0 0]
Evaluate the cross-validated performance:
Accuracy:  0.9
F1-score:  0.8979591836734695
AUC ROC:  0.9022435897435899


RandomForestClassifier: 
pred: 
 [0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1 1 1 0 0
 0 1 1 1 0 1 1 0 0 1 1 0 1 1 0 0 0 1 0 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1 0 0 1
 1 1 1 0 0 1 1 0 1 1 0 1 0 0 1 1 1 1 0 1 1 0 0 1 0 0]
y_test: 
 [0 0 1 1 1 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 1 1 1 1 0 0 0 1 1 1 0 0
 0 1 1 1 0 1 1 0 0 1 1 0 1 1 0 0 0 1 0 1 0 0 0 1 1 1 0 1 0 0 1 1 1 1 0 0 1
 1 1 1 0 0 1 1 0 1 1 0 1 1 0 1 1 1 1 0 1 1 0 0 1 0 0]
Evaluate the cross-validated performance:
Accuracy:  0.94
F1-score:  0.9400000000000001
AUC ROC:  0.9415064102564101

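The three sets of results show that, on this dataset, RandomForestClassifier outperforms the other two algorithms on Accuracy, F1-score, and AUC ROC. Note that these numbers come from a single test fold of 100 samples, so the ranking may not hold on other folds or other random datasets.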
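The AUC ROC values in the output are computed from hard 0/1 predictions, not from scores or probabilities. In that case the ROC curve has a single interior point, and the area reduces to the mean of the true positive rate and the true negative rate, which is why the AUC figures sit so close to the accuracy. A small worked example in plain Python illustrates this; the labels below are made up for illustration and are not taken from the run above.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def hard_label_metrics(y_true, y_pred):
    """Accuracy, F1, and the ROC AUC obtained from hard predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * tp / (2 * tp + fp + fn)
    tpr = tp / (tp + fn)  # true positive rate (recall)
    tnr = tn / (tn + fp)  # true negative rate (specificity)
    auc_hard = (tpr + tnr) / 2  # area under the one-point ROC curve
    return accuracy, f1, auc_hard


# Toy labels, made up for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

accuracy, f1, auc_hard = hard_label_metrics(y_true, y_pred)
print("Accuracy:", accuracy)
print("F1-score:", f1)
print("AUC ROC (hard labels):", auc_hard)
```

To obtain a threshold-independent AUC, `roc_auc_score` can be given continuous scores such as `predict_proba(X)[:, 1]` or `decision_function(X)` in place of the predicted labels.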