Advanced Programming Techniques: sklearn Homework

Assignment requirements:

1. Create a classification dataset (n_samples ≥ 1000, n_features ≥ 10)

2. Split the dataset using 10-fold cross validation

3. Train the algorithms:

GaussianNB

SVC (possible C values [1e-02, 1e-01, 1e00, 1e01, 1e02], RBF kernel)

RandomForestClassifier (possible n_estimators values [10, 100, 1000])

4. Evaluate the cross-validated performance: Accuracy, F1-score and AUC ROC

5. Write a short report summarizing the methodology and the results
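As a rough illustration of how steps 1, 2 and 4 fit together, the sketch below runs GaussianNB through scikit-learn's `cross_validate` with all three metrics at once. The `random_state` values are assumptions added only so the run is repeatable; they are not part of the assignment. The `"roc_auc"` scorer ranks samples by predicted probability (or decision value) rather than by hard class labels.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_validate
from sklearn.naive_bayes import GaussianNB

# Step 1: synthetic binary classification data (seed is an assumption).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Step 2: 10-fold split with shuffling.
cv = KFold(n_splits=10, shuffle=True, random_state=0)

# Steps 3-4: fit on each training fold and score on the held-out fold.
scores = cross_validate(GaussianNB(), X, y, cv=cv,
                        scoring=["accuracy", "f1", "roc_auc"])

# One score per fold and metric; report mean and spread across the folds.
for metric in ("accuracy", "f1", "roc_auc"):
    fold_scores = scores["test_" + metric]
    print("%s: mean=%.3f std=%.3f" % (metric, fold_scores.mean(), fold_scores.std()))
```

Swapping `GaussianNB()` for another estimator leaves the rest of the sketch unchanged.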

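Create a classification dataset with sklearn, train three different machine learning methods on it, and compute the accuracy, F1 score, and area under the ROC curve (AUC) for each of the three methods.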

The code is as follows:

from sklearn import datasets
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Generate a binary classification dataset: 1000 samples, 10 features.
X, y = datasets.make_classification(n_samples=1000, n_features=10, n_informative=2, n_redundant=2, n_repeated=0, n_classes=2)

# 10-fold split. sklearn.cross_validation was removed in scikit-learn 0.20;
# model_selection.KFold takes n_splits and yields index pairs from split().
kf = KFold(n_splits=10, shuffle=True)
for train_index, test_index in kf.split(X):
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test = X[test_index], y[test_index]

# Note: the loop above only assigns the split, so everything below trains
# and scores on the last of the 10 folds instead of averaging over all folds.

def evaluate(name, clf):
    """Fit clf on the training fold and print accuracy, F1 and ROC AUC."""
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    print(name)
    print('acc: ' + str(metrics.accuracy_score(y_test, pred)))
    print('f1: ' + str(metrics.f1_score(y_test, pred)))
    # AUC is computed from the hard 0/1 predictions here, not from scores.
    print('auc: ' + str(metrics.roc_auc_score(y_test, pred)))

evaluate("Gaussian NB", GaussianNB())
evaluate("\nSVC", SVC(C=1e-01, kernel='rbf', gamma=0.1))
evaluate("\nRandom Forest", RandomForestClassifier(n_estimators=6))

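The final output is as follows (the dataset is generated randomly without a fixed seed, so the exact numbers differ between runs):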

Gaussian NB
acc: 0.89
f1: 0.9059829059829059
auc: 0.8881769326167839

SVC
acc: 0.92
f1: 0.9333333333333333
auc: 0.9136006614303431

Random Forest
acc: 0.93
f1: 0.9391304347826087
auc: 0.9332368747416288
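The script above fixes C = 0.1 for SVC and n_estimators = 6 for the random forest, and scores each model on a single train/test split. One possible way to cover the candidate values listed in the assignment is `GridSearchCV` with a 10-fold splitter and several scorers, as sketched below for SVC. The helper name `search` and the `random_state` values are assumptions made for illustration, and picking the best candidate by accuracy is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

# Seeds are assumptions, added only so the sketch is repeatable.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           random_state=0)
cv = KFold(n_splits=10, shuffle=True, random_state=0)

def search(estimator, param_grid):
    """Hypothetical helper: score every candidate with 10-fold CV."""
    grid = GridSearchCV(estimator, param_grid, cv=cv,
                        scoring=["accuracy", "f1", "roc_auc"],
                        refit="accuracy")  # refit the best-accuracy candidate
    grid.fit(X, y)
    return grid

svc_search = search(SVC(kernel="rbf"), {"C": [1e-2, 1e-1, 1e0, 1e1, 1e2]})

# cv_results_ holds the mean score over the 10 folds for each candidate.
res = svc_search.cv_results_
for params, acc, f1, auc in zip(res["params"], res["mean_test_accuracy"],
                                res["mean_test_f1"], res["mean_test_roc_auc"]):
    print(params, "acc=%.3f f1=%.3f auc=%.3f" % (acc, f1, auc))
print("best C by accuracy:", svc_search.best_params_["C"])
```

The random forest can be searched the same way by passing `RandomForestClassifier()` with `{"n_estimators": [10, 100, 1000]}`, although the 1000-tree candidate takes noticeably longer to fit. Because the same folds are used both to choose the parameter and to report the score, the best candidate's score is likely to be somewhat optimistic.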
