Assignment
In the second ML assignment you have to compare the performance of
three different classification algorithms, namely Naive Bayes, SVM, and
Random Forest.
For this assignment you need to generate a random binary classification
problem, and then train and test (using 10-fold cross validation) the three
algorithms. For some algorithms, inner cross validation (5-fold) is needed
to choose the parameters. Then, show the classification performance
(per-fold and averaged) in the report, and briefly discuss the results.
Note
The report also has to contain a short description of the methodology used
to obtain the results.
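The inner 5-fold cross validation for parameter selection can be sketched as follows (one possible approach using scikit-learn's GridSearchCV; the dataset size and random_state here are illustrative, not prescribed by the assignment):

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative data; in the assignment this would be the training
# portion of one outer 10-fold split
X, y = datasets.make_classification(n_samples=200, n_features=10,
                                    n_classes=2, random_state=0)

# Candidate C values from the assignment; cv=5 runs the inner
# 5-fold cross validation over them
param_grid = {'C': [1e-02, 1e-01, 1e00, 1e01, 1e02]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)
best_clf = search.best_estimator_  # refit on all of X with the best C
print("best C:", search.best_params_['C'])
```

The same pattern applies to Random Forest by putting `n_estimators` in the parameter grid instead of `C`.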
Dragone, Passerini (DISI) Scikit-Learn Machine Learning 20 / 22
Steps
1. Create a classification dataset (n_samples ≥ 1000, n_features ≥ 10)
2. Split the dataset using 10-fold cross validation
3. Train the algorithms:
   GaussianNB
   SVC (possible C values [1e-02, 1e-01, 1e00, 1e01, 1e02], RBF kernel)
   RandomForestClassifier (possible n_estimators values [10, 100, 1000])
4. Evaluate the cross-validated performance:
   Accuracy
   F1-score
   AUC ROC
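The three metrics in step 4 can be computed on a small toy example (hypothetical labels, chosen here only to illustrate the scikit-learn metric functions):

```python
from sklearn import metrics

# Hypothetical true labels and predictions for one fold
y_true = [0, 1, 1, 0, 1]
pred   = [0, 1, 0, 0, 1]

acc = metrics.accuracy_score(y_true, pred)   # 4 of 5 correct -> 0.8
f1  = metrics.f1_score(y_true, pred)         # precision 1.0, recall 2/3 -> 0.8
auc = metrics.roc_auc_score(y_true, pred)
print(acc, f1, auc)
```

Note that with hard 0/1 predictions the ROC AUC degenerates to (sensitivity + specificity) / 2; passing probability scores (e.g. from `predict_proba`) gives a more informative AUC.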
5. Write a short report summarizing the methodology and the results
from sklearn import datasets
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

# Create a random binary classification problem
X, y = datasets.make_classification(n_samples=2000, n_features=10,
                                    n_informative=2, n_redundant=2,
                                    n_repeated=0, n_classes=2)

def evaluate(y_test, pred):
    # Evaluate the cross-validated performance on one fold
    print("Accuracy: ", metrics.accuracy_score(y_test, pred))
    print("F1-score: ", metrics.f1_score(y_test, pred))
    print("AUC ROC: ", metrics.roc_auc_score(y_test, pred))
    print()

# Split using 10-fold cross validation
kf = KFold(n_splits=10, shuffle=True)
for train_index, test_index in kf.split(X):
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test = X[test_index], y[test_index]

    # SVC with RBF kernel (C would normally be selected from
    # [1e-02, 1e-01, 1e00, 1e01, 1e02] via inner 5-fold CV)
    clf = SVC(C=1e-01, kernel='rbf', gamma=0.1)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    print("SVC:")
    evaluate(y_test, pred)

    # Gaussian Naive Bayes
    clf = GaussianNB()
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    print("GaussianNB:")
    evaluate(y_test, pred)

    # Random Forest (n_estimators would normally be selected from
    # [10, 100, 1000] via inner 5-fold CV)
    clf = RandomForestClassifier(n_estimators=10)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    print("RandomForestClassifier:")
    evaluate(y_test, pred)