1 Task
Tune seven models with grid search, using five-fold cross-validation during tuning, and then evaluate each model.
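The task above can be sketched as follows. This is a minimal sketch, not the full experiment: the data comes from `make_classification` as a stand-in for the real dataset (an assumption, since the dataset is not shown in this section), and the grid is a reduced, hypothetical version of the Logistic Regression grid.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption: the real dataset is not shown here).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Reduced, illustrative grid: 2 solvers x 2 max_iter values = 4 candidates.
param_grid = [{'solver': ['lbfgs', 'liblinear'], 'max_iter': [100, 200]}]

# cv=5 gives five-fold cross-validation; candidates are ranked by ROC AUC.
search = GridSearchCV(LogisticRegression(), param_grid,
                      scoring='roc_auc', cv=5)
search.fit(x_train, y_train)

# best_score_ is the mean AUC over the validation folds; the held-out
# test AUC is computed separately and may differ from it.
test_auc = roc_auc_score(y_test, search.decision_function(x_test))
print(search.best_params_, search.best_score_, test_auc)
```

Comparing `best_score_` with `test_auc` shows the difference between the score used for selection and the score on data the search never saw.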
2 Model performance before and after tuning
Model | roc_auc_score with default parameters | roc_auc_score after tuning | Parameter grid searched |
---|---|---|---|
Logistic Regression | 0.766 | 0.767 | [{'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'], 'max_iter': [10, 50, 100, 150]}] |
Decision tree | 0.594 | 0.656 | [{'max_depth': range(6, 10)}] |
SVC | 0.753 | 0.776 | [{'kernel': ['linear', 'poly', 'rbf', 'sigmoid']}] |
Random forest | 0.720 | 0.749 | {'n_estimators': range(100, 105)} |
GBDT | 0.764 | | |
xgboost | 0.771 | | |
LightGBM | 0.761 | | |
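To get a feel for the cost of each search in the table, the sketch below counts how many model fits each grid implies under five-fold cross-validation. It uses only the standard library; the helper `n_candidates` is a hypothetical name, and the Logistic Regression grid is written in the form it appears to be intended to have.

```python
from itertools import product

def n_candidates(param_grid):
    """Count the parameter combinations in a dict or a list of dicts."""
    grids = param_grid if isinstance(param_grid, list) else [param_grid]
    return sum(len(list(product(*g.values()))) for g in grids)

# Grids copied from the table above.
grids = {
    'Logistic Regression': [{'solver': ['newton-cg', 'lbfgs', 'liblinear',
                                        'sag', 'saga'],
                             'max_iter': [10, 50, 100, 150]}],
    'Decision tree': [{'max_depth': range(6, 10)}],
    'SVC': [{'kernel': ['linear', 'poly', 'rbf', 'sigmoid']}],
    'Random forest': {'n_estimators': range(100, 105)},
}

N_FOLDS = 5  # five-fold cross-validation
# Each candidate is fitted once per fold (the final refit is not counted).
fits = {name: n_candidates(g) * N_FOLDS for name, g in grids.items()}
for name, count in fits.items():
    print(name, count)
```

The Logistic Regression grid is the largest of the four, with 20 combinations and therefore 100 cross-validation fits.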
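3 Issues
- Grid search selects parameters by their performance on the validation folds, so the chosen parameters do not necessarily improve performance on the test set.
- Applying `StandardScaler()` to the training set after the split, rather than to the whole dataset before the split, gave better classifier results.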
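The scaling point can be illustrated with a small standard-library sketch. It mimics what a standard scaler does (subtract the mean, divide by the population standard deviation) on a single toy feature column; the helper names `fit_scaler` and `transform` and the numbers are made up for illustration and are not the code used in the experiment.

```python
from statistics import mean, pstdev

def fit_scaler(column):
    """Estimate the mean and population standard deviation of a column."""
    return mean(column), pstdev(column)

def transform(column, mu, sigma):
    """Standardize a column with previously estimated statistics."""
    return [(v - mu) / sigma for v in column]

# Toy feature values, already split into training and test parts.
train = [1.0, 2.0, 3.0, 4.0]
test = [5.0, 6.0]

# Statistics come from the training part only...
mu, sigma = fit_scaler(train)
train_scaled = transform(train, mu, sigma)
# ...and are reused unchanged on the test part, so no test information
# leaks into the preprocessing step.
test_scaled = transform(test, mu, sigma)
print(train_scaled, test_scaled)
```

Scaling before the split would instead compute `mu` and `sigma` from all six values, letting the test rows influence how the training rows are transformed.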
4 Full code with comments
# -*- coding: utf-8 -*-
from __future__ import print_function
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import time
# Import the evaluation functions used below
from sklearn.metrics import roc_curve, roc_auc_score, make_scorer
# Import the classification algorithms used below
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.preprocessing import scale
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
def cal_roc_auc(model, x_test, y_test):
    test_predict = model.predict(x_test)
    if hasattr(model, 'decision_function'):