1. GridSearchCV
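Watch out for a pitfall here: when cv is an integer and the estimator is a classifier, the samples are split with StratifiedKFold, not KFold.
A friend of mine wrote a sample generator to work around this: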
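import xgboost as xgb
from sklearn.model_selection import KFold

# Build explicit (train_index, test_index) pairs with plain, unstratified KFold.
myCV = []
for train_index, test_index in KFold(5, shuffle=True).split(train[train['installment'] == 1]):
    myCV.append((train_index, test_index))
# xgb_param, xgtrain, xgb1 and the custom metric KS are defined elsewhere.
cvresult = xgb.cv(xgb_param, xgtrain, num_boost_round=xgb1.get_params()['n_estimators'], folds=myCV, feval=KS, early_stopping_rounds=50, show_stdv=False)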
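Strictly speaking, myCV is a list of (train_index, test_index) pairs rather than a generator.
Then just set folds to myCV in the arguments. The indices are positions within the filtered frame train[train['installment']==1], so xgtrain presumably has to be built from those same rows.
This is excerpted from her chat messages. I have not verified it myself yet.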
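The pitfall can also be checked, and sidestepped, directly in scikit-learn. The sketch below is only an illustration on synthetic data: the dataset, LogisticRegression and the parameter grid are assumptions and not part of the original setup. It uses check_cv to show what an integer cv resolves to for a classifier, then passes an explicit KFold object as cv.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, StratifiedKFold, check_cv

# Synthetic, imbalanced binary data (illustrative only).
X, y = make_classification(n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=0)

# An integer cv with a classifier and binary/multiclass y resolves to StratifiedKFold.
default_cv = check_cv(5, y, classifier=True)
print(type(default_cv).__name__)

# Passing a splitter object overrides that default with plain KFold.
plain_kfold = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},  # hypothetical grid
    cv=plain_kfold,
)
search.fit(X, y)
print(search.best_params_)
```

A list of (train_index, test_index) pairs like myCV should also be accepted as cv, since scikit-learn allows any iterable of such splits.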
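In addition, the scoring argument of GridSearchCV accepts a custom callable with the signature (estimator, X, y). If the goal is to maximize KS, the KS functions can be written like this: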
import numpy as np
from scipy.stats import ks_2samp

# KS statistic between the scores of positives (label 1) and all other samples.
get_ks = lambda y_pred, y_true: ks_2samp(y_pred[np.asarray(y_true) == 1], y_pred[np.asarray(y_true) != 1]).statistic
# Scorer with the (estimator, X, y) signature. Column 0 of predict_proba is P(class 0);
# for a binary classifier the KS value is the same as with column 1, since the two columns sum to 1.
get_ks_for_grid = lambda estimator, X, y: get_ks(estimator.predict_proba(X)[:, 0], y)
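As a hedged usage sketch, a callable with the (estimator, X, y) signature can be handed to GridSearchCV through scoring. Everything besides ks_2samp and the scorer idea is assumed for illustration: the synthetic data, LogisticRegression, the grid, and the name ks_scorer. This version reads column 1 of predict_proba, the probability of the positive class.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


def ks_scorer(estimator, X, y):
    """KS statistic between predicted scores of positives and negatives (hypothetical helper)."""
    y = np.asarray(y)
    scores = estimator.predict_proba(X)[:, 1]  # P(class 1)
    return ks_2samp(scores[y == 1], scores[y != 1]).statistic


# Synthetic binary data (illustrative only).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 1.0, 100.0]},  # hypothetical grid
    scoring=ks_scorer,  # higher is better, so the search maximizes KS
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Because GridSearchCV always treats a larger score as better, the KS statistic can be returned as is, with no sign flip.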
2. RandomizedSearchCV
TBC.