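Background:
The goal is to bucket the training-set predicted probabilities y_train_predict_proba into ranges, e.g. 0.44879 falls into [0.4,0.5), and so on. The idea was to add a column predict_proba_range and fill it using if/else, but this turned out to be harder than expected. After consulting various references, I wrote the code below.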
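Key points:
① Create an empty list to hold the values of the new column predict_proba_range.
② Convert the Series to a list with .tolist() before looping (a Series can also be iterated directly, so this step is optional).
③ Use append to add one label per row.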
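The three points can be tried on toy data first. This is only a minimal sketch: the column name `proba`, the sample values, and the helper `to_range` are made up for illustration, and just two buckets are used to keep it short.

```python
import pandas as pd

# Toy data; the values and the column name are illustrative only.
df = pd.DataFrame({'proba': [0.44879, 0.91, 0.05]})


def to_range(p):
    """Hypothetical helper: map one probability to a coarse label."""
    if p >= 0.5:
        return '>=0.5'
    return '[0,0.5)'


labels = []                        # (1) empty list for the new column
for p in df['proba'].tolist():     # (2) Series -> plain Python list
    labels.append(to_range(p))     # (3) one label appended per row

# The list has one entry per row, so it can be assigned as a column.
df['proba_range'] = labels
print(df)
```

Because exactly one label is appended per row, the list length always matches the DataFrame length, which is what the column assignment requires.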
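import pandas as pd

# x_train, x_test, y_train, y_test, y_train_pred, y_test_pred and
# y_train_predict_proba are expected to exist already.
df_train = pd.DataFrame(x_train)
df_test = pd.DataFrame(x_test)
df_train['y_train'] = y_train
df_test['y_test'] = y_test
df_train['y_train_pred'] = y_train_pred
df_test['y_test_pred'] = y_test_pred
# One probability per training row.
df_train['y_train_predict_proba'] = y_train_predict_proba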
# Collect exactly one range label per row.
predict_proba_range = []
# The loop variable is named proba so it does not overwrite y_train_predict_proba.
for proba in df_train['y_train_predict_proba'].tolist():
    # The branches are checked from the top down, so each elif
    # only needs the lower bound of its range.
    if proba >= 0.9:
        tmp = '>=0.9'
    elif proba >= 0.8:
        tmp = '[0.8,0.9)'
    elif proba >= 0.7:
        tmp = '[0.7,0.8)'
    elif proba >= 0.6:
        tmp = '[0.6,0.7)'
    elif proba >= 0.5:
        tmp = '[0.5,0.6)'
    elif proba >= 0.4:
        tmp = '[0.4,0.5)'
    elif proba >= 0.3:
        tmp = '[0.3,0.4)'
    elif proba >= 0.2:
        tmp = '[0.2,0.3)'
    elif proba >= 0.1:
        tmp = '[0.1,0.2)'
    elif proba >= 0:
        tmp = '[0,0.1)'
    else:
        # Negative or NaN values get no label; appending None keeps
        # the list the same length as the DataFrame.
        tmp = None
    predict_proba_range.append(tmp)
df_train['predict_proba_range'] = predict_proba_range
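The same bucketing can probably be done without an explicit loop by using pandas' `pd.cut`. The sketch below is an alternative technique, not the approach used above. The toy `df_train` and its values are made up for illustration, and the top bucket is labelled `'>=0.9'` to match the `>= 0.9` condition.

```python
import pandas as pd

# Toy stand-in for df_train; the values are illustrative only.
df_train = pd.DataFrame(
    {'y_train_predict_proba': [0.44879, 0.9, 0.95, 0.0, 0.1, 1.0]}
)

# Bin edges 0, 0.1, ..., 0.9, plus an open-ended top bin for values >= 0.9.
edges = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, float('inf')]
labels = [
    '[0,0.1)', '[0.1,0.2)', '[0.2,0.3)', '[0.3,0.4)', '[0.4,0.5)',
    '[0.5,0.6)', '[0.6,0.7)', '[0.7,0.8)', '[0.8,0.9)', '>=0.9',
]

# right=False makes every bin closed on the left and open on the right,
# matching the [a,b) ranges; values below 0 would become NaN.
df_train['predict_proba_range'] = pd.cut(
    df_train['y_train_predict_proba'],
    bins=edges,
    labels=labels,
    right=False,
)
print(df_train)
```

The resulting column has a categorical dtype. If plain strings are preferred, `.astype(str)` converts it.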