Task:
Train a model on the 800,000 labeled training records, then use it to predict the 200,000 unlabeled records.
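Evaluation metric:
The competition is scored by AUC, the area enclosed between the ROC curve and the coordinate axes (that is, the area under the ROC curve).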
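Common evaluation metrics for classification algorithms:
1. Confusion matrix:
True positive (TP): a positive sample predicted as positive
False negative (FN): a positive sample predicted as negative
False positive (FP): a negative sample predicted as positive
True negative (TN): a negative sample predicted as negative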
2. Accuracy:
(TP + TN) / (TP + TN + FP + FN)
3. Precision:
TP / (TP + FP)
4. Recall:
TP / (TP + FN)
…
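The definitions above can be checked numerically. The following is a minimal sketch, assuming scikit-learn is installed and using made-up toy labels (not competition data). It unpacks the four confusion-matrix counts and applies the formulas by hand.

```python
from sklearn.metrics import confusion_matrix

# Toy labels for illustration only (not taken from the competition data)
y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 1]

# For binary labels {0, 1}, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of everything predicted positive, how much is truly positive
recall = tp / (tp + fn)     # of everything truly positive, how much was found

print('TN, FP, FN, TP:', tn, fp, fn, tp)
print('accuracy:', accuracy)
print('precision:', precision)
print('recall:', recall)
```

Note that scikit-learn puts true labels on the rows and predicted labels on the columns, so the flattened order is TN, FP, FN, TP rather than the TP, FN, FP, TN order of the list above.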
Reading the data
import pandas as pd
# Load the training set and the test set (testA)
train = pd.read_csv('金融风控/train.csv')
testA = pd.read_csv('金融风控/testA.csv')
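Examples of computing classification metrics
import numpy as np
from sklearn.metrics import confusion_matrix
y_pred = [0, 1, 0, 1]
y_true = [0, 1, 1, 0]
# Rows are true labels, columns are predicted labels
print('Confusion matrix\n', confusion_matrix(y_true, y_pred))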
# Accuracy
from sklearn.metrics import accuracy_score
y_pred = [0, 1, 0, 1]
y_true = [0, 1, 1, 0]
print('ACC', accuracy_score(y_true, y_pred))
# Precision, recall, and F1 score
from sklearn import metrics
y_pred = [0, 1, 0, 1]
y_true = [0, 1, 1, 0]
print('Precision', metrics.precision_score(y_true, y_pred))
print('Recall', metrics.recall_score(y_true, y_pred))
print('F1 score', metrics.f1_score(y_true, y_pred))
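The F1 score printed above is the harmonic mean of precision and recall, F1 = 2 * P * R / (P + R). As a rough check, the sketch below recomputes it by hand on toy labels (made up for illustration) and compares it with `f1_score`.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Toy labels for illustration only
y_true = [0, 1, 1, 0, 1, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 1, 1, 0, 1, 1, 1]

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)

# Harmonic mean of precision and recall
f1_manual = 2 * p * r / (p + r)
f1_sklearn = f1_score(y_true, y_pred)

print('precision:', p, 'recall:', r)
print('F1 (manual):', f1_manual)
print('F1 (sklearn):', f1_sklearn)
```

Because it is a harmonic mean, F1 is pulled toward the smaller of the two values, so a model cannot score well by maximizing only precision or only recall.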
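# PR curve
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve as prc
y_pred = [0, 1, 1, 0, 1, 1, 0, 1, 1, 1]
y_true = [0, 1, 1, 0, 1, 0, 1, 1, 0, 1]
precision, recall, threshold = prc(y_true, y_pred)
# By convention, recall goes on the x-axis and precision on the y-axis
plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.show()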
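# AUC
import numpy as np
from sklearn.metrics import roc_auc_score
y_true = np.array([0, 0, 1, 1])
# AUC is computed from predicted scores or probabilities, not hard 0/1 labels
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
print('AUC score', roc_auc_score(y_true, y_scores))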
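To connect the AUC value to the ROC curve itself, here is a minimal sketch on the same four toy samples. It computes the area in two ways: by integrating the ROC points returned by `roc_curve`, and by counting how often a positive sample is scored above a negative one. The variable names `auc_trapezoid` and `auc_pairwise` are illustrative only.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc, roc_auc_score

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

# 1) Area under the ROC curve via the trapezoidal rule
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
auc_trapezoid = auc(fpr, tpr)

# 2) Ranking view: fraction of (positive, negative) pairs in which the
#    positive sample gets the higher score (ties count as half)
pos = y_scores[y_true == 1]
neg = y_scores[y_true == 0]
diff = pos[:, None] - neg[None, :]
auc_pairwise = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

print('AUC (trapezoid):', auc_trapezoid)
print('AUC (pairwise):', auc_pairwise)
print('AUC (roc_auc_score):', roc_auc_score(y_true, y_scores))
```

All three values agree here. The pairwise view also shows why AUC depends only on the ordering of the scores and not on any particular classification threshold.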