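#1 ROC evaluation: ROC curve, AUC value, and accuracy
auc takes the false positive rate and the true positive rate as its arguments, so it is usually used together with metrics.roc_curve. The following snippet comes from the official scikit-learn documentation: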
>>> fpr, tpr, thresholds = metrics.roc_curve(y, pred, pos_label=2)
>>> metrics.auc(fpr, tpr)
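To make the link between roc_curve and auc concrete, here is a minimal pure-Python sketch of what the two calls compute: sweep the decision threshold from high to low, record one (FPR, TPR) point per distinct score, then integrate with the trapezoidal rule. The helper names roc_points and trapezoid_auc are hypothetical, and the toy labels and scores are assumed for illustration. This is not scikit-learn's implementation, which may also drop some intermediate points.

```python
def roc_points(y_true, scores, pos_label=1):
    """Return (fpr, tpr) lists, sweeping the threshold from high to low."""
    n_pos = sum(1 for y in y_true if y == pos_label)
    n_neg = len(y_true) - n_pos
    # Indices sorted by descending score
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(order):
        current = scores[order[i]]
        # Samples with the same score cross the threshold together
        while i < len(order) and scores[order[i]] == current:
            if y_true[order[i]] == pos_label:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def trapezoid_auc(x, y):
    """Area under the piecewise-linear curve through the points (x[k], y[k])."""
    return sum((x[k + 1] - x[k]) * (y[k + 1] + y[k]) / 2 for k in range(len(x) - 1))


# Toy data (assumed for illustration): the positive class is labelled 2
y = [1, 1, 2, 2]
pred = [0.1, 0.4, 0.35, 0.8]

fpr, tpr = roc_points(y, pred, pos_label=2)
area = trapezoid_auc(fpr, tpr)
print(fpr, tpr, area)
```

The area equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score, which is why AUC depends only on the ranking of the scores and not on any single threshold.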
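import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc

# X_train, y_train, X_test and y_test are assumed to be defined earlier.
colors = ['r', 'g', 'b', 'y', 'k', 'c', 'm', 'brown', 'r']
lw = 1
Cs = [1e-6, 1e-4, 1e0]  # inverse regularization strengths to compare

plt.figure(figsize=(12, 8))
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve for different classifiers')
# Diagonal reference line: a random classifier (AUC = 0.5)
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')

labels = []
for idx, C in enumerate(Cs):
    clf = LogisticRegression(C=C)
    clf.fit(X_train, y_train)
    print("C: {}, parameters {} and intercept {}".format(C, clf.coef_, clf.intercept_))
    proba = clf.predict_proba(X_test)
    preds = proba[:, 1]  # predicted probability of the positive class
    print("clf.predict_proba(X_test=", proba)
    print("y_test=", y_test)
    fpr, tpr, _ = roc_curve(y_test, preds)
    # Accuracy at a 0.5 threshold (the printed label "准确率" means "accuracy")
    correct_prediction = np.equal(np.round(preds), y_test)
    print("准确率=", np.mean(correct_prediction))
    roc_auc = auc(fpr, tpr)
    plt.plot(fpr, tpr, lw=lw, color=colors[idx])
    labels.append("C: {}, AUC = {}".format(C, np.round(roc_auc, 4)))

# The first legend entry belongs to the diagonal, the rest to the ROC curves
plt.legend(['random AUC = 0.5'] + labels)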
#result
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:432: FutureWarning:
Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
C: 1e-06, parameters [[-0.00424654 -0.00232424 -0.00354647 -0.00199886 -0.00186031]] and intercept [-0.03324687]
clf.predict_proba(X_test= [[0.50885744 0.49114256]
[0.50879396 0.49120604]
[0.50869638 0.49130362]
...
[0.50899422 0.49100578]
[0.5086329 0.4913671 ]
[0.50862148 0.49137852]]
y_test= 8067 0
368101 0
70497 0
226567 1
73186 1
..
98574 0
334252 1
293289 0
167582 0
231389 0
Name: is_duplicate, Length: 133416, dtype: int64
准确率= 0.629422258199916
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:432: FutureWarning:
Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
C: 0.0001, parameters [[-0.16061857 -0.09048901 -0.13250978 -0.07998846 0.68641435]] and intercept [-0.70401556]
clf.predict_proba(X_test= [[0.53112308 0.46887692]
[0.58009823 0.41990177]
[0.65390497 0.34609503]
...
[0.52987886 0.47012114]
[0.64214599 0.35785401]
[0.6472506 0.3527494 ]]
y_test= 8067 0
368101 0
70497 0
226567 1
73186 1
..
98574 0
334252 1
293289 0
167582 0
231389 0
Name: is_duplicate, Length: 133416, dtype: int64
准确率= 0.629422258199916
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:432: FutureWarning:
Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
C: 1.0, parameters [[-10.25339741 -0.91965268 6.77946546 -7.16268424 3.29874476]] and intercept [-1.34016168]
clf.predict_proba(X_test= [[0.24412863 0.75587137]
[0.49234752 0.50765248]
[0.85791823 0.14208177]
...
[0.3597533 0.6402467 ]
[0.81310546 0.18689454]
[0.8053035 0.1946965 ]]
y_test= 8067 0
368101 0
70497 0
226567 1
73186 1
..
98574 0
334252 1
293289 0
167582 0
231389 0
Name: is_duplicate, Length: 133416, dtype: int64
准确率= 0.6547715416441806
Out[27]:
<matplotlib.legend.Legend at 0x2c7f8a48>
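The runs for C = 1e-06 and C = 0.0001 report exactly the same accuracy (0.629422258199916). One plausible explanation, judging only from the truncated output above, is that every positive-class probability stays below 0.5 for those values of C, so np.round(preds) predicts 0 for every sample and the accuracy collapses to the share of negatives in y_test. The sketch below illustrates that effect on made-up data; the function names and all numbers in it are hypothetical and are not taken from the original dataset.

```python
def accuracy_at_half(probs, labels):
    """Accuracy after thresholding the positive-class probability at 0.5."""
    preds = [1 if p > 0.5 else 0 for p in probs]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def pairwise_auc(probs, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels: 3 negatives, 2 positives
labels = [0, 0, 0, 1, 1]
# Both models keep every probability below 0.5
weak = [0.490, 0.491, 0.489, 0.492, 0.493]  # positives ranked above negatives
flat = [0.30, 0.45, 0.20, 0.25, 0.40]       # positives mixed with negatives

for name, probs in [("weak", weak), ("flat", flat)]:
    print(name, accuracy_at_half(probs, labels), pairwise_auc(probs, labels))
```

Both hypothetical models get the same accuracy (the share of negatives) while their AUC values differ, which is one reason to read accuracy together with the ROC curve when the classes are imbalanced.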
#2 Code for plotting the PR curve
# Evaluation with precision_recall_curve
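from sklearn.metrics import precision_recall_curve, auc

# cv is assumed to be a fitted search object (e.g. GridSearchCV) defined earlier.
pr, re, _ = precision_recall_curve(y_test, cv.best_estimator_.predict_proba(X_test)[:, 1])
plt.figure(figsize=(12, 8))
plt.plot(re, pr)
# auc(re, pr) is the trapezoidal area under the precision-recall curve
plt.title('PR Curve (AUC {})'.format(auc(re, pr)))
plt.xlabel('Recall')
plt.ylabel('Precision')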
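As a rough illustration of what precision_recall_curve and auc(re, pr) compute in the code above, the following pure-Python sketch evaluates precision and recall at every distinct score and then takes the trapezoidal area over recall. The helper name pr_points and the toy data are hypothetical; this is a simplified sketch, not scikit-learn's implementation.

```python
def pr_points(labels, scores):
    """Precision and recall when predicting positive for score >= threshold."""
    n_pos = sum(labels)
    thresholds = sorted(set(scores))
    precision, recall = [], []
    for t in thresholds:
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        precision.append(tp / (tp + fp))
        recall.append(tp / n_pos)
    # Closing point (precision 1, recall 0) so the curve reaches the y axis
    precision.append(1.0)
    recall.append(0.0)
    return precision, recall, thresholds


# Toy data (assumed for illustration): 1 marks the positive class
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

precision, recall, thresholds = pr_points(toy_labels, toy_scores)

# Recall decreases along the curve, so take the absolute value of the signed area
pr_area = abs(sum((recall[k + 1] - recall[k]) * (precision[k + 1] + precision[k]) / 2
                  for k in range(len(recall) - 1)))
print(precision, recall, thresholds, pr_area)
```

Note that the trapezoidal area interpolates linearly between points, so it can differ from the step-wise summary returned by sklearn.metrics.average_precision_score.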
#result