Some concepts
Scenario
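The linear combination of AdaBoost's base classifiers:
$$f(x)=\sum_{m=1}^{M}\alpha_m G_m(x)$$
The final classifier:
$$G(x)=\operatorname{sign}(f(x))=\operatorname{sign}\left(\sum_{m=1}^{M}\alpha_m G_m(x)\right)$$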
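Here $\{f(x_i)\mid i=1,2,\cdots,N\}$ and $\{\text{label}_i\mid i=1,2,\cdots,N\}$ are known. The former is the weighted combination of the base classifiers' outputs for each sample $x_i$; the latter is the corresponding label data.
Next, we draw the ROC curve from these two sets of data.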
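To make the two formulas concrete, here is a minimal sketch of how the scores f(x_i) and the final predictions G(x_i) could be computed with NumPy. The base-classifier outputs and the weights are made-up illustration values, and the names `base_outputs`, `alphas`, `pred_strengths`, and `predictions` are assumptions, not taken from the original post.

```python
import numpy as np

# Hypothetical outputs G_m(x_i) of M = 3 base classifiers on N = 4 samples.
# One row per base classifier, each entry is -1 or +1.
base_outputs = np.array([
    [ 1.0,  1.0, -1.0, -1.0],
    [ 1.0, -1.0, -1.0,  1.0],
    [-1.0,  1.0, -1.0,  1.0],
])

# Hypothetical classifier weights alpha_m (illustration values only).
alphas = np.array([0.7, 0.4, 0.2])

# f(x_i) = sum_m alpha_m * G_m(x_i): the real-valued score used for the ROC curve.
pred_strengths = alphas @ base_outputs

# G(x_i) = sign(f(x_i)): the final hard classification.
predictions = np.sign(pred_strengths)

print(pred_strengths)
print(predictions)
```

The ROC curve is built from the real-valued scores rather than from the signed predictions, because sweeping a threshold over the scores is what produces the different (false positive rate, true positive rate) points.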
Plotting
Plotting code:
```python
import numpy as np
import matplotlib.pyplot as plt

# predStrengths and classLabels are both ndarray objects with 299 elements
# (the scores f(x_i) and the labels); they must be defined before this runs.
ySum = 0.0  # accumulator used to calculate the AUC
N = classLabels.shape[0]  # total number of samples
numPosClas = np.sum(classLabels == 1.0)  # number of positive samples
yStep = 1.0 / numPosClas  # TPR (vertical axis): the denominator is the number of positives
xStep = 1.0 / (N - numPosClas)  # FPR (horizontal axis): the denominator is the number of negatives
srtidxs = predStrengths.argsort()  # sample indices ordered from smallest to largest score

fig = plt.figure()
fig.clf()
ax = plt.subplot(111)
cur = (1.0, 1.0)  # upper-right corner: every sample is predicted positive, so TPR = FPR = 1

# Use each score in turn, from smallest to largest, as the decision threshold:
# samples above the threshold are predicted positive, the rest negative.
for idx in srtidxs:
    if classLabels[idx] == 1.0:
        # A positive sample is now predicted negative (an error), so TPR drops one step.
        delX = 0
        delY = yStep
    else:
        # A negative sample is now predicted negative (correct), so FPR drops one step.
        delX = xStep
        delY = 0
        # Each time x (the FPR) moves, add the current y value to ySum.
        ySum += cur[1]
    ax.plot([cur[0], cur[0] - delX], [cur[1], cur[1] - delY], c='b')
    cur = (cur[0] - delX, cur[1] - delY)  # update the cursor; the curve is drawn from upper right to lower left

ax.plot([0, 1], [0, 1], 'b--')  # diagonal from (0,0) to (1,1)
auc = "%.4f" % (ySum * xStep)  # area under the curve
plt.xlabel('False positive rate', fontsize=15)
plt.ylabel('True positive rate', fontsize=15)
plt.title('ROC curve (AUC = ' + auc + ')', fontsize=15)
ax.axis([0, 1, 0, 1])
fig.savefig('roc.png', dpi=300, bbox_inches='tight')
```
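The AUC bookkeeping in the plotting code can be checked without drawing anything. The sketch below repeats the same threshold sweep in a helper function and compares its result with the pairwise definition of AUC (the fraction of positive/negative pairs in which the positive sample gets the higher score). The function name `roc_auc_by_sweep` and the toy data are hypothetical, and the comparison assumes there are no tied scores.

```python
import numpy as np

def roc_auc_by_sweep(pred_strengths, class_labels):
    """Return (fpr, tpr, auc) using the same ascending threshold sweep as the plot."""
    pred_strengths = np.asarray(pred_strengths, dtype=float)
    class_labels = np.asarray(class_labels, dtype=float)
    num_pos = np.sum(class_labels == 1.0)
    num_neg = class_labels.shape[0] - num_pos
    y_step = 1.0 / num_pos
    x_step = 1.0 / num_neg

    fpr, tpr = [1.0], [1.0]  # start at (1, 1): every sample predicted positive
    y_sum = 0.0
    for idx in pred_strengths.argsort():
        if class_labels[idx] == 1.0:
            # A positive sample falls below the threshold: TPR drops one step.
            fpr.append(fpr[-1])
            tpr.append(tpr[-1] - y_step)
        else:
            # A negative sample falls below the threshold: FPR drops one step,
            # and a rectangle of height tpr and width x_step is added to the area.
            y_sum += tpr[-1]
            fpr.append(fpr[-1] - x_step)
            tpr.append(tpr[-1])
    return np.array(fpr), np.array(tpr), y_sum * x_step

# Hypothetical toy data: 3 positive and 3 negative samples, no tied scores.
scores = np.array([0.9, 0.5, -1.3, -0.1, 0.3, -0.6])
labels = np.array([1.0, 1.0, -1.0, -1.0, -1.0, 1.0])

fpr, tpr, auc = roc_auc_by_sweep(scores, labels)

# Cross-check: fraction of (positive, negative) pairs ranked correctly.
pos = scores[labels == 1.0]
neg = scores[labels != 1.0]
pairwise_auc = np.mean(pos[:, None] > neg[None, :])

print(auc, pairwise_auc)
```

On this toy data 7 of the 9 positive/negative pairs are ordered correctly, so both numbers come out as 7/9. With tied scores the two computations can disagree, because the sweep processes tied samples one at a time in whatever order `argsort` returns them.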