Tips for Plotting Calibration Curves

生信修炼手册

Article link: https://blog.csdn.net/weixin_43569478/article/details/125308212

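In an earlier article on nomograms, we showed how to use a nomogram to visualize a prognostic model and mentioned several ways to evaluate model performance. Calibration, together with the calibration curve, is one of them.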

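Calibration describes how accurately a model predicts the probability that an individual experiences a clinical outcome. In practice it is usually shown with a calibration curve, which displays the deviation between the model's predicted values and the observed values. A typical calibration curve is shown below.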

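The x-axis shows the outcome probability predicted by the model, and the y-axis shows the outcome probability actually observed in the patients, drawn as a point estimate with an error bar for each group. A line with slope 1 is added as the ideal reference: the closer the actual curve lies to this line, the smaller the deviation between predicted and observed results, and the better the model.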
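To make the two axes concrete, here is a minimal base R sketch on simulated data. It uses a simple binary outcome rather than survival data, and every name in it (`p`, `y`, `grp`, `mean_pred`, `obs_freq`) is made up for illustration. The outcomes are drawn from the predicted probabilities themselves, so this toy "model" is well calibrated by construction and the grouped points should fall near the slope-1 line.

```r
set.seed(1)
n <- 2000
p <- runif(n, 0.05, 0.95)            # simulated predicted probabilities
y <- rbinom(n, size = 1, prob = p)   # outcomes drawn from p: calibrated by construction

# Split the samples into 4 equal-sized groups by predicted probability
grp <- cut(p, breaks = quantile(p, probs = seq(0, 1, 0.25)),
           include.lowest = TRUE)

mean_pred <- tapply(p, grp, mean)    # x coordinate of each group
obs_freq  <- tapply(y, grp, mean)    # y coordinate of each group

# One row per group: predicted vs. observed
print(round(cbind(mean_pred, obs_freq), 3))
```

A poorly calibrated model would show `obs_freq` drifting away from `mean_pred` in some or all groups.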

In practice, a calibration curve can be created with the calibrate function from the rms package. Let's start by running the official example:

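> library(rms)       # cph() and calibrate()
> library(survival)  # Surv()
> set.seed(1)
> n <- 200
> d.time <- rexp(n)                                  # simulated survival times
> x1 <- runif(n)
> x2 <- factor(sample(c('a', 'b', 'c'), n, TRUE))
> # x=TRUE, y=TRUE, surv=TRUE keep the data that calibrate() needs for resampling
> f <- cph(Surv(d.time) ~ pol(x1,2) * x2, x=TRUE, y=TRUE, surv=TRUE, time.inc=1.5)
> cal <- calibrate(f, u=1.5, cmethod='KM', m=50, B=20)
> plot(cal)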

The result looks like this: [Figure: calibration plot produced by plot(cal)]

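The u argument specifies the time point to analyze. The m argument specifies the average number of samples per group and therefore determines the number of error bars in the plot: the example data has 200 samples, so m = 50 gives 4 groups. The function evaluates model performance by resampling with replacement (the bootstrap, repeated B times). The plotting data can be inspected by printing the returned object: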

> cal
calibrate.cph(fit = f, cmethod = "KM", u = 1.5, m = 50, B = 20)
n=200  B=20  u=1.5 Day
      index.orig     training         test mean.optimism mean.corrected  n
[1,] -0.02180909 -0.006492867  0.053098128   -0.05959099     0.03778191 20
[2,]  0.01161824  0.013463692  0.031802035   -0.01833834     0.02995658 20
[3,]  0.07007320 -0.064043654 -0.007650977   -0.05639268     0.12646588 14
[4,] -0.07103626 -0.015150576 -0.055302350    0.04015177    -0.11118804 20
     mean.predicted   KM KM.corrected   std.err
[1,]      0.1418091 0.12    0.1795910 0.3829708
[2,]      0.1883818 0.20    0.2183383 0.2828427
[3,]      0.2299268 0.30    0.3563927 0.2160247
[4,]      0.3110363 0.24    0.1998482 0.2516611
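The columns of this output are linked by simple arithmetic, which can be checked from the printed numbers alone. The sketch below copies the values from the output above and tests four relations. These relations are read off the numbers rather than quoted from the rms documentation, and the helper name `agree` is made up for illustration.

```r
# Values copied from the printed calibrate() output above
index.orig     <- c(-0.02180909,  0.01161824,  0.07007320, -0.07103626)
training       <- c(-0.006492867, 0.013463692, -0.064043654, -0.015150576)
test           <- c( 0.053098128, 0.031802035, -0.007650977, -0.055302350)
mean.optimism  <- c(-0.05959099, -0.01833834, -0.05639268,  0.04015177)
mean.corrected <- c( 0.03778191,  0.02995658,  0.12646588, -0.11118804)
mean.predicted <- c( 0.1418091,   0.1883818,   0.2299268,   0.3110363)
KM             <- c( 0.12,        0.20,        0.30,        0.24)
KM.corrected   <- c( 0.1795910,   0.2183383,   0.3563927,   0.1998482)

# Hypothetical helper: equal up to the rounding of the printed digits
agree <- function(a, b) all(abs(a - b) < 1e-6)

agree(index.orig,     KM - mean.predicted)         # calibration error on the original data
agree(mean.optimism,  training - test)             # bootstrap optimism
agree(mean.corrected, index.orig - mean.optimism)  # optimism-corrected error
agree(KM.corrected,   KM - mean.optimism)          # optimism-corrected KM estimate
```

Read this way, KM.corrected is the observed Kaplan-Meier estimate shifted by the bootstrap optimism.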

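Here the mean.predicted column gives the x coordinates of the four error bars. The KM column gives the y coordinates of the black round points at the center of each error bar, and the KM.corrected column gives the y coordinates of the cross-shaped (×) markers. The upper and lower limits of each error bar are computed as follows: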

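x  <- cal              # the calibrate() result can be indexed like a matrix
km <- x[,"KM"]         # use a new name so the cal object is not overwritten
se <- x[,"std.err"]
# upper limit is capped at 1; both limits are 0 when the estimate is 0
ciupper <- function(surv, d) ifelse(surv==0, 0, pmin(1, surv*exp(d)))
cilower <- function(surv, d) ifelse(surv==0, 0, surv*exp(-d))
cilower(km, 1.959964*se)   # lower limit of the 95% interval
ciupper(km, 1.959964*se)   # upper limit of the 95% interval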
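Written out, the two helper functions correspond to the interval below. This is only a restatement of the code. The symbols are introduced here for illustration: $\hat{S}$ stands for a value in the KM column, $\mathrm{se}$ for the matching std.err value, and $z = 1.959964$ is the normal quantile for a two-sided 95% interval. Because the limits come from multiplying and dividing by the same factor, the interval is symmetric on the log scale. This suggests that std.err is treated as a standard error of the log survival estimate, though that reading is an inference from the code.

$$
\text{lower} = \hat{S}\, e^{-z \cdot \mathrm{se}}, \qquad
\text{upper} = \min\!\left(1,\ \hat{S}\, e^{\,z \cdot \mathrm{se}}\right), \qquad
\text{lower} = \text{upper} = 0 \ \text{ if } \hat{S} = 0
$$

For the first row of the output (KM = 0.12, std.err = 0.3829708) this gives an interval of roughly 0.057 to 0.254.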

The limits are computed from the KM and std.err columns, so we can extract the data and draw the plot ourselves:

> library(Hmisc)   # errbar() comes from Hmisc
> x <- cal
> ci_lo <- cilower(x[,"KM"], 1.959964 * x[,"std.err"])
> ci_hi <- ciupper(x[,"KM"], 1.959964 * x[,"std.err"])
> # errbar() starts a new plot by default, so no separate plot() call is needed;
> # naming yplus/yminus avoids mixing up the upper and lower limits
> errbar(x[,"mean.predicted"], x[,"KM"], yplus = ci_hi, yminus = ci_lo, pch = 20, xlab = "", ylab = "")
> # bias-corrected estimates as crosses
> points(x = x[,"mean.predicted"], y = x[,"KM.corrected"], pch = 4)
> # connect the observed KM estimates
> lines(x = x[,"mean.predicted"], y = x[,"KM"])

The result looks like this: [Figure: the hand-drawn calibration plot]
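The same picture can also be sketched without rerunning calibrate and without Hmisc, using only the numbers printed earlier. This is a rough base-graphics variant under a few assumptions: arrows() stands in for errbar(), the axis labels are made up, and a dashed slope-1 reference line is added for comparison. It is not how rms draws its own plot.

```r
# Values copied from the printed calibrate() output
pred   <- c(0.1418091, 0.1883818, 0.2299268, 0.3110363)  # mean.predicted
km     <- c(0.12, 0.20, 0.30, 0.24)                      # KM
km_cor <- c(0.1795910, 0.2183383, 0.3563927, 0.1998482)  # KM.corrected
se     <- c(0.3829708, 0.2828427, 0.2160247, 0.2516611)  # std.err

# 95% limits, same formula as cilower()/ciupper()
z     <- qnorm(0.975)
lower <- ifelse(km == 0, 0, km * exp(-z * se))
upper <- ifelse(km == 0, 0, pmin(1, km * exp(z * se)))

# Use one range for both axes so the slope-1 line runs corner to corner
lim <- range(0, pred, lower, upper)
plot(pred, km, pch = 20, xlim = lim, ylim = lim,
     xlab = "Predicted probability", ylab = "Observed (KM) probability")
arrows(pred, lower, pred, upper, angle = 90, code = 3, length = 0.05)  # error bars
points(pred, km_cor, pch = 4)   # bias-corrected estimates
lines(pred, km)                 # observed curve
abline(0, 1, lty = 2)           # ideal reference line
```

Drawing both axes on the same range makes departures from the reference line easier to judge by eye.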