Clinical Prediction Models: Computing the Integrated Discrimination Improvement (IDI)


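This article was first published on the WeChat official account 医学和生信笔记; for the best reading experience, please read it there.

医学和生信笔记 focuses on using R in clinical medicine, and on data analysis and visualization with R.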


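IDI, the integrated discrimination improvement, is another metric for comparing the performance of different models. The categorical NRI depends on the chosen risk cutoffs, whereas IDI summarizes the change in predicted probabilities over their whole range, so it evaluates a model from an overall perspective. The two work best when reported together.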

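IDI for a logistic model

For a binary outcome, IDI can be computed with the PredictABEL package.

The demonstration uses the pbc dataset from the survival package, which contains data on primary biliary cholangitis (formerly called primary biliary cirrhosis). It is really a survival dataset with a time variable, but here it is used to demonstrate logistic regression: the time column serves only to define the binary outcome and is not entered into the models.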

library(survival)

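# Use only the first 312 rows (the randomized trial participants)
dat = pbc[1:312,]
# Keep patients followed for more than 2000 days, or who died before day 2000
dat = dat[ dat$time > 2000 | (dat$time < 2000 & dat$status == 2), ]

str(dat) # structure of the data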
## 'data.frame':	232 obs. of  20 variables:
##  $ id      : int  1 2 3 4 6 8 9 10 11 12 ...
##  $ time    : int  400 4500 1012 1925 2503 2466 2400 51 3762 304 ...
##  $ status  : int  2 0 2 2 2 2 2 2 2 2 ...
##  $ trt     : int  1 1 1 1 2 2 1 2 2 2 ...
##  $ age     : num  58.8 56.4 70.1 54.7 66.3 ...
##  $ sex     : Factor w/ 2 levels "m","f": 2 2 1 2 2 2 2 2 2 2 ...
##  $ ascites : int  1 0 0 0 0 0 0 1 0 0 ...
##  $ hepato  : int  1 1 0 1 1 0 0 0 1 0 ...
##  $ spiders : int  1 1 0 1 0 0 1 1 1 1 ...
##  $ edema   : num  1 0 0.5 0.5 0 0 0 1 0 0 ...
##  $ bili    : num  14.5 1.1 1.4 1.8 0.8 0.3 3.2 12.6 1.4 3.6 ...
##  $ chol    : int  261 302 176 244 248 280 562 200 259 236 ...
##  $ albumin : num  2.6 4.14 3.48 2.54 3.98 4 3.08 2.74 4.16 3.52 ...
##  $ copper  : int  156 54 210 64 50 52 79 140 46 94 ...
##  $ alk.phos: num  1718 7395 516 6122 944 ...
##  $ ast     : num  137.9 113.5 96.1 60.6 93 ...
##  $ trig    : int  172 88 55 92 63 189 88 143 79 95 ...
##  $ platelet: int  190 221 151 183 NA 373 251 302 258 71 ...
##  $ protime : num  12.2 10.6 12 10.3 11 11 11 11.5 12 13.6 ...
##  $ stage   : int  4 3 4 4 3 3 2 4 4 4 ...
dim(dat) # 232 20
## [1] 232  20

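Next, prepare the inputs needed to compute the IDI.

# Define the outcome: 1 = died before day 2000, 0 = still alive at day 2000
event = ifelse(dat$time < 2000 & dat$status == 2, 1, 0)

# Fit two models: a baseline model and one that adds protime
mstd = glm(event ~ age + bili + albumin, family = binomial(), data = dat, x=TRUE)
mnew = glm(event ~ age + bili + albumin + protime, family = binomial(), data = dat, x=TRUE)

# Extract the predicted probabilities from each model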
p.std = mstd$fitted.values
p.new = mnew$fitted.values

Now use PredictABEL to compute the IDI:

#install.packages("PredictABEL") # install the package
library(PredictABEL)  

dat$event <- event

reclassification(data = dat,
                 cOutcome = 21, # column index of the outcome (event was appended as column 21)
                 predrisk1 = p.std,
                 predrisk2 = p.new,
                 cutoff = c(0,0.3,0.7,1)
                 )
##  _________________________________________
##  
##      Reclassification table    
##  _________________________________________
## 
##  Outcome: absent 
##   
##              Updated Model
## Initial Model [0,0.3) [0.3,0.7) [0.7,1]  % reclassified
##     [0,0.3)       121         4       0               3
##     [0.3,0.7)       1        13       1              13
##     [0.7,1]         0         1       3              25
## 
##  
##  Outcome: present 
##   
##              Updated Model
## Initial Model [0,0.3) [0.3,0.7) [0.7,1]  % reclassified
##     [0,0.3)        14         0       0               0
##     [0.3,0.7)       0        18       3              14
##     [0.7,1]         0         1      52               2
## 
##  
##  Combined Data 
##   
##              Updated Model
## Initial Model [0,0.3) [0.3,0.7) [0.7,1]  % reclassified
##     [0,0.3)       135         4       0               3
##     [0.3,0.7)       1        31       4              14
##     [0.7,1]         0         2      55               4
##  _________________________________________
## 
##  NRI(Categorical) [95% CI]: 0.0019 [ -0.0551 - 0.0589 ] ; p-value: 0.94806 
##  NRI(Continuous) [95% CI]: 0.0391 [ -0.2238 - 0.3021 ] ; p-value: 0.77048 
##  IDI [95% CI]: 0.0044 [ -0.0037 - 0.0126 ] ; p-value: 0.28396

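The IDI is on the last line, together with its 95% confidence interval and p-value; the output also reports the categorical and continuous NRI with their confidence intervals and p-values. Here the IDI is 0.0044 (95% CI -0.0037 to 0.0126, p = 0.28), so adding protime does not significantly improve discrimination in this example.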
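To make the number less of a black box, here is a minimal sketch of the usual definition of IDI: the change in mean predicted risk among events minus the change in mean predicted risk among non-events. The helper idi_manual() and the toy vectors are hypothetical and only for illustration; they are not part of PredictABEL, and the sketch does not compute a confidence interval or p-value.

```r
# Toy data (hypothetical values): observed outcome and predicted risks
# from an old model and a new model
y     <- c(1, 1, 0, 0)
p_old <- c(0.6, 0.4, 0.3, 0.5)
p_new <- c(0.8, 0.5, 0.2, 0.4)

# idi_manual() is a hypothetical helper, not a PredictABEL function
idi_manual <- function(y, p_old, p_new) {
  stopifnot(length(y) == length(p_old), length(y) == length(p_new))
  # Average increase in predicted risk among subjects with the event
  gain_events <- mean(p_new[y == 1]) - mean(p_old[y == 1])
  # Average increase in predicted risk among subjects without the event
  gain_nonevents <- mean(p_new[y == 0]) - mean(p_old[y == 0])
  # A better model raises risk for events and lowers it for non-events
  gain_events - gain_nonevents
}

idi <- idi_manual(y, p_old, p_new)
print(idi)
```

Assuming PredictABEL uses the same definition, calling idi_manual(event, p.std, p.new) on the objects from this article should give a value close to the 0.0044 reported by reclassification().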

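IDI for survival data

For survival data, IDI can be computed with the survIDINRI package.

# Install the package
install.packages("survIDINRI")

Load the package and use it, again with the pbc dataset from above.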

library(survIDINRI)
## Loading required package: survC1
# Use only the first 312 rows
dat <- pbc[1:312,]
# 1 = dead, 0 = alive (transplant, original status 1, is treated as censored)
dat$status <- ifelse(dat$status==2, 1, 0)

str(dat)
## 'data.frame':	312 obs. of  20 variables:
##  $ id      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ time    : int  400 4500 1012 1925 1504 2503 1832 2466 2400 51 ...
##  $ status  : num  1 0 1 1 0 1 0 1 1 1 ...
##  $ trt     : int  1 1 1 1 2 2 2 2 1 2 ...
##  $ age     : num  58.8 56.4 70.1 54.7 38.1 ...
##  $ sex     : Factor w/ 2 levels "m","f": 2 2 1 2 2 2 2 2 2 2 ...
##  $ ascites : int  1 0 0 0 0 0 0 0 0 1 ...
##  $ hepato  : int  1 1 0 1 1 1 1 0 0 0 ...
##  $ spiders : int  1 1 0 1 1 0 0 0 1 1 ...
##  $ edema   : num  1 0 0.5 0.5 0 0 0 0 0 1 ...
##  $ bili    : num  14.5 1.1 1.4 1.8 3.4 0.8 1 0.3 3.2 12.6 ...
##  $ chol    : int  261 302 176 244 279 248 322 280 562 200 ...
##  $ albumin : num  2.6 4.14 3.48 2.54 3.53 3.98 4.09 4 3.08 2.74 ...
##  $ copper  : int  156 54 210 64 143 50 52 52 79 140 ...
##  $ alk.phos: num  1718 7395 516 6122 671 ...
##  $ ast     : num  137.9 113.5 96.1 60.6 113.2 ...
##  $ trig    : int  172 88 55 92 72 63 213 189 88 143 ...
##  $ platelet: int  190 221 151 183 136 NA 204 373 251 302 ...
##  $ protime : num  12.2 10.6 12 10.3 10.9 11 9.7 11 11 11.5 ...
##  $ stage   : int  4 3 4 4 3 3 3 3 2 4 ...

Build the inputs the function needs:

# Two matrices that contain only the predictors of each model
z.std = as.matrix(subset(dat, select = c(age, bili, albumin)))
z.new = as.matrix(subset(dat, select = c(age, bili, albumin, protime)))
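The four predictors used here appear to be complete in these 312 rows, but other columns are not (platelet, for example, contains NA). A minimal sketch, on a hypothetical toy data frame, of dropping incomplete rows before building the matrices, on the assumption that the downstream function expects data without missing values. Filtering once and building everything from the same filtered data keeps time, status, and both matrices aligned row by row.

```r
# Toy data frame (hypothetical values) with one missing covariate value
toy <- data.frame(
  time    = c(400, 4500, 1012, 1925),
  status  = c(1, 0, 1, 1),
  age     = c(58.8, 56.4, 70.1, 54.7),
  protime = c(12.2, NA, 12.0, 10.3)
)

# Keep only rows that are complete in every column the analysis will use
vars   <- c("time", "status", "age", "protime")
keep   <- complete.cases(toy[, vars])
toy_cc <- toy[keep, ]

# Build both predictor matrices from the same filtered data
z_std <- as.matrix(toy_cc[, "age", drop = FALSE])   # baseline model
z_new <- as.matrix(toy_cc[, c("age", "protime")])   # baseline + new marker

dim(z_std)
dim(z_new)
```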

Then compute the IDI with the IDI.INF() function:

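res <- IDI.INF(indata = dat[,c(2,3)], # columns 2 and 3 are time and status
               covs0 = z.std,
               covs1 = z.new,
               t0 = 2000, # time point at which to evaluate
               npert = 500 # number of perturbation-resampling iterations
               )

IDI.INF.OUT(res) # print the results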
##     Est.  Lower Upper p-value
## M1 0.020 -0.003 0.058   0.080
## M2 0.202 -0.042 0.384   0.064
## M3 0.011  0.000 0.036   0.052

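M1: the IDI estimate, with its confidence interval and p-value.

M2: the continuous NRI estimate, with its confidence interval and p-value.

M3: the median improvement in risk score, with its confidence interval and p-value.

That is how to compute the IDI.

IDI can also be computed for random forests, decision trees, lasso regression, and other models; these will be covered in later posts.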

