R语言-基于现有临床预测模型预测性能评估(predRupdate)

在阅读临床科研文献时,一般不会提供数据集及代码,但有时会给出模型系数。如果我们想利用文献中的模型该怎么办呢?R包predRupdate可以完成这个工作。

predRupdate 包提供在给定新数据集的情况下评估现有临床预测模型预测性能的功能。该包有三种用法:

  • 在新数据上验证现有的预测模型,以估算模型的预测性能,即外部验证;

  • 更新现有预测模型,应用到新的数据集;

  • 将多个现有模型集成为一个新的模型

一、加载R包和数据集​​​​​​​

#install.packages("predRupdat")library(predRupdate)

1.1 Logistic回归模型

将现有模型的系数及截距构建为数据框​​​​​​​

coefs_table_1 <- data.frame("Intercept" = -3.995, #the intercept needs to be named exactly as given here                          "Age" = 0.012,                          "SexM" = 0.267,                           "Smoking_Status" = 0.751,                          "Diabetes" = 0.523,                          "Creatinine" = 0.578)

构建逻辑回归模型​​​​​​​

#pass this into pred_input_info()Existing_Logistic_Model <- pred_input_info(  model_type = "logistic",  model_info = coefs_table_1)summary(Existing_Logistic_Model)​​​​​​​
## Information about 1 existing model(s) of type 'logistic' ## ## Model Coefficients ## ================================= ##   Intercept   Age  SexM Smoking_Status Diabetes Creatinine## 1    -3.995 0.012 0.267          0.751    0.523      0.578## ## Model Functional Form ## ================================= ## Age + SexM + Smoking_Status + Diabetes + Creatinine

利用新的数据集进行验证

​​​​​​​

validation_results <- pred_validate(  x = Existing_Logistic_Model,  new_data = SYNPM$ValidationData,  binary_outcome = "Y")summary(validation_results) ## --------------------------------- ##                         Estimate Std. Err Lower 95% Confidence Interval## Observed:Expected Ratio   1.8583   0.0188                        1.7911## Calibration Intercept     0.7076   0.0206                        0.6673## Calibration Slope         0.6496   0.0463                        0.5588##                         Upper 95% Confidence Interval## Observed:Expected Ratio                        1.9281## Calibration Intercept                          0.7479## Calibration Slope                              0.7403## ##  Also examine the calibration plot, if produced. ## ## Discrimination Measures ## --------------------------------- ##     Estimate Std. Err Lower 95% Confidence Interval## AUC   0.5816   0.0057                        0.5703##     Upper 95% Confidence Interval## AUC                        0.5928## ## ## Overall Performance Measures ## --------------------------------- ## Cox-Snell R-squared: -0.0446## Nagelkerke R-squared: -0.0801## Brier Score: 0.1246## ##  Also examine the distribution plot of predicted risks.

在结果中给出校准曲线的斜率0.7403,截距0.7479,AUC为0.5816及95%置信区间(0.5703-0.5928),Brier Score为0.1246。

绘制校准曲线

validation_results$flex_calibrationplot

1.2 survival模型

survival模型,需要指定模型的累积风险以及回归系数。

​​​​​​​

coefs_table_2 <- data.frame("Age" = 0.007,                          "SexM" = 0.225,                          "Smoking_Status" = 0.685,                          "Diabetes" = 0.425,                          "Creatinine" = 0.587)
#构建模型Existing_TTE_Model <- pred_input_info(  model_type = "survival",  # 指定模型类型  model_info = coefs_table_2,  cum_hazard = SYNPM$TTE_mod1_baseline) # 累积风险

​​​​​​​

# 模型验证validation_results <- pred_validate(  x = Existing_TTE_Model,  new_data = SYNPM$ValidationData,  survival_time = "ETime",  event_indicator = "Status",  time_horizon = 5)

summary(validation_results)
# Calibration Measures # --------------------------------- #                         Estimate Std. Err Lower 95% Confidence Interval# Observed:Expected Ratio     1.22    0.011                          1.19# Calibration Slope           0.71    0.028                          0.66#                         Upper 95% Confidence Interval# Observed:Expected Ratio                          1.25# Calibration Slope                                0.77# #  Also examine the calibration plot, if produced. # # Discrimination Measures # --------------------------------- #           Estimate Std. Err Lower 95% Confidence Interval# Harrell C     0.58   0.0032                          0.57#           Upper 95% Confidence Interval# Harrell C                          0.59# #  Also examine the distribution plot of predicted risks. 
plot(validation_results)

二、利用新数据集更新模型

2.1构建模型

​​​​​​​

coefs_table <- data.frame("Intercept" = -3.995,                           "Age" = 0.012,                          "SexM" = 0.267,                           "Smoking_Status" = 0.751,                          "Diabetes" = 0.523,                          "Creatinine" = 0.578)
Existing_Logistic_Model <- pred_input_info(  model_type = "logistic",  model_info = coefs_table)

2.2 更新模型(pred_updat())

​​​​​​​

Updated_model <- pred_update(  Existing_Logistic_Model,  update_type = "recalibration",  new_data = SYNPM$ValidationData,  binary_outcome = "Y")

​​​​​​​

summary(Updated_model)# Original model was updated with type recalibration# The model updating results are as follows: # # Updated Model Coefficients # ================================= # # Model Functional Form # ================================= # Age + SexM + Smoking_Status + Diabetes + Creatinine

模型验证

​​​​​​​

pred_val <- pred_validate(  Updated_model,   new_data = SYNPM$ValidationData,   binary_outcome = "Y")summary(pred_val)
## Calibration Measures ## --------------------------------- ##                         Estimate Std. Err Lower 95% Confidence Interval## Observed:Expected Ratio        1   0.0188                        0.9638## Calibration Intercept          0   0.0204                       -0.0400## Calibration Slope              1   0.0713                        0.8603##                         Upper 95% Confidence Interval## Observed:Expected Ratio                        1.0375## Calibration Intercept                          0.0400## Calibration Slope                              1.1397## ##  Also examine the calibration plot, if produced. ## ## Discrimination Measures ## --------------------------------- ##     Estimate Std. Err Lower 95% Confidence Interval## AUC   0.5816   0.0057                        0.5703##     Upper 95% Confidence Interval## AUC                        0.5928## ## ## Overall Performance Measures ## --------------------------------- ## Cox-Snell R-squared: 0.0095## Nagelkerke R-squared: 0.0171## Brier Score: 0.1203## ##  Also examine the distribution plot of predicted risks.

绘制校准曲线

plot(pred_val)

三、模型集成

当我们用模型集成时,元模型必须是同一类型的模型,比如logistic或survival模型,这与常用的模型集成有所不同。

构建集成模型

​​​​​​​

coefs_table <- data.frame(rbind(c("Intercept" = -3.995,                                  "Age" = 0.012,                                  "SexM" = 0.267,                                  "Smoking_Status" = 0.751,                                  "Diabetes" = 0.523,                                  "Creatinine" = 0.578),                                c("Intercept" = -2.282,                                  "Age" = NA,                                  "SexM" = 0.223,                                  "Smoking_Status" = 0.528,                                  "Diabetes" = 0.200,                                  "Creatinine" = 0.434),                                c("Intercept" = -3.013,                                  "Age" = NA,                                  "SexM" = NA,                                  "Smoking_Status" = 0.565,                                  "Diabetes" = -0.122,                                  "Creatinine" = 0.731)))multiple_mods <- pred_input_info(model_type = "logistic",                                 model_info = coefs_table)summary(multiple_mods)## Information about 3 existing model(s) of type 'logistic' ## ## Model Coefficients ## ================================= ## [[1]]##   Intercept   Age  SexM Smoking_Status Diabetes Creatinine## 1    -3.995 0.012 0.267          0.751    0.523      0.578## ## [[2]]##   Intercept  SexM Smoking_Status Diabetes Creatinine## 2    -2.282 0.223          0.528      0.2      0.434## ## [[3]]##   Intercept Smoking_Status Diabetes Creatinine## 3    -3.013          0.565   -0.122      0.731## ## ## Model Functional Form ## ================================= ## Model 1: Age + SexM + Smoking_Status + Diabetes + Creatinine## Model 2: SexM + Smoking_Status + Diabetes + Creatinine## Model 3: Smoking_Status + Diabetes + Creatinine

模型预测

SR <- pred_stacked_regression(x = multiple_mods,                              new_data = SYNPM$ValidationData,                              binary_outcome = "Y")summary(SR)## Existing models aggregated using stacked regression## The model stacked regression weights are as follows: ## (Intercept)         LP1         LP2         LP3 ##  0.02072515  0.48150208  0.12920227  0.16559034 ## ## Updated Model Coefficients ## ================================= ##   Intercept         Age      SexM Smoking_Status Diabetes Creatinine## 1 -2.696639 0.005778025 0.1573732      0.5233854 0.257464  0.4554285## ## Model Functional Form ## ================================= ## Age + SexM + Smoking_Status + Diabetes + Creatinine

模型验证​​​​​​​

pred_val_2 <- pred_validate(SR,               new_data = SYNPM$ValidationData,               binary_outcome = "Y")summary(pred_val_2)

​​​​​​​

# Calibration Measures # --------------------------------- #                         Estimate Std. Err# Observed:Expected Ratio        1   0.0188# Calibration Intercept          0   0.0204# Calibration Slope              1   0.0702#                         Lower 95% Confidence Interval# Observed:Expected Ratio                        0.9638# Calibration Intercept                         -0.0400# Calibration Slope                              0.8623#                         Upper 95% Confidence Interval# Observed:Expected Ratio                        1.0375# Calibration Intercept                          0.0400# Calibration Slope                              1.1377# #  Also examine the calibration plot, if produced. # # Discrimination Measures # --------------------------------- #     Estimate Std. Err Lower 95% Confidence Interval# AUC    0.583   0.0058                        0.5717#     Upper 95% Confidence Interval# AUC                        0.5943# # # Overall Performance Measures # --------------------------------- # Cox-Snell R-squared: 0.0098# Nagelkerke R-squared: 0.0175# Brier Score: 0.1203# #  Also examine the distribution plot of predicted risks.

绘制校准曲线

pred_val_2$flex_calibrationplot

参考资料:https://glenmartin31.github.io/predRupdate

  • 3
    点赞
  • 6
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

皮肤小白生

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值