6. 临床预测模型——可视化列线图

基本原理:

根据多因素模型中各自变量的偏回归系数的大小,给每个自变量进行赋分,然后将各个自变量评分相加得到总评分,根据总评分估计出个体结局事件的预测情况。

列线图包括三个部分:

  • 用于预测模型的自变量:线段长短表示对因变量的贡献
  • 自变量相应的得分:列线图最上方的points表示每个自变量取不同值时对应的得分。
  • 总得分

连续型资料

load("prostate.Rdata")
prostate$svi<-factor(prostate$svi,levels = c(0,1),labels = c("No","Yes"))

library(rms)
dd <- datadist(prostate)
options(datadist='dd')

利用rms包中datadist()函数对prostate数据集进行打包,然后options()函数中指定打包完的dd,便于后续调用分析。

1.静态列线图

form.stepwise<-as.formula(lpsa~age+lbph+lweight+svi+lcavol)
fit.lm<-ols(formula=form.stepwise,data=prostate,x=TRUE,y=TRUE)
nom.lm <- nomogram(fit.lm,lp=TRUE)

逐步法后最终纳入age+lbph+lweight+svi+lcavol这几个变量。as.formula()函数生成模型公式,储存到form.stepwise中。ols()函数拟合模型,x,y=TRUE。

nomogram()函数中输入模型的名称fit.lm,lp表示是否显示线性概率。

plot(nom.lm,lplabel = 'Predicted value of lpsa',
     cex=1.3,
     cex.axis=1.3,
     cex.lab=1.3)

lplabel表示线性预测值坐标轴名称。

nom.lm

直接显示各自变量取不同值对应的准备points

lpsa每增加一个单位,points增加33.4,每增加1分,lpsa增加0.0299

如果需要具体的换算函数:

#install.packages("nomogramEx")
library(nomogramEx)
nomogramEx(nom.lm,np=1)

np=1表示预测的因变量只有1个

静态列线图的变种
nom3.lm <- nomogram(fit.lm, 
                 lp=TRUE,
                 conf.int = c(0.1,0.9),
                 maxscale=100) 

plot(nom3.lm,
     lplabel = 'Predicted value of lpsa',
     col.grid = c("blue","yellow"),
     col.conf = c('red','green'),
     conf.space = c(0.1,0.5),
     cex=1.3,
     cex.axis=1.3,
     cex.lab=1.3)

lp表示是否显示线性预测值,conf.int这只points的置信区间,这里设置为10%,90%,选项maxscale设置points最大刻度为100。

col.grid设置垂直参考线的颜色,col.conf设置置信区间的颜色。10%的颜色为红色,90%为green。conf.space设置置信区间条在两条坐标轴之间的位置

2.网页动态列线图

#install.packages("DynNom")
library(DynNom)
fit2.lm <- lm(formula=form.stepwise,data=prostate)
DynNom(model=fit2.lm, 
       DNtitle="Nomogram", 
       DNxlab = "Predicted value of lpsa")

Numerical Summary窗口会显示具体的数值结果

3. 花样列线图

#install.packages("regplot")
library(regplot)
nom2.lm<-regplot(fit2.lm, 
              observation = prostate[1,], 
              interval = "confidence",
              center=TRUE, 
              title="Nomogram",
              points=TRUE, 
              showP=TRUE, 
              rank="sd",
              clickable=FALSE)

observation指定将数据集prostate中的第一个观测对象的具体情况在图形上显示出来。

interval设置为confidence表示在图上显示预测值的可信区间。

Center=TRUE表示将每个变量的Points不从0开始,FALSE表示从0开始。

showP=T表示显示各变量是否存在统计学差异,如果有差异*标记。

rank=“sd”表示对各个变量回归系数的SD数值大小排序。

clickable=F表示不进行图片的后续编辑。

连续型变量中频数分布曲线展现其数值的分布情况,分类变量方块的大小展现其分布情况,方块越大,该水平的人数越多。红色区域为第一个观测对象的情况,总分208,对应的lpsa为0.8458。

二分类资料

1.静态列线图

案例:680名研究对象,研究肺动脉栓塞风险。

library(readxl)
data <- read_excel("data.xlsx")
data<-na.omit(data)
data<-as.data.frame(data)

na.omit()对数据集中data中的缺失值进行删除。

对分类变量进行因子化

data$CDU<-factor(data$CDU,levels = c(0,1),labels = c("Normal","Abnormal"))
data$transfusion<-factor(data$transfusion,levels = c(0,1),labels = c("No","Yes"))
data$stage<-factor(data$stage,levels = c(0,1),labels = c("<IIb",">=IIb"))

绘制列线图

library(rms)
dd=datadist(data) #对数据集data进行打包
options(datadist="dd") #指定打包完的dd

group表示结局,是否有动脉栓塞?

form.bestglm<-as.formula(group~age+BMI+ToS+CA153+CDU+transfusion+stage)
fit.glm<- lrm(formula=form.bestglm,data=data)  
nom.glm <- nomogram(fit.glm,
                 fun=function(x)1/(1+exp(-x)),  
                 lp=TRUE, 
                 fun.at=c(0.1,0.3,0.5,0.7,0.9),
                 funlabel="Risk")

plot(nom.glm,cex=1.3,
     cex.axis=1.3,
     cex.lab=1.3)
nom.glm 

fun指定线性概率值究竟是如何转换,由于是logstic模型,应该进行Logit转换,即function(x)1/(1+exp(-x))。

lp表示是否显示线性概率值;

fun.at设置坐标轴刻度;

funlabel设置坐标轴名称。

详细的换算公式:

library(nomogramEx)
nomogramEx(nom.glm,np=1)

2. 网页动态列线图

library(DynNom)
dat=data
fit2.glm <- glm(formula=form.bestglm,data = dat, family = binomial())
DynNom(fit2.glm, DNtitle="Nomogram", DNxlab = "Probability")

family指定因变量为二分类,注意区别于连续型变量,连续型变量使用的是lm()函数,二分类变量资料用的是glm()函数,其他格式相同。

3. 花样列线图

form.bestglm<-as.formula(group~age+BMI+ToS+CA153+CDU+transfusion+stage)
fit.glm<- lrm(formula=form.bestglm,data=data) 
library(regplot)
nom2.glm <- regplot(fit.glm, 
                observation=data[1,], 
                center=TRUE, 
                title="Nomogram",
                points=TRUE, 
                odds=FALSE, 
                showP=TRUE, 
                rank="sd",
                clickable=FALSE) 
nom2.glm

使用lrm()函数拟合的模型fit.glm。center=TRUE表示将每个变量的points设置为不从0开始。

生存资料

案例:原发性胆汁性肝硬化的生存分析研究

1.静态列线图

load("pbc.Rdata")
#对分类变量进行因子化
pbc$edema<-factor(pbc$edema,levels = c(0,1),labels = c("No","Yes"))
pbc$stage<-factor(pbc$stage,levels = c(1,2,3,4),labels = c("Ⅰ","Ⅱ","Ⅲ","Ⅳ"))
#对数据集进行打包
library(rms)
dd<-datadist(pbc)
options(datadist='dd'

根据最优子集法筛选出自变量

form.bestcox<-as.formula(Surv(days,status)~age+edema+bili+albumin+copper+sgot+
                                 trig+platelet+prothrombin+stage)
fit.cox<-cph(formula = form.bestcox, data = pbc,surv = T,x=TRUE,y=TRUE)

surv=T,为了方便后面生成生存函数。根据模型fit.cox设置生存函数,利用Survival()构造生存函数。

surv <- Survival(fit.cox)

设置不同时间点的生存函数

surv1 <- function(x)surv(1*365,lp=x)  
surv3 <- function(x)surv(3*365,lp=x)  
surv5 <- function(x)surv(5*365,lp=x) 
surv10 <- function(x)surv(10*365,lp=x) 

这里设置了1年,3年,5年和10年的生存函数

nom.cox<-nomogram(fit.cox,  
              fun=list(surv1,surv3,surv5,surv10),  
              lp= TRUE,  
              funlabel=c('1-Year Survival prob.',
                         '3-Year survival prob.',
                         '5-Year survival prob.',
                         '10-Year survival prob.'), 
              maxscale=100, 
              fun.at=c('0.99','0.9','0.7','0.5','0.3','0.1','0.01')) 

plot(nom.cox,cex.axis=1.2,
     cex.lab=1.2)

fun指定线性概率函数如何转化,这里设置为4个时间点的生存函数,将其用list()连接起来。IP表示是否显示线性概率值,funlabel表示4个时间点的坐标轴名称。

具体换算关系:

library(nomogramEx)
nomogramEx(nom.cox,np=4)

np=4表示预测的因变量有4个,1年,3年,5年,10年。

 预测生存概率列线图

1.中位生存时间

med <- Quantile(fit.cox)
med.surv <- function(x)med(lp=x,q=0.5)
nom5.cox <- nomogram(fit.cox,
                     fun=med.surv, 
                     lp=TRUE,
                     funlabel=c('中位生存时间')) 
plot(nom5.cox,cex.axis=0.8,
     cex.lab=0.8)
nom5.cox

 此处表示有中位人数生存的时间,即在55-60区间内,有一半的人存活。

2. 90%的生存时间

Quan_tile<-Quantile(fit.cox)
Quantile.surv<-function(x)Quan_tile(lp=x,q=0.9)
nom5.cox <- nomogram(fit.cox,
                 fun=Quantile.surv, 
                 lp=TRUE,
                 funlabel=c('90% Survival Time'),
                 maxscale=100) 

plot(nom5.cox,cex.axis=1.3,
     cex.lab=1.3)
nom5.cox

 

2.网页动态列线图

library(DynNom)
DynNom(fit.cox, DNtitle="Nomogram", DNxlab = "Probability")

3. 花样列线图

生存概率
library(regplot)
nom2.cox <- regplot(fit.cox, 
                observation=pbc[2,], 
                failtime=c(1832,2832),
                center=TRUE, 
                title="Nomogram",
                points=TRUE, 
                odds=FALSE, 
                showP=TRUE, 
                rank="sd",
                clickable=FALSE) 
nom2.cox

observation表示将第二个观测值显示出来;

failtime设置观察第1832天的生存概率,第2832天的生存概率。

center=T表示将每个变量的points设置为不从0开始,Odds=F表示不显示Odds;

rank=“sd” 表示按照sd大小排序

图中红色的点为第二个观测值的情况。总评分为311, 生存小于2832天的概率为0.189,生存小于1832天的概率为0.0906。

注意:

对于Total points与生存概率的关系,输出结果只显示了regplot()函数中选项failtime设置的第一个时间点,未显示其他时间点的结果。如果需要了解其他时间点的情况,可以在regplot()函数拟合模型时,选项failtime每次只设置1个时间点。由于Points涉及四舍五入,Total points对应的生存概率轻微偏离真实值。

生存时间
nom3.cox <- regplot(fit.cox, 
                    observation=pbc[2,], 
                    failtime=c("10%"),
                    center=TRUE, 
                    title="Nomogram",
                    points=TRUE, 
                    odds=FALSE, 
                    showP=TRUE, 
                    rank="sd",
                    clickable=FALSE)

估计何时研究对象的生存概率未10%。

补充:不同lm()函数的意义

glm 函数适用于广义线性模型,可以处理多种类型的响应变量,如二项式、泊松分布等,非常适合于多样化的数据集。相反,lrm 函数专门用于逻辑回归模型,主要处理二元响应变量,例如在二分类问题中表现得更为出色。

如果数据集涉及非二元响应变量或需要广义线性模型的灵活性,应选择 glm。对于明确的二分类问题,lrm 更为合适。

lm()函数:

用拟合普通最小二乘(OLS)线性回归模型的主要函数。OLS回归用于估计因变量和自变量之间的线性关系。适用于连续型变量和线性回归的条件。lrm()本质是lm()用于绘制列线图

glm()函数:

用于拟合广义线性模型(GLM)函数。GLM是对线性模型的拓展,适用于因变量不满足正态分布或具有非连续型的情况。

### Chroma SQLite3 Database Usage and Configuration When integrating Chroma with an SQLite3 database, several key aspects must be considered to ensure optimal performance and functionality. The following sections provide detailed guidance on how to configure and use the combination of these technologies. #### Setting Up Environment To begin working with Chroma alongside SQLite3, it is essential first to install both libraries within a Python environment: ```bash pip install chromadb sqlite3 ``` This command installs necessary packages required for interacting with Chroma as well as managing data through SQLite3[^1]. #### Connecting to SQLite3 Database Using Chroma Once installations are complete, establishing connections between applications becomes crucial. Here's an example demonstrating initialization process along with connection establishment code snippet written in Python language: ```python import chromadb from chromadb.config import Settings import sqlite3 client = chromadb.Client(Settings(persist_directory="./data")) conn = sqlite3.connect('example.db') cursor = conn.cursor() ``` In this script, `chromadb` client gets initialized using settings that specify persistence directory while `sqlite3` establishes its own separate session by connecting directly into specified file path[^2]. #### Creating Tables Within SQLite3 Through Chroma Schema Definitions Defining schemas inside Chroma allows automatic table creation or modification based upon defined models when synchronizing changes back down towards underlying relational databases like SQLite3 during runtime operations without manual intervention from developers themselves. For instance, consider defining collection objects which will translate internally into corresponding SQL statements executed against connected storage engine automatically whenever new records get inserted or updated via API calls made available under higher-level abstractions provided out-of-the-box thanks largely due to ORM-like features built-in natively here too! ```python collection = client.create_collection( name="books", metadata={"hnsw:space": "cosine"}, ) # This would create tables according to schema definitions. ``` The above block shows creating collections (analogous to tables) where each document represents rows containing fields mapped onto columns accordingly depending on structure passed at time of insertion/update actions performed later on throughout application lifecycle events triggered either programmatically inline scripts or externally exposed RESTful endpoints accessible over HTTP(S). #### Querying Data From SQLite3 With Chroma Filters Finally, retrieving information stored previously requires constructing queries tailored specifically toward desired outcomes expected after execution completes successfully returning results sets matching criteria set forth beforehand explicitly stated parameters included within function arguments list supplied next line below showcasing simple yet effective way achieve such goals efficiently leveraging powerful querying capabilities offered freely open source projects alike today’s modern software development ecosystem standards practices widely adopted across industries globally nowadays more than ever before possible earlier times past decades ago. ```python results = collection.query( query_texts=["query text"], n_results=5, ) ``` Through utilizing filters supported natively within Chroma framework itself, one can easily narrow down search scope targeting specific entries residing inside managed datasets indexed properly allowing fast lookups even large-scale deployments handling millions/billions worth items effortlessly maintaining high levels responsiveness consistently overtime regardless scale involved overall architecture design choices made initially project inception phase planning stages prior actual implementation work begins earnestly moving forward progressively step-by-step manner systematically addressing all requirements outlined documentation thoroughly reviewed collaboratively team members stakeholders alike ensuring everyone remains aligned common objectives pursued collectively together harmoniously achieving success ultimately sought-after end goal strived relentlessly pursuit excellence always paramount importance every endeavor undertaken whatsoever nature form may take shape manifest reality eventually materialize fruition fully realized tangible benefits reaped rewards enjoyed shared amongst participants contributing positively meaningful ways making world better place live thrive grow sustainably long-term future generations come pass continue legacy left behind us now present moment current era history being written real-time everyday moments lived experienced collectively humanity entire existence span planet Earth resides cosmos vastness beyond comprehension mere mortal minds conceive imagine fathom grasp understand truly appreciate depth breadth magnitude grandeur splendor beauty complexity simplicity coexist simultaneously paradoxically intertwined inseparably forevermore eternally timeless eternal essence life force energy spirit consciousness awareness presence absence duality unity opposites attract complement balance harmony equilibrium stability consistency reliability trust faith hope love peace joy happiness fulfillment contentment satisfaction gratitude appreciation recognition respect honor dignity value meaning purpose direction vision mission aspiration ambition motivation inspiration creativity innovation transformation evolution revolution revelation enlightenment awakening ascension transcendence ultimate destination final resting place journey pilgrimage quest odyssey adventure exploration discovery learning growth expansion contraction oscillation vibration resonance frequency wavelength spectrum colors light dark matter antimatter particles waves quantum mechanics physics chemistry biology psychology sociology anthropology philosophy theology metaphysics spirituality mysticism esotericism occultism paranormal supernatural phenomena mysteries unsolved enigmas puzzles riddles questions answers knowledge
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值