http://bbs.sciencenet.cn/blog-252888-719677.html
生存分析(survival analysis)适合于处理时间-事件数据。例如中风病人从首次发病到两次复发,其中就涉及到时间和事件。此例中时间就是复发的时间间隔,事件就是是否复发。如果用普通的线性回归对复发时间进行分析,就需要去除那些没有复发的病人样本。如果用Logistic回归对是否复发进行分析,就没有用到时间这个因素。而生存分析同时考虑时间和事情这两个因素,效果会更好些。
在R语言中我们可以使用survival包进行生存分析,其中主要的函数功能罗列如下:
Surv:用于创建生存数据对象
survfit:创建KM生存曲线或是Cox调整生存曲线
survdiff:用于不同组的统计检验
coxph:构建COX回归模型
cox.zph:检验PH假设是否成立
survreg:构建参数模型
下面是使用一个实例来使用R中的生存分析函数,其中用到的数据集可以在 这里下载 。
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788 |
# Example from Survival Analysis- A Self-Learning Text, Third Edition
library
(survival
)
addicts
<- read.table
(
'ADDICTS.txt'
,
T
)
names
(addicts
)
<- c
(
'id'
,
'clinic'
,
'status'
,
'survt'
,
'prison'
,
'dose'
)
# 1. 估计生存函数,观察不同组间的区别
# 建立生存对象
Surv
(addicts
$survt
,addicts
$status
==
1
)
# 估计KM生存曲线
y
<- Surv
(addicts
$survt
,addicts
$status
==
1
)
kmfit1
<- survfit
(y
~
1
)
summary
(kmfit1
)
plot
(kmfit1
)
# 根据clinic分组估计KM生存曲线
kmfit2
<- survfit
(y
~addicts
$clinic
)
plot
(kmfit2
, lty
= c
(
'solid'
,
'dashed'
), col
=c
(
'black'
,
'blue'
),
xlab
=
'survival time in days'
,ylab
=
'survival probabilities'
)
legend
(
'topright'
, c
(
'Clinic 1'
,
'Clinic 2'
), lty
=c
(
'solid'
,
'dashed'
),
col
=c
(
'black'
,
'blue'
))
# 检验显著性
survdiff
(Surv
(survt
,status
)
~clinic
, data
=addicts
)
# 用strata来控制协变量的影响
survdiff
(Surv
(survt
,status
)
~ clinic
+strata
(prison
),data
=addicts
)
# 2. 用图形方法检验PH假设
plot
(kmfit2
,fun
=
'cloglog'
,xlab
=
'time in days using logarithmic
scale'
,ylab
=
'log-log survival'
, main
=
'log-log curves by clinic'
)
# 不平行,不符合PH假设
# 3. 构建COX PH回归模型
y
<- Surv
(addicts
$survt
,addicts
$status
==
1
)
coxmodel
<- coxph
(y
~ prison
+ dose
+ clinic
,data
=addicts
)
summary
(coxmodel
)
# 两模型选择
mod1
<- coxph
(y
~ prison
+ dose
+ clinic
,data
=addicts
)
mod2
<- coxph
(y
~ prison
+ dose
+ clinic
+ clinic
*prison
+ clinic
*dose
, data
=addicts
)
anova
(mod1
,mod2
)
stepAIC
(mod2
)
# 简洁模型更好
# 风险预测
predict
(mod1
,newdata
=pattern1
,
type
=
'risk'
)
# 4. 构建一个stratified Cox model.
# 当PH假设在clinic不成立,控制这个变量
mod3
<- coxph
(y
~ prison
+ dose
+
strata
(clinic
),data
=addicts
)
summary
(mod3
)
# 5.对PH假设进行统计检验
mod1
<- coxph
(y
~ prison
+ dose
+ clinic
,data
=addicts
)
cox.zph
(mod1
,transform
=rank
)
# P值小显示PH假设不符合
# 显示系数变化图
plot
(cox.zph
(mod1
,transform
=rank
),se
=
F
,var
=
'clinic'
)
# 6. 得到COX调整后生存曲线
mod1
<- coxph
(y
~ prison
+ dose
+ clinic
,data
=addicts
)
pattern1
<- data.frame
(prison
=
0
,dose
=
70
,clinic
=
2
)
summary
(survfit
(mod1
,newdata
=pattern1
))
plot
(survfit
(mod1
,newdata
=pattern1
),conf.int
=
F
)
mod3
<- coxph
(y
~ prison
+ dose
+
strata
(clinic
),data
=addicts
)
pattern2
<- data.frame
(prison
=
.46
,dose
=
60.40
)
plot
(survfit
(mod3
,newdata
=pattern2
),conf.int
=
F
)
# 7. 构建参数模型
modpar1
<- survreg
(Surv
(addicts
$survt
,addicts
$status
)
~
prison
+dose
+clinic
,data
=addicts
,
dist
=
'exponential'
)
summary(modpar1) 需要事先加载包 library(MASS) |