协变量选择
Stepwise approach
这里只介绍基于AIC的stepwise方法,还有基于p值的方法,太繁琐了,还不如用AIC,所以不做赘述。
library(survival)
library(MASS)
#stepAIC函数在该包中
Rossi=read.table(choose.files(),header=TRUE)
#rossi是txt文件,header识别第一行为变量
attach(Rossi)
S=Surv(week,arrest)
Scope=list(upper=~fin+age+race+wexp+mar+paro+prio+factor(educ),lower=~1)
#若加上^2则考虑2 way interaction,此处未考虑interaction
phm_0=coxph(S~1)
phm_f=stepAIC(phm_0,Scope,direction = "both")
Start: AIC=1350.76
S ~ 1
Df AIC
+ age 1 1337.5
+ prio 1 1340.8
+ wexp 1 1343.1
+ factor(educ) 4 1347.0
+ mar 1 1348.1
+ fin 1 1348.9
<none> 1350.8
+ race 1 1352.2
+ paro 1 1352.4
Step: AIC=1337.49
S ~ age
Df AIC
+ prio 1 1329.1
+ wexp 1 1336.4
+ fin 1 1336.5
<none> 1337.5
+ mar 1 1337.5
+ factor(educ) 4 1337.6
+ paro 1 1338.6
+ race 1 1338.9
- age 1 1350.8
Step: AIC=1329.08
S ~ age + prio
Df AIC
+ fin 1 1327.7
+ mar 1 1329.0
<none> 1329.1
+ race 1 1329.9
+ wexp 1 1330.0
+ paro 1 1330.9
+ factor(educ) 4 1332.2
- prio 1 1337.5
- age 1 1340.8
Step: AIC=1327.71
S ~ age + prio + fin
Df AIC
+ mar 1 1327.3
<none> 1327.7
+ race 1 1328.2
+ wexp 1 1328.6
- fin 1 1329.1
+ paro 1 1329.4
+ factor(educ) 4 1330.5
- prio 1 1336.5
- age 1 1338.3
Step: AIC=1327.35
S ~ age + prio + fin + mar
Df AIC
<none> 1327.3
- mar 1 1327.7
+ race 1 1328.2
+ wexp 1 1328.8
- fin 1 1329.0
+ paro 1 1329.2
+ factor(educ) 4 1330.2
- age 1 1335.4
- prio 1 1336.2
剩下的只有race, wexp, paro,接下来可以对它们中的一些进行scale check,从而得到最优的模型
Purposeful approach
这种方法主客观结合,先对每个协变量做univariate fit,然后选出有影响的几个合起来做multivariate fit,之后可以检测其interaction,以及scale的调整,太多了就不过多论述,直接放过程如下:
addicts=read.table(choose.files(),header=TRUE)
library(survival)
attach(addicts)
S=Surv(survt,status)
phmc=coxph(S~clinic)
phmp=coxph(S~prison)
phmd=coxph(S~dose)
summary(phmc)
summary(phmp)
summary(phmd)
#univariate fits,drop prison
phm_m1=coxph(S~clinic+dose)
summary(phm_m1)
#multivariate fit,没出现p值突然升高,所以判断没有interaction
phm_m2=coxph(S~clinic+I(dose^2))
phm_m3=coxph(S~clinic+I(dose^(-1)))
phm_m4=coxph(S~clinic+I(sqrt(dose)))
phm_m5=coxph(S~clinic+I(log(dose)))
phm_m6=coxph(S~clinic+dose+I(dose^2)+I(dose^3))
extractAIC(phm_m1)[2]
extractAIC(phm_m2)[2]
extractAIC(phm_m3)[2]
extractAIC(phm_m4)[2]
extractAIC(phm_m5)[2]
extractAIC(phm_m6)[2]
#m2表现最好
phm_m7=coxph(S~clinic+dose+I(dose^2))
extractAIC(phm_m7)[2]
#m7不行
phm_m8=coxph(S~clinic*I(dose^2))
summary(phm_m8)
#尝试interaction,效果不好
综上最佳的模型是phm_m2的形式。