题目
数据内容
Using the data on the ocurrence of AIDS or death in HIV positive patients, from the AIDS clinical trials group.
变量介绍
The columns in this dataset are(only introduce the variables that are used here):
time: time to AIDS or death(days)
censor: censoring variable
tx: treatment
strat2: CD4 stratum at screening(0 if CD4<=50, 1 if CD4>50)
sex: sex
raceth: race——1 if white and non-Hiispanic; 2 if black and non-Hispanic; 3 if Hispanic; 4 if Asian or Pacific islander; 5 if native American; 6 if other.
ivdrug: IV druguse history(1 if never, 2 if currently, 3 if previously)
hemophil: heamophiliac(1 if yes, 2 if no)
karnof: karnofsky performance scale(100 indicates no evidence of disease, 90 minor symptoms, 80 some symptoms, 70 active work impossible)
priorzdv: months of prior use of ZDV(another drug)
age: age at enrollment.
画图
在同一张图上绘制KM curves 和 survival function using PHM.
library(survival)
library(survminer)
library("nnet")
rm(list = ls())
actg320=read.csv(file.choose())
#读取数据,注意:可以自行定义每列变量名称,在read.csv中使用col.names=c()
attach(actg320)
phm.trt=coxph(Surv(time,censor)~tx)
d.phm.trt=coxph.detail(phm.trt)
times=c(0,d.phm.trt$time)
h0=c(0,d.phm.trt$hazard)
S0=exp(-cumsum(h0))
beta=phm.trt$coefficients
meanx=mean(tx)
x1=c(0)-meanx
Sx1=S0^exp(t(beta)%*%x1)#t()是转置
x2=c(1)-meanx
Sx2=S0^exp(t(beta)%*%x2)
#binary variable 比较简单,直接算就行了不需要变换
xlb='t'; ylb=expression(hat(S)(t))
k=survfit(Surv(time,censor)~tx)
plot(k,col=1:2,lty=2,xlab='t',ylab=expression(hat(S)(t)))
#KM estimator画图
lines(times,Sx1,col=1,type='s')
lines(times,Sx2,col=2,type='s')
#PHM画图
legend("bottomright",1,c('control','IDV'),lty=1,col=c(1,2),bty='n')
detach(actg320)
结果如下:
summary() 输出解读
Concentrating on “blacks” and “Hispanics”.
black=raceth[raceth==2]
t1=time[raceth==2]
censor1=censor[raceth==2]
hispanics=raceth[raceth==3]
t2=time[raceth==3]
censor2=censor[raceth==3]
# 分别取出black和hispanics的数据
t=c(t1,t2)
cens=c(censor1,censor2)
x=c(black,hispanics)
phm.race=coxph(Surv(t,cens)~x)
summary(phm.race)
Call:
coxph(formula = Surv(t, cens) ~ x)
n= 530, number of events= 41
coef exp(coef) se(coef) z Pr(>|z|)
x 0.4055 1.5001 0.3127 1.297 0.195
exp(coef) exp(-coef) lower .95 upper .95
x 1.5 0.6666 0.8127 2.769
Concordance= 0.555 (se = 0.04 )
Likelihood ratio test= 1.66 on 1 df, p=0.2
Wald test = 1.68 on 1 df, p=0.2
Score (logrank) test = 1.7 on 1 df, p=0.2
结果显示:
首先p值0.195>0.05, 故可以95%的置信水平选择不拒绝原假设,两者的survival function无显著差异。
其次两者的hazard ratio应该是
e
β
(
x
1
−
x
2
)
e^{\beta (x_1-x_2)}
eβ(x1−x2) ,因为
x
1
,
x
2
x_1,x_2
x1,x2分别取2和3,故
H
R
=
e
−
β
=
0.6666
HR=e^{-\beta}=0.6666
HR=e−β=0.6666。
最后,对于HR的confidence interval,无法从output中直接获取,但可以通过已有的量手动计算:
首先通过公式
β
⌢
i
−
β
⌢
j
±
1.96
σ
i
i
+
σ
j
j
−
2
σ
i
j
\overset{\frown} \beta_i-\overset{\frown} \beta_j\pm1.96\sqrt{\sigma_{ii}+\sigma_{jj}-2\sigma_{ij}}
β⌢i−β⌢j±1.96σii+σjj−2σij算出95% confidence interval of the difference
> phm.raceth=coxph(Surv(time,censor)~factor(raceth))
> phm.raceth$coefficients
factor(raceth)2 factor(raceth)3 factor(raceth)4 factor(raceth)5
-0.22574649 0.19012337 1.09181651 0.05463501
> phm.raceth$var
[,1] [,2] [,3] [,4]
[1,] 0.06729250 0.01961863 0.01966865 0.01959378
[2,] 0.01961863 0.06961828 0.01962886 0.01961250
[3,] 0.01966865 0.01962886 0.35309826 0.01961906
[4,] 0.01959378 0.01961250 0.01961906 1.01964539
得到置信区间
−
0.226
−
0.19
±
0.0696
+
0.3531
−
2
∗
0.0196
-0.226-0.19\pm\sqrt{0.0696+0.3531-2*0.0196}
−0.226−0.19±0.0696+0.3531−2∗0.0196
即[-1.0353,0.2033]
HR的置信区间则是
[
e
−
1.0353
,
e
0.2033
]
=
[
0.355
,
1.225
]
[e^{-1.0353},e^{0.2033}]=[0.355,1.225]
[e−1.0353,e0.2033]=[0.355,1.225]