上一篇请见:MOVICS:分子分型一站式R包(01)
COMP Module
这部分就是探索不同亚型了。
生存分析
既然分成了5个亚型,肯定是要看看5个亚型的生存情况有没有区别,生存分析走一波:
# survival comparison
surv.brca <- compSurv(moic.res = cmoic.brca,
surv.info = surv.info,
convt.time = "m", # 把天变成月
surv.median.line = "h",
xyrs.est = c(5,10), # 计算5年和10年生存率
fig.name = "KAPLAN-MEIER CURVE OF CONSENSUSMOIC")
## --a total of 643 samples are identified.
## --removed missing values.
## --leaving 642 observations.
查看结果:
print(surv.brca)
## $fitd
## Call:
## survdiff(formula = Surv(futime, fustat) ~ Subtype, data = mosurv.res,
## na.action = na.exclude)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## Subtype=CS1 146 28 14.6 12.269 15.455
## Subtype=CS2 132 17 15.1 0.251 0.317
## Subtype=CS3 107 7 13.4 3.028 3.756
## Subtype=CS4 144 7 17.4 6.218 8.175
## Subtype=CS5 113 16 14.6 0.140 0.176
##
## Chisq= 22.3 on 4 degrees of freedom, p= 2e-04
##
## $fit
## Call: survfit(formula = Surv(futime, fustat) ~ Subtype, data = mosurv.res,
## na.action = na.exclude, error = "greenwood", type = "kaplan-meier",
## conf.type = "plain")
##
## n events median 0.95LCL 0.95UCL
## CS1 146 28 130 67.3 NA
## CS2 132 17 114 83.6 144
## CS3 107 7 NA 102.5 NA
## CS4 144 7 216 113.5 NA
## CS5 113 16 NA 97.2 NA
##
## $xyrs.est
## Call: survfit(formula = Surv(futime, fustat) ~ Subtype, data = mosurv.res)
##
## Subtype=CS1
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1825 23 23 0.658 0.0647 0.543 0.798
## 3650 4 4 0.509 0.0830 0.370 0.701
##
## Subtype=CS2
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1825 27 7 0.855 0.0541 0.755 0.968
## 3650 3 8 0.421 0.1286 0.231 0.766
##
## Subtype=CS3
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1825 19 4 0.851 0.0701 0.724 1.000
## 3650 5 3 0.644 0.1183 0.449 0.923
##
## Subtype=CS4
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1825 25 4 0.917 0.0443 0.834 1
## 3650 3 2 0.550 0.2027 0.267 1
##
## Subtype=CS5
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1825 25 12 0.821 0.0502 0.728 0.926
## 3650 3 4 0.503 0.1599 0.269 0.938
##
##
## $overall.p
## [1] 0.000172211
##
## $pairwise.p
##
## Pairwise comparisons using Log-Rank test
##
## data: mosurv.res and Subtype
##
## CS1 CS2 CS3 CS4
## CS2 0.11909 - - -
## CS3 0.01345 0.11909 - -
## CS4 0.00055 0.03497 0.72167 -
## CS5 0.16073 0.87433 0.16073 0.03022
##
## P value adjustment method: BH
比较临床特征
查看每个亚型的临床特征,类似于基线资料表。
clin.brca <- compClinvar(moic.res = cmoic.brca,
var2comp = surv.info, #需要比较的临床信息,行名须是样本名
strata = "Subtype", # 分层变量,这里肯定是分型了
factorVars = c("PAM50","pstage","fustat"), #分类变量名字
nonnormalVars = "futime", #非正态的连续性变量
exactVars = "pstage", #需要使用精确概率法的变量
doWord = TRUE, # 自动生成Word文档
tab.name = "SUMMARIZATION OF CLINICAL FEATURES")
## --all samples matched.
## Registered S3 methods overwritten by 'proxy':
## method from
## print.registry_field registry
## print.registry_entry registry
查看结果:
print(clin.brca$compTab)
## level CS1
## 1 n 146
## 2 fustat (%) 0 118 (80.8)
## 3 1 28 (19.2)
## 4 futime (median [IQR]) 719.00 [431.50, 1355.00]
## 5 PAM50 (%) Basal 2 ( 1.4)
## 6 Her2 38 (26.0)
## 7 LumA 47 (32.2)
## 8 LumB 53 (36.3)
## 9 Normal 6 ( 4.1)
## 10 pstage (%) T1 40 (27.4)
## 11 T2 85 (58.2)
## 12 T3 12 ( 8.2)
## 13 T4 9 ( 6.2)
## 14 TX 0 ( 0.0)
## 15 age 58.47 ± 13.75
## CS2 CS3 CS4
## 1 133 107 144
## 2 115 (86.5) 100 (93.5) 137 (95.1)
## 3 18 (13.5) 7 ( 6.5) 7 ( 4.9)
## 4 746.50 [432.25, 1420.25] 703.00 [436.00, 1270.50] 848.50 [470.50, 1547.75]
## 5 0 ( 0.0) 0 ( 0.0) 0 ( 0.0)
## 6 0 ( 0.0) 0 ( 0.0) 0 ( 0.0)
## 7 78 (58.6) 91 (85.0) 128 (88.9)
## 8 55 (41.4) 16 (15.0) 0 ( 0.0)
## 9 0 ( 0.0) 0 ( 0.0) 16 (11.1)
## 10 35 (26.3) 32 (29.9) 40 (27.8)
## 11 77 (57.9) 63 (58.9) 72 (50.0)
## 12 17 (12.8) 10 ( 9.3) 31 (21.5)
## 13 4 ( 3.0) 2 ( 1.9) 1 ( 0.7)
## 14 0 ( 0.0) 0 ( 0.0) 0 ( 0.0)
## 15 59.32 ± 14.03 59.62 ± 12.40 57.58 ± 12.23
## CS5 p test
## 1 113
## 2 97 (85.8) 0.001
## 3 16 (14.2)
## 4 742.00 [470.00, 1605.00] 0.691 nonnorm
## 5 109 (96.5) <0.001
## 6 0 ( 0.0)
## 7 0 ( 0.0)
## 8 0 ( 0.0)
## 9 4 ( 3.5)
## 10 25 (22.1) 0.036 exact
## 11 70 (61.9)
## 12 14 (12.4)
## 13 3 ( 2.7)
## 14 1 ( 0.9)
## 15 55.67 ± 12.11 0.116
突变全景图
比较突变频率,也就是突变全景图,也是基于complexheatmap
画出来的,类似于maftools
:
# mutational frequency comparison
mut.brca <- compMut(moic.res = cmoic.brca,
mut.matrix = brca.tcga$mut.status, # 0/1矩阵
doWord = TRUE, # 生成Word文档
doPlot = TRUE, # draw OncoPrint
freq.cutoff = 0.05, # 保留在至少5%的样本中突变的基因
p.adj.cutoff = 0.05, # 保留padj<0.05的基因
innerclust = TRUE, # 在每个亚型中进行聚类
annCol = annCol, # same annotation for heatmap
annColors = annColors, # same annotation color for heatmap
width = 6,
height = 2,
fig.name = "ONCOPRINT FOR SIGNIFICANT MUTATIONS",
tab.name = "INDEPENDENT TEST BETWEEN SUBTYPE AND MUTATION")
## --all samples matched.
print(mut.brca)
## Gene (Mutated) TMB CS1 CS2 CS3 CS4
## 1 PIK3CA 208 (32%) 55 (37.7%) 10 ( 7.5%) 77 (72.0%) 64 (44.4%)
## 2 TP53 186 (29%) 83 (56.8%) 12 ( 9.0%) 0 ( 0.0%) 8 ( 5.6%)
## 3 TTN 111 (17%) 41 (28.1%) 14 (10.5%) 14 (13.1%) 22 (15.3%)
## 4 CDH1 83 (13%) 11 ( 7.5%) 3 ( 2.3%) 19 (17.8%) 49 (34.0%)
## 5 GATA3 58 ( 9%) 5 ( 3.4%) 32 (24.1%) 10 ( 9.3%) 11 ( 7.6%)
## 6 MLL3 49 ( 8%) 11 (7.5%) 12 (9.0%) 10 (9.3%) 11 (7.6%)
## 7 MUC16 48 ( 8%) 18 (12.3%) 8 ( 6.0%) 6 ( 5.6%) 8 ( 5.6%)
## 8 MAP3K1 38 ( 6%) 2 ( 1.4%) 6 ( 4.5%) 16 (15.0%) 12 ( 8.3%)
## 9 SYNE1 33 ( 5%) 9 (6.2%) 1 (0.8%) 5 (4.7%) 8 (5.6%)
## CS5 pvalue padj
## 1 2 ( 1.8%) 6.90e-42 3.10e-41
## 2 83 (73.5%) 2.06e-63 1.85e-62
## 3 20 (17.7%) 2.03e-03 3.04e-03
## 4 1 ( 0.9%) 5.38e-19 1.61e-18
## 5 0 ( 0.0%) 6.21e-11 1.40e-10
## 6 5 (4.4%) 6.31e-01 6.31e-01
## 7 8 ( 7.1%) 2.05e-01 2.31e-01
## 8 2 ( 1.8%) 3.65e-05 6.57e-05
## 9 10 (8.8%) 3.27e-02 4.20e-02
比较突变负荷
毋庸置疑,免疫治疗正在成为现代癌症治疗的支柱。最近的分析将肿瘤基因组景观与抗肿瘤免疫联系起来。特别是,新兴的研究显示,肿瘤特异性基因组损伤与免疫检查点激活以及患者对免疫治疗的反应程度和持续时间有关。这些损伤包括高突变负荷和非整倍体情况。为了定量这些可能影响免疫治疗的基因组改变,MOVICS提供了两个函数来计算总突变负荷(TMB)和基因组改变比例(FGA)。具体而言,TMB指的是在肿瘤基因组中发现的突变数量,而FGA是受复制数增减影响的基因组百分比。这两个属性对遗传学研究人员非常有用,因为它们为他们提供了更深入的关于肿瘤基因组构成的信息。我们从
compTMB()
开始。首先,用于此函数的输入maf
数据必须至少具有以下10列:
names(maf)
## [1] "Tumor_Sample_Barcode" "Hugo_Symbol" "Chromosome"
## [4] "Start_Position" "End_Position" "Variant_Classification"
## [7] "Variant_Type" "Reference_Allele" "Tumor_Seq_Allele1"
## [10] "Tumor_Seq_Allele2"
head(maf)
## Tumor_Sample_Barcode Hugo_Symbol Chromosome Start_Position End_Position
## 1 BRCA-A1XY-01A USP24 chr1 55159655 55159655
## 2 BRCA-A1XY-01A ERICH3 chr1 74571494 74571494
## 3 BRCA-A1XY-01A KIF26B chr1 245419680 245419680
## 4 BRCA-A1XY-01A USP34 chr2 61189055 61189055
## 5 BRCA-A1XY-01A ANTXR1 chr2 69245305 69245305
## 6 BRCA-A1XY-01A SCN9A chr2 166199365 166199365
## Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1
## 1 Missense_Mutation SNP T T
## 2 Missense_Mutation SNP C C
## 3 Silent SNP G G
## 4 Silent SNP G G
## 5 Silent SNP G G
## 6 Silent SNP G G
## Tumor_Seq_Allele2
## 1 C
## 2 T
## 3 T
## 4 C
## 5 A
## 6 A
关于肿瘤突变负荷,我之前也介绍过:1行代码计算肿瘤突变负荷TMB
这个函数可以一次计算不同亚型的TMB,同时展示碱基突变情况:
# compare TMB
tmb.brca <- compTMB(moic.res = cmoic.brca,
maf = maf,
rmDup = TRUE, # remove duplicated variants per sample
rmFLAGS = FALSE, # keep FLAGS mutations
exome.size = 38, # estimated exome size
test.method = "nonparametric", # statistical testing method
fig.name = "DISTRIBUTION OF TMB AND TITV")
## --67 samples mismatched from current subtypes.
## -Validating
## -Silent variants: 24329
## -Summarizing
## --Possible FLAGS among top ten genes:
## TTN
## MUC16
## -Processing clinical data
## --Missing clinical data
## -Finished in 3.230s elapsed (2.030s cpu)
## Kruskal-Wallis rank sum test p value = 1.97e-23
## post-hoc pairwise wilcoxon rank sum test with Benjamini-Hochberg adjustment presents below:
## CS1 CS2 CS3 CS4
## CS2 "4.71e-11" " NA" " NA" " NA"
## CS3 "5.11e-10" "7.79e-01" " NA" " NA"
## CS4 "1.68e-10" "5.93e-01" "7.79e-01" " NA"
## CS5 "3.18e-01" "1.23e-12" "1.27e-11" "1.27e-11"
查看具体结果:
head(tmb.brca$TMB.dat)
## samID variants TMB log10TMB Subtype
## 570 BRCA-A1EW-01A 9 0.2368421 0.09231426 CS1
## 558 BRCA-A1IO-01A 11 0.2894737 0.11041248 CS1
## 560 BRCA-A0C3-01A 11 0.2894737 0.11041248 CS1
## 541 BRCA-A1NG-01A 14 0.3684211 0.13621975 CS1
## 531 BRCA-A1Y2-01A 15 0.3947368 0.14449227 CS1
## 525 BRCA-A5RY-01A 16 0.4210526 0.15261016 CS1
比较FGA
FGA在上面解释过了。
compFGA()
不仅计算FGA,还会在每个亚型中针对每个样本计算特定的gain(FGG)或loss(FGL)。
需要修改列名为以下名字:
# change column names of segment data
colnames(segment) <- c("sample","chrom","start","end","value")
head(segment)
## sample chrom start end value
## 1 BRCA-A090-01A 1 3301765 54730235 -0.1271
## 2 BRCA-A090-01A 1 54730247 57443819 -0.0899
## 3 BRCA-A090-01A 1 57448465 57448876 -1.1956
## 4 BRCA-A090-01A 1 57448951 64426102 -0.1009
## 5 BRCA-A090-01A 1 64426648 106657734 -0.1252
## 6 BRCA-A090-01A 1 106657854 106835667 0.1371
然后运行即可:
# compare FGA, FGG, and FGL
fga.brca <- compFGA(moic.res = cmoic.brca,
segment = segment,
iscopynumber = FALSE, # this is a segmented copy number file
cnathreshold = 0.2, # threshold to determine CNA gain or loss
test.method = "nonparametric", # statistical testing method
fig.name = "BARPLOT OF FGA")
## --2 samples mismatched from current subtypes.
## 5% 10% 15% 21% 26% 31% 36% 41% 46% 51% 57% 62% 67% 72% 77% 82% 88% 93% 98%
head(fga.brca$summary)
## samID FGA FGG FGL Subtype
## 1 BRCA-A03L-01A 0.6217991 0.30867268 0.31312638 CS1
## 2 BRCA-A04R-01A 0.2531019 0.09132014 0.16178176 CS2
## 3 BRCA-A075-01A 0.7007067 0.41444237 0.28626433 CS1
## 4 BRCA-A08O-01A 0.6501287 0.45648145 0.19364725 CS3
## 5 BRCA-A0A6-01A 0.1468893 0.06356488 0.08332444 CS4
## 6 BRCA-A0AD-01A 0.1722214 0.03864521 0.13357618 CS2
比较药物敏感性
比较不同亚型的药物敏感性。基于pRRophetic
包。
药敏分析我们之前也介绍过了:
# drug sensitivity comparison
drug.brca <- compDrugsen(moic.res = cmoic.brca,
norm.expr=fpkm[,cmoic.brca$clust.res$samID],#确保样本顺序一致
drugs=c("Cisplatin", "Paclitaxel"), #选择药物
tissueType = "breast", # 选择组织类型
test.method = "nonparametric", # statistical testing method
prefix = "BOXVIOLIN OF ESTIMATED IC50")
## --all samples matched.
## --log2 transformation done for expression data.
## Cisplatin done...
## Cisplatin: Kruskal-Wallis rank sum test p value = 1.38e-58
## post-hoc pairwise wilcoxon rank sum test with Benjamini-Hochberg adjustment presents below:
## CS1 CS2 CS3 CS4
## CS2 "5.50e-18" " NA" " NA" " NA"
## CS3 "3.90e-13" "9.78e-02" " NA" " NA"
## CS4 "5.21e-01" "4.54e-24" "3.17e-18" " NA"
## CS5 "2.29e-13" "3.89e-34" "2.57e-30" "3.15e-14"
## Paclitaxel done...
## Paclitaxel: Kruskal-Wallis rank sum test p value = 1.91e-13
## post-hoc pairwise wilcoxon rank sum test with Benjamini-Hochberg adjustment presents below:
## CS1 CS2 CS3 CS4
## CS2 "8.11e-10" " NA" " NA" " NA"
## CS3 "3.87e-09" "8.56e-01" " NA" " NA"
## CS4 "9.12e-11" "6.59e-01" "8.55e-01" " NA"
## CS5 "2.82e-04" "5.20e-02" "4.92e-02" "1.95e-02"
查看具体结果:
head(drug.brca$Cisplatin)
## Est.IC50 Subtype
## BRCA-A03L-01A 3.802060 CS1
## BRCA-A075-01A 3.834646 CS1
## BRCA-A0AW-01A 3.126744 CS1
## BRCA-A0AZ-01A 3.303054 CS1
## BRCA-A0BC-01A 3.409402 CS1
## BRCA-A0BF-01A 3.358053 CS1
比较结果的一致性
目前,许多癌症都有传统的分类方法,评估新亚型与先前分类的一致性对于反映聚类分析的稳健性和确定潜在的新亚型至关重要。为了衡量当前亚型与其他已存在的分类之间的一致性(相似性),MOVICS提供了
compAgree()
函数来计算四个统计量:Rand指数(RI)、调整的互信息(AMI)、Jaccard指数(JI)和Fowlkes-Mallows指数(FM);所有这些度量范围从0到1,值越大,两个评估之间的相似度就越高。该函数还可以生成一张液态图(alluvial-diagram),以当前亚型作为参考,可视化两个评估与当前亚型的一致性。
# customize the factor level for pstage
surv.info$pstage <- factor(surv.info$pstage, levels = c("TX","T1","T2","T3","T4"))
# agreement comparison (support up to 6 classifications include current subtype)
agree.brca <- compAgree(moic.res = cmoic.brca,
subt2comp = surv.info[,c("PAM50","pstage")],
doPlot = TRUE,
box.width = 0.2,
fig.name = "AGREEMENT OF CONSENSUSMOIC WITH PAM50 AND PSTAGE")
## --all samples matched.
查看4个统计量:
print(agree.brca)
## current.subtype other.subtype RI AMI JI FM
## 1 Subtype PAM50 0.6929744 0.399563845 0.2910887 0.4694389
## 2 Subtype pstage 0.5506751 0.004951334 0.1565996 0.2884961
以上是第二部分的主要功能。