20190718
CopywriteR
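- mappability: About 8% of the human genome is unassembled (hg19). Repetitive sequences have mappability problems when aligned to the reference sequence. Unmappable regions caused by repeats are not only an obstacle to achieving 100% sequencing; because they tend to cause genomic instability, they also cast doubt on the concept of a reference genome itself. Even the simplest tandem repeats are underrepresented in the reference genome, show high levels of copy number variation, affect the expression of related genes, and introduce heterochromatin that silences nearby genes. CNV genotyping is a goal of many NGS applications in complex human disease research, forensics, and disease markers, but one should keep in mind that the uncertain repetitive regions of the reference genome are themselves prone to CNV.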
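- GC-content normalization provides smooth profiles that can be further segmented and analyzed in order to predict CNAs.
- Technically, both the PCR efficiency of a target sequence and the binding of probes to that sequence depend on the sequence's GC content. This is a problem especially when no normal sample is available. The technical bias can be addressed by testing matched normal samples, but copy number effects still show a relationship with GC content. Replication timing during the cell cycle appears to be related to GC content. Because tumor tissue cycles faster than normal tissue, these effects are visible in the CNA profile. GC content is easier to obtain than replication timing, which is why GC-content normalization is commonly used.
- For the CopywriteR function,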
controls are specified as those samples that will be used to identify which regions are 'peaks' and contain on-target reads. This information will then be used to remove on-target reads in the corresponding sample. (Remove on-target reads? Use only off-target reads?)
- We make sure that in the plotCNA function we can analyze the tumor samples relative to the corresponding germline samples. We recommend identifying on-target and off-target regions based on a germline sample if possible, as this would avoid identifying highly amplified genomic regions in tumor cells as on-target regions. Nevertheless, we have observed that this effect is negligible in practice, and that CopywriteR analysis without a reference is still highly accurate.
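A minimal sketch of what GC-content normalization of binned read depth can look like. This is a simple median-per-GC-stratum correction written for illustration, not CopywriteR's own procedure; the function name `gc_normalize` and the 5% stratum width are assumptions.

```python
from collections import defaultdict
from statistics import median


def gc_normalize(depths, gc_fractions, stratum_pct=5):
    """Rescale each bin's depth by the median depth of bins with similar GC.

    depths        read depth per genomic bin
    gc_fractions  GC fraction (0..1) of each bin, same order as depths
    stratum_pct   width of a GC stratum in percent (illustrative default)
    """
    if len(depths) != len(gc_fractions):
        raise ValueError("depths and gc_fractions must have the same length")
    if not depths:
        return []

    def stratum(gc):
        # Work in whole percent to avoid floating-point edge effects.
        return int(round(gc * 100)) // stratum_pct

    # Collect the depths observed in each GC stratum.
    by_stratum = defaultdict(list)
    for depth, gc in zip(depths, gc_fractions):
        by_stratum[stratum(gc)].append(depth)

    stratum_median = {key: median(values) for key, values in by_stratum.items()}
    global_median = median(depths)

    # Depth relative to the typical depth at that GC, scaled back to the
    # global median so the corrected values stay on the original scale.
    normalized = []
    for depth, gc in zip(depths, gc_fractions):
        typical = stratum_median[stratum(gc)]
        normalized.append(depth * global_median / typical if typical > 0 else 0.0)
    return normalized


# Toy data: depth rises and falls with GC although no copy number change exists.
corrected = gc_normalize([50, 50, 100, 100, 80, 80],
                         [0.30, 0.31, 0.40, 0.42, 0.60, 0.61])
print(corrected)
```

After correction the GC-driven trend is flattened, so what remains in the profile is closer to real copy number signal and easier to segment.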
GISTIC2
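GISTIC identifies significant CNVs through two key steps. 1) The method computes a statistic (the G-score) that combines the frequency of a CNV (the frequency of a given alteration among all copy number alterations across the whole genome) with its amplitude. Each region in each sample has its own amplitude; aggregated together, these give the G-score. 2) It assesses the statistical significance of each CNV by comparing the observed statistic with what would be expected by chance. Multiple hypothesis testing is handled with the false discovery rate (FDR), and each result is assigned a q-value (smaller is better; cutoff at 0.25?) that reflects the likelihood that the event is due to random fluctuation. 3) Based on the G-scores and q-values, the significantly altered CNVs in the samples can then be identified. The G-score has a corresponding significance threshold.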
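The two steps above can be sketched as follows. This is a simplified illustration, not the GISTIC2 implementation: the G-score here is frequency times mean amplitude among altered samples, the amplitude threshold of 0.1 is an assumed value, and the p-values are taken as given (GISTIC derives its own null distribution). The q-values use the standard Benjamini-Hochberg procedure.

```python
def g_score(log2_ratios, threshold=0.1):
    """Frequency of amplification times mean amplitude among amplified samples.

    log2_ratios  one copy number log2 ratio per sample for a single region
    threshold    amplitude above which a sample counts as amplified (assumed)
    """
    altered = [x for x in log2_ratios if x > threshold]
    if not altered:
        return 0.0
    frequency = len(altered) / len(log2_ratios)
    mean_amplitude = sum(altered) / len(altered)
    return frequency * mean_amplitude


def benjamini_hochberg(p_values):
    """Convert p-values to FDR q-values (Benjamini-Hochberg)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Toy example: one region across four samples, then four regions' p-values.
print(g_score([0.5, 0.0, 1.5, 0.05]))
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
# 0.25 is the tentative cutoff mentioned in the notes above.
significant = [value <= 0.25 for value in q]
print(q, significant)
```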
The GISTIC result plot can be used to tell the story: the locations of the peak regions and the known cancer-related genes within those peaks are indicated to the right of each panel. Several broad regions, including chr7 and chr10, contain superimposed focal events, leading to needle-shaped peaks superimposed on highly significant plateaus.
GISTIC 2.0 can computationally classify all CNVs as arm-level or focal SCNVs.
- GISTIC2: For each plot, known or interesting candidate genes are highlighted in black when identified by all three analyses, in red when identified by the high amplitude or focal length analyses, in purple when identified by the low amplitude or focal length analyses, and in green when identified only in the focal length analysis.
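- Drawing the CNV landscape (plotting): 1) Obtain the GISTIC scores for the subtype of interest. 2) Prepare chromosome information: using the length of each chromosome, place all chromosomes on one shared coordinate axis. 3) Plot: draw the GISTIC score and percentage/frequency landscapes for all samples.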
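Step 2 above (putting all chromosomes on one axis) comes down to cumulative offsets. A minimal sketch, with hypothetical helper names and toy chromosome lengths rather than real hg19 values:

```python
def cumulative_offsets(chrom_lengths):
    """Return each chromosome's start on a shared axis, plus the total length.

    chrom_lengths  ordered list of (chromosome name, length) pairs
    """
    offsets = {}
    running_total = 0
    for chrom, length in chrom_lengths:
        offsets[chrom] = running_total
        running_total += length
    return offsets, running_total


def to_genome_coordinate(chrom, position, offsets):
    """Map a within-chromosome position onto the shared genome-wide axis."""
    return offsets[chrom] + position


# Toy lengths for illustration only.
offsets, genome_length = cumulative_offsets(
    [("chr1", 1000), ("chr2", 800), ("chr3", 600)]
)
print(to_genome_coordinate("chr2", 50, offsets), genome_length)
```

Each segment's start and end are shifted by its chromosome's offset before plotting, and the offsets themselves mark where to draw chromosome boundaries and labels.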
ABSOLUTE
install.packages("/Users/wangchen/Desktop/local_R_deal/ABSOLUTE_1.0.6.tar.gz", repos = NULL, type = "source")