●Somatic mutations include SNVs, indels, CNAs, SVs, etc.
Indels: short insertions and deletions
SNVs: single-nucleotide variants
CNAs: copy-number aberrations
SVs: structural variants
These somatic mutations range in scale from single-nucleotide variants (SNVs) and insertions and deletions of a few to a few dozen nucleotides (indels) to larger copy-number aberrations (CNAs) and large genome rearrangements, also called structural variants (SVs).
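The size scale above can be made concrete with a small classifier for VCF-style REF/ALT allele pairs. This is a minimal sketch under stated assumptions: `classify_small_variant` is a hypothetical helper, the 50 bp cut-off between indels and SV-scale events is a commonly used convention rather than something stated in these notes, and CNAs and most SVs cannot be detected from a single allele pair at all (they need read-depth, split-read or discordant-pair evidence).

```python
def classify_small_variant(ref: str, alt: str, max_indel_len: int = 50) -> str:
    """Classify a VCF-style REF/ALT pair as 'SNV', 'MNV', 'indel' or 'SV'.

    max_indel_len is an assumed convention: length changes above it are
    treated as structural-variant scale rather than indels.
    """
    ref, alt = ref.upper(), alt.upper()
    if not ref or not alt or set(ref + alt) - set("ACGTN"):
        raise ValueError(f"unsupported allele pair: {ref!r} -> {alt!r}")
    if ref == alt:
        raise ValueError("REF and ALT are identical; not a variant")
    if len(ref) == len(alt) == 1:
        return "SNV"  # one base substituted for another
    length_change = abs(len(ref) - len(alt))
    if length_change == 0:
        return "MNV"  # same length > 1: multi-nucleotide substitution
    if length_change <= max_indel_len:
        return "indel"  # short insertion or deletion
    return "SV"  # too large to call an indel under the assumed cut-off


print(classify_small_variant("A", "G"))     # SNV
print(classify_small_variant("A", "ATTG"))  # indel
```

Real pipelines also normalize (left-align and trim) alleles before classifying them; that step is omitted here.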
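●Challenges in analyzing high-throughput sequencing data:
1. Identifying somatic mutations (complicated by errors and tumor heterogeneity)
2. Identifying driver genes
3. Determining the pathways and other biological processes altered by somatic mutations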
●Sources of error (false positives and false negatives):
PCR duplicates, GC bias, strand bias, alignment difficulties
The process of detecting somatic mutations from aligned reads is not straightforward. Numerous errors and artifacts are introduced during both the sequencing and the alignment processes, including optical or PCR duplicates, GC bias, strand bias (where reads indicating a possible mutation align to only one strand of DNA), and alignment artifacts resulting from low-complexity or repetitive regions in the genome.
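Strand bias, as defined above, can be checked with a 2x2 table of reference/alternate read counts on the forward and reverse strands. The sketch below uses a two-sided Fisher's exact test, a common choice but an assumption here; the function names and the `alpha` threshold are hypothetical, and real callers combine this with other filters.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x (margins fixed)
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table with the same margins that is no more likely than the observed one
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def strand_bias_test(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int,
                     alpha: float = 0.01) -> tuple[float, bool]:
    """Return (p-value, flagged); rows are alleles, columns are strands."""
    p = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return p, p < alpha


# Reference reads are balanced, but all 10 alternate reads come from the forward strand.
p, flagged = strand_bias_test(20, 20, 10, 0)
print(round(p, 4), flagged)
```

A candidate whose supporting reads sit on one strand while the reference reads are balanced gets a small p-value and is flagged as a likely artifact.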
●Detecting somatic mutations
It is advisable to analyze the data with several different variant-calling tools.
These differences demonstrate that the performance of methods can vary by dataset, and suggest that running multiple methods is advisable at present.
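One simple way to combine the output of several methods is a consensus (voting) rule: keep a variant only if at least `min_support` callers report it. This is a minimal sketch, not a method from the source; the caller names are hypothetical, and it assumes all callers' variants have already been normalized so the same mutation has the same (chrom, pos, ref, alt) representation.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt), assumed normalized


def consensus_calls(calls: Dict[str, Iterable[Variant]], min_support: int = 2) -> Set[Variant]:
    """Keep variants reported by at least min_support distinct callers."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    support: Counter = Counter()
    for variants in calls.values():
        support.update(set(variants))  # count each caller at most once per variant
    return {v for v, n in support.items() if n >= min_support}


calls = {
    "caller_a": [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")],
    "caller_b": [("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")],
    "caller_c": [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")],
}
print(sorted(consensus_calls(calls, min_support=2)))
```

Raising `min_support` trades sensitivity for precision; which setting is appropriate depends on the dataset, which is exactly why the notes recommend comparing methods.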
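Network analysis can overcome some limitations of pathway analysis, but current interaction networks provide only a partial picture of the interactions between proteins.
Network analysis therefore remains limited by the quality and coverage of the interaction networks.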
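GSEA (Gene Set Enrichment Analysis): the basic idea is to use predefined gene sets, usually derived from functional annotations or from the results of previous experiments. Genes are ranked by their degree of differential expression between two classes of samples, and the method then tests whether a predefined gene set is enriched at the top or the bottom of this ranked list. Because GSEA detects expression changes in gene sets rather than in individual genes, it can capture subtle but coordinated expression changes and is expected to give more informative results.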
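The core GSEA statistic is a running-sum enrichment score: walking down the ranked list, the score goes up at genes in the set and down at genes outside it, and the enrichment score is the maximum deviation from zero. The sketch below follows the weighted form (hits weighted by |score|^p); the function name and inputs are assumptions, and it omits the permutation test GSEA uses to assess significance.

```python
from typing import Dict, Set


def enrichment_score(gene_scores: Dict[str, float], gene_set: Set[str], p: float = 1.0) -> float:
    """Running-sum enrichment score; positive means enriched at the top of the ranking."""
    # Rank genes from most up-regulated to most down-regulated
    ranked = sorted(gene_scores.items(), key=lambda kv: kv[1], reverse=True)
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_norm = sum(abs(s) ** p for (_, s), hit in zip(ranked, hits) if hit)
    if not any(hits) or n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must overlap the list, not cover all of it, and have non-zero weight")
    running, best = 0.0, 0.0
    for (_, s), hit in zip(ranked, hits):
        if hit:
            running += abs(s) ** p / hit_norm  # step up, weighted by the gene's score
        else:
            running -= 1.0 / n_miss  # step down by a constant amount
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero (with its sign)
    return best


scores = {"A": 3.0, "B": 2.0, "C": 1.0, "D": -1.0, "E": -2.0, "F": -3.0}
print(enrichment_score(scores, {"A", "B"}))  # set sits at the top: positive score
```

A set concentrated at the bottom of the ranking produces a negative score, which corresponds to the "top or bottom" enrichment described above.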
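HotNet: designed to handle larger and more complex datasets. It uses a heat diffusion model to encode both the mutation frequency of each gene and the local topology of the interaction network.
To address the statistical challenges this raises, HotNet also introduces a number of new methods.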
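HotNet has been used to identify significant networks in acute myeloid leukemia, ovarian cancer, and other cancer types.
Sixteen significant gene networks have been identified, several of which involve known oncogenic pathways and genes, including the p53 and NOTCH pathways.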
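The heat diffusion idea behind HotNet can be sketched as follows: each gene starts with "heat" equal to its mutation frequency, heat spreads over the interaction network through the heat kernel exp(-tL) (L = D - A, the graph Laplacian), and genes that exchange enough heat in both directions are grouped into subnetworks. This is a simplified sketch in the spirit of HotNet, not its actual implementation: the function names, the values of `t` and `delta`, and the rule "keep an edge if min(E[i, j], E[j, i]) >= delta" are assumptions, and the statistical tests mentioned above are omitted.

```python
import numpy as np


def diffusion_kernel(adj: np.ndarray, t: float) -> np.ndarray:
    """Heat kernel exp(-t L) of an undirected graph, with L = D - A."""
    lap = np.diag(adj.sum(axis=1)) - adj
    w, v = np.linalg.eigh(lap)  # L is symmetric, so eigh gives an orthonormal eigenbasis
    return (v * np.exp(-t * w)) @ v.T  # V diag(exp(-t w)) V^T


def hot_subnetworks(adj, genes, mut_freq, t=0.5, delta=0.05):
    """Connected groups of genes that exchange at least delta heat (assumed rule)."""
    kernel = diffusion_kernel(np.asarray(adj, dtype=float), t)
    heat = np.asarray(mut_freq, dtype=float)
    exchanged = kernel * heat[np.newaxis, :]  # E[i, j]: heat reaching gene i from gene j
    strong = np.minimum(exchanged, exchanged.T) >= delta  # strong exchange in both directions
    np.fill_diagonal(strong, False)
    seen, components = set(), []
    for start in range(len(genes)):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:  # depth-first search over the thresholded graph
            i = stack.pop()
            comp.append(genes[i])
            for j in np.flatnonzero(strong[i]).tolist():
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        components.append(sorted(comp))
    return [c for c in components if len(c) > 1]  # drop isolated genes


adj = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
print(hot_subnetworks(adj, ["G1", "G2", "G3", "G4"], [0.5, 0.5, 0.0, 0.0]))
```

Here G1 and G2 are frequently mutated neighbors and form a "hot" subnetwork, while the unmutated G3-G4 pair exchanges no heat and is dropped. The kernel conserves total heat (each row sums to 1), which is why a gene's mutation frequency is redistributed over its neighborhood rather than amplified.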