Study notes on CAP accreditation for NGS panels

fanyucai1

Article link: https://blog.csdn.net/fanyucai1/article/details/103986682


Genes can be broadly divided into two categories:

    GAD: Gene associated with Mendelian disorder; GADs include genes that meet criteria for definitive, strong, or moderate evidence for association with disease as described by ClinGen
    GUS: Gene of uncertain significance; GUSs include genes that meet the ClinGen categories of limited or disputed evidence

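The Clinical Genome Resource (ClinGen, www.clinicalgenome.org) covers a little over 600 genes (https://search.clinicalgenome.org/kb/gene-validity)[8]. The database classifies each gene separately for each disease. Genes classified as GAD must be included in the panel.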
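The GAD/GUS split above can be sketched as a small lookup. This is only an illustration: the record layout `(gene, validity)` and the gene names are hypothetical, not ClinGen output, and the rule "one GAD-level curation is enough to require the gene" is an assumption about how to apply the definition per disease.

```python
# ClinGen validity categories as listed in the two definitions above.
GAD_CATEGORIES = {"definitive", "strong", "moderate"}
GUS_CATEGORIES = {"limited", "disputed"}


def classify_gene(validity: str) -> str:
    """Map a ClinGen gene-disease validity category to GAD or GUS."""
    key = validity.strip().lower()
    if key in GAD_CATEGORIES:
        return "GAD"
    if key in GUS_CATEGORIES:
        return "GUS"
    return "unclassified"


def required_genes(curations):
    """Return genes that have at least one GAD-level curation (assumed rule)."""
    return sorted({gene for gene, validity in curations
                   if classify_gene(validity) == "GAD"})


# Hypothetical curation records, one per gene-disease pair.
example = [
    ("GENE_A", "Definitive"),
    ("GENE_B", "Limited"),
    ("GENE_A", "Disputed"),
    ("GENE_C", "Moderate"),
]
print(required_genes(example))  # ['GENE_A', 'GENE_C']
```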


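1: The next consideration is which variant types the assay detects: SNVs (required), indels (required), CNAs, and SVs. The panel must also cover the hotspot regions of each gene (for example, exons 9 and 20 of PIK3CA, exon 15 of BRAF, exons 18 to 21 of EGFR, or exons 12 and 14 of JAK2). You may additionally choose to cover the entire coding and non-coding regions of a few key genes (KRAS, NRAS, TP53).

2: If copy-number detection is in scope, several common events are clinically meaningful: losses of TP53, PTEN, CDKN2A, and RB1, and gains of ERBB2 (HER2), MET, RICTOR, and MDM2.

3: SV detection is mainly about gene fusions, for example RET/PTC, TMPRSS2/ERG, and EML4/ALK. Whether the input is DNA or RNA (ctDNA), the breakpoints fall in intronic regions, so it is recommended to extend the targets outward by at least 20bp when designing the panel.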
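The 20bp extension in item 3 can be sketched as padding each target interval and merging any overlaps this creates. The coordinates are hypothetical and the intervals are assumed to be 0-based, half-open (BED style); this is a minimal sketch, not a panel design tool.

```python
def pad_and_merge(intervals, pad=20):
    """Extend each (chrom, start, end) interval by `pad` bp on both sides,
    then merge intervals on the same chromosome that overlap or touch."""
    padded = sorted((c, max(0, s - pad), e + pad) for c, s, e in intervals)
    merged = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous interval: extend it instead of adding a new one.
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical target intervals.
targets = [("chr2", 100, 200), ("chr2", 230, 300), ("chr2", 10, 15)]
print(pad_and_merge(targets))  # [('chr2', 0, 35), ('chr2', 80, 320)]
```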

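4: Introns and exons can be given different probe tiling depths in the enrichment design.

5: Input titration: test a gradient of DNA input amounts. One paper used four levels (given as 75bp, 100bp, 150bp, and 200bp), for a total of 4X7 samples. After the test you need to state both the minimum input amount and the recommended input amount for NGS. A higher input generally gives a lower duplication rate, so the titration test should end with three figures along these lines:
[Figure: three plots from the input titration test; image not preserved]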
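6: Reproducibility

For CAP you generally have to do the sequencing yourself. The same sample can be sequenced three times, that is, in 3 runs. The variant allele frequencies of the chosen samples should span 0-0.7. In the example, 17 samples were examined and each sample was repeated in 3 independent experiments, for a total of 17X3X3=153 experiments.
After the experiments you should obtain results like the following:
[Figure: reproducibility results; image not preserved]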
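The experiment count in item 6 and a simple per-variant reproducibility measure can be sketched as follows. Reading 17X3X3 as samples x independent experiments x runs is an interpretation of the note, and the coefficient of variation is one common summary chosen here for illustration, not a metric named in the notes.

```python
from itertools import product
from statistics import mean, stdev


def experiment_grid(n_samples=17, n_replicates=3, n_runs=3):
    """Enumerate every (sample, replicate, run) combination."""
    return list(product(range(1, n_samples + 1),
                        range(1, n_replicates + 1),
                        range(1, n_runs + 1)))


def vaf_cv(vafs):
    """Coefficient of variation of one variant's allele frequency
    across replicate measurements (needs at least two values)."""
    m = mean(vafs)
    return stdev(vafs) / m if m > 0 else float("nan")


print(len(experiment_grid()))  # 153
# Hypothetical replicate VAFs for one variant.
print(round(vaf_cv([0.21, 0.19, 0.20]), 3))
```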
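7: Lower Limit of Detection

Twelve samples with tumor purity of 80%-100% were diluted to 100%, 50%, and 20%, again in triplicate, giving the following results:
[Figure: dilution series results; image not preserved]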
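The dilution series in item 7 can be sketched under one explicit assumption: the tumor DNA is diluted with DNA that does not carry the variant, so the expected allele frequency scales linearly with the dilution fraction. The helper names and the detection flags are hypothetical.

```python
DILUTIONS = (1.0, 0.5, 0.2)  # 100%, 50%, 20% as in the notes


def expected_vafs(undiluted_vaf, dilutions=DILUTIONS):
    """Expected VAF at each dilution, assuming the diluent lacks the variant."""
    return {d: round(undiluted_vaf * d, 4) for d in dilutions}


def lowest_fully_detected(detected):
    """`detected` maps dilution fraction -> list of booleans, one per replicate.
    Return the smallest dilution at which every replicate called the variant."""
    ok = [d for d, flags in detected.items() if flags and all(flags)]
    return min(ok) if ok else None


print(expected_vafs(0.4))  # {1.0: 0.4, 0.5: 0.2, 0.2: 0.08}
# Hypothetical triplicate detection results for one variant.
calls = {1.0: [True, True, True], 0.5: [True, True, True], 0.2: [True, False, True]}
print(lowest_fully_detected(calls))  # 0.5
```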
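8: Data traceability

FASTQ, BAM, VCF

9: Accepted sample types (see the expert consensus)

10: Description of the target regions: follow FoundationOne's description and present it as tables (Tables 2 and 3)
[Figure: example target-region tables; image not preserved]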

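11: An example of sample sequencing QC metrics [10]

12: For the interpretation of variants, see reference [6]

13: Current bioinformatics pipelines generally analyze indels of length <=21bp, according to reference [7]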
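The length limit in item 13 can be checked directly from the REF and ALT alleles of a call. This is a minimal sketch that assumes simple insertions and deletions written VCF style; complex substitutions of unequal length are treated the same way, which may not match a given pipeline.

```python
def indel_length(ref: str, alt: str) -> int:
    """Net number of inserted or deleted bases between REF and ALT."""
    return abs(len(ref) - len(alt))


def within_pipeline_range(ref: str, alt: str, max_len: int = 21) -> bool:
    """True for indels no longer than `max_len` bp (21bp per the notes)."""
    return 0 < indel_length(ref, alt) <= max_len


print(within_pipeline_range("A", "ATTG"))         # True, 3bp insertion
print(within_pipeline_range("A" + "C" * 30, "A"))  # False, 30bp deletion
print(within_pipeline_range("A", "G"))            # False, not an indel
```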

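14: When no real data is available, you can use BAMSurgeon https://github.com/adamewing/bamsurgeon/ to spike simulated variants into your data and use that to test your analysis pipeline first

15: Another paper is also an extremely valuable reference [11]; the notes below are taken from it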

1: On panel design: Baits were designed by taking overlapping 120 bp DNA sequence intervals covering target exons (60 bp overlap) and introns (20 bp overlap), with a minimum of three baits per target; SNP targets were allocated one bait each. Intronic baits were filtered for repetitive elements as defined by the UCSC Genome RepeatMasker track
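The tiling rule quoted above (120 bp baits, 60 bp overlap on exons, 20 bp overlap on introns, at least three baits per target) can be sketched as below. How the paper places baits on short targets is not stated in the quote, so centering the tiled span on the target is an assumption, as are the coordinates. Repeat filtering is not modeled.

```python
BAIT_LEN = 120


def tile_baits(start, end, overlap, min_baits=3, bait_len=BAIT_LEN):
    """Tile [start, end) with fixed-length baits that overlap by `overlap` bp.
    Always returns at least `min_baits` baits, centered on the target."""
    step = bait_len - overlap
    target_len = end - start
    # Baits needed to cover the target end to end (ceiling division).
    needed = -(-max(target_len - bait_len, 0) // step) + 1
    n = max(min_baits, needed)
    span = bait_len + (n - 1) * step
    # Assumption: center the tiled span on the target.
    first = start - (span - target_len) // 2
    return [(first + i * step, first + i * step + bait_len) for i in range(n)]


# Hypothetical 300 bp exon and 400 bp intronic target.
print(tile_baits(1000, 1300, overlap=60))
print(tile_baits(5000, 5400, overlap=20))
```

With these inputs the exon gets four baits stepping by 60 bp and the intron four baits stepping by 100 bp; a 100 bp exon would still receive three baits.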

2: This paper calls variants with GATK and filters SNPs and indels differently, which makes it a useful reference. For base substitutions: Final calls are made at MAF ≥ 5% (MAF ≥ 1% at hotspots). For indels: Filtering of indel candidates was carried out as described for base substitutions above (strand bias P < 1e-10, MAF ≥ 3% at hotspots), with an empirically increased MAF threshold at repeats and adjacent sequence quality metrics as implemented in GATK: percentage of neighboring base mismatches <25%, average neighboring base quality >25, average number of supporting read mismatches ≤2.
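The MAF thresholds quoted above can be sketched as a single check. Only the MAF cutoffs are modeled; strand bias and the GATK neighborhood metrics are left out. The quote does not give the non-hotspot indel threshold, so the function takes it as an explicit argument rather than guessing a value.

```python
SUB_MAF = 0.05            # base substitutions, non-hotspot
SUB_HOTSPOT_MAF = 0.01    # base substitutions, hotspot
INDEL_HOTSPOT_MAF = 0.03  # indels, hotspot


def passes_maf(kind, maf, hotspot, indel_maf=None):
    """Apply the MAF cutoff for a 'substitution' or 'indel' call."""
    if kind == "substitution":
        return maf >= (SUB_HOTSPOT_MAF if hotspot else SUB_MAF)
    if kind == "indel":
        if hotspot:
            return maf >= INDEL_HOTSPOT_MAF
        if indel_maf is None:
            # Not stated in the quoted text; the caller must supply it.
            raise ValueError("pass indel_maf for non-hotspot indels")
        return maf >= indel_maf
    raise ValueError(f"unknown variant kind: {kind!r}")


print(passes_maf("substitution", 0.02, hotspot=True))   # True
print(passes_maf("substitution", 0.02, hotspot=False))  # False
print(passes_maf("indel", 0.03, hotspot=True))          # True
```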

3: For gene fusions, the paper's filter requires support from at least 10 reads (clusters containing at least 10 chimeric pairs)
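The support filter can be sketched by counting chimeric pairs per candidate fusion. Grouping by unordered gene pair is a simplification made here for illustration; the paper clusters chimeric pairs, presumably by genomic position, and that step is not reproduced. The counts are hypothetical.

```python
from collections import Counter


def supported_fusions(chimeric_pairs, min_pairs=10):
    """Count chimeric read pairs per unordered gene pair and keep
    candidates with at least `min_pairs` supporting pairs."""
    counts = Counter(tuple(sorted(pair)) for pair in chimeric_pairs)
    return {pair: n for pair, n in counts.items() if n >= min_pairs}


# Hypothetical chimeric pairs: 12 supporting EML4/ALK, 3 supporting TMPRSS2/ERG.
pairs = [("EML4", "ALK")] * 8 + [("ALK", "EML4")] * 4 + [("TMPRSS2", "ERG")] * 3
print(supported_fusions(pairs))  # {('ALK', 'EML4'): 12}
```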

4: To assess cross-sample contamination, the paper selected SNPs that overlap the panel: 5,801 SNPs (marked coding-synonymous, missense, or nonsense), homozygous (MAF > 90%) or heterozygous (40% ≤ MAF ≤ 60%) state
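The two genotype states quoted above can be sketched as a classifier over per-SNP allele frequencies. How the paper turns these states into a contamination call is not in the quote, so the sketch only tallies states; the label `unassigned` and the example frequencies are hypothetical.

```python
from collections import Counter


def genotype_state(maf: float) -> str:
    """Assign a SNP to the homozygous or heterozygous state by MAF."""
    if maf > 0.90:
        return "homozygous"
    if 0.40 <= maf <= 0.60:
        return "heterozygous"
    return "unassigned"


def tally_states(mafs):
    """Count how many SNPs fall into each state."""
    return Counter(genotype_state(m) for m in mafs)


# Hypothetical allele frequencies at panel SNPs.
print(tally_states([0.97, 0.52, 0.45, 0.13, 0.91, 0.75]))
```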