Discovering single nucleotide variants and indels from bulk and single-cell ATAC-seq
Paper link: https://www.biorxiv.org/content/10.1101/2021.02.26.433126v1.full
Background
1. Discovering genetic variants
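| | WGS | ATAC-seq |
|---|---|---|
| Characteristics | High cost; most reads come from non-regulatory, non-coding regions | Low cost; reads come from regulatory regions (where functional variants are found) |
| Use | Standard approach for genetic variant discovery | Not yet systematically evaluated |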
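Earlier studies have evaluated how single nucleotide variant detection methods perform on scRNA-seq data (see https://pubmed.ncbi.nlm.nih.gov/31744515/), but no study had systematically evaluated such methods on ATAC-seq data. The paper addresses this gap in two ways:
(1) It evaluates how well seven variant callers predict SNVs and indels from bulk ATAC-seq and scATAC-seq data.
(2) It integrates the outputs of these variant callers to build VarCA, a prediction tool with markedly better overall performance.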
Main results
1. Performance of the variant callers (top performers)
- SNV discovery: GATK, VarScan2, VarDict
- Indel discovery: GATK, VarScan2, Manta, VarDict
2. Performance of VarCA
| | bulk ATAC-seq | scATAC-seq |
|---|---|---|
| SNVs | precision 0.99, recall 0.95 | precision 0.98, recall 0.94 |
| indels | precision 0.93, recall 0.80 | precision 0.82, recall 0.82 |
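For reference, precision is the fraction of called variants that are real, and recall is the fraction of real variants that were called. The sketch below computes both for a toy call set; the helper name `precision_recall` and all variant records are hypothetical and do not come from the paper.

```python
def precision_recall(called, truth):
    """Return (precision, recall) for called variants against a truth set."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data: each variant is keyed by (chrom, pos, ref, alt).
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "GA"),
    ("chr2", 90, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "GA"),
    ("chr3", 7, "A", "C"),  # false positive: not in the truth set
}

precision, recall = precision_recall(called, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```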
- VarCA achieves substantially better performance than any individual method and its recalibrated quality scores can be used to filter for high confidence variants.
- Application of VarCA to single-cell ATAC-seq datasets could potentially reveal the presence of somatic mutations that are present in only some subsets of cells.
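Filtering on a quality score can be sketched in a few lines of plain Python. This is a minimal illustration only: it assumes the score sits in the standard QUAL column (the sixth tab-separated field) of the VCF, and the threshold of 30 and the example records are made up rather than taken from VarCA.

```python
def filter_by_quality(vcf_lines, min_qual):
    """Keep header lines and records whose QUAL is at least min_qual."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
        # "." marks a missing QUAL value; such records are dropped here.
        if qual != "." and float(qual) >= min_qual:
            kept.append(line)
    return kept


# Hypothetical records, not real VarCA output.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t48.0\t.\t.",
    "chr1\t250\t.\tC\tT\t3.5\t.\t.",
    "chr2\t40\t.\tG\tGA\t.\t.\t.",
]
high_confidence = filter_by_quality(example_vcf, min_qual=30.0)
```

A stricter threshold trades recall for precision, so the cutoff is best chosen against a validation set rather than fixed in advance.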
Limitations of VarCA:
(1) It only works on paired-end sequencing datasets.
(2) It only identifies SNVs and indels, not other variant types.
Procedure
1. Processing steps for bulk ATAC-seq and single-cell ATAC-seq data
| | bulk ATAC-seq | scATAC-seq |
|---|---|---|
| Step 1 | BWA-MEM: align paired-end reads to the reference genome | 10x Single Cell ATAC pipeline: alignment and clustering |
| Step 2 | samtools: filter reads | samtools, pysam: filter reads |
| Step 3 | MACS2: call peaks | MACS2: call peaks |
| Step 4 | VarCA: variant detection | VarCA: variant detection |
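Steps 1 to 3 of the bulk workflow can be sketched as a list of command lines. This is an illustration under assumptions, not the paper's exact commands: the file names, the MAPQ cutoff of 20, and the properly-paired filter (`-f 2`) are placeholders, and the intermediate sort is added only so the BAM is coordinate-ordered. The function builds the commands without running them. Step 4 is left out because VarCA is configured through its own repository.

```python
import shlex


def bulk_atac_commands(ref, reads1, reads2, sample, min_mapq=20):
    """Build (but do not run) command lines for alignment, filtering and peak calling."""
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    filtered_bam = f"{sample}.filtered.bam"
    return [
        # Step 1: align paired-end reads (the reference must already be indexed with `bwa index`).
        ["bwa", "mem", "-o", sam, ref, reads1, reads2],
        # Coordinate-sort the alignments and convert them to BAM.
        ["samtools", "sort", "-o", sorted_bam, sam],
        # Step 2: keep properly paired reads with MAPQ >= min_mapq (assumed thresholds).
        ["samtools", "view", "-b", "-f", "2", "-q", str(min_mapq),
         "-o", filtered_bam, sorted_bam],
        # Step 3: call peaks, treating the input as paired-end BAM.
        ["macs2", "callpeak", "-t", filtered_bam, "-f", "BAMPE",
         "-n", sample, "--outdir", f"{sample}_peaks"],
    ]


# Hypothetical input names.
commands = bulk_atac_commands("hg38.fa", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample")
for argv in commands:
    print(shlex.join(argv))
```

To execute the steps, each `argv` can be passed to `subprocess.run(argv, check=True)` on a machine where the tools are installed.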
2. Overview of VarCA
https://github.com/aryarm/varCA
- prepare subworkflow
  - runs multiple variant callers on aligned ATAC-seq reads
  - gathers the output of these callers together into a single dataset in variant call format (VCF)
- classify subworkflow
  - uses the output from the prepare subworkflow to predict variants within ATAC-seq peaks
  - outputs a new VCF file containing the predictions
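The idea of gathering several callers' output into one dataset restricted to peaks can be sketched as follows. This is not VarCA's code: the function names, the presence/absence features, and the toy calls are all hypothetical, and the real pipeline likely carries richer per-caller information. Peaks are assumed to use BED-style 0-based, half-open coordinates, and variant positions are 1-based as in VCF.

```python
def in_peaks(chrom, pos, peaks):
    """True if a 1-based position lies in any (chrom, start, end) BED-style peak."""
    # A 0-based half-open interval [start, end) covers 1-based positions start+1..end.
    return any(c == chrom and start < pos <= end for c, start, end in peaks)


def merge_calls(calls_by_caller, peaks):
    """Build one row per site inside a peak, with one boolean column per caller."""
    callers = sorted(calls_by_caller)
    sites = set().union(*calls_by_caller.values())
    table = {}
    for chrom, pos in sorted(sites):
        if in_peaks(chrom, pos, peaks):
            table[(chrom, pos)] = {
                caller: (chrom, pos) in calls_by_caller[caller] for caller in callers
            }
    return table


# Hypothetical toy inputs.
peaks = [("chr1", 50, 300)]
calls_by_caller = {
    "gatk": {("chr1", 100), ("chr1", 250)},
    "varscan2": {("chr1", 100), ("chr2", 40)},
    "vardict": {("chr1", 100), ("chr1", 250)},
}
table = merge_calls(calls_by_caller, peaks)
for site, row in table.items():
    print(site, row)
```

A table like this is the kind of input a downstream classifier could use to decide, site by site, whether a variant is present.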