Exploring hard-filter thresholds
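Thresholds recommended on the GATK4 website (a variant is filtered out when it matches any of the expressions below):
For SNPs:
QD < 2.0
MQ < 40.0
FS > 60.0
SOR > 3.0
MQRankSum < -12.5
ReadPosRankSum < -8.0
For indels:
QD < 2.0
ReadPosRankSum < -20.0
InbreedingCoeff < -0.8
FS > 200.0
SOR > 10.0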
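To make the filtering logic concrete, here is a minimal Python sketch that checks one variant's annotations against GATK's published hard-filtering cutoffs. It is not GATK's own implementation. The names `SNP_FILTERS`, `INDEL_FILTERS` and `failed_filters` are hypothetical, and treating a missing annotation as "does not fail" is an assumption made for this illustration.

```python
import operator

# Each entry is (annotation, comparison, cutoff); a variant is flagged
# when comparison(value, cutoff) is true.
SNP_FILTERS = [
    ("QD", operator.lt, 2.0),
    ("MQ", operator.lt, 40.0),
    ("FS", operator.gt, 60.0),
    ("SOR", operator.gt, 3.0),
    ("MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum", operator.lt, -8.0),
]
INDEL_FILTERS = [
    ("QD", operator.lt, 2.0),
    ("ReadPosRankSum", operator.lt, -20.0),
    ("InbreedingCoeff", operator.lt, -0.8),
    ("FS", operator.gt, 200.0),
    ("SOR", operator.gt, 10.0),
]


def failed_filters(annotations, filters):
    """Return the names of the annotations whose value trips its cutoff."""
    failed = []
    for name, compare, cutoff in filters:
        value = annotations.get(name)
        # Assumption: an annotation that is absent does not fail the variant.
        if value is not None and compare(value, cutoff):
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Made-up annotation values for a single SNP, for illustration only.
    snp = {"QD": 1.5, "MQ": 60.0, "FS": 10.0, "SOR": 1.2}
    print(failed_filters(snp, SNP_FILTERS))
```

A variant passes the hard filter when the returned list is empty. Keeping the list of failed annotations, rather than a single pass/fail flag, makes it easy to see which threshold removes the most calls.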
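See the original GATK4 page: https://software.broadinstitute.org/gatk/documentation/article?id=11097
These thresholds follow the recommendations on the GATK4 website, which are based on comparing the statistical distributions of annotation values for true versus false SNPs.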
One of the most helpful ways to approach hard-filtering is to visualize the distribution of annotation values for a truth set called using a particular pipeline. These distributions are shaped by both the pipeline methodology and the underlying physical properties of the sequence data; so for a given pairing of data generation technology + analysis pipeline, you can derive filtering thresholds based on what the distributions look like for the truth set.
Evaluation data source: a whole-genome trio from 1000 Genomes.
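The idea of deriving a threshold from the truth-set distribution can be sketched in a few lines of Python. This is only one simple way to do it, not the procedure GATK used. It assumes a lower-tail filter (such as QD, where low values are suspicious) and picks a cutoff that sacrifices at most a chosen fraction of true variants. The helper names `lower_tail_cutoff` and `fraction_below` are hypothetical, and the sample values are made up.

```python
def lower_tail_cutoff(truth_values, max_loss=0.01):
    """Pick a cutoff from the truth values so that at most `max_loss`
    of them fall strictly below it."""
    if not truth_values:
        raise ValueError("truth_values must not be empty")
    ordered = sorted(truth_values)
    # At most `index` values can lie strictly below ordered[index].
    index = min(int(max_loss * len(ordered)), len(ordered) - 1)
    return ordered[index]


def fraction_below(values, cutoff):
    """Fraction of values that a 'value < cutoff' filter would remove."""
    return sum(v < cutoff for v in values) / len(values)


if __name__ == "__main__":
    # Made-up QD values, for illustration only.
    truth_qd = [float(x) for x in range(1, 21)]
    false_qd = [0.5, 1.0, 2.0, 3.0, 4.0, 5.5, 7.0, 30.0]
    cutoff = lower_tail_cutoff(truth_qd, max_loss=0.25)
    print("cutoff:", cutoff)
    print("true variants removed:", fraction_below(truth_qd, cutoff))
    print("false variants removed:", fraction_below(false_qd, cutoff))
```

For an upper-tail annotation such as FS or SOR, the same approach applies with the comparison reversed. In practice the two distributions are usually plotted and inspected together before settling on a cutoff.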