GATK filtering: VCF file filtering

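This post covers variant-calling issues with GATK on high-coverage samples and recommends HaplotypeCaller over UnifiedGenotyper. For VCF filtering, it gives suggested SNP and indel filter thresholds such as QD, MQ and FS, notes a parameter change in GATK v3.7 and the usual size range of indels, and shows the GATK parameters used in practice, e.g. `-stand_call_conf 30 -mbq 20 --minPruning 2 -nct 10`.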

1: Reference:

Li H. Towards better understanding of artifacts in variant calling from high-coverage samples[J]. Bioinformatics, 2014: btu356.

2: GATK offers two tools for calling SNPs, UnifiedGenotyper and HaplotypeCaller. HaplotypeCaller can now essentially replace UnifiedGenotyper; the reasons are quoted below:

The HaplotypeCaller is a more recent and sophisticated tool than the UnifiedGenotyper. Its ability to call SNPs is equivalent to that of the UnifiedGenotyper, its ability to call indels is far superior, and it is now capable of calling non-diploid samples. It also comprises several unique functionalities such as the reference confidence model (which enables efficient and incremental variant discovery on ridiculously large cohorts) and special settings for RNAseq data.

As of GATK version 3.3, we recommend using HaplotypeCaller in all cases, with no exceptions. (Quoted from an official reply by the GATK team.)

3: Suggested parameters for filtering VCF files: https://software.broadinstitute.org/gatk/guide/article?id=3225. These hard filters are mainly intended for cases where VQSR cannot be used:

For SNPs (a SNP is filtered if it meets any of the following conditions):

QD < 2.0
MQ < 40.0
FS > 60.0
SOR > 3.0
MQRankSum < -12.5
ReadPosRankSum < -8.0

If your callset was generated with UnifiedGenotyper for legacy reasons, you can add HaplotypeScore > 13.0.
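To make the threshold logic concrete outside GATK, here is a minimal Python sketch that parses a VCF INFO string and reports which of the SNP thresholds above it violates. The helper names `parse_info` and `failed_filters` are hypothetical, and treating a missing or non-numeric annotation as passing is an assumption of this sketch; the actual filtering is normally done with GATK VariantFiltration.

```python
# SNP hard-filter thresholds from the list above.
# annotation -> (operator, threshold); the SNP fails when the comparison is true.
SNP_FILTERS = {
    "QD": ("<", 2.0),
    "MQ": ("<", 40.0),
    "FS": (">", 60.0),
    "SOR": (">", 3.0),
    "MQRankSum": ("<", -12.5),
    "ReadPosRankSum": ("<", -8.0),
}


def parse_info(info_field):
    """Parse a VCF INFO column such as 'QD=1.5;FS=3.2;DB' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True  # flags such as DB carry no value
    return info


def failed_filters(info_field, filters):
    """Return the annotation names whose threshold is violated.

    Assumption (hypothetical helper): a missing or non-numeric annotation
    is treated as passing.
    """
    info = parse_info(info_field)
    failed = []
    for name, (op, threshold) in filters.items():
        raw = info.get(name)
        if raw is None or raw is True:
            continue
        try:
            value = float(raw)
        except ValueError:
            continue  # e.g. multi-allelic values like "1,2"
        if (op == "<" and value < threshold) or (op == ">" and value > threshold):
            failed.append(name)
    return failed


if __name__ == "__main__":
    info = "AC=1;QD=1.5;MQ=59.8;FS=75.0;SOR=0.9;MQRankSum=0.2;ReadPosRankSum=-1.0"
    print(failed_filters(info, SNP_FILTERS))
```

A SNP is kept only when the returned list is empty, which mirrors the "any condition fails the site" reading of the list.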

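In addition, I also add the two VariantFiltration options --clusterWindowSize 5 --clusterSize 2, which flag SNPs occurring in dense clusters: when many SNPs appear close together in one region, they may actually be artifacts of a deletion or insertion.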
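The cluster idea can be sketched as a sliding window over sorted SNP positions on one chromosome. This is a hypothetical re-implementation for illustration only: it assumes a run of `cluster_size` consecutive SNPs counts as a cluster when its first and last positions are at most `window` bp apart, and GATK's own boundary handling may differ.

```python
def clustered_snps(positions, cluster_size=2, window=5):
    """Return the set of SNP positions that fall inside a cluster.

    Hypothetical helper: positions are assumed to be on a single chromosome.
    A run of `cluster_size` consecutive SNPs is a cluster when
    last_position - first_position <= window.
    """
    pos = sorted(positions)
    flagged = set()
    # Slide a window of cluster_size consecutive SNPs along the sorted list.
    for i in range(len(pos) - cluster_size + 1):
        run = pos[i:i + cluster_size]
        if run[-1] - run[0] <= window:
            flagged.update(run)
    return flagged


if __name__ == "__main__":
    # With --clusterSize 2 --clusterWindowSize 5, the pairs 100/103 and 300/302 are flagged.
    print(sorted(clustered_snps([100, 103, 200, 300, 302, 310])))
```

With `cluster_size=2` and `window=5`, every SNP with a neighbour within 5 bp gets flagged, which is a fairly aggressive setting.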

For indels (an indel is filtered if it meets any of the following conditions):

QD < 2.0
ReadPosRankSum < -20.0
InbreedingCoeff < -0.8
FS > 200.0
SOR > 10.0
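In GATK 3.x these thresholds are usually passed to VariantFiltration as a single expression joined with `||`. The Python sketch below builds such a command line from the indel list. It assumes the input VCF already contains only indels (for example, extracted beforehand with SelectVariants), and the jar, reference and VCF paths as well as the filter name `indel_hard_filter` are placeholders, not values from the original post.

```python
import shlex

# Indel hard-filter thresholds from the list above: (annotation, operator, threshold).
INDEL_FILTERS = [
    ("QD", "<", 2.0),
    ("ReadPosRankSum", "<", -20.0),
    ("InbreedingCoeff", "<", -0.8),
    ("FS", ">", 200.0),
    ("SOR", ">", 10.0),
]


def filter_expression(filters):
    """Join the thresholds into one expression; a site failing any of them is filtered."""
    return " || ".join(f"{name} {op} {threshold}" for name, op, threshold in filters)


def variantfiltration_cmd(gatk_jar, reference, vcf_in, vcf_out, filters, filter_name):
    """Build a GATK 3.x VariantFiltration command line as an argument list."""
    return [
        "java", "-jar", gatk_jar,
        "-T", "VariantFiltration",
        "-R", reference,
        "-V", vcf_in,
        "--filterExpression", filter_expression(filters),
        "--filterName", filter_name,
        "-o", vcf_out,
    ]


if __name__ == "__main__":
    # All paths and the filter name are placeholders.
    cmd = variantfiltration_cmd(
        "GenomeAnalysisTK.jar", "reference.fasta",
        "raw_indels.vcf", "filtered_indels.vcf",
        INDEL_FILTERS, "indel_hard_filter",
    )
    print(shlex.join(cmd))
```

VariantFiltration does not delete failing records; it writes the filter name into their FILTER column, so they can still be inspected afterwards.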

4: In the reference, the GATK SNP-calling parameters were -stand_call_conf 30 -stand_emit_conf 10. The stand_emit_conf parameter no longer exists in GATK v3.7, the version I use.

I also recommend adding:

-minPruning: minimum support to not prune paths in the graph
-mbq: minimum base quality required to consider a base for calling
-nct: number of CPU threads to allocate per data thread
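Both -mbq and -stand_call_conf are phred-scaled values. Under the standard phred relation Q = -10 * log10(p), a threshold Q corresponds to an error probability p = 10^(-Q/10), so -mbq 20 means a base must have at most a 1% chance of being wrong, and a phred-scaled confidence of 30 corresponds to an error probability of 0.001. The small Python sketch below just performs that conversion; the function names are illustrative.

```python
import math


def phred_to_error_prob(q):
    """Convert a phred-scaled quality Q into an error probability: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Inverse conversion: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (20, 30):  # -mbq 20 and -stand_call_conf 30
        print(q, phred_to_error_prob(q))
```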

The parameters I use are: “-stand_call_conf 30 -mbq 20 --minPruning 2 -nct 10”.
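As a sketch of how those settings fit into a full GATK 3.x HaplotypeCaller invocation, the Python below assembles the command as an argument list. The jar name, reference, BAM and output paths are placeholders, and -stand_emit_conf is left out because, as noted above, it is not available in GATK v3.7.

```python
import shlex


def haplotypecaller_cmd(gatk_jar, reference, bam, out_vcf, threads=10):
    """Build a GATK 3.x HaplotypeCaller command with the parameters listed above.

    -stand_emit_conf is intentionally omitted (not available in GATK v3.7).
    """
    return [
        "java", "-jar", gatk_jar,
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-o", out_vcf,
        "-stand_call_conf", "30",  # minimum phred-scaled confidence to call a variant
        "-mbq", "20",              # minimum base quality to consider a base for calling
        "--minPruning", "2",       # minimum support to not prune paths in the graph
        "-nct", str(threads),      # CPU threads per data thread
    ]


if __name__ == "__main__":
    # Placeholder paths; replace with your own files.
    cmd = haplotypecaller_cmd("GenomeAnalysisTK.jar", "reference.fasta", "sample.bam", "raw_variants.vcf")
    print(shlex.join(cmd))
```

Keeping the command as a list means it can be handed to `subprocess.run(cmd, check=True)` without any shell quoting issues.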

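5: Indels are generally taken to be insertions or deletions of up to about 50 bp; larger events are usually treated as structural variants. For details, see:

Tattini L, D’Aurizio R, Magi A. Detection of genomic structural variants from next-generation sequencing data[J]. Frontiers in Bioengineering and Biotechnology, 2015, 3: 92.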
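The size convention can be illustrated by classifying a simple REF/ALT pair from a VCF record by its length difference. This is a hypothetical helper: the 50 bp cut-off follows the text above, but whether 50 bp itself counts as an indel or a structural variant varies between tools, and symbolic alleles such as `<DEL>` are not handled.

```python
def classify_variant(ref, alt, sv_min_size=50):
    """Classify a REF/ALT pair by length difference (hypothetical helper).

    Events of sv_min_size bp or more are labelled "SV"; the exact cut-off
    is a convention, not a fixed rule. Symbolic ALT alleles are not handled.
    """
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    size = abs(len(alt) - len(ref))
    return "indel" if size < sv_min_size else "SV"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("A", "AT"), ("A", "A" + "T" * 60)]:
        print(ref[:5], alt[:5], classify_variant(ref, alt))
```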
