haploid genome

http://seqanswers.com/forums/archive/index.php/t-10182.html

To my knowledge, FreeBayes differs significantly from other variant detection systems in common use in that it is not limited to the analysis of haploid or diploid individuals.

The GATK can be used to call variants on the sex chromosomes (X and Y) without explicit knowledge of the sex of the samples. In an ideal world, with perfect upfront data processing, we would get perfect genotypes on the sex chromosomes without knowing who is diploid on X and has no Y, and who is hemizygous on both. However, misalignment and mismapping affect these chromosomes especially, as their reference sequence is clearly of lower quality than that of the autosomal regions of the genome. Nevertheless, it is possible to get reasonably good SNP calls, even with simple data processing and basic filtering. Results with proper, full data processing as per the GATK best practices should lead to very good calls. You can view the presentation "The GATK Unified Genotyper on chrX and chrY" in the GSA Public Drop Box.

 

What I ended up doing was using GATK's UnifiedGenotyper, manually extracting the likelihoods for both of the homozygote genotypes, and calling a SNP if the likelihood of the alternative allele was above a certain amount higher than the likelihood of the reference allele (I believe I required the likelihood of the alt allele to be at least 3X greater than the ref allele, although I haven't tested extensively to find the best threshold).
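The thresholding described above can be sketched in a few lines of Python. This is only an illustration, not the poster's actual script. It assumes the likelihoods are read from a biallelic diploid PL field (Phred-scaled, ordered 0/0, 0/1, 1/1). The function names `extract_pl` and `haploid_call` and the sample record are made up for the example, and the 3X ratio is the untested threshold mentioned above.

```python
def extract_pl(format_field, sample_field):
    """Return the PL values from one VCF sample column as a list of ints."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    pl_text = dict(zip(keys, values))["PL"]
    return [int(x) for x in pl_text.split(",")]


def haploid_call(pl, min_ratio=3.0):
    """Call a haploid allele from a diploid PL triple (0/0, 0/1, 1/1).

    Only the two homozygous likelihoods are used. Since PL = -10*log10(L),
    the ratio L(1/1) / L(0/0) equals 10 ** ((PL_ref - PL_alt) / 10).
    Returns "1" (ALT) if that ratio reaches min_ratio, otherwise "0" (REF).
    """
    pl_ref, pl_alt = pl[0], pl[2]
    ratio = 10 ** ((pl_ref - pl_alt) / 10.0)
    return "1" if ratio >= min_ratio else "0"


# Hypothetical FORMAT and sample columns, for illustration only.
pl = extract_pl("GT:AD:DP:GQ:PL", "0/1:3,12:15:9:120,9,0")
print(haploid_call(pl))  # prints 1
```

A 3X likelihood ratio corresponds to a PL difference of only about 4.8, so this threshold is quite permissive. Raising `min_ratio` makes the ALT call more conservative.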

I have used FreeBayes on haploid sequences with good results; it is recommended.

http://www.broadinstitute.org/gsa/wiki/index.php/Understanding_the_Unified_Genotyper%27s_VCF_files

 

git clone --recursive git://github.com/ekg/freebayes.git

 

 

 

Field Meaning
GT The genotype of this sample. For a diploid, the GT field indicates the two alleles carried by the sample, encoded by a 0 for the REF allele, 1 for the first ALT allele, 2 for the second ALT allele, etc. When there's a single ALT allele (by far the more common case), GT will be one of:
  • 0/0 - the sample is homozygous reference
  • 0/1 - the sample is heterozygous, carrying 1 copy of each of the REF and ALT alleles
  • 1/1 - the sample is homozygous alternate

In the three examples above, NA12878 is T/G, G/G, and C/T.

GQ The Genotype Quality, a Phred-scaled confidence that the true genotype is the one provided in GT. In the diploid case, if GT is 0/1, then GQ is really L(0/1) / (L(0/0) + L(0/1) + L(1/1)), where L is the likelihood of the NGS sequencing data under the model that the sample is 0/0, 0/1, or 1/1.
AD and DP See the online documentation for AD and DP.
PL We provide the AD and DP fields since this is usually what downstream users want. However, truly sophisticated users will want to directly use the likelihoods of the three genotypes 0/0, 0/1, and 1/1 provided in the PL field. These are normalized, Phred-scaled likelihoods for each of 0/0, 0/1, and 1/1, without priors. To be concrete, for the het case, this is L(data given that the true genotype is 0/1). The most likely genotype (the one in GT) is scaled so that its P = 1.0 (0 when Phred-scaled), and the other likelihoods reflect their Phred-scaled likelihoods relative to this most likely genotype. Currently only provided when the site is biallelic.
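To make the PL scaling concrete, here is a small Python sketch that turns a PL triple back into relative likelihoods and then into the ratio L(GT) / (L(0/0) + L(0/1) + L(1/1)) used in the GQ description. The function names and PL values are made up for illustration, and the normalization assumes equal weight for the three genotypes, since PL carries no priors. It is not necessarily how GATK itself derives the reported GQ.

```python
def pl_to_relative_likelihoods(pl):
    """Convert Phred-scaled PL values to likelihoods relative to the best genotype.

    PL = -10 * log10(L / L_best), so the best genotype (PL = 0) maps to 1.0.
    """
    return [10 ** (-p / 10.0) for p in pl]


def genotype_fractions(pl):
    """Return L(g) / sum of L over all genotypes, for each genotype in PL order."""
    rel = pl_to_relative_likelihoods(pl)
    total = sum(rel)
    return [r / total for r in rel]


# Illustrative PL triple in the order 0/0, 0/1, 1/1; 0/1 is the most likely genotype.
pl = [30, 0, 20]
fractions = genotype_fractions(pl)
best = pl.index(min(pl))  # index 1, i.e. GT = 0/1
print(best, round(fractions[best], 4))
```

With these numbers the relative likelihoods are 0.001, 1.0, and 0.01, so the het genotype accounts for about 98.9% of the total likelihood.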