Installing RTG Tools and using its vcfeval subcommand

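1. RTG Tools is a widely used toolkit for working with genomic data. Its vcfeval tool compares a call-set VCF against a truth-set (baseline) VCF, reporting true-positive and false-positive variants in the call set and false-negative variants (baseline variants missing from the calls).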
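These counts are usually summarized as precision, sensitivity (recall) and F-measure: precision = TP-call / (TP-call + FP), sensitivity = TP-baseline / (TP-baseline + FN), and F = 2PS / (P + S). vcfeval counts matched baseline variants (TP-baseline) and matched calls (TP-call) separately, so both are taken as inputs. The minimal sketch below computes the three metrics with awk; the function name vcf_metrics is hypothetical and the counts in the example call are made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: precision, sensitivity (recall) and F-measure from counts.
# Arguments: TP-baseline, TP-call, FP, FN (as counted by vcfeval).
vcf_metrics() {
  local tp_base=$1 tp_call=$2 fp=$3 fn=$4
  awk -v tb="$tp_base" -v tc="$tp_call" -v fp="$fp" -v fn="$fn" 'BEGIN {
    # Each ratio is guarded against a zero denominator.
    precision   = (tc + fp > 0) ? tc / (tc + fp) : 0  # correct calls / all calls
    sensitivity = (tb + fn > 0) ? tb / (tb + fn) : 0  # recovered / all baseline variants
    f = (precision + sensitivity > 0) ? 2 * precision * sensitivity / (precision + sensitivity) : 0
    printf "precision=%.4f sensitivity=%.4f f_measure=%.4f\n", precision, sensitivity, f
  }'
}

# Made-up counts, for illustration only
vcf_metrics 90 90 10 10   # prints: precision=0.9000 sensitivity=0.9000 f_measure=0.9000
```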

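2. Installing the software.

Check the versions of the RTG Tools dependencies:

Java 1.8 or later
Apache Ant 1.9 or later (only needed when building RTG Tools from source)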
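The versions can be checked from the shell. This is a sketch assuming bash and a sort that supports -V (e.g. GNU coreutils); the helper names version_ge and check_min are hypothetical, and the parsing assumes the usual output format of java -version and ant -version.

```bash
#!/usr/bin/env bash
# Succeeds if version $1 >= version $2 (relies on `sort -V` version ordering).
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether tool $1, detected at version $3, meets minimum version $2.
check_min() {
  if version_ge "$3" "$2"; then
    echo "$1 $3: OK (>= $2)"
  else
    echo "$1 $3: too old (need >= $2)"
  fi
}

if command -v java >/dev/null 2>&1; then
  # `java -version` prints to stderr, e.g.: openjdk version "1.8.0_292"
  java_ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  check_min java 1.8 "$java_ver"
else
  echo "java: not found"
fi

if command -v ant >/dev/null 2>&1; then
  # e.g.: Apache Ant(TM) version 1.10.12 compiled on ...
  ant_ver=$(ant -version 2>&1 | awk '{for (i = 1; i < NF; i++) if ($i == "version") {print $(i+1); exit}}')
  check_min ant 1.9 "$ant_ver"
else
  echo "ant: not found (only needed to build from source)"
fi
```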

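3. Install the software with conda, here into a new environment named rtg (any name works):

conda create -n rtg -c bioconda -c conda-forge rtg-tools
conda activate rtg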

4. View the vcfeval usage information:

rtg vcfeval -h
Usage: rtg vcfeval [OPTION]... -b FILE -c FILE -o DIR -t SDF

Evaluates called variants for genotype agreement with a baseline variant set irrespective of representational differences.
Outputs a weighted ROC file which can be viewed with rtg rocplot and VCF files containing false positives (called variants not
matched in the baseline), false negatives (baseline variants not matched in the call set), and true positives (variants that
match between the baseline and calls).

File Input/Output
  -b, --baseline=FILE           VCF file containing baseline variants
      --bed-regions=FILE        if set, only read VCF records that overlap the ranges contained in the specified BED file
  -c, --calls=FILE              VCF file containing called variants
  -e, --evaluation-regions=FILE if set, evaluate within regions contained in the supplied BED file, allowing transborder
                                matches. To be used for truth-set high-confidence regions or other regions of interest where
                                region boundary effects should be minimized
  -o, --output=DIR              directory for output
      --region=REGION           if set, only read VCF records within the specified range. The format is one of <sequence_name>,
                                <sequence_name>:<start>-<end>, <sequence_name>:<pos>+<length> or <sequence_name>:<pos>~<padding>
  -t, --template=SDF            SDF of the reference genome the variants are called against

Filtering
      --all-records             use all records regardless of FILTER status (Default is to only process records where FILTER is
                                "." or "PASS")
      --decompose               decompose complex variants into smaller constituents to allow partial credit
      --ref-overlap             allow alleles to overlap where bases of either allele are same-as-ref (Default is to only allow
                                VCF anchor base overlap)
      --sample=STRING           the name of the sample to select. Use <baseline_sample>,<calls_sample> to select different
                                sample names for baseline and calls. (Required when using multi-sample VCF files)
      --squash-ploidy           treat heterozygous genotypes as homozygous ALT in both baseline and calls, to allow matches that
                                ignore zygosity differences

Reporting
  -m, --output-mode=STRING      output reporting mode. Allowed values are [split, annotate, combine, ga4gh, roc-only] (Default
                                is split)
  -O, --sort-order=STRING       the order in which to sort the ROC scores so that "good" scores come before "bad" scores.
                                Allowed values are [ascending, descending] (Default is descending)
  -f, --vcf-score-field=STRING  the name of the VCF FORMAT field to use as the ROC score. Also valid are "QUAL", "INFO.<name>"
                                or "FORMAT.<name>" to select the named VCF FORMAT or INFO field (Default is GQ)

Utility
  -h, --help                    print help on command-line flag usage
  -Z, --no-gzip                 do not gzip the output
  -T, --threads=INT             number of threads (Default is the number of available cores)

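5. The VCF files must be compressed with bgzip.

-t, --template: the reference genome the variants were called against, in SDF format, a preprocessed representation of the reference genome. If you only have a FASTA file, create the corresponding SDF directory with the rtg format command.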

6. Example

First, generate an SDF from the reference genome:

rtg format -o hg19.sdf ucsc.hg19.fasta
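If this conversion is part of a script that may be rerun, a small guard can skip it when the SDF directory already exists. This is a hypothetical helper (make_sdf is not an RTG command), and it only checks that the directory exists, not that it is complete.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run `rtg format` only if the SDF directory is missing.
make_sdf() {
  local fasta=$1 sdf=$2
  if [ -d "$sdf" ]; then
    echo "SDF $sdf already exists, skipping"
    return 0
  fi
  [ -f "$fasta" ] || { echo "FASTA not found: $fasta" >&2; return 1; }
  rtg format -o "$sdf" "$fasta"
}

# Example call with the file names used above:
# make_sdf ucsc.hg19.fasta hg19.sdf
```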

Every VCF file must be bgzip-compressed and have a matching index. If a file is not compressed yet, compress it with bgzip from htslib and then index it with tabix:

# Install htslib (provides bgzip and tabix)
$ conda install -c bioconda htslib
# Compress the VCF file
$ bgzip -c vcf.file > vcf.file.gz
# Index the compressed file
$ tabix -p vcf vcf.file.gz
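A common pitfall is a .vcf.gz file compressed with plain gzip instead of bgzip, which tabix cannot index. The sketch below tells the two apart from the file header: a BGZF block is a gzip member with the FEXTRA flag set and a "BC" extra subfield at bytes 12-13. The helper name is_bgzf is hypothetical, the check assumes bash, head and od, and the output file name in the suggested fix is a placeholder.

```bash
#!/usr/bin/env bash
# Succeeds if file $1 starts with a BGZF header: gzip magic 1f 8b, deflate (08),
# FEXTRA flag (04), and extra subfield identifier "BC" (42 43) at bytes 12-13.
is_bgzf() {
  local hdr
  hdr=$(head -c 16 "$1" | od -An -tx1 | tr -d ' \n')
  [ "${hdr:0:8}" = "1f8b0804" ] && [ "${hdr:24:4}" = "4243" ]
}

# Check every .vcf.gz in the current directory.
for f in *.vcf.gz; do
  [ -e "$f" ] || continue  # no matches: the glob stays literal
  if is_bgzf "$f"; then
    echo "$f: bgzip format OK"
  else
    echo "$f: not bgzip; recompress with: gzip -dc $f | bgzip -c > fixed.vcf.gz"
  fi
done
```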

Once all files are ready, run vcfeval to compare them:

rtg vcfeval -b ~/raw_vcf/ref/HG002_hg19_GIAB_highconf.ccds.vcf.gz -c ~/raw_vcf/HG002_SNP.vcf.gz -o output -t ../hg19.sdf
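Before a long run it can help to check the inputs first. The wrapper below is a hypothetical sketch (run_vcfeval is not part of RTG Tools): it verifies that both VCFs and their .tbi indexes exist, that the SDF directory exists, and that the output directory is not already present, then passes any extra flags straight through to rtg vcfeval.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight wrapper around rtg vcfeval.
# Usage: run_vcfeval BASELINE.vcf.gz CALLS.vcf.gz REF.sdf OUTDIR [extra vcfeval flags...]
run_vcfeval() {
  [ $# -ge 4 ] || { echo "usage: run_vcfeval BASELINE CALLS SDF OUTDIR [flags...]" >&2; return 2; }
  local baseline=$1 calls=$2 sdf=$3 out=$4 f
  shift 4
  # Both VCFs must exist and have a tabix index next to them.
  for f in "$baseline" "$calls"; do
    [ -f "$f" ] || { echo "missing VCF: $f" >&2; return 1; }
    [ -f "$f.tbi" ] || { echo "missing index: $f.tbi (run: tabix -p vcf $f)" >&2; return 1; }
  done
  [ -d "$sdf" ] || { echo "missing SDF directory: $sdf" >&2; return 1; }
  # Do not reuse an existing output directory, so old results are not mixed in.
  [ ! -e "$out" ] || { echo "output already exists: $out" >&2; return 1; }
  rtg vcfeval -b "$baseline" -c "$calls" -t "$sdf" -o "$out" "$@"
}
```

For example, to restrict the comparison to the truth set's high-confidence regions with the -e option from the help text (highconf.bed is a placeholder for the BED file distributed with the truth set):

```bash
run_vcfeval ~/raw_vcf/ref/HG002_hg19_GIAB_highconf.ccds.vcf.gz ~/raw_vcf/HG002_SNP.vcf.gz ../hg19.sdf output -e highconf.bed
```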

When the run finishes, check summary.txt in the output directory.
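summary.txt is a small whitespace-aligned table whose columns include Threshold, True-pos-baseline, True-pos-call, False-pos, False-neg, Precision, Sensitivity and F-measure. As a sketch, assuming a header line followed by a dashed separator line (the exact layout may differ between RTG versions), the hypothetical helper below prints each row as name=value pairs, which is easier to grep or log.

```bash
#!/usr/bin/env bash
# Print each data row of a vcfeval summary.txt as name=value pairs, taking the
# column names from the header line and skipping the dashed separator line.
summary_metrics() {
  awk 'NR == 1 { n = NF; for (i = 1; i <= n; i++) name[i] = $i; next }
       /^[ \t]*-+[ \t]*$/ { next }
       NF == n { for (i = 1; i <= n; i++) printf "%s=%s%s", name[i], $i, (i < n ? " " : "\n") }' "$1"
}

# "output" is the directory passed to vcfeval with -o
if [ -f output/summary.txt ]; then
  summary_metrics output/summary.txt
fi
```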
