0088 - [Bioinformatics Software] - How GATK4 uses idx and tbi indexes


leadingsci


Category: [Bioinformatics Software]

Article link: https://blog.csdn.net/leadingsci/article/details/83622881


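Downloading the GATK resource bundle

Download location: https://software.broadinstitute.org/gatk/download/bundle

After downloading:
The hg19 VCFs are gzip-compressed (.vcf.gz) and their indexes use the .idx suffix (the index itself is also shipped compressed, as .vcf.idx.gz).
The hg38 VCFs are also gzip-compressed (.vcf.gz), but their indexes use the .tbi suffix.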
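To see which kind of index accompanies a given VCF before running GATK, a small helper like the one below can be used. This is only a sketch: the function name find_vcf_index is hypothetical, and the suffix rules are inferred from the file names described above (foo.vcf.gz.tbi for hg38, foo.vcf.idx or foo.vcf.idx.gz for hg19).

```shell
#!/bin/sh
# Print the path of the index file that sits next to a VCF, if any.
# find_vcf_index is a hypothetical helper; the naming rules are assumptions
# taken from the bundle layout described in this post.
find_vcf_index() {
    vcf=$1
    base=${vcf%.gz}    # foo.vcf.gz -> foo.vcf (unchanged if there is no .gz)
    for candidate in "$vcf.tbi" "$base.idx" "$base.idx.gz"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1           # no index found next to the VCF
}

# Example call (path as used later in this post):
# find_vcf_index /Bio/Database/GATK/bundle/hg19/dbsnp_138.hg19.vcf.gz
```

A result ending in .idx.gz means the index is still compressed and has to be unpacked before it can be used.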

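Running the command

I ran the command directly on the VCF files as downloaded from the bundle. It failed with an error saying that the index could not be read.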

/opt/conda/bin/gatk --java-options "-Xmx2G" BaseRecalibrator \
  -R /Bio/Database/UCSC/hg19/hg19.fa \
  -I /opt/script/pipeline/thalaflow/call/D180001/D180001.sorted.markdup.bam \
  --known-sites /Bio/Database/GATK/bundle/hg19/1000G_phase1.indels.hg19.sites.vcf.gz \
  --known-sites /Bio/Database/GATK/bundle/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz \
  --known-sites /Bio/Database/GATK/bundle/hg19/dbsnp_138.hg19.vcf.gz \
  -O /opt/script/pipeline/thalaflow/call/D180001/D180001.sorted.markdup.recal_data.table \
  >> /opt/script/pipeline/thalaflow/script/run_call_snp.log 2>&1

Solution

1. Decompress the VCF

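At first I tried to decompress the file with tar -zxf, but that had no effect: a .vcf.gz file is a single gzip-compressed file, not a tar archive, so gunzip is the right tool.

Once decompressed successfully, the file is plain text rather than binary and can be viewed directly with cat.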

gunzip 1000G_phase1.indels.hg19.sites.vcf.gz
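Note that gunzip replaces the .gz file with the decompressed one. If you would rather keep the original download, the following sketch first checks that the file really is a gzip stream (which also explains why tar -zxf is not needed) and then writes the decompressed copy alongside it. The function name decompress_vcf is hypothetical.

```shell
#!/bin/sh
# Decompress a gzip-compressed file next to the original, keeping the .gz.
# decompress_vcf is a hypothetical helper name used for illustration.
decompress_vcf() {
    src=$1
    dest=${src%.gz}    # foo.vcf.gz -> foo.vcf
    # gzip -t exits non-zero if the file is not a valid gzip stream
    if ! gzip -t "$src" 2>/dev/null; then
        echo "not a valid gzip file: $src" >&2
        return 1
    fi
    # -d decompress, -c write to stdout so the input file is left untouched
    gzip -dc "$src" > "$dest"
}

# Example call:
# decompress_vcf 1000G_phase1.indels.hg19.sites.vcf.gz
```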

2. Decompress the index file

gunzip 1000G_phase1.indels.hg19.sites.vcf.idx.gz
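Steps 1 and 2 have to be repeated for every known-sites file, so they can be wrapped in a loop. The sketch below unpacks every .vcf.gz and .vcf.idx.gz from one directory into a separate output directory, leaving the downloads untouched. The function name unpack_bundle and the two-directory layout are assumptions made for illustration.

```shell
#!/bin/sh
# Unpack all .vcf.gz and .vcf.idx.gz files from src_dir into out_dir.
# unpack_bundle is a hypothetical helper; the directory layout is an assumption.
unpack_bundle() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    # VCFs are listed first, so each index is written after its VCF
    for gz in "$src_dir"/*.vcf.gz "$src_dir"/*.vcf.idx.gz; do
        [ -f "$gz" ] || continue          # skip glob patterns that matched nothing
        name=$(basename "$gz" .gz)        # strip the trailing .gz
        gzip -dc "$gz" > "$out_dir/$name" || return 1
    done
}

# Example call (paths follow the ones used in this post):
# unpack_bundle /Bio/Database/GATK/bundle/hg19 /Bio/Database/GATK/bundle/hg19/temp
```

Each .vcf then sits next to its .vcf.idx in the output directory, which is the arrangement used in the test step.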

3. Test

/opt/conda/bin/gatk --java-options "-Xmx2G" BaseRecalibrator \
  -R /Bio/Database/UCSC/hg19/hg19.fa \
  -I /opt/script/pipeline/thalaflow/call/D180001/D180001.sorted.markdup.bam \
  --known-sites /Bio/Database/GATK/bundle/hg19/temp/1000G_phase1.indels.hg19.sites.vcf \
  -O /opt/script/pipeline/thalaflow/call/D180001/D180001.sorted.markdup.recal_data.table

In my test, the command ran successfully and produced the output file.