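Downloading the GATK resource bundle
Download URL: https://software.broadinstitute.org/gatk/download/bundle
After downloading the bundle:
The hg19 VCF files are gzip-compressed (.gz suffix) and come with .idx index files.
The hg38 VCF files are gzip-compressed (.gz suffix) and come with .tbi index files.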
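Since the two bundles ship different index types, it can help to check which index actually sits next to each VCF before running anything. The following is only a minimal sketch: the function name report_vcf_indexes is made up for illustration, and it assumes the index files live in the same directory as the VCFs and follow the naming seen in this post (.tbi, .idx, or a compressed .vcf.idx.gz).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which index file (if any) sits next to each VCF.
# Assumes index files are stored alongside the VCFs in the same directory.
report_vcf_indexes() {
    local dir="$1" vcf name
    for vcf in "$dir"/*.vcf "$dir"/*.vcf.gz; do
        [ -e "$vcf" ] || continue              # skip glob patterns that matched nothing
        name=$(basename "$vcf")
        if [ -e "$vcf.tbi" ]; then
            echo "$name: tbi"
        elif [ -e "$vcf.idx" ]; then
            echo "$name: idx"
        elif [ -e "${vcf%.gz}.idx.gz" ]; then  # e.g. x.vcf.gz next to x.vcf.idx.gz
            echo "$name: idx.gz (compressed index)"
        else
            echo "$name: no index found"
        fi
    done
}

# Example call (the path is the one used in this post; adjust to your setup):
# report_vcf_indexes /Bio/Database/GATK/bundle/hg19
```

A line ending in "idx.gz (compressed index)" corresponds to the situation described in the rest of this post, where the index is still compressed.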
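Running the command
I used the downloaded VCF files directly in the command below. It failed with an error saying that the index could not be read.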
/opt/conda/bin/gatk --java-options "-Xmx2G" BaseRecalibrator \
    -R /Bio/Database/UCSC/hg19/hg19.fa \
    -I /opt/script/pipeline/thalaflow/call/D180001/D180001.sorted.markdup.bam \
    --known-sites /Bio/Database/GATK/bundle/hg19/1000G_phase1.indels.hg19.sites.vcf.gz \
    --known-sites /Bio/Database/GATK/bundle/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz \
    --known-sites /Bio/Database/GATK/bundle/hg19/dbsnp_138.hg19.vcf.gz \
    -O /opt/script/pipeline/thalaflow/call/D180001/D180001.sorted.markdup.recal_data.table \
    >> /opt/script/pipeline/thalaflow/script/run_call_snp.log 2>&1
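Solution
1. Decompress the VCF
At first I tried tar -zxf, but it had no effect: the file is a plain gzip file, not a tar archive.
Use gunzip instead. After a successful decompression the file is no longer binary and can be viewed directly with cat.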
gunzip 1000G_phase1.indels.hg19.sites.vcf.gz
2. Decompress the index file
gunzip 1000G_phase1.indels.hg19.sites.vcf.idx.gz
3. Test
/opt/conda/bin/gatk --java-options "-Xmx2G" BaseRecalibrator \
    -R /Bio/Database/UCSC/hg19/hg19.fa \
    -I /opt/script/pipeline/thalaflow/call/D180001/D180001.sorted.markdup.bam \
    --known-sites /Bio/Database/GATK/bundle/hg19/temp/1000G_phase1.indels.hg19.sites.vcf \
    -O /opt/script/pipeline/thalaflow/call/D180001/D180001.sorted.markdup.recal_data.table
The test ran successfully and produced the output file.
4. Batch decompression
gunzip *.gz
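Note that gunzip deletes each .gz file once it has been decompressed. If you would rather keep the original downloads, a loop along the following lines is one option. This is only a sketch under that assumption: the function name decompress_all is hypothetical, and it writes each decompressed file next to its .gz source, overwriting any existing file of the same name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decompress every .gz file in a directory, keeping the originals.
decompress_all() {
    local dir="$1" f
    for f in "$dir"/*.gz; do
        [ -e "$f" ] || continue                  # nothing matched the glob
        # gzip -dc writes the decompressed data to stdout and leaves "$f" in place
        gzip -dc "$f" > "${f%.gz}" || return 1
    done
}

# Example call (the path is the one used in this post; adjust to your setup):
# decompress_all /Bio/Database/GATK/bundle/hg19
```

Quoting "$f" keeps the loop safe for file names that contain spaces.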