Whole-genome variant detection from third-generation (long-read) genome assemblies

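This example uses the maize NAM population, whose genomes were assembled from third-generation sequencing data. AnchorWave performs the whole-genome alignment, TASSEL converts the alignment format, and GATK calls the variants.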

1. Pairwise genome alignment with AnchorWave

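The genome data can be downloaded from https://download.maizegdb.org/. The NAM population comprises 26 genomes: the B73 reference and the 25 lines aligned to it below.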

anchorwave genoAli -i Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.gff3 -as cds.fa -r B73.ref.fa -a B97.sam -ar B73.sam -s Zm-B97-REFERENCE-NAM-1.0.fa -n B97.anchors -o B97.maf -f B97.f.maf -w 38000 -fa3 200000 -B -6 -O1 -8 -E1 -2 -O2 -75 -E2 -1 -IV  >B97.log 2>&1

2.1 Convert the MAF files produced in the previous step to gVCF with TASSEL

#download tassel and reference genome.
git clone https://bitbucket.org/tasseladmin/tassel-5-standalone.git 
wget https://download.maizegdb.org/Zm-B73-REFERENCE-NAM-5.0/Zm-B73-REFERENCE-NAM-5.0.fa.gz
gunzip Zm-B73-REFERENCE-NAM-5.0.fa.gz

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/B97/B97.maf -sampleName B97_anchorwave -gvcfOutput /home/xuql/copyNAM/B97/B97ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/B97/B97_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/CML103/CML103.maf -sampleName CML103_anchorwave -gvcfOutput /home/xuql/copyNAM/CML103/CML103ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/CML103/CML103_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/CML228/CML228.maf -sampleName CML228_anchorwave -gvcfOutput /home/xuql/copyNAM/CML228/CML228ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/CML228/CML228_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/CML247/CML247.maf -sampleName CML247_anchorwave -gvcfOutput /home/xuql/copyNAM/CML247/CML247ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/CML247/CML247_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/CML277/CML277.maf -sampleName CML277_anchorwave -gvcfOutput /home/xuql/copyNAM/CML277/CML277ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/CML277/CML277_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/CML322/CML322.maf -sampleName CML322_anchorwave -gvcfOutput /home/xuql/copyNAM/CML322/CML322ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/CML322/CML322_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/CML333/CML333.maf -sampleName CML333_anchorwave -gvcfOutput /home/xuql/copyNAM/CML333/CML333ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/CML333/CML333_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/CML52/CML52.maf -sampleName CML52_anchorwave -gvcfOutput /home/xuql/copyNAM/CML52/CML52ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/CML52/CML52_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/CML69/CML69.maf -sampleName CML69_anchorwave -gvcfOutput /home/xuql/copyNAM/CML69/CML69ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/CML69/CML69_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/HP301/HP301.maf -sampleName HP301_anchorwave -gvcfOutput /home/xuql/copyNAM/HP301/HP301ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/HP301/HP301_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/Il14H/Il14H.maf -sampleName Il14H_anchorwave -gvcfOutput /home/xuql/copyNAM/Il14H/Il14HToB73.gvcf -fillGaps false > /home/xuql/copyNAM/Il14H/Il14H_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/Ki11/Ki11.maf -sampleName Ki11_anchorwave -gvcfOutput /home/xuql/copyNAM/Ki11/Ki11ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/Ki11/Ki11_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/Ki3/Ki3.maf -sampleName Ki3_anchorwave -gvcfOutput /home/xuql/copyNAM/Ki3/Ki3ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/Ki3/Ki3_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/Ky21/Ky21.maf -sampleName Ky21_anchorwave -gvcfOutput /home/xuql/copyNAM/Ky21/Ky21ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/Ky21/Ky21_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/M162W/M162W.maf -sampleName M162W_anchorwave -gvcfOutput /home/xuql/copyNAM/M162W/M162WToB73.gvcf -fillGaps false > /home/xuql/copyNAM/M162W/M162W_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/M37W/M37W.maf -sampleName M37W_anchorwave -gvcfOutput /home/xuql/copyNAM/M37W/M37WToB73.gvcf -fillGaps false > /home/xuql/copyNAM/M37W/M37W_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/Mo18W/Mo18W.maf -sampleName Mo18W_anchorwave -gvcfOutput /home/xuql/copyNAM/Mo18W/Mo18WToB73.gvcf -fillGaps false > /home/xuql/copyNAM/Mo18W/Mo18W_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/Ms71/Ms71.maf -sampleName Ms71_anchorwave -gvcfOutput /home/xuql/copyNAM/Ms71/Ms71ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/Ms71/Ms71_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/NC350/NC350.maf -sampleName NC350_anchorwave -gvcfOutput /home/xuql/copyNAM/NC350/NC350ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/NC350/NC350_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/NC358/NC358.maf -sampleName NC358_anchorwave -gvcfOutput /home/xuql/copyNAM/NC358/NC358ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/NC358/NC358_outputMafToGVCF.txt

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/Oh43/Oh43.maf -sampleName Oh43_anchorwave -gvcfOutput /home/xuql/copyNAM/Oh43/Oh43ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/Oh43/Oh43_outputMafToGVCF.txt 

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/Oh7B/Oh7B.maf -sampleName Oh7B_anchorwave -gvcfOutput /home/xuql/copyNAM/Oh7B/Oh7BToB73.gvcf -fillGaps false > /home/xuql/copyNAM/Oh7B/Oh7B_outputMafToGVCF.txt 

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/P39/P39.maf -sampleName P39_anchorwave -gvcfOutput /home/xuql/copyNAM/P39/P39ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/P39/P39_outputMafToGVCF.txt 

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/Tx303/Tx303.maf -sampleName Tx303_anchorwave -gvcfOutput /home/xuql/copyNAM/Tx303/Tx303ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/Tx303/Tx303_outputMafToGVCF.txt 

/home/xuql/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -MAFToGVCFPlugin -referenceFasta /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -mafFile /home/xuql/copyNAM/Tzi8/Tzi8.maf -sampleName Tzi8_anchorwave -gvcfOutput /home/xuql/copyNAM/Tzi8/Tzi8ToB73.gvcf -fillGaps false > /home/xuql/copyNAM/Tzi8/Tzi8_outputMafToGVCF.txt 
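
The 25 commands above differ only in the sample name, so they could also be generated rather than typed out. The following is a minimal sketch, assuming the same directory layout as above; `maf_to_gvcf_command` is a hypothetical helper written for this note, not part of TASSEL, and the script only prints the shell lines so they can be reviewed before running.

```python
import shlex

# Sample names taken from the commands above (25 lines aligned to B73).
SAMPLES = [
    "B97", "CML103", "CML228", "CML247", "CML277", "CML322", "CML333",
    "CML52", "CML69", "HP301", "Il14H", "Ki11", "Ki3", "Ky21", "M162W",
    "M37W", "Mo18W", "Ms71", "NC350", "NC358", "Oh43", "Oh7B", "P39",
    "Tx303", "Tzi8",
]

# Paths copied from the commands above; adjust them to your own system.
TASSEL = "/home/xuql/tassel-5-standalone/run_pipeline.pl"
BASE = "/home/xuql/copyNAM"
REF = f"{BASE}/B73/Zm-B73-REFERENCE-NAM-5.0.fa"


def maf_to_gvcf_command(sample):
    """Return (argv, log_path) for one sample (hypothetical helper)."""
    argv = [
        TASSEL, "-Xmx100G", "-debug", "-MAFToGVCFPlugin",
        "-referenceFasta", REF,
        "-mafFile", f"{BASE}/{sample}/{sample}.maf",
        "-sampleName", f"{sample}_anchorwave",
        "-gvcfOutput", f"{BASE}/{sample}/{sample}ToB73.gvcf",
        "-fillGaps", "false",
    ]
    log_path = f"{BASE}/{sample}/{sample}_outputMafToGVCF.txt"
    return argv, log_path


if __name__ == "__main__":
    for sample in SAMPLES:
        argv, log_path = maf_to_gvcf_command(sample)
        # Print the shell line instead of executing it.
        print(shlex.join(argv), ">", shlex.quote(log_path))
```

Redirecting the printed lines into a file gives a job script that can be submitted or run line by line.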

2.2 Compress and index the gVCF files

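For the reference genome: 1) rename the chromosomes so that they match the names used in the aligned genomes, 2) build the fai index, 3) build the dict (sequence dictionary) index.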

sed -i 's/^>chr/>/' Zm-B73-REFERENCE-NAM-5.0.fa
samtools faidx Zm-B73-REFERENCE-NAM-5.0.fa
wget https://github.com/broadinstitute/picard/releases/download/2.26.10/picard.jar
java -jar picard.jar CreateSequenceDictionary R=Zm-B73-REFERENCE-NAM-5.0.fa O=Zm-B73-REFERENCE-NAM-5.0.dict

For the gVCF files: compress them with bgzip and index them with tabix. "GATK GenomicsDBImport" requires this step.

bgzip -c  /home/xuql/copyNAM/B97/B97ToB73.gvcf > /home/xuql/copyNAM/B97/B97ToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/CML103/CML103ToB73.gvcf > /home/xuql/copyNAM/CML103/CML103ToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/CML228/CML228ToB73.gvcf > /home/xuql/copyNAM/CML228/CML228ToB73.gvcf.gz
bgzip -c  /home/xuql/copyNAM/CML247/CML247ToB73.gvcf > /home/xuql/copyNAM/CML247/CML247ToB73.gvcf.gz
bgzip -c  /home/xuql/copyNAM/CML277/CML277ToB73.gvcf > /home/xuql/copyNAM/CML277/CML277ToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/CML322/CML322ToB73.gvcf > /home/xuql/copyNAM/CML322/CML322ToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/CML333/CML333ToB73.gvcf > /home/xuql/copyNAM/CML333/CML333ToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/CML52/CML52ToB73.gvcf > /home/xuql/copyNAM/CML52/CML52ToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/CML69/CML69ToB73.gvcf > /home/xuql/copyNAM/CML69/CML69ToB73.gvcf.gz
bgzip -c  /home/xuql/copyNAM/HP301/HP301ToB73.gvcf > /home/xuql/copyNAM/HP301/HP301ToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/Il14H/Il14HToB73.gvcf > /home/xuql/copyNAM/Il14H/Il14HToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/Ki11/Ki11ToB73.gvcf > /home/xuql/copyNAM/Ki11/Ki11ToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/Ki3/Ki3ToB73.gvcf > /home/xuql/copyNAM/Ki3/Ki3ToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/Ky21/Ky21ToB73.gvcf > /home/xuql/copyNAM/Ky21/Ky21ToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/M162W/M162WToB73.gvcf > /home/xuql/copyNAM/M162W/M162WToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/M37W/M37WToB73.gvcf > /home/xuql/copyNAM/M37W/M37WToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/Mo18W/Mo18WToB73.gvcf > /home/xuql/copyNAM/Mo18W/Mo18WToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/Ms71/Ms71ToB73.gvcf > /home/xuql/copyNAM/Ms71/Ms71ToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/NC350/NC350ToB73.gvcf > /home/xuql/copyNAM/NC350/NC350ToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/NC358/NC358ToB73.gvcf > /home/xuql/copyNAM/NC358/NC358ToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/Oh43/Oh43ToB73.gvcf > /home/xuql/copyNAM/Oh43/Oh43ToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/Oh7B/Oh7BToB73.gvcf > /home/xuql/copyNAM/Oh7B/Oh7BToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/P39/P39ToB73.gvcf > /home/xuql/copyNAM/P39/P39ToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/Tx303/Tx303ToB73.gvcf > /home/xuql/copyNAM/Tx303/Tx303ToB73.gvcf.gz 
bgzip -c  /home/xuql/copyNAM/Tzi8/Tzi8ToB73.gvcf > /home/xuql/copyNAM/Tzi8/Tzi8ToB73.gvcf.gz

tabix -p vcf /home/xuql/copyNAM/B97/B97ToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/CML247/CML247ToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/CML333/CML333ToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/HP301/HP301ToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/Ki3/Ki3ToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/M37W/M37WToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/NC350/NC350ToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/Oh7B/Oh7BToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/Tzi8/Tzi8ToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/CML103/CML103ToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/CML277/CML277ToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/CML52/CML52ToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/Il14H/Il14HToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/Ky21/Ky21ToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/Mo18W/Mo18WToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/NC358/NC358ToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/P39/P39ToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/CML228/CML228ToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/CML322/CML322ToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/CML69/CML69ToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/Ki11/Ki11ToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/M162W/M162WToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/Ms71/Ms71ToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/Oh43/Oh43ToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/Tx303/Tx303ToB73.gvcf.gz 
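
Because the bgzip and tabix lists above are long and ordered differently, it is easy to miss a sample. The sketch below checks that each sample has both a `.gvcf.gz` file and the `.gvcf.gz.tbi` index that tabix writes next to it. It assumes the directory layout used above; `missing_outputs` is a hypothetical helper, not part of any of the tools.

```python
from pathlib import Path

# Sample names taken from the commands above.
SAMPLES = [
    "B97", "CML103", "CML228", "CML247", "CML277", "CML322", "CML333",
    "CML52", "CML69", "HP301", "Il14H", "Ki11", "Ki3", "Ky21", "M162W",
    "M37W", "Mo18W", "Ms71", "NC350", "NC358", "Oh43", "Oh7B", "P39",
    "Tx303", "Tzi8",
]


def missing_outputs(base_dir, samples):
    """List expected .gvcf.gz and .gvcf.gz.tbi files that do not exist."""
    missing = []
    for sample in samples:
        gz = Path(base_dir) / sample / f"{sample}ToB73.gvcf.gz"
        tbi = Path(str(gz) + ".tbi")
        for path in (gz, tbi):
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Base directory copied from the commands above; adjust as needed.
    for path in missing_outputs("/home/xuql/copyNAM", SAMPLES):
        print("missing:", path)
```

An empty output means every sample is ready for GenomicsDBImport.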

3. Variant detection

3.1 GATK cannot handle variants longer than 10 Mb in this step, so the following steps locate the variants longer than 10 Mb and delete them.

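# Sort the "Indel is too long" messages in the log by indel length with Python; part of the result is shown below.
with open('./job.407755.err') as f:
    lines = f.readlines()

# Keep only the messages that end with "Set --max-indel-length >= N".
new_lines = []
for line in lines:
    if "Set" in line:
        new_lines.append(line)

# Sort by N (the number after the last '='), largest first, and show the result.
sorted_lines = sorted(new_lines,
                      key=lambda x: int(x.strip().split('=')[-1]),
                      reverse=True)
print(sorted_lines)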

# These messages were written by "gatk LeftAlignAndTrimVariants" across all chromosomes and then sorted with Python. Finally, check the original output to identify which individual each super-indel belongs to.
['10:56:14.335 INFO  LeftAlignAndTrimVariants - Indel is too long (34461688) at position 9:3695105; skipping that record. Set --max-indel-length >= 34461688\n',
 '10:56:30.429 INFO  LeftAlignAndTrimVariants - Indel is too long (10668738) at position 10:33212598; skipping that record. Set --max-indel-length >= 10668738\n',
 '10:56:28.937 INFO  LeftAlignAndTrimVariants - Indel is too long (9101264) at position 10:14179; skipping that record. Set --max-indel-length >= 9101264\n',
 '10:56:30.038 INFO  LeftAlignAndTrimVariants - Indel is too long (7918835) at position 10:22996027; skipping that record. Set --max-indel-length >= 7918835\n',
 '11:31:49.968 INFO  LeftAlignAndTrimVariants - Indel is too long (7154442) at position 6:16715313; skipping that record. Set --max-indel-length >= 7154442\n',
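
As an alternative to reading the sorted messages by eye, the length, chromosome and position can be pulled out of each message with a regular expression. This is a minimal sketch that assumes the message format shown above; `parse_long_indels` and the 10 Mb default threshold are illustrative choices for this note.

```python
import re

# Matches the LeftAlignAndTrimVariants message format shown above.
PATTERN = re.compile(
    r"Indel is too long \((\d+)\) at position ([^:\s]+):(\d+);"
)


def parse_long_indels(lines, min_length=10_000_000):
    """Return (length, chrom, pos) tuples longer than min_length, largest first."""
    records = []
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # not an "Indel is too long" message
        length = int(match.group(1))
        chrom = match.group(2)
        pos = int(match.group(3))
        if length > min_length:
            records.append((length, chrom, pos))
    return sorted(records, reverse=True)


if __name__ == "__main__":
    # Three of the messages listed above, used as example input.
    example = [
        "10:56:14.335 INFO  LeftAlignAndTrimVariants - Indel is too long (34461688) at position 9:3695105; skipping that record. Set --max-indel-length >= 34461688\n",
        "10:56:30.429 INFO  LeftAlignAndTrimVariants - Indel is too long (10668738) at position 10:33212598; skipping that record. Set --max-indel-length >= 10668738\n",
        "10:56:28.937 INFO  LeftAlignAndTrimVariants - Indel is too long (9101264) at position 10:14179; skipping that record. Set --max-indel-length >= 9101264\n",
    ]
    for length, chrom, pos in parse_long_indels(example):
        print(chrom, pos, length)
```

With the example input, only the records at 9:3695105 and 10:33212598 exceed the threshold, which matches the two positions removed in the next step.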

# Find the line number of the super-indel at 9:3695105 (written to aaa.txt, 32542168 here) and delete that line.
awk '$1=="9" && $2=="3695105"{printf("%5d\t%s\n", NR, $0)}' Oh7BToB73.gvcf > aaa.txt
sed '32542168d' Oh7BToB73.gvcf > aOh7BToB73.gvcf
# Compress and index the edited file for "gatk GenomicsDBImport".
bgzip -c  /home/xuql/copyNAM/Oh7B/aOh7BToB73.gvcf > /home/xuql/copyNAM/Oh7B/aOh7BToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/Oh7B/aOh7BToB73.gvcf.gz

# Repeat for the super-indel at 10:33212598 (line number written to bbb.txt, 35176648 here).
awk '$1=="10" && $2=="33212598"{printf("%5d\t%s\n", NR, $0)}' Oh7BToB73.gvcf > bbb.txt
sed '35176648d' Oh7BToB73.gvcf > bOh7BToB73.gvcf
bgzip -c  /home/xuql/copyNAM/Oh7B/bOh7BToB73.gvcf > /home/xuql/copyNAM/Oh7B/bOh7BToB73.gvcf.gz 
tabix -p vcf /home/xuql/copyNAM/Oh7B/bOh7BToB73.gvcf.gz
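
Deleting by line number works, but the number has to be looked up first and is easy to mistype. As a sketch of an alternative, the record can be dropped by chromosome and position in a single pass. `drop_records` is a hypothetical helper written for this note; it assumes an uncompressed, tab-separated gVCF with CHROM and POS in the first two columns.

```python
def drop_records(in_path, out_path, targets):
    """Copy a VCF/gVCF, skipping data lines whose (CHROM, POS) is in targets.

    targets is a set of (chrom, pos) string pairs, e.g. {("9", "3695105")}.
    Returns the number of lines that were dropped.
    """
    dropped = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if not line.startswith("#"):  # header lines are always kept
                fields = line.split("\t", 2)
                if len(fields) >= 2 and (fields[0], fields[1]) in targets:
                    dropped += 1
                    continue
            dst.write(line)
    return dropped
```

For example, `drop_records("Oh7BToB73.gvcf", "aOh7BToB73.gvcf", {("9", "3695105")})` would write a copy without the record at 9:3695105. The output still has to be compressed with bgzip and indexed with tabix as above.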

3.2 Build the GenomicsDB workspaces with GenomicsDBImport

Set the memory according to your machine. Run this step once per chromosome, 10 chromosomes in total.
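
The per-chromosome commands listed in this section share the same 25 inputs, so they could be generated by a short script. The sketch below is one way to do that under stated assumptions. It uses the batch size, segment size and buffer size from the chromosome 2 to 8 runs as defaults, although chromosome 1 was run with different values. It also swaps in the edited Oh7B file from section 3.1 for chromosome 9, and for chromosome 10 by inference from section 3.1. `genomicsdb_import_command` is a hypothetical helper, and the script only prints the commands.

```python
import shlex

# Sample names taken from the commands in this document.
SAMPLES = [
    "B97", "CML103", "CML228", "CML247", "CML277", "CML322", "CML333",
    "CML52", "CML69", "HP301", "Il14H", "Ki11", "Ki3", "Ky21", "M162W",
    "M37W", "Mo18W", "Ms71", "NC350", "NC358", "Oh43", "Oh7B", "P39",
    "Tx303", "Tzi8",
]
BASE = "/home/xuql/copyNAM"

# Oh7B files with the over-long indel removed (section 3.1).
# The "10" entry is an inference from section 3.1 (assumption).
OH7B_OVERRIDE = {"9": "aOh7BToB73.gvcf.gz", "10": "bOh7BToB73.gvcf.gz"}


def genomicsdb_import_command(chrom, java_opts="-Xmx256g -Xms256g"):
    """Build the GenomicsDBImport argv for one chromosome (hypothetical helper)."""
    argv = ["gatk", "--java-options", java_opts, "GenomicsDBImport"]
    for sample in SAMPLES:
        name = f"{sample}ToB73.gvcf.gz"
        if sample == "Oh7B":
            name = OH7B_OVERRIDE.get(chrom, name)
        argv += ["-V", f"{BASE}/{sample}/{name}"]
    argv += [
        "--batch-size", "1",
        "--genomicsdb-workspace-path", f"{BASE}/NAM_out_gatk{chrom}",
        "--genomicsdb-segment-size", "10485760",
        "--genomicsdb-vcf-buffer-size", "100000000",
        "-L", chrom,
    ]
    return argv


if __name__ == "__main__":
    for i in range(1, 11):
        # Print one shell line per chromosome for review before running.
        print(shlex.join(genomicsdb_import_command(str(i))))
```

Memory settings are passed through `java_opts`, so a smaller heap such as "-Xmx128g -Xms5g" can be supplied for individual chromosomes.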

gatk --java-options "-Xmx256g -Xms256g" GenomicsDBImport \
                -V /home/xuql/copyNAM/B97/B97ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML247/CML247ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML333/CML333ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/HP301/HP301ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki3/Ki3ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M37W/M37WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC350/NC350ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh7B/Oh7BToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tzi8/Tzi8ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML103/CML103ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML277/CML277ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML52/CML52ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Il14H/Il14HToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ky21/Ky21ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Mo18W/Mo18WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC358/NC358ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/P39/P39ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML228/CML228ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML322/CML322ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML69/CML69ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki11/Ki11ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M162W/M162WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ms71/Ms71ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh43/Oh43ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tx303/Tx303ToB73.gvcf.gz \
        --batch-size 5 \
      --genomicsdb-workspace-path /home/xuql/copyNAM/NAM_out_gatk1 \
      --genomicsdb-segment-size 1048576000 --genomicsdb-vcf-buffer-size 10000000000 -L 1
      #Elapsed time: 105.02 minutes. Runtime.totalMemory()=274877906944

gatk --java-options "-Xmx256g -Xms256g" GenomicsDBImport \
                -V /home/xuql/copyNAM/B97/B97ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML247/CML247ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML333/CML333ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/HP301/HP301ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki3/Ki3ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M37W/M37WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC350/NC350ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh7B/Oh7BToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tzi8/Tzi8ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML103/CML103ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML277/CML277ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML52/CML52ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Il14H/Il14HToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ky21/Ky21ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Mo18W/Mo18WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC358/NC358ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/P39/P39ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML228/CML228ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML322/CML322ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML69/CML69ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki11/Ki11ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M162W/M162WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ms71/Ms71ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh43/Oh43ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tx303/Tx303ToB73.gvcf.gz \
        --batch-size 1 \
      --genomicsdb-workspace-path /home/xuql/copyNAM/NAM_out_gatk2 \
      --genomicsdb-segment-size 10485760 --genomicsdb-vcf-buffer-size 100000000 -L 2
        #Elapsed time: 76.68 minutes.  Runtime.totalMemory()=274877906944

gatk --java-options "-Xmx256g -Xms256g" GenomicsDBImport \
                -V /home/xuql/copyNAM/B97/B97ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML247/CML247ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML333/CML333ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/HP301/HP301ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki3/Ki3ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M37W/M37WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC350/NC350ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh7B/Oh7BToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tzi8/Tzi8ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML103/CML103ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML277/CML277ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML52/CML52ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Il14H/Il14HToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ky21/Ky21ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Mo18W/Mo18WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC358/NC358ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/P39/P39ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML228/CML228ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML322/CML322ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML69/CML69ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki11/Ki11ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M162W/M162WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ms71/Ms71ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh43/Oh43ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tx303/Tx303ToB73.gvcf.gz \
        --batch-size 1 \
      --genomicsdb-workspace-path /home/xuql/copyNAM/NAM_out_gatk3 \
      --genomicsdb-segment-size 10485760 --genomicsdb-vcf-buffer-size 100000000 -L 3
        #Elapsed time: 70.12 minutes. Runtime.totalMemory()=274877906944

gatk --java-options "-Xmx256g -Xms256g" GenomicsDBImport \
                -V /home/xuql/copyNAM/B97/B97ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML247/CML247ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML333/CML333ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/HP301/HP301ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki3/Ki3ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M37W/M37WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC350/NC350ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh7B/Oh7BToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tzi8/Tzi8ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML103/CML103ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML277/CML277ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML52/CML52ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Il14H/Il14HToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ky21/Ky21ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Mo18W/Mo18WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC358/NC358ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/P39/P39ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML228/CML228ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML322/CML322ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML69/CML69ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki11/Ki11ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M162W/M162WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ms71/Ms71ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh43/Oh43ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tx303/Tx303ToB73.gvcf.gz \
        --batch-size 1 \
      --genomicsdb-workspace-path /home/xuql/copyNAM/NAM_out_gatk4 \
      --genomicsdb-segment-size 10485760 --genomicsdb-vcf-buffer-size 100000000 -L 4
        #Elapsed time: 68.47 minutes. Runtime.totalMemory()=274877906944

gatk --java-options "-Xmx256g -Xms256g" GenomicsDBImport \
                -V /home/xuql/copyNAM/B97/B97ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML247/CML247ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML333/CML333ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/HP301/HP301ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki3/Ki3ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M37W/M37WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC350/NC350ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh7B/Oh7BToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tzi8/Tzi8ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML103/CML103ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML277/CML277ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML52/CML52ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Il14H/Il14HToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ky21/Ky21ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Mo18W/Mo18WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC358/NC358ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/P39/P39ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML228/CML228ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML322/CML322ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML69/CML69ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki11/Ki11ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M162W/M162WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ms71/Ms71ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh43/Oh43ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tx303/Tx303ToB73.gvcf.gz \
        --batch-size 1 \
      --genomicsdb-workspace-path /home/xuql/copyNAM/NAM_out_gatk5 \
      --genomicsdb-segment-size 10485760 --genomicsdb-vcf-buffer-size 100000000 -L 5
        # Elapsed time: 64.29 minutes. Runtime.totalMemory()=274877906944

gatk --java-options "-Xmx256g -Xms256g" GenomicsDBImport \
                -V /home/xuql/copyNAM/B97/B97ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML247/CML247ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML333/CML333ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/HP301/HP301ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki3/Ki3ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M37W/M37WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC350/NC350ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh7B/Oh7BToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tzi8/Tzi8ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML103/CML103ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML277/CML277ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML52/CML52ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Il14H/Il14HToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ky21/Ky21ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Mo18W/Mo18WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC358/NC358ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/P39/P39ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML228/CML228ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML322/CML322ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML69/CML69ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki11/Ki11ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M162W/M162WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ms71/Ms71ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh43/Oh43ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tx303/Tx303ToB73.gvcf.gz \
        --batch-size 1 \
      --genomicsdb-workspace-path /home/xuql/copyNAM/NAM_out_gatk6 \
      --genomicsdb-segment-size 10485760 --genomicsdb-vcf-buffer-size 100000000 -L 6
        #Elapsed time: 53.32 minutes. Runtime.totalMemory()=274877906944

gatk --java-options "-Xmx256g -Xms256g" GenomicsDBImport \
                -V /home/xuql/copyNAM/B97/B97ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML247/CML247ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML333/CML333ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/HP301/HP301ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki3/Ki3ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M37W/M37WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC350/NC350ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh7B/Oh7BToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tzi8/Tzi8ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML103/CML103ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML277/CML277ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML52/CML52ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Il14H/Il14HToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ky21/Ky21ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Mo18W/Mo18WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC358/NC358ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/P39/P39ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML228/CML228ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML322/CML322ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML69/CML69ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki11/Ki11ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M162W/M162WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ms71/Ms71ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh43/Oh43ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tx303/Tx303ToB73.gvcf.gz \
        --batch-size 1 \
      --genomicsdb-workspace-path /home/xuql/copyNAM/NAM_out_gatk7 \
      --genomicsdb-segment-size 10485760 --genomicsdb-vcf-buffer-size 100000000 -L 7
        #Elapsed time: 57.26 minutes.  Runtime.totalMemory()=274877906944

gatk --java-options "-Xmx256g -Xms256g" GenomicsDBImport \
                -V /home/xuql/copyNAM/B97/B97ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML247/CML247ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML333/CML333ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/HP301/HP301ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki3/Ki3ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M37W/M37WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC350/NC350ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh7B/Oh7BToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tzi8/Tzi8ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML103/CML103ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML277/CML277ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML52/CML52ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Il14H/Il14HToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ky21/Ky21ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Mo18W/Mo18WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC358/NC358ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/P39/P39ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML228/CML228ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML322/CML322ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML69/CML69ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki11/Ki11ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M162W/M162WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ms71/Ms71ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh43/Oh43ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tx303/Tx303ToB73.gvcf.gz \
        --batch-size 1 \
      --genomicsdb-workspace-path /home/xuql/copyNAM/NAM_out_gatk8 \
      --genomicsdb-segment-size 10485760 --genomicsdb-vcf-buffer-size 100000000 -L 8
      #Elapsed time: 52.72 minutes. Runtime.totalMemory()=274877906944

gatk --java-options "-Xmx128g -Xms5g" GenomicsDBImport \
                -V /home/xuql/copyNAM/B97/B97ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML247/CML247ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML333/CML333ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/HP301/HP301ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki3/Ki3ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M37W/M37WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC350/NC350ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh7B/aOh7BToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tzi8/Tzi8ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML103/CML103ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML277/CML277ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML52/CML52ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Il14H/Il14HToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ky21/Ky21ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Mo18W/Mo18WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC358/NC358ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/P39/P39ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML228/CML228ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML322/CML322ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML69/CML69ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki11/Ki11ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M162W/M162WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ms71/Ms71ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh43/Oh43ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tx303/Tx303ToB73.gvcf.gz \
        --batch-size 1 \
      --genomicsdb-workspace-path /home/xuql/copyNAM/NAM_out_gatk9 \
      --genomicsdb-segment-size 10485760 --genomicsdb-vcf-buffer-size 100000000 -L 9
      #Elapsed time: 54.09 minutes. Runtime.totalMemory()=5368709120

gatk --java-options "-Xmx128g -Xms5g" GenomicsDBImport \
                -V /home/xuql/copyNAM/B97/B97ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML247/CML247ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML333/CML333ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/HP301/HP301ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki3/Ki3ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M37W/M37WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC350/NC350ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh7B/bOh7BToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tzi8/Tzi8ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML103/CML103ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML277/CML277ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML52/CML52ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Il14H/Il14HToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ky21/Ky21ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Mo18W/Mo18WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC358/NC358ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/P39/P39ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML228/CML228ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML322/CML322ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML69/CML69ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki11/Ki11ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M162W/M162WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ms71/Ms71ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh43/Oh43ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tx303/Tx303ToB73.gvcf.gz \
        --batch-size 1 \
      --genomicsdb-workspace-path /home/xuql/copyNAM/NAM_out_gatk10 \
      --genomicsdb-segment-size 10485760 --genomicsdb-vcf-buffer-size 100000000 -L 10  
        #Elapsed time: 44.05 minutes. Runtime.totalMemory()=5368709120
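
The per-chromosome GenomicsDBImport commands above differ only in the workspace suffix, the -L value, and the Oh7B input (aOh7BToB73.gvcf.gz for chromosome 9, bOh7BToB73.gvcf.gz for chromosome 10). As a rough sketch, the same commands could be generated in a loop. The helper name import_cmd is hypothetical, and two things are assumed rather than taken from the text: that chromosomes 1 to 8 read Oh7BToB73.gvcf.gz without a prefix, and that every chromosome uses the JVM options shown for chromosomes 9 and 10. The script only prints the commands; nothing runs unless the output is piped to bash.

```shell
#!/usr/bin/env bash
# Sketch: print one GenomicsDBImport command per chromosome (dry run).
base=/home/xuql/copyNAM
samples="B97 CML247 CML333 HP301 Ki3 M37W NC350 Oh7B Tzi8 CML103 CML277 CML52 Il14H Ky21 Mo18W NC358 P39 CML228 CML322 CML69 Ki11 M162W Ms71 Oh43 Tx303"

# import_cmd CHR: hypothetical helper that echoes the command for one chromosome.
import_cmd() {
  local chr=$1 s prefix args=""
  for s in $samples; do
    prefix=""
    # The original commands use prefixed Oh7B files on chromosomes 9 and 10;
    # an unprefixed file for chromosomes 1-8 is an assumption.
    if [ "$s" = Oh7B ]; then
      case $chr in 9) prefix=a ;; 10) prefix=b ;; esac
    fi
    args="$args -V $base/$s/${prefix}${s}ToB73.gvcf.gz"
  done
  echo "gatk --java-options \"-Xmx128g -Xms5g\" GenomicsDBImport$args --batch-size 1 --genomicsdb-workspace-path $base/NAM_out_gatk$chr --genomicsdb-segment-size 10485760 --genomicsdb-vcf-buffer-size 100000000 -L $chr"
}

for chr in 1 2 3 4 5 6 7 8 9 10; do
  import_cmd "$chr"
done
```

Printing the commands first makes it easy to compare them with the hand-written ones before committing roughly an hour of compute per chromosome.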

3.3 Variant calling

gatk --java-options "-Xmx50g" GenotypeGVCFs -R /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -stand-call-conf 0 -ploidy 1 -V gendb:///home/xuql/copyNAM/NAM_out_gatk1 -O /home/xuql/copyNAM/gatk1.vcf.gz --cloud-prefetch-buffer 10000 --cloud-index-prefetch-buffer 10000 --genomicsdb-max-alternate-alleles 110 --max-alternate-alleles 100 --gcs-max-retries 1000
#Processed 251973478 total variants in 581.3 minutes. Elapsed time: 581.36 minutes. Runtime.totalMemory()=6694109184

gatk --java-options "-Xmx50g" GenotypeGVCFs -R /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -stand-call-conf 0 -ploidy 1 -V gendb:///home/xuql/copyNAM/NAM_out_gatk2 -O /home/xuql/copyNAM/gatk2.vcf.gz --cloud-prefetch-buffer 10000 --cloud-index-prefetch-buffer 10000 --genomicsdb-max-alternate-alleles 110 --max-alternate-alleles 100 --gcs-max-retries 1000
# Processed 202340450 total variants in 471.8 minutes. Elapsed time: 472.05 minutes. Runtime.totalMemory()=2147483648

gatk --java-options "-Xmx50g" GenotypeGVCFs -R /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -stand-call-conf 0 -ploidy 1 -V gendb:///home/xuql/copyNAM/NAM_out_gatk3 -O /home/xuql/copyNAM/gatk3.vcf.gz --cloud-prefetch-buffer 10000 --cloud-index-prefetch-buffer 10000 --genomicsdb-max-alternate-alleles 110 --max-alternate-alleles 100 --gcs-max-retries 1000
# Processed 188148248 total variants in 444.9 minutes. Elapsed time: 445.13 minutes. Runtime.totalMemory()=2147483648

gatk --java-options "-Xmx50g" GenotypeGVCFs -R /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -stand-call-conf 0 -ploidy 1 -V gendb:///home/xuql/copyNAM/NAM_out_gatk4 -O /home/xuql/copyNAM/gatk4.vcf.gz --cloud-prefetch-buffer 10000 --cloud-index-prefetch-buffer 10000 --genomicsdb-max-alternate-alleles 110 --max-alternate-alleles 100 --gcs-max-retries 1000
#Processed 188872000 total variants in 432.5 minutes. Elapsed time: 432.67 minutes. Runtime.totalMemory()=2147483648

gatk --java-options "-Xmx50g" GenotypeGVCFs -R /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -stand-call-conf 0 -ploidy 1 -V gendb:///home/xuql/copyNAM/NAM_out_gatk5 -O /home/xuql/copyNAM/gatk5.vcf.gz --cloud-prefetch-buffer 10000 --cloud-index-prefetch-buffer 10000 --genomicsdb-max-alternate-alleles 110 --max-alternate-alleles 100 --gcs-max-retries 1000
#Elapsed time: 434.09 minutes. Runtime.totalMemory()=2147483648
 
gatk --java-options "-Xmx50g" GenotypeGVCFs -R /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -stand-call-conf 0 -ploidy 1 -V gendb:///home/xuql/copyNAM/NAM_out_gatk6 -O /home/xuql/copyNAM/gatk6.vcf.gz --cloud-prefetch-buffer 10000 --cloud-index-prefetch-buffer 10000 --genomicsdb-max-alternate-alleles 110 --max-alternate-alleles 100 --gcs-max-retries 1000
# Processed 148426585 total variants in 338.4 minutes. Elapsed time: 338.57 minutes. Runtime.totalMemory()=7088373760

gatk --java-options "-Xmx50g" GenotypeGVCFs -R /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -stand-call-conf 0 -ploidy 1 -V gendb:///home/xuql/copyNAM/NAM_out_gatk7 -O /home/xuql/copyNAM/gatk7.vcf.gz --cloud-prefetch-buffer 10000 --cloud-index-prefetch-buffer 10000 --genomicsdb-max-alternate-alleles 110 --max-alternate-alleles 100 --gcs-max-retries 1000
#Processed 148023151 total variants in 369.9 minutes. Elapsed time: 370.07 minutes. Runtime.totalMemory()=2147483648

gatk --java-options "-Xmx50g" GenotypeGVCFs -R /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -stand-call-conf 0 -ploidy 1 -V gendb:///home/xuql/copyNAM/NAM_out_gatk8 -O /home/xuql/copyNAM/gatk8.vcf.gz --cloud-prefetch-buffer 10000 --cloud-index-prefetch-buffer 10000 --genomicsdb-max-alternate-alleles 110 --max-alternate-alleles 100 --gcs-max-retries 1000
#Processed 147858803 total variants in 355.4 minutes. Elapsed time: 355.56 minutes. Runtime.totalMemory()=7214202880

# To make sure GenotypeGVCFs could run to completion, indels longer than 10 Mb were deleted beforehand; such indels occurred only on chromosome 9 and chromosome 10.

gatk --java-options "-Xmx100g" GenotypeGVCFs -R /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -stand-call-conf 0 -ploidy 1 -V gendb:///home/xuql/copyNAM/NAM_out_gatk9 -O /home/xuql/copyNAM/gatk9.vcf.gz --cloud-prefetch-buffer 10000 --cloud-index-prefetch-buffer 10000 --genomicsdb-max-alternate-alleles 110 --max-alternate-alleles 100 --gcs-max-retries 1000
#Elapsed time: 320.96 minutes. Runtime.totalMemory()=3724541952

gatk --java-options "-Xmx200g" GenotypeGVCFs -R /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -stand-call-conf 0 -ploidy 1 -V gendb:///home/xuql/copyNAM/NAM_out_gatk10 -O /home/xuql/copyNAM/gatk10.vcf.gz --cloud-prefetch-buffer 10000 --cloud-index-prefetch-buffer 10000 --genomicsdb-max-alternate-alleles 110 --max-alternate-alleles 100 --gcs-max-retries 1000
#Elapsed time: 291.33 minutes. Runtime.totalMemory()=7079985152
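
The ten GenotypeGVCFs commands above share every option except the chromosome number and the heap size (-Xmx50g for chromosomes 1 to 8, -Xmx100g for chromosome 9, -Xmx200g for chromosome 10). A minimal sketch of generating them from one template follows. The helper name genotype_cmd is hypothetical, and the script prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch: print one GenotypeGVCFs command per chromosome (dry run).
base=/home/xuql/copyNAM

# genotype_cmd CHR: hypothetical helper that echoes the command for one chromosome.
genotype_cmd() {
  local chr=$1 mem=50g
  # Heap sizes follow the original commands: larger for chromosomes 9 and 10.
  case $chr in 9) mem=100g ;; 10) mem=200g ;; esac
  echo "gatk --java-options \"-Xmx$mem\" GenotypeGVCFs -R $base/B73/Zm-B73-REFERENCE-NAM-5.0.fa -stand-call-conf 0 -ploidy 1 -V gendb://$base/NAM_out_gatk$chr -O $base/gatk$chr.vcf.gz --cloud-prefetch-buffer 10000 --cloud-index-prefetch-buffer 10000 --genomicsdb-max-alternate-alleles 110 --max-alternate-alleles 100 --gcs-max-retries 1000"
}

for chr in 1 2 3 4 5 6 7 8 9 10; do
  genotype_cmd "$chr"
done
```

Because each chromosome has its own GenomicsDB workspace and output file, the printed commands are independent of each other and could be submitted as separate jobs if memory allows.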

3.4 Variant normalization

Left-align the variants to correct the positional bias introduced by the alignment strategy.

/home/ywt/vt/vt normalize gatk1.vcf.gz -r /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -o gatk1bcfvt.vcf.gz 
bcftools norm  -f /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -m +both gatk1bcfvt.vcf.gz -Oz -o gatk1bcftools.vcf.gz 
bcftools index -t gatk1bcfvt.vcf.gz
bcftools index -t gatk1bcftools.vcf.gz

/home/ywt/vt/vt normalize gatk2.vcf.gz -r /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -o gatk2bcfvt.vcf.gz 
bcftools norm  -f /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -m +both gatk2bcfvt.vcf.gz -Oz -o gatk2bcftools.vcf.gz 
bcftools index -t gatk2bcfvt.vcf.gz
bcftools index -t gatk2bcftools.vcf.gz

/home/ywt/vt/vt normalize gatk3.vcf.gz -r /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -o gatk3bcfvt.vcf.gz 
bcftools norm  -f /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -m +both gatk3bcfvt.vcf.gz -Oz -o gatk3bcftools.vcf.gz 
bcftools index -t gatk3bcfvt.vcf.gz
bcftools index -t gatk3bcftools.vcf.gz

/home/ywt/vt/vt normalize gatk4.vcf.gz -r /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -o gatk4bcfvt.vcf.gz 
bcftools norm  -f /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -m +both gatk4bcfvt.vcf.gz -Oz -o gatk4bcftools.vcf.gz 
bcftools index -t gatk4bcfvt.vcf.gz
bcftools index -t gatk4bcftools.vcf.gz

/home/ywt/vt/vt normalize gatk5.vcf.gz -r /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -o gatk5bcfvt.vcf.gz 
bcftools norm  -f /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -m +both gatk5bcfvt.vcf.gz -Oz -o gatk5bcftools.vcf.gz 
bcftools index -t gatk5bcfvt.vcf.gz
bcftools index -t gatk5bcftools.vcf.gz

/home/ywt/vt/vt normalize gatk6.vcf.gz -r /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -o gatk6bcfvt.vcf.gz 
bcftools norm  -f /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -m +both gatk6bcfvt.vcf.gz -Oz -o gatk6bcftools.vcf.gz 
bcftools index -t gatk6bcfvt.vcf.gz
bcftools index -t gatk6bcftools.vcf.gz

/home/ywt/vt/vt normalize gatk7.vcf.gz -r /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -o gatk7bcfvt.vcf.gz 
bcftools norm  -f /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -m +both gatk7bcfvt.vcf.gz -Oz -o gatk7bcftools.vcf.gz 
bcftools index -t gatk7bcfvt.vcf.gz
bcftools index -t gatk7bcftools.vcf.gz

bcftools index -t gatk8.vcf.gz
/home/ywt/vt/vt normalize gatk8.vcf.gz -r /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -o gatk8bcfvt.vcf.gz 
bcftools norm -f /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -m +both gatk8bcfvt.vcf.gz -Oz -o gatk8bcftools.vcf.gz
bcftools index -t gatk8bcfvt.vcf.gz
bcftools index -t gatk8bcftools.vcf.gz

/home/ywt/vt/vt normalize gatk9.vcf.gz -r /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -o gatk9bcfvt.vcf.gz 
bcftools norm -f /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -m +both gatk9bcfvt.vcf.gz -Oz -o gatk9bcftools.vcf.gz
bcftools index -t gatk9bcfvt.vcf.gz
bcftools index -t gatk9bcftools.vcf.gz

/home/ywt/vt/vt normalize gatk10.vcf.gz -r /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -o gatk10bcfvt.vcf.gz 
bcftools norm -f /media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa -m +both gatk10bcfvt.vcf.gz -Oz -o gatk10bcftools.vcf.gz
bcftools index -t gatk10bcfvt.vcf.gz
bcftools index -t gatk10bcftools.vcf.gz
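
The normalization commands above repeat the same four steps for each chromosome: vt normalize, bcftools norm, then one index per output. As a sketch only, they could be produced from a single template. The helper name normalize_cmds is hypothetical, and the extra "bcftools index -t gatk8.vcf.gz" that precedes the chromosome 8 block is left out. The script prints the commands and assumes they are run in the directory that holds the gatkN.vcf.gz files.

```shell
#!/usr/bin/env bash
# Sketch: print the four normalization commands for each chromosome (dry run).
vt=/home/ywt/vt/vt
ref=/media/ywt/14T1/NAManchorwave/B73/Zm-B73-REFERENCE-NAM-5.0.fa

# normalize_cmds CHR: hypothetical helper that echoes the commands for one chromosome.
normalize_cmds() {
  local chr=$1
  # Step 1: normalize with vt.
  echo "$vt normalize gatk$chr.vcf.gz -r $ref -o gatk${chr}bcfvt.vcf.gz"
  # Step 2: normalize again with bcftools, merging multiallelic records (-m +both).
  echo "bcftools norm -f $ref -m +both gatk${chr}bcfvt.vcf.gz -Oz -o gatk${chr}bcftools.vcf.gz"
  # Steps 3 and 4: tabix-index both outputs.
  echo "bcftools index -t gatk${chr}bcfvt.vcf.gz"
  echo "bcftools index -t gatk${chr}bcftools.vcf.gz"
}

for chr in 1 2 3 4 5 6 7 8 9 10; do
  normalize_cmds "$chr"
done
```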

4. Visualizing the results in IGV

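# Remove hard-clip (H) operations from the SAM file. Run this in the foreground so that samtools reads the finished file.
sed -r -i 's/[0-9]+H//g' B97.sam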
samtools view -O CRAM --threads 80 --reference Zm-B73-REFERENCE-NAM-5.0.fa B97.sam | samtools sort --threads 30 -O CRAM - > B97.cram
samtools index B97.cram
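
The sed expression above appears intended to drop hard-clip operations such as 5H from the CIGAR strings, but it matches any run of digits followed by H anywhere on a line, including sequence names. A more conservative variant, sketched below, restricts the substitution to the CIGAR column (field 6) and leaves header lines alone. The function name strip_hard_clips and the sample record are hypothetical and only illustrate the idea.

```shell
#!/usr/bin/env bash
# Sketch: remove hard-clip operations from the CIGAR field only.
# strip_hard_clips [FILE...]: hypothetical helper; reads stdin when no file is given.
strip_hard_clips() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^@/ { print; next }                 # keep header lines unchanged
       { gsub(/[0-9]+H/, "", $6); print }   # edit column 6 (CIGAR) only
  ' "$@"
}

# Toy record: the query name "q1H" must survive while 5H and 3H are removed.
printf 'q1H\t0\t1\t100\t60\t5H10M3H\t*\t0\t0\tACGTACGTAC\t*\n' | strip_hard_clips
```

With a real file this could be used as "strip_hard_clips B97.sam > B97.noH.sam" (the output name is illustrative), and the result passed to the samtools commands in place of the edited B97.sam.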
