Genomic Data Processing 78: Reading the Same VCF with Different Methods Gives Different Results

1. Methods 1 and 2:
(1) Code:

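// Method 1: load the variant annotations previously saved in ADAM (Parquet) format
val path2 = "hdfs://219.219.220.149:9000/xubo/callVariant/vcf/smallAnno2Adam.vcf"
val anno2adam = sc.loadParquetVariantAnnotations(path2)
println("anno2adam:")
anno2adam.foreach(println)

// Method 2: load the variant annotations directly from the VCF file
// (`path` is not defined in this snippet; it is assumed to point to the source VCF)
val annotations: RDD[DatabaseVariantAnnotation] = sc.loadVcfAnnotations(path)
println("annotations:")
annotations.foreach(println)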

(2) Results:

anno2adam:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
{"variant": {"variantErrorProbability": 139, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 14396, "end": 14400, "referenceAllele": "CTGT", "alternateAllele": "C", "svAllele": null, "isSomatic": false}, "dbSnpId": null, "geneSymbol": null, "omimId": null, "cosmicId": null, "clinvarId": null, "clinicalSignificance": null, "gerpNr": null, "gerpRs": null, "phylop": null, "ancestralAllele": null, "thousandGenomesAlleleCount": null, "thousandGenomesAlleleFrequency": null, "siftScore": null, "siftScoreConverted": null, "siftPred": null, "mutationTasterScore": null, "mutationTasterScoreConverted": null, "mutationTasterPred": null}
{"variant": {"variantErrorProbability": 195, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 14521, "end": 14522, "referenceAllele": "G", "alternateAllele": "A", "svAllele": null, "isSomatic": false}, "dbSnpId": null, "geneSymbol": null, "omimId": null, "cosmicId": null, "clinvarId": null, "clinicalSignificance": null, "gerpNr": null, "gerpRs": null, "phylop": null, "ancestralAllele": null, "thousandGenomesAlleleCount": null, "thousandGenomesAlleleFrequency": null, "siftScore": null, "siftScoreConverted": null, "siftPred": null, "mutationTasterScore": null, "mutationTasterScoreConverted": null, "mutationTasterPred": null}
{"variant": {"variantErrorProbability": 1186, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 19189, "end": 19191, "referenceAllele": "GC", "alternateAllele": "G", "svAllele": null, "isSomatic": false}, "dbSnpId": null, "geneSymbol": null, "omimId": null, "cosmicId": null, "clinvarId": null, "clinicalSignificance": null, "gerpNr": null, "gerpRs": null, "phylop": null, "ancestralAllele": null, "thousandGenomesAlleleCount": null, "thousandGenomesAlleleFrequency": null, "siftScore": null, "siftScoreConverted": null, "siftPred": null, "mutationTasterScore": null, "mutationTasterScoreConverted": null, "mutationTasterPred": null}
{"variant": {"variantErrorProbability": 2994, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 63734, "end": 63738, "referenceAllele": "CCTA", "alternateAllele": "C", "svAllele": null, "isSomatic": false}, "dbSnpId": null, "geneSymbol": null, "omimId": null, "cosmicId": null, "clinvarId": null, "clinicalSignificance": null, "gerpNr": null, "gerpRs": null, "phylop": null, "ancestralAllele": null, "thousandGenomesAlleleCount": null, "thousandGenomesAlleleFrequency": null, "siftScore": null, "siftScoreConverted": null, "siftPred": null, "mutationTasterScore": null, "mutationTasterScoreConverted": null, "mutationTasterPred": null}
{"variant": {"variantErrorProbability": 2486, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 752720, "end": 752721, "referenceAllele": "A", "alternateAllele": "G", "svAllele": null, "isSomatic": false}, "dbSnpId": null, "geneSymbol": null, "omimId": null, "cosmicId": null, "clinvarId": null, "clinicalSignificance": null, "gerpNr": null, "gerpRs": null, "phylop": null, "ancestralAllele": null, "thousandGenomesAlleleCount": null, "thousandGenomesAlleleFrequency": null, "siftScore": null, "siftScoreConverted": null, "siftPred": null, "mutationTasterScore": null, "mutationTasterScoreConverted": null, "mutationTasterPred": null}
annotations:
{"variant": {"variantErrorProbability": 139, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 14396, "end": 14400, "referenceAllele": "CTGT", "alternateAllele": "C", "svAllele": null, "isSomatic": false}, "dbSnpId": null, "geneSymbol": null, "omimId": null, "cosmicId": null, "clinvarId": null, "clinicalSignificance": null, "gerpNr": null, "gerpRs": null, "phylop": null, "ancestralAllele": null, "thousandGenomesAlleleCount": null, "thousandGenomesAlleleFrequency": null, "siftScore": null, "siftScoreConverted": null, "siftPred": null, "mutationTasterScore": null, "mutationTasterScoreConverted": null, "mutationTasterPred": null}
{"variant": {"variantErrorProbability": 195, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 14521, "end": 14522, "referenceAllele": "G", "alternateAllele": "A", "svAllele": null, "isSomatic": false}, "dbSnpId": null, "geneSymbol": null, "omimId": null, "cosmicId": null, "clinvarId": null, "clinicalSignificance": null, "gerpNr": null, "gerpRs": null, "phylop": null, "ancestralAllele": null, "thousandGenomesAlleleCount": null, "thousandGenomesAlleleFrequency": null, "siftScore": null, "siftScoreConverted": null, "siftPred": null, "mutationTasterScore": null, "mutationTasterScoreConverted": null, "mutationTasterPred": null}
{"variant": {"variantErrorProbability": 1186, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 19189, "end": 19191, "referenceAllele": "GC", "alternateAllele": "G", "svAllele": null, "isSomatic": false}, "dbSnpId": null, "geneSymbol": null, "omimId": null, "cosmicId": null, "clinvarId": null, "clinicalSignificance": null, "gerpNr": null, "gerpRs": null, "phylop": null, "ancestralAllele": null, "thousandGenomesAlleleCount": null, "thousandGenomesAlleleFrequency": null, "siftScore": null, "siftScoreConverted": null, "siftPred": null, "mutationTasterScore": null, "mutationTasterScoreConverted": null, "mutationTasterPred": null}
{"variant": {"variantErrorProbability": 2994, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 63734, "end": 63738, "referenceAllele": "CCTA", "alternateAllele": "C", "svAllele": null, "isSomatic": false}, "dbSnpId": null, "geneSymbol": null, "omimId": null, "cosmicId": null, "clinvarId": null, "clinicalSignificance": null, "gerpNr": null, "gerpRs": null, "phylop": null, "ancestralAllele": null, "thousandGenomesAlleleCount": null, "thousandGenomesAlleleFrequency": null, "siftScore": null, "siftScoreConverted": null, "siftPred": null, "mutationTasterScore": null, "mutationTasterScoreConverted": null, "mutationTasterPred": null}
{"variant": {"variantErrorProbability": 2486, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 752720, "end": 752721, "referenceAllele": "A", "alternateAllele": "G", "svAllele": null, "isSomatic": false}, "dbSnpId": null, "geneSymbol": null, "omimId": null, "cosmicId": null, "clinvarId": null, "clinicalSignificance": null, "gerpNr": null, "gerpRs": null, "phylop": null, "ancestralAllele": null, "thousandGenomesAlleleCount": null, "thousandGenomesAlleleFrequency": null, "siftScore": null, "siftScoreConverted": null, "siftPred": null, "mutationTasterScore": null, "mutationTasterScoreConverted": null, "mutationTasterPred": null}

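2. Method 3: returns three times as many records as methods 1 and 2 (15 instead of 5), because the VCF contains three samples and one genotype record is emitted per sample for each variant
(1) Code: in package org.bdgenomics.adam.rdd, class ADAMContextSuite extends ADAMFunSuite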

  sparkTest("can read a small .vcf file") {
    val path = resourcePath("small.vcf")

    val vcs = sc.loadGenotypes(path).toVariantContext.collect.sortBy(_.position)
    assert(vcs.size === 5)

    val vc = vcs.head
    assert(vc.genotypes.size === 3)

    val gt = vc.genotypes.head
    assert(gt.getVariantCallingAnnotations != null)
    assert(gt.getReadDepth === 20)

    /** ****************add by xubo 20160608 ***********************/
    println("vcs.head:")
    println(vcs.head.genotypes.size)
    println(vcs.head.databases.size)
    vcs.foreach { each =>
      println("position:"+each.position)
      println("variant:"+each.variant.variant)
      each.genotypes.foreach(println)
      println("database")
      each.databases.foreach(println)
    }
    println("loadGenotypes:")
    sc.loadGenotypes(path).foreach(println)

    /** ****************add by xubo 20160608 ***********************/
  }

(2) Results:

loadGenotypes:
{"variant": {"variantErrorProbability": 139, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 14396, "end": 14400, "referenceAllele": "CTGT", "alternateAllele": "C", "svAllele": null, "isSomatic": false}, "variantCallingAnnotations": {"variantIsPassing": false, "variantFilters": ["IndelQD"], "downsampled": null, "baseQRankSum": null, "fisherStrandBiasPValue": null, "rmsMapQ": 26.84, "mapq0Reads": 0, "mqRankSum": -1.906, "readPositionRankSum": 0.384, "genotypePriors": [], "genotypePosteriors": [], "vqslod": null, "culprit": null, "attributes": {}}, "sampleId": "NA12878", "sampleDescription": null, "processingDescription": null, "alleles": ["Ref", "Alt"], "expectedAlleleDosage": null, "referenceReadDepth": 16, "alternateReadDepth": 4, "readDepth": 20, "minReadDepth": null, "genotypeQuality": 99, "genotypeLikelihoods": [-9.999779E-13, "-Infinity", 0.0], "nonReferenceLikelihoods": [], "strandBiasComponents": [], "splitFromMultiAllelic": false, "isPhased": false, "phaseSetId": null, "phaseQuality": null}
{"variant": {"variantErrorProbability": 139, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 14396, "end": 14400, "referenceAllele": "CTGT", "alternateAllele": "C", "svAllele": null, "isSomatic": false}, "variantCallingAnnotations": {"variantIsPassing": false, "variantFilters": ["IndelQD"], "downsampled": null, "baseQRankSum": null, "fisherStrandBiasPValue": null, "rmsMapQ": 26.84, "mapq0Reads": 0, "mqRankSum": -1.906, "readPositionRankSum": 0.384, "genotypePriors": [], "genotypePosteriors": [], "vqslod": null, "culprit": null, "attributes": {}}, "sampleId": "NA12891", "sampleDescription": null, "processingDescription": null, "alleles": ["Ref", "Alt"], "expectedAlleleDosage": null, "referenceReadDepth": 8, "alternateReadDepth": 2, "readDepth": 10, "minReadDepth": null, "genotypeQuality": 60, "genotypeLikelihoods": [-1.0000005E-6, "-Infinity", 0.0], "nonReferenceLikelihoods": [], "strandBiasComponents": [], "splitFromMultiAllelic": false, "isPhased": false, "phaseSetId": null, "phaseQuality": null}
{"variant": {"variantErrorProbability": 139, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 14396, "end": 14400, "referenceAllele": "CTGT", "alternateAllele": "C", "svAllele": null, "isSomatic": false}, "variantCallingAnnotations": {"variantIsPassing": false, "variantFilters": ["IndelQD"], "downsampled": null, "baseQRankSum": null, "fisherStrandBiasPValue": null, "rmsMapQ": 26.84, "mapq0Reads": 0, "mqRankSum": -1.906, "readPositionRankSum": 0.384, "genotypePriors": [], "genotypePosteriors": [], "vqslod": null, "culprit": null, "attributes": {}}, "sampleId": "NA12892", "sampleDescription": null, "processingDescription": null, "alleles": ["Ref", "Ref"], "expectedAlleleDosage": null, "referenceReadDepth": 39, "alternateReadDepth": 0, "readDepth": 39, "minReadDepth": null, "genotypeQuality": 99, "genotypeLikelihoods": ["-Infinity", -2.5118796E-12, 0.0], "nonReferenceLikelihoods": [], "strandBiasComponents": [], "splitFromMultiAllelic": false, "isPhased": false, "phaseSetId": null, "phaseQuality": null}
{"variant": {"variantErrorProbability": 195, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 14521, "end": 14522, "referenceAllele": "G", "alternateAllele": "A", "svAllele": null, "isSomatic": false}, "variantCallingAnnotations": {"variantIsPassing": false, "variantFilters": ["VQSRTrancheSNP99.95to100.00"], "downsampled": null, "baseQRankSum": null, "fisherStrandBiasPValue": null, "rmsMapQ": 25.89, "mapq0Reads": 0, "mqRankSum": -0.063, "readPositionRankSum": 0.952, "genotypePriors": [], "genotypePosteriors": [], "vqslod": -3.333, "culprit": "MQ", "attributes": {}}, "sampleId": "NA12878", "sampleDescription": null, "processingDescription": null, "alleles": ["Ref", "Alt"], "expectedAlleleDosage": null, "referenceReadDepth": 10, "alternateReadDepth": 5, "readDepth": 15, "minReadDepth": null, "genotypeQuality": 99, "genotypeLikelihoods": [-1.2589252E-10, "-Infinity", 0.0], "nonReferenceLikelihoods": [], "strandBiasComponents": [], "splitFromMultiAllelic": false, "isPhased": false, "phaseSetId": null, "phaseQuality": null}
{"variant": {"variantErrorProbability": 195, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 14521, "end": 14522, "referenceAllele": "G", "alternateAllele": "A", "svAllele": null, "isSomatic": false}, "variantCallingAnnotations": {"variantIsPassing": false, "variantFilters": ["VQSRTrancheSNP99.95to100.00"], "downsampled": null, "baseQRankSum": null, "fisherStrandBiasPValue": null, "rmsMapQ": 25.89, "mapq0Reads": 0, "mqRankSum": -0.063, "readPositionRankSum": 0.952, "genotypePriors": [], "genotypePosteriors": [], "vqslod": -3.333, "culprit": "MQ", "attributes": {}}, "sampleId": "NA12891", "sampleDescription": null, "processingDescription": null, "alleles": ["Ref", "Alt"], "expectedAlleleDosage": null, "referenceReadDepth": 2, "alternateReadDepth": 5, "readDepth": 7, "minReadDepth": null, "genotypeQuality": 34, "genotypeLikelihoods": [-1.5853985E-13, "-Infinity", -3.9818644E-4], "nonReferenceLikelihoods": [], "strandBiasComponents": [], "splitFromMultiAllelic": false, "isPhased": false, "phaseSetId": null, "phaseQuality": null}
{"variant": {"variantErrorProbability": 195, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 14521, "end": 14522, "referenceAllele": "G", "alternateAllele": "A", "svAllele": null, "isSomatic": false}, "variantCallingAnnotations": {"variantIsPassing": false, "variantFilters": ["VQSRTrancheSNP99.95to100.00"], "downsampled": null, "baseQRankSum": null, "fisherStrandBiasPValue": null, "rmsMapQ": 25.89, "mapq0Reads": 0, "mqRankSum": -0.063, "readPositionRankSum": 0.952, "genotypePriors": [], "genotypePosteriors": [], "vqslod": -3.333, "culprit": "MQ", "attributes": {}}, "sampleId": "NA12892", "sampleDescription": null, "processingDescription": null, "alleles": ["Ref", "Ref"], "expectedAlleleDosage": null, "referenceReadDepth": 26, "alternateReadDepth": 0, "readDepth": 26, "minReadDepth": null, "genotypeQuality": 78, "genotypeLikelihoods": ["-Infinity", -1.5848933E-8, 0.0], "nonReferenceLikelihoods": [], "strandBiasComponents": [], "splitFromMultiAllelic": false, "isPhased": false, "phaseSetId": null, "phaseQuality": null}
{"variant": {"variantErrorProbability": 1186, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 19189, "end": 19191, "referenceAllele": "GC", "alternateAllele": "G", "svAllele": null, "isSomatic": false}, "variantCallingAnnotations": {"variantIsPassing": true, "variantFilters": [], "downsampled": null, "baseQRankSum": null, "fisherStrandBiasPValue": null, "rmsMapQ": 22.26, "mapq0Reads": 0, "mqRankSum": 0.195, "readPositionRankSum": -4.072, "genotypePriors": [], "genotypePosteriors": [], "vqslod": null, "culprit": null, "attributes": {}}, "sampleId": "NA12878", "sampleDescription": null, "processingDescription": null, "alleles": ["Ref", "Alt"], "expectedAlleleDosage": null, "referenceReadDepth": 8, "alternateReadDepth": 14, "readDepth": 22, "minReadDepth": null, "genotypeQuality": 99, "genotypeLikelihoods": [0.0, "-Infinity", 0.0], "nonReferenceLikelihoods": [], "strandBiasComponents": [], "splitFromMultiAllelic": false, "isPhased": false, "phaseSetId": null, "phaseQuality": null}
{"variant": {"variantErrorProbability": 1186, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 19189, "end": 19191, "referenceAllele": "GC", "alternateAllele": "G", "svAllele": null, "isSomatic": false}, "variantCallingAnnotations": {"variantIsPassing": true, "variantFilters": [], "downsampled": null, "baseQRankSum": null, "fisherStrandBiasPValue": null, "rmsMapQ": 22.26, "mapq0Reads": 0, "mqRankSum": 0.195, "readPositionRankSum": -4.072, "genotypePriors": [], "genotypePosteriors": [], "vqslod": null, "culprit": null, "attributes": {}}, "sampleId": "NA12891", "sampleDescription": null, "processingDescription": null, "alleles": ["Ref", "Alt"], "expectedAlleleDosage": null, "referenceReadDepth": 18, "alternateReadDepth": 13, "readDepth": 31, "minReadDepth": null, "genotypeQuality": 99, "genotypeLikelihoods": [0.0, "-Infinity", 0.0], "nonReferenceLikelihoods": [], "strandBiasComponents": [], "splitFromMultiAllelic": false, "isPhased": false, "phaseSetId": null, "phaseQuality": null}
{"variant": {"variantErrorProbability": 1186, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 19189, "end": 19191, "referenceAllele": "GC", "alternateAllele": "G", "svAllele": null, "isSomatic": false}, "variantCallingAnnotations": {"variantIsPassing": true, "variantFilters": [], "downsampled": null, "baseQRankSum": null, "fisherStrandBiasPValue": null, "rmsMapQ": 22.26, "mapq0Reads": 0, "mqRankSum": 0.195, "readPositionRankSum": -4.072, "genotypePriors": [], "genotypePosteriors": [], "vqslod": null, "culprit": null, "attributes": {}}, "sampleId": "NA12892", "sampleDescription": null, "processingDescription": null, "alleles": ["Ref", "Alt"], "expectedAlleleDosage": null, "referenceReadDepth": 5, "alternateReadDepth": 15, "readDepth": 20, "minReadDepth": null, "genotypeQuality": 99, "genotypeLikelihoods": [0.0, "-Infinity", -1.9952595E-11], "nonReferenceLikelihoods": [], "strandBiasComponents": [], "splitFromMultiAllelic": false, "isPhased": false, "phaseSetId": null, "phaseQuality": null}
{"variant": {"variantErrorProbability": 2994, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 63734, "end": 63738, "referenceAllele": "CCTA", "alternateAllele": "C", "svAllele": null, "isSomatic": false}, "variantCallingAnnotations": {"variantIsPassing": true, "variantFilters": [], "downsampled": null, "baseQRankSum": null, "fisherStrandBiasPValue": null, "rmsMapQ": 31.06, "mapq0Reads": 0, "mqRankSum": 0.636, "readPositionRankSum": -1.18, "genotypePriors": [], "genotypePosteriors": [], "vqslod": null, "culprit": null, "attributes": {}}, "sampleId": "NA12878", "sampleDescription": null, "processingDescription": null, "alleles": ["Ref", "Ref"], "expectedAlleleDosage": null, "referenceReadDepth": 27, "alternateReadDepth": 0, "readDepth": 27, "minReadDepth": null, "genotypeQuality": 79, "genotypeLikelihoods": ["-Infinity", -1.2589254E-8, 0.0], "nonReferenceLikelihoods": [], "strandBiasComponents": [], "splitFromMultiAllelic": false, "isPhased": false, "phaseSetId": null, "phaseQuality": null}
{"variant": {"variantErrorProbability": 2994, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 63734, "end": 63738, "referenceAllele": "CCTA", "alternateAllele": "C", "svAllele": null, "isSomatic": false}, "variantCallingAnnotations": {"variantIsPassing": true, "variantFilters": [], "downsampled": null, "baseQRankSum": null, "fisherStrandBiasPValue": null, "rmsMapQ": 31.06, "mapq0Reads": 0, "mqRankSum": 0.636, "readPositionRankSum": -1.18, "genotypePriors": [], "genotypePosteriors": [], "vqslod": null, "culprit": null, "attributes": {}}, "sampleId": "NA12891", "sampleDescription": null, "processingDescription": null, "alleles": ["Ref", "Ref"], "expectedAlleleDosage": null, "referenceReadDepth": 40, "alternateReadDepth": 0, "readDepth": 40, "minReadDepth": null, "genotypeQuality": 99, "genotypeLikelihoods": ["-Infinity", -1.9952928E-12, 0.0], "nonReferenceLikelihoods": [], "strandBiasComponents": [], "splitFromMultiAllelic": false, "isPhased": false, "phaseSetId": null, "phaseQuality": null}
{"variant": {"variantErrorProbability": 2994, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 63734, "end": 63738, "referenceAllele": "CCTA", "alternateAllele": "C", "svAllele": null, "isSomatic": false}, "variantCallingAnnotations": {"variantIsPassing": true, "variantFilters": [], "downsampled": null, "baseQRankSum": null, "fisherStrandBiasPValue": null, "rmsMapQ": 31.06, "mapq0Reads": 0, "mqRankSum": 0.636, "readPositionRankSum": -1.18, "genotypePriors": [], "genotypePosteriors": [], "vqslod": null, "culprit": null, "attributes": {}}, "sampleId": "NA12892", "sampleDescription": null, "processingDescription": null, "alleles": ["Ref", "Alt"], "expectedAlleleDosage": null, "referenceReadDepth": 23, "alternateReadDepth": 74, "readDepth": 97, "minReadDepth": null, "genotypeQuality": 99, "genotypeLikelihoods": [0.0, "-Infinity", 0.0], "nonReferenceLikelihoods": [], "strandBiasComponents": [], "splitFromMultiAllelic": false, "isPhased": false, "phaseSetId": null, "phaseQuality": null}
{"variant": {"variantErrorProbability": 2486, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 752720, "end": 752721, "referenceAllele": "A", "alternateAllele": "G", "svAllele": null, "isSomatic": false}, "variantCallingAnnotations": {"variantIsPassing": true, "variantFilters": [], "downsampled": null, "baseQRankSum": null, "fisherStrandBiasPValue": null, "rmsMapQ": 60.0, "mapq0Reads": 0, "mqRankSum": null, "readPositionRankSum": null, "genotypePriors": [], "genotypePosteriors": [], "vqslod": 18.94, "culprit": "QD", "attributes": {}}, "sampleId": "NA12878", "sampleDescription": null, "processingDescription": null, "alleles": ["Alt", "Alt"], "expectedAlleleDosage": null, "referenceReadDepth": 0, "alternateReadDepth": 27, "readDepth": 27, "minReadDepth": null, "genotypeQuality": 81, "genotypeLikelihoods": [0.0, -7.9432825E-9, "-Infinity"], "nonReferenceLikelihoods": [], "strandBiasComponents": [], "splitFromMultiAllelic": false, "isPhased": false, "phaseSetId": null, "phaseQuality": null}
{"variant": {"variantErrorProbability": 2486, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 752720, "end": 752721, "referenceAllele": "A", "alternateAllele": "G", "svAllele": null, "isSomatic": false}, "variantCallingAnnotations": {"variantIsPassing": true, "variantFilters": [], "downsampled": null, "baseQRankSum": null, "fisherStrandBiasPValue": null, "rmsMapQ": 60.0, "mapq0Reads": 0, "mqRankSum": null, "readPositionRankSum": null, "genotypePriors": [], "genotypePosteriors": [], "vqslod": 18.94, "culprit": "QD", "attributes": {}}, "sampleId": "NA12891", "sampleDescription": null, "processingDescription": null, "alleles": ["Alt", "Alt"], "expectedAlleleDosage": null, "referenceReadDepth": 0, "alternateReadDepth": 19, "readDepth": 19, "minReadDepth": null, "genotypeQuality": 57, "genotypeLikelihoods": [0.0, -1.9952643E-6, "-Infinity"], "nonReferenceLikelihoods": [], "strandBiasComponents": [], "splitFromMultiAllelic": false, "isPhased": false, "phaseSetId": null, "phaseQuality": null}
{"variant": {"variantErrorProbability": 2486, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "species": null, "referenceIndex": null}, "start": 752720, "end": 752721, "referenceAllele": "A", "alternateAllele": "G", "svAllele": null, "isSomatic": false}, "variantCallingAnnotations": {"variantIsPassing": true, "variantFilters": [], "downsampled": null, "baseQRankSum": null, "fisherStrandBiasPValue": null, "rmsMapQ": 60.0, "mapq0Reads": 0, "mqRankSum": null, "readPositionRankSum": null, "genotypePriors": [], "genotypePosteriors": [], "vqslod": 18.94, "culprit": "QD", "attributes": {}}, "sampleId": "NA12892", "sampleDescription": null, "processingDescription": null, "alleles": ["Alt", "Alt"], "expectedAlleleDosage": null, "referenceReadDepth": 0, "alternateReadDepth": 22, "readDepth": 22, "minReadDepth": null, "genotypeQuality": 66, "genotypeLikelihoods": [0.0, -2.5118868E-7, "-Infinity"], "nonReferenceLikelihoods": [], "strandBiasComponents": [], "splitFromMultiAllelic": false, "isPhased": false, "phaseSetId": null, "phaseQuality": null}
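
The relationship between the two record counts can be sketched in plain Scala, without Spark or ADAM. This is only an illustration under stated assumptions: `GenotypeRecord`, `GenotypeVsVariantCount` and `samplesPerVariant` are hypothetical names invented here, not part of the ADAM API, and the record keeps only the fields needed to identify a variant and a sample. The positions, alleles and sample IDs are copied from the output above.

```scala
// Hypothetical stand-in for an ADAM genotype record (not the real ADAM class):
// just enough fields to identify the variant site and the sample.
final case class GenotypeRecord(contig: String, start: Long,
                                ref: String, alt: String, sampleId: String)

object GenotypeVsVariantCount {
  // The five variants that appear in the outputs above.
  val variants: Seq[(String, Long, String, String)] = Seq(
    ("1", 14396L, "CTGT", "C"),
    ("1", 14521L, "G", "A"),
    ("1", 19189L, "GC", "G"),
    ("1", 63734L, "CCTA", "C"),
    ("1", 752720L, "A", "G"))

  // The three samples that appear in the loadGenotypes output.
  val samples: Seq[String] = Seq("NA12878", "NA12891", "NA12892")

  // One genotype record per (variant, sample) pair, mirroring method 3.
  val genotypes: Seq[GenotypeRecord] = for {
    v <- variants
    s <- samples
  } yield GenotypeRecord(v._1, v._2, v._3, v._4, s)

  // Grouping by variant site collapses the genotypes back to one entry
  // per variant, mirroring the record count of methods 1 and 2.
  val samplesPerVariant: Map[(String, Long, String, String), Seq[String]] =
    genotypes
      .groupBy(g => (g.contig, g.start, g.ref, g.alt))
      .map { case (site, gs) => site -> gs.map(_.sampleId) }

  def main(args: Array[String]): Unit = {
    println(s"genotype records: ${genotypes.size}")          // 15
    println(s"distinct variants: ${samplesPerVariant.size}") // 5
  }
}
```

Grouping the genotype-level records by (contig, start, reference allele, alternate allele) gives five groups of three samples each, which is why the per-genotype listing is three times as long as the per-variant listing.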
