Lab notes | Installing and running MuTect

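References: https://github.com/broadinstitute/mutect
http://gatkforums.broadinstitute.org/categories/mutect
While browsing online, I found that the Mutect2 tool bundled with GATK can be used directly for mutation calling.
Its argument list is shown below. Compared with the pipeline author's code: the author used MuTect 1.1.7 and invoked mutect.jar directly, and the arguments listed here have no counterpart to the dbSNP and similar annotation inputs that the author passed: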

Using GATK jar /home/zxx/workplace/gatk/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/zxx/workplace/gatk/gatk-package-4.2.0.0-local.jar Mutect2
USAGE: Mutect2 [arguments]
Call somatic SNVs and indels via local assembly of haplotypes
Version:4.2.0.0
Required Arguments:
--input,-I BAM/SAM/CRAM file containing reads This argument must be specified at least once.
Required.
--output,-O File to which variants should be written Required.
--reference,-R Reference sequence file Required.
Optional Arguments:
--add-output-sam-program-record,-add-output-sam-program-record
If true, adds a PG tag to created SAM/BAM/CRAM files. Default value: true. Possible
values: {true, false}
--add-output-vcf-command-line,-add-output-vcf-command-line
If true, adds a command line header line to created VCF files. Default value: true.
Possible values: {true, false}
--af-of-alleles-not-in-resource,-default-af
Population allele fraction assigned to alleles not found in germline resource. Please see
docs/mutect/mutect2.pdf for a derivation of the default value. Default value: -1.0.
--alleles The set of alleles to force-call regardless of evidence Default value: null.
--annotation,-A One or more specific annotations to add to variant calls This argument may be specified 0
or more times. Default value: null. Possible values: {AlleleFraction,
AS_BaseQualityRankSumTest, AS_FisherStrand, AS_InbreedingCoeff,
AS_MappingQualityRankSumTest, AS_QualByDepth, AS_ReadPosRankSumTest, AS_RMSMappingQuality,
AS_StrandBiasMutectAnnotation, AS_StrandOddsRatio, BaseQuality, BaseQualityHistogram,
BaseQualityRankSumTest, ChromosomeCounts, ClippingRankSumTest, CountNs, Coverage,
DepthPerAlleleBySample, DepthPerSampleHC, ExcessHet, FisherStrand, FragmentLength,
GenotypeSummaries, InbreedingCoeff, LikelihoodRankSumTest, MappingQuality,
MappingQualityRankSumTest, MappingQualityZero, OrientationBiasReadCounts,
OriginalAlignment, PossibleDeNovo, QualByDepth, ReadPosition, ReadPosRankSumTest,
ReferenceBases, RMSMappingQuality, SampleList, StrandBiasBySample, StrandOddsRatio,
TandemRepeat, UniqueAltReadCount}
--annotation-group,-G One or more groups of annotations to apply to variant calls This argument may be
specified 0 or more times. Default value: null. Possible values:
{AlleleSpecificAnnotation, AS_StandardAnnotation, ReducibleAnnotation, StandardAnnotation,
StandardHCAnnotation, StandardMutectAnnotation}
--annotations-to-exclude,-AX
One or more specific annotations to exclude from variant calls This argument may be
specified 0 or more times. Default value: null. Possible values:
{AS_StrandBiasMutectAnnotation, BaseQuality, Coverage, DepthPerAlleleBySample,
DepthPerSampleHC, FragmentLength, MappingQuality, OrientationBiasReadCounts, ReadPosition,
StrandBiasBySample, TandemRepeat}
--arguments_file read one or more arguments files and add them to the command line This argument may be
specified 0 or more times. Default value: null.
--assembly-region-out Output the assembly region to this IGV formatted file Default value: null.
--assembly-region-padding
Number of additional bases of context to include around each assembly region Default
value: 100.
--base-quality-score-threshold
Base qualities below this threshold will be reduced to the minimum (6) Default value: 18.
--callable-depth Minimum depth to be considered callable for Mutect stats. Does not affect genotyping.
Default value: 10.
--cloud-index-prefetch-buffer,-CIPB
Size of the cloud-only prefetch buffer (in MB; 0 to disable). Defaults to
cloudPrefetchBuffer if unset. Default value: -1.
--cloud-prefetch-buffer,-CPB
Size of the cloud-only prefetch buffer (in MB; 0 to disable). Default value: 40.
--create-output-bam-index,-OBI
If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file. Default
value: true. Possible values: {true, false}
--create-output-bam-md5,-OBM
If true, create a MD5 digest for any BAM/SAM/CRAM file created Default value: false.
Possible values: {true, false}
--create-output-variant-index,-OVI
If true, create a VCF index when writing a coordinate-sorted VCF file. Default value:
true. Possible values: {true, false}
--create-output-variant-md5,-OVM
If true, create a a MD5 digest any VCF file created. Default value: false. Possible
values: {true, false}
--disable-bam-index-caching,-DBIC
If true, don't cache bam indexes, this will reduce memory requirements but may harm
performance if many intervals are specified. Caching is automatically disabled if there
are no intervals specified. Default value: false. Possible values: {true, false}
--disable-read-filter,-DF
Read filters to be disabled before analysis This argument may be specified 0 or more
times. Default value: null. Possible values: {GoodCigarReadFilter, MappedReadFilter,
MappingQualityAvailableReadFilter, MappingQualityNotZeroReadFilter,
MappingQualityReadFilter, NonChimericOriginalAlignmentReadFilter,
NonZeroReferenceLengthAlignmentReadFilter, NotDuplicateReadFilter,
NotSecondaryAlignmentReadFilter, PassesVendorQualityCheckReadFilter, ReadLengthReadFilter,
WellformedReadFilter}
--disable-sequence-dictionary-validation,-disable-sequence-dictionary-validation
If specified, do not check the sequence dictionaries from our inputs for compatibility.
Use at your own risk! Default value: false. Possible values: {true, false}
--dont-use-dragstr-pair-hmm-scores
disable DRAGstr pair-hmm score even when dragstr-params-path was provided Default value:
false. Possible values: {true, false}
--downsampling-stride,-stride
Downsample a pool of reads starting within a range of one or more bases. Default value:
1.
--dragstr-het-hom-ratio
het to hom prior ratio use with DRAGstr on Default value: 2.
--dragstr-params-path
location of the DRAGstr model parameters for STR error correction used in the Pair HMM.
When provided, it overrides other PCR error correcting mechanisms Default value: null.
--enable-dynamic-read-disqualification-for-genotyping
Will enable less strict read disqualification low base quality reads Default value:
false. Possible values: {true, false}
--exclude-intervals,-XL
One or more genomic intervals to exclude from processing This argument may be specified 0
or more times. Default value: null.
--f1r2-max-depth sites with depth higher than this value will be grouped Default value: 200.
--f1r2-median-mq skip sites with median mapping quality below this value Default value: 50.
--f1r2-min-bq exclude bases below this quality from pileup Default value: 20.
--f1r2-tar-gz If specified, collect F1R2 counts and output files into this tar.gz file Default value:
null.
--founder-id,-founder-id
Samples representing the population "founders" This argument may be specified 0 or more
times. Default value: null.
--gatk-config-file A configuration file to use with the GATK. Default value: null.
--gcs-max-retries,-gcs-retries
If the GCS bucket channel errors out, how many times it will attempt to re-initiate the
connection Default value: 20.
--gcs-project-for-requester-pays
Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be
accessed. User must have storage.buckets.get permission on the bucket being accessed.
Default value: .
--genotype-germline-sites
(EXPERIMENTAL) Call all apparent germline site even though they will ultimately be
filtered. Default value: false. Possible values: {true, false}
--genotype-pon-sites Call sites in the PoN even though they will ultimately be filtered. Default value: false.
Possible values: {true, false}
--germline-resource
Population vcf of germline sequencing containing allele fractions. Default value: null.
--graph-output,-graph Write debug assembly graph information to this file Default value: null.
--help,-h display the help message Default value: false. Possible values: {true, false}
--ignore-itr-artifacts
Turn off read transformer that clips artifacts associated with end repair insertions near
inverted tandem repeats. Default value: false. Possible values: {true, false}
--initial-tumor-lod,-init-lod
Log 10 odds threshold to consider pileup active. Default value: 2.0.
--interval-exclusion-padding,-ixp
Amount of padding (in bp) to add to each interval you are excluding. Default value: 0.
--interval-merging-rule,-imr
Interval merging rule for abutting intervals Default value: ALL. Possible values: {ALL,
OVERLAPPING_ONLY}
--interval-padding,-ip
Amount of padding (in bp) to add to each interval you are including. Default value: 0.
--interval-set-rule,-isr
Set merging approach to use for combining interval inputs Default value: UNION. Possible
values: {UNION, INTERSECTION}
--intervals,-L One or more genomic intervals over which to operate This argument may be specified 0 or
more times. Default value: null.
--lenient,-LE Lenient processing of VCF files Default value: false. Possible values: {true, false}
--max-assembly-region-size
Maximum size of an assembly region Default value: 300.
--max-population-af,-max-af
Maximum population allele frequency in tumor-only mode. Default value: 0.01.
--max-reads-per-alignment-start
Maximum number of reads to retain per alignment start position. Reads above this threshold
will be downsampled. Set to 0 to disable. Default value: 50.
--min-assembly-region-size
Minimum size of an assembly region Default value: 50.
--min-base-quality-score,-mbq
Minimum base quality required to consider a base for calling Default value: 10.
--mitochondria-mode Mitochondria mode sets emission and initial LODs to 0. Default value: false. Possible
values: {true, false}
--native-pair-hmm-threads
How many threads should a native pairHMM implementation use Default value: 4.
--native-pair-hmm-use-double-precision
use double precision in the native pairHmm. This is slower but matches the java
implementation better Default value: false. Possible values: {true, false}
--normal-lod Log 10 odds threshold for calling normal variant non-germline. Default value: 2.2.
--normal-sample,-normal
BAM sample name of normal(s), if any. May be URL-encoded as output by GetSampleName with
-encode argument. This argument may be specified 0 or more times. Default value: null.
--panel-of-normals,-pon
VCF file of sites observed in normal. Default value: null.
--pcr-indel-qual Phred-scaled PCR indel qual for overlapping fragments Default value: 40.
--pcr-snv-qual Phred-scaled PCR SNV qual for overlapping fragments Default value: 40.
--pedigree,-ped Pedigree file for determining the population "founders" Default value: null.
--QUIET Whether to suppress job-summary info on System.err. Default value: false. Possible
values: {true, false}
--read-filter,-RF Read filters to be applied before analysis This argument may be specified 0 or more
times. Default value: null. Possible values: {AlignmentAgreesWithHeaderReadFilter,
AllowAllReadsReadFilter, AmbiguousBaseReadFilter, CigarContainsNoNOperator,
FirstOfPairReadFilter, FragmentLengthReadFilter, GoodCigarReadFilter,
HasReadGroupReadFilter, IntervalOverlapReadFilter, LibraryReadFilter, MappedReadFilter,
MappingQualityAvailableReadFilter, MappingQualityNotZeroReadFilter,
MappingQualityReadFilter, MatchingBasesAndQualsReadFilter, MateDifferentStrandReadFilter,
MateDistantReadFilter, MateOnSameContigOrNoMappedMateReadFilter,
MateUnmappedAndUnmappedReadFilter, MetricsReadFilter,
NonChimericOriginalAlignmentReadFilter, NonZeroFragmentLengthReadFilter,
NonZeroReferenceLengthAlignmentReadFilter, NotDuplicateReadFilter,
NotOpticalDuplicateReadFilter, NotProperlyPairedReadFilter,
NotSecondaryAlignmentReadFilter, NotSupplementaryAlignmentReadFilter,
OverclippedReadFilter, PairedReadFilter, PassesVendorQualityCheckReadFilter,
PlatformReadFilter, PlatformUnitReadFilter, PrimaryLineReadFilter,
ProperlyPairedReadFilter, ReadGroupBlackListReadFilter, ReadGroupReadFilter,
ReadLengthEqualsCigarLengthReadFilter, ReadLengthReadFilter, ReadNameReadFilter,
ReadStrandFilter, SampleReadFilter, SecondOfPairReadFilter, SeqIsStoredReadFilter,
SoftClippedReadFilter, ValidAlignmentEndReadFilter, ValidAlignmentStartReadFilter,
WellformedReadFilter}
--read-index,-read-index
Indices to use for the read inputs. If specified, an index must be provided for every read
input and in the same order as the read inputs. If this argument is not specified, the
path to the index for each input will be inferred automatically. This argument may be
specified 0 or more times. Default value: null.
--read-validation-stringency,-VS
Validation stringency for all SAM/BAM/CRAM/SRA files read by this program. The default
stringency value SILENT can improve performance when processing a BAM file in which
variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default
value: SILENT. Possible values: {STRICT, LENIENT, SILENT}
--seconds-between-progress-updates,-seconds-between-progress-updates
Output traversal statistics every time this many seconds elapse Default value: 10.0.
--sequence-dictionary,-sequence-dictionary
Use the given sequence dictionary as the master/canonical sequence dictionary. Must be a
.dict file. Default value: null.
--sites-only-vcf-output
If true, don't emit genotype fields when writing vcf file output. Default value: false.
Possible values: {true, false}
--tmp-dir Temp directory to use. Default value: null.
--tumor-lod-to-emit,-emit-lod
Log 10 odds threshold to emit variant to VCF. Default value: 3.0.
--tumor-sample,-tumor BAM sample name of tumor. May be URL-encoded as output by GetSampleName with -encode
argument. Default value: null.
--use-jdk-deflater,-jdk-deflater
Whether to use the JdkDeflater (as opposed to IntelDeflater) Default value: false.
Possible values: {true, false}
--use-jdk-inflater,-jdk-inflater
Whether to use the JdkInflater (as opposed to IntelInflater) Default value: false.
Possible values: {true, false}
--verbosity,-verbosity
Control verbosity of logging. Default value: INFO. Possible values: {ERROR, WARNING,
INFO, DEBUG}
--version display the version number for this tool Default value: false. Possible values: {true,
false}
Advanced Arguments:
--active-probability-threshold
Minimum probability for a locus to be considered active. Default value: 0.002.
--adaptive-pruning-initial-error-rate
Initial base error rate estimate for adaptive pruning Default value: 0.001.
--allele-informative-reads-overlap-margin
Likelihood and read-based annotations will only take into consideration reads that overlap
the variant or any base no further than this distance expressed in base pairs Default
value: 2.
--allow-non-unique-kmers-in-ref
Allow graphs that have non-unique kmers in the reference Default value: false. Possible
values: {true, false}
--bam-output,-bamout File to which assembled haplotypes should be written Default value: null.
--bam-writer-type Which haplotypes should be written to the BAM Default value: CALLED_HAPLOTYPES. Possible
values: {ALL_POSSIBLE_HAPLOTYPES, CALLED_HAPLOTYPES, NO_HAPLOTYPES}
--debug-assembly,-debug
Print out verbose debug information about each assembly region Default value: false.
Possible values: {true, false}
--disable-adaptive-pruning
Disable the adaptive algorithm for pruning paths in the graph Default value: false.
Possible values: {true, false}
--disable-cap-base-qualities-to-map-quality
If false this disables capping of base qualities in the HMM to the mapping quality of the
read Default value: false. Possible values: {true, false}
--disable-symmetric-hmm-normalizing
Toggle to revive legacy behavior of asymmetrically normalizing the arguments to the
reference haplotype Default value: false. Possible values: {true, false}
--disable-tool-default-annotations,-disable-tool-default-annotations
Disable all tool default annotations Default value: false. Possible values: {true, false}
--disable-tool-default-read-filters,-disable-tool-default-read-filters
Disable all tool default read filters (WARNING: many tools will not function correctly
without their default read filters on) Default value: false. Possible values: {true,
false}
--dont-increase-kmer-sizes-for-cycles
Disable iterating over kmer sizes when graph cycles are detected Default value: false.
Possible values: {true, false}
--dont-use-soft-clipped-bases
Do not analyze soft clipped bases in the reads Default value: false. Possible values:
{true, false}
--emit-ref-confidence,-ERC
Mode for emitting reference confidence scores (For Mutect2, this is a BETA feature)
Default value: NONE. Possible values: {NONE, BP_RESOLUTION, GVCF}
--enable-all-annotations
Use all possible annotations (not for the faint of heart) Default value: false. Possible
values: {true, false}
--expected-mismatch-rate-for-read-disqualification
Error rate used to set expectation for post HMM read disqualification based on mismatches
Default value: 0.02.
--force-active If provided, all regions will be marked as active Default value: false. Possible values:
{true, false}
--force-call-filtered-alleles,-genotype-filtered-alleles
Force-call filtered alleles included in the resource specified by --alleles Default
value: false. Possible values: {true, false}
--gvcf-lod-band,-LODB Exclusive upper bounds for reference confidence LOD bands (must be specified in increasing
order) This argument may be specified 0 or more times. Default value: [-2.5, -2.0, -1.5,
-1.0, -0.5, 0.0, 0.5, 1.0].
--independent-mates Allow paired reads to independently support different haplotypes. Useful for validations
with ill-designed synthetic data. Default value: false. Possible values: {true, false}
--kmer-size Kmer size to use in the read threading assembler This argument may be specified 0 or more
times. Default value: [10, 25].
--linked-de-bruijn-graph
If enabled, the Assembly Engine will construct a Linked De Bruijn graph to recover better
haplotypes Default value: false. Possible values: {true, false}
--max-mnp-distance,-mnp-dist
Two or more phased substitutions separated by this distance or less are merged into MNPs.
Default value: 1.
--max-num-haplotypes-in-population
Maximum number of haplotypes to consider for your population Default value: 128.
--max-prob-propagation-distance
Upper limit on how many bases away probability mass can be moved around when calculating
the boundaries between active and inactive assembly regions Default value: 50.
--max-suspicious-reads-per-alignment-start
Maximum number of suspicious reads (mediocre mapping quality or too many substitutions)
allowed in a downsampling stride. Set to 0 to disable. Default value: 0.
--max-unpruned-variants
Maximum number of variants in graph the adaptive pruner will allow Default value: 100.
--min-dangling-branch-length
Minimum length of a dangling branch to attempt recovery Default value: 4.
--min-pruning Minimum support to not prune paths in the graph Default value: 2.
--minimum-allele-fraction,-min-AF
Lower bound of variant allele fractions to consider when calculating variant LOD Default
value: 0.0.
--num-pruning-samples
Number of samples that must pass the minPruning threshold Default value: 1.
--pair-hmm-gap-continuation-penalty
Flat gap continuation penalty for use in the Pair HMM Default value: 10.
--pair-hmm-implementation,-pairHMM
The PairHMM implementation to use for genotype likelihood calculations Default value:
FASTEST_AVAILABLE. Possible values: {EXACT, ORIGINAL, LOGLESS_CACHING,
AVX_LOGLESS_CACHING, AVX_LOGLESS_CACHING_OMP, EXPERIMENTAL_FPGA_LOGLESS_CACHING,
FASTEST_AVAILABLE}
--pcr-indel-model
The PCR indel model to use Default value: CONSERVATIVE. Possible values: {NONE, HOSTILE,
AGGRESSIVE, CONSERVATIVE}
--phred-scaled-global-read-mismapping-rate
The global assumed mismapping rate for reads Default value: 45.
--pruning-lod-threshold
Ln likelihood ratio threshold for adaptive pruning algorithm Default value:
2.302585092994046.
--pruning-seeding-lod-threshold
Ln likelihood ratio threshold for seeding subgraph of good variation in adaptive pruning
algorithm Default value: 9.210340371976184.
--recover-all-dangling-branches
Recover all dangling branches Default value: false. Possible values: {true, false}
--showHidden,-showHidden
display hidden arguments Default value: false. Possible values: {true, false}
--smith-waterman
Which Smith-Waterman implementation to use, generally FASTEST_AVAILABLE is the right
choice Default value: JAVA. Possible values: {FASTEST_AVAILABLE, AVX_ENABLED, JAVA}
--soft-clip-low-quality-ends
If enabled will preserve low-quality read ends as softclips (used for DRAGEN-GATK BQD
genotyper model) Default value: false. Possible values: {true, false}
Conditional Arguments for readFilter:
Valid only if "MappingQualityReadFilter" is specified:
--maximum-mapping-quality
Maximum mapping quality to keep (inclusive) Default value: null.
--minimum-mapping-quality
Minimum mapping quality to keep (inclusive) Default value: 20.
Valid only if "ReadLengthReadFilter" is specified:
--max-read-length Keep only reads with length at most equal to the specified value Default value:
2147483647.
--min-read-length Keep only reads with length at least equal to the specified value Default value: 30.

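So there is no rush on this point. Read the literature first and work out the approach; the finer details can be sorted out bit by bit afterwards.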

nohup  ./gatk Mutect2 -R  /home/xxzhang/workplace/QBRC/geneome/hg38/hg38.fasta --input /home/xxzhang/workplace/QBRC/output_RNA/tumor/tumor.bam  --input /home/xxzhang/workplace/QBRC/output_RNA/normal/normal.bam --output ./output_RNA/mutect.vcf >mutect.txt&

A USER ERROR has occurred: Fasta index file file:///home/xxzhang/workplace/QBRC/geneome/hg19/hg19.fasta.fai for reference file:///home/xxzhang/workplace/QBRC/geneome/hg19/hg19.fasta does not exist. Please see http://gatkforums.broadinstitute.org/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference for help creating it.
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
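
The error above means the reference FASTA has no .fai index next to it. GATK also expects a sequence dictionary (.dict) alongside the reference. Below is a minimal sketch of a check that prints the commands for whichever companion file is missing. The function name ref_prep_commands is made up for illustration, and the printed commands assume samtools and gatk are on PATH:

```shell
# Print the commands that would create the missing companion files of a
# reference FASTA. Nothing is executed; the output is meant to be reviewed first.
# ref_prep_commands is a hypothetical helper name, not part of GATK.
ref_prep_commands() {
    ref="$1"
    dict="${ref%.*}.dict"    # e.g. hg19.fasta -> hg19.dict
    # FASTA index (<reference>.fai), created by samtools faidx
    [ -f "${ref}.fai" ] || echo "samtools faidx ${ref}"
    # Sequence dictionary, created by GATK's CreateSequenceDictionary tool
    [ -f "${dict}" ] || echo "gatk CreateSequenceDictionary -R ${ref}"
    return 0
}
```

For example, ref_prep_commands ./geneome/hg19/hg19.fasta lists what still has to be run for that reference; it prints nothing once both files exist.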

A USER ERROR has occurred: Input files reference and reads have incompatible contigs: Found contigs with the same name but different lengths:
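
This second error usually indicates that the BAM files were aligned against a different genome build than the reference passed with -R. (In these notes, the Mutect2 command points at an hg38 path while the earlier error message mentions hg19.) Below is a rough sketch for listing the contigs whose lengths disagree. It assumes the BAM header has first been dumped to a text file with samtools view -H, and the function name is hypothetical:

```shell
# Compare contig lengths in a FASTA index (.fai) with the @SQ lines of a
# BAM header dump. Prints: contig, length in reference, length in BAM header.
# contig_length_mismatches is a hypothetical helper name.
contig_length_mismatches() {
    fai="$1"
    header="$2"
    awk -F'\t' '
        # First file (.fai): column 1 is the contig name, column 2 its length
        NR == FNR { ref_len[$1] = $2; next }
        # Second file (header dump): read the SN: and LN: tags of each @SQ line
        $1 == "@SQ" {
            name = ""; bam_len = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) bam_len = substr($i, 4)
            }
            if ((name in ref_len) && (ref_len[name] + 0 != bam_len + 0))
                print name "\t" ref_len[name] "\t" bam_len
        }
    ' "$fai" "$header"
}
```

A possible way to use it: run samtools view -H tumor.bam > tumor.header.txt, then contig_length_mismatches hg38.fasta.fai tumor.header.txt. Any line of output names a contig on which the two files disagree.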


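Update: some thoughts on how to resolve this error.
Earlier I had gone back and forth on whether to use MuTect1 or Mutect2. Recently I posted a question on the official GATK site about the error I reported last time.
Link: https://gatk.broadinstitute.org/hc/en-us/community/posts/4402845032859-The-mutect-ERROR-Please-add-an-explicit-type-tag-NAME-listing-the-correct-type-from-among-the-supported-types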

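The main point of the reply: the GATK version I am currently using (4.2.0.0) only supports Mutect2. Mutect2 is an update built on MuTect1 with some performance improvements, and it is the recommended choice.
I checked the file list of the pipeline author's original distribution and found no gatk-4.2.0.0-local-jar in it, so this GATK is probably something I installed myself later.
Going back through my lab notes:
I found that the GATK version I used before was 3.5, and the author's files do not seem to impose any strict requirement on it.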
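
For reference, if Mutect2 is used with a tumor/normal pair, the help listing above describes --normal-sample (-normal) as the BAM sample name of the normal, which the Mutect2 command in these notes did not pass. Below is a hedged sketch that reads the sample name (SM tag) from a header dump and assembles the command as text. The helper names and the output file name are assumptions for illustration, not part of the original pipeline:

```shell
# Read the sample name (SM tag) from the first @RG line of a BAM header dump,
# i.e. a text file produced with: samtools view -H normal.bam
# Both function names below are hypothetical helpers.
sample_name_from_header() {
    awk -F'\t' '
        $1 == "@RG" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SM:/) { print substr($i, 4); exit }
        }
    ' "$1"
}

# Assemble a paired Mutect2 command line as text so it can be reviewed
# before running. Only options shown in the help listing are used.
mutect2_paired_command() {
    ref="$1"
    tumor_bam="$2"
    normal_bam="$3"
    normal_name="$4"
    out_vcf="$5"
    echo "gatk Mutect2 -R ${ref} -I ${tumor_bam} -I ${normal_bam} -normal ${normal_name} -O ${out_vcf}"
}
```

The help text also mentions GetSampleName as a way to obtain the sample name; reading the SM tag from the header is just one option.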

/home/xxzhang/workplace/software/java/jdk1.7.0_80/bin/java -Djava.io.tmpdir=./output_RNA/mutmp -Xmx31g -jar /home/xxzhang/workplace/QBRC//somatic_script/mutect-1.1.7.jar --analysis_type MuTect --reference_sequence ./geneome/hg19/hg19.fa --dbsnp ./geneome/hg19/hg19.fa_resource/dbsnp.hg19.vcf --cosmic ./geneome/hg19/hg19.fa_resource/CosmicCodingMuts.hg19.vcf  --input_file:tumor /home/xxzhang/workplace/QBRC/output_RNA/tumor/tumor.bam --input_file:normal /home/xxzhang/workplace/QBRC/output_RNA/normal/normal.bam --vcf ./output_RNA/mutect.vcf --out ./output_RNA/mutect.out

In the Java 1.7 directory, I ran: java -version
Output:

java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

This shows that Java 1.7 is set up correctly in my environment. Next I simply run it on my example data to see how it behaves.
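
Note that java -version writes to stderr, and a bare java resolves to whichever binary comes first on PATH, which may not be the JDK 1.7 used in the full-path command earlier. Below is a small sketch for extracting the version string so it can be checked in a script; parse_java_version is a hypothetical helper name:

```shell
# Extract the quoted version string from `java -version` output on stdin.
# parse_java_version is a hypothetical helper name.
parse_java_version() {
    # Split on double quotes; the version is the second field of the
    # first line that contains: version "
    awk -F'"' '/version "/ { print $2; exit }'
}
```

Typical use would be java -version 2>&1 | parse_java_version, where the 2>&1 is needed because the version banner goes to stderr.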

java  -Xmx31g -jar /home/xxzhang/workplace/QBRC//somatic_script/mutect-1.1.7.jar --analysis_type MuTect --reference_sequence ./geneome/hg19/hg19.fa --dbsnp ./geneome/hg19/hg19.fa_resource/dbsnp.hg19.vcf --cosmic ./geneome/hg19/hg19.fa_resource/CosmicCodingMuts.hg19.vcf  --input_file:tumor /home/xxzhang/workplace/output/tumor.bam --input_file:normal /home/xxzhang/workplace/output/normal.bam --vcf ./output_RNA/mutect.vcf --out ./output_RNA/mutect.

This problem has really eaten up a lot of my time. I will start over and write it up again in a separate post.

