Download varscan2: for the current version, refer to GitHub at https://github.com/dkoboldt/varscan
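VarScan is a fairly widely cited variant-detection program, and calling indels is well within its reach.
Earlier we covered mpileup in samtools, whose output can be handed to bcftools to call indels. Other software can process that output as well: VarScan also calls indels from mpileup output. A typical invocation looks like this: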
samtools mpileup -f ref.fasta sample.bam |
java -jar VarScan.v2.3.3.jar mpileup2indel \
--output-vcf 1 \
--vcf-sample-list sample_names.list \
> sample.varscan.vcf
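VarScan is a Java program. Here --output-vcf 1 requests VCF output; without it, results are written in VarScan's own format. The --vcf-sample-list parameter is optional, but without it the samples in the generated VCF file are renamed 1, 2, 3, and so on. Pass it a list of sample names matching the samples in the BAM files you supplied, and the VCF file will carry those names. The other parameters can generally stay at their defaults. Note that the command used here is mpileup2indel, which pairs with samtools mpileup; back when samtools still had pileup, the matching command was pileup2indel (pileup seems to have been retired for as long as I can remember, and I have never used it; how times change...). VarScan may also warn that many lines of the mpileup output could not be parsed; ignoring this should be fine...
Overall it is simple to use: it amounts to replacing bcftools with VarScan, so it should be easy to get started.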
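The file given to --vcf-sample-list holds one sample name per line, in the same order as the BAM files. As a minimal sketch, assuming each sample should simply be named after its BAM file, the list could be generated like this (make_sample_list is a hypothetical helper, not part of VarScan or samtools):

```shell
# Hypothetical helper: print one sample name per line, derived from the
# BAM file names, in the order the files are given on the command line.
make_sample_list() {
    local bam
    for bam in "$@"; do
        # Strip the directory and the .bam suffix: /data/tumor.bam -> tumor
        basename "$bam" .bam
    done
}

# Example (file names are placeholders):
#   make_sample_list tumor.bam normal.bam > sample_names.list
```

Pass the BAM files to this helper in the same order as to samtools mpileup, so that the names line up with the sample columns.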
varscan2: a non-probabilistic variant caller
Varscan2[1][2] is coded in Java, and should be executed from the command line (Terminal, in Linux/UNIX/OSX, or Command Prompt in MS Windows). For variant calling, you will need a pileup file. See the How to Build A Pileup File section for details. Running VarScan with no arguments prints the usage information.
For detailed documentation of v2.3 and later, see http://varscan.sourceforge.net/using-varscan.html[3]; for the current version, refer to GitHub at https://github.com/dkoboldt/varscan [4].
Define a system variable pointing to where VarScan.jar is located and name it VARSCAN. This will ease calling it from anywhere.
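As a sketch of that setup in a Bourne-style shell, assuming the jar was saved under /opt/varscan (a placeholder path) and using a hypothetical wrapper function named varscan:

```shell
# Assumed install location: replace with the directory that holds VarScan.jar.
export VARSCAN="/opt/varscan"

# Hypothetical convenience wrapper so "varscan <command> ..." works anywhere.
varscan() {
    java -jar "$VARSCAN/VarScan.jar" "$@"
}

# Example: "varscan mpileup2snp -h" prints the help for one command.
```

Adding these lines to a shell start-up file such as ~/.bashrc keeps the variable available in new sessions.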
basic usage to call variants from samtools pileup
Varscan takes a samtools pileup (or the more recent mpileup) as input. Such data is easily obtained with the command below. Some functions expect paired tumor and normal files to perform pairwise analysis and filtering.
samtools mpileup -f [reference sequence] [BAM file(s)] >myData.mpileup
The samtools mpileup output can be piped directly to varscan2 to save I/O and disk space.
samtools mpileup -f [reference sequence] [BAM file(s)] | java -jar $VARSCAN/VarScan.jar mpileup2snp ...
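The piped form can be wrapped in a small function. This is only a sketch: call_snps is a hypothetical name, the file names are placeholders, and it assumes VARSCAN points at the directory containing VarScan.jar.

```shell
# Hypothetical wrapper: stream the mpileup of one BAM file into VarScan and
# write SNP calls as VCF, without storing the intermediate mpileup on disk.
call_snps() {
    local ref="$1"
    local bam="$2"
    local out="$3"
    samtools mpileup -f "$ref" "$bam" |
        java -jar "${VARSCAN:?set VARSCAN to the VarScan.jar directory}/VarScan.jar" \
            mpileup2snp --output-vcf 1 > "$out"
}

# Example (placeholder file names):
#   call_snps ref.fasta sample.bam sample.snp.vcf
```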
The most useful varscan2 functions are presented below; others can be reviewed by adding -h after the command line.
***NON-COMMERCIAL VERSION***
USAGE: java -jar $VARSCAN/VarScan.jar [COMMAND] [OPTIONS]
COMMANDS:
pileup2snp Identify SNPs from a pileup file
pileup2indel Identify indels a pileup file
pileup2cns Call consensus and variants from a pileup file
mpileup2snp Identify SNPs from an mpileup file
mpileup2indel Identify indels an mpileup file
mpileup2cns Call consensus and variants from an mpileup file
somatic Call germline/somatic variants from tumor-normal pileups
copynumber Determine relative tumor copy number from tumor-normal pileups
readcounts Obtain read counts for a list of variants from a pileup file
filter Filter SNPs by coverage, frequency, p-value, etc.
somaticFilter Filter somatic variants for clusters/indels
fpfilter Apply the false-positive filter
processSomatic Isolate Germline/LOH/Somatic calls from output
copyCaller GC-adjust and process copy number changes from VarScan copynumber output
compare Compare two lists of positions/variants
limit Restrict pileup/snps/indels to ROI positions
calling SNVs
For simple SNP calls, several options set the stringency of the varscan2 prediction.
USAGE: java -jar $VARSCAN/VarScan.jar mpileup2snp [mpileup file] OPTIONS
mpileup file - The SAMtools mpileup file
OPTIONS:
--min-coverage Minimum read depth at a position to make a call [8]
--min-reads2 Minimum supporting reads at a position to call variants [2]
--min-avg-qual Minimum base quality at a position to count a read [15]
--min-var-freq Minimum variant allele frequency threshold [0.01]
--min-freq-for-hom Minimum frequency to call homozygote [0.75]
--p-value Default p-value threshold for calling variants [99e-02]
--strand-filter Ignore variants with >90% support on one strand [1]
--output-vcf If set to 1, outputs in VCF format
--variants Report only variant (SNP/indel) positions (mpileup2cns only) [0]
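To get a feel for how the count-based thresholds interact, here is a deliberately simplified sketch. It is not VarScan's code: passes_thresholds is a hypothetical helper that checks only coverage, supporting reads, and variant allele frequency (supporting reads divided by depth), and it ignores base quality, the p-value test, and the strand filter.

```shell
# passes_thresholds DEPTH READS2 [MIN_COVERAGE] [MIN_READS2] [MIN_VAR_FREQ]
# Prints "pass" if a position would clear the three count-based cutoffs,
# otherwise "fail". Defaults mirror the values listed above (8, 2, 0.01).
passes_thresholds() {
    awk -v cov="$1" -v r2="$2" \
        -v mincov="${3:-8}" -v minr2="${4:-2}" -v minfreq="${5:-0.01}" '
        BEGIN {
            # Variant allele frequency = variant-supporting reads / depth
            freq = (cov > 0) ? r2 / cov : 0
            if (cov >= mincov && r2 >= minr2 && freq >= minfreq)
                print "pass"
            else
                print "fail"
        }'
}
```

For example, 5 supporting reads at depth 1000 give a frequency of 0.005, which fails the default --min-var-freq of 0.01 even though the coverage and read-count cutoffs are met.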
calling small InDels
For indels, the mpileup2indel command accepts the same options:
USAGE: java -jar $VARSCAN/VarScan.jar mpileup2indel [mpileup file] OPTIONS
mpileup file - The SAMtools mpileup file
OPTIONS:
--min-coverage Minimum read depth at a position to make a call [8]
--min-reads2 Minimum supporting reads at a position to call variants [2]
--min-avg-qual Minimum base quality at a position to count a read [15]
--min-var-freq Minimum variant allele frequency threshold [0.01]
--min-freq-for-hom Minimum frequency to call homozygote [0.75]
--p-value Default p-value threshold for calling variants [99e-02]
--strand-filter Ignore variants with >90% support on one strand [1]
--output-vcf If set to 1, outputs in VCF format
--variants Report only variant (SNP/indel) positions (mpileup2cns only) [0]
calling both together
To call SNVs and indels together, use the mpileup2cns command:
USAGE: java -jar $VARSCAN/VarScan.jar mpileup2cns [mpileup file] OPTIONS
mpileup file - The SAMtools mpileup file
OPTIONS:
--min-coverage Minimum read depth at a position to make a call [8]
--min-reads2 Minimum supporting reads at a position to call variants [2]
--min-avg-qual Minimum base quality at a position to count a read [15]
--min-var-freq Minimum variant allele frequency threshold [0.01]
--min-freq-for-hom Minimum frequency to call homozygote [0.75]
--p-value Default p-value threshold for calling variants [99e-02]
--strand-filter Ignore variants with >90% support on one strand [1]
--output-vcf If set to 1, outputs in VCF format
--vcf-sample-list For VCF output, a list of sample names in order, one per line
--variants Report only variant (SNP/indel) positions [0]
Without --variants, the output contains reference, SNV, and indel calls; adding --variants omits the reference calls.
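A quick way to check the mix of call types in a VCF file is to tally them with awk. The sketch below uses a hypothetical helper, count_variant_types, and rests on two assumptions: reference calls carry "." in the ALT column (the usual VCF convention), and any ALT allele whose length differs from REF marks an indel.

```shell
# Hypothetical helper: tally reference, SNV, and indel records in a VCF file.
count_variant_types() {
    awk -F '\t' '
        /^#/ { next }                          # skip header lines
        $5 == "." { count["ref"]++; next }     # no ALT allele: reference call
        {
            n = split($5, alts, ",")
            kind = "snv"
            for (i = 1; i <= n; i++)
                if (length(alts[i]) != length($4)) kind = "indel"
            count[kind]++
        }
        END {
            printf "ref=%d snv=%d indel=%d\n",
                   count["ref"], count["snv"], count["indel"]
        }
    ' "$1"
}

# Example (placeholder file name):
#   count_variant_types sample.cns.vcf
```

On output produced with --variants, the ref count is expected to be zero.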
filtering results
The filter command filters variants in a file by coverage, supporting reads, variant frequency, or average base quality. It is for use with output from mpileup2snp or mpileup2indel.
USAGE: java -jar $VARSCAN/VarScan.jar filter [variants file] OPTIONS
variants file - A file of SNP or indel calls from VarScan pileup2snp or pileup2indel
OPTIONS:
--min-coverage Minimum read depth at a position to make a call [10]
--min-reads2 Minimum supporting reads at a position to call variants [2]
--min-strands2 Minimum # of strands on which variant observed (1 or 2) [1]
--min-avg-qual Minimum average base quality for variant-supporting reads [20]
--min-var-freq Minimum variant allele frequency threshold [0.20]
--p-value Default p-value threshold for calling variants [1e-01]
--indel-file File of indels for filtering nearby SNPs, from pileup2indel command
--output-file File to contain variants passing filters
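As an illustration, SNP calls can be filtered while also removing SNPs that lie near called indels. This is a sketch only: filter_snps is a hypothetical wrapper, the file names are placeholders, the thresholds simply restate the defaults listed above, and VARSCAN is assumed to point at the directory containing VarScan.jar.

```shell
# Hypothetical wrapper around "VarScan filter":
#   filter_snps SNP_FILE INDEL_FILE OUTPUT_FILE
filter_snps() {
    local snps="$1"
    local indels="$2"
    local out="$3"
    java -jar "${VARSCAN:?set VARSCAN to the VarScan.jar directory}/VarScan.jar" \
        filter "$snps" \
        --min-coverage 10 --min-reads2 2 --min-var-freq 0.20 \
        --indel-file "$indels" --output-file "$out"
}

# Example (placeholder file names):
#   filter_snps sample.snp sample.indel sample.filtered.snp
```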
more commands for tumor / normal sample pairs
Additional commands are available for somatic calls and somatic CNVs. Please refer to the varscan2 Wiki for detailed somatic detection information and examples.
References:
- Daniel C Koboldt, Qunyuan Zhang, David E Larson, Dong Shen, Michael D McLellan, Ling Lin, Christopher A Miller, Elaine R Mardis, Li Ding, Richard K Wilson. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res.: 2012, 22(3);568-76. [PubMed:22300766]
- Daniel C Koboldt, Ken Chen, Todd Wylie, David E Larson, Michael D McLellan, Elaine R Mardis, George M Weinstock, Richard K Wilson, Li Ding. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics: 2009, 25(17);2283-5. [PubMed:19542151]