Using VarScan2

To download the current version of VarScan2, refer to GitHub at https://github.com/dkoboldt/varscan

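VarScan is a variant-detection tool with what appears to be a decent citation rate, and calling indels is naturally no problem for it.

Earlier we discussed mpileup in samtools: its output can be passed to bcftools to call indels, and of course other software can process it too. VarScan likewise calls indels from mpileup output. A typical usage looks like this: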

samtools mpileup -f ref.fasta sample.bam |
    java -jar VarScan.v2.3.3.jar mpileup2indel \
    --output-vcf 1 \
    --vcf-sample-list sample_names.list \
    > sample.varscan.vcf

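VarScan is a Java program. Here --output-vcf 1 requests output in VCF format; without it, results are written in the software's own format. The --vcf-sample-list parameter is optional, but without it the samples in the generated VCF file are renamed 1, 2, 3, and so on. You can therefore use this parameter to supply a list of sample names matching the samples in the BAM files you provide, so that the VCF file carries the corresponding names. Most other parameters can be left at their defaults. Note that mpileup2indel is used here, matching samtools mpileup; back when samtools still had pileup, the matching command was pileup2indel (pileup seems to have been retired for as long as I can remember, and I have never used it; how times change...). When running, the software may also report that many lines of the mpileup output cannot be parsed; ignoring this should be fine.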
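Since the sample list has to follow the same order as the BAM files given to samtools mpileup, it can help to generate it from the same arguments. The following is only a minimal sketch: the helper name make_sample_list and the file names s1.bam and s2.bam are hypothetical, and it assumes each sample is simply named after its BAM file.

```shell
# Print one sample name per line, in the same order as the BAM arguments.
# Assumption: each sample is named after its BAM file
# (directory and .bam suffix stripped).
make_sample_list() {
    for bam in "$@"; do
        basename "$bam" .bam
    done
}

# Example usage (hypothetical file names):
#   make_sample_list s1.bam s2.bam > sample_names.list
#   samtools mpileup -f ref.fasta s1.bam s2.bam |
#       java -jar VarScan.v2.3.3.jar mpileup2indel \
#           --output-vcf 1 \
#           --vcf-sample-list sample_names.list \
#           > samples.varscan.vcf
```

If the sample names differ from the file names, the list can just as well be written by hand, one name per line.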

Overall, usage is quite simple: it amounts to replacing bcftools with VarScan, so I trust everyone can get started easily.



varscan2: a non probabilistic variant caller

Varscan2[1][2] is coded in Java, and should be executed from the command line (Terminal, in Linux/UNIX/OSX, or Command Prompt in MS Windows). For variant calling, you will need a pileup file. See the How to Build A Pileup File section for details. Running VarScan with no arguments prints the usage information.

For detailed documentation of v2.3 and later, see http://varscan.sourceforge.net/using-varscan.html[3]; for the current version, refer to GitHub at https://github.com/dkoboldt/varscan [4]

Tip: Define an environment variable named VARSCAN that points to the directory where VarScan.jar is located. This makes it easy to call VarScan from anywhere.
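As a rough illustration of this tip, the variable can be exported from a shell startup file and combined with a small wrapper function. The install path below is a hypothetical placeholder, and the wrapper name varscan is an assumption made for this sketch, not something shipped with VarScan.

```shell
# Hypothetical install location; adjust to wherever VarScan.jar actually lives.
# An already exported VARSCAN value is kept as-is.
export VARSCAN="${VARSCAN:-$HOME/tools/varscan}"

# Thin wrapper so that "varscan mpileup2snp ..." works from any directory.
varscan() {
    java -jar "$VARSCAN/VarScan.jar" "$@"
}
```

With this in place, running varscan with no arguments should print the same usage information as calling the jar directly.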

basic usage to call variants from samtools pileup

Varscan takes a samtools pileup (as well as the more recent mpileup) as input. Such data is easily obtained with the command below. Some functions expect paired tumor and normal files to perform pairwise analysis/filtering.

samtools mpileup -f [reference sequence] [BAM file(s)] >myData.mpileup

The samtools mpileup can be piped directly to varscan2 to save IO and disk space.

samtools mpileup -f [reference sequence] [BAM file(s)] | java -jar $VARSCAN/VarScan.jar mpileup2snp ...

The most useful varscan2 functions are presented below; the others can be reviewed by adding -h after the command.

VarScan v2.4

***NON-COMMERCIAL VERSION***

USAGE: java -jar $VARSCAN/VarScan.jar [COMMAND] [OPTIONS] 

COMMANDS:
        pileup2snp              Identify SNPs from a pileup file
        pileup2indel            Identify indels a pileup file
        pileup2cns              Call consensus and variants from a pileup file
        mpileup2snp             Identify SNPs from an mpileup file
        mpileup2indel           Identify indels an mpileup file
        mpileup2cns             Call consensus and variants from an mpileup file

        somatic                 Call germline/somatic variants from tumor-normal pileups
        copynumber              Determine relative tumor copy number from tumor-normal pileups
        readcounts              Obtain read counts for a list of variants from a pileup file

        filter                  Filter SNPs by coverage, frequency, p-value, etc.
        somaticFilter           Filter somatic variants for clusters/indels
        fpfilter                Apply the false-positive filter

        processSomatic          Isolate Germline/LOH/Somatic calls from output
        copyCaller              GC-adjust and process copy number changes from VarScan copynumber output
        compare                 Compare two lists of positions/variants
        limit                   Restrict pileup/snps/indels to ROI positions

calling SNVs

For simple SNP calls, several options allow setting the stringency of the varscan2 prediction.

USAGE: java -jar $VARSCAN/VarScan.jar mpileup2snp [mpileup file] OPTIONS
        mpileup file - The SAMtools mpileup file

        OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [8]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-avg-qual  Minimum base quality at a position to count a read [15]
        --min-var-freq  Minimum variant allele frequency threshold [0.01]
        --min-freq-for-hom      Minimum frequency to call homozygote [0.75]
        --p-value       Default p-value threshold for calling variants [99e-02]
        --strand-filter Ignore variants with >90% support on one strand [1]
        --output-vcf    If set to 1, outputs in VCF format
        --variants      Report only variant (SNP/indel) positions (mpileup2cns only) [0]
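To illustrate how these options combine, here is a sketch of a stricter SNP-calling run wrapped in a shell function. The function name call_snps and the threshold values are illustrative assumptions, not recommendations from the VarScan documentation; only the option names come from the usage text above.

```shell
# Call SNPs for one BAM file with stricter-than-default thresholds.
# Usage: call_snps <reference.fasta> <sample.bam> > sample.snp.vcf
# Assumption: VARSCAN points to the directory holding VarScan.jar
# (falls back to the current directory if unset).
call_snps() {
    ref="$1"
    bam="$2"
    samtools mpileup -f "$ref" "$bam" |
        java -jar "${VARSCAN:-.}/VarScan.jar" mpileup2snp \
            --min-coverage 10 \
            --min-var-freq 0.05 \
            --p-value 0.01 \
            --output-vcf 1
}
```

Piping the mpileup stream directly into VarScan avoids writing the intermediate file to disk.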

calling small InDels

For indels, the following command will do:

USAGE: java -jar $VARSCAN/VarScan.jar mpileup2indel [mpileup file] OPTIONS
        mpileup file - The SAMtools mpileup file

        OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [8]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-avg-qual  Minimum base quality at a position to count a read [15]
        --min-var-freq  Minimum variant allele frequency threshold [0.01]
        --min-freq-for-hom      Minimum frequency to call homozygote [0.75]
        --p-value       Default p-value threshold for calling variants [99e-02]
        --strand-filter Ignore variants with >90% support on one strand [1]
        --output-vcf    If set to 1, outputs in VCF format
        --variants      Report only variant (SNP/indel) positions (mpileup2cns only) [0]

calling both together

For SNVs and indels, the following command will do:

USAGE: java -jar $VARSCAN/VarScan.jar mpileup2cns [pileup file] OPTIONS
        mpileup file - The SAMtools mpileup file

        OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [8]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-avg-qual  Minimum base quality at a position to count a read [15]
        --min-var-freq  Minimum variant allele frequency threshold [0.01]
        --min-freq-for-hom      Minimum frequency to call homozygote [0.75]
        --p-value       Default p-value threshold for calling variants [99e-02]
        --strand-filter Ignore variants with >90% support on one strand [1]
        --output-vcf    If set to 1, outputs in VCF format
        --vcf-sample-list       For VCF output, a list of sample names in order, one per line
        --variants      Report only variant (SNP/indel) positions [0]

Note: Without --variants, the returned calls include reference, SNV, and indel positions; adding --variants omits the reference calls.
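A quick way to see what an mpileup2cns VCF contains is to tally the records by type. The sketch below is a generic awk filter, not part of VarScan; the function name count_variant_types is hypothetical, and it assumes tab-separated VCF with reference-only records carrying "." in the ALT column and indels recognisable by REF and ALT differing in length.

```shell
# Count SNV and indel records in a VCF read from standard input.
# A record counts as an indel when REF and the first ALT allele
# differ in length; records with "." as ALT are skipped.
count_variant_types() {
    awk -F'\t' '
        /^#/ { next }                      # skip header lines
        $5 == "." || $5 == "" { next }     # skip reference-only records
        {
            split($5, alt, ",")
            if (length($4) == length(alt[1])) snv++; else indel++
        }
        END { printf "SNV\t%d\nindel\t%d\n", snv, indel }
    '
}

# Example usage (hypothetical file name):
#   count_variant_types < sample.cns.vcf
```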

filtering results

The filter command filters variants in a file by coverage, supporting reads, variant frequency, or average base quality. It is for use with output from mpileup2snp or mpileup2indel.

USAGE: java -jar $VARSCAN/VarScan.jar filter [variants file] OPTIONS
        variants file - A file of SNP or indel calls from VarScan pileup2snp or pileup2indel

        OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [10]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-strands2  Minimum # of strands on which variant observed (1 or 2) [1]
        --min-avg-qual  Minimum average base quality for variant-supporting reads [20]
        --min-var-freq  Minimum variant allele frequency threshold [0.20]
        --p-value       Default p-value threshold for calling variants [1e-01]
        --indel-file    File of indels for filtering nearby SNPs, from pileup2indel command
        --output-file   File to contain variants passing filters
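As a sketch of how the filter step might be scripted, the function below filters a SNP file while using an indel file to remove SNPs near indels. The function name filter_snps, the file names in the usage comment, and the threshold values are illustrative assumptions; the option names are taken from the usage text above.

```shell
# Filter a VarScan SNP file, removing SNPs close to called indels.
# Usage: filter_snps <snps file> <indels file> <output file>
# Assumption: VARSCAN points to the directory holding VarScan.jar
# (falls back to the current directory if unset).
filter_snps() {
    java -jar "${VARSCAN:-.}/VarScan.jar" filter "$1" \
        --min-coverage 10 \
        --min-reads2 4 \
        --min-var-freq 0.20 \
        --indel-file "$2" \
        --output-file "$3"
}

# Example usage (hypothetical file names):
#   filter_snps sample.snp sample.indel sample.snp.filtered
```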

more commands for tumor / normal sample pairs

Additional commands are available for somatic calls and somatic CNVs. Please refer to the varscan2 Wiki for detailed somatic detection information and examples.


References:
  1. Daniel C Koboldt, Qunyuan Zhang, David E Larson, Dong Shen, Michael D McLellan, Ling Lin, Christopher A Miller, Elaine R Mardis, Li Ding, Richard K Wilson
     VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing.
     Genome Res.: 2012, 22(3);568-76
     [PubMed:22300766]

  2. Daniel C Koboldt, Ken Chen, Todd Wylie, David E Larson, Michael D McLellan, Elaine R Mardis, George M Weinstock, Richard K Wilson, Li Ding
     VarScan: variant detection in massively parallel sequencing of individual and pooled samples.
     Bioinformatics: 2009, 25(17);2283-5
     [PubMed:19542151]


