bcftools的安装和使用已经bam to vcf 【bcftools】

写在前面
以下内容均是在课堂中,根据老师讲解与帮助完成
部分文件和内容由老师提供和准备
多种安装方式中,老师已经帮我们提前安装好,这里就不再上机演示了,只演示bam to vcf 【bcftools】的步骤

1. bcftools的安装和使用

1.1. apt安装

  • ubuntu系统和sudo权限
  1. sudo安装
    sudo apt install bcftools
    
  2. 如果已经安装了bcftools,可以直接运行bcftools call命令来执行您的操作。
    bcftools call命令用于对VCF格式的文件进行变异位点的调用。可以将需要进行调用的VCF文件作为输入,并将结果输出到标准输出或指定的输出文件中。
    例如:
    bcftools call -O v -o output.vcf input.vcf
    
    这将对input.vcf文件进行变异位点的调用,并将结果保存到output.vcf文件中。
    确保在运行bcftools call命令之前,已经正确安装了bcftools并且已经设置好了正确的环境变量。

1.2. 源代码编译安装

  • 官网给出的编译方式
git clone --recurse-submodules https://github.com/samtools/htslib.git
git clone https://github.com/samtools/bcftools.git
cd bcftools
 # The following is optional:
 #   autoheader && autoconf && ./configure --enable-libgsl --enable-perl-filters
make
  1. 克隆htslibbcftools的代码仓库:

    git clone --recurse-submodules https://github.com/samtools/htslib.git
    git clone https://github.com/samtools/bcftools.git
    

    这将分别克隆htslibbcftools的代码仓库到当前目录。

  2. 进入bcftools目录:

    cd bcftools
    

    在这个目录下,您可以选择执行以下可选步骤:

    autoheader && autoconf && ./configure --enable-libgsl --enable-perl-filters
    

    该步骤将生成配置文件,并根据需要启用libgslperl-filters功能。如果不需要这些功能,可以跳过该步骤。

  3. 使用make命令编译bcftools

    make
    

    编译成功后,您可以在bcftools目录中找到生成的可执行文件。

2. bcftools的使用

2.1. 基础使用

Program: bcftools (Tools for variant calling and manipulating VCFs and BCFs)
Version: 1.10.2 (using htslib 1.10.2-3ubuntu0.1)

Usage:   bcftools [--version|--version-only] [--help] <command> <argument>

Commands:

 -- Indexing
    index        index VCF/BCF files

 -- VCF/BCF manipulation
    annotate     annotate and edit VCF/BCF files
    concat       concatenate VCF/BCF files from the same set of samples
    convert      convert VCF/BCF files to different formats and back
    isec         intersections of VCF/BCF files
    merge        merge VCF/BCF files files from non-overlapping sample sets
    norm         left-align and normalize indels
    plugin       user-defined plugins
    query        transform VCF/BCF into user-defined formats
    reheader     modify VCF/BCF header, change sample names
    sort         sort VCF/BCF file
    view         VCF/BCF conversion, view, subset and filter VCF/BCF files

 -- VCF/BCF analysis
    call         SNP/indel calling
    consensus    create consensus sequence by applying VCF variants
    cnv          HMM CNV calling
    csq          call variation consequences
    filter       filter VCF/BCF files using fixed thresholds
    gtcheck      check sample concordance, detect sample swaps and contamination
    mpileup      multi-way pileup producing genotype likelihoods
    roh          identify runs of autozygosity (HMM)
    stats        produce VCF/BCF stats

 Most commands accept VCF, bgzipped VCF, and BCF with the file type detected
 automatically even when streaming from a pipe. Indexed VCF and BCF will work
 in all situations. Un-indexed VCF and BCF and streams will work in most but
 not all situations.

2.2. call、filter、view、stats

2.2.1. call:SNP/indel calling
  • 用于对接bcftools mpileup命令
  • 替代了以前的“bcftools view"
About:   SNP/indel variant calling from VCF/BCF. To be used in conjunction with bcftools mpileup.
         This command replaces the former "bcftools view" caller. Some of the original
         functionality has been temporarily lost in the process of transition to htslib,
         but will be added back on popular demand. The original calling model can be
         invoked with the -c option.
Usage:   bcftools call [options] <in.vcf.gz>

File format options:
       --no-version                do not append version and command line to the header
   -o, --output <file>             write output to a file [standard output]
   -O, --output-type <b|u|z|v>     output type: 'b' compressed BCF; 'u' uncompressed BCF; 'z' compressed VCF; 'v' uncompressed VCF [v]
       --ploidy <assembly>[?]      predefined ploidy, 'list' to print available settings, append '?' for details
       --ploidy-file <file>        space/tab-delimited list of CHROM,FROM,TO,SEX,PLOIDY
   -r, --regions <region>          restrict to comma-separated list of regions
   -R, --regions-file <file>       restrict to regions listed in a file
   -s, --samples <list>            list of samples to include [all samples]
   -S, --samples-file <file>       PED file or a file with an optional column with sex (see man page for details) [all samples]
   -t, --targets <region>          similar to -r but streams rather than index-jumps
   -T, --targets-file <file>       similar to -R but streams rather than index-jumps
       --threads <int>             use multithreading with <int> worker threads [0]

Input/output options:
   -A, --keep-alts                 keep all possible alternate alleles at variant sites
   -f, --format-fields <list>      output format fields: GQ,GP (lowercase allowed) []
   -F, --prior-freqs <AN,AC>       use prior allele frequencies
   -G, --group-samples <file|->    group samples by population (file with "sample\tgroup") or "-" for single-sample calling
   -g, --gvcf <int>,[...]          group non-variant sites into gVCF blocks by minimum per-sample DP
   -i, --insert-missed             output also sites missed by mpileup but present in -T
   -M, --keep-masked-ref           keep sites with masked reference allele (REF=N)
   -V, --skip-variants <type>      skip indels/snps
   -v, --variants-only             output variant sites only

Consensus/variant calling options:
   -c, --consensus-caller          the original calling method (conflicts with -m)
   -C, --constrain <str>           one of: alleles, trio (see manual)
   -m, --multiallelic-caller       alternative model for multiallelic and rare-variant calling (conflicts with -c)
   -n, --novel-rate <float>,[...]  likelihood of novel mutation for constrained trio calling, see man page for details [1e-8,1e-9,1e-9]
   -p, --pval-threshold <float>    variant if P(ref|D)<FLOAT with -c [0.5]
   -P, --prior <float>             mutation rate (use bigger for greater sensitivity), use with -m [1.1e-3]

Example:
   # See also http://samtools.github.io/bcftools/howtos/variant-calling.html
   bcftools mpileup -f reference.fa alignments.bam | bcftools call -mv -Ob -o calls.bcf
2.2.2. filter: filter VCF/BCF files using fixed thresholds
About:   Apply fixed-threshold filters.
Usage:   bcftools filter [options] <in.vcf.gz>

Options:
    -e, --exclude <expr>          exclude sites for which the expression is true (see man page for details)
    -g, --SnpGap <int>            filter SNPs within <int> base pairs of an indel
    -G, --IndelGap <int>          filter clusters of indels separated by <int> or fewer base pairs allowing only one to pass
    -i, --include <expr>          include only sites for which the expression is true (see man page for details
    -m, --mode [+x]               "+": do not replace but add to existing FILTER; "x": reset filters at sites which pass
        --no-version              do not append version and command line to the header
    -o, --output <file>           write output to a file [standard output]
    -O, --output-type <b|u|z|v>   b: compressed BCF, u: uncompressed BCF, z: compressed VCF, v: uncompressed VCF [v]
    -r, --regions <region>        restrict to comma-separated list of regions
    -R, --regions-file <file>     restrict to regions listed in a file
    -s, --soft-filter <string>    annotate FILTER column with <string> or unique filter name ("Filter%d") made up by the program ("+")
    -S, --set-GTs <.|0>           set genotypes of failed samples to missing (.) or ref (0)
    -t, --targets <region>        similar to -r but streams rather than index-jumps
    -T, --targets-file <file>     similar to -R but streams rather than index-jumps
        --threads <int>           use multithreading with <int> worker threads [0]
2.2.3. view:
  • 用于查看VCF文件的内容,包括样本信息、位点信息、注释信息等
About:   VCF/BCF conversion, view, subset and filter VCF/BCF files.
Usage:   bcftools view [options] <in.vcf.gz> [region1 [...]]

Output options:
    -G,   --drop-genotypes              drop individual genotype information (after subsetting if -s option set)
    -h/H, --header-only/--no-header     print the header only/suppress the header in VCF output
    -l,   --compression-level [0-9]     compression level: 0 uncompressed, 1 best speed, 9 best compression [-1]
          --no-version                  do not append version and command line to the header
    -o,   --output-file <file>          output file name [stdout]
    -O,   --output-type <b|u|z|v>       b: compressed BCF, u: uncompressed BCF, z: compressed VCF, v: uncompressed VCF [v]
    -r, --regions <region>              restrict to comma-separated list of regions
    -R, --regions-file <file>           restrict to regions listed in a file
    -t, --targets [^]<region>           similar to -r but streams rather than index-jumps. Exclude regions with "^" prefix
    -T, --targets-file [^]<file>        similar to -R but streams rather than index-jumps. Exclude regions with "^" prefix
        --threads <int>                 use multithreading with <int> worker threads [0]

Subset options:
    -a, --trim-alt-alleles        trim ALT alleles not seen in the genotype fields (or their subset with -s/-S)
    -I, --no-update               do not (re)calculate INFO fields for the subset (currently INFO/AC and INFO/AN)
    -s, --samples [^]<list>       comma separated list of samples to include (or exclude with "^" prefix)
    -S, --samples-file [^]<file>  file of samples to include (or exclude with "^" prefix)
        --force-samples           only warn about unknown subset samples

Filter options:
    -c/C, --min-ac/--max-ac <int>[:<type>]      minimum/maximum count for non-reference (nref), 1st alternate (alt1), least frequent
                                                   (minor), most frequent (major) or sum of all but most frequent (nonmajor) alleles [nref]
    -f,   --apply-filters <list>                require at least one of the listed FILTER strings (e.g. "PASS,.")
    -g,   --genotype [^]<hom|het|miss>          require one or more hom/het/missing genotype or, if prefixed with "^", exclude sites with hom/het/missing genotypes
    -i/e, --include/--exclude <expr>            select/exclude sites for which the expression is true (see man page for details)
    -k/n, --known/--novel                       select known/novel sites only (ID is not/is '.')
    -m/M, --min-alleles/--max-alleles <int>     minimum/maximum number of alleles listed in REF and ALT (e.g. -m2 -M2 for biallelic sites)
    -p/P, --phased/--exclude-phased             select/exclude sites where all samples are phased
    -q/Q, --min-af/--max-af <float>[:<type>]    minimum/maximum frequency for non-reference (nref), 1st alternate (alt1), least frequent
                                                   (minor), most frequent (major) or sum of all but most frequent (nonmajor) alleles [nref]
    -u/U, --uncalled/--exclude-uncalled         select/exclude sites without a called genotype
    -v/V, --types/--exclude-types <list>        select/exclude comma-separated list of variant types: snps,indels,mnps,ref,bnd,other [null]
    -x/X, --private/--exclude-private           select/exclude sites where the non-reference alleles are exclusive (private) to the subset samples
2.2.4. stats: produce VCF/BCF stats
  • bcftools stats 命令可以用于统计 SNP 和 Indel 的信息,如统计文件中 SNP 和 Indel 的数量:
About:   Parses VCF or BCF and produces stats which can be plotted using plot-vcfstats.
         When two files are given, the program generates separate stats for intersection
         and the complements. By default only sites are compared, -s/-S must given to include
         also sample columns.
Usage:   bcftools stats [options] <A.vcf.gz> [<B.vcf.gz>]

Options:
        --af-bins <list>               allele frequency bins, a list (0.1,0.5,1) or a file (0.1\n0.5\n1)
        --af-tag <string>              allele frequency tag to use, by default estimated from AN,AC or GT
    -1, --1st-allele-only              include only 1st allele at multiallelic sites
    -c, --collapse <string>            treat as identical records with <snps|indels|both|all|some|none>, see man page for details [none]
    -d, --depth <int,int,int>          depth distribution: min,max,bin size [0,500,1]
    -e, --exclude <expr>               exclude sites for which the expression is true (see man page for details)
    -E, --exons <file.gz>              tab-delimited file with exons for indel frameshifts (chr,from,to; 1-based, inclusive, bgzip compressed)
    -f, --apply-filters <list>         require at least one of the listed FILTER strings (e.g. "PASS,.")
    -F, --fasta-ref <file>             faidx indexed reference sequence file to determine INDEL context
    -i, --include <expr>               select sites for which the expression is true (see man page for details)
    -I, --split-by-ID                  collect stats for sites with ID separately (known vs novel)
    -r, --regions <region>             restrict to comma-separated list of regions
    -R, --regions-file <file>          restrict to regions listed in a file
    -s, --samples <list>               list of samples for sample stats, "-" to include all samples
    -S, --samples-file <file>          file of samples to include
    -t, --targets <region>             similar to -r but streams rather than index-jumps
    -T, --targets-file <file>          similar to -R but streams rather than index-jumps
    -u, --user-tstv <TAG[:min:max:n]>  collect Ts/Tv stats for any tag using the given binning [0:1:100]
        --threads <int>                use multithreading with <int> worker threads [0]
    -v, --verbose                      produce verbose per-site and per-sample output

3. bam to vcf 【bcftools】

3.1. Bam2Bcf

3.1.1. 命令示例
  1. 进入~/bwa_test目录
    cd ~/bwa_test
    
  2. 使用bcftools mpileup命令生成PCC7942_bwa.bcf文件
    bcftools mpileup -f GCA_000012525.1_ASM1252v1_genomic.fna PCC7942_bwa.bam.sorted > PCC7942_bwa.bcf
    
  3. 使用less命令查看该文件的内容
    less PCC7942_bwa.bcf
    

上述命令将基于参考基因组文件GCA_000012525.1_ASM1252v1_genomic.fna和对应的BAM文件PCC7942_bwa.bam.sorted生成PCC7942_bwa.bcf文件,并使用less命令查看该文件的内容。

生成BCF文件的过程可能需要一些时间,具体时间取决于输入文件大小和计算资源。

3.1.2. 上机演示

3.2. bcftolls检测变异位点

3.2.1. 命令示例
  1. 使用bcftoolsPCC7942_bwa.bcf文件进行变异位点的调用,并将结果保存到PCC7942_bwa.variant.bcf文件中。
    bcftools call -vm PCC7942_bwa.bcf -o PCC7942_bwa.variant.bcf
    
  2. 使用bcftools view命令将PCC7942_bwa.variant.bcf文件中的SNPs和indels过滤出来,并将结果保存到PCC7942_bwa.snps.vcf文件中。
    bcftools view -v snps,indels PCC7942_bwa.variant.bcf > PCC7942_bwa.snps.vcf
    
  3. 使用less命令查看PCC7942_bwa.snps.vcf文件的内容。
    less PCC7942_bwa.snps.vcf
    

生成和处理BCF/VCF文件的过程可能需要一些时间,具体时间取决于输入文件的大小和计算资源的情况。

3.2.2. 上机演示

3.3. 变异位点过滤

  • 用filter命令
3.3.1. 命令示例
  1. 使用bcftools filter命令对PCC7942_bwa.snps.vcf文件进行过滤操作,将满足QUAL大于20且DP大于5的变异位点筛选出来,并将结果保存到PCC7942_bwa.snps.filtered.vcf文件中。
    bcftools filter -o PCC7942_bwa.snps.filtered.vcf -i 'QUAL>20 && DP>5' PCC7942_bwa.snps.vcf
    
  2. 使用less命令查看PCC7942_bwa.snps.filtered.vcf文件的内容。
    less PCC7942_bwa.snps.filtered.vcf
    

过滤操作的时间取决于输入文件的大小和计算资源的情况。

3.3.2. 上机演示

上机演示内容后面补上,需要裁剪图片

  • 24
    点赞
  • 25
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

sxx0309

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值