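0. Introduction to pbsv
pbsv is a tool that identifies structural variants (SVs) from read alignments. First, here is its GitHub repository: PacificBiosciences/pbsv: pbsv - PacBio structural variant (SV) calling and analysis tools (github.com)
pbsv is released by PacBio, so it can be regarded as the official SV caller for PacBio HiFi data. Accordingly, the aligner used before SV calling is pbmm2, which is also released by PacBio and is built on minimap2.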
1. Installing pbsv
Installing pbsv is simple because a prebuilt binary is available:
wget https://github.com/PacificBiosciences/pbsv/releases/download/v2.9.0/pbsv
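After downloading, make the binary executable (chmod +x pbsv), then either add its directory to PATH or run it by absolute path. Its help text (pbsv -h) looks like this: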
pbsv - PacBio structural variant (SV) calling and analysis tools

Usage:
  pbsv <tool>

  -h,--help    Show this help and exit.
  --version    Show application version and exit.

Tools:
  discover    Find structural variant (SV) signatures in read alignments (BAM to SVSIG).
  call        Call structural variants from SV signatures and assign genotypes (SVSIG to VCF).

Typical workflow:
  1. Align PacBio reads to a reference genome, per movie.
     Subreads BAM input:
     $ pbmm2 align ref.fa movie1.subreads.bam ref.movie1.bam --sort --median-filter --sample sample1

     CCS BAM input:
     $ pbmm2 align ref.fa movie1.ccs.bam ref.movie1.bam --sort --preset CCS --sample sample1

     CCS FASTQ input:
     $ pbmm2 align ref.fa movie1.Q20.fastq ref.movie1.bam --sort --preset CCS --sample sample1 --rg '@RG\tID:movie1'

  2. Discover signatures of structural variation (per movie or per sample):
     $ pbsv discover ref.movie1.bam ref.sample1.svsig.gz
     $ pbsv discover ref.movie2.bam ref.sample2.svsig.gz

  3. Call structural variants and assign genotypes (all samples), for CCS input append "--ccs":
     $ pbsv call ref.fa ref.sample1.svsig.gz ref.sample2.svsig.gz ref.var.vcf

Copyright (C) 2004-2021 Pacific Biosciences of California, Inc.
This program comes with ABSOLUTELY NO WARRANTY; it is intended for
Research Use Only and not for use in diagnostic procedures.
pbmm2 is just as easy to install, since a prebuilt binary is available for it too:
wget https://github.com/PacificBiosciences/pbmm2/releases/download/v1.14.99/pbmm2
Alternatively, both pbsv and pbmm2 can be installed with conda:
conda install -c bioconda pbsv
conda install -c bioconda pbmm2
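Whichever installation route is used, it can help to confirm that the tools are actually reachable from the shell before starting a long run. The following is a minimal sketch; `check_tools` is a hypothetical helper name introduced here for illustration and is not part of pbsv or pbmm2.

```shell
# check_tools: hypothetical helper (not part of pbsv or pbmm2).
# Prints "found: <tool>" for each tool on PATH and "missing: <tool>"
# (on stderr) otherwise. Returns 0 only if every tool was found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (commented out so the snippet is safe to source):
# check_tools pbsv pbmm2 tabix && pbsv --version
```

The `--version` flag used in the commented example is the one listed in the pbsv help text above.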
2. Read alignment
For CCS FASTQ input:
pbmm2 align ref.fa query.fastq pbmm2.bam --sort --preset CCS --sample sample1 --rg '@RG\tID:movie1'
For a single CCS BAM:
pbmm2 align -j 40 --preset HIFI --sort --sample $accession ref.fa query.ccs.bam $accession.pbmm2.bam
For multiple CCS BAMs (multiple cells):
# Write the names of all BAM files belonging to the same accession into one fofn file
ls $accession.*.ccs.bam >$accession.fofn
# Run pbmm2
pbmm2 align -j 40 --preset HIFI --sort --sample $accession ref.fa $accession.fofn $accession.pbmm2.bam
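One caveat with the `ls` approach is that it still creates an (empty) fofn when no BAM matches the pattern. A slightly more defensive version could look like the sketch below. It assumes the same `$accession.*.ccs.bam` naming as above; `make_fofn` is a hypothetical helper name, not a pbmm2 feature.

```shell
# make_fofn: hypothetical helper. Writes one BAM path per line to
# <accession>.fofn and fails if nothing matches <accession>.*.ccs.bam.
make_fofn() {
    acc=$1
    fofn="$acc.fofn"
    : > "$fofn"                       # start from an empty file
    for bam in "$acc".*.ccs.bam; do
        [ -e "$bam" ] || continue     # an unmatched glob stays literal; skip it
        printf '%s\n' "$bam" >> "$fofn"
    done
    if [ ! -s "$fofn" ]; then
        echo "no CCS BAM found for $acc" >&2
        rm -f "$fofn"
        return 1
    fi
}

# Usage with the alignment command from this section (not run here):
# make_fofn "$accession" &&
#     pbmm2 align -j 40 --preset HIFI --sort --sample "$accession" \
#         ref.fa "$accession.fofn" "$accession.pbmm2.bam"
```

Chaining with `&&` means the alignment only starts when at least one input BAM was found.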
3. Calling SVs
# Discover SV signatures
pbsv discover $accession.pbmm2.bam ref.$accession.svsig.gz
# Build an index for the svsig file
tabix -c '#' -s 3 -b 4 -e 4 ref.$accession.svsig.gz
# Call SVs
pbsv call -O 5 -m 50 -j 30 ref.fa ref.$accession.svsig.gz $accession.pbsv.vcf
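where
-j is the number of threads
-m is the minimum SV length
-O is the minimum number of supporting reads required within a single sample; a call supported by fewer reads than this in every sample is ignored
Note that the pbsv help text shown earlier suggests appending --ccs to pbsv call when the input is CCS (HiFi) data.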
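Once the VCF has been written, a quick tally of calls by SV type is a cheap sanity check. The sketch below assumes that each record carries an `SVTYPE=` entry in the INFO column (column 8), which is the usual VCF convention for structural variants, and that the VCF is uncompressed. `count_svtypes` is a hypothetical helper name.

```shell
# count_svtypes: hypothetical helper. Prints "<type> <count>" for each
# SV type found in an uncompressed VCF, sorted by type name.
count_svtypes() {
    awk -F '\t' '
        /^#/ { next }                          # skip header lines
        {
            n = split($8, info, ";")           # INFO is column 8
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^SVTYPE=/) {
                    sub(/^SVTYPE=/, "", info[i])
                    count[info[i]]++
                }
            }
        }
        END { for (t in count) print t, count[t] }
    ' "$1" | sort
}

# Usage (not run here):
# count_svtypes "$accession.pbsv.vcf"
```

If the counts look implausible, for example no insertions at all, the thresholds passed to -m and -O are a reasonable first thing to revisit.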