Assembling HiFi genomes with hifiasm: an introduction

Latest recommended article published on 2024-06-21 11:01:35

生信技术

Category: Genomics. Tags: python, linux

Article link: https://blog.csdn.net/m0_49960764/article/details/116196727

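The mainstream assemblers for PacBio HiFi sequencing data currently include FALCON, hifiasm, HiCanu, and NextDenovo.

Using hifiasm

Introduction

Hifiasm is a fast haplotype-resolved de novo assembler for PacBio HiFi reads. It can assemble a human genome in a few hours and works with the California redwood genome, one of the most complex genomes sequenced so far. Its primary/alternate assemblies are competitive in quality with those of the best assemblers. It also introduces a new graph binning algorithm and, given trio data, achieves the best haplotype-resolved assembly.

In my experience hifiasm is both fast and accurate, so I recommend it.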

Installation

# Install with conda
conda install -c bioconda hifiasm

# Or build hifiasm from source (requires g++ and zlib)
git clone https://github.com/chhylp123/hifiasm
cd hifiasm && make
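
After installing, it can help to confirm that the tools used in this post are actually on your PATH. The following is a minimal sketch; the helper name have_tool is my own and is not part of hifiasm, samtools, or yak.

```shell
# Return 0 if the named program is on PATH, non-zero otherwise.
# (have_tool is a hypothetical helper name used only for this sketch.)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report the status of each tool used in this post.
for tool in hifiasm samtools yak; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: not found on PATH"
    fi
done
```

If hifiasm is reported as found, hifiasm --version (listed in the options below) prints the installed version.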

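Format conversion

The HiFi reads here are in BAM format, so they need to be converted to FASTA first.

# bam --> fasta (reads.bam is a placeholder; samtools view takes a single input file)
samtools view reads.bam | awk '{print ">"$1"\n"$10}' > reads.fasta

# Other format conversions, for reference
## sam --> fasta (skip the header lines, which start with "@")
awk '!/^@/ {print ">"$1"\n"$10}' reads.sam > reads.fasta
## fasta --> sam (by aligning paired reads with bowtie2; -f marks the input as FASTA,
## and -x is the prefix of an index built with bowtie2-build)
bowtie2 -f -p 16 -x prefix -1 sample_1.fa -2 sample_2.fa -S sample.sam
## sam --> bam
# -@: threads; -b: output BAM; -S: input is SAM (ignored by recent samtools, which detects the input format automatically); -o: output file
samtools view -@ 16 -b -S final.sam -o final.bam
## bam --> sam (-h keeps the header)
samtools view -h -O SAM -o final.sam final.bam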
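
To see what the awk one-liners above do, here is a small self-contained sketch that runs on a toy SAM file. The file names and read contents are made up for illustration, and sam_to_fasta is a hypothetical helper name. Column 1 of a SAM record is the read name and column 10 is the sequence; header lines start with "@" and have to be skipped.

```shell
# Toy SAM file: one header line and two unaligned reads (contents are made up).
printf '@HD\tVN:1.6\tSO:unsorted\n' > toy.sam
printf 'read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n' >> toy.sam
printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCCA\tIIIII\n' >> toy.sam

# Skip header lines (they start with "@"), then print the read name
# as the FASTA header and the sequence on the next line.
sam_to_fasta() {
    awk -F'\t' '!/^@/ {print ">" $1 "\n" $10}' "$1"
}

sam_to_fasta toy.sam > toy.fasta
cat toy.fasta
```

For real BAM input the same awk program can read from samtools view, which does not print the header by default.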

Options

$ hifiasm
Usage: hifiasm [options] <in_1.fq> <in_2.fq> <...>
Options:
  Input/Output:
    -o STR       prefix of output files [hifiasm.asm]
    -i           ignore saved read correction and overlaps
    -t INT       number of threads [1]
    -z INT       length of adapters that should be removed [0]
    --version    show version number
  Overlap/Error correction:
    -k INT       k-mer length (must be <64) [51]
    -w INT       minimizer window size [51]
    -f INT       number of bits for bloom filter; 0 to disable [37]
    -D FLOAT     drop k-mers occurring >FLOAT*coverage times [5.0]
    -N INT       consider up to max(-D*coverage,-N) overlaps for each oriented read [100]
    -r INT       round of correction [3]
  Assembly:
    -a INT       round of assembly cleaning [4]
    -m INT       pop bubbles of <INT in size in contig graphs [10000000]
    -p INT       pop bubbles of <INT in size in unitig graphs [0]
    -n INT       remove tip unitigs composed of <=INT reads [3]
    -x FLOAT     max overlap drop ratio [0.8]
    -y FLOAT     min overlap drop ratio [0.2]
    -u           disable post join contigs step which may improve N50
    --lowQ       INT
                 output contig regions with >=INT% inconsistency in BED format; 0 to disable [70]
    --b-cov      INT
                 break contigs at positions with <INT-fold coverage; work with '--m-rate'; 0 to disable [0]
    --h-cov      INT
                 break contigs at positions with >INT-fold coverage; work with '--m-rate'; -1 to disable [-1]
    --m-rate     FLOAT
                 break contigs at positions with <=FLOAT*coverage exact overlaps;
                 only work with '--b-cov' or '--h-cov'[0.75]
    --primary    output a primary assembly and an alternate assembly
  Trio-partition:
    -1 FILE      hap1/paternal k-mer dump generated by "yak count" []
    -2 FILE      hap2/maternal k-mer dump generated by "yak count" []
    -c INT       lower bound of the binned k-mer's frequency [2]
    -d INT       upper bound of the binned k-mer's frequency [5]
    -3 FILE      list of hap1/paternal read names []
    -4 FILE      list of hap2/maternal read names []
    --t-occ      INT
                 force remove unitigs with >INT unexpected haplotype-specific reads;
                 ignore graph topology; [60]
  Purge-dups:
    -l INT       purge level. 0: no purging; 1: light; 2/3: aggressive [0 for trio; 3 for unzip]
    -s FLOAT     similarity threshold for duplicate haplotigs [0.75 for -l1/-l2, 0.55 for -l3]
    -O INT       min number of overlapped reads for duplicate haplotigs [1]
    --purge-cov  INT
                 coverage upper bound of Purge-dups [auto]
    --n-hap      INT
                 number of haplotypes [2]
  Hi-C-partition:
    --h1 FILEs   file names of Hi-C R1  [r1_1.fq,r1_2.fq,...]
    --h2 FILEs   file names of Hi-C R2  [r2_1.fq,r2_2.fq,...]
    --seed INT   RNG seed [11]
    --n-weight   INT
                 rounds of reweighting Hi-C links [3]
    --n-perturb  INT
                 rounds of perturbation [10000]
    --f-perturb  FLOAT
                 fraction to flip for perturbation [0.1]

Usage

A typical hifiasm command line looks like this:

hifiasm -o <outputPrefix> -t <nThreads> <HiFi-reads.fasta>
# e.g.:
hifiasm -o NA12878.asm -t 32 NA12878.fq.gz

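Here NA12878.fq.gz provides the input reads, -t sets the number of CPU threads to use, and -o sets the prefix of the output files.

Assembling with Hi-C data:
Newer versions of hifiasm can take Hi-C reads alongside the HiFi reads, and in my experience the resulting assemblies are quite good.

hifiasm -o <outputPrefix> -t 50 --h1 HIC_1_clean.fq.gz --h2 HIC_2_clean.fq.gz HiFi-reads.fq.gz

# --h1, --h2: Hi-C read files (R1 and R2)
# -t: number of threads
# -o: prefix of output files
# HiFi-reads.fq.gz: input HiFi reads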

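When parental reads are available, hifiasm can produce a pair of haplotype-resolved assemblies by trio binning. To run this kind of assembly, first count k-mers with yak, then assemble: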

yak count -b37 -t <nThreads> -o <pat.yak> <paternal-short-reads.fastq>
yak count -b37 -t <nThreads> -o <mat.yak> <maternal-short-reads.fastq>

# e.g.:
yak count -k31 -b37 -t16 -o pat.yak paternal.fq.gz
yak count -k31 -b37 -t16 -o mat.yak maternal.fq.gz

Then generate the paternal assembly and the maternal assembly with the following command:

hifiasm -o <outputPrefix> -t <nThreads> -1 <pat.yak> -2 <mat.yak> <HiFi-reads.fasta>
# e.g.:
hifiasm -o NA12878.asm -t 20 -1 pat.yak -2 mat.yak NA12878.fq.gz
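
The trio steps above can be chained in one small script. The sketch below is only one possible layout: the run helper, the DRY_RUN variable, and the function name trio_pipeline are my own inventions, and the file names are the example names used above. By default it only prints the commands, which is a cheap way to check them before starting a run that may take hours.

```shell
# Example file names from this post; replace them with your own data.
pat_reads=paternal.fq.gz
mat_reads=maternal.fq.gz
hifi_reads=NA12878.fq.gz
prefix=NA12878.asm
threads=16

# Print each command; execute it only when DRY_RUN is set to 0.
# (run and DRY_RUN are hypothetical names used only in this sketch.)
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Count parental k-mers with yak, then run the trio assembly.
# Each step starts only if the previous one succeeded.
trio_pipeline() {
    run yak count -k31 -b37 -t"$threads" -o pat.yak "$pat_reads" &&
    run yak count -k31 -b37 -t"$threads" -o mat.yak "$mat_reads" &&
    run hifiasm -o "$prefix" -t "$threads" -1 pat.yak -2 mat.yak "$hifi_reads"
}

trio_pipeline
```

Setting DRY_RUN=0 in the environment before calling trio_pipeline executes the three commands instead of only printing them.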

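Output

For a non-trio assembly, hifiasm generates the following files:

prefix.r_utg.gfa (Haplotype-resolved raw unitig graph in GFA format): keeps all haplotype information produced by the assembly, including somatic mutations and recurrent sequencing errors.
prefix.p_utg.gfa (Haplotype-resolved processed unitig graph without small bubbles): small bubbles caused by somatic mutations and background noise in the data, which are not real haplotype information, are removed. This is the preferred result for highly heterozygous genomes.
prefix.p_ctg.gfa (Primary assembly contig graph): the preferred file for species with low heterozygosity; for highly heterozygous species it represents one of the haplotypes.
prefix.a_ctg.gfa (Alternate assembly contig graph): the assembly of the other haplotype, made up of the contigs left out of the primary assembly.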
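
Most downstream tools expect FASTA rather than GFA. In a GFA file each sequence sits on a line that starts with "S", with the name in column 2 and the sequence in column 3, so a short awk program is enough to extract the contigs. The sketch below runs on a toy GFA file whose contents are made up; gfa_to_fasta is a hypothetical helper name.

```shell
# Toy GFA file: two segment (S) lines and one other line (contents are made up).
printf 'S\tptg000001l\tACGTACGT\tLN:i:8\n' > toy.p_ctg.gfa
printf 'A\tptg000001l\t0\t+\tread1\t0\t8\tid:i:0\n' >> toy.p_ctg.gfa
printf 'S\tptg000002c\tGGGCCC\tLN:i:6\n' >> toy.p_ctg.gfa

# Keep only segment lines; print the name as the FASTA header
# and the sequence on the following line.
gfa_to_fasta() {
    awk -F'\t' '$1 == "S" {print ">" $2; print $3}' "$1"
}

gfa_to_fasta toy.p_ctg.gfa > toy.p_ctg.fa
cat toy.p_ctg.fa
```

The same function should work on any of the GFA files hifiasm writes, for example gfa_to_fasta prefix.p_ctg.gfa > prefix.p_ctg.fa.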

For a trio assembly, hifiasm generates the following files:

prefix.r_utg.gfa (Haplotype-resolved raw unitig graph in GFA format): keeps all haplotype information.
prefix.hap1.p_ctg.gfa (Phased paternal/haplotype1 contig graph): the phased paternal/haplotype 1 assembly.
prefix.hap2.p_ctg.gfa (Phased maternal/haplotype2 contig graph): the phased maternal/haplotype 2 assembly.
