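Until now I have always assembled genomes with a different program, SPAdes, and the results have been quite good. I had heard of IDBA long ago, though, so now that I have sequencing data for two bacterial strains, I assembled each of them with both programs to compare the results. The SPAdes website shows a comparison chart of several assemblers. Unsurprisingly, SPAdes ranks first there, but IDBA comes second, which suggests that its assemblies are reasonably good.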
1. Usage
Installation
If you use the release package, extract it, then run configure and make to compile the source code.
$ ./configure
$ make
Introduction
IDBA is the basic iterative de Bruijn graph assembler for
second-generation sequencing reads.
The package has three main components:
IDBA-UD, an extension of IDBA, is designed to utilize paired-end reads to assemble low-depth regions and to use progressive depth on contigs to reduce errors in high-depth regions. It is a general-purpose assembler and is especially good for single-cell and metagenomic sequencing data.
IDBA-Hybrid is an updated version of IDBA-UD, which can make use of a similar reference genome to improve the assembly result.
IDBA-Tran is an iterative de Bruijn graph
assembler for RNA-Seq
data.
The basic IDBA is included for comparison only; you should use a more specific assembler for your data.
If you are assembling genomic data without reference, please use
IDBA-UD.
If you are assembling genomic data with a similar reference
genome, please use IDBA-Hybrid. If you are assembling transcriptome
data, please use IDBA-Tran.
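Format conversion: FASTQ to FASTA
Note that IDBA only accepts input in FASTA format, and the forward and reverse reads must be in a single file. Conveniently, the package ships with its own format conversion tool.
IDBA series assemblers accept fasta format reads. Fastq format reads can be converted by the fq2fa program in the package.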
$ bin/fq2fa read.fq read.fa
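To make the format change concrete, here is a minimal Python sketch of what a FASTQ to FASTA conversion amounts to: each four-line FASTQ record keeps its name and sequence and drops the quality lines. This is an illustration only, not the fq2fa implementation; the function name `fastq_to_fasta` is made up for this example, and it assumes plain four-line records.

```python
def fastq_to_fasta(fastq_text: str) -> str:
    """Convert four-line FASTQ records to FASTA text (illustrative sketch only)."""
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    out = []
    for i in range(0, len(lines), 4):
        header, seq, plus, _qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        out.append(">" + header[1:])  # '@name' becomes '>name'
        out.append(seq)               # the quality line is dropped
    return "".join(line + "\n" for line in out)


if __name__ == "__main__":
    print(fastq_to_fasta("@read1\nACGT\n+\nIIII\n"), end="")
```

For real data the bundled fq2fa tool is the one to use; the sketch only shows which parts of a record survive the conversion.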
IDBA-UD, IDBA-Hybrid and IDBA-Tran require paired-end reads stored in a single FastA file, with the two reads of each pair in consecutive entries. If yours are not, please use fq2fa to merge two FastQ read files into a single file.
$ bin/fq2fa --merge --filter read_1.fq read_2.fq read.fa
or convert an interleaved FastQ read file to a FastA file.
$ bin/fq2fa --paired --filter read.fq read.fa
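The single-file layout is easier to picture with a small example. The Python sketch below interleaves two lists of (name, sequence) records so that the two reads of each pair end up next to each other. It is a sketch under stated assumptions, not what fq2fa does internally: the function name `interleave_pairs` is hypothetical, and it assumes both lists are already in the same pair order.

```python
def interleave_pairs(forward, reverse):
    """Interleave two equal-length lists of (name, sequence) records into FASTA text.

    Assumes the i-th forward read and the i-th reverse read belong to the same pair.
    """
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse files hold different numbers of reads")
    lines = []
    for (name1, seq1), (name2, seq2) in zip(forward, reverse):
        # The two reads of a pair are written as consecutive FASTA entries.
        lines += [f">{name1}", seq1, f">{name2}", seq2]
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    fwd = [("r1/1", "ACGT"), ("r2/1", "GGCC")]
    rev = [("r1/2", "TTAA"), ("r2/2", "CCGG")]
    print(interleave_pairs(fwd, rev), end="")
```

The length check matters because a missing read in either file would silently shift every later pair out of alignment.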
These tools assume the paired-end reads are in the order (->, <-). If your data is in the order (<-, ->), please convert it yourself.
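If the reads do need their orientation changed before assembly, one common way to do it is to reverse-complement each read of a pair. The Python sketch below shows that operation. Whether flipping both reads is the right conversion for a particular library is an assumption to check against how the library was prepared, and the helper names `reverse_complement` and `flip_pair` are made up for this example.

```python
# Translation table for the four bases plus N, in upper and lower case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_pair(read1: str, read2: str):
    """Reverse-complement both reads of a pair (assumed conversion, see the text above)."""
    return reverse_complement(read1), reverse_complement(read2)


if __name__ == "__main__":
    print(flip_pair("AACG", "TTGA"))
```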
2. Parameters
Note that IDBA assemblers are designed for short reads (around 100bp). If you want to assemble paired-end reads with a longer read length, please modify the constant kMaxShortSequence in src/sequence/short_sequence.h to support longer read length.
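Before deciding whether the constant needs changing, it helps to know how long the reads actually are. The Python sketch below scans FASTA text for the longest read and compares it with a limit. The value 128 is taken from the "(<=128)" note on the -r option in the idba_ud usage text; treating it as the default short-read limit is my assumption, and the function names are made up for this example.

```python
SHORT_READ_LIMIT = 128  # from "(<=128)" on the -r option; assumed to be the default limit


def max_read_length(fasta_text: str) -> int:
    """Return the length of the longest sequence in FASTA text (multi-line records allowed)."""
    longest = 0
    current = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            longest = max(longest, current)  # close the previous record
            current = 0
        else:
            current += len(line)
    return max(longest, current)


def fits_short_read_limit(fasta_text: str, limit: int = SHORT_READ_LIMIT) -> bool:
    """True if every read is no longer than the limit."""
    return max_read_length(fasta_text) <= limit


if __name__ == "__main__":
    sample = ">a\nACGT\n>b\nACGTAC\nGT\n"
    print(max_read_length(sample), fits_short_read_limit(sample))
```

If the check fails, the options are the ones the README gives: raise kMaxShortSequence and recompile, or pass the reads as long reads instead.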
Please find the manual by running the
assembler without any parameters. For example:
$ bin/idba_ud
IDBA-UD - Iterative de Bruijn Graph Assembler for sequencing data with highly uneven depth.
Usage: idba_ud -r read.fa -o output_dir
Allowed Options:
  -o, --out arg (=out)        output directory
  -r, --read arg              fasta read file (<=128)
      --read_level_2 arg      paired-end reads fasta for second level scaffolds
      --read_level_3 arg      paired-end reads fasta for third level scaffolds
      --read_level_4 arg      paired-end reads fasta for fourth level scaffolds
      --read_level_5 arg      paired-end reads fasta for fifth level scaffolds
  -l, --long_read arg         fasta long read file (>128)
      --mink arg (=20)        minimum k value (<=124)
      --maxk arg (=100)       maximum k value (<=124)
      --step arg (=20)        increment of k-mer of each iteration
      --inner_mink arg (=10)  inner minimum k value
      --inner_step arg (=5)   inner increment of k-mer
      --prefix arg (=3)       prefix length used to build sub k-mer table
      --min_count arg (=2)    minimum multiplicity for filtering k-mer when building the graph
      --min_support arg (=1)