Linux + genome character replacement: basic usage of IDBA-UD for genome assembly

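Until now I have assembled genomes with a different program, SPAdes, and the results have been good. I had long heard of IDBA, though, so now that I have data for two bacterial strains I assembled each of them with both programs to compare the results. The SPAdes website shows a comparison chart of several assemblers. Unsurprisingly SPAdes ranks first there, but IDBA ranks second, which suggests that IDBA's assembly quality is decent.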

1. Usage notes

Installation

If you use the release package, extract the package, then run configure and make to compile the source code.

$ ./configure

$ make

Introduction

IDBA is the basic iterative de Bruijn graph assembler for second-generation sequencing reads.

The package has three main components:

IDBA-UD, an extension of IDBA, is designed to utilize paired-end reads to assemble low-depth regions and use progressive depth on contigs to reduce errors in high-depth regions. It is a general-purpose assembler and especially good for single-cell and metagenomic sequencing data.

IDBA-Hybrid is an updated version of IDBA-UD, which can make use of a similar reference genome to improve the assembly result.

IDBA-Tran is an iterative de Bruijn graph assembler for RNA-Seq data.

The basic IDBA is included for comparison; you should use a more specific assembler for your data.

If you are assembling genomic data without a reference, please use IDBA-UD.

If you are assembling genomic data with a similar reference genome, please use IDBA-Hybrid.

If you are assembling transcriptome data, please use IDBA-Tran.
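
The choice above is a simple three-way rule, so it can be written down as a small shell helper. This is only a sketch: the function name pick_assembler and the data-type labels are made up for illustration, and while idba_ud is the name printed in the program's own usage text, idba_hybrid and idba_tran are assumed to follow the same naming, so check them against your bin/ directory.

```shell
# Hypothetical helper: map a kind of data to the IDBA program recommended for it.
# idba_ud is taken from the usage text; idba_hybrid and idba_tran are ASSUMED
# binary names that follow the same pattern.
pick_assembler() {
    case "$1" in
        genome)          echo "idba_ud" ;;      # genomic data, no reference
        genome-with-ref) echo "idba_hybrid" ;;  # genomic data, similar reference available
        transcriptome)   echo "idba_tran" ;;    # RNA-Seq data
        *)               echo "unknown data type: $1" >&2; return 1 ;;
    esac
}
```

For example, `pick_assembler genome` prints idba_ud, which could then be used as `bin/$(pick_assembler genome) -r read.fa -o output_dir`.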

Format conversion: FASTQ to FASTA

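Note that IDBA only accepts input in FASTA format, and the forward and reverse reads must be stored in a single file. Conveniently, the package ships with its own format conversion tool.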

IDBA series assemblers accept FastA format reads. FastQ format reads can be converted by the fq2fa program in the package.

$ bin/fq2fa read.fq read.fa

IDBA-UD, IDBA-Hybrid and IDBA-Tran require paired-end reads stored in a single FastA file, with the two reads of a pair stored consecutively. If your reads are not in that form, please use fq2fa to merge two FastQ read files into a single file.

$ bin/fq2fa --merge --filter read_1.fq read_2.fq read.fa

Or convert a single FastQ read file to a FastA file.

$ bin/fq2fa --paired --filter read.fq read.fa

The tools assume the paired-end reads are in order (->, <-). If your data is in order (<-, ->), please convert it yourself.
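
To make the expected layout concrete, here is a minimal awk sketch that interleaves two FastQ files into one FastA stream, with the two reads of each pair next to each other. It is not fq2fa and does no filtering. It only illustrates the file layout described above, and it assumes strict 4-line FastQ records in the same read order in both files. The function name interleave_fq_to_fa is made up for this example.

```shell
# Illustration only, NOT a replacement for fq2fa.
# Assumes 4-line FastQ records and identical read order in both input files.
# $1 = forward FastQ, $2 = reverse FastQ; interleaved FastA is written to stdout.
interleave_fq_to_fa() {
    awk -v rev="$2" '
    {
        # Read the line at the same position from the reverse file.
        if ((getline r < rev) <= 0) exit 1
        pos = NR % 4
        if (pos == 1) {
            # Header lines: drop the leading "@" of each read name.
            h1 = substr($0, 2)
            h2 = substr(r, 2)
        } else if (pos == 2) {
            # Sequence lines: emit the forward read, then its mate.
            print ">" h1; print $0
            print ">" h2; print r
        }
        # The "+" separator and quality lines are skipped.
    }' "$1"
}
```

Running `interleave_fq_to_fa read_1.fq read_2.fq > read.fa` gives a file in which every forward read is immediately followed by its mate.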

2. Parameters

Note that IDBA assemblers are designed for short reads (around 100bp). If you want to assemble paired-end reads with longer read length, please modify the constant kMaxShortSequence in src/sequence/short_sequence.h to support longer read length.
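
Before deciding whether that constant needs changing, it helps to know how long your reads actually are. The sketch below reports the longest sequence in a FastA file with awk, which you can then compare against the limits printed in the program's usage text (-r for reads <=128, -l for long reads >128). The function name max_read_length is made up for this example, and sequences wrapped over several lines are counted as one read.

```shell
# Hypothetical helper: print the length of the longest sequence in a FastA file.
# $1 = FastA file. Multi-line sequences are summed per record.
max_read_length() {
    awk '
    BEGIN { max = 0; len = 0 }
    /^>/  { if (len > max) max = len; len = 0; next }   # new record: close the previous one
          { len += length($0) }                         # sequence line: add to current record
    END   { if (len > max) max = len; print max }
    ' "$1"
}
```

For example, `max_read_length read.fa` prints a single number. If it is above the short-read limit, the reads either need the recompiled constant or the long-read option.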

Please find the manual by running the assembler without any parameters. For example:

$ bin/idba_ud

IDBA-UD - Iterative de Bruijn Graph Assembler for sequencing data with highly uneven depth.

Usage: idba_ud -r read.fa -o output_dir

Allowed Options:

-o, --out arg (=out)        output directory
-r, --read arg              fasta read file (<=128)
    --read_level_2 arg      paired-end reads fasta for second level scaffolds
    --read_level_3 arg      paired-end reads fasta for third level scaffolds
    --read_level_4 arg      paired-end reads fasta for fourth level scaffolds
    --read_level_5 arg      paired-end reads fasta for fifth level scaffolds
-l, --long_read arg         fasta long read file (>128)
    --mink arg (=20)        minimum k value (<=124)
    --maxk arg (=100)       maximum k value (<=124)
    --step arg (=20)        increment of k-mer of each iteration
    --inner_mink arg (=10)  inner minimum k value
    --inner_step arg (=5)   inner increment of k-mer
    --prefix arg (=3)       prefix length used to build sub k-mer table
    --min_count arg (=2)    minimum multiplicity for filtering k-mer when building the graph
    --min_support arg (=1)
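
The --mink, --maxk and --step options together decide which k values the assembler iterates over. As a rough sketch, assuming k simply starts at mink and grows by step until it would exceed maxk, the schedule can be listed like this. The function name k_schedule is made up for this example, and how the program itself handles a last step that overshoots maxk is not stated in the usage text.

```shell
# Hypothetical helper: list the k values implied by mink, maxk and step.
# ASSUMPTION: a plain arithmetic progression from mink up to maxk.
# $1 = mink, $2 = maxk, $3 = step
k_schedule() (
    k=$1
    line=""
    while [ "$k" -le "$2" ]; do
        line="${line:+$line }$k"   # append k, space-separated
        k=$((k + $3))
    done
    printf '%s\n' "$line"
)
```

With the defaults from the option list, `k_schedule 20 100 20` prints 20 40 60 80 100. The same values can be set explicitly on the command line:

$ bin/idba_ud -r read.fa -o output_dir --mink 20 --maxk 100 --step 20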
