IDBA-UD Genome Assembly: Basic Usage

Latest recommended article published 2024-01-03 08:07:46

学弱猹


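I have always assembled genomes with another program, SPAdes, and its results have been quite good. I had long heard of IDBA, however, so now that I have just received data for two bacterial strains, I am assembling each one with both programs to compare the results. The SPAdes website shows a chart comparing several assemblers: unsurprisingly SPAdes ranks first, but IDBA ranks second, which suggests that IDBA also assembles reasonably well.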

I. Usage

Installation

If you use the release package, extract it, then compile the source code with configure and make:

$ ./configure

$ make

Introduction

IDBA is the basic iterative de Bruijn graph assembler for second-generation sequencing reads.

The three main assemblers built on it are:

IDBA-UD, an extension of IDBA, is designed to use paired-end reads to assemble low-depth regions and to use progressive depth on contigs to reduce errors in high-depth regions. It is a general-purpose assembler and is especially good for single-cell and metagenomic sequencing data.

IDBA-Hybrid is another updated version of IDBA-UD that can make use of a similar reference genome to improve the assembly result.

IDBA-Tran is an iterative de Bruijn graph assembler for RNA-Seq data.

The basic IDBA is included for comparison; you should use the more specific assemblers for your data.

If you are assembling genomic data without a reference, please use IDBA-UD.

If you are assembling genomic data with a similar reference genome, please use IDBA-Hybrid. If you are assembling transcriptome data, please use IDBA-Tran.

Format conversion: FASTQ to FASTA

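Note that IDBA accepts input only in FASTA format, and the forward and reverse reads of each pair must be in a single file. Conveniently, the package ships with its own format-conversion tool.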

IDBA-series assemblers accept reads in FASTA format. FASTQ reads can be converted with the fq2fa program included in the package.

$ bin/fq2fa read.fq read.fa
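For readers curious about what this conversion amounts to, here is a minimal Python sketch; it is not the fq2fa source. It assumes unwrapped four-line FASTQ records, drops the quality line, and writes each read as a two-line FASTA record. The names `read_fastq` and `fastq_to_fasta` are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) for each 4-line FASTQ record in `handle`."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return  # clean end of file
        header, seq, plus, qual = (line.rstrip("\r\n") for line in record)
        # Basic sanity checks: '@' header, '+' separator, matching quality length.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(qual) != len(seq):
            raise ValueError(f"quality length differs from sequence length in {header!r}")
        yield header[1:], seq


def fastq_to_fasta(src: TextIO, dst: TextIO) -> int:
    """Write every FASTQ record as a two-line FASTA record; return the count."""
    count = 0
    for name, seq in read_fastq(src):
        dst.write(f">{name}\n{seq}\n")
        count += 1
    return count


if __name__ == "__main__":
    fq = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
    out = io.StringIO()
    print(fastq_to_fasta(fq, out))
    print(out.getvalue(), end="")
```

On real files you would pass open file handles, e.g. `with open("read.fq") as src, open("read.fa", "w") as dst: fastq_to_fasta(src, dst)`; for actual assemblies the bundled fq2fa remains the tool to use.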

IDBA-UD, IDBA-Hybrid and IDBA-Tran require paired-end reads stored in a single FASTA file, with the two reads of each pair appearing as consecutive records. If your reads are not stored this way, use fq2fa to merge the two FASTQ read files into a single file:

$ bin/fq2fa --merge --filter read_1.fq read_2.fq read.fa
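The pairing requirement can be illustrated with a hypothetical `merge_pairs` function. This Python sketch assumes read_1.fq and read_2.fq list mates in the same order; it writes mate 1 and mate 2 of each pair as consecutive FASTA records and rejects inputs with different read counts. It does not model whatever filtering the real `--filter` option performs.

```python
import io
from itertools import zip_longest
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) for each 4-line FASTQ record in `handle`."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return
        header, seq, plus, _qual = (line.rstrip("\r\n") for line in record)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:], seq


def merge_pairs(r1: TextIO, r2: TextIO, dst: TextIO) -> int:
    """Interleave mates: mate 1 then mate 2 of each pair as consecutive FASTA records."""
    pairs = 0
    # zip_longest (not zip) so unequal read counts raise instead of being silently truncated.
    for mate1, mate2 in zip_longest(read_fastq(r1), read_fastq(r2)):
        if mate1 is None or mate2 is None:
            raise ValueError("read_1 and read_2 contain different numbers of reads")
        (name1, seq1), (name2, seq2) = mate1, mate2
        dst.write(f">{name1}\n{seq1}\n>{name2}\n{seq2}\n")
        pairs += 1
    return pairs


if __name__ == "__main__":
    r1 = io.StringIO("@p1/1\nAAAA\n+\nIIII\n")
    r2 = io.StringIO("@p1/2\nTTTT\n+\nIIII\n")
    out = io.StringIO()
    print(merge_pairs(r1, r2, out))
    print(out.getvalue(), end="")
```

The sketch does not check that mate names match; it relies entirely on the two files being in the same order.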

or convert a single FASTQ file that already holds both reads of each pair into a FASTA file:

$ bin/fq2fa --paired --filter read.fq read.fa

These tools assume the paired-end reads are in the order (->, <-). If your data are in the order (<-, ->), please convert them yourself.
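If your library's mates face outward (<-, ->) instead of the inward (->, <-) orientation that fq2fa assumes, one common way to convert them, assumed here rather than taken from the IDBA documentation, is to reverse-complement both mates of every pair: mate 1 stays first, but each read's direction flips. The Python sketch below applies this to an interleaved FASTA file with one sequence line per record, as fq2fa writes them. `revcomp` and `flip_pairs` are hypothetical names, and only A, C, G, T and N are complemented.

```python
import io
from typing import TextIO

# Complement table for A, C, G, T, N (upper and lower case); other characters pass through.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_pairs(src: TextIO, dst: TextIO) -> int:
    """Reverse-complement both mates of every pair in an interleaved FASTA file.

    Assumes two-line records (header, single sequence line) with mate 1 and
    mate 2 of each pair adjacent. Returns the number of pairs written.
    """
    lines = [line.rstrip("\r\n") for line in src if line.strip()]
    if len(lines) % 4:
        raise ValueError("expected whole pairs of two-line FASTA records")
    for i in range(0, len(lines), 2):
        header, seq = lines[i], lines[i + 1]
        if not header.startswith(">"):
            raise ValueError(f"expected a FASTA header, got {header!r}")
        # Mate order is kept; only each read's orientation is flipped.
        dst.write(f"{header}\n{revcomp(seq)}\n")
    return len(lines) // 4


if __name__ == "__main__":
    out = io.StringIO()
    print(flip_pairs(io.StringIO(">p1/1\nAACG\n>p1/2\nGGTA\n"), out))
    print(out.getvalue(), end="")
```

Check your library's documentation for its actual orientation before applying anything like this.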

II. Parameters

Note that the IDBA assemblers are designed for short reads (around 100 bp). If you want to assemble paired-end reads with longer read lengths, modify the constant kMaxShortSequence in src/sequence/short_sequence.h to support them, then recompile.
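Before changing kMaxShortSequence, it helps to know how long your reads actually are. This Python sketch (the function `read_length_stats` is hypothetical) scans a FASTA file and counts reads longer than 128 bp. The value 128 is taken from the `-r ... (<=128)` line of the IDBA-UD help output; treating it as the default kMaxShortSequence is an assumption.

```python
import io
from typing import Dict, List, Optional, TextIO

# Assumed limit, taken from the "-r ... fasta read file (<=128)" help line;
# change it if IDBA was rebuilt with a different kMaxShortSequence.
SHORT_READ_LIMIT = 128


def read_length_stats(handle: TextIO, limit: int = SHORT_READ_LIMIT) -> Dict[str, int]:
    """Count reads, find the longest, and count reads longer than `limit`.

    Multi-line FASTA records are handled by summing their sequence lines.
    """
    lengths: List[int] = []
    current: Optional[int] = None
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)  # close the previous record
            current = 0
        elif current is None:
            raise ValueError("sequence data found before the first FASTA header")
        else:
            current += len(line)
    if current is not None:
        lengths.append(current)  # close the last record
    return {
        "reads": len(lengths),
        "max_length": max(lengths, default=0),
        "too_long": sum(1 for n in lengths if n > limit),
    }


if __name__ == "__main__":
    fa = io.StringIO(">a\n" + "A" * 100 + "\n>b\n" + "C" * 150 + "\n")
    print(read_length_stats(fa))
```

A non-zero `too_long` count is the situation the note about kMaxShortSequence refers to; the help output also lists `-l, --long_read` for reads longer than 128 bp.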

Run an assembler without any parameters to print its manual. For example, for IDBA-UD:

$ bin/idba_ud

IDBA-UD - Iterative de Bruijn Graph Assembler for sequencing data with highly uneven depth.

Usage: idba_ud -r read.fa -o output_dir

Allowed Options:
  -o, --out arg (=out)        output directory
  -r, --read arg              fasta read file (<=128)
  --read_level_2 arg          paired-end reads fasta for second level scaffolds
  --read_level_3 arg          paired-end reads fasta for third level scaffolds
  --read_level_4 arg          paired-end reads fasta for fourth level scaffolds
  --read_level_5 arg          paired-end reads fasta for fifth level scaffolds
  -l, --long_read arg         fasta long read file (>128)
  --mink arg (=20)            minimum k value (<=124)
  --maxk arg (=100)           maximum k value (<=124)
  --step arg (=20)            increment of k-mer of each iteration
  --inner_mink arg (=10)      inner minimum k value
  --inner_step arg (=5)       inner increment of k-mer
  --prefix arg (=3)           prefix length used to build sub k-mer table
  --min_count arg (=2)        minimum multiplicity for filtering k-mer when building the graph
  --min_support arg (=1)
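To make the k-mer options above more concrete, here is a conceptual Python sketch of two of them; it is not IDBA's implementation. The hypothetical `k_schedule` lists the k values visited for given --mink, --maxk and --step, assuming k starts at mink, grows by step and the final iteration uses maxk. The hypothetical `solid_kmers` shows the idea behind --min_count by keeping only k-mers seen at least that many times. Real de Bruijn graph assemblers also merge each k-mer with its reverse complement, which this sketch ignores.

```python
from collections import Counter
from typing import Dict, List


def k_schedule(mink: int = 20, maxk: int = 100, step: int = 20) -> List[int]:
    """k values per iteration, ASSUMING k starts at mink, grows by step,
    and the final iteration always uses maxk."""
    if step <= 0:
        raise ValueError("step must be positive")
    if not 0 < mink <= maxk <= 124:  # 124 is the upper bound printed in the help
        raise ValueError("need 0 < mink <= maxk <= 124")
    ks = list(range(mink, maxk, step))
    ks.append(maxk)
    return ks


def solid_kmers(reads: List[str], k: int, min_count: int = 2) -> Dict[str, int]:
    """Keep k-mers that occur at least `min_count` times across all reads.

    Conceptual only: reverse-complement k-mers are NOT merged here.
    """
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    print(k_schedule())
    print(solid_kmers(["ACGTA", "ACGTT"], k=3))
```

With the defaults listed in the help output (--mink 20, --maxk 100, --step 20), `k_schedule()` returns [20, 40, 60, 80, 100]. Whether IDBA-UD handles a maxk that is not reached exactly by the step in the same way is an assumption of this sketch.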
