alfalfa: a long-read alignment tool


msw521sg

Category: Bioinformatics. Tags: GMAP, alfalfa, align, third-generation sequencing, bioinformatics

Article link: https://blog.csdn.net/msw521sg/article/details/53040366

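With the rise of third-generation sequencing, the demand for long-read alignment has grown. Until now I had been using GMAP, which has one drawback: a sequence that carries a polyA stretch may fail to align to the reference, yet aligns fine once the polyA is removed. In other words, when the start of a sequence cannot be aligned, GMAP may treat the whole sequence as unalignable. GMAP also sometimes fails to find all possible hits. When using GMAP, it is worth randomly picking some of the sequences that did not map and checking whether they really cannot be mapped. Today I came across a new long-read aligner, alfalfa (named after the forage plant). It is also advertised as compatible with short-read alignment. In the benchmarks reported in its paper it comes out ahead of BWA-MEM, BWA-SW, Bowtie 2 and CUSHAW3, as one would expect from the authors' own evaluation. Below is a brief introduction to installing and using it.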

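####Installation

```
$ git clone https://github.com/readmapping/alfalfa.git
$ cd alfalfa
$ make
```

That is all there is to it. After the build finishes, the alfalfa executable appears in the current directory; you can copy it to a directory on your PATH.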
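
Copying the binary onto your PATH can be sketched as a small shell helper. This is only an illustration: the function name `install_to_path` and the `$HOME/bin` destination are assumptions, not part of alfalfa.

```shell
# Hypothetical helper: copy a freshly built binary into a bin directory.
# Both arguments are assumptions; adjust them to your own layout.
install_to_path() {
    src="$1"        # e.g. ./alfalfa, produced by make
    dest_dir="$2"   # e.g. "$HOME/bin"
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/" || return 1
    chmod +x "$dest_dir/$(basename "$src")"
}

# Example call (commented out so the snippet has no side effects when sourced):
# install_to_path ./alfalfa "$HOME/bin"
# export PATH="$HOME/bin:$PATH"   # only needed if the directory is not on PATH yet
```

After that, running `alfalfa` from any directory should print the usage text.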

####Usage

```
@PG ID:alfalfa VN:0.8.1
Usage: alfalfa <command> [<subcommand>] [option...]
Command should be index, align or evaluate
Subcommand is only required for the evaluate command

commands:

index is used to construct the data structures for indexing a given reference
genome.
align is used for mapping and aligning a read set onto a reference genome.
evaluate is used for evaluating the accuracy of simulated reads and summarizing
statistics from the SAM-formatted alignments reported by a read mapper.

call alfalfa -h/--help for more detailed information on the specific
commands
```

#####Usage: alfalfa index [option...]

index is used to construct the data structures for indexing a given reference
genome.
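
A minimal sketch of an index run, using only the `-r`, `-s` and `-p` options documented in the help text. The file name `reference.fa` and the prefix `reference_idx` are placeholders, and `child_array_bytes` is a hypothetical helper that applies the stated cost of (4/s) bytes per reference base for the sparse child array.

```shell
# Placeholder inputs (assumptions): replace with your own reference and prefix.
ref="reference.fa"
prefix="reference_idx"
sparseness=12   # default value listed for -s/--sparseness

# Hypothetical helper: extra memory for the sparse child array,
# using the (4/s) bytes-per-base figure from the help text (integer division).
child_array_bytes() {
    genome_bases="$1"
    s="$2"
    echo $(( 4 * genome_bases / s ))
}

# Build the index only if alfalfa is installed and the reference file exists.
if command -v alfalfa >/dev/null 2>&1 && [ -f "$ref" ]; then
    alfalfa index -r "$ref" -s "$sparseness" -p "$prefix"
fi
```

For example, `child_array_bytes 3000000000 12` estimates the child array for a 3 Gbp reference at the default sparseness, which works out to about 1 GB. The same prefix is later passed to the `-i` option of the align command.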

options

-r/--reference (file).
Specifies the location of a file that contains the reference
genome in multi-fasta format.
-s/--sparseness (int, 12).
Specifies the sparseness of the index structure as a way to
control part of the speed-memory trade-off.
-p/--prefix (string, filename passed to the -r option).
Specifies the prefix that will be used to name all generated
index files. The same prefix has to be passed to the -i option
of the align command to load the index structure when mapping
reads.
--no-child .
By default, a sparse child array is constructed and stored in an
index file with extension .child. The construction of this
sparse child array is skipped when the --no-child option is set.
This data structure speeds up seed-finding at the cost of (4/s)
bytes per base in the reference genome. As the data structure
provides a major speed-up, it is advised to have it constructed.
--suflink .
Suffix link support is disabled by default. Suffix link support
is enabled when the --suflink option is set, resulting in an
index file with extension .isa to be generated. This data
structure speeds up seed-finding at the cost of (4/s) bytes per
base. It is only useful when sparseness is less than four and
minimum seed length is very low (less than 10), because it
conflicts with skipping suffixes in matching the read. In
practice, this is rarely the case.
--no-kmer .
By default, a 10-mer lookup table is constructed that contains
the suffix array interval positions to depth 10 in the virtual
suffix tree. It is stored in