Multiple sequence alignment Benchmark Data set


1. Summary: standard benchmark data sets for sequence alignment: http://www.drive5.com/bench/

This is a collection of multiple alignment benchmarks in a uniform
format that is convenient for further analysis. All files are in
FASTA format, with upper-case letters used to indicate aligned
columns.
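
As a rough illustration of the upper-case convention, the following sketch parses FASTA text and lists the columns that count as aligned. The function names `parse_fasta` and `upper_case_columns` are hypothetical, and the rule used here (a column counts when every letter in it is upper-case, with `-` and `.` treated as gaps) is an assumption about how the convention is applied, not something the benchmark README spells out.

```python
def parse_fasta(text):
    """Return a list of (label, sequence) pairs from FASTA-formatted text."""
    records = []
    label, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:], []
        else:
            chunks.append(line)
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


def upper_case_columns(records):
    """Indices of columns whose letters are all upper-case (gaps ignored).

    Assumption: an 'aligned column' is one with no lower-case letters.
    """
    seqs = [seq for _, seq in records]
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("sequences in an alignment must have equal length")
    cols = []
    for i in range(width):
        letters = [s[i] for s in seqs if s[i].isalpha()]
        if letters and all(c.isupper() for c in letters):
            cols.append(i)
    return cols


# Made-up two-sequence alignment, only for demonstration.
example = """>seqA
MKV-lsp
>seqB
MRVAl-p
"""
print(upper_case_columns(parse_fasta(example)))
```

In the made-up example, the first four columns are reported and the trailing lower-case region is skipped.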

See References below for original sources of benchmark data.

Benchmarks are:

--------------------------1---------------------------

bali2dna
BALIBASE v2, reverse-translated to DNA

bali2dnaf
bali2dna, with frame-shifts induced by random insertions of one
or two nucleotides into the middle 50% of exactly one sequence
in each set.

bali3
BALIBASE v3.

bali3pdb
BALIS, the structural subset of BALIBASE v3.

bali3pdbm
MU-BALIS, i.e. BALIS re-aligned by MUSTANG.

---------------------------2--------------------------

ox
OXBENCH.

oxm
MU-OXBENCH, i.e. OXBENCH re-aligned by MUSTANG.

oxx
OXBENCH-X, i.e. the Extended set in OXBENCH.

---------------------------3--------------------------

prefab4
PREFAB v4.

prefab4ref
PREFAB-R, i.e. the pairwise reference alignments in PREFAB v4.

prefab4refm
MU-PREFAB-R, i.e. PREFAB-R re-aligned by MUSTANG.

---------------------------4--------------------------

sabre
Consistent multiple alignments constructed from SABMARK v1.65.

sabrem
MU-SABRE, i.e. SABRE re-aligned by MUSTANG.

-----------------------------------------------------
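
The frame-shift construction described for bali2dnaf above can be sketched roughly as follows. This is not the script used to build the benchmark: the function names are hypothetical, "middle 50%" is assumed to mean positions between one quarter and three quarters of the sequence length, and a single insertion event per chosen sequence is assumed.

```python
import random


def induce_frameshift(seq, rng):
    """Insert one or two random nucleotides into the middle half of seq.

    Assumption: the 'middle 50%' spans positions len/4 .. 3*len/4.
    """
    n = rng.choice((1, 2))                      # 1 or 2 bases break the frame
    lo, hi = len(seq) // 4, (3 * len(seq)) // 4
    pos = rng.randint(lo, hi)                   # inclusive on both ends
    insert = "".join(rng.choice("ACGT") for _ in range(n))
    return seq[:pos] + insert + seq[pos:]


def frameshift_one(seqs, rng):
    """Return a copy of seqs in which exactly one sequence is frame-shifted."""
    target = rng.randrange(len(seqs))
    return [induce_frameshift(s, rng) if i == target else s
            for i, s in enumerate(seqs)]


rng = random.Random(0)                          # seeded for repeatability
original = ["ACGTACGTACGT", "ACGTACGTACGT", "ACGTACGTACGT"]
shifted = frameshift_one(original, rng)
print(shifted)
```

Because the insertion length is 1 or 2, never a multiple of 3, the reading frame downstream of the insertion point is always disrupted.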

Directory structure under each benchmark is:

in/
Input sequences.

ref/
Reference alignments. Upper-case regions indicate conservative
regions that are intended for use in assessment. Lower-case regions
should not be used.

info/
Contains ids.txt (list of set identifiers that are filenames in ref/
and in/), nrseqs.txt (number of sequences in each set), and
pctids.txt (%id in conservative regions in each set).
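
To make the layout concrete, here is a minimal sketch that walks one benchmark directory and pairs each input file with its reference alignment. It assumes ids.txt holds one set identifier per line (the README does not state the exact file format), and the helper name `iter_sets` is hypothetical. The demo builds a throwaway mock directory rather than using real benchmark data.

```python
from pathlib import Path
import tempfile


def iter_sets(bench_dir):
    """Yield (set_id, input_path, reference_path) for one benchmark.

    Assumption: info/ids.txt lists one set identifier per line.
    """
    bench_dir = Path(bench_dir)
    ids_file = bench_dir / "info" / "ids.txt"
    for line in ids_file.read_text().splitlines():
        set_id = line.strip()
        if set_id:
            yield set_id, bench_dir / "in" / set_id, bench_dir / "ref" / set_id


# Demo on a mock layout; "mockbench", "set1" and "set2" are made-up names.
with tempfile.TemporaryDirectory() as tmp:
    bench = Path(tmp) / "mockbench"
    for sub in ("in", "ref", "info"):
        (bench / sub).mkdir(parents=True)
    (bench / "info" / "ids.txt").write_text("set1\nset2\n")
    for set_id in ("set1", "set2"):
        (bench / "in" / set_id).write_text(">a\nACGT\n")
        (bench / "ref" / set_id).write_text(">a\nACGT\n")
    found = [(sid, inp.exists(), ref.exists())
             for sid, inp, ref in iter_sets(bench)]

print(found)
```

With the real archive unpacked, the same function could be pointed at a benchmark directory such as bali3 or prefab4.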

Download page for qscore: http://www.drive5.com/bench/bench.tar.gz

qscore is a quality scoring program that compares two multiple sequence alignments: an alignment to be evaluated (the "test" alignment) and a second alignment that is believed to be correct (the "reference" alignment). The program outputs the following scores:
- The PREFAB Q score (also known as the BAliBASE SPS score or the Developer score).
- The Modeler score.
- The Cline et al. shift score.
- The BAliBASE TC (total column) score.
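
For orientation, two of these scores can be sketched in a few lines. This is an illustrative re-implementation, not the qscore program: it assumes the commonly used definitions (Q is the fraction of residue pairs aligned in the reference that are also aligned in the test; TC is the fraction of reference columns whose residues all land in a single test column), assumes both alignments list the same sequences in the same order, and scores every reference column with at least two residues regardless of letter case. The function names are hypothetical.

```python
from itertools import combinations


def column_residues(alignment):
    """For each column, list the (sequence index, residue index) pairs it holds."""
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        members = []
        for s, ch in enumerate(col):
            if ch not in "-.":                  # '-' and '.' are gaps
                members.append((s, counters[s]))
                counters[s] += 1
        columns.append(members)
    return columns


def q_and_tc(test, ref):
    """Return (Q, TC) for a test alignment scored against a reference."""
    ref_cols = [c for c in column_residues(ref) if len(c) >= 2]
    test_col_of = {res: i
                   for i, col in enumerate(column_residues(test))
                   for res in col}
    total_pairs = correct_pairs = correct_cols = 0
    for col in ref_cols:
        for a, b in combinations(col, 2):
            total_pairs += 1
            if test_col_of[a] == test_col_of[b]:
                correct_pairs += 1
        if len({test_col_of[r] for r in col}) == 1:
            correct_cols += 1
    return correct_pairs / total_pairs, correct_cols / len(ref_cols)


# Made-up alignments of the same three sequences, for demonstration only.
ref = ["ACGT", "AC-T", "A-GT"]
test = ["ACGT", "A-CT", "AG-T"]
print(q_and_tc(test, ref))
```

A fuller version would restrict scoring to the upper-case reference columns and handle a reference with no scorable columns, which this sketch does not.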


 

BAliBASE benchmark database URL: http://www.lbgi.fr/balibase/


 

References
----------

Thompson JD, Koehl P, Ripp R, Poch O (2005) BAliBASE 3.0: latest
developments of the multiple sequence alignment benchmark. Proteins
61: 127-136.

Bahr A, Thompson JD, Thierry JC, Poch O (2001) BAliBASE (Benchmark
Alignment dataBASE): enhancements for repeats, transmembrane
sequences and circular permutations. Nucleic Acids Res 29: 323-326.

Thompson JD, Plewniak F, Poch O (1999) BAliBASE: a benchmark
alignment database for the evaluation of multiple alignment programs.
Bioinformatics 15: 87-88.

Van Walle I, Lasters I, Wyns L (2005) SABmark--a benchmark for
sequence alignment that covers the entire known fold space.
Bioinformatics 21: 1267-1268.

Raghava GP, Searle SM, Audley PC, Barber JD, Barton GJ (2003)
OXBench: a benchmark for evaluation of protein multiple sequence
alignment accuracy. BMC Bioinformatics 4: 47.

Edgar RC (2004) MUSCLE: multiple sequence alignment with high
accuracy and high throughput. Nucleic Acids Res 32: 1792-1797.

 
