A Brief Introduction to Using clustalw2

weixin_34160277
Original link: http://www.cnblogs.com/xiaofeiIDO/p/6441752.html
CLUSTAL 2.0.12 Multiple Sequence Alignments




>> HELP NEW <<             NEW FEATURES/OPTIONS

==UPGMA== 
 The UPGMA algorithm has been added to allow faster tree construction. The user now
 has the choice of using Neighbour Joining (NJ) or UPGMA. The default is still NJ, but the
 user can change this by setting the clustering parameter.
 
 -CLUSTERING=   :NJ or UPGMA
 
==ITERATION==

 A remove-first iteration scheme has been added. This can be used to improve the final
 alignment or to improve the alignment at each stage of the progressive alignment. During the
 iteration step each sequence is removed in turn and realigned. If the resulting alignment
 is better than the previous alignment it is kept. This process is repeated until the score
 converges (the score is not improved) or until the maximum number of iterations is
 reached. The user can iterate at each step of the progressive alignment by setting the
 iteration parameter to TREE, or just on the final alignment by setting the iteration
 parameter to ALIGNMENT. The default is no iteration. The maximum number of iterations can
 be set using the numiter parameter. The default number of iterations is 3.
  
 -ITERATION=    :NONE or TREE or ALIGNMENT
 
 -NUMITER=n     :Maximum number of iterations to perform
 
==HELP==
 
 -FULLHELP      :Print out the complete help content
 
==MISC==

 -MAXSEQLEN=n   :Maximum allowed sequence length
 
 -QUIET         :Reduce console output to minimum
 
 -STATS=file    :Log some alignment statistics to file
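
The options above can be combined on a single command line. The following is a minimal Python sketch that only assembles the argument list and does not run anything. The executable name `clustalw2`, the `-INFILE=` and `-ALIGN` options, and the file names are not part of the help excerpt above and are assumed here; the helper `build_clustalw_args` is hypothetical.

```python
def build_clustalw_args(infile, clustering="NJ", iteration="NONE",
                        numiter=3, quiet=False, stats_file=None):
    """Assemble a ClustalW2 argument list from the options listed above."""
    if clustering not in ("NJ", "UPGMA"):
        raise ValueError("clustering must be NJ or UPGMA")
    if iteration not in ("NONE", "TREE", "ALIGNMENT"):
        raise ValueError("iteration must be NONE, TREE or ALIGNMENT")
    # Assumption: executable name, -INFILE= and -ALIGN are not in the excerpt.
    args = ["clustalw2", f"-INFILE={infile}", "-ALIGN",
            f"-CLUSTERING={clustering}", f"-ITERATION={iteration}"]
    if iteration != "NONE":
        # The iteration cap only matters when iteration is switched on.
        args.append(f"-NUMITER={numiter}")
    if quiet:
        args.append("-QUIET")
    if stats_file:
        args.append(f"-STATS={stats_file}")
    return args


if __name__ == "__main__":
    # Hypothetical input file; prints the command that would be run.
    print(" ".join(build_clustalw_args("seqs.fasta", clustering="UPGMA",
                                       iteration="TREE", numiter=5)))
```

The list could then be passed to `subprocess.run`, provided the program is installed under that name.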


>> HELP 1 <<             General help for CLUSTAL W (2.0.12)

Clustal W is a general purpose multiple alignment program for DNA or proteins.

SEQUENCE INPUT:  all sequences must be in 1 file, one after another.  
7 formats are automatically recognised: NBRF-PIR, EMBL-SWISSPROT, 
Pearson (Fasta), Clustal (*.aln), GCG-MSF (Pileup), GCG9-RSF and GDE flat file.
All non-alphabetic characters (spaces, digits, punctuation marks) are ignored
except "-" which is used to indicate a GAP ("." in MSF-RSF).  

To do a MULTIPLE ALIGNMENT on a set of sequences, use item 1 from this menu to 
INPUT them; go to menu item 2 to do the multiple alignment.

PROFILE ALIGNMENTS (menu item 3) are used to align 2 alignments.  Use this to
add a new sequence to an old alignment, or to use secondary structure to guide 
the alignment process.  GAPS in the old alignments are indicated using the "-" 
character.   PROFILES can be input in ANY of the allowed formats; just 
use "-" (or "." for MSF-RSF) for each gap position.

PHYLOGENETIC TREES (menu item 4) can be calculated from old alignments (read in
with "-" characters to indicate gaps) OR after a multiple alignment while the 
alignment is still in memory.


The program tries to automatically recognise the different file formats used
and to guess whether the sequences are amino acid or nucleotide.  This is not
always foolproof.

FASTA and NBRF-PIR formats are recognised by having a ">" as the first 
character in the file.  

EMBL-Swiss Prot formats are recognised by the letters
ID at the start of the file (the token for the entry name field).  

CLUSTAL format is recognised by the word CLUSTAL at the beginning of the file.

GCG-MSF format is recognised by one of the following:
       - the word PileUp at the start of the file. 
       - the word !!AA_MULTIPLE_ALIGNMENT or !!NA_MULTIPLE_ALIGNMENT
         at the start of the file.
       - the word MSF on the first line of the file, and the characters ..
         at the end of this line.

GCG-RSF format is recognised by the word !!RICH_SEQUENCE at the beginning of
the file.
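
The recognition rules above can be sketched as a small Python function. This is an illustration of the stated rules only, not the program's actual code: the function name `sniff_format` is hypothetical, leading whitespace is skipped as an assumption, FASTA and NBRF-PIR are not told apart because the text gives the same rule for both, and GDE is reported as unknown because no rule for it is given.

```python
def sniff_format(text):
    """Guess the sequence file format from the start of the file contents."""
    stripped = text.lstrip()  # assumption: ignore leading whitespace
    first_line = stripped.splitlines()[0] if stripped else ""
    if stripped.startswith(">"):
        return "FASTA or NBRF-PIR"
    if stripped.startswith("ID"):
        return "EMBL-SWISSPROT"
    if stripped.startswith("CLUSTAL"):
        return "CLUSTAL"
    if stripped.startswith("!!RICH_SEQUENCE"):
        return "GCG-RSF"
    if (stripped.startswith("PileUp")
            or stripped.startswith(("!!AA_MULTIPLE_ALIGNMENT",
                                    "!!NA_MULTIPLE_ALIGNMENT"))
            or ("MSF" in first_line and first_line.rstrip().endswith(".."))):
        return "GCG-MSF"
    return "unknown"
```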


If 85% or more of the characters in the sequence are from A,C,G,T,U or N, the
sequence will be assumed to be nucleotide.  This works in 97.3% of cases
but watch out!
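
The 85% rule can be illustrated with a few lines of Python. This is a sketch of the rule as stated, not the program's own code; the function name `guess_sequence_type` is hypothetical, and counting only alphabetic characters (so that gaps and digits are skipped) is an assumption based on the input rules in this help section.

```python
def guess_sequence_type(sequence, threshold=0.85):
    """Classify a sequence as nucleotide or protein using the 85% rule."""
    # Assumption: only letters are counted; gaps, digits and spaces are skipped.
    letters = [c.upper() for c in sequence if c.isalpha()]
    if not letters:
        return "unknown"
    nucleotide_like = sum(c in "ACGTUN" for c in letters)
    if nucleotide_like / len(letters) >= threshold:
        return "nucleotide"
    return "protein"
```

A short peptide such as GATTACA consists only of letters from the nucleotide set, so it would be classified as nucleotide. This is the kind of case the warning refers to.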


>> HELP 2 <<             Help for multiple alignments

If you have already loaded sequences, use menu item 1 to do the complete
multiple alignment.  You will be prompted for 2 output files: 1 for the 
alignment itself; another to store a dendrogram that describes the similarity
of the sequences to each other.

Multiple alignments are carried out in 3 stages (automatically done from menu
item 1 ...Do complete multiple alignments now):

1) all sequences are compared to each other (pairwise alignments);

2) a dendrogram (like a phylogenetic tree) is constructed, describing the
approximate groupings of the sequences by similarity (stored in a file).

3) the final multiple alignment is carried out, using the dendrogram as a guide.


PAIRWISE ALIGNMENT parameters control the speed-sensitivity of the initial
alignments.

MULTIPLE ALIGNMENT parameters control the gaps in the final multiple alignments.


RESET GAPS (menu item 7) will remove any new gaps introduced into the sequences
during multiple alignment if you wish to change the parameters and try again.
This only takes effect just before you do a second multiple alignment.  You
can make phylogenetic trees after alignment whether or not this is ON.
If you turn this OFF, the new gaps are kept even if you do a second multiple
alignment. This allows you to iterate the alignment gradually.  Sometimes, the 
alignment is improved by a second or third pass.

SCREEN DISPLAY (menu item 8) can be used to send the output alignments to the 
screen as well as to the output file.

You can skip the first stages (pairwise alignments; dendrogram) by using an
old dendrogram file (menu item 3); or you can just produce the dendrogram
with no final multiple alignment (menu item 2).


OUTPUT FORMAT: Menu item 9 (format options) allows you to choose from 7 
different alignment formats (CLUSTAL, GCG, NBRF-PIR, PHYLIP, GDE, NEXUS, and FASTA).  



>> HELP 3 <<             Help for pairwise alignment parameters

A distance is calculated between every pair of sequences and these are used to
construct the dendrogram which guides the final multiple alignment. The scores
are calculated from separate pairwise alignments. These can be calculated using
2 methods: dynamic programming (slow but accurate) or by the method of Wilbur
and Lipman (extremely fast but approximate). 
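
The dynamic programming method mentioned above can be sketched as follows. This is a minimal Needleman-Wunsch global alignment score with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -2) and the function name `global_alignment_score` are assumptions for illustration and are not ClustalW's actual weight matrices or gap penalties. The Wilbur and Lipman method is not shown.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of sequences a and b."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align the two residues, gap in b, gap in a.
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

The table has len(a) x len(b) cells and the calculation is repeated for every pair of sequences, so the cost grows quickly with sequence length and sequence count.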

You can choose between the 2 alignment methods using menu option 8.  The
slow-accurate method is fine for short sequences but will be VERY SLOW for 
ma