CLUSTAL 2.0.12 Multiple Sequence Alignments

>> HELP NEW <<             NEW FEATURES/OPTIONS

==UPGMA==

The UPGMA algorithm has been added to allow faster tree construction. The user now has the choice of using Neighbour Joining (NJ) or UPGMA. The default is still NJ, but the user can change this by setting the clustering parameter.

  -CLUSTERING=   :NJ or UPGMA

==ITERATION==

A remove-first iteration scheme has been added. This can be used to improve the final alignment or to improve the alignment at each stage of the progressive alignment. During the iteration step each sequence is removed in turn and realigned. If the resulting alignment is better than the previous alignment it is kept. This process is repeated until the score converges (the score is not improved) or until the maximum number of iterations is reached.

The user can iterate at each step of the progressive alignment by setting the iteration parameter to TREE, or just on the final alignment by setting the iteration parameter to ALIGNMENT. The default is no iteration. The maximum number of iterations can be set using the numiter parameter. The default number of iterations is 3.

  -ITERATION=    :NONE or TREE or ALIGNMENT
  -NUMITER=n     :Maximum number of iterations to perform

==HELP==

  -FULLHELP      :Print out the complete help content

==MISC==

  -MAXSEQLEN=n   :Maximum allowed sequence length
  -QUIET         :Reduce console output to minimum
  -STATS=file    :Log some alignment statistics to file

>> HELP 1 <<             General help for CLUSTAL W (2.0.12)

Clustal W is a general purpose multiple alignment program for DNA or proteins.

SEQUENCE INPUT: all sequences must be in 1 file, one after another. 7 formats are automatically recognised: NBRF-PIR, EMBL-SWISSPROT, Pearson (Fasta), Clustal (*.aln), GCG-MSF (Pileup), GCG9-RSF and GDE flat file. All non-alphabetic characters (spaces, digits, punctuation marks) are ignored except "-" which is used to indicate a GAP ("." in MSF-RSF).
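
The character rule for SEQUENCE INPUT can be illustrated with a short Python sketch. This is not code from CLUSTAL W: the function name clean_residues and the msf flag are hypothetical, and rewriting the MSF-RSF gap "." as "-" is an assumption made here for convenience, not something the help text states.

```python
def clean_residues(raw: str, msf: bool = False) -> str:
    """Keep letters and the gap character; drop everything else.

    Hypothetical helper for illustration only, not a CLUSTAL W function.
    """
    # In MSF-RSF input the gap is written "."; in the other formats it is "-".
    gap = "." if msf else "-"
    kept = []
    for ch in raw:
        if ch.isascii() and ch.isalpha():
            kept.append(ch)       # residue letter
        elif ch == gap:
            kept.append("-")      # assumption: normalise every gap to "-"
        # spaces, digits and other punctuation are ignored
    return "".join(kept)


if __name__ == "__main__":
    print(clean_residues("ACG-T 10\nAC.GT"))
    print(clean_residues("AC.G-T 12", msf=True))
```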

To do a MULTIPLE ALIGNMENT on a set of sequences, use item 1 from this menu to INPUT them; go to menu item 2 to do the multiple alignment.

PROFILE ALIGNMENTS (menu item 3) are used to align 2 alignments. Use this to add a new sequence to an old alignment, or to use secondary structure to guide the alignment process. GAPS in the old alignments are indicated using the "-" character. PROFILES can be input in ANY of the allowed formats; just use "-" (or "." for MSF-RSF) for each gap position.

PHYLOGENETIC TREES (menu item 4) can be calculated from old alignments (read in with "-" characters to indicate gaps) OR after a multiple alignment while the alignment is still in memory.

The program tries to automatically recognise the different file formats used and to guess whether the sequences are amino acid or nucleotide. This is not always foolproof.

FASTA and NBRF-PIR formats are recognised by having a ">" as the first character in the file.

EMBL-SwissProt formats are recognised by the letters ID at the start of the file (the token for the entry name field).

CLUSTAL format is recognised by the word CLUSTAL at the beginning of the file.

GCG-MSF format is recognised by one of the following:
  - the word PileUp at the start of the file.
  - the word !!AA_MULTIPLE_ALIGNMENT or !!NA_MULTIPLE_ALIGNMENT at the start of the file.
  - the word MSF on the first line of the file, and the characters .. at the end of this line.

GCG-RSF format is recognised by the word !!RICH_SEQUENCE at the beginning of the file.

If 85% or more of the characters in the sequence are from A,C,G,T,U or N, the sequence will be assumed to be nucleotide. This works in 97.3% of cases but watch out!

>> HELP 2 <<             Help for multiple alignments

If you have already loaded sequences, use menu item 1 to do the complete multiple alignment. You will be prompted for 2 output files: 1 for the alignment itself; another to store a dendrogram that describes the similarity of the sequences to each other.
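
The same complete alignment can also be requested from the command line. The Python sketch below only builds an argument list. The -CLUSTERING, -ITERATION and -NUMITER options are those listed under HELP NEW; the executable name clustalw2 and the -INFILE, -ALIGN, -OUTFILE and -NEWTREE options are assumptions taken from the program's command-line help, which is not part of this text. The function name is hypothetical.

```python
from typing import List, Optional


def build_clustalw2_args(
    infile: str,
    clustering: str = "NJ",
    iteration: str = "NONE",
    numiter: int = 3,
    outfile: Optional[str] = None,
    newtree: Optional[str] = None,
    executable: str = "clustalw2",  # assumption: name of the installed binary
) -> List[str]:
    """Build an argument list for a complete multiple alignment (sketch)."""
    clustering = clustering.upper()
    iteration = iteration.upper()
    if clustering not in ("NJ", "UPGMA"):
        raise ValueError("clustering must be NJ or UPGMA")
    if iteration not in ("NONE", "TREE", "ALIGNMENT"):
        raise ValueError("iteration must be NONE, TREE or ALIGNMENT")
    if numiter < 1:
        raise ValueError("numiter must be at least 1")

    # -INFILE and -ALIGN are assumed options (not described in this help text).
    args = [
        executable,
        f"-INFILE={infile}",
        "-ALIGN",
        f"-CLUSTERING={clustering}",
        f"-ITERATION={iteration}",
    ]
    if iteration != "NONE":
        args.append(f"-NUMITER={numiter}")
    # The two output files: the alignment and the dendrogram (assumed options).
    if outfile:
        args.append(f"-OUTFILE={outfile}")
    if newtree:
        args.append(f"-NEWTREE={newtree}")
    return args


if __name__ == "__main__":
    print(build_clustalw2_args("seqs.fasta", clustering="upgma", iteration="tree"))
```

If the program is installed, the returned list could be passed to subprocess.run(args, check=True); the sketch stops short of running anything.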

Multiple alignments are carried out in 3 stages (automatically done from menu item 1 ...Do complete multiple alignments now):

  1) all sequences are compared to each other (pairwise alignments);
  2) a dendrogram (like a phylogenetic tree) is constructed, describing the approximate groupings of the sequences by similarity (stored in a file);
  3) the final multiple alignment is carried out, using the dendrogram as a guide.

PAIRWISE ALIGNMENT parameters control the speed/sensitivity of the initial alignments.

MULTIPLE ALIGNMENT parameters control the gaps in the final multiple alignments.

RESET GAPS (menu item 7) will remove any new gaps introduced into the sequences during multiple alignment if you wish to change the parameters and try again. This only takes effect just before you do a second multiple alignment. You can make phylogenetic trees after alignment whether or not this is ON. If you turn this OFF, the new gaps are kept even if you do a second multiple alignment. This allows you to iterate the alignment gradually. Sometimes, the alignment is improved by a second or third pass.

SCREEN DISPLAY (menu item 8) can be used to send the output alignments to the screen as well as to the output file.

You can skip the first stages (pairwise alignments; dendrogram) by using an old dendrogram file (menu item 3); or you can just produce the dendrogram with no final multiple alignment (menu item 2).

OUTPUT FORMAT: Menu item 9 (format options) allows you to choose from 7 different alignment formats (CLUSTAL, GCG, NBRF-PIR, PHYLIP, GDE, NEXUS, and FASTA).

>> HELP 3 <<             Help for pairwise alignment parameters

A distance is calculated between every pair of sequences and these are used to construct the dendrogram which guides the final multiple alignment. The scores are calculated from separate pairwise alignments. These can be calculated using 2 methods: dynamic programming (slow but accurate) or by the method of Wilbur and Lipman (extremely fast but approximate).
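
How pairwise distances turn into a dendrogram can be sketched with a textbook UPGMA clustering in Python. This illustrates the general technique only; it is not CLUSTAL W's implementation, it ignores branch lengths, and the function name, the input layout (a dict keyed by frozenset pairs) and the example distances are all assumptions made for the sketch.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster names into a nested-tuple tree using UPGMA (sketch).

    dist maps frozenset({a, b}) to the distance between sequences a and b.
    """
    if not names:
        raise ValueError("need at least one sequence name")
    # Active clusters: id -> (subtree, number of leaves in it)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    d = {}
    for i, j in combinations(range(len(names)), 2):
        d[(i, j)] = dist[frozenset((names[i], names[j]))]
    next_id = len(names)

    while len(clusters) > 1:
        # Pick the closest pair of clusters and merge them.
        (a, b), _ = min(d.items(), key=lambda kv: kv[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance from the merged cluster to every other cluster is the
        # average of the old distances, weighted by cluster size.
        merged = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            merged[c] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Forget distances that involve the two merged clusters.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        for c, value in merged.items():
            d[(c, next_id)] = value  # c is always smaller than next_id
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1

    tree, _size = next(iter(clusters.values()))
    return tree


if __name__ == "__main__":
    # Made-up distances for four hypothetical sequences.
    example = {
        frozenset(("A", "B")): 2.0, frozenset(("A", "C")): 4.0,
        frozenset(("B", "C")): 4.0, frozenset(("A", "D")): 8.0,
        frozenset(("B", "D")): 8.0, frozenset(("C", "D")): 8.0,
    }
    print(upgma(["A", "B", "C", "D"], example))
```

The closest pair is joined first and more distant sequences are attached later, which is the order a progressive alignment would follow when using such a tree as a guide.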
You can choose between the 2 alignment methods using menu option 8. The slow-accurate method is fine for short sequences but will be VERY SLOW for ma
A brief introduction to using clustalw2