PAML MANUAL
Amino Acid Sequences (seqtype = 2)
model specifies the model of amino acid substitution: 0 for the Poisson model, which assumes equal rates for all amino acid substitutions (Bishop and Friday, 1987); 1 for the
proportional model, in which the rate of change to an amino acid is proportional to the frequency of that amino acid. model = 2 specifies a class of empirical
models, for which the empirical amino acid substitution rate matrix is given in the file
specified by aaRatefile. Files included in the package are for the empirical
models of Dayhoff et al. (1978) (dayhoff.dat), Jones et al. (1992) (see
Kishino, Miyata, and Hasegawa 1990 for the construction), and Whelan and
Goldman (2001) (wag.dat). The file mtmam.dat has a matrix for
mitochondrial proteins estimated by maximum likelihood from a data set of 20
mammals. The mtREV24 model of the MOLPHY package (Adachi and
Hasegawa 1996b) is also provided (the file mtREV24.dat). The two matrices are
similar; the difference is that the former is derived from proteins from
mammals only, while the latter came from more diverse species including
chicken, fish, frog, and lamprey. Due to differences in the implementation, you
may see small differences in log-likelihood values and branch lengths between
aaml and protml in the MOLPHY package. Such differences are normal and you should use the same program to compare different trees. Under the
mtREV24 model, the two programs should give almost identical results.
If you want to specify your own substitution rate matrix, have a look at one of
those files, which has notes about the file structure. Other options for amino
acid substitution models should be ignored. To summarize, the variables
model, aaDist, CodonFreq, NSsites, and icode are used for codon
sequences (seqtype = 1), while model, alpha, and aaRatefile are
used for amino acid sequences.
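As a rough illustration of the Poisson (model = 0) and proportional (model = 1) models described above, the following Python sketch builds the two rate matrices. It is not taken from the codeml source: the function names are hypothetical, and scaling the matrix to a mean rate of one substitution per unit time is a common convention assumed here, not something stated in this manual.

```python
def _finish(q, pi):
    """Fill the diagonal so each row sums to zero, then scale to mean rate 1.

    The scaling is an assumed convention, not a documented codeml detail.
    """
    n = len(pi)
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    mean_rate = -sum(pi[i] * q[i][i] for i in range(n))
    return [[x / mean_rate for x in row] for row in q]


def poisson_q(n_states=20):
    """Poisson model: every off-diagonal rate is the same."""
    pi = [1.0 / n_states] * n_states
    q = [[0.0 if i == j else 1.0 for j in range(n_states)]
         for i in range(n_states)]
    return _finish(q, pi)


def proportional_q(freqs):
    """Proportional model: the rate of change to state j is proportional to pi[j]."""
    total = sum(freqs)
    pi = [f / total for f in freqs]
    n = len(pi)
    q = [[0.0 if i == j else pi[j] for j in range(n)] for i in range(n)]
    return _finish(q, pi)


q0 = poisson_q(20)
# Toy three-state frequencies, used only to keep the example readable.
q1 = proportional_q([0.5, 0.3, 0.2])
```

With equal frequencies the proportional model reduces to the Poisson model, which is a quick way to check a matrix built this way.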
runmode also works in the same way as in baseml.ctl. Specifying runmode = -2 forces the program to calculate the ML distances in pairwise comparisons.
You can change the following variables in the control file codeml.ctl:
aaRatefile, model, and alpha.
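The variables discussed in this section can be put together as in the following Python sketch, which writes out a fragment of a control file for amino acid sequences. The helper name make_aaml_ctl and the file names are hypothetical placeholders, and seqfile, treefile, and outfile are assumed to be spelled as in the standard codeml.ctl. The result is only a fragment; a full control file contains other variables not covered here.

```python
def make_aaml_ctl(seqfile, treefile, outfile, model=2,
                  aa_rate_file="wag.dat", alpha=0.5, runmode=0, cleandata=0):
    """Return control-file text for an amino acid analysis (seqtype = 2)."""
    settings = [
        ("seqfile", seqfile),
        ("treefile", treefile),
        ("outfile", outfile),
        ("runmode", runmode),      # -2 requests pairwise ML distances
        ("seqtype", 2),            # 2 = amino acid sequences
        ("model", model),          # 0 Poisson, 1 proportional, 2 empirical
    ]
    if model == 2:
        # The rate matrix file is only needed for the empirical models.
        settings.append(("aaRatefile", aa_rate_file))
    settings.append(("alpha", alpha))
    settings.append(("cleandata", cleandata))
    return "\n".join(f"{key:>12} = {value}" for key, value in settings) + "\n"


# Placeholder file names, for illustration only.
ctl_text = make_aaml_ctl("example.aa", "example.trees", "mlc")
print(ctl_text)
```

Passing runmode=-2 to the same helper gives a fragment for the pairwise comparison.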
If you do pairwise ML comparison (runmode = -2) and the data contain
ambiguity characters or alignment gaps, the program will remove from all
sequences every site that has such a character in any sequence before the pairwise comparison if
cleandata = 1. This is known as "complete deletion". If cleandata = 0, it will remove
alignment gaps and ambiguity characters separately in each pairwise comparison
("pairwise deletion"). {{This does not seem to be true.
The program currently removes all sites with any ambiguities if runmode = -2.
Need checking. Note by Ziheng 31/08/04.}} Note that in a likelihood analysis
of multiple sequences on a phylogeny, alignment gaps are treated as ambiguity
characters if cleandata = 0, and both alignment gaps and ambiguity
characters are deleted if cleandata = 1. Note that both removing alignment gaps
and treating them as ambiguity characters lead to underestimates of sequence
divergence. Ambiguity characters in the data (cleandata = 0) make the
likelihood calculation slower.
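The difference between "complete deletion" and "pairwise deletion" as defined above can be sketched in Python as follows. This illustrates the definitions only, not what the program actually does for runmode = -2. The function names are hypothetical, and the set of characters treated as gaps or ambiguities ("-", "?", "X") is an assumption made for the example.

```python
# Assumed gap/ambiguity characters for this example only.
MISSING = set("-?X")


def complete_deletion(seqs):
    """Drop every column in which any sequence has a gap or ambiguity."""
    keep = [i for i in range(len(seqs[0]))
            if all(s[i] not in MISSING for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


def pairwise_deletion(a, b):
    """Drop only the columns that are missing in this particular pair."""
    keep = [i for i in range(len(a))
            if a[i] not in MISSING and b[i] not in MISSING]
    return "".join(a[i] for i in keep), "".join(b[i] for i in keep)


# Toy alignment, for illustration only.
seqs = ["MKT-LV", "MRTALV", "MKTA?V"]
cleaned = complete_deletion(seqs)
pair = pairwise_deletion(seqs[1], seqs[2])
print(cleaned)
print(pair)
```

In this toy alignment, complete deletion leaves four sites for every comparison, while pairwise deletion leaves five sites for the second and third sequences, because the gap in the first sequence no longer affects that pair.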
Output for amino acid sequences (seqtype = 2): The output file is self-explanatory and very similar to the result files for the nucleotide- and codon-based analyses. The