Problem
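Say that we have $n$ taxa represented by strings $s_1, s_2, \ldots, s_n$ with a multiple alignment inducing corresponding augmented strings $\bar{s}_1, \bar{s}_2, \ldots, \bar{s}_n$.
Recall that the number of single-symbol substitutions required to transform one string into another is the Hamming distance between the strings (see “Counting Point Mutations”). Say that we have a rooted binary tree $T$ containing $\bar{s}_1, \bar{s}_2, \ldots, \bar{s}_n$ at its leaves and additional strings $\bar{s}_{n+1}, \bar{s}_{n+2}, \ldots, \bar{s}_{2n-1}$ at its internal nodes, including the root (the number of internal nodes is $n-1$ by extension of “Counting Phylogenetic Ancestors”). Define $d_H(T)$ as the sum of $d_H(\bar{s}_i, \bar{s}_j)$ over all edges $\{\bar{s}_i, \bar{s}_j\}$ in $T$:
$$d_H(T) = \sum_{\{\bar{s}_i, \bar{s}_j\} \in E(T)} d_H(\bar{s}_i, \bar{s}_j)$$
Thus, our aim is to minimize $d_H(T)$.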
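To make the quantity being minimized concrete, here is a minimal sketch that evaluates the sum of Hamming distances over the edges of a tree whose nodes are already labeled. The names `hamming` and `tree_cost`, the edge-list representation, and the three-node tree are illustrative assumptions; the sketch also assumes the gap symbol `-` is compared like any other symbol.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def tree_cost(edges, labels):
    """Sum of Hamming distances over all (parent, child) edges."""
    return sum(hamming(labels[p], labels[c]) for p, c in edges)


# Hypothetical three-node tree: a root with two leaf children.
labels = {"root": "AC", "left": "AC", "right": "CA"}
edges = [("root", "left"), ("root", "right")]
print(tree_cost(edges, labels))  # 0 + 2 = 2
```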
Given: A rooted binary tree $T$ on $n$ species, given in Newick format, followed by a multiple alignment of augmented DNA strings having the same length (at most 300 bp) corresponding to the species and given in FASTA format.
Return: The minimum possible value of $d_H(T)$, followed by a collection of DNA strings to be assigned to the internal nodes of $T$ that will minimize $d_H(T)$ (multiple solutions will exist, but you need only output one).
Sample Dataset
(((ostrich,cat)rat,(duck,fly)mouse)dog,(elephant,pikachu)hamster)robot;
>ostrich
AC
>cat
CA
>duck
T-
>fly
GC
>elephant
-T
>pikachu
AA
Sample Output
8
>rat
AC
>mouse
TC
>dog
AC
>hamster
AT
>robot
AC
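One possible approach, which the problem statement does not prescribe, is Sankoff-style dynamic programming for small parsimony: each alignment column is solved independently, with a bottom-up pass computing the cheapest cost of every subtree for each candidate symbol and a top-down pass picking symbols for the internal nodes. The sketch below rests on several assumptions: every node in the Newick string is named and names are unique, there are no branch lengths, every leaf has a string in the alignment, and only the symbols `ACGT-` occur. The names `parse_newick` and `small_parsimony` are hypothetical.

```python
import re

ALPHABET = "ACGT-"  # assumption: the gap counts as a fifth symbol


def parse_newick(text):
    """Return (root, children) for a Newick string whose nodes are all named."""
    tokens = re.findall(r"[(),;]|[^(),;\s]+", text)
    children = {}
    pos = 0

    def node():
        nonlocal pos
        kids = []
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            kids.append(node())
            while tokens[pos] == ",":
                pos += 1                  # consume ","
                kids.append(node())
            pos += 1                      # consume ")"
        name = tokens[pos]                # assumption: every node has a name
        pos += 1
        children[name] = kids             # inserted after its children
        return name

    return node(), children


def small_parsimony(newick, leaf_seqs):
    """Return (minimum total cost, {internal node: assigned string})."""
    root, children = parse_newick(newick)
    order = list(children)                # dict order is post-order here
    length = len(next(iter(leaf_seqs.values())))
    inf = float("inf")
    assigned = {v: [] for v in order if children[v]}
    total = 0
    for col in range(length):
        cost = {}
        for v in order:                   # bottom-up: children before parents
            if not children[v]:
                cost[v] = {a: 0 if leaf_seqs[v][col] == a else inf
                           for a in ALPHABET}
            else:
                cost[v] = {a: sum(min(cost[c][b] + (a != b) for b in ALPHABET)
                                  for c in children[v])
                           for a in ALPHABET}
        choice = {root: min(ALPHABET, key=cost[root].get)}
        total += cost[root][choice[root]]
        for v in reversed(order):         # top-down: parents before children
            for c in children[v]:
                choice[c] = min(ALPHABET,
                                key=lambda b: cost[c][b] + (b != choice[v]))
            if children[v]:
                assigned[v].append(choice[v])
    return total, {v: "".join(s) for v, s in assigned.items()}


newick = "(((ostrich,cat)rat,(duck,fly)mouse)dog,(elephant,pikachu)hamster)robot;"
leaves = {"ostrich": "AC", "cat": "CA", "duck": "T-",
          "fly": "GC", "elephant": "-T", "pikachu": "AA"}
total, internal = small_parsimony(newick, leaves)
print(total)  # 8 for the sample dataset
```

Because ties are broken by the order of `ALPHABET`, the strings in `internal` may differ from those in the Sample Output while still reaching the same minimum, which the problem explicitly allows.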