Problem
For a rooted tree $T$ whose internal nodes are labeled with genetic strings, our goal is to identify reversing substitutions in $T$. Assuming that all the strings of $T$ have the same length, a reversing substitution is defined formally as two parent-child string pairs $(s, t)$ and $(v, w)$ along with a position index $i$, where:
- there is a path in $T$ from $s$ down to $w$;
- $s[i] = w[i] \neq v[i] = t[i]$; and
- if $u$ is on the path connecting $t$ to $v$, then $t[i] = u[i]$.
In other words, the third condition demands that a reversing substitution must be contiguous: no other substitutions can appear between the initial and reversing substitution.
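The contiguity condition suggests a simple search, sketched below in Python under stated assumptions. For every parent-child edge where the symbol at some position changes, walk down only through descendants that keep the changed symbol, and report the first child whose symbol returns to the original one. The function name `reversing_substitutions` and the `children`/`seqs` dictionaries are hypothetical names chosen for this illustration; they are not part of the problem statement.

```python
def reversing_substitutions(children, seqs, root):
    """Return tuples (first_changed, reverted, position, original, substituted, reverted_symbol).

    children: dict mapping a node label to the list of its child labels (assumed layout).
    seqs:     dict mapping a node label to its string (all strings the same length).
    Positions are reported 1-based, matching the sample output.
    """
    results = []
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in children.get(parent, []):
            stack.append(child)
            for i, (a, b) in enumerate(zip(seqs[parent], seqs[child])):
                if a == b:
                    continue  # no substitution on this edge at position i
                # Initial substitution a -> b found. Follow only descendants
                # that still carry b, so no other substitution intervenes.
                frontier = [child]
                while frontier:
                    v = frontier.pop()
                    for w in children.get(v, []):
                        c = seqs[w][i]
                        if c == b:
                            frontier.append(w)  # unchanged, keep descending
                        elif c == a:
                            results.append((child, w, i + 1, a, b, c))
                        # any other symbol ends this branch without a reversal
    return results


# Usage with the sample tree, written out by hand as dictionaries.
children = {"robot": ["dog", "elephant"], "dog": ["rat", "mouse"], "rat": ["ostrich", "cat"]}
seqs = {"robot": "AATTG", "dog": "GGGCA", "mouse": "AAGAC", "rat": "GTTGT",
        "cat": "GAGGC", "ostrich": "GTGTC", "elephant": "AATTC"}
found = reversing_substitutions(children, seqs, "robot")
```

With at most 100 strings of at most 400 symbols, this brute-force walk is small enough that no further optimization should be needed.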
Given: A rooted binary tree $T$ with labeled nodes in Newick format, followed by a collection of at most 100 DNA strings in FASTA format whose labels correspond to the labels of $T$. We will assume that the DNA strings have the same length, which does not exceed 400 bp.
Return: A list of all reversing substitutions in $T$ (in any order), with each substitution encoded by the following three items:
- the name of the species in which the symbol is first changed, followed by the name of the species in which it changes back to its original state;
- the position in the string at which the reversing substitution occurs (counted from 1, as in the sample output); and
- the reversing substitution in the form original_symbol->substituted_symbol->reverted_symbol.
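As a small illustration of this encoding, the sketch below joins the three items into one space-separated line, which is how the sample output appears to be laid out. The helper name `format_substitution` and its parameter names are hypothetical.

```python
def format_substitution(first_changed, reverted, position, original, substituted, reverted_symbol):
    """Build one output line: two species names, the position, then the symbol chain."""
    return f"{first_changed} {reverted} {position} {original}->{substituted}->{reverted_symbol}"


line = format_substitution("dog", "mouse", 1, "A", "G", "A")
```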
Sample Dataset
(((ostrich,cat)rat,mouse)dog,elephant)robot;
>robot
AATTG
>dog
GGGCA
>mouse
AAGAC
>rat
GTTGT
>cat
GAGGC
>ostrich
GTGTC
>elephant
AATTC
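A dataset in this layout could be read with a small hand-written parser, sketched here in Python. It assumes the first non-empty line is the Newick tree, that every node is labeled, and that the tree has no branch lengths or embedded whitespace; real Newick files can be more general than this. The names `parse_newick` and `parse_dataset` are hypothetical.

```python
def parse_newick(text):
    """Return (root_label, children) for a fully labeled Newick string.

    Assumes no branch lengths, quotes, or whitespace inside the tree.
    children maps each internal node label to the list of its child labels.
    """
    text = text.strip().rstrip(";")
    children = {}
    pos = 0

    def parse_node():
        nonlocal pos
        kids = []
        if text[pos] == "(":
            pos += 1  # skip '('
            while True:
                kids.append(parse_node())
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                else:
                    break
            pos += 1  # skip ')'
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        label = text[start:pos]
        if kids:
            children[label] = kids
        return label

    root = parse_node()
    return root, children


def parse_dataset(text):
    """Split a dataset into the tree (first line) and the FASTA records (rest)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    root, children = parse_newick(lines[0])
    seqs, label = {}, None
    for line in lines[1:]:
        if line.startswith(">"):
            label = line[1:]
            seqs[label] = ""
        else:
            seqs[label] += line  # sequences may wrap over several lines
    return root, children, seqs


sample = """(((ostrich,cat)rat,mouse)dog,elephant)robot;
>robot
AATTG
>dog
GGGCA
>mouse
AAGAC
>rat
GTTGT
>cat
GAGGC
>ostrich
GTGTC
>elephant
AATTC
"""
root, children, seqs = parse_dataset(sample)
```

Storing the tree as a parent-to-children dictionary keeps the later traversal simple, since each parent-child pair can be enumerated directly.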
Sample Output
dog mouse 1 A->G->A
dog mouse 2 A->G->A
rat ostrich 3 G->T->G
rat cat 3 G->T->G
dog rat 3 T->G->T