Rosalind Problem 57: Creating a Character Table

Problem

Given a collection of $n$ taxa, any subset $S$ of these taxa can be seen as encoding a character that divides the taxa into the sets $S$ and $S^c$; we can represent the character by $S \mid S^c$, which is called a split. Alternatively, the character can be represented by a character array $A$ of length $n$ for which $A[j] = 1$ if the $j$th taxon belongs to $S$ and $A[j] = 0$ if the $j$th taxon belongs to $S^c$ (recall the "ON"/"OFF" analogy from "Counting Subsets").
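As a small illustration of the array notation, the sketch below turns a subset of taxa into its 0/1 character array. The function name `character_array` and the choice to return a string are assumptions made for this example; the lexicographic ordering of taxa follows the output convention stated under "Return".

```python
def character_array(taxa, subset):
    """Return the 0/1 string for `subset`, with taxa in lexicographic order."""
    ordered = sorted(taxa)
    # "1" marks membership in S, "0" marks membership in the complement.
    return "".join("1" if t in subset else "0" for t in ordered)


taxa = ["dog", "elephant", "mouse", "robot", "cat"]
print(character_array(taxa, {"elephant", "mouse"}))  # 00110
```

Swapping the subset for its complement flips every bit, which encodes the same split.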

At the same time, observe that the removal of an edge from an unrooted binary tree produces two separate trees, each one containing a subset of the original taxa. So each edge may also be encoded by a split $S \mid S^c$.

A trivial character isolates a single taxon into a group of its own. The corresponding split $S \mid S^c$ must be such that $S$ or $S^c$ contains only one element; the edge encoded by this split must be incident to a leaf of the unrooted binary tree, and the array for the character contains exactly one 0 or exactly one 1. Trivial characters are of no phylogenetic interest because they fail to provide us with information regarding the relationships of taxa to each other. All other characters are called nontrivial characters (and the associated splits are called nontrivial splits).
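The triviality test can be sketched as a count of the 1s in a character array. The name `is_trivial` is hypothetical, and rejecting an all-0 or all-1 array (an empty side, which is not a real split) is an extra assumption on top of the definition above.

```python
def is_trivial(array):
    """True when one side of the split holds at most one taxon."""
    ones = array.count("1")
    zeros = len(array) - ones
    # Exactly one 1 or exactly one 0 is trivial; an empty side is
    # also rejected here, which is an assumption of this sketch.
    return min(ones, zeros) <= 1


print(is_trivial("00001"))  # True
print(is_trivial("00110"))  # False
```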

A character table is a matrix $C$ in which each row represents the array notation for a nontrivial character. That is, entry $C_{i,j}$ denotes the "ON"/"OFF" position of the $i$th character with respect to the $j$th taxon.

Given: An unrooted binary tree $T$ in Newick format for at most 200 species taxa.

Return: A character table having the same splits as the edge splits of $T$. The columns of the character table should encode the taxa ordered lexicographically; the rows of the character table may be given in any order. Also, for any given character, the particular subset of taxa to which 1s are assigned is arbitrary.


Sample Dataset

(dog,((elephant,mouse),robot),cat);

Sample Output

00110
00111
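
One possible way to solve the problem, offered as a minimal sketch rather than the reference solution: read the Newick string with a stack, record the set of leaves below every closing parenthesis, and keep the sets that form nontrivial splits. The function name `character_table`, the hand-written tokenizer, and the rule of flipping rows so that each begins with 0 (used to merge complementary duplicates) are all assumptions of this example. It also assumes that taxon names contain no whitespace or parentheses, and that any text directly after `)` is an internal label or branch length to be ignored.

```python
import re


def character_table(newick):
    """Return the nontrivial edge splits of a Newick tree as sorted 0/1 strings."""
    text = re.sub(r"\s+", "", newick)
    tokens = re.findall(r"[(),;]|[^(),;]+", text)
    stack = [set()]  # stack[-1] collects the leaves of the clade being read
    clades = []      # leaf set below each internal node
    prev = None
    for tok in tokens:
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            done = stack.pop()
            clades.append(done)
            stack[-1] |= done
        elif tok not in (",", ";"):
            name = tok.split(":")[0]  # drop a branch length, if any
            # Text right after ")" labels an internal node, not a leaf.
            if name and prev != ")":
                stack[-1].add(name)
        prev = tok

    taxa = sorted(stack[0])
    n = len(taxa)
    flip = str.maketrans("01", "10")
    rows = set()
    for clade in clades:
        # Keep only splits with at least two taxa on each side.
        if 1 < len(clade) < n - 1:
            row = "".join("1" if t in clade else "0" for t in taxa)
            # A row and its complement encode the same split, so
            # normalize to a leading 0 before deduplicating.
            if row[0] == "1":
                row = row.translate(flip)
            rows.add(row)
    return sorted(rows)


print("\n".join(character_table("(dog,((elephant,mouse),robot),cat);")))
# 00110
# 00111
```

On the sample dataset the taxa sort as cat, dog, elephant, mouse, robot, so the clade {elephant, mouse} gives 00110 and the clade {elephant, mouse, robot} gives 00111, matching the sample output. Because the problem allows either side of a split to carry the 1s, the leading-0 normalization does not change which splits are reported.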