Rosalind Problem 57: Creating a Character Table

Latest recommended article published 2021-01-18 11:13:46

automan_huyaoge


Categories: Control Science and Engineering, python

Original link: http://rosalind.info/problems/ctbl/


Problem

Given a collection of $n$ taxa, any subset $S$ of these taxa can be seen as encoding a character that divides the taxa into the sets $S$ and $S^c$; we can represent the character by $S \mid S^c$, which is called a split. Alternately, the character can be represented by a character array $A$ of length $n$ for which $A[j] = 1$ if the $j$th taxon belongs to $S$ and $A[j] = 0$ if the $j$th taxon belongs to $S^c$ (recall the "ON"/"OFF" analogy from “Counting Subsets”).
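
As a small illustration of the array notation, the sketch below turns a subset of taxa into its 0/1 character array. The helper name `character_array` is hypothetical (it is not part of the problem statement), and the taxa are ordered lexicographically because that is the column order the Return section asks for.

```python
def character_array(taxa, subset):
    """Return the 0/1 character array of `subset` as a string.

    Assumption: columns follow the lexicographic order of the taxon names.
    """
    members = set(subset)
    return "".join("1" if taxon in members else "0" for taxon in sorted(taxa))


taxa = ["dog", "elephant", "mouse", "robot", "cat"]
print(character_array(taxa, {"elephant", "mouse"}))    # 00110
print(character_array(taxa, {"cat", "dog", "robot"}))  # 11001, the same split with 0s and 1s swapped
```

Both printed arrays describe the same split, which is why the choice of which side receives the 1s does not matter.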

At the same time, observe that the removal of an edge from an unrooted binary tree produces two separate trees, each one containing a subset of the original taxa. So each edge may also be encoded by a split $S \mid S^c$.

A trivial character isolates a single taxon into a group of its own. The corresponding split $S \mid S^c$ must be such that $S$ or $S^c$ contains only one element; the edge encoded by this split must be incident to a leaf of the unrooted binary tree, and the array for the character contains exactly one 0 or exactly one 1. Trivial characters are of no phylogenetic interest because they fail to provide us with information regarding the relationships of taxa to each other. All other characters are called nontrivial characters (and the associated splits are called nontrivial splits).
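
The trivial/nontrivial distinction can be checked directly on the array form. The following is a minimal sketch; `is_nontrivial` is a hypothetical helper name, and it additionally rejects all-0 and all-1 arrays (an assumption, since those do not divide the taxa at all).

```python
def is_nontrivial(array):
    """True when both sides of the split hold at least two taxa."""
    ones = array.count("1")
    zeros = len(array) - ones
    return ones >= 2 and zeros >= 2


print(is_nontrivial("00110"))  # True: two taxa on one side, three on the other
print(is_nontrivial("00001"))  # False: exactly one 1, so a single taxon is isolated
print(is_nontrivial("11110"))  # False: exactly one 0
```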

A character table is a matrix $C$ in which each row represents the array notation for a nontrivial character. That is, entry $C_{i,j}$ denotes the "ON"/"OFF" position of the $i$th character with respect to the $j$th taxon.

Given: An unrooted binary tree $T$ in Newick format for at most 200 species taxa.

Return: A character table having the same splits as the edge splits of $T$. The columns of the character table should encode the taxa ordered lexicographically; the rows of the character table may be given in any order. Also, for any given character, the particular subset of taxa to which 1s are assigned is arbitrary.


Sample Dataset

(dog,((elephant,mouse),robot),cat);

Sample Output

00110
00111
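
One possible way to produce this output is sketched below; it is not necessarily the approach the original post used. The idea is that every parenthesised group in the Newick string hangs below one edge, so the leaves inside the group form one side of that edge's split. The function names `clades` and `character_table` are hypothetical, and the parser assumes the input is a single parenthesised tree whose leaves are named (labels written directly after a closing parenthesis are treated as internal-node labels and skipped, and anything after a `:` is treated as a branch length).

```python
import re


def clades(newick):
    """Return the set of leaf names inside every parenthesised group."""
    tokens = re.findall(r"[(),;]|[^(),;\s]+", newick)
    stack, found = [], []
    prev = None
    for tok in tokens:
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            done = stack.pop()
            found.append(done)
            if stack:
                stack[-1] |= done  # leaves also belong to the enclosing group
        elif tok not in (",", ";"):
            # A label directly after ")" names an internal node, not a taxon.
            if prev != ")":
                stack[-1].add(tok.split(":")[0])  # drop any branch length
        prev = tok
    return found


def character_table(newick):
    """Return one 0/1 row per nontrivial edge split, columns in sorted taxon order."""
    groups = clades(newick)
    taxa = sorted(max(groups, key=len))  # the outermost group holds every taxon
    flip = str.maketrans("01", "10")
    rows, seen = [], set()
    for group in groups:
        if 2 <= len(group) <= len(taxa) - 2:  # skip trivial splits
            row = "".join("1" if t in group else "0" for t in taxa)
            # A rooted-style Newick can list the same edge twice (as complements).
            if row not in seen and row.translate(flip) not in seen:
                seen.add(row)
                rows.append(row)
    return rows


for row in character_table("(dog,((elephant,mouse),robot),cat);"):
    print(row)
# 00110
# 00111
```

The two rows correspond to the groups `(elephant,mouse)` and `((elephant,mouse),robot)`. The outermost group and the single leaves give only trivial or empty splits and are filtered out by the size check.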
