TOJ-1346 Phylogenetic Trees Inherited

Among other things, Computational Molecular Biology deals with processing genetic sequences. Considering the evolutionary relationship of two sequences, we can say that they are closely related if they do not differ very much. We might represent the relationship by a tree, putting sequences from ancestors above sequences from their descendants. Such trees are called phylogenetic trees.
Whereas one task of phylogenetics is to infer a tree from given sequences, we simplify things a bit and provide the tree structure: it is a complete binary tree. You are given the n leaves of the tree; as you can guess, n is always a power of 2. Each leaf is a sequence of amino acids, written with the one-letter codes listed in the table below. All sequences have the same length l. Your task is to derive the sequence of a common ancestor with minimal cost.

Amino Acid      3-letter  1-letter
Alanine         Ala       A
Arginine        Arg       R
Asparagine      Asn       N
Aspartic Acid   Asp       D
Cysteine        Cys       C
Glutamine       Gln       Q
Glutamic Acid   Glu       E
Glycine         Gly       G
Histidine       His       H
Isoleucine      Ile       I
Leucine         Leu       L
Lysine          Lys       K
Methionine      Met       M
Phenylalanine   Phe       F
Proline         Pro       P
Serine          Ser       S
Threonine       Thr       T
Tryptophan      Trp       W
Tyrosine        Tyr       Y
Valine          Val       V

The costs are determined as follows: every inner node of the tree is labeled with a sequence of length l; the cost of an edge is the number of positions at which the two sequences at its ends differ; the total cost is the sum of the costs of all edges. The sequence of a common ancestor of all sequences is then found at the root of the tree. An optimal common ancestor is a common ancestor with minimal total cost.
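To make the cost definition concrete, here is a minimal C++ sketch that evaluates the total cost of a tree in which every node already carries a sequence. It assumes a heap-style array layout that is not part of the problem statement: `tree[1]` is the root and node `i` has children `2*i` and `2*i+1`, so the n leaves occupy `tree[n..2n-1]`. The names `edgeCost` and `totalCost` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Cost of one edge: the number of positions where the two sequences differ.
// Both sequences are assumed to have the same length l.
int edgeCost(const std::string& a, const std::string& b) {
    int diff = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i]) ++diff;
    return diff;
}

// Total cost of a fully labeled complete binary tree stored heap-style:
// tree[0] is unused, tree[1] is the root, node i has children 2i and 2i+1.
// Every non-root node i contributes the cost of the edge to its parent i/2.
int totalCost(const std::vector<std::string>& tree) {
    int total = 0;
    for (std::size_t i = 2; i < tree.size(); ++i)
        total += edgeCost(tree[i], tree[i / 2]);
    return total;
}
```

For sample 1, labeling the two inner nodes with AAA and AGA and the root with AGA, i.e. `{"", "AGA", "AAA", "AGA", "AAG", "AAA", "GGA", "AGA"}`, gives a total cost of 3.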

Input Specification

The input file contains several test cases. Each test case starts with two integers n and l, denoting the number of sequences at the leaves and their length, respectively. Input is terminated by n=l=0. Otherwise, 1≤n≤1024 and 1≤l≤1000. Then follow n words of length l over the amino acid alphabet. They represent the leaves of a complete binary tree, from left to right.
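A minimal sketch of reading this input format, assuming the whole input is available through an `std::istream` (for example `std::cin`, or a `std::istringstream` when testing). `TestCase` and `readCases` are hypothetical names; the solution further below instead reads the leaves directly inside its recursion.

```cpp
#include <istream>
#include <sstream>  // only so a std::istringstream can be passed in, e.g. for tests
#include <string>
#include <vector>

// One test case: n leaf sequences of length l, in left-to-right order.
struct TestCase {
    int n = 0, l = 0;
    std::vector<std::string> leaves;
};

// Read test cases until the terminating line "0 0" (or end of input).
std::vector<TestCase> readCases(std::istream& in) {
    std::vector<TestCase> cases;
    TestCase tc;
    while (in >> tc.n >> tc.l && !(tc.n == 0 && tc.l == 0)) {
        tc.leaves.assign(tc.n, "");
        for (auto& leaf : tc.leaves) in >> leaf;  // words are whitespace-separated
        cases.push_back(tc);
    }
    return cases;
}
```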

Output Specification

For each test case, output a line containing some optimal common ancestor and the minimal total costs.

Sample Input

4 3
AAG
AAA
GGA
AGA

4 3
AAG
AGA
AAA
GGA

4 3
AAG
GGA
AAA
AGA

4 1
A
R
A
R

2 1
W
W

2 1
W
Y

1 1
Q

0 0

Sample Output

AGA 3
AGA 4
AGA 4
R 2
W 0
Y 1
Q 0

Note: Special judge problem, you may get "Wrong Answer" when output in wrong format.



Source: University of Ulm Local Contest 2000

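Summary: all leaves of a complete binary tree are given. At each position, the parent of a pair of siblings can always take the character of one of the two siblings. Find a root sequence that minimizes the sum, over all edges of the tree, of the number of positions at which the two endpoint sequences differ, and output that minimal sum.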

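This problem can of course be solved with DP, but there is a very neat alternative (it is essentially Fitch's small-parsimony algorithm).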
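The post does not show the DP. One natural formulation, given here only as an assumption-level sketch, is a Sankoff-style small-parsimony DP run independently for each position (positions do not interact, because the total cost is a sum of per-position mismatches). `solveDP` is a hypothetical name, and the tree uses a heap-style layout (`best[1]` is the root, node `v` has children `2v` and `2v+1`).

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <utility>
#include <vector>

// best[v][c] = minimal cost of the subtree rooted at v if v gets letter 'A'+c.
// Returns {one optimal root sequence, minimal total cost}.
// Assumes leaves.size() is a power of 2 and all leaves have equal length.
std::pair<std::string, int> solveDP(const std::vector<std::string>& leaves) {
    const int n = static_cast<int>(leaves.size());
    const int l = static_cast<int>(leaves[0].size());
    const int INF = 1 << 20;  // larger than any per-position cost (at most 2n-2 edges)
    std::vector<std::array<int, 26>> best(2 * n);  // index 0 unused
    std::string root(l, 'A');
    int total = 0;
    for (int pos = 0; pos < l; ++pos) {
        // Leaves sit at n..2n-1: cost 0 for their own letter, INF for any other.
        for (int i = 0; i < n; ++i) {
            best[n + i].fill(INF);
            best[n + i][leaves[i][pos] - 'A'] = 0;
        }
        // Inner nodes bottom-up. For a child w, min over d of best[w][d] + (c != d)
        // equals min(best[w][c], min_d best[w][d] + 1), so each node costs O(26).
        for (int v = n - 1; v >= 1; --v) {
            const auto& lc = best[2 * v];
            const auto& rc = best[2 * v + 1];
            const int minL = *std::min_element(lc.begin(), lc.end());
            const int minR = *std::min_element(rc.begin(), rc.end());
            for (int c = 0; c < 26; ++c)
                best[v][c] = std::min(lc[c], minL + 1) + std::min(rc[c], minR + 1);
        }
        // Any letter minimizing the root's cost is optimal at this position.
        const auto& b = best[1];
        const int c = static_cast<int>(std::min_element(b.begin(), b.end()) - b.begin());
        root[pos] = static_cast<char>('A' + c);
        total += b[c];
    }
    return {root, total};
}
```

This is O(n·l·26) and stays within the limits, but it needs an array of 26 costs per node, which is exactly what the bitmask trick avoids.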

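There are 26 letters A-Z, and an int has 31 usable bits once the sign bit is excluded, so each bit can stand for one letter: bit j is 1 if the letter 'A'+j is still a candidate and 0 if it is not.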

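This technique is state compression: the set of candidate choices is packed into a single integer. That avoids the memory overhead of a multidimensional array, and because set intersection and union become the single bitwise operations & and |, it also saves time.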
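A small sketch of this encoding, assuming uppercase letters only. `charMask` plays the same role as `Hash` in the code below; `charMask` and `candidates` are hypothetical helper names.

```cpp
#include <string>

// Bit (c - 'A') is set  <=>  letter c is a candidate; 26 letters fit in an int.
int charMask(char c) { return 1 << (c - 'A'); }

// Decode a mask back into its candidate letters, in alphabetical order.
std::string candidates(int mask) {
    std::string out;
    for (int j = 0; j < 26; ++j)
        if (mask & (1 << j)) out += static_cast<char>('A' + j);
    return out;
}
```

With these, `charMask('A') | charMask('G')` is the set {A, G} (value 65, i.e. bits 0 and 6), and `charMask('W') & charMask('Y')` is 0 because the two sets are disjoint.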

The following code is taken from: http://www.aiuxian.com/article/p-1651570.html

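#include<iostream>
#include<string>
#include<vector>
using namespace std;

int n, l;
int cost;   // total cost, accumulated over all edges of the tree

// Map a letter 'A'..'Z' to a one-bit mask (bit 0 = 'A', bit 25 = 'Z').
int Hash(char s){
    return 1<<(s-'A');
}

// Build the subtree with `dep` leaves, reading the leaves from input left to right.
// Returns, for every position, the bitmask of candidate letters for the subtree's root.
vector<int> dfs(int dep){
    vector<int> ret, tem1, tem2;
    if(dep == 1){               // a leaf: read its sequence
        string s;
        cin >> s;
        for(int i = 0;i < l;i++){
            ret.push_back(Hash(s[i]));
        }
        return ret;
    }
    tem1 = dfs(dep/2);          // left subtree
    tem2 = dfs(dep/2);          // right subtree
    for(int i = 0;i < l;i++){
        int choose = tem1[i]&tem2[i];   // candidates shared by both children
        if(choose==0){
            cost++;                         // no shared letter: this position adds 1
            ret.push_back(tem1[i]|tem2[i]); // candidates of either child remain optimal
        }
        else{
            ret.push_back(choose);          // keep only the shared candidates
        }
    }
    return ret;
}

int main(){
    while(cin >> n >> l){
        if(n == 0 && l == 0)
            break;
        cost = 0;
        vector<int> ans = dfs(n);
        // Print one candidate per position: here, the lowest set bit.
        for(int i = 0;i < l;i++){
            for(int j = 0;j < 26;j++){
                if(ans[i]&1){
                    cout << (char)('A'+j);
                    break;
                }
                ans[i] /= 2;
            }
        }
        cout << ' ' << cost << endl;
    }
    return 0;
}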

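For a given pair of sibling nodes, the total number of mismatches between the two siblings and their parent is fixed: whichever sibling the parent copies, it differs only from the other one, and that difference is the same either way. At each position this is 0 if the siblings can agree on a letter and 1 otherwise.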

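Therefore, for each pair of siblings we compare position by position, which is the & operation in the code. If the result of & is nonzero, the two children share at least one candidate letter, and the parent takes this intersection as its state at that position. If the result is 0, they share no candidate; the total cost increases by 1, and | packs all candidates of both children into one int.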
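The per-position merge step just described can be isolated as a tiny function. `mergeSiblings` is a hypothetical name; the logic is the same as the `&` / `|` branch inside `dfs` in the code above.

```cpp
#include <utility>

// Merge the candidate masks of two sibling subtrees at one position.
// Returns {parent's candidate mask, cost added at this step (0 or 1)}.
// Nonempty intersection: the parent takes a shared letter, no mismatch.
// Empty intersection: one mismatch is unavoidable, and the candidates of
// either child remain optimal for the parent, hence the union.
std::pair<int, int> mergeSiblings(int left, int right) {
    const int common = left & right;
    if (common != 0) return {common, 0};
    return {left | right, 1};
}
```

For example, merging {A} (mask 1) with {G} (mask 64) yields {A, G} (mask 65) at cost 1, while merging {A} with {A, G} yields {A} at cost 0.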

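To make this clearer, take sample 2 as an example. A ? marks a position where more than one candidate letter remains.

Leaf nodes (7 bits shown, bit 6 = G on the left, bit 0 = A on the right; all other bits are 0 and omitted):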

  AAG 0000001 0000001 1000000  

  AGA 0000001 1000000 0000001  

  AAA 0000001 0000001 0000001  

  GGA 1000000 1000000 0000001

Next level up (parents of the two leaf pairs):

  A?? 0000001 1000001 1000001 cost:2

  ??A 1000001 1000001 0000001 cost:2

Root node:

  A?A 0000001 1000001 0000001 cost:0

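The total cost is therefore 2 + 2 + 0 = 4. Since the code prints the lowest set bit at each position, it outputs AAA 4 rather than the sample's AGA 4; both are optimal, and this is a special-judge problem that accepts any optimal common ancestor.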
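To reproduce such a walk-through mechanically, here is a sketch of a non-recursive variant that merges the tree level by level instead of recursing as `dfs` does. `FitchResult` and `fitchBottomUp` are hypothetical names, and the sketch assumes the leaves have already been read into memory.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct FitchResult {
    std::vector<int> rootMasks;  // candidate mask per position at the root
    int cost = 0;                // minimal total cost
};

// Level-by-level version of the same bitmask idea.
// Assumes leaves.size() is a power of 2 and all leaves have equal length.
FitchResult fitchBottomUp(const std::vector<std::string>& leaves) {
    FitchResult res;
    std::vector<std::vector<int>> level;
    for (const auto& s : leaves) {               // encode each leaf as masks
        std::vector<int> masks;
        for (char c : s) masks.push_back(1 << (c - 'A'));
        level.push_back(masks);
    }
    while (level.size() > 1) {                   // merge adjacent siblings
        std::vector<std::vector<int>> up;
        for (std::size_t k = 0; k + 1 < level.size(); k += 2) {
            std::vector<int> parent(level[k].size());
            for (std::size_t i = 0; i < parent.size(); ++i) {
                const int common = level[k][i] & level[k + 1][i];
                if (common != 0) {
                    parent[i] = common;          // shared candidates, no cost
                } else {
                    parent[i] = level[k][i] | level[k + 1][i];
                    ++res.cost;                  // one unavoidable mismatch
                }
            }
            up.push_back(parent);
        }
        level.swap(up);
    }
    res.rootMasks = level[0];
    return res;
}
```

On sample 2 it yields root masks 1, 65, 1 (that is, A, {A, G}, A) and a total cost of 4.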

Reposted from: https://www.cnblogs.com/shenchuguimo/p/6383488.html
