An Introduction to Bioinformatics Algorithms - II - pages 79-114

Finally stepping into the world of algorithms... :)

07/14 

The Restriction Mapping problem is introduced as an example to illustrate exhaustive search. Restriction mapping: to locate the restriction enzyme sites along a sequence, biologists use the enzyme to partially digest the sequence, which yields several shorter fragments. From the lengths of those fragments they infer the positions of the restriction sites (the Partial Digest Problem, PDP, also called the Turnpike problem). Note that the restriction map inferred from the length information alone is not necessarily unique.

Impractical Restriction Mapping Algorithms: 1. BruteForcePDP: given the multiset L of fragment lengths, find a set X of n integers whose pairwise differences ∆X equal L. Take the largest length M in L as the largest element of X (with 0 as the smallest), then scan every choice of the remaining n-2 integers between 0 and M to see whether ∆X equals L. This solution takes O(M**(n-2)) time. 2. A wiser solution does not scan every integer between 0 and M but only chooses integers from L, because every element of X is itself a distance from 0 and so must occur in L. This takes O(n**(2n-4)) time.
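The second brute-force idea can be sketched in a few lines of Python. This is only a minimal sketch under my own assumptions: the function names are mine, the points are assumed to be distinct integers, and the leftmost point is fixed at 0.

```python
from collections import Counter
from itertools import combinations


def delta(points):
    """Multiset of pairwise distances between the given points."""
    return Counter(b - a for a, b in combinations(sorted(points), 2))


def another_brute_force_pdp(L):
    """Return every X = (0, ..., M) whose pairwise distances equal L."""
    target = Counter(L)
    M = max(L)
    # n points produce n*(n-1)/2 distances, so recover n from len(L).
    n = 1
    while n * (n - 1) // 2 < len(L):
        n += 1
    if n * (n - 1) // 2 != len(L):
        return []  # L cannot be the distance multiset of any point set
    # Only values that occur in L can be points (each is a distance from 0).
    candidates = sorted(set(L) - {M})
    solutions = []
    for inner in combinations(candidates, n - 2):
        X = (0, *inner, M)
        if delta(X) == target:
            solutions.append(X)
    return solutions
```

For L = [2, 2, 3, 3, 4, 5, 6, 7, 8, 10] the result includes both (0, 2, 4, 7, 10) and its mirror image (0, 3, 6, 8, 10), which shows why the map is not unique.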

Practical Restriction Mapping Algorithm (developed in 1990): at every step, take the largest length y left in L. It must be the distance from a new point to one of the two ends, so the new point is either at y (measured from 0) or at M - y (measured from M). For each choice, check whether the distances from the new point to all points already placed are still in L; if they are, delete those distances from L and continue with the next largest length, otherwise backtrack. However, if both the "left" and "right" alternatives are feasible, and this keeps happening in later steps, the running time becomes exponential. A polynomial algorithm was designed only recently.
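Here is how I would sketch this backtracking idea in Python. The function and helper names are my own, not the book's pseudocode, and I assume L contains only positive integers.

```python
from collections import Counter


def partial_digest(L):
    """Return all point sets found by backtracking whose pairwise distances equal L."""
    remaining = Counter(L)
    width = max(remaining)
    remaining[width] -= 1
    X = [0, width]
    solutions = []

    def try_place(y):
        """Place y if its distances to all placed points are still available."""
        needed = Counter(abs(y - x) for x in X)
        if any(remaining[d] < c for d, c in needed.items()):
            return None
        remaining.subtract(needed)
        X.append(y)
        return needed

    def undo(y, needed):
        """Take y back out and restore the distances it consumed."""
        X.remove(y)
        remaining.update(needed)

    def place():
        left = [d for d, c in remaining.items() if c > 0]
        if not left:
            solutions.append(sorted(X))
            return
        y = max(left)
        # The largest remaining length is measured from 0 or from the far end.
        for candidate in {y, width - y}:
            needed = try_place(candidate)
            if needed is not None:
                place()
                undo(candidate, needed)

    place()
    return solutions
```

When only one of the two alternatives passes the check at each step, the recursion is a single path; the exponential case arises when both alternatives keep passing.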


07/15-16

1. Describe the Problem:

Motif finding problem: a motif is assumed to be the pattern that appears most frequently in the DNA, so the problem is: given the motif length l, find the most frequent sequence of that length within a long DNA sequence. To simplify the question, we are given t DNA sequences, each of length n, and need to find the starting positions s, one per sequence, that correspond to the most conserved profile. With Score(s, DNA) denoting the consensus score, the motif finding problem can be stated as: given a set of DNA sequences, find a set of l-mers, one from each sequence, that maximizes the consensus score.
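To make Score(s, DNA) concrete, here is a small Python sketch. The function name, the 0-based starting positions, and the toy sequences in the note below are my own assumptions for illustration.

```python
from collections import Counter


def consensus_score(s, dna, l):
    """Score(s, DNA): for each column of the aligned l-mers, add the count
    of the most common nucleotide. Also returns the consensus string."""
    motifs = [seq[start:start + l] for seq, start in zip(dna, s)]
    score = 0
    consensus = []
    for column in zip(*motifs):
        letter, count = Counter(column).most_common(1)[0]
        consensus.append(letter)
        score += count
    return score, "".join(consensus)
```

With the toy input dna = ["ACGTACGT", "TTACGTAA", "CCACCTGG"], s = [0, 2, 2] and l = 4, the aligned l-mers are ACGT, ACGT and ACCT, so the score is 3 + 3 + 2 + 3 = 11 and the consensus is ACGT.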

Another view of this problem is to find a median string. We can use the Hamming distance to describe the difference between two strings, so the motif finding problem can also be viewed as finding a string v with the minimum total Hamming distance to the DNA sequences. Notice that this is a double minimization: we are finding a string v that minimizes TotalDistance(v, DNA), which is in turn the smallest distance between v and any choice of starting positions in the DNA sequences.
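The inner minimization can be sketched as follows. The function names are mine, and the toy sequences in the note below are made up for illustration.

```python
def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def total_distance(v, dna):
    """TotalDistance(v, DNA): for each sequence take the best-matching
    l-mer (inner minimum over starting positions), then sum over sequences."""
    l = len(v)
    return sum(
        min(hamming(v, seq[i:i + l]) for i in range(len(seq) - l + 1))
        for seq in dna
    )
```

For v = "ACGT" and dna = ["ACGTACGT", "TTACGTAA", "CCACCTGG"], the first two sequences contain ACGT exactly and the third has ACCT at distance 1, so the total distance is 1. The outer minimization then searches over all candidate strings v.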

2. Basic Algorithm:

In both the Motif Finding Problem and the Median String Problem, we need to sift through a large number of candidates. How do we consider them one by one? The NextLeaf algorithm gives us an answer by stepping through the leaves of a search tree in order.
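My understanding of NextLeaf is that it behaves like an odometer over arrays of L digits, each between 1 and k. A minimal Python sketch (the generator all_leaves is my own helper, not from the book):

```python
def next_leaf(a, k):
    """Advance the digit array a (digits 1..k) to the next leaf.
    After the last leaf it wraps around to [1, 1, ..., 1]."""
    a = list(a)
    for i in reversed(range(len(a))):
        if a[i] < k:
            a[i] += 1
            return a
        a[i] = 1  # this digit overflows; carry to the digit on its left
    return a


def all_leaves(L, k):
    """Yield every leaf exactly once, starting from (1, ..., 1)."""
    a = [1] * L
    while True:
        yield tuple(a)
        a = next_leaf(a, k)
        if all(x == 1 for x in a):  # wrapped around: all leaves visited
            return
```

For motif finding, each leaf would be read as a vector of starting positions (L = t, k = n - l + 1); for the median string, each leaf would be read as an l-mer (L = l, k = 4).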

To scan the entire tree, including its internal vertices, we can use NextVertex, which is used in the branch-and-bound approach.
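A rough Python sketch of the vertex traversal, together with the bypass move that branch-and-bound uses to skip a whole subtree. The signatures are my own adaptation: i is the depth of the current vertex (0 is the root) and only the first i digits of a are meaningful.

```python
def next_vertex(a, i, L, k):
    """Return the next vertex (a, depth) in depth-first order; depth 0 means done."""
    a = list(a)
    if i < L:
        a[i] = 1  # descend to the first child (0-based slot i is level i + 1)
        return a, i + 1
    for j in range(L, 0, -1):
        if a[j - 1] < k:
            a[j - 1] += 1  # move to the next sibling at level j
            return a, j
    return a, 0


def bypass(a, i, k):
    """Skip the subtree below the current vertex and jump to the next one."""
    a = list(a)
    for j in range(i, 0, -1):
        if a[j - 1] < k:
            a[j - 1] += 1
            return a, j
    return a, 0
```

Starting from the root with a = [0] * L and i = 0, repeatedly calling next_vertex until the depth comes back as 0 visits every vertex once. Branch-and-bound calls bypass instead whenever the bound shows that no leaf below the current vertex can beat the best solution so far.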

3. If we use the Motif Finding formulation, we can use the brute force approach (O(l*(n**t)) time) as well as the branch-and-bound approach (which takes less time in practice).
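A brute-force sketch of the Motif Finding formulation in Python. I use itertools.product in place of the leaf-by-leaf enumeration, so this is a shortcut of mine rather than the book's pseudocode; the names and the 0-based positions are assumptions too.

```python
from collections import Counter
from itertools import product


def score(motifs):
    """Consensus score: sum over columns of the most common letter's count."""
    return sum(Counter(col).most_common(1)[0][1] for col in zip(*motifs))


def brute_force_motif_search(dna, l):
    """Try every vector of starting positions and keep the best-scoring one."""
    ranges = [range(len(seq) - l + 1) for seq in dna]
    best_s, best_score = None, -1
    for s in product(*ranges):  # (n - l + 1)**t candidates
        motifs = [seq[i:i + l] for seq, i in zip(dna, s)]
        current = score(motifs)
        if current > best_score:
            best_s, best_score = s, current
    return best_s, best_score
```

The loop body costs O(l*t) per candidate, so the number of candidates, which grows exponentially with the number of sequences t, is what makes this approach impractical.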

If we solve the Median String formulation instead, we can also use both the brute force approach and the branch-and-bound approach (O((4**l)*nt) time). This is more favorable than the Motif Finding formulation because, for typical motif lengths, 4**l is far smaller than n**t.
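The Median String brute force can be sketched the same way: enumerate all 4**l candidate strings and keep the one with the smallest total distance. Again the names are mine and itertools.product stands in for the leaf enumeration.

```python
from itertools import product


def total_distance(v, dna):
    """Sum over sequences of the smallest Hamming distance from v to any l-mer."""
    l = len(v)
    return sum(
        min(
            sum(x != y for x, y in zip(v, seq[i:i + l]))
            for i in range(len(seq) - l + 1)
        )
        for seq in dna
    )


def brute_force_median_string(dna, l):
    """Try all 4**l strings over ACGT and return the best one with its distance."""
    best_word, best_dist = None, float("inf")
    for letters in product("ACGT", repeat=l):
        word = "".join(letters)
        d = total_distance(word, dna)
        if d < best_dist:
            best_word, best_dist = word, d
    return best_word, best_dist
```

On the toy input ["ACGTACGT", "TTACGTAA", "CCACCTGG"] with l = 4, this returns ("ACGT", 1), which agrees with a best consensus score of 11 = l*t - 1 for the same data.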
