An Introduction to Bioinformatics Algorithms - III - Chapter 5 Greedy Algorithm

Biological Problem:

Rearrangements (reversals, i.e., flips of a segment of genomic sequence) occurred often in mammalian evolutionary history. For example, the human X chromosome can be viewed as a rearranged version of the mouse X chromosome. Biologists are interested in the most parsimonious evolutionary scenario, that is, the one involving the smallest number of reversals.

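To Describe the Problem: a reversal ∂(i, j) reverses the order of the elements in positions i through j of a permutation π = π1 π2 ... πn:

$$\pi = \pi_1 \pi_2 \cdots \pi_{i-1} \, (\pi_i \, \pi_{i+1} \cdots \pi_{j-1} \, \pi_j) \, \pi_{j+1} \cdots \pi_n$$

$$\pi * \partial(i, j) = \pi_1 \pi_2 \cdots \pi_{i-1} \, (\pi_j \, \pi_{j-1} \cdots \pi_{i+1} \, \pi_i) \, \pi_{j+1} \cdots \pi_n$$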
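The definition of π * ∂(i, j) can be sketched in Python as follows, assuming a permutation is stored as a list and that i and j are 1-based positions as in the text. The function name `apply_reversal` is hypothetical, chosen for illustration.

```python
def apply_reversal(pi: list[int], i: int, j: int) -> list[int]:
    """Return pi * ∂(i, j): the block pi_i ... pi_j reversed (1-based, inclusive)."""
    if not 1 <= i <= j <= len(pi):
        raise ValueError("need 1 <= i <= j <= n")
    # Python slices are 0-based and half-open, so positions i..j are pi[i-1:j].
    return pi[:i - 1] + pi[i - 1:j][::-1] + pi[j:]


print(apply_reversal([1, 2, 3, 4, 5, 6], 2, 4))  # [1, 4, 3, 2, 5, 6]
```

Returning a new list keeps the original permutation unchanged, which is convenient when a greedy algorithm needs to try many candidate reversals.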

Reversal Distance Problem: given two permutations π and σ, find a series of reversals ∂1, ∂2, ..., ∂t (t in total) that transforms π into σ, that is, π * ∂1 * ∂2 * ... * ∂t = σ, such that t is minimum. This minimum t is the reversal distance d(π, σ).

When we set σ to be the identity permutation 1 2 3 ... n, the Reversal Distance Problem becomes the Sorting by Reversals Problem: given a permutation π, find a series of reversals ∂1, ∂2, ..., ∂t that transforms π into the identity permutation such that t is minimum. This minimum is written d(π).
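For very small n, d(π) can be computed exactly by breadth-first search over all permutations, treating each reversal as one step. This is only a brute-force reference (the search space has n! states) that is useful for checking greedy algorithms on tiny inputs, not a practical method; `reversal_distance` is a hypothetical helper name.

```python
from collections import deque


def reversal_distance(pi: tuple[int, ...]) -> int:
    """Exact d(pi) by breadth-first search; feasible only for small n (n! states)."""
    n = len(pi)
    start, target = tuple(pi), tuple(range(1, n + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        # Every reversal ∂(i, j) with i < j is one step (0-based indices here).
        for i in range(n):
            for j in range(i + 1, n):
                nxt = cur[:i] + cur[i:j + 1][::-1] + cur[j + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    raise ValueError("input must be a permutation of 1..n")


print(reversal_distance((4, 1, 2, 3)))  # 2
```

Because BFS visits permutations in order of distance, the first time the identity is dequeued its recorded distance is the minimum number of reversals.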

To Solve the Problem: (Greedy Algorithm)

SIMPLE REVERSAL SORT (however, this method is quite short-sighted):

for i <- 1 to n-1
    j <- position of element i in π (i.e., πj = i)
    if j ≠ i
        π <- π * ∂(i, j)
    if π is the identity permutation
        return

If we define prefix(π) to be the number of already-sorted elements at the beginning of π (the largest k such that π1 = 1, ..., πk = k), then the strategy is to increase prefix(π) at every step.
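A minimal Python sketch of SIMPLE REVERSAL SORT and prefix(π), assuming 1-based positions as in the pseudocode. The function names, and the choice to return the list of applied reversals (i, j), are illustrative rather than taken from the text.

```python
def prefix(pi: list[int]) -> int:
    """Largest k such that pi_1 = 1, ..., pi_k = k."""
    k = 0
    while k < len(pi) and pi[k] == k + 1:
        k += 1
    return k


def simple_reversal_sort(pi: list[int]) -> list[tuple[int, int]]:
    """SIMPLE REVERSAL SORT; returns the reversals (i, j) it applies (1-based)."""
    pi = list(pi)
    n = len(pi)
    steps = []
    for i in range(1, n):                      # for i <- 1 to n-1
        j = pi.index(i) + 1                    # position j with pi_j = i
        if j != i:
            pi[i - 1:j] = pi[i - 1:j][::-1]    # pi <- pi * ∂(i, j); now prefix(pi) >= i
            steps.append((i, j))
        if prefix(pi) == n:                    # pi is the identity permutation
            break
    return steps


print(simple_reversal_sort([6, 1, 2, 3, 4, 5]))
# [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
```

Each iteration puts element i into position i, so prefix(π) grows by at least one per reversal and at most n - 1 reversals are used.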

-> Before this, computer scientists had faced the "Prefix Reversal Problem", also known as the "Pancake Flipping Problem", which is similar to the Sorting by Reversals problem except that every reversal ∂(i, j) must have i = 1.
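To make the restriction to prefix reversals concrete, here is a simple sketch that sorts using only ∂(1, k): it repeatedly brings the largest unsorted element to the front and then flips it into place. This straightforward strategy uses at most 2(n - 1) flips and is not an optimal pancake-flipping algorithm; the function name `pancake_sort` is hypothetical.

```python
def pancake_sort(pi: list[int]) -> list[int]:
    """Sort with prefix reversals ∂(1, k) only; returns the k of each flip used."""
    pi = list(pi)
    flips = []
    for size in range(len(pi), 1, -1):
        # 1-based position of the largest element among the first `size`.
        pos = pi.index(max(pi[:size])) + 1
        if pos == size:
            continue                        # already in its final place
        if pos != 1:
            pi[:pos] = pi[:pos][::-1]       # ∂(1, pos): bring it to the front
            flips.append(pos)
        pi[:size] = pi[:size][::-1]         # ∂(1, size): flip it to the back of the unsorted part
        flips.append(size)
    return flips


print(pancake_sort([3, 1, 2]))  # [3, 2]
```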

-> Approximation Algorithm: when no efficient optimal algorithm is known, we often use an approximation algorithm to find an approximate solution. For an input π, let A(π) be the value of the solution produced by algorithm A and OPT(π) the value of the optimal solution; the approximation ratio compares them through A(π) / OPT(π), and an approximation ratio of 1 is the acme of perfection: the algorithm is always optimal.

Because the approximation ratio describes the worst case, for a minimization problem it is max(A(π) / OPT(π)) over all inputs π of size n, and for a maximization problem it is min(A(π) / OPT(π)). Sorting by Reversals asks for the fewest reversals, so it is a minimization problem, and the approximation ratio of a sorting algorithm A is max(A(π) / OPT(π)).

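The approximation ratio of SIMPLE REVERSAL SORT is at least (n-1)/2. For example, for π = n 1 2 3 ... (n-1) it uses n-1 reversals (each step moves n one position to the right), even though d(π) = 2: reversing the whole permutation gives (n-1) ... 3 2 1 n, and reversing the first n-1 elements then gives the identity.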
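The worst case above can be checked by brute force for small n. The sketch below assumes permutations of 1..n, counts the reversals used by SIMPLE REVERSAL SORT, and computes every d(π) with one breadth-first search from the identity (a reversal undoes itself, so the distance from the identity to π equals d(π)). The helper names are hypothetical, and exhaustive search is only practical for n up to about 8.

```python
from collections import deque


def simple_reversal_count(pi) -> int:
    """Number of reversals SIMPLE REVERSAL SORT applies to pi."""
    pi = list(pi)
    count = 0
    for i in range(1, len(pi)):
        j = pi.index(i) + 1
        if j != i:
            pi[i - 1:j] = pi[i - 1:j][::-1]
            count += 1
    return count


def all_reversal_distances(n: int) -> dict:
    """d(pi) for every permutation of 1..n, by BFS from the identity."""
    start = tuple(range(1, n + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for i in range(n):
            for j in range(i + 1, n):
                nxt = cur[:i] + cur[i:j + 1][::-1] + cur[j + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    return dist


def worst_ratio(n: int) -> float:
    """max over non-identity pi of A(pi) / d(pi) for SIMPLE REVERSAL SORT."""
    dist = all_reversal_distances(n)
    return max(simple_reversal_count(p) / d for p, d in dist.items() if d > 0)


n = 6
bad = (n,) + tuple(range(1, n))                                     # n 1 2 ... (n-1)
print(simple_reversal_count(bad), all_reversal_distances(n)[bad])  # 5 2
print(worst_ratio(n))                                               # 2.5
```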

BREAKPOINTS:

The problem with SIMPLE REVERSAL SORT is that prefix(π) is a naive measure of our progress toward the identity permutation, so we introduce a new concept: the breakpoint.

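If πi and πi+1 are not consecutive numbers (that is, |πi - πi+1| ≠ 1), then there is a breakpoint between πi and πi+1; otherwise the pair is an adjacency. We extend π with π0 = 0 and πn+1 = n + 1, so that the identity permutation is the only permutation without breakpoints, and write b(π) for the number of breakpoints. A strip is a maximal interval between two consecutive breakpoints. Strips are either increasing or decreasing; by convention, a strip consisting of a single element is decreasing, except for the strips containing 0 and n + 1, which are increasing. Here we introduce another way to solve the problem: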
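A small sketch of these definitions, assuming the permutation is extended with 0 and n + 1 and that single-element strips follow the convention above. The function names and the "inc"/"dec" labels are illustrative choices.

```python
def extend(pi: list[int]) -> list[int]:
    """Add pi_0 = 0 and pi_{n+1} = n + 1."""
    return [0] + list(pi) + [len(pi) + 1]


def breakpoints(pi: list[int]) -> int:
    """b(pi): neighboring pairs of the extended permutation that differ by more than 1."""
    ext = extend(pi)
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def strips(pi: list[int]) -> list[tuple[list[int], str]]:
    """Maximal strips of the extended permutation, each labelled 'inc' or 'dec'."""
    ext = extend(pi)
    result, start = [], 0
    for k in range(1, len(ext) + 1):
        # A strip ends at the end of the list or just before a breakpoint.
        if k == len(ext) or abs(ext[k] - ext[k - 1]) != 1:
            block = ext[start:k]
            if len(block) == 1:
                # Single elements are decreasing, except the strips holding 0 and n + 1.
                kind = "inc" if block[0] in (0, len(ext) - 1) else "dec"
            else:
                kind = "inc" if block[1] > block[0] else "dec"
            result.append((block, kind))
            start = k
    return result


print(breakpoints([3, 2, 1]))  # 2
print(strips([3, 2, 1]))       # [([0], 'inc'), ([3, 2, 1], 'dec'), ([4], 'inc')]
```

Without the extension, 3 2 1 would have no breakpoints at all even though it is unsorted, which is why the loop condition b(π) > 0 relies on adding 0 and n + 1.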

BREAKPOINT REVERSAL SORT(π):

while b(π) > 0
    among all reversals, choose the reversal ∂ that minimizes b(π * ∂)
    π <- π * ∂
    output π
return
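A sketch of this greedy loop, assuming 1-based reversals ∂(i, j) with i < j and breakpoints counted on π extended with 0 and n + 1. One deviation from the pseudocode is deliberate: if the best reversal does not reduce b(π), the sketch returns None instead of looping forever, which makes the termination question concrete. For example, no reversal reduces the four breakpoints of 5 6 3 4 1 2 (all of its strips are increasing), so the plain greedy loop stalls on it. Function names are hypothetical.

```python
from typing import Optional


def breakpoints(pi: list[int]) -> int:
    """b(pi) for pi extended with 0 and n + 1."""
    ext = [0] + list(pi) + [len(pi) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def breakpoint_reversal_sort(pi: list[int]) -> Optional[list[tuple[int, int]]]:
    """Greedy BREAKPOINT REVERSAL SORT; returns the reversals used, or None if stuck."""
    pi = list(pi)
    n = len(pi)
    steps = []
    while breakpoints(pi) > 0:
        best = None  # (b value, i, j, resulting permutation)
        # Among all reversals ∂(i, j), choose one that minimizes b(pi * ∂).
        for i in range(1, n + 1):
            for j in range(i + 1, n + 1):
                cand = pi[:i - 1] + pi[i - 1:j][::-1] + pi[j:]
                b = breakpoints(cand)
                if best is None or b < best[0]:
                    best = (b, i, j, cand)
        if best is None or best[0] >= breakpoints(pi):
            return None  # no reversal reduces b(pi): the unguarded loop would not terminate
        _, i, j, pi = best
        steps.append((i, j))
    return steps


print(breakpoint_reversal_sort([3, 2, 1]))           # [(1, 3)]
print(breakpoint_reversal_sort([5, 6, 3, 4, 1, 2]))  # None
```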

However, this method raises two questions: 1. Does the algorithm always terminate? 2. Is it a better approximation algorithm than SIMPLE REVERSAL SORT?

These two questions are answered by two theorems.

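Theorem 5.1: If a permutation π contains a decreasing strip, then there is a reversal ∂ that decreases the number of breakpoints in π, that is, b(π * ∂) < b(π). So as long as π has a decreasing strip, the greedy step strictly reduces the number of breakpoints, and since b(π) cannot drop below 0, the algorithm eventually terminates. But what if there is no decreasing strip? Then a reversal that reduces b(π) may not exist, so we simply reverse one increasing strip to make it decreasing; this does not increase b(π), and the next step can again reduce it.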
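The modified algorithm can be sketched as follows, assuming π is extended with 0 and n + 1 and that a single-element strip counts as decreasing unless it holds 0 or n + 1. When a decreasing strip exists, the sketch takes a reversal that minimizes b(π * ∂), which by Theorem 5.1 is a strict improvement; otherwise it flips the first increasing strip that does not contain 0 or n + 1. Which increasing strip to flip is an arbitrary choice here, and all names are hypothetical.

```python
def breakpoints(pi: list[int]) -> int:
    """b(pi) for pi extended with 0 and n + 1."""
    ext = [0] + list(pi) + [len(pi) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def strips(pi: list[int]) -> list[tuple[int, int, str]]:
    """Maximal strips as (first, last, kind); indices refer to the extended list,
    so index k (1 <= k <= n) is also the 1-based position k in pi."""
    ext = [0] + list(pi) + [len(pi) + 1]
    out, start = [], 0
    for k in range(1, len(ext) + 1):
        if k == len(ext) or abs(ext[k] - ext[k - 1]) != 1:
            if k - start == 1:
                kind = "inc" if ext[start] in (0, len(ext) - 1) else "dec"
            else:
                kind = "inc" if ext[start + 1] > ext[start] else "dec"
            out.append((start, k - 1, kind))
            start = k
    return out


def reverse(pi: list[int], i: int, j: int) -> list[int]:
    """pi * ∂(i, j) with 1-based, inclusive positions."""
    return pi[:i - 1] + pi[i - 1:j][::-1] + pi[j:]


def improved_breakpoint_reversal_sort(pi: list[int]) -> list[tuple[int, int]]:
    pi = list(pi)
    n = len(pi)
    steps = []
    while breakpoints(pi) > 0:
        found = strips(pi)
        if any(kind == "dec" for _, _, kind in found):
            # Theorem 5.1: some reversal strictly reduces b(pi); take a best one.
            candidates = [(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)]
            i, j = min(candidates, key=lambda ij: breakpoints(reverse(pi, *ij)))
        else:
            # No decreasing strip: flip an inner increasing strip so that one appears.
            i, j, _ = next(s for s in found if s[0] > 0 and s[1] < n + 1)
        pi = reverse(pi, i, j)
        steps.append((i, j))
    return steps


print(improved_breakpoint_reversal_sort([5, 6, 3, 4, 1, 2]))
```

Unlike the plain greedy loop, this version makes progress on 5 6 3 4 1 2: the first step flips the strip 5 6, after which a breakpoint-reducing reversal exists.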

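Theorem 5.2: This modified algorithm is an approximation algorithm with a performance guarantee of at most 4. In the worst case, each reversal removes only one breakpoint, and before every such reversal we need to flip an increasing strip into a decreasing one, so the algorithm uses at most 2b(π) reversals and the approximation ratio is at most 2b(π)/d(π). On the other hand, a reversal changes only the two pairs of neighbors at its ends, so it removes at most two breakpoints, which gives d(π) ≥ b(π)/2. Therefore, 2b(π)/d(π) ≤ 4.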
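The inequality d(π) ≥ b(π)/2 used in this argument can be sanity-checked by brute force for small n; this is a check on examples, not a proof. The sketch assumes breakpoints are counted on π extended with 0 and n + 1 and computes every d(π) by one breadth-first search from the identity. The helper names are hypothetical.

```python
from collections import deque


def breakpoints(pi) -> int:
    """b(pi) for pi extended with 0 and n + 1."""
    ext = (0,) + tuple(pi) + (len(pi) + 1,)
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def all_reversal_distances(n: int) -> dict:
    """d(pi) for every permutation of 1..n (BFS from the identity; reversals are self-inverse)."""
    start = tuple(range(1, n + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for i in range(n):
            for j in range(i + 1, n):
                nxt = cur[:i] + cur[i:j + 1][::-1] + cur[j + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    return dist


def lower_bound_holds(n: int) -> bool:
    """True if 2 * d(pi) >= b(pi) for every permutation pi of 1..n."""
    return all(2 * d >= breakpoints(p) for p, d in all_reversal_distances(n).items())


print(all(lower_bound_holds(n) for n in range(1, 7)))  # True
```

The bound is tight for some inputs: 3 2 1 has two breakpoints and is sorted by a single reversal.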

Plus, there is also a greedy algorithm for the Motif Finding Problem. It first finds the two closest l-mers in sequences 1 and 2 (the pair with the best score) and forms a 2*l seed matrix, which requires l*(n - l + 1)**2 operations. Then, for each of the remaining sequences, it searches for the l-mer that maximizes Score(s, i) with respect to the current seed matrix and adds it to the seed matrix as a new row. This step requires l*(n - l + 1) operations in each iteration. Thus the running time of this algorithm is O(ln**2 + lnt), which is vastly better than the O(ln**t) of brute-force motif search or the O(nt*4**l) of median string search.
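A minimal sketch of the greedy motif search described above, assuming DNA sequences given as Python strings, 0-based starting positions, and Score defined as the sum over columns of the count of the most common nucleotide. For simplicity it recomputes the score from scratch for every candidate, which costs up to an extra factor of t in the second phase compared with the analysis above; keeping a running count matrix would avoid that. The function names are hypothetical.

```python
from collections import Counter


def score(motifs: list[str]) -> int:
    """Sum over columns of the count of the most frequent nucleotide."""
    return sum(Counter(col).most_common(1)[0][1] for col in zip(*motifs))


def greedy_motif_search(dna: list[str], l: int) -> list[int]:
    """Greedy motif search sketch: one 0-based start position per sequence."""
    first, second = dna[0], dna[1]
    best_pair, best_score = (0, 0), -1
    # Step 1: try all (n - l + 1)**2 pairs of l-mers from the first two sequences.
    for i in range(len(first) - l + 1):
        for j in range(len(second) - l + 1):
            s = score([first[i:i + l], second[j:j + l]])
            if s > best_score:
                best_pair, best_score = (i, j), s
    starts = list(best_pair)
    motifs = [first[starts[0]:starts[0] + l], second[starts[1]:starts[1] + l]]
    # Step 2: for each remaining sequence, add the l-mer that maximizes the score.
    for seq in dna[2:]:
        best_k, best_s = 0, -1
        for k in range(len(seq) - l + 1):
            s = score(motifs + [seq[k:k + l]])
            if s > best_s:
                best_k, best_s = k, s
        starts.append(best_k)
        motifs.append(seq[best_k:best_k + l])
    return starts


print(greedy_motif_search(["AAAGGG", "CCGGGT", "TGGGCC"], 3))  # [3, 2, 1]
```

Ties are broken in favor of the earliest position, since a candidate replaces the current best only when its score is strictly larger.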


