笔记 Bioinformatics Algorithms Chapter2

Chapter2 WHICH DNA PATTERNS PLAY THE ROLE OF MOLECULAR CLOCKS 

寻找模序 

一、

转录因子会结合基因上游的特定序列,调控基因的转录表达,但是在不同个体中,这个序列会有一些差别。本章讲述用贪婪、随机算法来寻找这个序列:寻找模序。

(NF-κB binding sites from the Drosophila melanogaster genome)

二、一些概念:

1. Score、Profile 的含义如图

根据profile matrix 可以计算出某个kmer在某一profile下的概率

三、

提出问题:Motif Finding Problem:

Given a collection of strings, find a set of k-mers, one from each string, that minimizes the score of the resulting motif.

Input: A collection of strings Dna and an integer k.

Output: A collection Motifs of k-mers, one from each string in Dna, minimizing Score(Motifs) among all possible choices of k-mers.

一组序列中,寻找一组k-mer,它们的Score是最低的(或者与consensus sequence的海明距离之和最小)

 

1 遍历

MedianString(Dna, k)
        distance ← ∞
        for each k-mer Pattern from AA…AA to TT…TT
            if distance > d(Pattern, Dna)
                 distance ← d(Pattern, Dna)
                 Median ← Pattern
        return Median

 

2 贪婪法 GreedyMotifSearch

GREEDYMOTIFSEARCH(Dna, k, t)
        BestMotifs ← motif matrix formed by first k-mers in each string
                      from Dna
        for each k-mer Motif in the first string from Dna
            Motif1 ← Motif
            for i = 2 to t
                form Profile from motifs Motif1, …, Motifi - 1
                Motifi ← Profile-most probable k-mer in the i-th string
                          in Dna
            Motifs ← (Motif1, …, Motift)
            if Score(Motifs) < Score(BestMotifs)
                BestMotifs ← Motifs
        output BestMotifs

详解 http://www.mrgraeme.co.uk/greedy-motif-search/

 

*贪婪法 GreedyMotifSearch with pseudocounts

pseudocounts:在形成profile matrix时,给0项设为一个较小的值

GreedyMotifSearch(Dna, k, t)
        form a set of k-mers BestMotifs by selecting 1st k-mers in each string from Dna
        for each k-mer Motif in the first string from Dna
            Motif1 ← Motif
            for i = 2 to t
                apply Laplace's Rule of Succession to form Profile from motifs   Motif1, …, Motifi-1
                Motifi ← Profile-most probable k-mer in the i-th string in Dna
            Motifs ← (Motif1, …, Motift)
            if Score(Motifs) < Score(BestMotifs)
                BestMotifs ← Motifs
        output BestMotifs

 

3. 随机法Randomized Motif Search

RandomizedMotifSearch(Dna, k, t)
     #随机从每个DNA取k-mer,生成一组motifs randomly select k-mers Motifs = (Motif1, …, Motift) in each string from Dna BestMotifs ← Motifs while forever Profile ← Profile(Motifs)#根据motifs形成Profile矩阵 Motifs ← Motifs(Profile, Dna) #根据profile矩阵从一组DNA生成一组几率最大的motifs if Score(Motifs) < Score(BestMotifs) BestMotifs ← Motifs else return BestMotifs

随机算法起到作用的原因是,随机选取的一组Motifs,有可能选到潜在正确的一个k-mer,那么就在这中形成倾斜,直至寻找到较优解

改进: 上一个算法,每次迭代都重新随机生成一组新的Motifs,这可能把潜在的正确模序抛弃了,改进的方法是每次随机只更改一行k-mer

GibbsSampler(Dna, k, t, N)
        randomly select k-mers Motifs = (Motif1, …, Motift) in each string from Dna
        BestMotifs ← Motifs
        for j ← 1 to N
            i ← Random(t)
            Profile ← profile matrix constructed from all strings in Motifs except for Motif[i]
            Motif[i] ← Profile-randomly generated k-mer in the i-th sequence
            if Score(Motifs) < Score(BestMotifs)
                BestMotifs ← Motifs
        return BestMotifs

 

转载于:https://www.cnblogs.com/lokwongho/p/9682667.html

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
BIOINFORMATICS ALGORITHMS,VOL.I By 作者: Phillip Compeau – Pavel Pevzner ISBN-10 书号: 0990374610 ISBN-13 书号: 9780990374619 Edition 版本: 2nd 出版日期: 2015 pages 页数: 384 Overview Contents List of Code Challenges About the Textbook Chapter 1:Where in the Genome Does DNA Replication Begin? Chapter 2:Which DNA Patterns Play the Role of Molecular Clocks? Chapter 3:How Do We Assemble Genomes? Chapter 4:How Do We Sequence Antibiotics? Chapter 5:How Do We Compare DNA Sequences? Chapter 6:Are There Fragile Regions in the Human Genome? Bibliography mage Courtesies This is Vol. 1 of Bioinformatics Algorithms: an Active Learning Approach, one of the first textbooks to emerge from the recent Massive Open Online Course MOOC revolution. A light hearted and analogy filled companion to the author’s acclaimed Bioinformatics Specialization on Coursera, this book presents students with a dynamic approach to learning bioinformatics. It strikes a unique balance between practical challenges in modern biology and fundamental algorithmic ideas, thus capturing the interest of students of both biology and computer science. Each chapter begins with a biological question, such as Are There Fragile Regions in the Human Genome or Which DNA Patterns Play the Role of Molecular Clocks and then steadily develops the algorithmic sophistication required to answer this question. Hundreds of exercises are incorporated directly into the text as soon as they are needed; readers can test their knowledge through automated coding challenges on the Rosalind Bioinformatics Textbook Track.

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值