Rosalind Problem 14: Finding a Shared Motif

automan_huyaoge

Original link: http://rosalind.info/problems/lcsm/

Problem

A common substring of a collection of strings is a substring of every member of the collection. We say that a common substring is a longest common substring if there does not exist a longer common substring. For example, "CG" is a common substring of "ACGTACGT" and "AACCGTATA", but it is not as long as possible; in this case, "CGTA" is a longest common substring of "ACGTACGT" and "AACCGTATA".

Note that the longest common substring is not necessarily unique; for a simple example, "AA" and "CC" are both longest common substrings of "AACC" and "CCAA".
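The two examples above can be checked by brute force. The following is a minimal sketch, not part of the original post; the helper names `substrings_of_length` and `longest_common_substrings` are made up for illustration. It returns every longest common substring, which makes the non-uniqueness visible.

```python
def substrings_of_length(text, n):
    """Return the set of all substrings of `text` that have length n."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def longest_common_substrings(strings):
    """Return the set of all longest common substrings (hypothetical helper)."""
    shortest = min(strings, key=len)
    # Try lengths from longest to shortest; the first non-empty
    # intersection holds every longest common substring.
    for n in range(len(shortest), 0, -1):
        common = substrings_of_length(shortest, n)
        for other in strings:
            common &= substrings_of_length(other, n)
        if common:
            return common
    return set()


print(longest_common_substrings(["ACGTACGT", "AACCGTATA"]))
print(sorted(longest_common_substrings(["AACC", "CCAA"])))
```

The first call yields the single answer "CGTA"; the second yields both "AA" and "CC".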

Given: A collection of k (k ≤ 100) DNA strings of length at most 1 kbp each in FASTA format.

Return: A longest common substring of the collection. (If multiple solutions exist, you may return any single solution.)

Sample Dataset

>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA
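Real Rosalind datasets wrap long sequences over several lines, so a record is a `>` header followed by one or more sequence lines. The sketch below shows one way to read such input; `parse_fasta` is a hypothetical helper name, and it assumes every header label is unique.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping label -> sequence.

    Assumes labels are unique; sequence lines of one record are joined.
    """
    records = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            label = line[1:]
            records[label] = ""
        elif label is not None:
            records[label] += line
    return records


sample = """>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA"""

for name, sequence in parse_fasta(sample).items():
    print(name, sequence)
```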

Sample Output

AC

Python solution

fasta = """>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA"""

# Parse the FASTA text: split on ">", drop each record's header line,
# and join the remaining lines into one sequence per record.
s = ["".join(record.split("\n")[1:]) for record in fasta.split(">")[1:]]

# Every common substring must occur in the shortest string, so search there.
shortest = min(s, key=len)

motif = ''

# Try every start position in the shortest string.
for i in range(len(shortest)):
    # Only candidates longer than the current best matter. If a candidate is
    # missing from some string, every longer candidate starting at i is too.
    n = len(motif) + 1
    while i + n <= len(shortest) and all(shortest[i:i+n] in each for each in s):
        motif = shortest[i:i+n]
        n += 1

# Prints one longest common substring; when several tie, any of them is accepted.
print(motif)
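For larger inputs, a different technique from the loop above is to binary-search the motif length. This works because a common substring of length n always implies one of length n - 1. The sketch below is an alternative, not the author's method; `common_of_length` and `shared_motif` are hypothetical names, and the input list is assumed to be non-empty.

```python
def common_of_length(strings, n):
    """Return one substring of length n shared by all strings, or None."""
    common = None
    for text in strings:
        found = {text[i:i + n] for i in range(len(text) - n + 1)}
        common = found if common is None else common & found
        if not common:
            return None
    # Several candidates may tie; pick the alphabetically smallest
    # so the result is deterministic.
    return min(common)


def shared_motif(strings):
    """Binary search on length for a longest common substring."""
    lo, hi = 0, min(len(text) for text in strings)
    best = ""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        candidate = common_of_length(strings, mid)
        if candidate is not None:
            best, lo = candidate, mid   # length mid works; try longer
        else:
            hi = mid - 1                # length mid fails; try shorter
    return best


print(shared_motif(["GATTACA", "TAGACCA", "ATACA"]))
```

Each length check builds one set of substrings per string, so the search needs only about log2(1000), roughly ten, such passes for 1 kbp inputs.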