Problem
A common substring of a collection of strings is a substring of every member of the collection. We say that a common substring is a longest common substring if there does not exist a longer common substring. For example, "CG" is a common substring of "ACGTACGT" and "AACCGTATA", but it is not as long as possible; in this case, "CGTA" is a longest common substring of "ACGTACGT" and "AACCGTATA".
Note that the longest common substring is not necessarily unique; for a simple example, "AA" and "CC" are both longest common substrings of "AACC" and "CCAA".
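To make the definition concrete, here is a brute-force sketch that returns every longest common substring of a collection. It reproduces both examples above, including the non-unique "AACC"/"CCAA" case. The function name `longest_common_substrings` is illustrative and does not come from the original solution. The sketch only looks at substrings of the shortest string, because any common substring must fit inside it, and it tries lengths from longest to shortest.

```python
def longest_common_substrings(strings):
    """Return the set of all longest common substrings of `strings` (brute force)."""
    shortest = min(strings, key=len)
    # Try candidate lengths from longest to shortest; the first length with any hit wins
    for length in range(len(shortest), 0, -1):
        hits = {
            shortest[i:i + length]
            for i in range(len(shortest) - length + 1)
            if all(shortest[i:i + length] in s for s in strings)
        }
        if hits:
            return hits
    return {""}  # no letter is shared: only the empty string is common


print(sorted(longest_common_substrings(["ACGTACGT", "AACCGTATA"])))  # ['CGTA']
print(sorted(longest_common_substrings(["AACC", "CCAA"])))           # ['AA', 'CC']
```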
Given: A collection of () DNA strings of length at most 1 kbp each in FASTA format.
Return: A longest common substring of the collection. (If multiple solutions exist, you may return any single solution.)
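The input is FASTA text: each record is a `>` header line followed by one or more sequence lines, and longer sequences may be wrapped across several lines. A minimal parser sketch, assuming the whole dataset has already been read into a string (`read_fasta` is a hypothetical helper name, not a library function):

```python
def read_fasta(text):
    """Parse FASTA text into a list of (label, sequence) pairs.

    Sequence lines may be wrapped, so every line after a header is
    appended to the current sequence until the next '>' header.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            records.append([line[1:], ""])  # start a new record; label without '>'
        else:
            records[-1][1] += line  # continuation of the current sequence
    return [(label, seq) for label, seq in records]


sample = """>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA"""
print(read_fasta(sample))
# [('Rosalind_1', 'GATTACA'), ('Rosalind_2', 'TAGACCA'), ('Rosalind_3', 'ATACA')]
```

For a downloaded dataset, something like `read_fasta(open(path).read())` should work, where `path` stands for whatever file name the dataset was saved under.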
Sample Dataset
>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA
Sample Output
AC
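One way to check the sample output by hand is to confirm two things. First, "AC" must occur in every string. Second, no substring of the shortest string that is one letter longer may occur in all of them. Checking only length len("AC") + 1 = 3 is enough: any common substring of length 3 or more would have a common prefix of length exactly 3. In this sample "TA" and "CA" pass the same test, which is why any single longest answer is accepted. A small sketch of that check:

```python
strings = ["GATTACA", "TAGACCA", "ATACA"]
answer = "AC"

# 1. The answer must occur in every string
assert all(answer in s for s in strings)

# 2. No longer common substring may exist: test every substring of the
#    shortest string that is exactly one letter longer than the answer
shortest = min(strings, key=len)
k = len(answer) + 1
longer = [shortest[i:i + k] for i in range(len(shortest) - k + 1)]
assert not any(all(sub in s for s in strings) for sub in longer)
print("AC is a longest common substring")
```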
Python solution
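s = """>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA""".split(">")[1:]
# Each record now looks like "Rosalind_1\nGATTACA": drop the header line
# and join the (possibly wrapped) sequence lines
for i in range(len(s)):
    header, _, body = s[i].partition("\n")
    s[i] = body.replace("\n", "")
# ^^^^^^^^^^^^^ all of that to turn the FASTA input into a list of sequences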
# Any common substring must fit inside the shortest DNA string,
# so only substrings of the shortest one need to be tested
shortest = min(s, key=len)
motif = ''
# cycle over the start positions i of the shortest DNA string
for i in range(len(shortest)):
    n = 1
    # grow the candidate shortest[i:i+n] one letter at a time while it still
    # lies inside the shortest string and is present in every DNA string
    while i + n <= len(shortest) and all(shortest[i:i+n] in each for each in s):
        # keep the candidate if it is at least as long as the best motif so far
        motif = max(shortest[i:i+n], motif, key=len)
        n += 1
print(motif)
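The solution above tries every start position and grows each candidate one letter at a time. A different technique, not the one used above, is to binary-search on the answer length. If some common substring has length m, its prefix of length m - 1 is also common, so "a common substring of length m exists" is monotone in m. For a fixed m, collect each string's length-m substrings in a set and intersect the sets. A sketch under those assumptions (the function names are illustrative):

```python
def common_of_length(strings, m):
    """Return the set of length-m substrings shared by every string."""
    common = None
    for s in strings:
        subs = {s[i:i + m] for i in range(len(s) - m + 1)}
        common = subs if common is None else common & subs
        if not common:
            break  # early exit: nothing of this length is shared
    return common


def longest_common_substring(strings):
    """Binary search on the length m of the answer.

    Common substrings are closed under taking prefixes, so
    'a common substring of length m exists' is monotone in m.
    """
    lo, hi = 0, min(len(s) for s in strings)
    best = ""
    while lo <= hi:
        mid = (lo + hi) // 2
        shared = common_of_length(strings, mid)
        if shared:
            best = min(shared)  # any member is valid; min() just makes it deterministic
            lo = mid + 1
        else:
            hi = mid - 1
    return best


print(longest_common_substring(["GATTACA", "TAGACCA", "ATACA"]))  # AC
```

With strings of at most 1 kbp, the search needs only about ten rounds (log2 1000 is roughly 10), each of which builds one set of substrings per string.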