Rosalind Problem 10: Consensus and Profile

Latest recommended article published 2022-05-22 11:32:20

automan_huyaoge

Categories: python, Control Science and Engineering

Original link: http://rosalind.info/problems/cons/


Problem

A matrix is a rectangular table of values divided into rows and columns. An $m \times n$ matrix has $m$ rows and $n$ columns. Given a matrix $A$, we write $A_{i,j}$ to indicate the value found at the intersection of row $i$ and column $j$.

Say that we have a collection of DNA strings, all having the same length $n$. Their profile matrix is a $4 \times n$ matrix $P$ in which $P_{1,j}$ represents the number of times that 'A' occurs in the $j$th position of one of the strings, $P_{2,j}$ represents the number of times that 'C' occurs in the $j$th position, and so on (see below).

A consensus string $c$ is a string of length $n$ formed from our collection by taking the most common symbol at each position; the $j$th symbol of $c$ therefore corresponds to the symbol having the maximum value in the $j$th column of the profile matrix. Of course, there may be more than one most common symbol, leading to multiple possible consensus strings.

DNA Strings
	A T C C A G C T
	G G G C A A C T
	A T G G A T C T
	A A G C A A C C
	T T G G A A C T
	A T G C C A T T
	A T G G C A C T

Profile
	A  5 1 0 0 5 5 0 0
	C  0 0 1 4 2 0 6 1
	G  1 1 6 3 0 1 0 0
	T  1 5 0 0 0 1 1 6

Consensus
	A T G C A A C T
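
As a worked check of the table above, the following is a minimal sketch in plain Python that recomputes the profile and one consensus string for the seven example strings. The helper names `profile_matrix` and `consensus` are illustrative, not part of the original post, and ties are broken in favour of the first base in A, C, G, T order, which is only one of the allowed choices.

```python
# Worked example: profile matrix and consensus for the seven strings above.
strings = [
    "ATCCAGCT",
    "GGGCAACT",
    "ATGGATCT",
    "AAGCAACC",
    "TTGGAACT",
    "ATGCCATT",
    "ATGGCACT",
]
BASES = "ACGT"  # row order of the profile matrix


def profile_matrix(strings):
    """Return a dict mapping each base to its list of per-position counts."""
    columns = list(zip(*strings))  # transpose: one tuple per position
    return {b: [col.count(b) for col in columns] for b in BASES}


def consensus(profile):
    """Pick the base with the largest count in each column.

    max() returns the first maximal item, so ties go to the earliest
    base in A, C, G, T order (an arbitrary but valid choice).
    """
    n = len(profile["A"])
    return "".join(max(BASES, key=lambda b: profile[b][j]) for j in range(n))


profile = profile_matrix(strings)
print(consensus(profile))
for b in BASES:
    print(f"{b}:", *profile[b])
```

Transposing with `zip(*strings)` turns each position into one tuple, so every profile entry is a single `count` call on that column.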

Given: A collection of at most 10 DNA strings of equal length (at most 1 kbp) in FASTA format.

Return: A consensus string and profile matrix for the collection. (If several possible consensus strings exist, then you may return any one of them.)
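
The problem accepts any one consensus string, but it can help to see where the alternatives come from. The sketch below enumerates every consensus string for a small profile by collecting, per column, all bases that share the maximum count. The function name `all_consensus_strings` and the two-column profile are made up for illustration, and the number of results can grow exponentially with the number of tied columns, so this is only sensible for short inputs.

```python
from itertools import product

BASES = "ACGT"


def all_consensus_strings(profile):
    """Yield every consensus string for a profile.

    `profile` is four rows of counts in A, C, G, T order.
    """
    choices = []
    for column in zip(*profile):
        best = max(column)
        # Keep every base that reaches the column maximum
        choices.append([b for b, count in zip(BASES, column) if count == best])
    for combo in product(*choices):
        yield "".join(combo)


# Hypothetical profile for the two strings "AC" and "AG":
# position 1 is unanimous, position 2 is a tie between C and G.
tied_profile = [
    [2, 0],  # A
    [0, 1],  # C
    [0, 1],  # G
    [0, 0],  # T
]
print(list(all_consensus_strings(tied_profile)))
```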

Sample Dataset

>Rosalind_1
ATCCAGCT
>Rosalind_2
GGGCAACT
>Rosalind_3
ATGGATCT
>Rosalind_4
AAGCAACC
>Rosalind_5
TTGGAACT
>Rosalind_6
ATGCCATT
>Rosalind_7
ATGGCACT
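
The sample keeps each sequence on one line, but the FASTA format allows a sequence to be wrapped over several lines, which matters for strings approaching 1 kbp. A minimal parsing sketch that joins wrapped lines might look like this; `read_fasta` is a hypothetical helper name, and the wrapped input is constructed here purely for illustration.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (label, sequence) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            records.append([line[1:], ""])  # header starts a new record
        elif records:
            records[-1][1] += line  # continuation of the current sequence
    return [(label, seq) for label, seq in records]


# Illustrative input: the first record is wrapped over two lines
sample = """>Rosalind_1
ATCC
AGCT
>Rosalind_2
GGGCAACT
"""
for label, seq in read_fasta(sample):
    print(label, seq)
```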

Sample Output

ATGCAACT
A: 5 1 0 0 5 5 0 0
C: 0 0 1 4 2 0 6 1
G: 1 1 6 3 0 1 0 0
T: 1 5 0 0 0 1 1 6

Python solution

#%%

import numpy as np

# Read the FASTA file; a record's sequence may be wrapped over several lines,
# so lines are appended to the current record until the next '>' header.
seqs = []
with open('./10.txt') as fhand:
    for line in fhand:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            seqs.append('')  # start a new record
        else:
            seqs[-1] += line
a = np.array([list(s) for s in seqs])  # 2D array: one row per DNA string

# Count each base per column, in the order the problem requires: A, C, G, T
bases = 'ACGT'
profile = np.array([(a == b).sum(axis=0) for b in bases])

# argmax returns the first maximum, so ties resolve in A, C, G, T order
consensus = ''.join(bases[k] for k in profile.argmax(axis=0))

print(consensus)
for b, row in zip(bases, profile):
    print(b + ':', *row)  # e.g. "A: 5 1 0 0 5 5 0 0"
