08 Translating RNA into Protein

acoikw2620

Original link: http://www.cnblogs.com/think-and-do/p/7272590.html


Problem

The 20 commonly occurring amino acids are abbreviated by using 20 letters from the English alphabet (all letters except for B, J, O, U, X, and Z). Protein strings are constructed from these 20 symbols. Henceforth, the term genetic string will incorporate protein strings along with DNA strings and RNA strings.

The RNA codon table dictates the details regarding the encoding of specific codons into the amino acid alphabet.

Given: An RNA string s corresponding to a strand of mRNA (of length at most 10 kbp).

Return: The protein string encoded by s.

Sample Dataset

AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA

Sample Output

MAMAPRTEINSTRING
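
The sample can be checked with a short sketch that builds the codon table in code rather than reading it from a file. It assumes the standard genetic code, enumerated with the bases in the order U, C, A, G; the names `BASES`, `AMINO_ACIDS`, `build_codon_table`, and `translate` are illustrative and do not come from the original post.

```python
from itertools import product

BASES = "UCAG"  # illustrative constant: base order used to enumerate codons
# Amino acids for the 64 codons in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_codon_table():
    """Map each RNA codon to its one-letter amino acid ('*' for stop)."""
    codons = ("".join(triplet) for triplet in product(BASES, repeat=3))
    return dict(zip(codons, AMINO_ACIDS))


def translate(rna, table=None):
    """Translate codon by codon, ending at the first stop codon."""
    table = table or build_codon_table()
    protein = []
    # Stepping to len(rna) - 2 ignores an incomplete trailing codon.
    for i in range(0, len(rna) - 2, 3):
        residue = table[rna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Prints MAMAPRTEINSTRING, the sample output.
print(translate("AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"))
```

The final codon of the sample, UGA, is a stop codon, so it contributes no letter to the protein string.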

Method 1:

# -*- coding: utf-8 -*-
### 8. Translating RNA into Protein ###

# Build the codon table from a file holding "codon amino-acid" pairs on each line.
codonTable = {}
with open('rna_codon_table.txt') as f:
    for line in f:
        lst = line.split()      # split on runs of whitespace; blank lines yield []
        for i in range(0, len(lst) - 1, 2):
            codonTable[lst[i]] = lst[i + 1]

# Read the RNA string, joining lines in case the input is wrapped.
rnaSeq = ''
with open('rosalind_prot.txt', 'rt') as f:
    for line in f:
        rnaSeq += line.strip().upper()

# Translate codon by codon, ending at the first stop codon.
aminoAcids = []
for i in range(0, len(rnaSeq) - 2, 3):
    aminoAcid = codonTable[rnaSeq[i:i + 3]]
    if aminoAcid == 'Stop':
        break
    aminoAcids.append(aminoAcid)

peptide = ''.join(aminoAcids)

print(peptide)
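
The script above depends on the layout of rna_codon_table.txt, which the post does not show. The sketch below assumes each line holds alternating codon and amino-acid tokens separated by whitespace, with stop codons labeled `Stop`. `TABLE_EXCERPT` is a hypothetical three-line excerpt, not the full table, and `parse_codon_table` is an illustrative name.

```python
# Hypothetical excerpt of rna_codon_table.txt (assumed layout, not the full table).
TABLE_EXCERPT = """\
UUU F      CUU L      AUU I      GUU V
UAA Stop   CAA Q      AAA K      GAA E
UGA Stop   CGA R      AGA R      GGA G
"""


def parse_codon_table(text):
    """Pair up alternating codon / amino-acid tokens on each line."""
    table = {}
    for line in text.splitlines():
        tokens = line.split()  # split on any run of whitespace
        # tokens[0::2] are the codons, tokens[1::2] the matching amino acids
        table.update(zip(tokens[0::2], tokens[1::2]))
    return table


table = parse_codon_table(TABLE_EXCERPT)
print(table["AUU"], table["UAA"])  # prints: I Stop
```

Pairing tokens with slices avoids hard-coding the number of columns per line, so the same parser would also accept a table with one pair per line.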

Method 2:

def translate_rna(sequence):
    codonTable = {
    'AUA':'I', 'AUC':'I', 'AUU':'I', 'AUG':'M',
    'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACU':'T',
    'AAC':'N', 'AAU':'N', 'AAA':'K', 'AAG':'K',
    'AGC':'S', 'AGU':'S', 'AGA':'R', 'AGG':'R',
    'CUA':'L', 'CUC':'L', 'CUG':'L', 'CUU':'L',
    'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCU':'P',
    'CAC':'H', 'CAU':'H', 'CAA':'Q', 'CAG':'Q',
    'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGU':'R',
    'GUA':'V', 'GUC':'V', 'GUG':'V', 'GUU':'V',
    'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCU':'A',
    'GAC':'D', 'GAU':'D', 'GAA':'E', 'GAG':'E',
    'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGU':'G',
    'UCA':'S', 'UCC':'S', 'UCG':'S', 'UCU':'S',
    'UUC':'F', 'UUU':'F', 'UUA':'L', 'UUG':'L',
    'UAC':'Y', 'UAU':'Y', 'UAA':'', 'UAG':'',
    'UGC':'C', 'UGU':'C', 'UGA':'', 'UGG':'W',
    }
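    proteinsequence = ''
    for n in range(0, len(sequence) - 2, 3):
        codon = sequence[n:n+3]
        if codon not in codonTable:
            continue  # unrecognized codons are skipped
        if codonTable[codon] == '':
            break  # '' marks a stop codon (UAA, UAG, UGA): translation ends here
        proteinsequence += codonTable[codon]
    return proteinsequence

# Read the RNA sequence, drop any newlines, and print the translated protein.
se = open('rosalind_prot.txt').read().replace('\n', '').strip()
print(translate_rna(se))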
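
The dictionary-based function above silently skips codons it does not recognize, which can hide malformed input. A stricter variant might look like the sketch below. It is not part of the original post: `translate_strict` and `demo_table` are illustrative names, and the table is passed in as a parameter that uses the same convention of `''` for stop codons.

```python
def translate_strict(sequence, codon_table):
    """Translate until the first stop codon; reject malformed input."""
    if len(sequence) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    protein = []
    for n in range(0, len(sequence), 3):
        codon = sequence[n:n + 3]
        if codon not in codon_table:
            raise ValueError(f"unknown codon {codon!r} at position {n}")
        residue = codon_table[codon]
        if residue == '':
            break  # '' marks a stop codon
        protein.append(residue)
    return ''.join(protein)


# Tiny illustrative table; a real run would pass the full 64-codon table.
demo_table = {'AUG': 'M', 'GCC': 'A', 'UGA': ''}
print(translate_strict('AUGGCCUGA', demo_table))  # prints: MA
```

Raising an error instead of skipping makes problems such as a stray DNA base (T instead of U) visible immediately.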

Method 3:

from Bio.Seq import Seq

# Bio.Alphabet was removed in Biopython 1.78, so a Seq is now built from the string alone.

# translation (stop codons appear as '*'; pass to_stop=True to end at the first stop codon)
messenger_rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG")
print(messenger_rna.translate())

# reverse complement
my_dna = Seq("AGTACACTGGT")
print(my_dna.reverse_complement())
