Rosalind Problem 16: ros_bio16_MPRT

Latest recommended article published on 2021-08-17 22:40:00

他城她糖i

Category: ROSALIND Answers. Tags: bioinformatics

Article link: https://blog.csdn.net/qq_45380519/article/details/119281907

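This code fetches protein sequences from the UniProt database and uses the regular expression `N[^P][ST][^P]` to locate N-glycosylation motifs. It iterates over the input protein IDs, requests each protein's FASTA record from the corresponding URL, then matches the pattern against the sequence and prints the ID of every protein that contains the motif, followed by the 1-based start positions of the matches.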
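
One detail of the motif search is worth checking by hand: `finditer` reports only non-overlapping matches, so two motif occurrences that share residues would yield a single hit with a plain pattern. The following is a minimal sketch, not part of the original post, that compares the plain pattern with a lookahead variant on a made-up toy string; the names `plain`, `overlapping`, and `toy` are illustrative assumptions.

```python
import re

# Plain pattern: finditer consumes each match, so overlapping motifs are skipped.
plain = re.compile(r'N[^P][ST][^P]')
# Lookahead variant: every match is zero-width, so overlapping motifs are all found.
overlapping = re.compile(r'(?=N[^P][ST][^P])')

# Made-up toy string (not a real protein) holding two overlapping motifs:
# "NNTS" starts at position 2 and "NTSA" starts at position 3 (1-based).
toy = "ANNTSA"

plain_hits = [m.start() + 1 for m in plain.finditer(toy)]
overlap_hits = [m.start() + 1 for m in overlapping.finditer(toy)]

print(plain_hits)    # [2]
print(overlap_hits)  # [2, 3]
```

On this toy string the plain pattern reports only position 2, while the lookahead variant reports both 2 and 3.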

If you are reading this series for the first time, please see the preface first.

import re
import requests

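# Compile the N-glycosylation motif. The lookahead (?=...) makes each match
# zero-width, so overlapping motif occurrences are all reported.
pattern = re.compile(r'(?=N[^P][ST][^P])')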

# Read the protein IDs, one per line
with open("../examples/ros_bio16_MPRT.txt") as f:
    protein_ids = f.read().split()

# Use requests to fetch each protein's FASTA record and read its sequence
for protein_id in protein_ids:
    url = 'http://www.uniprot.org/uniprot/' + protein_id + '.fasta'
    r = requests.get(url)
    protein = ''
    for line in r.text.split("\n"):
        # Skip the FASTA header line, which starts with '>'
        if line.startswith('>'):
            continue
        protein += line.strip()
    # Use finditer to collect the 1-based start index of each match
    positions = [str(match.start() + 1) for match in pattern.finditer(protein)]
    # Print only the proteins that contain at least one motif
    if positions:
        print(protein_id)
        print(' '.join(positions))
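
The parsing and matching steps can also be exercised without network access. Below is a rough sketch under stated assumptions: the helper names (`parse_fasta`, `motif_positions`, `accession_of`) and the FASTA record are made up for illustration, and the idea that an ID such as `P07204_TRBM_HUMAN` carries its accession before the first underscore is an assumption about the input format, not something stated in the post.

```python
import re

# Lookahead form of the motif so that overlapping occurrences are all reported.
MOTIF = re.compile(r'(?=N[^P][ST][^P])')

def parse_fasta(text):
    """Join the sequence lines of a single-record FASTA string, skipping '>' headers."""
    return ''.join(
        line.strip() for line in text.splitlines() if not line.startswith('>')
    )

def motif_positions(sequence):
    """Return the 1-based start position of every motif occurrence."""
    return [m.start() + 1 for m in MOTIF.finditer(sequence)]

def accession_of(protein_id):
    # Assumption: the accession is the part before the first underscore;
    # IDs without an underscore are returned unchanged.
    return protein_id.split('_')[0]

# Made-up FASTA record (hypothetical header and sequence, not real UniProt data).
sample = ">sp|X00000|TOY_EXAMPLE made-up record\nMKNNTS\nAPNGSQ\n"

sequence = parse_fasta(sample)
print(sequence)
print(motif_positions(sequence))
print(accession_of("P07204_TRBM_HUMAN"))
```

Keeping the parsing and matching in small functions makes it possible to check the logic on a local string before pointing the script at real UniProt records.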