利用Swiss-model API进行蛋白序列提交及蛋白结构建模

@ZyuanZhang

已于 2024-07-24 16:49:38 修改

阅读量3.3k

点赞数 3

分类专栏：生信文章标签：生信 swiss-model建模 swiss-model api python

于 2023-02-16 11:37:01 首次发布

本文链接：https://blog.csdn.net/weixin_44065416/article/details/129056843

版权

生信专栏收录该内容

10 篇文章

订阅专栏

文章介绍了如何利用Swiss-Model的API进行蛋白质结构预测，提供了一个Python脚本来提交序列、检查状态和下载结果。主要关注点包括序列提交、状态检查、结果下载以及评估模型质量的指标如GMQE和QMEANDisCo。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

0. Swiss-model简介：

Swiss-Model是利用同源建模的方法对蛋白质结构进行在线预测的网站。
网站：https://swissmodel.expasy.org/
注：官方有给出Swiss-model API，实现以编程的形式提交建模项目（详情见：https://swissmodel.expasy.org/docs/help#modelling_api），所以本文章基于 Swiss-model API进行蛋白序列提交、结构预测和结果下载。

1. Swiss-model API的Python脚本：

1.1 提交单条序列：

""" 利用 SWISS-MODEL API，将本地的病毒序列提交到 SWISS-MODEL 网站上进行结构预测。 """
# 参考：https://swissmodel.expasy.org/docs/help#modelling_api

import requests
import time
import os
from Bio import SeqIO

    
def swiss_model(token, target_seq, seq_id, outpath):
    # 创建建模项目
    response = requests.post(
        "https://swissmodel.expasy.org/automodel",
        headers={ "Authorization": f"Token {token}" },
        json={ 
            "target_sequences": target_seq,
            "project_title": seq_id
        },
        timeout=10)
    
    # 查看运行状态并返回结果下载链接
    project_id = response.json()["project_id"] ## 获取 project id
    url_list = []
    score_list = []
    while True:
        ## 每隔10s不断地查看运行状态，完成或失败都会终止并输出信息
        time.sleep(10)
        response = requests.get(
            f"https://swissmodel.expasy.org/project/{ project_id }/models/summary/", 
            headers={ "Authorization": f"Token {token}" })
        response_object = response.json()
        status = response_object["status"]
        
        if status == "COMPLETED":
            for model in response_object['models']:
                ## 举例： model 的内容是 {'model_id': '01', 'status': 'COMPLETED', 'gmqe': 0.05, 'qmean_global': {'avg_local_score': 0.71}, 'coordinates_url': 'https://swissmodel.expasy.org/project/WbWWJA/models/01.pdb.gz'}
                url_list.append(model['coordinates_url'])
                score_list.append(model["qmean_global"]["avg_local_score"]) ## 记录平均局部最高分
            break
        elif status == "FAILED":
            break
    
    # 下载最高分（平均局部最高分）对应的结构
    max_score = max(score_list)
    max_struct = url_list[score_list.index(max_score)] ## 下载局部最高分对应的结构
    with open(outpath+"model_score.log", "a") as outF: ## 将每个蛋白的局部最高分输出
        outF.write("SeqID: %s, Score: %s\n" % (seq_id, max_score))
    cmd_download = "wget "+max_struct+" -O "+outpath+seq_id+".pdb.gz" ## 根据url利用wget进行下载并重命名为 序列ID.pdb.gz
    os.system(cmd_download)
    cmd_rmwget = "rm wget-log" ## 删掉wget下载时重定向生成的 wget-log文件
    os.system(cmd_rmwget)


def swiss_model_single_seq(token,inf_fa, outpath): ## 单个序列的提交，token（swiss-model提供的令牌）inf_fa（输入文件，fasta格式）；outpath（输入文件路径）
    for record in SeqIO.parse(inf_fa, "fasta"):
        target_seq = str(record.seq)
        seq_id = record.id
    swiss_model(token=token, target_seq=target_seq,seq_id=seq_id,outpath=outpath)

提交单条序列的用法：

token = "" ## 登入swiss-model会在自己的账户那里生成 token，复制过来即可。
inf_fa = "test.fasta" ## fasta格式的输入文件
outpath = "./" ## 结果输出到当前目录（注意最后要有 /）
swiss_model_single_seq(token, inf_fa, outpath)

1.2 提交多条序列：

def swiss_model_multi_seqs(token, inf_fa_path, outpath): ## inf_fa_path（输入的fasta文件所在的目录）；outpath（结果输出的位置）
    print("开始: %s" % time.asctime(time.localtime(time.time())), flush=True)
    num = 0
    for fa in [inf_fa_path+f for f in os.listdir(inf_fa_path) if f.endswith(".fasta")]:
        try:
            swiss_model_single_seq(token=token, inf_fa=fa, outpath=outpath)
            num += 1
            print("第 %s 个蛋白 %s 的结构预测完成" % (num, fa.split("/")[-1].split(".")[0]), flush=True)
        except Exception as e:
            num += 1
            print("第 %s 个蛋白 %s 的结构预测出错，错误为 %s: " % (num, fa.split("/")[-1].split(".")[0], e), flush=True)

    print("结束: %s" % time.asctime(time.localtime(time.time())), flush=True)

提交多条序列的用法：

token = "" ## 令牌
inf_fa_path = "infasta/" ## 输入的fasta文件所在的位置
outpath = "results/" ## 输出结果的位置（注意最后要有 /）

swiss_model_multi_seqs(token, inf_fa_path, outpath)

提交多条序列的结果举例：

包括：.pdb.gz文件和 model_score.log文件，比如：
在这里插入图片描述

更新：

1. 关于建模结果的质量的评估标准：

上面脚本用到的是qmean_global中的avg_local_score，实际上，swiss-model帮助文档（https://swissmodel.expasy.org/docs/help）中有给出一些评价指标，当时（2023年3月份）主要有三个：GMQE, QMEANDisCo global score 和 QMEAN Z-score analysis，这三个指标的大概的含义如下所述：

指标	含义
`GMQE`	整合target-template alignment和template structure特性评估建模质量。范围0-1之间，越大越好，但是GMQE依赖于覆盖率（当模型只覆盖一半target sequence时GMQE不会高于0.5）
`QMEANDisCo global score`	每个残基的QMEANDisCo score的平均值，是对QMEAN的拓展，不完全依赖于覆盖率。范围0-1之间，越大越好。
`QMEAN Z-score analysis`	已经被弃用，建议使用GMQE和QMEANDisCo来评估模型的全局质量。

PS. 实际上，QMEANDisCo就是avg_local_score，这点可以从网页源码进行验证（如下图所示）。

在这里插入图片描述