ProteinMPNN: An Analysis of the Main Ideas

Original paper: the ProteinMPNN paper

I. Method background

Background: The protein sequence design problem is, given a protein backbone structure of interest, to find an amino acid sequence that will fold to this structure.

  • Physically based approaches such as Rosetta treat sequence design as an energy optimization problem, searching for the combination of amino acid identities and conformations that has the lowest energy for a given input structure (the Rosetta method).

  • The amino acid sequence at different positions can be coupled between single or multiple chains, enabling application to a wide range of current protein design challenges.

  • Recently, deep-learning approaches have shown promise in rapidly generating candidate amino acid sequences given monomeric protein backbones, without the need for compute-intensive explicit consideration of side-chain rotameric states. However, the methods described thus far do not apply to the full range of current protein design challenges and have not been extensively validated experimentally.

II. Comparison experiment

  • Model: a message-passing neural network (MPNN) with three encoder layers, three decoder layers and 128 hidden dimensions, which predicts protein sequences in an autoregressive manner from the N-terminus to the C-terminus.

  • Input: protein backbone features, namely distances between Cα-Cα atoms, relative Cα-Cα-Cα frame orientations and rotations, and backbone dihedral angles.
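
Two of the input features listed above, Cα-Cα distances and backbone dihedral angles, can be sketched with NumPy as follows. This is a minimal illustration, not the paper's featurization code: the function names (`pairwise_ca_distances`, `dihedral`, `backbone_phi_psi`) and the array layout (one row of x, y, z coordinates per residue) are assumptions made for the example, and the frame orientation and rotation features are not covered.

```python
import numpy as np

def pairwise_ca_distances(ca):
    """ca: (L, 3) array of C-alpha coordinates -> (L, L) distance matrix."""
    diff = ca[:, None, :] - ca[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians defined by four points (0 = cis)."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.arctan2(y, x)

def backbone_phi_psi(n, ca, c):
    """n, ca, c: (L, 3) arrays. Returns phi and psi, NaN where undefined."""
    length = len(ca)
    phi = np.full(length, np.nan)
    psi = np.full(length, np.nan)
    for i in range(length):
        if i > 0:  # phi needs the carbonyl carbon of the previous residue
            phi[i] = dihedral(c[i - 1], n[i], ca[i], c[i])
        if i < length - 1:  # psi needs the nitrogen of the next residue
            psi[i] = dihedral(n[i], ca[i], c[i], n[i + 1])
    return phi, psi
```

In a graph-based model such distances are typically used both to pick each residue's nearest neighbours and as edge features, while the dihedral angles describe the local backbone conformation at each node.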

III. Main experiment

 

Input: protein backbone features (from PDB files), with the distances between N, Cα, C, O and a virtual Cβ, placed based on the other backbone atoms, as additional input features.
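
Placing a virtual Cβ from the other backbone atoms can be sketched as below. The three coefficients are the idealized-geometry constants that, to my knowledge, are commonly used for this purpose in public structure-based design code. Treat them, the function names, and the (L, 5, 3) atom layout as assumptions of this example rather than something stated in the text above.

```python
import numpy as np

def virtual_cb(n, ca, c):
    """Place an idealized C-beta from N, C-alpha and C coordinates.

    Works on single atoms of shape (3,) or per-residue arrays of shape (L, 3).
    """
    b = ca - n
    c_vec = c - ca
    a = np.cross(b, c_vec)
    # Assumed idealized-geometry coefficients (not taken from the text above).
    return -0.58273431 * a + 0.56802827 * b - 0.54067466 * c_vec + ca

def atom_pair_distances(atoms):
    """atoms: (L, 5, 3) array ordered as N, CA, C, O, virtual CB.

    Returns an (L, L, 5, 5) array holding the distance between every atom of
    residue i and every atom of residue j.
    """
    diff = atoms[:, None, :, None, :] - atoms[None, :, None, :, :]
    return np.linalg.norm(diff, axis=-1)
```

With typical backbone bond lengths and angles the virtual Cβ lands roughly 1.5 Å from Cα. Because it is derived from N, Cα and C alone, it is defined for every residue, including glycine, without revealing the identity of the side chain.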

  1. Stage 1: predicts protein sequences in an order-agnostic autoregressive manner.

    1. Backbone encoder: updates the node and edge features.

    2. Sequence decoder: Order-agnostic decoding enables design in cases where, for example, the middle of the protein sequence is fixed and the rest needs to be designed, as in protein binder design where the target sequence is known. Decoding skips the fixed regions but includes them in the sequence context for the remaining positions, which enables application to a broad range of single- and multichain design problems.

    3. Decoding order: with fixed left-to-right decoding, positions that precede a fixed region (yellow in the figure) cannot use that region's sequence context (green), whereas a model trained with random decoding orders can be used with an arbitrary decoding order during inference. The decoding order can therefore be chosen so that the fixed context is decoded first. The underlying question is whether the decoder can flexibly use sequence that is already known while generating the rest.

  2. Stage 2: Iterative decoding (the full sequence is generated step by step over multiple iterations).

  3. Output: protein sequence.
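
The decoding scheme described above (fixed context first, then the remaining positions in random order, one residue per iteration) can be sketched as follows. This is a toy illustration under stated assumptions, not the actual model: `decoding_order`, `context_mask` and `design` are hypothetical names, and `predict` is a hypothetical stand-in for the trained decoder that receives the partial sequence, a visibility mask and the position to fill.

```python
import numpy as np

def decoding_order(length, fixed_positions, rng):
    """Permutation of positions with the fixed ones first, the rest shuffled."""
    fixed = sorted(set(int(p) for p in fixed_positions))
    fixed_set = set(fixed)
    free = np.array([i for i in range(length) if i not in fixed_set], dtype=int)
    rng.shuffle(free)
    return np.concatenate([np.array(fixed, dtype=int), free])

def context_mask(order):
    """mask[i, j] is True when position j is decoded before position i."""
    rank = np.empty(len(order), dtype=int)
    rank[order] = np.arange(len(order))
    return rank[None, :] < rank[:, None]

def design(length, fixed, predict, rng):
    """fixed: dict mapping position -> residue letter that must be kept.

    predict(seq, visible, pos) is a hypothetical decoder call returning one
    residue letter for `pos` given the already-decoded positions in `visible`.
    """
    order = decoding_order(length, fixed.keys(), rng)
    mask = context_mask(order)
    seq = [None] * length
    for pos, residue in fixed.items():
        seq[pos] = residue
    for pos in order:
        pos = int(pos)
        if pos in fixed:
            continue  # fixed regions are skipped but remain visible as context
        seq[pos] = predict(seq, mask[pos], pos)
    return "".join(seq)
```

For example, with `fixed = {2: "W", 3: "K"}` and a length of 6, positions 2 and 3 come first in the order, so positions 0 and 1 can condition on them even though they precede the fixed region in the sequence. A fixed N-to-C order could not provide that context.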
