A Brief Analysis of the AlphaFold2 Code Modules

Latest recommended article published on 2025-04-07 18:00:31

DylanDing21

Category: alphafold2. Tags: python, artificial intelligence, algorithms

Article link: https://blog.csdn.net/qq_33500415/article/details/119658056

This post walks through the code modules involved in AlphaFold2 inference and notes where each module described in the paper lives in the source code.

1.AlphaFold2 inputs

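First, the inputs. This is only a brief overview; for details see section 1.2.9, "Featurization and model inputs", of the Supplementary Information of the original paper.
1.target feature is the input amino acid sequence.
2.residue_index is the residue numbering.
3.msa feature holds the homologous sequences selected for the main (clustered) MSA.
4.extra msa feature holds the homologous sequences that are not part of msa feature.
5.template_angle_feature contains per-template information such as the template's torsion angles.
6.template_pair_feature contains information such as the one-hot encoding of the amino acid sequence and the distances between Cβ atoms.
An open question: the paper appears to pick bad templates here. Would it be possible to use bad templates during training and good templates at inference?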

2.AlphaFold2 modules

AlphaFold2 Model architecture. The AlphaFold2 model architecture has three main parts: Input embedding, Evoformer, and Structure module.
In modules.py, class AlphaFoldIteration implements a single pass of the model, and class AlphaFold implements recycling.
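The relationship between the two classes can be illustrated with a minimal sketch. It is not the source implementation: `run_iteration` is a hypothetical stand-in for one full model pass, and the toy `single` array stands in for the outputs that are fed back between passes.

```python
import numpy as np

def run_iteration(features, prev):
    """Hypothetical stand-in for one full model pass (the role of AlphaFoldIteration)."""
    # Toy update: mix the previous output into the freshly embedded input.
    single = features["target_feat"] + 0.5 * prev["single"]
    return {"single": single}

def run_with_recycling(features, num_recycle=3):
    """Run the model num_recycle + 1 times, feeding each output back as an input."""
    # The first pass has nothing to recycle, so it starts from zeros.
    prev = {"single": np.zeros_like(features["target_feat"])}
    for _ in range(num_recycle + 1):
        prev = run_iteration(features, prev)
    return prev

features = {"target_feat": np.ones((5, 4))}  # toy input: 5 residues, 4 channels
out = run_with_recycling(features, num_recycle=3)
```

The point of the design is that the per-pass network stays unchanged; recycling is only an outer loop that re-injects the previous outputs.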

2.1 Input embedding

Figure 1. Input feature embeddings
residue_index is the residue numbering. After the relpos computation it is combined with the processed target feature to update the pair representation.
The extra msa feature is turned into the extra MSA representation by create_extra_msa_feature.
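The relpos step can be sketched as a clipped, one-hot encoded residue offset followed by a linear projection. This is a simplified illustration: the clipping range of 32 and the name `max_relative_feature` are assumptions here, and the projection weights `w` are random placeholders rather than trained parameters.

```python
import numpy as np

def relpos_one_hot(residue_index, max_relative_feature=32):
    """One-hot encode the clipped offset i - j for every residue pair."""
    offset = residue_index[:, None] - residue_index[None, :]
    # Shift into [0, 2 * max] so the offset can be used as a bin index.
    clipped = np.clip(offset + max_relative_feature, 0, 2 * max_relative_feature)
    num_bins = 2 * max_relative_feature + 1
    return np.eye(num_bins, dtype=np.float32)[clipped]

residue_index = np.arange(6)
rel = relpos_one_hot(residue_index)           # [6, 6, 65]
rng = np.random.default_rng(0)
w = rng.normal(size=(rel.shape[-1], 8))       # hypothetical projection to 8 pair channels
pair_update = rel @ w                         # contribution added to the pair representation
```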

2.2 Evoformer Block

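The overall structure of an Evoformer block is shown in the figure below:

In modules.py, EvoformerIteration implements a single Evoformer block, while EmbeddingsAndEvoformer contains the input embedding and the 48 Evoformer blocks.

The block takes the MSA representation and the pair representation as input. The EvoformerIteration class in modules.py builds the block from several sub-modules, which the following sections go through.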
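To make the data flow concrete, here is a sketch of the order in which one block applies its residual updates, following the order given in the paper. The layers are placeholder stubs that only log their name and return zeros, and the dictionary keys are illustrative names rather than a claim about the source code.

```python
import numpy as np

call_log = []

def placeholder(name, out_like):
    """Stub layer: records that it was called and returns zeros shaped like out_like."""
    def layer(*inputs):
        call_log.append(name)
        return np.zeros_like(out_like)
    return layer

def evoformer_block(msa, pair, layers):
    # MSA branch: each update is added to its input (residual connection).
    msa = msa + layers["msa_row_attention_with_pair_bias"](msa, pair)
    msa = msa + layers["msa_column_attention"](msa)
    msa = msa + layers["msa_transition"](msa)
    # Communication from the MSA branch to the pair branch.
    pair = pair + layers["outer_product_mean"](msa)
    # Pair branch.
    pair = pair + layers["triangle_multiplication_outgoing"](pair)
    pair = pair + layers["triangle_multiplication_incoming"](pair)
    pair = pair + layers["triangle_attention_starting_node"](pair)
    pair = pair + layers["triangle_attention_ending_node"](pair)
    pair = pair + layers["pair_transition"](pair)
    return msa, pair

msa = np.ones((4, 6, 8))     # [num_seq, num_res, c_m], toy sizes
pair = np.ones((6, 6, 16))   # [num_res, num_res, c_z], toy sizes
names = [
    "msa_row_attention_with_pair_bias", "msa_column_attention", "msa_transition",
    "outer_product_mean", "triangle_multiplication_outgoing",
    "triangle_multiplication_incoming", "triangle_attention_starting_node",
    "triangle_attention_ending_node", "pair_transition",
]
layers = {n: placeholder(n, msa if n.startswith("msa") else pair) for n in names}
new_msa, new_pair = evoformer_block(msa, pair, layers)
```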

2.2.1 MSA representation branch

2.2.1.1 row-wise gated self-attention with pair bias

The row-wise gated self-attention with pair bias is invoked through the dropout_wrapper function, which handles the dropout and residual operations.

In modules.py, MSARowAttentionWithPairBias implements the row-wise gated self-attention with pair bias. Its structure is shown in the figure below:
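A minimal NumPy sketch of the computation is given below, assuming the formulation in the paper: attention runs along each MSA row, the pair representation is projected to a per-head bias on the logits, and a sigmoid gate scales the output. The parameter names in `params` are hypothetical, the weights are random, and biases and masking are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def msa_row_attention_with_pair_bias(msa, pair, params):
    m, z = layer_norm(msa), layer_norm(pair)
    q = np.einsum("src,chd->srhd", m, params["wq"])
    k = np.einsum("src,chd->srhd", m, params["wk"])
    v = np.einsum("src,chd->srhd", m, params["wv"])
    # The pair representation becomes one bias value per head and residue pair.
    bias = np.einsum("ijc,ch->hij", z, params["wb"])
    logits = np.einsum("sihd,sjhd->shij", q, k) / np.sqrt(q.shape[-1]) + bias[None]
    weights = softmax(logits, axis=-1)  # attention over residues j within each row s
    gate = 1.0 / (1.0 + np.exp(-np.einsum("src,chd->srhd", m, params["wg"])))
    out = gate * np.einsum("shij,sjhd->sihd", weights, v)
    return np.einsum("sihd,hdc->sic", out, params["wo"])

rng = np.random.default_rng(0)
num_seq, num_res, c_m, c_z, num_heads, head_dim = 3, 5, 8, 4, 2, 4
params = {name: rng.normal(size=(c_m, num_heads, head_dim)) for name in ("wq", "wk", "wv", "wg")}
params["wb"] = rng.normal(size=(c_z, num_heads))
params["wo"] = rng.normal(size=(num_heads, head_dim, c_m))
msa = rng.normal(size=(num_seq, num_res, c_m))
pair = rng.normal(size=(num_res, num_res, c_z))
row_update = msa_row_attention_with_pair_bias(msa, pair, params)
```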

2.2.1.2 column-wise gated self-attention

In modules.py, MSAColumnAttention implements the column-wise gated self-attention. Its structure is shown in the figure below:

MSA column-wise gated self-attention.
Note: when the Evoformer block is used in the input embedding stage (the extra MSA stack), MSAColumnGlobalAttention is used instead.
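Column-wise attention can be viewed as ordinary gated self-attention applied after swapping the sequence and residue axes, so that each residue column attends across sequences. The sketch below is single-head, uses random placeholder weights with hypothetical names, and leaves out layer normalization, masking, and the global variant.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def msa_column_attention(msa, p):
    x = np.swapaxes(msa, 0, 1)                 # [num_res, num_seq, c_m]
    q, k, v = x @ p["wq"], x @ p["wk"], x @ p["wv"]
    logits = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    weights = softmax(logits, axis=-1)         # [num_res, num_seq, num_seq]
    gated = sigmoid(x @ p["wg"]) * (weights @ v)
    return np.swapaxes(gated @ p["wo"], 0, 1)  # back to [num_seq, num_res, c_m]

rng = np.random.default_rng(0)
c_m, d = 8, 4
p = {name: rng.normal(size=(c_m, d)) for name in ("wq", "wk", "wv", "wg")}
p["wo"] = rng.normal(size=(d, c_m))
msa = rng.normal(size=(3, 5, c_m))
col_update = msa_column_attention(msa, p)

# Columns are processed independently: perturbing column 0 leaves the others unchanged.
perturbed = msa.copy()
perturbed[:, 0] += 1.0
col_update_perturbed = msa_column_attention(perturbed, p)
```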

2.2.1.3 transition

The MSA transition layer is implemented by class Transition in modules.py.
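As described in the paper, the transition is a small per-position MLP: layer normalization, an expanding linear layer, a ReLU, and a linear layer back to the original width. The sketch below assumes an expansion factor of 4 and uses random placeholder weights.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def transition(x, w1, b1, w2, b2):
    """Position-wise MLP applied independently to every element of the representation."""
    h = layer_norm(x)
    h = np.maximum(h @ w1 + b1, 0.0)   # expand channels, then ReLU
    return h @ w2 + b2                 # project back to the input width

rng = np.random.default_rng(0)
c, factor = 8, 4                       # factor of 4 is an assumption for this sketch
w1, b1 = rng.normal(size=(c, factor * c)), np.zeros(factor * c)
w2, b2 = rng.normal(size=(factor * c, c)), np.zeros(c)
msa = rng.normal(size=(3, 5, c))
msa_update = transition(msa, w1, b1, w2, b2)
```

Because the function only acts on the last axis, the same code applies unchanged to a pair representation of shape [num_res, num_res, c].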

2.2.1.4 Outer product mean

In modules.py, OuterProductMean implements the outer product mean.
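The outer product mean is the step that passes information from the MSA representation to the pair representation. A minimal sketch under the paper's formulation: project each MSA entry to two small vectors, take the outer product for every residue pair, average over sequences, and project to the pair channel width. Weight names and sizes are placeholders.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def outer_product_mean(msa, wa, wb, wo, bo):
    m = layer_norm(msa)
    a, b = m @ wa, m @ wb                          # each [num_seq, num_res, c]
    # Outer product over channels for residues i and j, averaged over sequences s.
    outer = np.einsum("sic,sjd->ijcd", a, b) / msa.shape[0]
    flat = outer.reshape(outer.shape[0], outer.shape[1], -1)
    return flat @ wo + bo                          # [num_res, num_res, c_z]

rng = np.random.default_rng(0)
c_m, c, c_z = 8, 4, 6
wa, wb = rng.normal(size=(c_m, c)), rng.normal(size=(c_m, c))
wo, bo = rng.normal(size=(c * c, c_z)), np.zeros(c_z)
msa = rng.normal(size=(3, 5, c_m))
pair_update = outer_product_mean(msa, wa, wb, wo, bo)
# Averaging over sequences means that duplicating the whole MSA changes nothing.
pair_update_dup = outer_product_mean(np.concatenate([msa, msa]), wa, wb, wo, bo)
```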

2.2.2 pair representation branch

2.2.2.1 triangular multiplicative update using “outgoing” /“incoming” edges

The structure of the triangular multiplicative update using “outgoing”/“incoming” edges is shown in the figure below:

Triangular multiplicative update using “outgoing” edges.
In modules.py, class TriangleMultiplication implements the triangular multiplicative update using “outgoing”/“incoming” edges; whether it is “outgoing” or “incoming” is selected by an input parameter.
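The two variants differ only in which index is summed over, which is why a single class can cover both. The sketch below follows the paper's formulation; the boolean `outgoing` flag and the parameter names are illustrative assumptions, not the source's actual configuration mechanism.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def triangle_multiplication(pair, p, outgoing=True):
    z = layer_norm(pair)
    a = sigmoid(z @ p["a_gate"]) * (z @ p["a_proj"])   # gated projection, [r, r, c]
    b = sigmoid(z @ p["b_gate"]) * (z @ p["b_proj"])
    if outgoing:
        x = np.einsum("ikc,jkc->ijc", a, b)   # edges i->k and j->k
    else:
        x = np.einsum("kic,kjc->ijc", a, b)   # edges k->i and k->j
    gate = sigmoid(z @ p["out_gate"])
    return gate * (layer_norm(x) @ p["out_proj"])

rng = np.random.default_rng(0)
r, c_z, c = 5, 6, 4
p = {name: rng.normal(size=(c_z, c)) for name in ("a_gate", "a_proj", "b_gate", "b_proj")}
p["out_gate"] = rng.normal(size=(c_z, c_z))
p["out_proj"] = rng.normal(size=(c, c_z))
pair = rng.normal(size=(r, r, c_z))
update_outgoing = triangle_multiplication(pair, p, outgoing=True)
update_incoming = triangle_multiplication(pair, p, outgoing=False)
```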

2.2.2.2 triangle self-attention around starting/ending node

The structure of the triangle self-attention around the starting/ending node is shown in the figure below:

Triangular self-attention around starting node
In modules.py, class TriangleAttention implements the triangle self-attention around the starting/ending node; whether it is starting or ending is selected by an input parameter.
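A single-head sketch of the idea, following the paper's formulation: for the starting-node variant, row i attends over its edges (i, k) with a bias taken from the third edge (j, k). The ending-node variant is obtained here by transposing the pair representation before and after, which is one way to share the code between the two; parameter names and the `starting` flag are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def triangle_attention(pair, p, starting=True):
    z = pair if starting else np.swapaxes(pair, 0, 1)
    z = layer_norm(z)
    q, k, v = z @ p["wq"], z @ p["wk"], z @ p["wv"]        # each [r, r, d]
    bias = (z @ p["wb"])[..., 0]                           # [r, r], indexed as (j, k)
    logits = np.einsum("ijd,ikd->ijk", q, k) / np.sqrt(q.shape[-1]) + bias[None]
    weights = softmax(logits, axis=-1)                     # attention over k
    out = sigmoid(z @ p["wg"]) * np.einsum("ijk,ikd->ijd", weights, v)
    out = out @ p["wo"]
    return out if starting else np.swapaxes(out, 0, 1)

rng = np.random.default_rng(0)
r, c_z, d = 5, 6, 4
p = {name: rng.normal(size=(c_z, d)) for name in ("wq", "wk", "wv", "wg")}
p["wb"] = rng.normal(size=(c_z, 1))
p["wo"] = rng.normal(size=(d, c_z))
pair = rng.normal(size=(r, r, c_z))
update_start = triangle_attention(pair, p, starting=True)
update_end = triangle_attention(pair, p, starting=False)
```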

The transition is the same as in the MSA representation branch.

2.3 Structure Module

2.3.1 Invariant Point Attention Module

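The Structure Module code lives in folding.py. Its inputs are the pair representation from the previous stage, the first row of the MSA representation (the row for the target sequence, which by now also carries information from the other homologous sequences), and the backbone frames (built with the Gram-Schmidt procedure from the positions of the N, Cα, and C atoms in PDB structures). The single representation is updated through IPA using the backbone frames, and the backbone is in turn updated from the single representation via a quaternion and a translation vector. The module then computes the coordinates of all atoms to place the side chains, renames atoms according to their coordinates, and checks for clashes. Finally, the structure is relaxed iteratively so that its force-field energy is minimized.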
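Two of the steps above are easy to sketch in isolation: building a backbone frame from three atoms with Gram-Schmidt, and applying a quaternion-plus-translation update to that frame. The sketch follows the formulation in the paper's supplementary algorithms; the function names and the atom coordinates are illustrative, and in the real model the update values come from a linear layer on the single representation.

```python
import numpy as np

def rigid_from_3_points(n, ca, c):
    """Gram-Schmidt frame: origin at CA, first axis along CA->C, N in the xy-plane."""
    v1, v2 = c - ca, n - ca
    e1 = v1 / np.linalg.norm(v1)
    u2 = v2 - e1 * np.dot(e1, v2)          # remove the component along e1
    e2 = u2 / np.linalg.norm(u2)
    e3 = np.cross(e1, e2)
    return np.stack([e1, e2, e3], axis=-1), ca   # rotation (columns = axes), translation

def backbone_update(rotation, translation, bcd, delta_t):
    """Compose the frame with a small rigid update given as quaternion (1, b, c, d)."""
    quat = np.concatenate([[1.0], bcd])
    a, b, c, d = quat / np.linalg.norm(quat)     # normalizing yields a valid rotation
    r_upd = np.array([
        [a*a + b*b - c*c - d*d, 2*b*c - 2*a*d,         2*b*d + 2*a*c],
        [2*b*c + 2*a*d,         a*a - b*b + c*c - d*d, 2*c*d - 2*a*b],
        [2*b*d - 2*a*c,         2*c*d + 2*a*b,         a*a - b*b - c*c + d*d],
    ])
    # The update is expressed in the local frame, so it is applied on the right.
    return rotation @ r_upd, rotation @ delta_t + translation

# Illustrative coordinates in angstroms, not taken from a real structure.
n = np.array([1.2, 2.0, 0.3])
ca = np.array([2.0, 1.0, 0.0])
c = np.array([3.4, 1.3, 0.5])
rot, trans = rigid_from_3_points(n, ca, c)
new_rot, new_trans = backbone_update(rot, trans, np.array([0.1, -0.2, 0.05]), np.array([0.5, 0.0, -0.3]))
same_rot, same_trans = backbone_update(rot, trans, np.zeros(3), np.zeros(3))
```

Predicting only (b, c, d) and fixing the first quaternion component to 1 means a zero output corresponds to the identity rotation, so an untrained update layer leaves the frames unchanged.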

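The structure of the Invariant Point Attention module is shown in the figure below:

class InvariantPointAttention implements the Invariant Point Attention module, compute_renamed_ground_truth handles the atom renaming, the compute_violation_metrics method determines whether the predicted 3D structure contains violations, and class MultiRigidSidechain updates the side chains.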
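To give a feel for what a violation check involves, here is a deliberately simplified clash counter. It is not the logic of compute_violation_metrics: the function `count_clashes`, the uniform radius of 1.7 Å, and the 1.5 Å tolerance are all assumptions made for this sketch, and bond-length and bond-angle checks are left out.

```python
import numpy as np

def count_clashes(coords, radii, residue_ids, tolerance=1.5):
    """Count atom pairs from different residues that overlap by more than the tolerance."""
    dists = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    limit = radii[:, None] + radii[None, :] - tolerance
    different_residue = residue_ids[:, None] != residue_ids[None, :]
    upper = np.triu(np.ones(dists.shape, dtype=bool), k=1)   # count each pair once
    return int(np.sum((dists < limit) & different_residue & upper))

# Four toy atoms on a line (angstroms); atoms 0 and 1 share a residue.
coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
radii = np.full(4, 1.7)
residue_ids = np.array([0, 0, 1, 2])
num_clashes = count_clashes(coords, radii, residue_ids)
```

In this toy input, atom 2 sits too close to both atoms of residue 0, while the close pair inside residue 0 is ignored because atoms of the same residue are excluded.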