Paper Notes: "A 3D Generative Model for Structure-Based Drug Design"

A 3D Generative Model for Structure-Based Drug Design.

Autoregressive Model
Advances in Neural Information Processing Systems, 34, 2021.
[Figure: overall pipeline of the model]

Pipeline:

$C$ and a query coordinate $r$ → Encoder + Classifier → $p(e \mid r, C)$: the probability of element $e$ occurring at position $r$.
$C$: a set of atoms $C = \{(a_i, r_i)\}_{i=1}^{N_b}$, where $N_b$ is the number of atoms in the binding site;
$a_i$: the $i$-th atom's attributes, such as its chemical element and the amino acid it belongs to;
$r_i$: the $i$-th atom's 3D coordinate.
To generate atoms in the binding site, we model the probability of an atom occurring at some position $r$ in the site. Formally, this means modeling the density $p(e \mid r, C)$, where $r \in \mathbb{R}^3$ is an arbitrary 3D coordinate and $e \in \mathcal{E} = \{\mathrm{H}, \mathrm{C}, \mathrm{O}, \ldots\}$ is a chemical element. Intuitively, this density can be interpreted as a classifier that takes as input a 3D coordinate $r$, conditional on $C$, and predicts the probability of $r$ being occupied by an atom of type $e$.
To model $p(e \mid r, C)$, we devise a model consisting of two parts: a Context Encoder and a Spatial Classifier.

1. Context Encoder

Goal: extract information-rich representations for each atom in $C$.
Requirements:
(1) Context-awareness: the representation of an atom should encode not only the properties of the atom itself but also its context.
(2) Rotational and translational invariance: since the physical and biological properties of the system do not change under rigid transforms, the representations that reflect these properties should be invariant to rigid transforms as well. To this end, we employ rotationally and translationally invariant graph neural networks.
Steps:
(1) Construct a k-nearest-neighbor graph based on inter-atomic distances, denoted $G = \langle C, A \rangle$, where $A$ is the adjacency matrix. For convenience, we denote the k-NN neighborhood of atom $i$ as $N_k(r_i)$. The context encoder takes $G$ as input and outputs structure-aware node embeddings.
$\{a_i\}$ → (linear layer) → $\{h_i^{(0)}\}$ → ($L$ message passing layers) → $\{h_i^{(L)}\}$
(2) The first layer of the encoder is a linear layer that maps atomic attributes $\{a_i\}$ to initial embeddings $\{h_i^{(0)}\}$. These embeddings, along with the graph structure $A$, are then fed into $L$ message passing layers. Specifically, the message passing update takes the form:
$$h_i^{(\ell+1)} = \sigma\Big(W_0 h_i^{(\ell)} + \sum_{j \in N_k(r_i)} W_1\, w(d_{ij}) \odot W_2 h_j^{(\ell)}\Big) \tag{1}$$
where $w(\cdot)$ is a weight network and $d_{ij}$ denotes the distance between atom $i$ and atom $j$. The formula is similar to a continuous-filter convolution [25]. Note that the weight of the message from $j$ to $i$ depends only on $d_{ij}$, ensuring invariance to rotation and translation. Finally, we obtain $\{h_i^{(L)}\}$, a set of embeddings, one for each atom in $C$.
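To make the update concrete, here is a minimal PyTorch sketch of one such distance-gated message passing layer. The Gaussian RBF distance expansion, layer widths, and activation choices are assumptions for illustration, not details from the paper:

```python
import torch
import torch.nn as nn

class InvariantMPLayer(nn.Module):
    """One message passing layer in the spirit of continuous-filter
    convolution: the weight of each message depends only on the
    inter-atomic distance, so the update is invariant to rotations
    and translations of the whole system."""
    def __init__(self, hidden_dim: int, num_rbf: int = 16):
        super().__init__()
        self.num_rbf = num_rbf
        # w(.): distance (expanded in Gaussian RBFs, an assumption)
        # -> per-channel message weight.
        self.weight_net = nn.Sequential(
            nn.Linear(num_rbf, hidden_dim), nn.SiLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        self.lin_self = nn.Linear(hidden_dim, hidden_dim)   # plays the role of W0
        self.lin_msg = nn.Linear(hidden_dim, hidden_dim)    # transforms neighbor embeddings

    def rbf(self, d: torch.Tensor) -> torch.Tensor:
        # Expand each distance over Gaussian centers in [0, 10] Å.
        centers = torch.linspace(0.0, 10.0, self.num_rbf, device=d.device)
        return torch.exp(-(d.unsqueeze(-1) - centers) ** 2)

    def forward(self, h, edge_index, d):
        # h: (N, D) node embeddings; edge_index: (2, E) k-NN edges j -> i;
        # d: (E,) inter-atomic distances d_ij.
        src, dst = edge_index
        # The message from j to i is gated only by d_ij (invariant weight).
        msg = self.weight_net(self.rbf(d)) * self.lin_msg(h[src])
        agg = torch.zeros_like(h).index_add_(0, dst, msg)
        return torch.relu(self.lin_self(h) + agg)
```

Because coordinates enter only through pairwise distances, rotating or translating the whole binding site leaves every embedding unchanged, which is exactly the invariance requirement above.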

2. Spatial Classifier

The spatial classifier takes as input a query position $r \in \mathbb{R}^3$ and predicts the type of atom occupying $r$. In order to make successful predictions, the model should be able to perceive the context around $r$.
(1) The first step is to aggregate atom embeddings from the context encoder:
$$v = \sum_{j \in N_k(r)} w_{\mathrm{aggr}}(\|r - r_j\|) \odot h_j^{(L)} \tag{2}$$
where $N_k(r)$ is the k-nearest neighborhood of $r$. Note that we weight the different embeddings using the weight network $w_{\mathrm{aggr}}(\cdot)$ according to distances, because it is necessary to distinguish the contributions of different atoms in the context (similar to an attention mechanism).
(2) Finally, to predict $p(e \mid r, C)$, the aggregated feature $v$ is passed to a classical multi-layer perceptron classifier:
$$c = \mathrm{MLP}(v) \tag{3}$$
where $c$ is the non-normalized probability over chemical elements. The estimated probability of position $r$ being occupied by an atom of type $e$ is:
$$p(e \mid r, C) = \frac{\exp(c[e])}{1 + \sum_{e' \in \mathcal{E}} \exp(c[e'])} \tag{4}$$
where $\mathcal{E}$ is the set of possible chemical elements.

Notes: (1) A softmax is not used because, even when the scores of all elements are small (i.e., the position is likely empty), softmax would force the total probability to 1. The extra 1 in the denominator of Eq. 4 instead reserves probability mass for the case that no atom occupies $r$.
(2) Non-normalized: the probability distribution may not sum to 1.
Non-standardized (a different concept): no standardization such as zero-mean or unit-variance scaling has been performed.
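Putting Eqs. 2 to 4 together, a minimal PyTorch sketch of the spatial classifier could look as follows; the layer sizes and the exact form of $w_{\mathrm{aggr}}$ are assumptions:

```python
import torch
import torch.nn as nn

class SpatialClassifier(nn.Module):
    """Aggregates context embeddings around a query point r (Eq. 2),
    predicts non-normalized element scores c (Eq. 3), and normalizes
    with an extra 1 in the denominator so that 'no atom here' keeps
    its own probability mass (Eq. 4)."""
    def __init__(self, hidden_dim: int, num_elements: int):
        super().__init__()
        self.w_aggr = nn.Sequential(        # w_aggr(.): distance -> weight
            nn.Linear(1, hidden_dim), nn.SiLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.SiLU(),
            nn.Linear(hidden_dim, num_elements),
        )

    def forward(self, r, context_pos, context_h):
        # r: (3,) query; context_pos: (k, 3) k-NN atoms; context_h: (k, D).
        d = (context_pos - r).norm(dim=-1, keepdim=True)   # (k, 1) distances
        v = (self.w_aggr(d) * context_h).sum(dim=0)        # Eq. 2: aggregate
        c = self.mlp(v)                                    # Eq. 3: scores
        # Eq. 4: the "+1" leaves room for p(Nothing | r, C).
        denom = 1.0 + torch.exp(c).sum()
        p_elem = torch.exp(c) / denom       # p(e | r, C) for each element e
        p_nothing = 1.0 / denom             # probability that r is empty
        return p_elem, p_nothing
```

Note how `p_nothing` falls out of the same denominator; this is the quantity the BCE loss in the Training section operates on.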

3. Sampling

Sampling a molecule amounts to generating a set of atoms $\{(e_i, r_i)\}_{i=1}^{N_a}$. However, formulating an effective sampling algorithm is non-trivial because of the following three challenges.
(1) Joint Distribution: the algorithm must jointly sample an atom's chemical element and its position. We define the joint distribution of coordinate $r$ and atom type $e$ as:
$$p(e, r \mid C) = \frac{1}{Z} \exp(c[e]) \tag{5}$$
where $Z$ is an unknown normalizing constant and $c$ is a function of $r$ and $C$ as defined in Eq. 3. Though $p(e, r \mid C)$ is a non-normalized distribution, drawing samples from it is efficient because the dimension of $r$ is only 3. Viable sampling methods include Markov chain Monte Carlo (MCMC) and discretization.
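As a concrete (hypothetical) instance of the discretization option, the sketch below draws one atom from the unnormalized joint by scoring a fixed grid of candidate positions; `element_scores` is an assumed wrapper that returns the vector $c$ from Eq. 3 for a given coordinate:

```python
import torch

def sample_atom_by_discretization(element_scores, grid, temperature=1.0):
    """Sketch of drawing (e, r) from the unnormalized joint
    p(e, r | C) ∝ exp(c[e]) by discretizing the binding site into grid
    points (one of the two options the paper mentions; MCMC is the
    other). `grid` is an (M, 3) tensor of candidate coordinates."""
    # Stack the unnormalized log-densities c[e] over all grid points.
    logits = torch.stack([element_scores(r) for r in grid])     # (M, E)
    flat = (logits / temperature).flatten()                     # (M * E,)
    # Softmax over the joint grid-and-element axis plays the role of 1/Z.
    idx = torch.multinomial(torch.softmax(flat, dim=0), 1).item()
    m, e = divmod(idx, logits.shape[1])
    return e, grid[m]   # element index and 3D coordinate
```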
(2) Auto-Regressive Sampling: the sampling algorithm should be able to attend to the dependencies between atoms, not simply draw i.i.d. samples from $p(e, r \mid C)$.
We sample a molecule by progressively sampling one atom at each step. Specifically, at step $t$, the context $C_t$ contains not only protein atoms but also the $t$ atoms sampled beforehand. Sampled atoms in $C_t$ are treated the same way as protein atoms by the model, but carry different attributes to differentiate them from protein atoms.
To determine when auto-regressive sampling should stop, we employ an auxiliary network. The network takes as input the embeddings of previously sampled atoms and classifies them into two categories: frontier and non-frontier. If all the existing atoms are non-frontier, which means there is no room for more atoms, the sampling is terminated. Finally, we use OpenBabel to obtain the bonds of the generated structures.
[Figure: illustration of the auto-regressive sampling process]
(3) Multi-Modality: the sampling algorithm should produce multi-modal samples. This is important because in reality there is usually more than one molecule that can bind to a specific target.
Auto-regressive sampling is a stochastic process; its sampling paths naturally diverge, leading to diverse samples.
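The overall generation loop then interleaves re-encoding the context, checking the frontier network, and drawing from the joint. Below is a schematic Python sketch reusing the grid sampler above; `encode`, `scores_fn`, and `frontier_fn` are assumed interfaces, not the paper's actual API:

```python
def generate_molecule(encode, scores_fn, frontier_fn, grid, protein_atoms,
                      max_atoms=50):
    """Sketch of auto-regressive sampling. `encode(atoms)` returns per-atom
    embeddings, `scores_fn(r, atoms, h)` returns c from Eq. 3, and
    `frontier_fn(h_i)` returns a scalar frontier logit."""
    ligand = []                               # sampled (element, position) pairs
    for t in range(max_atoms):
        context = protein_atoms + ligand      # C_t: protein + atoms sampled so far
        h = encode(context)
        # Terminate once every generated atom is classified non-frontier.
        if ligand and all(frontier_fn(h_i) <= 0
                          for h_i in h[len(protein_atoms):]):
            break
        elem, pos = sample_atom_by_discretization(
            lambda r: scores_fn(r, context, h), grid)
        ligand.append((elem, pos))
    # Afterwards: infer bonds with OpenBabel (omitted here).
    return ligand
```

Since each draw is stochastic, repeated calls naturally produce the diverse, multi-modal samples described above.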

4. Training

(1) Loss
As we adopt an auto-regressive sampling strategy, we propose a cloze-filling training scheme: at training time, a random portion of the target molecule is masked, and the network learns to predict the masked part from the observable part and the binding site. This emulates the sampling process, where the model can only observe partial molecules.
$$L = L_{\mathrm{BCE}} + L_{\mathrm{CAT}} + L_F$$
(2) $L_{\mathrm{BCE}}$
First, to make sure the model is able to predict positions that actually have atoms (positive positions), we include a binary cross entropy loss that contrasts positive positions against negative positions:
$$L_{\mathrm{BCE}} = -\,\mathbb{E}_{r \sim p_+}\big[\log\big(1 - p(\text{Nothing} \mid r, C)\big)\big] - \mathbb{E}_{r \sim p_-}\big[\log p(\text{Nothing} \mid r, C)\big] \tag{8}$$
where $p(\text{Nothing} \mid r, C) = 1 / \big(1 + \sum_{e' \in \mathcal{E}} \exp(c[e'])\big)$ is the probability that no atom occupies $r$, as implied by Eq. 4.
Here, $p_+$ is a positive sampler that yields the coordinates of masked atoms. $p_-$ is a negative sampler that yields random coordinates in the ambient space; it is empirically defined as a Gaussian mixture model with $|C|$ components centered at each atom in $C$. The standard deviation of each component is set to 2 Å in order to cover the ambient space. Intuitively, the first term in Eq. 8 increases the likelihood of atom placement at positions that should get an atom, and the second term decreases the likelihood at other positions.
This encourages correct placements and suppresses incorrect ones: at positive positions, the predicted occupancy $1 - p(\text{Nothing} \mid r, C)$ should approach 1; at negative positions, it should approach 0.
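A sketch of the negative sampler $p_-$ as described: a Gaussian mixture centered at the context atoms with a standard deviation of 2 Å (the uniform component weights are an assumption):

```python
import torch

def sample_negative_positions(context_pos, n, std=2.0):
    """Sketch of p-: draw n random coordinates from a Gaussian mixture
    with one component per context atom (std 2 Å), covering the
    ambient space around the binding site."""
    # Pick a mixture component (context atom) uniformly for each sample.
    idx = torch.randint(0, context_pos.shape[0], (n,))
    return context_pos[idx] + std * torch.randn(n, 3)
```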
(3) $L_{\mathrm{CAT}}$
Second, our model should be able to predict the chemical element of atoms. Hence, we further include a standard categorical cross entropy loss:
$$L_{\mathrm{CAT}} = -\,\mathbb{E}_{(e, r) \sim p_+}\big[\log p(e \mid r, C)\big]$$
(4) $L_F$
Third, the sampling algorithm requires a frontier network to tell whether sampling should terminate. This leads to the last term, a standard binary cross entropy loss (analogous to the positive/negative positions above) for training the frontier network:
atom embeddings → $F(\cdot)$ → logit probability of the atom being a frontier
$$L_F = -\sum_{i \in \mathcal{F}} \log \sigma\big(F(h_i)\big) - \sum_{j \notin \mathcal{F}} \log\Big(1 - \sigma\big(F(h_j)\big)\Big)$$
where $\mathcal{F}$ is the set of frontier atoms in $C$, $\sigma$ is the sigmoid function, and $F(\cdot)$ is the frontier network that takes an atom embedding as input and predicts the logit probability of the atom being a frontier. During training, an atom is regarded as a frontier if and only if (1) the atom is part of the target molecule, and (2) at least one of its bonded atoms is masked.
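Combining the three terms, a schematic training-loss computation might look like this. The `model` interface (`encode`, `p_nothing`, `element_scores`, `frontier_logits`) and the batch objects are entirely assumed for illustration; the negative sampler is the one sketched above:

```python
import torch
import torch.nn.functional as F

def training_losses(model, masked_mol, context, frontier_labels):
    """Sketch of the total loss L = L_BCE + L_CAT + L_F under the
    cloze-filling scheme. `context` holds the binding site plus the
    observable (unmasked) part of the molecule; `masked_mol` holds the
    coordinates and true elements of the masked atoms; `frontier_labels`
    is a 0/1 float tensor built from the two conditions in the text."""
    h = model.encode(context)
    pos_r = masked_mol.coords                            # positive positions
    neg_r = sample_negative_positions(context.coords, len(pos_r))
    p_occupied = 1.0 - model.p_nothing(pos_r, h)         # should -> 1
    p_empty = model.p_nothing(neg_r, h)                  # should -> 1
    l_bce = -(torch.log(p_occupied).mean() + torch.log(p_empty).mean())
    # Categorical CE on the true elements of the masked atoms.
    l_cat = F.cross_entropy(model.element_scores(pos_r, h),
                            masked_mol.elements)
    # Frontier BCE on per-atom logits from F(.).
    l_f = F.binary_cross_entropy_with_logits(model.frontier_logits(h),
                                             frontier_labels)
    return l_bce + l_cat + l_f
```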

5. Experiments

Data: the CrossDocked dataset.
Model: a single universal model is trained for all tasks. The number of message passing layers in the context encoder is $L = 6$, and the hidden dimension is 256. The model is trained with the Adam optimizer at a learning rate of 0.0001.
Metrics: Binding Affinity (e.g., Vina score), Drug-Likeness (e.g., QED), Synthesizability (e.g., SA score), Percentage of Samples with High Affinity, and Diversity.
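For reference, one of these metrics can be computed directly from generated SMILES with RDKit. This is a minimal sketch, not the paper's evaluation code: QED ships with RDKit, while the SA score needs the `sascorer` module from RDKit's Contrib directory, and Vina scores require a separate docking run, so both are omitted:

```python
from rdkit import Chem
from rdkit.Chem import QED

def qed_scores(smiles_list):
    """Compute QED drug-likeness for a list of generated SMILES,
    silently skipping strings RDKit cannot parse."""
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    return [QED.qed(m) for m in mols if m is not None]
```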

6. Results

[Table/Figure: quantitative results from the paper]
