Paper Notes: "A 3D Generative Model for Structure-Based Drug Design"

A 3D Generative Model for Structure-Based Drug Design.

Autoregressive Model
Advances in Neural Information Processing Systems, 34, 2021.
[Figure: overall pipeline of the model]

Pipeline:

$C$ and a query coordinate $r$ → Encoder + Classifier → $p(e \mid r, C)$: the probability of element $e$ occurring at position $r$.
$C$: a set of atoms $C = \{(a_i, r_i)\}_{i=1}^{N_b}$, where $N_b$ is the number of atoms in the binding site;
$a_i$: the $i$-th atom's attributes, such as its chemical element and the amino acid it belongs to;
$r_i$: the $i$-th atom's 3D coordinate.
To generate atoms in the binding site, we model the probability of an atom occurring at some position $r$ in the site. Formally, this means modeling the density $p(e \mid r, C)$, where $r \in \mathbb{R}^3$ is an arbitrary 3D coordinate and $e \in \mathcal{E} = \{\mathrm{H}, \mathrm{C}, \mathrm{O}, \ldots\}$ is a chemical element. Intuitively, this density can be interpreted as a classifier that takes as input a 3D coordinate $r$, conditional on $C$, and predicts the probability of $r$ being occupied by an atom of type $e$.
To model $p(e \mid r, C)$, we devise a model consisting of two parts: a Context Encoder and a Spatial Classifier.

1. Context Encoder

Goal: extract information-rich representations for each atom in $C$.
Requirements:
(1) Context-awareness: the representation of an atom should encode not only the properties of the atom itself but also its context.
(2) Rotational and translational invariance: since the physical and biological properties of the system do not change under rigid transforms, the representations that reflect these properties should be invariant to rigid transforms as well. To this end, we employ rotationally and translationally invariant graph neural networks.
Steps:
(1) Construct a k-nearest-neighbor graph based on inter-atomic distances, denoted $G = \langle C, A \rangle$, where $A$ is the adjacency matrix. For convenience, we denote the k-NN neighborhood of atom $i$ as $N_k(r_i)$. The context encoder takes $G$ as input and outputs structure-aware node embeddings.
$\{a_i\}$ → (linear layer) → $\{h_i^{(0)}\}$ → ($L$ message passing layers) → $\{h_i^{(L)}\}$
(2) The first layer of the encoder is a linear layer that maps atomic attributes $\{a_i\}$ to initial embeddings $\{h_i^{(0)}\}$. These embeddings, along with the graph structure $A$, are then fed into $L$ message passing layers. Specifically, the message passing update takes the form:
$$h_i^{(\ell+1)} = \sigma\Big(W_0 h_i^{(\ell)} + \sum_{j \in N_k(r_i)} W_1\, w(d_{ij}) \odot W_2 h_j^{(\ell)}\Big) \tag{1}$$
where $w(\cdot)$ is a weight network and $d_{ij}$ denotes the distance between atom $i$ and atom $j$. The formula is similar to a continuous-filter convolution [25]. Note that the weight of the message from $j$ to $i$ depends only on $d_{ij}$, ensuring invariance to rotation and translation. Finally, we obtain $\{h_i^{(L)}\}$, a set of embeddings, one for each atom in $C$.
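To make the update concrete, here is a minimal PyTorch sketch of one such distance-gated message passing layer. The Gaussian RBF distance expansion, layer widths, and activation choices are assumptions for illustration, not details from the paper:

```python
import torch
import torch.nn as nn

class InvariantMPLayer(nn.Module):
    """One message passing layer in the spirit of continuous-filter
    convolution: the weight of each message depends only on the
    inter-atomic distance, so the update is invariant to rotations
    and translations of the whole system."""
    def __init__(self, hidden_dim: int, num_rbf: int = 16):
        super().__init__()
        self.num_rbf = num_rbf
        # w(.): distance (expanded in Gaussian RBFs, an assumption)
        # -> per-channel message weight.
        self.weight_net = nn.Sequential(
            nn.Linear(num_rbf, hidden_dim), nn.SiLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        self.lin_self = nn.Linear(hidden_dim, hidden_dim)   # plays the role of W0
        self.lin_msg = nn.Linear(hidden_dim, hidden_dim)    # transforms neighbor embeddings

    def rbf(self, d: torch.Tensor) -> torch.Tensor:
        # Expand each distance over Gaussian centers in [0, 10] Å.
        centers = torch.linspace(0.0, 10.0, self.num_rbf, device=d.device)
        return torch.exp(-(d.unsqueeze(-1) - centers) ** 2)

    def forward(self, h, edge_index, d):
        # h: (N, D) node embeddings; edge_index: (2, E) k-NN edges j -> i;
        # d: (E,) inter-atomic distances d_ij.
        src, dst = edge_index
        # The message from j to i is gated only by d_ij (invariant weight).
        msg = self.weight_net(self.rbf(d)) * self.lin_msg(h[src])
        agg = torch.zeros_like(h).index_add_(0, dst, msg)
        return torch.relu(self.lin_self(h) + agg)
```

Because coordinates enter only through pairwise distances, rotating or translating the whole binding site leaves every embedding unchanged, which is exactly the invariance requirement above.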

2. Spatial Classifier

The spatial classifier takes as input a query position $r \in \mathbb{R}^3$ and predicts the type of atom occupying $r$. In order to make successful predictions, the model should be able to perceive the context around $r$.
(1) The first step is to aggregate atom embeddings from the context encoder:
$$v = \sum_{j \in N_k(r)} w_{\mathrm{aggr}}(\|r - r_j\|) \odot h_j^{(L)} \tag{2}$$
where $N_k(r)$ is the k-nearest neighborhood of $r$. Note that we weight the different embeddings using the weight network $w_{\mathrm{aggr}}(\cdot)$ according to distances, because it is necessary to distinguish the contributions of different atoms in the context (similar to an attention mechanism).
(2) Finally, to predict $p(e \mid r, C)$, the aggregated feature $v$ is passed to a classical multi-layer perceptron classifier:
$$c = \mathrm{MLP}(v) \tag{3}$$
where $c$ is the non-normalized probability over chemical elements. The estimated probability of position $r$ being occupied by an atom of type $e$ is:
$$p(e \mid r, C) = \frac{\exp(c[e])}{1 + \sum_{e' \in \mathcal{E}} \exp(c[e'])} \tag{4}$$
where $\mathcal{E}$ is the set of possible chemical elements.

Notes: (1) A softmax is not used because, even when the scores of all elements are small (i.e., the position is likely empty), softmax would force the total probability to 1. The extra 1 in the denominator of Eq. 4 instead reserves probability mass for the case that no atom occupies $r$.
(2) Non-normalized: the probability distribution may not sum to 1.
Non-standardized (a different concept): no standardization such as zero-mean or unit-variance scaling has been performed.
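Putting Eqs. 2 to 4 together, a minimal PyTorch sketch of the spatial classifier could look as follows; the layer sizes and the exact form of $w_{\mathrm{aggr}}$ are assumptions:

```python
import torch
import torch.nn as nn

class SpatialClassifier(nn.Module):
    """Aggregates context embeddings around a query point r (Eq. 2),
    predicts non-normalized element scores c (Eq. 3), and normalizes
    with an extra 1 in the denominator so that 'no atom here' keeps
    its own probability mass (Eq. 4)."""
    def __init__(self, hidden_dim: int, num_elements: int):
        super().__init__()
        self.w_aggr = nn.Sequential(        # w_aggr(.): distance -> weight
            nn.Linear(1, hidden_dim), nn.SiLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.SiLU(),
            nn.Linear(hidden_dim, num_elements),
        )

    def forward(self, r, context_pos, context_h):
        # r: (3,) query; context_pos: (k, 3) k-NN atoms; context_h: (k, D).
        d = (context_pos - r).norm(dim=-1, keepdim=True)   # (k, 1) distances
        v = (self.w_aggr(d) * context_h).sum(dim=0)        # Eq. 2: aggregate
        c = self.mlp(v)                                    # Eq. 3: scores
        # Eq. 4: the "+1" leaves room for p(Nothing | r, C).
        denom = 1.0 + torch.exp(c).sum()
        p_elem = torch.exp(c) / denom       # p(e | r, C) for each element e
        p_nothing = 1.0 / denom             # probability that r is empty
        return p_elem, p_nothing
```

Note how `p_nothing` falls out of the same denominator; this is the quantity the BCE loss in the Training section operates on.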

3. Sampling

Sampling a molecule amounts to generating a set of atoms $\{(e_i, r_i)\}_{i=1}^{N_a}$. However, formulating an effective sampling algorithm is non-trivial because of the following three challenges.
(1) Joint Distribution: the algorithm must jointly sample an atom's chemical element and its position. We define the joint distribution of coordinate $r$ and atom type $e$ as:
$$p(e, r \mid C) = \frac{1}{Z} \exp(c[e]) \tag{5}$$
where $Z$ is an unknown normalizing constant and $c$ is a function of $r$ and $C$ as defined in Eq. 3. Though $p(e, r \mid C)$ is a non-normalized distribution, drawing samples from it is efficient because the dimension of $r$ is only 3. Viable sampling methods include Markov chain Monte Carlo (MCMC) and discretization.
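As a concrete (hypothetical) instance of the discretization option, the sketch below draws one atom from the unnormalized joint by scoring a fixed grid of candidate positions; `element_scores` is an assumed wrapper that returns the vector $c$ from Eq. 3 for a given coordinate:

```python
import torch

def sample_atom_by_discretization(element_scores, grid, temperature=1.0):
    """Sketch of drawing (e, r) from the unnormalized joint
    p(e, r | C) ∝ exp(c[e]) by discretizing the binding site into grid
    points (one of the two options the paper mentions; MCMC is the
    other). `grid` is an (M, 3) tensor of candidate coordinates."""
    # Stack the unnormalized log-densities c[e] over all grid points.
    logits = torch.stack([element_scores(r) for r in grid])     # (M, E)
    flat = (logits / temperature).flatten()                     # (M * E,)
    # Softmax over the joint grid-and-element axis plays the role of 1/Z.
    idx = torch.multinomial(torch.softmax(flat, dim=0), 1).item()
    m, e = divmod(idx, logits.shape[1])
    return e, grid[m]   # element index and 3D coordinate
```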
(2) Auto-Regressive Sampling: the sampling algorithm should be able to attend to the dependencies between atoms, not simply draw i.i.d. samples from $p(e, r \mid C)$.
We sample a molecule by progressively sampling one atom at each step. Specifically, at step $t$, the context $C_t$ contains not only protein atoms but also the $t$ atoms sampled beforehand. Sampled atoms in $C_t$ are treated the same way as protein atoms by the model, but carry different attributes to differentiate them from protein atoms.
To determine when auto-regressive sampling should stop, we employ an auxiliary network. The network takes as input the embeddings of previously sampled atoms and classifies them into two categories: frontier and non-frontier. If all the existing atoms are non-frontier, which means there is no room for more atoms, the sampling is terminated. Finally, we use OpenBabel to obtain the bonds of the generated structures.
[Figure: illustration of the auto-regressive sampling process]
(3) Multi-Modality: the sampling algorithm should produce multi-modal samples. This is important because in reality there is usually more than one molecule that can bind to a specific target.
Auto-regressive sampling is a stochastic process; its sampling paths naturally diverge, leading to diverse samples.
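The overall generation loop then interleaves re-encoding the context, checking the frontier network, and drawing from the joint. Below is a schematic Python sketch reusing the grid sampler above; `encode`, `scores_fn`, and `frontier_fn` are assumed interfaces, not the paper's actual API:

```python
def generate_molecule(encode, scores_fn, frontier_fn, grid, protein_atoms,
                      max_atoms=50):
    """Sketch of auto-regressive sampling. `encode(atoms)` returns per-atom
    embeddings, `scores_fn(r, atoms, h)` returns c from Eq. 3, and
    `frontier_fn(h_i)` returns a scalar frontier logit."""
    ligand = []                               # sampled (element, position) pairs
    for t in range(max_atoms):
        context = protein_atoms + ligand      # C_t: protein + atoms sampled so far
        h = encode(context)
        # Terminate once every generated atom is classified non-frontier.
        if ligand and all(frontier_fn(h_i) <= 0
                          for h_i in h[len(protein_atoms):]):
            break
        elem, pos = sample_atom_by_discretization(
            lambda r: scores_fn(r, context, h), grid)
        ligand.append((elem, pos))
    # Afterwards: infer bonds with OpenBabel (omitted here).
    return ligand
```

Since each draw is stochastic, repeated calls naturally produce the diverse, multi-modal samples described above.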

4. Training

(1) Loss
As we adopt an auto-regressive sampling strategy, we propose a cloze-filling training scheme: at training time, a random portion of the target molecule is masked, and the network learns to predict the masked part from the observable part and the binding site. This emulates the sampling process, where the model can only observe partial molecules.
$$L = L_{\mathrm{BCE}} + L_{\mathrm{CAT}} + L_F$$
(2) $L_{\mathrm{BCE}}$
First, to make sure the model is able to predict positions that actually have atoms (positive positions), we include a binary cross entropy loss that contrasts positive positions against negative positions:
$$L_{\mathrm{BCE}} = -\,\mathbb{E}_{r \sim p_+}\big[\log\big(1 - p(\text{Nothing} \mid r, C)\big)\big] - \mathbb{E}_{r \sim p_-}\big[\log p(\text{Nothing} \mid r, C)\big] \tag{8}$$
where $p(\text{Nothing} \mid r, C) = 1 / \big(1 + \sum_{e' \in \mathcal{E}} \exp(c[e'])\big)$ is the probability that no atom occupies $r$, as implied by Eq. 4.
Here, $p_+$ is a positive sampler that yields the coordinates of masked atoms. $p_-$ is a negative sampler that yields random coordinates in the ambient space; it is empirically defined as a Gaussian mixture model with $|C|$ components centered at each atom in $C$. The standard deviation of each component is set to 2 Å in order to cover the ambient space. Intuitively, the first term in Eq. 8 increases the likelihood of atom placement at positions that should get an atom, and the second term decreases the likelihood at other positions.
This encourages correct placements and suppresses incorrect ones: at positive positions, the predicted occupancy $1 - p(\text{Nothing} \mid r, C)$ should approach 1; at negative positions, it should approach 0.
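A sketch of the negative sampler $p_-$ as described: a Gaussian mixture centered at the context atoms with a standard deviation of 2 Å (the uniform component weights are an assumption):

```python
import torch

def sample_negative_positions(context_pos, n, std=2.0):
    """Sketch of p-: draw n random coordinates from a Gaussian mixture
    with one component per context atom (std 2 Å), covering the
    ambient space around the binding site."""
    # Pick a mixture component (context atom) uniformly for each sample.
    idx = torch.randint(0, context_pos.shape[0], (n,))
    return context_pos[idx] + std * torch.randn(n, 3)
```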
(3) $L_{\mathrm{CAT}}$
Second, our model should be able to predict the chemical element of atoms. Hence, we further include a standard categorical cross entropy loss:
$$L_{\mathrm{CAT}} = -\,\mathbb{E}_{(e, r) \sim p_+}\big[\log p(e \mid r, C)\big]$$
(4) $L_F$
Third, the sampling algorithm requires a frontier network to tell whether sampling should terminate. This leads to the last term, a standard binary cross entropy loss (analogous to the positive/negative positions above) for training the frontier network:
atom embeddings → $F(\cdot)$ → logit probability of the atom being a frontier
$$L_F = -\sum_{i \in \mathcal{F}} \log \sigma\big(F(h_i)\big) - \sum_{j \notin \mathcal{F}} \log\Big(1 - \sigma\big(F(h_j)\big)\Big)$$
where $\mathcal{F}$ is the set of frontier atoms in $C$, $\sigma$ is the sigmoid function, and $F(\cdot)$ is the frontier network that takes an atom embedding as input and predicts the logit probability of the atom being a frontier. During training, an atom is regarded as a frontier if and only if (1) the atom is part of the target molecule, and (2) at least one of its bonded atoms is masked.
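Combining the three terms, a schematic training-loss computation might look like this. The `model` interface (`encode`, `p_nothing`, `element_scores`, `frontier_logits`) and the batch objects are entirely assumed for illustration; the negative sampler is the one sketched above:

```python
import torch
import torch.nn.functional as F

def training_losses(model, masked_mol, context, frontier_labels):
    """Sketch of the total loss L = L_BCE + L_CAT + L_F under the
    cloze-filling scheme. `context` holds the binding site plus the
    observable (unmasked) part of the molecule; `masked_mol` holds the
    coordinates and true elements of the masked atoms; `frontier_labels`
    is a 0/1 float tensor built from the two conditions in the text."""
    h = model.encode(context)
    pos_r = masked_mol.coords                            # positive positions
    neg_r = sample_negative_positions(context.coords, len(pos_r))
    p_occupied = 1.0 - model.p_nothing(pos_r, h)         # should -> 1
    p_empty = model.p_nothing(neg_r, h)                  # should -> 1
    l_bce = -(torch.log(p_occupied).mean() + torch.log(p_empty).mean())
    # Categorical CE on the true elements of the masked atoms.
    l_cat = F.cross_entropy(model.element_scores(pos_r, h),
                            masked_mol.elements)
    # Frontier BCE on per-atom logits from F(.).
    l_f = F.binary_cross_entropy_with_logits(model.frontier_logits(h),
                                             frontier_labels)
    return l_bce + l_cat + l_f
```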

5. Experiments

Data: the CrossDocked dataset.
Model: a single universal model is trained for all tasks. The number of message passing layers in the context encoder is $L = 6$, and the hidden dimension is 256. The model is trained with the Adam optimizer at a learning rate of 0.0001.
Metrics: Binding Affinity (e.g., Vina score), Drug-Likeness (e.g., QED), Synthesizability (e.g., SA score), Percentage of Samples with High Affinity, and Diversity.
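For reference, one of these metrics can be computed directly from generated SMILES with RDKit. This is a minimal sketch, not the paper's evaluation code: QED ships with RDKit, while the SA score needs the `sascorer` module from RDKit's Contrib directory, and Vina scores require a separate docking run, so both are omitted:

```python
from rdkit import Chem
from rdkit.Chem import QED

def qed_scores(smiles_list):
    """Compute QED drug-likeness for a list of generated SMILES,
    silently skipping strings RDKit cannot parse."""
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    return [QED.qed(m) for m in mols if m is not None]
```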

6. Results

[Table/Figure: quantitative results from the paper]
