Paper notes: "Generating 3D molecules conditional on receptor binding sites with deep generative models"

Latest recommended article published on 2024-09-11 12:11:23

嘿嘿我跑了



Category: 3D drug molecule design. Tags: artificial intelligence

Article link: https://blog.csdn.net/xulomh/article/details/137432319


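This post summarizes liGAN, a deep generative model (a conditional VAE) that generates 3D molecules conditioned on a receptor binding site. The pipeline covers atom typing, density grid construction, atom fitting, and bond inference, together with the training and evaluation of the conditional VAE. The experiments use the CrossDocked2020 dataset and evaluate the model on a range of metrics.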


Generating 3D molecules conditional on receptor binding sites with deep generative models.

liGAN: a conditional VAE model
Chem Sci, 13:2701–2713, Feb 2022.

Pipeline:

data --(VAE)--> atomic density grid --(atom fitting, bond inference)--> molecular conformations

1. Atom typing

First, we assign atom types to molecules using a set of N_p atomic property functions p and value ranges for those properties v, which are listed in Table 1. For a given atom a, the atom type vector t ∈ ℝ^{N_T} is created by concatenating N_p atomic property vectors p through the following:
[Equation image]
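p: an atomic property function. N_p: the number of property functions. v: the value range of a property (the values that property can take). N_T: the length of the atom type vector t.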
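The concatenation described above can be sketched in a few lines. This is a minimal illustration, assuming each property vector is a one-hot encoding over its value range; the `PROPERTY_RANGES` table and the dictionary representation of an atom are hypothetical and are not the ranges from the paper's Table 1.

```python
# Hypothetical value ranges, for illustration only.
# The ranges actually used in the paper are listed in its Table 1.
PROPERTY_RANGES = {
    "element": ["C", "N", "O", "S"],
    "aromatic": [False, True],
    "h_donor": [False, True],
    "h_acceptor": [False, True],
    "formal_charge": [-1, 0, 1],
}


def one_hot(value, value_range):
    """One-hot vector over value_range (all zeros if the value is not listed)."""
    return [1.0 if value == v else 0.0 for v in value_range]


def atom_type_vector(atom):
    """Concatenate one one-hot block per property, in a fixed property order."""
    vec = []
    for name, value_range in PROPERTY_RANGES.items():
        vec.extend(one_hot(atom[name], value_range))
    return vec


# An aromatic nitrogen that accepts but does not donate H-bonds.
atom = {"element": "N", "aromatic": True, "h_donor": False,
        "h_acceptor": True, "formal_charge": 0}
t = atom_type_vector(atom)

# N_T is the total number of values across all property ranges.
N_T = sum(len(r) for r in PROPERTY_RANGES.values())
```

Under this assumption, N_T is the sum of the range lengths, which is why changing the element range for receptor versus ligand atoms changes the number of grid channels.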

The atomic properties we used were element, aromaticity, H-bond donor and acceptor status, and formal charge. Different element ranges were represented for receptor atoms and ligand atoms (that is, the set of elements encoded can differ between receptor and ligand), but the value ranges for all other properties were the same. The process we used to construct value ranges for properties and compare different type schemes is described in the supplement.

2. Atom gridding

The density value of an atom at a grid point is defined by a kernel function f: ℝ × ℝ → ℝ that takes as input the distance d between the atom coordinate and the grid point and the atomic radius r:
[Equation image]
The radius was fixed at r = 1.0 for all atoms in this work. Grid values are computed by summing the density kernel of each atom at each point on a 3D grid, multiplied by the value of the atom's type vector in the corresponding grid channel. A molecule with N atoms and atom type vectors of length N_T can be represented as a matrix of atom types T ∈ ℝ^{N×N_T} and a matrix of atomic coordinates C ∈ ℝ^{N×3}. The function that computes atomic density grids, g: ℝ^{N×N_T} × ℝ^{N×3} → ℝ^{N_T×N_X×N_Y×N_Z}, is then defined as follows:
[Equation image]
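Written out from the verbal description (sum the kernel of each atom at each grid point, weighted by the atom's type vector entry for that channel), the gridding function plausibly takes the following form. The channel index c, the grid indices i, j, k, and the symbol x_{i,j,k} for the position of a grid point are notation introduced here for illustration.

```latex
g(T, C)_{c,i,j,k} \;=\; \sum_{n=1}^{N} T_{n,c}\; f\!\left( \lVert C_n - x_{i,j,k} \rVert_2 ,\; r \right),
\qquad c = 1,\dots,N_T
```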
All atoms that fit within the spatial extent of the grid are represented. We used cubic grids with side lengths of 23.5 Å and 0.5 Å resolution, resulting in spatial dimensions N_X = N_Y = N_Z = 48.
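The gridding step can be sketched with NumPy as below. This is a rough sketch, not the libmolgrid implementation: the kernel is assumed to be a Gaussian inside the radius r joined to a quadratic tail that reaches zero at 1.5 r, and `density_kernel`, `make_grid`, and the `center` argument are hypothetical names. The grid size follows from the stated numbers: 23.5 / 0.5 + 1 = 48 points per axis.

```python
import numpy as np


def density_kernel(d, r=1.0):
    """Assumed kernel: Gaussian for d < r, quadratic tail that hits 0 at 1.5 r."""
    d = np.asarray(d, dtype=float)
    gauss = np.exp(-2.0 * d**2 / r**2)
    quad = (4.0 * d**2 / r**2 - 12.0 * d / r + 9.0) / np.exp(2.0)
    return np.where(d < r, gauss, np.where(d < 1.5 * r, quad, 0.0))


def make_grid(T, C, center, side=23.5, resolution=0.5, r=1.0):
    """Return a density grid of shape (N_T, N_X, N_Y, N_Z).

    T: (N, N_T) atom type matrix; C: (N, 3) atom coordinates in angstroms.
    """
    n_pts = int(round(side / resolution)) + 1            # 23.5 / 0.5 + 1 = 48
    axis = (np.arange(n_pts) - (n_pts - 1) / 2.0) * resolution
    gx, gy, gz = np.meshgrid(*(axis + c for c in center), indexing="ij")
    points = np.stack([gx, gy, gz], axis=-1)             # (N_X, N_Y, N_Z, 3)
    grid = np.zeros((T.shape[1], n_pts, n_pts, n_pts))
    for t_n, c_n in zip(T, C):
        dist = np.linalg.norm(points - c_n, axis=-1)     # distance to every grid point
        # Each atom adds its kernel to the channels selected by its type vector.
        grid += t_n[:, None, None, None] * density_kernel(dist, r)
    return grid


# Two atoms of different (toy) types, each placed exactly on a grid point.
T = np.array([[1.0, 0.0], [0.0, 1.0]])
C = np.array([[0.25, 0.25, 0.25], [3.25, 0.25, 0.25]])
G = make_grid(T, C, center=(0.0, 0.0, 0.0))
```

With this kernel an atom contributes nothing beyond 1.5 Å, so each atom touches only a small neighbourhood of the 48 × 48 × 48 grid.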

3. Atom fitting

The inverse problem of converting a reference density grid Gref into a discrete 3D molecular structure does not have an analytic solution, so we solve it as the following optimization problem:
[Equation image]
That is, the grid density g(T, C) should approximate the reference density grid G_ref.
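One plausible way to write this objective is shown below. The squared L2 norm is an assumption made here; the text only states that g(T, C) should approximate the reference grid.

```latex
T^{*},\, C^{*} \;=\; \operatorname*{arg\,min}_{T,\; C} \; \bigl\lVert G_{\mathrm{ref}} - g(T, C) \bigr\rVert_2^{2}
```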
We can detect initial locations of atoms on a grid by selecting from the grid points with the largest density values. libmolgrid allows us to compute the grid representation of an atomic structure and backpropagate a gradient from grid values to atomic coordinates. Therefore, we devised an algorithm that combines iterative atom detection with gradient descent to find a set of atoms that best fits a reference density.
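The detect-then-refine idea can be illustrated with a heavily simplified sketch. It is not the paper's algorithm: it assumes a single atom type, a plain Gaussian kernel (chosen so the gradient can be written analytically instead of calling libmolgrid), a small 16-point grid, and hypothetical function names and hyperparameters.

```python
import numpy as np


def grid_points(n=16, resolution=0.5):
    axis = (np.arange(n) - (n - 1) / 2.0) * resolution
    return np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)


def density(coords, points, r=1.0):
    """Single-channel density with a plain Gaussian kernel (simplifying assumption)."""
    grid = np.zeros(points.shape[:-1])
    for c in coords:
        grid += np.exp(-2.0 * np.sum((points - c) ** 2, axis=-1) / r**2)
    return grid


def grad_coords(coords, points, g_ref, r=1.0):
    """Analytic gradient of sum((g_ref - g)^2) with respect to each atom position."""
    residual = g_ref - density(coords, points, r)
    grads = []
    for c in coords:
        diff = points - c
        k = np.exp(-2.0 * np.sum(diff**2, axis=-1) / r**2)
        # dk/dc = 4 k diff / r^2, hence dL/dc = -8 / r^2 * sum(residual * k * diff)
        grads.append(-8.0 / r**2 * np.sum((residual * k)[..., None] * diff, axis=(0, 1, 2)))
    return np.array(grads)


def fit_atoms(g_ref, points, max_atoms=10, threshold=0.25, steps=200, lr=0.02):
    coords = np.zeros((0, 3))
    for _ in range(max_atoms):
        residual = g_ref - density(coords, points)
        idx = np.unravel_index(np.argmax(residual), residual.shape)
        if residual[idx] < threshold:                # no unexplained density left
            break
        coords = np.vstack([coords, points[idx]])    # detect: seed an atom at the peak
        for _ in range(steps):                       # refine: gradient descent on all atoms
            coords = coords - lr * grad_coords(coords, points, g_ref)
    return coords


points = grid_points()
true_coords = np.array([[-1.1, 0.2, 0.0], [1.3, -0.4, 0.6]])
g_ref = density(true_coords, points)
fitted = fit_atoms(g_ref, points)
```

Seeding each new atom at the largest remaining residual gives gradient descent a starting point close to a true atom, which matters because the objective is not convex in the coordinates.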

4. Bond inference

Bond inference is based on OpenBabel.
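To give a feel for what inferring bonds from 3D coordinates involves, here is a simplified distance-based sketch. It is a stand-in for illustration only, not OpenBabel's routine and not the paper's procedure (which also has to settle bond orders, aromaticity, and hydrogens). The covalent radii are approximate and the 0.45 Å tolerance is an assumed value.

```python
import itertools
import math

# Approximate covalent radii in angstroms (illustrative values).
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def infer_bonds(elements, coords, tolerance=0.45, min_dist=0.4):
    """Connect atom pairs whose distance is below the sum of their radii plus a tolerance."""
    bonds = []
    for i, j in itertools.combinations(range(len(elements)), 2):
        d = math.dist(coords[i], coords[j])
        cutoff = COVALENT_RADIUS[elements[i]] + COVALENT_RADIUS[elements[j]] + tolerance
        if min_dist < d < cutoff:
            bonds.append((i, j))
    return bonds


# Water: the two O-H pairs are within range, the H-H pair is not.
elements = ["O", "H", "H"]
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
bonds = infer_bonds(elements, coords)
```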

5. Conditional VAE

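z: latent feature vector of the receptor (rec) and ligand (lig), produced by the input encoder
c: feature vector of the receptor (rec), produced by the conditional encoder
lig_gen: ligand density, produced by the decoder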
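The data flow between the three components can be sketched as follows. This is a toy illustration only: the random linear maps stand in for the networks, the densities are flattened vectors, and every size and name here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_REC, N_LIG, N_LATENT, N_COND = 64, 32, 8, 16   # toy sizes, not the paper's

# Random linear maps as placeholders for the real encoder and decoder networks.
W_mu = rng.normal(size=(N_LATENT, N_REC + N_LIG)) * 0.1
W_logvar = rng.normal(size=(N_LATENT, N_REC + N_LIG)) * 0.1
W_cond = rng.normal(size=(N_COND, N_REC)) * 0.1
W_dec = rng.normal(size=(N_LIG, N_LATENT + N_COND)) * 0.1


def forward(rec, lig, rng):
    x = np.concatenate([rec, lig])
    mu, logvar = W_mu @ x, W_logvar @ x             # input encoder sees rec and lig
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps             # reparameterized latent sample
    c = W_cond @ rec                                # conditional encoder sees rec only
    lig_gen = W_dec @ np.concatenate([z, c])        # decoder maps (z, c) to ligand density
    return lig_gen, mu, logvar


rec = rng.random(N_REC)
lig = rng.random(N_LIG)
lig_gen, mu, logvar = forward(rec, lig, rng)
```

The point of the split is that c depends on the receptor alone, so the decoder can still be conditioned on a binding site when no ligand is available and z is drawn from the prior.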

6. Training

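L_recon: reconstruction loss. It maximizes the probability of decoding a latent sample into the real ligand density, given the receptor density.
L_KL: KL divergence loss. It encourages the approximate posterior distribution to match the prior distribution.
L_steric: steric clash loss. It penalizes spatial collisions or overlap between the generated ligand and the receptor, which would make the molecule unstable.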
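A minimal sketch of how the three terms could be combined is given below. The exact functional forms are assumptions made here: squared error for reconstruction, the closed-form KL divergence between a diagonal Gaussian and a standard normal, and a voxel-wise product of summed densities for the steric term. The default weights are the initial values quoted in the training details.

```python
import numpy as np


def recon_loss(lig, lig_gen):
    """Assumed form: half the squared error between real and generated density."""
    return 0.5 * np.sum((lig - lig_gen) ** 2)


def kl_loss(mu, logvar):
    """KL divergence between N(mu, exp(logvar)) and the standard normal prior."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)


def steric_loss(rec, lig_gen):
    """Assumed form: overlap of channel-summed receptor and generated ligand density."""
    return np.sum(rec.sum(axis=0) * lig_gen.sum(axis=0))


def total_loss(rec, lig, lig_gen, mu, logvar, w_recon=4.0, w_kl=0.1, w_steric=1.0):
    return (w_recon * recon_loss(lig, lig_gen)
            + w_kl * kl_loss(mu, logvar)
            + w_steric * steric_loss(rec, lig_gen))


# Toy grids with one channel and 2 x 2 x 2 voxels.
rec = np.zeros((1, 2, 2, 2))
rec[0, 0, 0, 0] = 2.0
lig = np.zeros((1, 2, 2, 2))
lig[0, 1, 1, 1] = 1.0
clashing = lig.copy()
clashing[0, 0, 0, 0] = 3.0   # generated density placed on top of the receptor

loss_clean = total_loss(rec, lig, lig, np.zeros(2), np.zeros(2))
loss_clash = total_loss(rec, lig, clashing, np.array([1.0, 0.0]), np.zeros(2))
```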

The loss weights were initialized at λ_recon = 4.0, λ_KL = 0.1, and λ_steric = 1.0, though the KL divergence loss weight was gradually ramped up to 1.6 over 200 000 iterations, starting at iteration 450 000. The model was trained using RMSprop with learning rate 10^-5 for 1 000 000 iterations with a batch size of 8.
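The KL weight schedule can be expressed as a small function. The text only says the weight was "gradually ramped up", so the linear shape used here is an assumption, and `kl_weight` is a hypothetical name.

```python
def kl_weight(iteration, start=0.1, end=1.6, ramp_start=450_000, ramp_length=200_000):
    """KL loss weight at a given iteration, assuming a linear ramp."""
    if iteration <= ramp_start:
        return start
    frac = min((iteration - ramp_start) / ramp_length, 1.0)
    return start + frac * (end - start)


# Weight at a few checkpoints of the 1 000 000 iteration run.
schedule = {it: kl_weight(it) for it in (0, 450_000, 550_000, 650_000, 1_000_000)}
```

Keeping the KL weight low for the first 450 000 iterations lets the model learn to reconstruct before the latent space is pulled toward the prior.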

7. Experiments

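Data: CrossDocked2020
Metrics: validity, novelty, uniqueness, fingerprint similarity, per-target diversity, shape similarity, molecular weight and drug-likeness, UFF energy minimization, Vina energy and predicted binding affinity, atom type distributions, bond length distributions, bond angle distributions, torsion angle distributions.
Sampling methods:
Prior sampling: draw the latent variable from a standard normal distribution, then decode it with the CVAE.
As the variability factor increases, diversity increases, while energetic stability and favorability in the binding pocket decrease.
Posterior sampling: encode a real protein-ligand complex into latent distribution parameters, then decode with the CVAE.
As the variability factor increases, molecular size and complexity increase.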
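The two sampling modes differ only in where the latent vector comes from, which the sketch below makes explicit. How the variability factor enters is an assumption here: it is taken to scale the standard deviation of the sampled noise. The function names are hypothetical.

```python
import numpy as np


def sample_prior(n_latent, rng, var_factor=1.0):
    """Prior sampling: z is drawn from a (scaled) standard normal, no ligand needed."""
    return var_factor * rng.standard_normal(n_latent)


def sample_posterior(mu, logvar, rng, var_factor=1.0):
    """Posterior sampling: z is drawn around the encoding of a real complex."""
    return mu + var_factor * np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)


mu = np.array([0.5, -1.0, 2.0])
logvar = np.zeros(3)

z_prior = sample_prior(3, np.random.default_rng(0))
z_prior_wide = sample_prior(3, np.random.default_rng(0), var_factor=2.0)
z_post_exact = sample_posterior(mu, logvar, np.random.default_rng(0), var_factor=0.0)
```

Under this assumption, a factor of 0 in posterior sampling returns the encoded mean itself, and larger factors move samples further from it, consistent with the trend toward more diverse but less favorable molecules noted above.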