Paper reading (37): Generating focused molecule libraries for drug discovery with RNN

盲人骑瞎马5555

Category: Paper Reading. Tags: Recurrent Neural Networks, molecule

Article link: https://blog.csdn.net/wxw060709/article/details/102624790

Paper title: Generating focused molecule libraries for drug discovery with recurrent neural networks

Google Scholar citations: 203

Pages: 12

Published: 2017.12

Journal: ACS (American Chemical Society) Central Science

Authors: Marwin H. S. Segler, Thierry Kogej, Christian Tyrchan, and Mark P. Waller

Abstract:

In de novo drug design, computational strategies are used to generate novel molecules with good affinity to the desired biological target. In this work, we show that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing. We demonstrate that the properties of the generated molecules correlate very well with the properties of the molecules used to train the model. In order to enrich libraries with molecules active toward a given biological target, we propose to fine-tune the model with small sets of molecules, which are known to be active against that target. Against Staphylococcus aureus, this model reproduced 14% of 6051 holdout test molecules that medicinal chemists designed, whereas against Plasmodium falciparum (malaria), it reproduced 28% of 1240 test molecules. When coupled with a scoring function, our model can perform the complete de novo drug design cycle to generate large sets of novel molecules for drug discovery.

Note: I do not fully understand what "reproduce" means in these two cases.
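One plausible reading, consistent with the paper's remark that the models are tested "by reproducing hold-out test sets of known biologically active molecules": the percentage is the share of hold-out molecules that also show up among the generated molecules. The sketch below computes that share on toy data. The function name `reproduced_fraction` and the example strings are assumptions made for illustration, and the code assumes both lists have already been canonicalized the same way, so that string equality means "same molecule".

```python
def reproduced_fraction(generated, holdout):
    """Fraction of hold-out molecules that also appear among the generated ones."""
    generated_set = set(generated)   # duplicates among samples count only once
    holdout_set = set(holdout)
    if not holdout_set:
        raise ValueError("hold-out set is empty")
    hits = holdout_set & generated_set
    return len(hits) / len(holdout_set)


# Toy data: strings standing in for canonical SMILES (assumption: identical
# canonicalization on both sides, so equal strings mean equal molecules).
holdout = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
generated = ["CCO", "CCC", "CC(=O)O", "CCO", "C1CC1"]

fraction = reproduced_fraction(generated, holdout)
print(f"{fraction:.0%} of hold-out molecules reproduced")
```

Under this reading, "reproduced 14% of 6051" would mean that roughly 14% of the held-out molecules were generated by the model without it ever having seen them during fine-tuning.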

Conclusions:

We have shown that recurrent neural networks based on the long short-term memory (LSTM) can be applied to learn a statistical chemical language model.
This can be used to generate libraries for virtual screening. (Application 1)
We demonstrated that the model performs transfer learning when fine-tuned to smaller sets of molecules active toward a specific biological target, which enables the creation of novel molecules with the desired activity. (Another application)
We do not even need a set of known active molecules to start our procedure with.
Three main advantages of our method:

It is conceptually orthogonal to established molecule generation approaches.
Our method is very simple to set up, to train, and to use.
It merges structure generation and optimization in one model.

A weakness of our model is interpretability.
Extending our work: it is only a small step to cast molecule generation as a reinforcement learning problem.
We believe that deep neural networks can be complementary to established approaches in drug discovery.
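To make "statistical chemical language model" concrete, here is a minimal sketch that treats SMILES strings as character sequences and samples new strings one character at a time. It is deliberately not the paper's LSTM: it swaps in a simple bigram count model so that it runs with the standard library only. The boundary tokens `^` and `$`, the function names, and the tiny training list are all assumptions made for this illustration.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical boundary tokens, chosen for this sketch


def train_bigram(smiles_list):
    """Count character transitions: counts[prev][next]."""
    counts = defaultdict(Counter)
    for smi in smiles_list:
        seq = START + smi + END
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def sample(counts, rng, max_len=50):
    """Sample one string character by character until END or max_len."""
    out, prev = [], START
    for _ in range(max_len):
        chars, weights = zip(*counts[prev].items())
        nxt = rng.choices(chars, weights=weights, k=1)[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


# Toy training set (assumption: a handful of simple SMILES strings).
training = ["CCO", "CCN", "CCCO", "c1ccccc1"]
model = train_bigram(training)
rng = random.Random(0)
print([sample(model, rng) for _ in range(5)])
```

A bigram model only sees the previous character, so it cannot keep track of long-range constraints such as matching ring-closure digits or parentheses, and many of its samples will not be valid SMILES. That limitation is one way to see why a model with longer memory, such as the LSTM used in the paper, is a natural fit for this task.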

Introduction:

One of the many challenges in drug design is the sheer size of the search space for novel molecules.
Virtual screening is a commonly used strategy to search for promising molecules among millions of existing or billions of virtual molecules.
In any molecular design task, the computer has to: create molecules, score and filter them, and search for better molecules, building on the knowledge gained in the previous steps.
The generation of novel molecules: one strategy is to build molecules from predefined groups of atoms or fragments. Another established approach is to conduct virtual chemical reactions based on expert-coded rules.
We have recently shown that the predicted reactions from these rule-based expert systems can sometimes fail.
Scoring molecules and filtering out undesired structures: target prediction classifies molecules into active and inactive, and quantitative structure–activity relationship (QSAR) models relate structure quantitatively to activity.
The mapping from a target property value y to possible structures X is one-to-many.
In this work, we suggest a complementary, completely data-driven de novo drug design approach.
We highlight the analogy of language and chemistry, and show that RNNs can also generate reasonable molecules.
We demonstrate that RNNs can also transfer their learned knowledge from large molecule sets to directly produce novel molecules that are biologically active by retraining the models on small sets of already known actives.
We test our models by reproducing hold-out test sets of known biologically active molecules.
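The create, score, filter, and search-again loop described above, combined with retraining on the molecules that pass the filter, can be sketched generically as follows. This is an illustration of the loop structure only, not the paper's procedure: `design_cycle` and its parameters are hypothetical names, and the toy "model" is just a table of character weights with a made-up score (the share of `N` characters), so nothing here is real chemistry.

```python
import random


def design_cycle(model, generate, score, fine_tune,
                 rounds=3, n_samples=20, threshold=0.5):
    """Repeat: create molecules, score and filter them, retrain on the keepers."""
    kept_all = []
    for _ in range(rounds):
        candidates = generate(model, n_samples)                    # create
        kept = [m for m in candidates if score(m) >= threshold]    # score + filter
        if kept:
            model = fine_tune(model, kept)   # build on what the last round found
            kept_all.extend(kept)
    return model, kept_all


# --- Toy stand-ins (assumptions for illustration, not real chemistry) ---
rng = random.Random(0)


def toy_generate(weights, n):
    """Sample n four-character strings according to the character weights."""
    chars = list(weights)
    w = [weights[c] for c in chars]
    return ["".join(rng.choices(chars, weights=w, k=4)) for _ in range(n)]


def toy_score(mol):
    """Made-up 'activity': the share of N characters in the string."""
    return mol.count("N") / len(mol)


def toy_fine_tune(weights, kept):
    """Return new weights, increased by the characters seen in kept strings."""
    new = dict(weights)
    for mol in kept:
        for ch in mol:
            new[ch] += 1
    return new


start = {"C": 1, "N": 1, "O": 1}
final, kept = design_cycle(start, toy_generate, toy_score, toy_fine_tune)
print(final, len(kept))
```

Because the generator is retrained only on strings that passed the filter, its samples drift toward high-scoring ones over successive rounds, which is the sense in which generation and optimization end up in one model.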

Paper outline:

1. Introduction

2. Methods

2.1 Representing Molecules

2.2 Language Models and Recurrent Neural Networks

2.3 Transfer Learning

2.4 Target Prediction

2.5 Data

2.6 Model Evaluation

3. Results and Discussion

3.1 Training the Recurrent Network

3.2 Generating Novel Molecules

3.3 Generating Active Drug Molecules and Focused Libraries

3.4 Simulating Design-Synthesis-Test Cycles

3.5 Why Does the Model Work?

4. Conclusion

Excerpts from the main text: