Paper reading (35): Neural-symbolic machine learning for retrosynthesis and reaction prediction

盲人骑瞎马5555

Category: Paper Reading. Tags: reaction prediction, retrosynthesis, deep neural networks

Article link: https://blog.csdn.net/wxw060709/article/details/102609860

Paper title: Neural-symbolic machine learning for retrosynthesis and reaction prediction

Google Scholar citations: 99

Pages: 9

Published: 2017.01

Journal: Chemistry – A European Journal

Authors: Marwin H. S. Segler and Mark P. Waller

Abstract:

Reaction prediction and retrosynthesis are the cornerstones of organic chemistry. Rule-based expert systems have been the most widespread approach to computationally solve these two related challenges to date. However, reaction rules often fail because they ignore the molecular context, which leads to reactivity conflicts. Herein, we report that deep neural networks can learn to resolve reactivity conflicts and to prioritize the most suitable transformation rules. We show that by training our model on 3.5 million reactions taken from the collective published knowledge of the entire discipline of chemistry, our model exhibits a top-10 accuracy of 95% in retrosynthesis and 97% for reaction prediction on a validation set of almost 1 million reactions.

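Note: other settings where rule-based methods are widely used might also be worth trying with DNNs.
Top-10? Here it means taking the 10 highest-ranked predictions.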

Limitations:

Most of the limitations stem from the underlying rules, and not the machine‐learning component.

A limitation our system shares with other rule‐based systems is that it cannot predict anything outside its rule base. It does not solve the dilemma of rules: either one defines rules that are too general, which would generate a lot of noise, or rules that are too specific, which can only predict the substrate used to derive the rule. This is especially problematic for reaction types that only occur a few times. Possible solution: a model of chemical reasoning based on knowledge graphs.
Our system does not take stereochemistry into account. Possible solution: a global model without involving quantum chemistry.

We report on a hybrid neural‐symbolic approach for both retrosynthesis and reaction prediction that can be trained with large reaction sets from databases.
Neural networks can learn to which molecular context particular rules can be applied, and can prioritize the rules for both retrosynthesis and reaction prediction using either hand‐coded or automatically extracted rule sets.
We anticipate that neural‐symbolic models will be a key building block in future systems for computer‐aided synthesis design, robot synthesis, virtual chemical space exploration, and de novo drug design.
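
To make the idea of a network prioritizing rules more concrete, here is a minimal sketch in plain Python of a forward pass through a network with one hidden layer and ELU activation, followed by a softmax over the rule base. It is not the authors' implementation: the binary fingerprint input, the toy layer sizes, the random untrained weights, and the helper names (`dense`, `rank_rules`) are all assumptions made for illustration. The "FC512 ELU" model named in the Introduction excerpts presumably uses 512 hidden units, but that reading is also an assumption.

```python
import math
import random


def elu(x, alpha=1.0):
    """Exponential linear unit: identity for x > 0, alpha*(exp(x)-1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def dense(x, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def rank_rules(fingerprint, w1, b1, w2, b2):
    """Return rule indices sorted from most to least probable, plus the probabilities."""
    hidden = [elu(h) for h in dense(fingerprint, w1, b1)]
    probs = softmax(dense(hidden, w2, b2))
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order, probs


# Toy sizes chosen for illustration only (assumption, not from the paper).
random.seed(0)
N_BITS, N_HIDDEN, N_RULES = 16, 8, 5

# Random, untrained weights: the ranking below is meaningless chemically.
w1 = [[random.gauss(0, 0.1) for _ in range(N_BITS)] for _ in range(N_HIDDEN)]
b1 = [0.0] * N_HIDDEN
w2 = [[random.gauss(0, 0.1) for _ in range(N_HIDDEN)] for _ in range(N_RULES)]
b2 = [0.0] * N_RULES

# Hypothetical binary molecular fingerprint standing in for a real descriptor.
fingerprint = [random.randint(0, 1) for _ in range(N_BITS)]

order, probs = rank_rules(fingerprint, w1, b1, w2, b2)
print("rules ranked by predicted probability:", order)
```

In a trained model, only the first few entries of `order` would be handed to the symbolic rule engine, which is how the network supplies the ranking that a purely rule-based system lacks.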

Introduction:

To rationally synthesize new molecules, two intimately related problems, reaction prediction and retrosynthesis, have to be solved.
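Reaction prediction: the task is to infer how a set of molecules (the starting materials) will react and what the products will be.
Retrosynthetic analysis, also called retrosynthesis, is an important method for working out organic synthesis routes, and the simplest and most basic method of route design. In essence it is the disconnection of the target molecule: by analyzing the target's structure, one breaks it down step by step into simpler, more easily synthesized precursors and starting materials, which completes the route design.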
The standard methodology for retrosynthesis and reaction prediction is rule‐based expert systems.
The rules are applied to the reactants to obtain the product in reaction prediction, or in reverse, to the product, for retrosynthesis.
The great advantage of rules is that they are straightforward to interpret.
The rule‐based approach has several drawbacks:

Rule‐based expert systems cannot predict anything outside of their knowledge.
The rules have to be compiled and curated.
They lack an inherent ranking mechanism.

Rule‐based expert systems for retrosynthesis have never been rigorously evaluated with large hold‐out test sets.
Methods tried in prior work: random forests, neural networks, unsupervised pre‐training of self‐organizing maps.
We propose a novel neural‐symbolic model, which can be used for both reaction prediction and retrosynthesis.
We hypothesize that the advantage of combining machine learning with symbolic rules is that we retain the familiar concept of rules, whereas the model learns to prioritize the rules and to estimate selectivity and compatibility from the provided training data, which are successfully performed experiments.
In top‐n accuracy, we examine if the correct reaction rule is among the n highest ranked rules, similar to being on the first page of the results of a search engine.
We compare our best neural‐symbolic models, a neural network with one hidden layer (FC512 ELU) and a deep highway network, to a purely rule‐based expert system operating with the same rule set.
No hand‐annotated expert systems are free or open source, and the annotations themselves are not published in the public domain, making a direct comparison unfeasible.
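
The top‐n accuracy described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper: the function name `top_n_accuracy`, the score matrix, and the labels are made-up toy values, with each row holding one example's scores over a three-rule base.

```python
def top_n_accuracy(scores, true_rules, n):
    """Fraction of examples whose true rule index is among the n highest-scored rules."""
    hits = 0
    for row, true_idx in zip(scores, true_rules):
        # Rule indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if true_idx in ranked[:n]:
            hits += 1
    return hits / len(true_rules)


# Toy data (assumption): 3 examples, 3 candidate rules each.
scores = [
    [0.1, 0.7, 0.2],  # true rule 1 is ranked first
    [0.5, 0.3, 0.2],  # true rule 1 is ranked second
    [0.2, 0.3, 0.5],  # true rule 0 is ranked third
]
true_rules = [1, 1, 0]

top1 = top_n_accuracy(scores, true_rules, 1)
top2 = top_n_accuracy(scores, true_rules, 2)
top3 = top_n_accuracy(scores, true_rules, 3)
print(top1, top2, top3)
```

On this toy data the accuracy rises from one third (n = 1) to two thirds (n = 2) to 1.0 (n = 3), which shows why a top‐10 figure is always at least as high as the top‐1 figure for the same model.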

Paper outline:

1. Introduction

2. Hand-coded reactions

3. Automatically extracted rules

4. Timing

5. Limitations

6. Experimental Section

6.1 Data

6.2 Reaction rules

6.3 Molecular descriptors

6.4 Neural networks

Excerpts from the main text: