Paper Reading (38): Convolutional networks on graphs for learning molecular fingerprints

Paper title: Convolutional networks on graphs for learning molecular fingerprints

Google Scholar citations: 798

Pages: 9

Published: 2015.09

Venue: Neural Information Processing Systems (NIPS)

Authors: David Duvenaud, Dougal Maclaurin, Jorge Aguilera-Iparraguirre, Rafael Gómez-Bombarelli, Timothy Hirzel, Alán Aspuru-Guzik, Ryan P. Adams

Abstract:

We introduce a convolutional neural network that operates directly on graphs. These networks allow end-to-end learning of prediction pipelines whose inputs are graphs of arbitrary size and shape. The architecture we present generalizes standard molecular feature extraction methods based on circular fingerprints. We show that these data-driven features are more interpretable, and have better predictive performance on a variety of tasks.

Conclusions:

  • We generalized existing hand-crafted molecular features to allow their optimization for diverse tasks.

  • We demonstrated the interpretability and predictive performance of these new fingerprints.

  • Data-driven features have already replaced hand-crafted features in speech recognition, machine vision, and natural-language processing.

  • Carrying out the same task for virtual screening, drug design, and materials design is a natural next step.

Introduction:

  • The current state of the art is to use off-the-shelf fingerprint software to compute fixed-dimensional feature vectors, and use those features as inputs to a fully-connected deep neural network or other standard machine learning method. (This is the mainstream approach.)
  • During training, the molecular fingerprint vectors were treated as fixed. (This is the key limitation.)
  • We replace the bottom layer of this stack – the function that computes molecular fingerprint vectors – with a differentiable neural network whose input is a graph representing the original molecule. (This is the paper's main contribution; a minimal sketch of the contrast follows this list.)
  • In this graph, vertices represent individual atoms and edges represent bonds.
  • The lower layers of this network are convolutional in the sense that the same local filter is applied to each atom and its neighborhood. After several such layers, a global pooling step combines features from all the atoms in the molecule.
  • These neural graph fingerprints offer several advantages over fixed fingerprints: (Advantages of this method over the traditional approach; the paper is refreshingly direct about them!)
  1. Predictive performance: they provide substantially better predictive performance than fixed fingerprints.
  2. Parsimony: fixed fingerprints must be extremely large to encode all possible substructures without overlap, whereas learned fingerprints only need to encode task-relevant features.
  3. Interpretability: each feature of a neural graph fingerprint can be activated by similar but distinct molecular fragments, making the feature representation more meaningful.
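
To make the contrast concrete, here is a minimal numpy sketch of the standard pipeline described above. The fingerprint length, layer widths, and random weights are illustrative assumptions, not values from the paper; the point is only where gradients can flow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard pipeline: a precomputed, fixed fingerprint feeds a small fully connected net.
# Only W1/b1/W2/b2 would be trained; the fingerprint itself is frozen.
fingerprint = rng.integers(0, 2, size=2048).astype(float)   # fixed binary features
W1, b1 = 0.01 * rng.normal(size=(2048, 100)), np.zeros(100)
W2, b2 = 0.01 * rng.normal(size=(100, 1)), np.zeros(1)

hidden = np.maximum(0.0, fingerprint @ W1 + b1)             # ReLU hidden layer
prediction = hidden @ W2 + b2                               # e.g. a predicted property
print(prediction)

# The paper's change: `fingerprint` itself becomes the output of a differentiable
# graph network (see Section 3 below), so gradients also reach the feature extractor.
```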

Paper structure:

1. Introduction

2. Circular fingerprints

3. Creating a differentiable fingerprint

4. Experiments

4.1 Examining learned features

4.2 Predictive performance

5. Limitations

6. Related work

7. Conclusion

Excerpts from the main text:

2. Circular fingerprints

  • Ignoring collisions, each index of the fingerprint denotes the presence of a particular substructure. The size of the substructures represented by each index depends on the depth of the network. Thus the number of layers is referred to as the ‘radius’ of the fingerprints.
  • Circular fingerprints are analogous to convolutional networks in that they apply the same operation locally everywhere, and combine information in a global pooling step.
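
As a concrete illustration of the hashing-and-indexing scheme described above, here is a minimal Python sketch of an ECFP-style circular fingerprint. The atom features, neighbor lists, and use of Python's built-in hash are illustrative simplifications, not the canonical algorithm.

```python
import numpy as np

def circular_fingerprint(atom_features, neighbors, radius=2, n_bits=2048):
    """ECFP-style sketch: hash each atom together with its neighbourhood,
    repeat for `radius` rounds, and set the corresponding fingerprint bits.
    (Real implementations use a canonical hash, not Python's built-in hash.)"""
    fp = np.zeros(n_bits, dtype=np.uint8)
    ids = {a: hash(f) for a, f in atom_features.items()}   # radius-0 atom identifiers
    for a in ids:
        fp[ids[a] % n_bits] = 1
    for _ in range(radius):
        new_ids = {}
        for a, nbrs in neighbors.items():
            # Combine the atom's identifier with its neighbours' (sorted => order-independent),
            # then hash the result; the modulo acts as the (collision-prone) indexing step.
            combined = (ids[a],) + tuple(sorted(ids[n] for n in nbrs))
            new_ids[a] = hash(combined)
            fp[new_ids[a] % n_bits] = 1
        ids = new_ids
    return fp

# Toy 3-atom "molecule" (roughly ethanol's heavy atoms): features are (element, degree).
atom_features = {0: ("C", 1), 1: ("C", 2), 2: ("O", 1)}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
print(circular_fingerprint(atom_features, neighbors).sum(), "bits set")
```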

3. Creating a differentiable fingerprint

  • Hashing: The purpose of the hash functions applied at each layer of circular fingerprints is to combine information about each atom and its neighboring substructures.
  • We replace the hash operation with a single layer of a neural network.
  • Using a smooth function allows the activations to be similar when the local molecular structure varies in unimportant ways.
  • Indexing: to combine all the nodes’ feature vectors into a single fingerprint of the whole molecule.
  • This pooling-like operation converts an arbitrary-sized graph into a fixed-sized vector.
  • We use the softmax operation as a differentiable analog of indexing.
  • Canonicalization: an alternative to canonicalization is to apply a permutation-invariant function, such as summation.
  • Circular fingerprints can be interpreted as a special case of neural graph fingerprints having large random weights.
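
Below is a small numpy sketch of the differentiable analogue described in the bullets above: the hash becomes a smooth tanh layer, indexing becomes a softmax write into the fingerprint, and summation makes the result permutation invariant. Weight shapes and feature sizes are illustrative; the paper uses separate weight matrices per layer and per atom degree, which this sketch omits.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def neural_fingerprint(atom_feats, neighbors, H, W_out, radius=2):
    """Simplified differentiable analogue of a circular fingerprint:
    tanh layer instead of a hash, softmax instead of hard indexing, sum pooling."""
    fp = np.zeros(W_out.shape[1])
    r = {a: f.copy() for a, f in atom_feats.items()}      # current atom representations
    for _ in range(radius):
        new_r = {}
        for a, nbrs in neighbors.items():
            v = r[a] + sum(r[n] for n in nbrs)            # sum self + neighbour features
            new_r[a] = np.tanh(v @ H)                     # smooth replacement for the hash
            fp += softmax(new_r[a] @ W_out)               # soft, differentiable "indexing"
        r = new_r
    return fp                                             # summation => permutation invariant

rng = np.random.default_rng(0)
d, n_bits = 8, 16
atom_feats = {a: rng.normal(size=d) for a in range(3)}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
H, W_out = rng.normal(size=(d, d)), rng.normal(size=(d, n_bits))
print(neural_fingerprint(atom_feats, neighbors, H, W_out).round(2))

# With very large random weights the tanh saturates and the softmax becomes nearly
# one-hot, which is why circular fingerprints appear as a limiting special case.
```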

4. Experiments

We ran two experiments to demonstrate that neural fingerprints with large random weights behave similarly to circular fingerprints.

4.1 Examining learned features

  • Solubility features
  • Toxicity features

4.2 Predictive performance

  • We ran several experiments to compare the predictive performance of neural graph fingerprints to that of the standard state-of-the-art setup: circular fingerprints fed into a fully-connected neural network.
  • Experimental setup: Our pipeline takes as input the SMILES string encoding of each molecule, which is then converted into a graph using RDKit. (A small RDKit sketch follows this list.)
  • Training and architecture: batch normalization and ReLU activations gave a slight but consistent performance advantage on the validation set, while dropout generally led to worse validation error. Each experiment optimized for 10,000 minibatches of size 100 using the Adam algorithm, a variant of RMSprop that includes momentum.
  • Hyperparameter optimization: random search with 50 trials for each cross-validation fold.
  • Datasets: Solubility; Drug efficacy; Organic photovoltaic efficiency
  • Predictive accuracy: We compared the performance of circular fingerprints and neural graph fingerprints under two conditions: In the first condition, predictions were made by a linear layer using the fingerprints as input. In the second condition, predictions were made by a one-hidden-layer neural network using the fingerprints as input.
  • In all experiments, the neural graph fingerprints matched or beat the accuracy of circular fingerprints, and the methods with a neural network on top of the fingerprints typically outperformed the linear layers.
  • Automatic differentiation (AD) software packages such as Theano significantly speed up development time by providing gradients automatically, but can only handle limited control structures and indexing.
  • Software: Autograd
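
For reference, here is a minimal sketch of the graph-construction step mentioned in the Experimental setup bullet, assuming RDKit is installed. The choice of per-atom features (atomic number and degree) is an illustrative simplification; the paper's pipeline uses a richer feature set.

```python
from rdkit import Chem
import numpy as np

def smiles_to_graph(smiles):
    mol = Chem.MolFromSmiles(smiles)                 # parse the SMILES string
    n = mol.GetNumAtoms()
    # Simple per-atom features: atomic number and degree (illustrative only).
    atom_feats = np.array([[a.GetAtomicNum(), a.GetDegree()] for a in mol.GetAtoms()],
                          dtype=float)
    adj = np.zeros((n, n), dtype=int)                # adjacency matrix from the bonds
    for b in mol.GetBonds():
        i, j = b.GetBeginAtomIdx(), b.GetEndAtomIdx()
        adj[i, j] = adj[j, i] = 1
    return atom_feats, adj

feats, adj = smiles_to_graph("CCO")                  # ethanol: 3 heavy atoms, 2 bonds
print(feats)
print(adj)
```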

5. Limitations

  • Computational cost: In practice, training neural networks on top of circular fingerprints usually took several minutes, while training both the fingerprints and the network on top took on the order of an hour on the larger datasets.
  • Limited computation at each layer: In this paper we chose the simplest feasible architecture: a single layer of a neural network.
  • Limited information propagation across the graph: The local message-passing architecture developed in this paper scales well in the size of the graph (due to the low degree of organic molecules), but its ability to propagate information across the graph is limited by the depth of the network.
  • Inability to distinguish stereoisomers: Most circular fingerprint implementations have the option to make these distinctions. Neural fingerprints could be extended to be sensitive to stereoisomers, but this remains a task for future work.

6. Related work

  • Neural nets for quantitative structure-activity relationship (QSAR)
  • Neural graph fingerprints
  • Convolutional neural networks: standard convolutional architectures use a fixed computational graph, making them difficult to apply to objects of varying size or structure, such as molecules.
  • Neural networks on fixed graphs
  • Neural networks on input-dependent graphs: Our method replaces their complex training algorithms with simple gradient-based optimization, generalizes existing circular fingerprint computations, and applies these networks in the context of modern QSAR pipelines which use neural networks on top of the fingerprints to increase model capacity.
  • Unrolled inference algorithms