Paper Reading (42): Low Data Drug Discovery with One-Shot Learning

Paper title: Low Data Drug Discovery with One-Shot Learning

Citations (Google Scholar): 177

Pages: 11

Publication date: 2017.04

Journal: ACS Central Science

Authors: Han Altae-Tran, Bharath Ramsundar, Aneesh S. Pappu, and Vijay Pande

Abstract:

Recent advances in machine learning have made significant contributions to drug discovery. Deep neural networks in particular have been demonstrated to provide significant boosts in predictive power when inferring the properties and activities of small-molecule compounds. However, the applicability of these techniques has been limited by the requirement for large amounts of training data. In this work, we demonstrate how one-shot learning can be used to significantly lower the amounts of data required to make meaningful predictions in drug discovery applications. We introduce a new architecture, the iterative refinement long short-term memory, that, when combined with graph convolutional neural networks, significantly improves learning of meaningful distance metrics over small-molecules. We open source all models introduced in this work as part of DeepChem, an open-source framework for deep-learning in drug discovery (Ramsundar, B. deepchem.io. https://github.com/deepchem/deepchem, 2016).

  • We demonstrate how one-shot learning can lower the amount of data required to make meaningful predictions in drug discovery.
  • Our architecture, the iterative refinement long short-term memory, permits the learning of meaningful distance metrics on small-molecule space.

Conclusions:

  • This paper introduces the task of low data learning for drug discovery and provides an architecture for learning such models. 
  • Our results go further and demonstrate that iterative refinement LSTMs can generalize to new experimental assays, related but not identical to assays in the training collection. 
  • It is clear that there are strong limitations to the generalization powers of current one-shot learning methods.
  • It is left to future work to determine the precise limits. Future work might also investigate the structure of the embeddings learned by the iterative refinement LSTM modules, to understand how these representations compare to standard techniques such as circular fingerprints.

Introduction:

  • Yet, with only a small amount of biological data available on the candidate and related molecules, it is challenging to form accurate predictions for novel compounds.
  • This capability of deep neural networks is underpinned by their ability to learn sophisticated representations of their input given large amounts of data.
  • These new graph convolutional feature-extracting architectures are learnable, meaning they can be modified to improve performance (an illustrative sketch follows at the end of this list).
  • The practical effect of these innovations in drug discovery has been limited as most of the aforementioned deep-learning frameworks require large amounts of data. 
  • In some circumstances, nontrivial predictors may be learned from only a few data points. These techniques are known as “one-shot learning” methods.
  • Standard one-shot learning focuses on recognizing new classes (say, recognizing a giraffe given only one example).
  • We introduce a new deep-learning architecture, the iterative refinement long short-term memory (LSTM), a modification of the matching-networks architecture and the residual convolutional network.
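The following is a minimal NumPy sketch of the idea behind a learnable graph-convolution featurizer for molecules. The function names, shapes, and the simple sum-pooling are my own illustrative choices, not the paper's exact layers; the actual graph convolution, pooling, and graph-gather layers are described in Section 2.4 and released as part of DeepChem.

```python
import numpy as np

def graph_conv_layer(node_feats, adjacency, w_neighbor, w_self):
    """One illustrative graph-convolution step: every atom mixes its own
    features with the summed features of its bonded neighbours, followed
    by a ReLU. w_neighbor and w_self are the learnable weights that make
    the featurization trainable."""
    neighbour_sum = adjacency @ node_feats           # aggregate bonded atoms
    out = node_feats @ w_self + neighbour_sum @ w_neighbor
    return np.maximum(out, 0.0)                      # ReLU

def graph_gather(node_feats):
    """Pool per-atom features into a single molecule-level embedding."""
    return node_feats.sum(axis=0)

# Toy molecule: 3 atoms with 4 input features each, bonded as a chain A-B-C.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
w_n, w_s = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
embedding = graph_gather(graph_conv_layer(x, adj, w_n, w_s))
print(embedding.shape)  # (8,): molecule embedding fed to the one-shot model
```

Because the weights are trained end to end together with the one-shot module, the molecular representation itself adapts to the task, which is the point the introduction makes about these architectures being "learnable."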

Organization of the paper:

1. Introduction

2. Methods

2.1 Mathematical Formalism

2.2 One-Shot Learning

       2.2.1 Review of Prior One-Shot Techniques

2.3 Iterative Refinement LSTMs

2.4 Graph Convolutions

       2.4.1 Previous Work on Molecular Graph Convolution

       2.4.2 New Layers

2.5 Model Training and Evaluation

3. Results and Discussion

4. Experiments

5. Appendix

Excerpts from the main text:

1. Biological Problem: What biological problems have been solved in this paper?

  • Low-data learning for drug discovery.
  • One-shot learning: in some circumstances, nontrivial predictors may be learned from only a few data points.
  • These methods work by using related data to learn a meaningful distance metric over the space of possible inputs. This sophisticated metric is used to compare new data points to the limited available data and subsequently predict properties of these new data points (a minimal sketch follows this list).
  • More broadly, these techniques are known as “one-shot learning” methods.
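To make the metric-based mechanism concrete, here is a minimal, illustrative NumPy sketch in the spirit of matching networks (my own code and function names, not the paper's implementation): a query molecule's embedding is compared against the embeddings of the few labelled support molecules, and softmax-normalized similarities weight the support labels into a prediction.

```python
import numpy as np

def cosine_similarity(query_emb, support_embs):
    """Similarity between a query embedding and each support embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    s = support_embs / np.linalg.norm(support_embs, axis=1, keepdims=True)
    return s @ q

def one_shot_predict(query_emb, support_embs, support_labels):
    """Matching-networks-style prediction: softmax attention over the few
    labelled support molecules weights their known labels to produce a
    prediction for the query molecule."""
    sims = cosine_similarity(query_emb, support_embs)
    attn = np.exp(sims) / np.exp(sims).sum()
    return attn @ support_labels        # estimated probability of activity

# Toy support set of 4 labelled molecules and one query compound; real
# embeddings would come from the learned (graph-convolutional) featurizer.
rng = np.random.default_rng(1)
support = rng.normal(size=(4, 8))
labels = np.array([1.0, 0.0, 1.0, 0.0])          # active / inactive
query = support[0] + 0.1 * rng.normal(size=8)    # resembles an active molecule
print(one_shot_predict(query, support, labels))  # pulled toward the active label
```

The quality of the prediction depends entirely on how meaningful the embedding space is, which is why the paper focuses on learning that distance metric from related assays.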

2. Main discoveries: What are the main discoveries in this paper?

  • We introduce a new architecture, the iterative refinement long short-term memory, that, when combined with graph convolutional neural networks, significantly improves learning of meaningful distance metrics over small-molecules. 
  • One-shot learning can lower the amount of data required to make meaningful predictions in drug discovery.
  • Our architecture, the iterative refinement long short-term memory, permits the learning of meaningful distance metrics on small-molecule space.

3. ML(Machine Learning) Methods: What are the ML methods applied in this paper?

  • We introduce a new deep-learning architecture, the iterative refinement long short-term memory (LSTM), a modification of the matching-networks architecture and the residual convolutional network.
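The refinement loop can be sketched as follows. This is my own simplified NumPy schematic of the iterative-refinement idea under stated assumptions: attention readouts plus a convex-combination update stand in for the learned LSTM cells, so it illustrates the data flow rather than the architecture's actual equations.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def iterative_refinement(query_emb, support_embs, steps=3, alpha=0.5):
    """Schematic of iterative refinement (not the paper's exact LSTM
    equations): at every step the query embedding attends over the support
    embeddings and each support embedding attends over the whole support
    set; both are then updated from their attention readouts. The real
    IterRefLSTM replaces the convex-combination update used here with
    learned LSTM cells."""
    q, S = query_emb.copy(), support_embs.copy()
    for _ in range(steps):
        a_q = softmax(S @ q)                     # query attends to the support set
        q = (1 - alpha) * q + alpha * (a_q @ S)
        A_s = softmax(S @ S.T, axis=1)           # support attends to itself, so the
        S = (1 - alpha) * S + alpha * (A_s @ S)  # readout is order-independent
    return q, S

rng = np.random.default_rng(2)
q0, S0 = rng.normal(size=8), rng.normal(size=(4, 8))
q_refined, S_refined = iterative_refinement(q0, S0)
print(q_refined.shape, S_refined.shape)          # (8,) (4, 8)
```

The key design point is that information flows in both directions: the support set reshapes the query embedding and vice versa, which is what distinguishes this from a fixed, one-pass distance metric.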

4. ML Advantages: Why are these ML methods better than traditional methods for these biological problems?

  • Traditional methods: random forests and simple deep networks are capable of learning meaningful chemical information from only a few hundred compounds, but even a hundred compounds is often too resource intensive for standard drug discovery campaigns.
  • Compared with prior one-shot learning methods, this architecture allows for the learning of sophisticated metrics which can trade information between evidence and query molecules.
  • We demonstrate that this architecture offers significant boosts in predictive power for a variety of problems meaningful for low-data drug discovery.
  • Our results go further and demonstrate that iterative refinement LSTMs can generalize to new experimental assays, related but not identical to assays in the training collection. 

5. Biological Significance: What is the biological significance of these ML methods’ results?

  • Our proposed architecture for one-shot learning preserves the context-aware design of matching networks but resolves the order dependence in the support embedding g and the nonsymmetric treatment of the query and support noted in the previous section.
  • This construction allows the embedding of the data set to iteratively inform the embedding of the query (a schematic equation follows this list).
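A schematic way to see the order-independence claim, in my own notation rather than the paper's: let q be the current query embedding, s_j the support embeddings, and k(·,·) a similarity kernel. The attention readout that informs the next refinement step is

```latex
a_j = \frac{\exp\big(k(q, s_j)\big)}{\sum_{j'} \exp\big(k(q, s_{j'})\big)},
\qquad
r = \sum_j a_j \, s_j .
```

Because r is a sum over all support molecules, permuting the support set leaves it unchanged, and applying the same attention-and-update machinery to both the query and the support embeddings removes the asymmetric treatment of the two.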

6. Prospect: What are the potential applications of these machine learning methods in biological science?

  • It is clear that there are strong limitations to the generalization powers of current one-shot learning methods.
  • On the MUV data sets, one-shot learning methods struggle compared to simple machine-learning baselines. 
  • On the transfer learning experiment, which attempts to use the Tox21 collection to train SIDER predictors, all one-shot learning methods collapse entirely.
  • It is left to future work to determine the precise limits.
  • Future work might also investigate the structure of the embeddings learned by the iterative refinement LSTM modules, to understand how these representations compare to standard techniques such as circular fingerprints. 

7. My Questions (Optional)

  • One-shot learning is a classification task where one example (or a very small number of examples) is given for each class; this is used to prepare a model, which in turn must make predictions about many unknown examples in the future.
  • Applicable to all sorts of settings with few examples? Why am I only hearing about this now... Doesn't that mean this could be tried in many scenarios where traditional machine learning methods are currently used?