Chapter 20: Machine Learning for In Silico ADMET Prediction

Reading notes on "Artificial Intelligence in Drug Design"


1. Introduction

  • Multitask deep neural networks (MT-DNN) and graph convolutional neural networks (GCNN) play an important role in boosting prediction accuracy.
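A multitask network shares its hidden layers across several ADMET endpoints, but most compounds are measured in only a few assays. A common way to handle this, assumed here because the chapter does not describe its loss function, is to mask missing labels so they contribute nothing to training. The sketch below uses plain Python lists instead of a deep learning framework; the function name `masked_multitask_mse` is hypothetical.

```python
from typing import List, Optional


def masked_multitask_mse(
    predictions: List[List[float]],
    labels: List[List[Optional[float]]],
) -> float:
    """Mean squared error over only the (compound, task) pairs that were measured.

    predictions[i][t]: model output for compound i on ADMET task t.
    labels[i][t]: measured value, or None if compound i was never run in assay t.
    """
    total, count = 0.0, 0
    for pred_row, label_row in zip(predictions, labels):
        for pred, label in zip(pred_row, label_row):
            if label is None:
                # Missing measurement: skip it so it adds nothing to the loss.
                continue
            total += (pred - label) ** 2
            count += 1
    if count == 0:
        raise ValueError("batch contains no measured labels")
    return total / count


# Two compounds, two tasks; each compound was measured in only one assay.
loss = masked_multitask_mse([[1.0, 2.0], [3.0, 4.0]], [[1.5, None], [None, 3.0]])
print(loss)  # 0.625
```

Because the shared layers still receive gradients from every measured task, sparse endpoints can benefit from data collected for related ones.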

2. Materials

2.1. Dataset Overview

  • PubChem is a large-scale chemical database of bioactive molecules, many with drug-like properties.
  • ChEMBL, PubChem's European counterpart, is another database of small-molecule bioactivity data suitable for machine learning.
  • Other well-curated databases include AQUASOL for aqueous solubility and Tox21 for toxicity.

2.2. Descriptor Set Overview

  • 2D molecular descriptors are the most popular choice for traditional ADMET modeling. The descriptors used for our ADMET modeling included cLogP (BioByte Corp., Claremont, CA), Kier connectivity, shape, and E-state indices, a subset of MOE descriptors (Chemical Computing Group Inc., 2004, http://www.chemcomp.com), and a set of ADMET keys encoding structural features.
  • Some descriptors, such as the Kier shape indices, contain implicit 3D information. Explicit 3D descriptors were not routinely used, both to avoid biasing the analysis with predicted conformational effects and to keep calculation fast enough for rapid prediction.
  • In the deep learning approach, a molecular graph convolutional neural network was applied to transform molecular structures into embeddings.
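The chapter does not detail the graph convolution architecture. As a rough illustration only, one common message-passing layer updates each atom's feature vector from its own features plus the sum of its bonded neighbors' features, and summing over atoms then yields a fixed-length molecular embedding. The identity weights, one-hot atom encoding, and function names below are hypothetical; tools such as DeepChem or Chemprop use richer atom and bond features, learned weights, and several stacked layers.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply weight matrix w by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def graph_conv_layer(
    atom_features: Dict[int, Vector],
    bonds: List[Tuple[int, int]],
    w_self: Matrix,
    w_neigh: Matrix,
) -> Dict[int, Vector]:
    """One message-passing step: h_v' = ReLU(W_self h_v + W_neigh * sum of neighbor h_u)."""
    neighbors: Dict[int, List[int]] = {atom: [] for atom in atom_features}
    for a, b in bonds:  # bonds are undirected
        neighbors[a].append(b)
        neighbors[b].append(a)
    dim = len(next(iter(atom_features.values())))
    updated: Dict[int, Vector] = {}
    for atom, h in atom_features.items():
        summed = [0.0] * dim
        for nb in neighbors[atom]:
            summed = [s + v for s, v in zip(summed, atom_features[nb])]
        pre_activation = [
            s + n for s, n in zip(matvec(w_self, h), matvec(w_neigh, summed))
        ]
        updated[atom] = [max(0.0, x) for x in pre_activation]  # ReLU
    return updated


def sum_readout(atom_features: Dict[int, Vector]) -> Vector:
    """Sum atom vectors into one fixed-length molecular embedding."""
    dim = len(next(iter(atom_features.values())))
    embedding = [0.0] * dim
    for h in atom_features.values():
        embedding = [e + v for e, v in zip(embedding, h)]
    return embedding


# Ethanol heavy atoms C0-C1-O2 with one-hot features [is_carbon, is_oxygen].
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder weights; real ones are learned
hidden = graph_conv_layer(features, [(0, 1), (1, 2)], identity, identity)
print(sum_readout(hidden))  # [5.0, 2.0]
```

The embedding length is fixed by the feature dimension, not by molecule size, which is what lets it feed a standard dense prediction head.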

2.3. Machine Learning Algorithms

2.4. Software

  • Python and R are often used for data processing.
  • Spotfire and JMP can perform data analysis and visualization.
  • Commonly used software for calculating descriptors includes Dragon, RDKit, Daylight, ACDlabs, Molecular Operating Environment (MOE), Schrodinger, and Pipeline Pilot.
  • Graph convolutional descriptors can be computed with several deep learning–based ADMET prediction packages, including DeepChem, Chemprop, and Chemi-Net.
  • scikit-learn (Python) and caret (R) are used to apply traditional machine learning algorithms. TensorFlow, Keras, and PyTorch are commonly used deep learning frameworks.
  • Pipeline Pilot is used for data pipelining and for automating the entire ADMET training and inference process.

2.5. Computer Hardware

  • On-premises, cloud, or hybrid computing hardware can be used for the machine learning tasks.
    • For traditional machine learning tasks, an HP Z series workstation with at least a 4-core CPU, 16 GB RAM, and a 1 TB hard drive, or a comparable Amazon Web Services (AWS) M5 instance, meets the requirements. The preferred setup is an 8-core CPU with 64 GB RAM and a 4 TB SSD.
    • For deep learning tasks with large datasets, GPUs are preferred for training. Suitable GPUs include the NVIDIA GeForce RTX 2080 Ti, Quadro RTX 6000, Titan RTX, and Tesla V100. On AWS, P2 or P3 instances are suitable for GPU training.

3. Methods

3.1. Training and Test Set Preparation

  • To mimic real-world prospective prediction, the training and test sets were split temporally, with the newest compounds assigned to the test set.
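A temporal split can be sketched as follows. This is a minimal version under assumptions: each compound carries a registration date (the `Compound` record and its `registered` field are hypothetical), and roughly the newest 20% of compounds go to the test set. The split is made at a date cutoff so that compounds registered on the same day are never separated.

```python
from datetime import date, timedelta
from typing import List, NamedTuple, Tuple


class Compound(NamedTuple):
    compound_id: str
    registered: date  # when the compound was registered (assumed field)
    value: float      # measured ADMET endpoint


def temporal_split(
    compounds: List[Compound], test_fraction: float = 0.2
) -> Tuple[List[Compound], List[Compound]]:
    """Put roughly the newest `test_fraction` of compounds in the test set.

    The split is made at a cutoff date, so compounds registered on the same
    day always land on the same side.
    """
    if not compounds or not 0.0 < test_fraction < 1.0:
        raise ValueError("need compounds and 0 < test_fraction < 1")
    dates = sorted(c.registered for c in compounds)
    n_test = max(1, round(len(dates) * test_fraction))
    cutoff = dates[len(dates) - n_test]
    train = [c for c in compounds if c.registered < cutoff]
    test = [c for c in compounds if c.registered >= cutoff]
    return train, test


start = date(2020, 1, 1)
data = [Compound(f"CPD-{i}", start + timedelta(days=30 * i), float(i)) for i in range(10)]
train, test = temporal_split(data)
print(len(train), len(test))  # 8 2
```

Compared with a random split, this mirrors how the model is actually used: trained on past compounds and applied to ones made later.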

3.2. Model Training with Machine Learning and Performance Evaluation

  • The test set was held out and used only for final evaluation, to avoid biasing the training procedure.
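To illustrate held-out evaluation, the sketch below computes two common regression metrics, root-mean-square error (RMSE) and the coefficient of determination R², on test-set predictions. The choice of metrics is an assumption, since the chapter does not list which statistics were reported; in practice `sklearn.metrics.mean_squared_error` and `sklearn.metrics.r2_score` compute the same quantities.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """RMSE and coefficient of determination (R^2) on a held-out test set."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


# Predictions come from a model fitted on the training set only; the test
# labels are used exactly once, here.
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```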

3.3. Model Deployment and Automation


3.4. Performance Monitoring

  • During each model update run, we retrieve the molecules whose data have been measured since the last training. The model from the previous training run is then used to predict the ADMET properties of these newly measured compounds. This ensures that the evaluated molecules were not part of the training set of the model being evaluated.
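The monitoring step can be sketched as below, assuming each measurement records a compound ID, a measurement date, and a value, and that the IDs used to train the previous model are available. The `Measurement` record, both function names, and the dictionary standing in for the previous model are hypothetical; mean absolute error is used only as an example statistic.

```python
from datetime import date
from typing import Callable, Iterable, List, NamedTuple, Set


class Measurement(NamedTuple):
    compound_id: str
    measured_on: date
    value: float


def prospective_set(
    measurements: Iterable[Measurement],
    last_training_date: date,
    previous_training_ids: Set[str],
) -> List[Measurement]:
    """New data since the last training run, excluding compounds the old model saw."""
    return [
        m for m in measurements
        if m.measured_on > last_training_date
        and m.compound_id not in previous_training_ids
    ]


def prospective_mae(eval_set: List[Measurement], predict: Callable[[str], float]) -> float:
    """Mean absolute error of the previous model on the prospective set."""
    if not eval_set:
        raise ValueError("no new measurements since the last training run")
    return sum(abs(predict(m.compound_id) - m.value) for m in eval_set) / len(eval_set)


measurements = [
    Measurement("A", date(2023, 1, 10), 1.0),  # measured before the last training run
    Measurement("B", date(2023, 3, 1), 2.0),   # new compound: evaluated
    Measurement("C", date(2023, 3, 5), 3.0),   # re-measured, but already in the old training set
]
new = prospective_set(measurements, date(2023, 2, 1), {"A", "C"})
old_predictions = {"A": 1.2, "B": 2.5, "C": 2.9}  # stand-in for the previous model's output
print([m.compound_id for m in new], prospective_mae(new, old_predictions.__getitem__))  # ['B'] 0.5
```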

3.5. Additional Tips When Training ADMET Models

  • Several important factors need to be considered when building in silico ADMET models.
    • One of the first considerations is understanding the ADMET property to be analyzed and how the research team intends to use it to make design decisions.
    • Next, the variability of the experimental data should be examined. Since in silico models are intended to simulate an experimental assay, they are only as good as the data on which they are trained.
    • Then the machine learning method(s) used to analyze the structure–activity relationship (SAR) should be chosen in light of the structural diversity, SAR linearity, and size of the dataset.
      • For small datasets, especially a congeneric series of compounds, multiple linear regression or partial least squares (PLS) can be sufficient.
      • For large, structurally diverse datasets with nonlinear SAR, more sophisticated methods such as random forest (RF), artificial neural networks (ANN), Cubist, or advanced deep learning methods can be more practical.
    • The next aspect to consider is the available molecular descriptor set, whose accuracy, interpretability, reproducibility, and speed need to be evaluated.
    • Finally, the applicability domain or prediction confidence must be examined if the model is to be used for prospective property predictions.
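The chapter does not specify how the applicability domain was assessed. One widely used option, assumed in this sketch, is the Tanimoto similarity between a query compound's fingerprint and its nearest neighbor in the training set, with low similarity flagging an out-of-domain prediction. Fingerprints are represented as sets of on-bit indices (for example, the on-bits of a Morgan fingerprint computed elsewhere); the 0.4 threshold and the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def in_applicability_domain(
    query: FrozenSet[int],
    training_fps: Iterable[FrozenSet[int]],
    threshold: float = 0.4,  # hypothetical cutoff; would be tuned per endpoint
) -> Tuple[bool, float]:
    """Flag a query as in-domain if its nearest training neighbor is similar enough."""
    nearest = max((tanimoto(query, fp) for fp in training_fps), default=0.0)
    return nearest >= threshold, nearest


training = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
print(in_applicability_domain(frozenset({1, 2, 3, 5}), training))  # (True, 0.6)
print(in_applicability_domain(frozenset({20, 21}), training))      # (False, 0.0)
```

Returning the similarity alongside the flag lets it double as a rough confidence score rather than a hard cutoff.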

4. Notes

  • To integrate DL-based ADMET prediction seamlessly into our existing ADMET prediction service, we had to keep it on the same Pipeline Pilot platform.

5. Summary

  • We describe the development and implementation of ADMET prediction methods, which are widely used in the pharmaceutical industry.