Chapter 20: Machine Learning for In Silico ADMET Prediction

_森罗万象

Last modified: 2022-09-27 09:56:15

Category: Reading notes

First published: 2022-09-25 09:03:58

Article link: https://blog.csdn.net/weixin_52812620/article/details/127034005

Reading notes on "Artificial Intelligence in Drug Design"

1. Introduction

Multi-task deep neural network (MT-DNN) and graph convolutional neural network (GCNN) methods play an important role in boosting prediction accuracy.

PubChem is a large-scale chemical database of bioactive molecules with drug-like properties.
ChEMBL, PubChem's European counterpart, is another database that houses small-molecule datasets for machine learning.
Additional well-curated databases include the Aquasol database for aqueous solubility and Tox21 for toxicity.

2D molecular descriptors are the most popular choice for traditional ADMET modeling. The descriptors used for the ADMET modeling described here include cLogP (BioByte Corp., Claremont, CA), Kier connectivity, shape, and E-state indices, a subset of MOE descriptors (Chemical Computing Group Inc., 2004, http://www.chemcomp.com), and a set of ADMET keys, which are structural features.
Some descriptors, such as the Kier shape indices, contain implicit 3D information. Explicit 3D molecular descriptors were not routinely used, both to avoid biasing the analysis with predicted conformational effects and to keep descriptor calculation fast enough for rapid prediction.
In the deep learning approach, a molecular graph convolutional neural network was applied to transform molecular structures into embeddings.
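
To make "transforming a molecular structure into an embedding" more concrete, here is a minimal pure-Python sketch of a single graph-convolution step followed by sum pooling. It is only an illustration under stated assumptions, not the Chemi-Net architecture: the toy atom features, the fixed weights, and the helper names `graph_conv_layer` and `molecule_embedding` are all made up for this example.

```python
# Toy molecular graph for the heavy atoms of ethanol (C-C-O); hydrogens omitted.
# Atom features are hypothetical: [is_carbon, is_oxygen, heavy-atom degree].
ATOM_FEATURES = [
    [1.0, 0.0, 1.0],  # C (methyl)
    [1.0, 0.0, 2.0],  # C (methylene)
    [0.0, 1.0, 1.0],  # O (hydroxyl)
]
BONDS = [(0, 1), (1, 2)]

# Fixed illustrative weights (3 input features -> 2 hidden units).
# In a real network these would be learned during training.
W_SELF = [[0.5, -0.2], [0.1, 0.4], [0.3, 0.0]]
W_NEIGH = [[0.2, 0.1], [-0.3, 0.6], [0.0, 0.2]]


def matvec(vec, weights):
    """Multiply a row vector by a weight matrix of shape len(vec) x n_out."""
    n_out = len(weights[0])
    return [sum(v * weights[i][j] for i, v in enumerate(vec)) for j in range(n_out)]


def graph_conv_layer(features, bonds, w_self, w_neigh):
    """One convolution step: combine each atom with the sum of its neighbors."""
    n_atoms = len(features)
    neighbors = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    updated = []
    for i in range(n_atoms):
        # Sum the neighbor features (the "message" passed along bonds).
        message = [
            sum(features[n][k] for n in neighbors[i])
            for k in range(len(features[i]))
        ]
        own = matvec(features[i], w_self)
        neigh = matvec(message, w_neigh)
        # ReLU non-linearity applied to the combined signal.
        updated.append([max(0.0, o + m) for o, m in zip(own, neigh)])
    return updated


def molecule_embedding(features, bonds):
    """Sum-pool the atom states into one fixed-length molecule vector."""
    hidden = graph_conv_layer(features, bonds, W_SELF, W_NEIGH)
    return [sum(atom[j] for atom in hidden) for j in range(len(hidden[0]))]


embedding = molecule_embedding(ATOM_FEATURES, BONDS)
print(embedding)
```

Because the pooling step is a sum over atoms, the resulting vector does not depend on the order in which the atoms are numbered, which is one reason graph-based embeddings are convenient as model inputs.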

Cubist is a prediction-oriented regression algorithm developed by Quinlan. Its advantage over other traditional statistical algorithms is that it can handle large datasets with highly nonlinear relationships.
A deep learning algorithm for ADMET prediction is described in detail in "Chemi-net: a molecular graph convolutional network for accurate drug property prediction".
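
As a rough illustration of how a rule-based regression model of the Cubist kind can capture nonlinear behavior, the sketch below pairs each rule with its own linear model and averages the outputs of all matching rules. This is a simplified sketch, not Quinlan's implementation: the descriptor names (`clogp`, `tpsa`), thresholds, and coefficients are invented, and real Cubist models learn their rules from data.

```python
# Each rule pairs a condition on the descriptors with a linear model.
# Descriptor names, thresholds, and coefficients are invented for illustration.
RULES = [
    {
        "condition": lambda x: x["clogp"] <= 3.0,
        "intercept": 0.5,
        "coefs": {"clogp": -0.8, "tpsa": 0.01},
    },
    {
        "condition": lambda x: x["clogp"] > 3.0,
        "intercept": 2.0,
        "coefs": {"clogp": -1.5, "tpsa": 0.0},
    },
    {
        "condition": lambda x: x["tpsa"] > 100.0,
        "intercept": -1.0,
        "coefs": {"clogp": -0.5, "tpsa": 0.02},
    },
]


def predict(descriptors, rules=RULES):
    """Average the linear-model outputs of all rules whose condition matches."""
    outputs = [
        rule["intercept"]
        + sum(coef * descriptors[name] for name, coef in rule["coefs"].items())
        for rule in rules
        if rule["condition"](descriptors)
    ]
    if not outputs:
        raise ValueError("no rule covers this compound")
    return sum(outputs) / len(outputs)


# A compound matched by one rule, and one matched by two rules.
low_logp = predict({"clogp": 2.0, "tpsa": 50.0})
high_logp = predict({"clogp": 4.0, "tpsa": 120.0})
print(low_logp, high_logp)
```

Each rule is linear on its own, but different regions of descriptor space use different linear models, so the overall response is piecewise linear and can follow a nonlinear relationship.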

Python and R are often used for data processing.
Spotfire and JMP can be used for data analysis and visualization.
Commonly used software for calculating descriptors includes Dragon, RDKit, Daylight, ACD/Labs, Molecular Operating Environment (MOE), Schrödinger, and Pipeline Pilot.
Graph convolutional descriptors can be computed by several deep learning-based ADMET prediction packages, including DeepChem, Chemprop, and Chemi-Net.
The scikit-learn (sklearn) package in Python and the caret package in R are used to apply traditional machine learning algorithms. TensorFlow, Keras, and PyTorch are commonly used DL frameworks.
Pipeline Pilot is used for data pipelining and for automating the whole ADMET training and inference process.

On-premise hardware, cloud hardware, or a hybrid of the two can be used for the machine learning tasks.
- For traditional machine learning tasks, an HP Z-series workstation with at least a 4-core CPU, 16 GB RAM, and a 1 TB hard drive, or a comparable M5 instance on Amazon Web Services (AWS), meets the requirement. The preferred setup is an 8-core CPU with 64 GB RAM and a 4 TB SSD.
- For deep learning tasks with large datasets, GPUs are preferred for training. Preferred GPUs include the Nvidia GeForce RTX 2080 Ti, Quadro RTX 6000, Titan RTX, and Tesla V100. On AWS, P2 or P3 instances are suitable for GPU training.

To resemble real-time prediction situations, the training and test sets were split temporally, with the newer compounds selected as the test set.

The test set was used solely for testing purposes to avoid bias in the training procedure.
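
A temporal split of this kind can be sketched in a few lines of Python. This is a minimal example under assumptions: the record layout, the compound IDs and values, and the `test_fraction` parameter are hypothetical and are not taken from the book.

```python
from datetime import date


def temporal_split(records, test_fraction=0.2):
    """Split records into (train, test), with the newest records as the test set.

    `records` is a list of dicts with a `measured_on` date; `test_fraction`
    is a hypothetical parameter, not a value given in the text.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    ordered = sorted(records, key=lambda r: r["measured_on"])
    n_test = max(1, round(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]


# Made-up example data: compound ID, measurement date, and a measured value.
compounds = [
    {"id": "CPD-003", "measured_on": date(2021, 6, 1), "value": 1.9},
    {"id": "CPD-001", "measured_on": date(2020, 1, 15), "value": 2.4},
    {"id": "CPD-005", "measured_on": date(2022, 3, 9), "value": 0.7},
    {"id": "CPD-002", "measured_on": date(2020, 11, 30), "value": 3.1},
    {"id": "CPD-004", "measured_on": date(2021, 12, 20), "value": 1.2},
]

train_set, test_set = temporal_split(compounds, test_fraction=0.4)
print([c["id"] for c in train_set])
print([c["id"] for c in test_set])
```

Unlike a random split, every test compound here was measured after every training compound, which mimics how a deployed model is asked about compounds made after it was trained.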

During each model training update run, we retrieve the molecules whose data were newly measured since the last training. We then use the model from the last training run to predict the ADMET activity of these compounds. This ensures that the molecules used for the evaluation were not part of the training data of the model being evaluated.
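
The update-run evaluation described above might look roughly like the following sketch. The record layout, the `previous_model` stand-in, and the choice of RMSE as the metric are assumptions made for illustration; the source does not specify how the evaluation is implemented or scored.

```python
import math
from datetime import date


def select_new_compounds(records, last_training_date, training_ids):
    """Keep records measured after the last training run and unseen by that model."""
    return [
        r for r in records
        if r["measured_on"] > last_training_date and r["id"] not in training_ids
    ]


def rmse(observed, predicted):
    """Root-mean-square error between paired observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def evaluate_previous_model(model, records, last_training_date, training_ids):
    """Predict newly measured compounds with the previous model and score them."""
    new_records = select_new_compounds(records, last_training_date, training_ids)
    if not new_records:
        return None  # nothing new to evaluate in this update run
    observed = [r["value"] for r in new_records]
    predicted = [model(r) for r in new_records]
    return {"n": len(new_records), "rmse": rmse(observed, predicted)}


def previous_model(record):
    """Stand-in for the last trained model: a constant predictor (illustration only)."""
    return 2.0


# Made-up measurements; "D" was re-measured but was already in the training set.
records = [
    {"id": "A", "measured_on": date(2022, 1, 10), "value": 1.5},
    {"id": "B", "measured_on": date(2022, 3, 1), "value": 2.0},
    {"id": "C", "measured_on": date(2022, 3, 15), "value": 3.0},
    {"id": "D", "measured_on": date(2022, 3, 20), "value": 1.0},
]

result = evaluate_previous_model(
    previous_model,
    records,
    last_training_date=date(2022, 2, 1),
    training_ids={"A", "D"},
)
print(result)
```

Filtering on both the measurement date and the set of training IDs guards against a compound that was re-measured after the last run slipping into the evaluation set.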

To incorporate DL-based ADMET prediction seamlessly with our existing ADMET prediction service, we had to stay with the same Pipeline Pilot platform.