Paper reading (50): Supervised prediction of drug-induced nephrotoxicity

盲人骑瞎马5555

Category: Paper Reading. Tags: Random Forest, drug-induced nephrotoxicity

Article link: https://blog.csdn.net/wxw060709/article/details/102882368

Paper title: Supervised prediction of drug-induced nephrotoxicity based on interleukin-6 and -8 expression levels

Scholar citations: 22

Pages: 9

Published: 2014.12

Journal: BMC Bioinformatics

Authors: Ran Su, Yao Li, Daniele Zink, Lit-Hsin Loo

Abstract:

Background
Drug-induced nephrotoxicity causes acute kidney injury and chronic kidney diseases, and is a major reason for late-stage failures in the clinical trials of new drugs. Therefore, early, pre-clinical prediction of nephrotoxicity could help to prioritize drug candidates for further evaluations, and increase the success rates of clinical trials. Recently, an in vitro model for predicting renal-proximal-tubular-cell (PTC) toxicity based on the expression levels of two inflammatory markers, interleukin (IL)-6 and -8, has been described. However, this and other existing models usually use linear and manually determined thresholds to predict nephrotoxicity. Automated machine learning algorithms may improve these models, and produce more accurate and unbiased predictions.
Results
Here, we report a systematic comparison of the performances of four supervised classifiers, namely random forest, support vector machine, k-nearest-neighbor and naive Bayes classifiers, in predicting PTC toxicity based on IL-6 and -8 expression levels. Using a dataset of human primary PTCs treated with 41 well-characterized compounds that are toxic or not toxic to PTC, we found that random forest classifiers have the highest cross-validated classification performance (mean balanced accuracy = 87.8%, sensitivity = 89.4%, and specificity = 85.9%). Furthermore, we also found that IL-8 is more predictive than IL-6, but a combination of both markers gives higher classification accuracy. Finally, we also show that random forest classifiers trained automatically on the whole dataset have higher mean balanced accuracy than a previous threshold-based classifier constructed for the same dataset (99.3% vs. 80.7%).
Conclusions
Our results suggest that a random forest classifier can be used to automatically predict drug-induced PTC toxicity based on the expression levels of IL-6 and -8.

Conclusions:

All parameters of the classifiers were determined automatically without any user intervention.
This better performance is likely due to the non-linear and multivariate decision boundaries generated by the random forest classifier.
Our methods are general and can be easily applied to test and identify other potential nephrotoxicity markers based on gene expression levels, metabolic profiles, or cellular phenotypes.
[Prediction performance may be] further increased by combining markers from these different modalities, and also by increasing the number of training compounds.
An important application of our automated classifier is to predict nephrotoxicity of novel chemical compounds identified from large-scale screening of small-molecule or natural product libraries.

Background:

Most of these existing predictors use simple linear thresholds to distinguish between the effects of nephrotoxic and non-nephrotoxic compounds, even though more than one marker (or "feature") is measured from the cells.
These manually-determined thresholds may be subject to human biases, and have difficulties in distinguishing features that are non-linearly separable.

Paper structure:

1. Background

2. Methods

2.1 Dataset

2.2 Classifier evaluation

2.3 Random forest

2.4 Binary support vector machine

2.5 k-NN classifier

2.6 Naive Bayes classifier

3. Results and discussion

3.1 Random forest classification

3.2 SVM parameter optimization

3.3 SVM classification using linear, polynomial, sigmoid and RBF kernels

3.4 k-NN classification

3.5 Comparison between random forest, SVM, k-NN and naive Bayes classifiers

3.6 Feature comparison

3.7 Construction of final classifiers using all compounds

4. Conclusions

Excerpts from the paper:

1. Biological Problem: What biological problems have been solved in this paper?

predicting renal-proximal-tubular-cell (PTC) toxicity based on IL-6 and -8 expression levels
prediction of drug-induced nephrotoxicity

2. Main discoveries: What are the main discoveries in this paper?

random forest classifiers have the highest cross-validated classification performance (mean balanced accuracy = 87.8%, sensitivity = 89.4%, and specificity = 85.9%)
found that IL-8 is more predictive than IL-6, but a combination of both markers gives higher classification accuracy.
show that random forest classifiers trained automatically on the whole dataset have higher mean balanced accuracy than a previous threshold-based classifier constructed for the same dataset (99.3% vs. 80.7%).
We found that the performance differences between random forest classifiers using the different tested numbers of trees are very small

3. ML (Machine Learning) Methods: What are the ML methods applied in this paper?

The dataset was collected from HPTCs derived from three different human donors (HPTC1, 2, and 3).
The cells were exposed to 41 compounds for 16 hours, and the expression levels of IL-6 and -8 were determined using quantitative polymerase chain reaction (qPCR)
random forest, support vector machine, k-nearest-neighbor and naive Bayes classifiers
3-fold cross validation procedure
We used three different classification performance indicators: sensitivity, specificity, and balanced accuracy.
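The three performance indicators can be sketched in a few lines of Python. This is a minimal illustration assuming the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), balanced accuracy = their mean); the function names and the example labels are hypothetical and are not taken from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = toxic, 0 = non-toxic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def performance(y_true, y_pred):
    """Return (sensitivity, specificity, balanced accuracy)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)   # fraction of toxic compounds detected
    specificity = tn / (tn + fp)   # fraction of non-toxic compounds cleared
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy


# Hypothetical labels for ten compounds (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec, bal = performance(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} balanced accuracy={bal:.3f}")
```

Balanced accuracy weights the toxic and non-toxic classes equally, which matters when the two classes contain different numbers of compounds.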

4. ML Advantages: Why are these ML methods better than the traditional methods in these biological problems?

Traditional methods:

linear and manually determined thresholds to predict nephrotoxicity
Most of these existing predictors use simple linear thresholds to distinguish between the effects of nephrotoxic and non-nephrotoxic compounds, even though more than one marker (or "feature") is measured from the cells.
These manually-determined thresholds may be subject to human biases, and have difficulties in distinguishing features that are non-linearly separable.
Simple classifiers based on manually determined thresholds were used, and cross validation was not used to test the performance of these classifiers.
The mean balanced accuracy of these classifiers constructed using all the data points (compounds) was reported to be 80.7%.

This better performance is likely due to the non-linear and multivariate decision boundaries generated by the random forest classifier.
Our methods are general and can be easily applied to test and identify other potential nephrotoxicity markers based on gene expression levels, metabolic profiles, or cellular phenotypes.

5. Biological Significance: What is the biological significance of these ML methods’ results?

The perfect AUC score indicates that the toxic and non-toxic categories can be fully separated by the random forest classifier.
We also noticed that most of the toxic compounds mis-classified by random forest classifiers are usually also mis-classified by threshold-based classifiers.

6. Prospect: What are the potential applications of these machine learning methods in biological science?

[Prediction performance may be] further increased by combining markers from these different modalities, and also by increasing the number of training compounds.

7. My Questions (Optional)

random forest classifiers trained automatically on the whole dataset have higher mean balanced accuracy than a previous threshold-based classifier constructed for the same dataset (99.3% vs. 80.7%).
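99.3%?? Could this be an overfitting problem? This figure comes from classifiers trained on the whole dataset, whereas the cross-validated mean balanced accuracy reported above is 87.8%.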
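One way to explore this question is to compare a score measured on the training data itself with a 3-fold cross-validated score. The sketch below uses a 1-nearest-neighbor classifier on synthetic two-feature data; it is only an illustration of the general effect, not a reproduction of the paper's evaluation. The data, the random seed, and the function names are assumptions, and the folds are not stratified.

```python
import random


def nearest_label(train, point):
    """Return the label of the training sample closest to `point` (1-NN)."""
    closest = min(
        train,
        key=lambda s: (s[0][0] - point[0]) ** 2 + (s[0][1] - point[1]) ** 2,
    )
    return closest[1]


def accuracy(train, test):
    """Plain accuracy of 1-NN trained on `train` and evaluated on `test`."""
    hits = sum(1 for x, y in test if nearest_label(train, x) == y)
    return hits / len(test)


def three_fold_accuracy(samples, rng):
    """Mean accuracy over 3 folds; each fold is held out once for testing."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::3] for i in range(3)]
    scores = []
    for i in range(3):
        train = [s for j in range(3) if j != i for s in folds[j]]
        scores.append(accuracy(train, folds[i]))
    return sum(scores) / 3


rng = random.Random(0)
# Synthetic, overlapping classes: 20 "non-toxic" and 20 "toxic" samples.
samples = [((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)), 0) for _ in range(20)]
samples += [((rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)), 1) for _ in range(20)]

resub = accuracy(samples, samples)        # trained and tested on the same data
cv = three_fold_accuracy(samples, rng)    # tested on held-out folds
print(f"same-data accuracy: {resub:.3f}, 3-fold CV accuracy: {cv:.3f}")
```

With 1-NN, every sample is its own nearest neighbor when the training data is reused for testing, so the same-data score is perfect by construction. The cross-validated score is the more honest estimate of performance on unseen compounds, which is why the 87.8% and 99.3% figures should not be compared directly.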
