Paper reading (50): Supervised prediction of drug-induced nephrotoxicity

Paper title: Supervised prediction of drug-induced nephrotoxicity based on interleukin-6 and -8 expression levels

Scholar citations: 22

Pages: 9

Published: 2014.12

Journal: BMC Bioinformatics

Authors: Ran Su, Yao Li, Daniele Zink, Lit-Hsin Loo

Abstract:

Background
Drug-induced nephrotoxicity causes acute kidney injury and chronic kidney diseases, and is a major reason for late-stage failures in the clinical trials of new drugs. Therefore, early, pre-clinical prediction of nephrotoxicity could help to prioritize drug candidates for further evaluations, and increase the success rates of clinical trials. Recently, an in vitro model for predicting renal-proximal-tubular-cell (PTC) toxicity based on the expression levels of two inflammatory markers, interleukin (IL)-6 and -8, has been described. However, this and other existing models usually use linear and manually determined thresholds to predict nephrotoxicity. Automated machine learning algorithms may improve these models, and produce more accurate and unbiased predictions.
Results
Here, we report a systematic comparison of the performances of four supervised classifiers, namely random forest, support vector machine, k-nearest-neighbor and naive Bayes classifiers, in predicting PTC toxicity based on IL-6 and -8 expression levels. Using a dataset of human primary PTCs treated with 41 well-characterized compounds that are toxic or not toxic to PTC, we found that random forest classifiers have the highest cross-validated classification performance (mean balanced accuracy = 87.8%, sensitivity = 89.4%, and specificity = 85.9%). Furthermore, we also found that IL-8 is more predictive than IL-6, but a combination of both markers gives higher classification accuracy. Finally, we also show that random forest classifiers trained automatically on the whole dataset have higher mean balanced accuracy than a previous threshold-based classifier constructed for the same dataset (99.3% vs. 80.7%).
Conclusions
Our results suggest that a random forest classifier can be used to automatically predict drug-induced PTC toxicity based on the expression levels of IL-6 and -8.

Conclusions:

  • All parameters of the classifiers were determined automatically without any user intervention.
  • This better performance is likely due to the non-linear and multivariate decision boundaries generated by the random forest classifier.
  • Our methods are general and can be easily applied to test and identify other potential nephrotoxicity markers based on gene expression levels, metabolic profiles, or cellular phenotypes.
  • Prediction performance may be further increased by combining markers from these different modalities, and also by increasing the number of training compounds.
  • An important application of our automated classifier is to predict nephrotoxicity of novel chemical compounds identified from large-scale screening of small-molecule or natural product libraries. 

Background:

  • Most of these existing predictors use simple linear thresholds to distinguish between the effects of nephrotoxic and non-nephrotoxic compounds, even though more than one marker (or "feature") is measured from the cells.
  • These manually determined thresholds may be subject to human biases, and have difficulties in distinguishing features that are non-linearly separable.

Paper outline:

1. Background

2. Methods

2.1 Dataset

2.2 Classifier evaluation

2.3 Random forest

2.4 Binary support vector machine

2.5 k-NN classifier

2.6 Naive Bayes classifier

3. Results and discussion

3.1 Random forest classification

3.2 SVM parameter optimization

3.3 SVM classification using linear, polynomial, sigmoid and RBF kernels

3.4 k-NN classification

3.5 Comparison between random forest, SVM, k-NN and naive Bayes classifiers

3.6 Feature comparison

3.7 Construction of final classifiers using all compounds

4. Conclusions

Excerpts from the main text:

1. Biological Problem: What biological problems have been solved in this paper?

  • predicting renal-proximal-tubular-cell (PTC) toxicity based on IL-6 and -8 expression levels
  • prediction of drug-induced nephrotoxicity 

2. Main discoveries: What are the main discoveries in this paper?

  • random forest classifiers have the highest cross-validated classification performance (mean balanced accuracy = 87.8%, sensitivity = 89.4%, and specificity = 85.9%)
  • found that IL-8 is more predictive than IL-6, but a combination of both markers gives higher classification accuracy.
  • show that random forest classifiers trained automatically on the whole dataset have higher mean balanced accuracy than a previous threshold-based classifier constructed for the same dataset (99.3% vs. 80.7%).
  • We found that the performance differences between random forest classifiers using the different tested numbers of trees are very small 

3. ML (Machine Learning) Methods: What are the ML methods applied in this paper?

  • The dataset was collected from human primary PTCs (HPTCs) derived from three different human donors (HPTC1, 2, and 3).
  • The cells were exposed to 41 compounds for 16 hours, and the expression levels of IL-6 and -8 were determined using quantitative polymerase chain reaction (qPCR)
  • random forest, support vector machine, k-nearest-neighbor and naive Bayes classifiers
  • 3-fold cross-validation procedure
  • We used three different classification performance indicators: sensitivity, specificity, and balanced accuracy.
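The three indicators listed above can be written down in a few lines. This is a minimal sketch, assuming the usual definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), balanced accuracy = their mean) and the convention that label 1 means toxic. The labels and predictions below are made up for illustration and are not taken from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN), treating label 1 as toxic and 0 as non-toxic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(y_true, y_pred):
    """Fraction of toxic compounds that were predicted toxic."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    """Fraction of non-toxic compounds that were predicted non-toxic."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; robust to unequal class sizes."""
    return (sensitivity(y_true, y_pred) + specificity(y_true, y_pred)) / 2


# Hypothetical example: 4 toxic and 6 non-toxic compounds.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

print("sensitivity:", sensitivity(y_true, y_pred))
print("specificity:", specificity(y_true, y_pred))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

Balanced accuracy is a sensible choice when the toxic and non-toxic groups differ in size, because a classifier cannot score well just by favoring the larger class.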

4. ML Advantages: Why are these ML methods better than the traditional methods in these biological problems?

  • Traditional methods:
  1. Linear and manually determined thresholds are used to predict nephrotoxicity.
  2. Most of these existing predictors use simple linear thresholds to distinguish between the effects of nephrotoxic and non-nephrotoxic compounds, even though more than one marker (or "feature") is measured from the cells.
  3. These manually determined thresholds may be subject to human biases, and have difficulties in distinguishing features that are non-linearly separable.
  4. Simple classifiers based on manually determined thresholds were used, and cross validation was not used to test the performance of these classifiers.
  5. The mean balanced accuracy of these classifiers constructed using all the data points (compounds) was reported to be 80.7%.
  • This better performance is likely due to the non-linear and multivariate decision boundaries generated by the random forest classifier.
  • Our methods are general and can be easily applied to test and identify other potential nephrotoxicity markers based on gene expression levels, metabolic profiles, or cellular phenotypes.
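To make the "multivariate decision boundary" point concrete, here is a toy sketch. It compares the best possible single-marker threshold with a rule that looks at both markers, which is the kind of rule a small decision tree (the building block of a random forest) can encode. All numbers are made-up (IL-6, IL-8) values chosen for illustration; they are not data from the paper, and the hand-written rule is an assumption, not the authors' classifier.

```python
# Toy, made-up (IL-6, IL-8) expression values; NOT data from the paper.
# Label 1 = toxic, 0 = non-toxic.
samples = [
    ((1.0, 1.0), 0), ((1.2, 0.9), 0), ((0.8, 1.1), 0), ((1.5, 1.5), 0),
    ((5.0, 1.0), 1), ((1.0, 6.0), 1), ((4.0, 4.0), 1), ((0.9, 3.0), 1),
]


def accuracy(predict):
    """Fraction of toy samples for which predict(x) matches the label."""
    return sum(predict(x) == y for x, y in samples) / len(samples)


def best_single_threshold():
    """Exhaustively search one-marker rules of the form x[feature] > t.

    Returns (accuracy, feature_index, threshold) of the best rule found.
    """
    best = (0.0, None, None)
    for feature in (0, 1):
        values = sorted({x[feature] for x, _ in samples})
        # Candidate cut points: midpoints between consecutive observed values.
        cuts = [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in cuts:
            acc = accuracy(lambda x, f=feature, t=t: int(x[f] > t))
            if acc > best[0]:
                best = (acc, feature, t)
    return best


def two_marker_rule(x):
    """Hypothetical rule using both markers, as a depth-2 tree could encode."""
    il6, il8 = x
    return int(il6 > 2.75 or il8 > 2.25)


single_acc, single_feature, single_cut = best_single_threshold()
print("best single-marker threshold accuracy:", single_acc)
print("two-marker rule accuracy:", accuracy(two_marker_rule))
```

On this toy set no threshold on a single marker classifies every compound correctly, because some toxic compounds raise only one of the two markers, while the rule that combines both markers does. A random forest learns many such rules automatically instead of relying on hand-picked cut points.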

5. Biological Significance: What is the biological significance of these ML methods’ results?

  • The perfect AUC score indicates that the toxic and non-toxic categories can be fully separated by the random forest classifier.
  • We also noticed that most of the toxic compounds mis-classified by random forest classifiers are usually also mis-classified by threshold-based classifiers. 

6. Prospect: What are the potential applications of these machine learning methods in biological science?

  • Prediction performance may be further increased by combining markers from these different modalities, and also by increasing the number of training compounds.

7. My Questions (Optional)

  • random forest classifiers trained automatically on the whole dataset have higher mean balanced accuracy than a previous threshold-based classifier constructed for the same dataset (99.3% vs. 80.7%). 
  • 99.3%? Could this be an overfitting problem, given that the classifier was trained on the whole dataset?
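One way to probe this question is to compare accuracy measured on the training data itself with accuracy measured by cross validation. The abstract already hints at the gap: 99.3% for classifiers trained on the whole dataset versus 87.8% under cross validation. The sketch below illustrates the effect on synthetic data with a 1-nearest-neighbor classifier, chosen only because it is easy to write without libraries. It is not the paper's random forest, and the data, seed, and fold assignment are all assumptions.

```python
import random


def nearest_label(train, x):
    """1-nearest-neighbor prediction using squared Euclidean distance."""
    closest = min(
        train,
        key=lambda item: (item[0][0] - x[0]) ** 2 + (item[0][1] - x[1]) ** 2,
    )
    return closest[1]


def accuracy(train, test):
    """Fraction of test points whose nearest training neighbor shares their label."""
    return sum(nearest_label(train, x) == y for x, y in test) / len(test)


rng = random.Random(0)
# Synthetic two-class data with overlapping clusters (21 points per class).
data = [
    ((rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)), label)
    for label, mu in ((0, 0.0), (1, 1.0))
    for _ in range(21)
]
rng.shuffle(data)

# Accuracy on the training data itself: every point is its own nearest neighbor.
train_acc = accuracy(data, data)

# Plain (unstratified) 3-fold cross validation: each fold is held out once.
folds = [data[i::3] for i in range(3)]
cv_scores = []
for i in range(3):
    held_out = folds[i]
    train = [item for j, fold in enumerate(folds) if j != i for item in fold]
    cv_scores.append(accuracy(train, held_out))
cv_acc = sum(cv_scores) / len(cv_scores)

print("accuracy on training data:", train_acc)
print("3-fold cross-validated accuracy:", cv_acc)
```

The training-data accuracy is perfect by construction, while the cross-validated figure will typically be lower because the classes overlap. A score computed on the same compounds used for training is therefore best read as an upper bound, and the cross-validated number is the more honest estimate of performance on new compounds.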