Scholar citations: 204
Pages: 7
Published: 2013.04
Journal: PLOS ONE
Authors: Michael P. Menden, ..., Julio Saez-Rodriguez
Abstract:
Predicting the response of a specific cancer to a therapy is a major goal in modern oncology that should ultimately lead to a personalised treatment. High-throughput screenings of potentially active compounds against a panel of genomically heterogeneous cancer cell lines have unveiled multiple relationships between genomic alterations and drug responses. Various computational approaches have been proposed to predict sensitivity based on genomic features, while others have used the chemical properties of the drugs to ascertain their effect. In an effort to integrate these complementary approaches, we developed machine learning models to predict the response of cancer cell lines to drug treatment, quantified through IC50 values, based on both the genomic features of the cell lines and the chemical properties of the considered drugs. Models predicted IC50 values in an 8-fold cross-validation and an independent blind test with coefficient of determination R2 of 0.72 and 0.64 respectively. Furthermore, models were able to predict with comparable accuracy (R2 of 0.61) IC50s of cell lines from a tissue not used in the training stage. Our in silico models can be used to optimise the experimental design of drug-cell screenings by estimating a large proportion of missing IC50 values rather than experimentally measuring them. The implications of our results go beyond virtual drug screening design: potentially thousands of drugs could be probed in silico to systematically test their potential efficacy as anti-tumour agents based on their structure, thus providing a computational framework to identify new drug repositioning opportunities as well as ultimately be useful for personalized medicine by linking the genomic traits of patients to drug sensitivity.
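The abstract reports two evaluation schemes: k-fold cross-validation over drug-cell pairs, and prediction for cell lines from a tissue that was held out of training. They differ only in how rows are split. A minimal NumPy sketch, assuming each row is one (drug, cell line) pair carrying a tissue label; the tissue names and row counts are hypothetical. scikit-learn's `KFold` and `LeaveOneGroupOut` provide equivalent splitters.

```python
import numpy as np

def k_fold_indices(n_rows, k=8, seed=0):
    """Yield (train_idx, test_idx) for a shuffled k-fold split over all rows."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)
    folds = np.array_split(order, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

def leave_one_tissue_out(tissues, held_out):
    """Put every row of the `held_out` tissue in the test set, all others in training."""
    tissues = np.asarray(tissues)
    test_mask = tissues == held_out
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Hypothetical example: 16 drug-cell pairs from three tissues.
tissues = ["lung"] * 6 + ["breast"] * 6 + ["skin"] * 4
for train_idx, test_idx in k_fold_indices(len(tissues), k=8):
    assert len(np.intersect1d(train_idx, test_idx)) == 0  # folds never overlap
train_idx, test_idx = leave_one_tissue_out(tissues, "skin")
print(len(train_idx), len(test_idx))  # 12 4
```

The tissue split is the harder test: the model never sees any cell line of that tissue, so it must generalise from genomic features alone.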
Discussion:
- Drug response was modelled with non-parametric machine learning algorithms such as neural networks and random forests.
- This means that the average error (RMSE) in predicting a given IC50 value is the same in the multi-drug and single-drug models and, since different drugs are active at different concentration ranges, the multi-drug model covers a much larger dynamic range with similar precision.
- The predictive ability of our methods for individual values is still limited and could be further improved by extending the set of input features with additional layers of molecular characterization of the cell lines, such as basal transcriptional profiles and phosphoproteomic data.
- Our method uses purely experimental data, but additional predictive power can be expected from including knowledge of the underlying network.
- A fertile ground for further research is investigating the application of other modeling techniques, including regularised linear regression methods (e.g. LASSO, elastic nets).
- The implications of our results go beyond their utility to optimise the experimental design of drug screenings.
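The point about RMSE and dynamic range can be made precise, assuming R2 is defined as one minus the ratio of the residual to the total sum of squares (the paper's exact definitions are in Text S1). For n predictions $\hat{y}_i$ of observed log(IC50) values $y_i$ with mean $\bar{y}$:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
    = 1 - \frac{\mathrm{RMSE}^2}{\sigma_y^2},
\qquad
\mathrm{RMSE}^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\qquad
\sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2 .
```

At a fixed RMSE, a response set spanning a wider range (larger $\sigma_y^2$) gives a higher R2, so a multi-drug model pooling drugs active at different concentrations can reach a higher R2 than single-drug models with the same average error.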
Introduction:
- An accurate tool to impute missing IC50s and to estimate them for novel cell lines would be of great value for drug screening design.
- In silico methods that accurately predict the effectiveness of drugs from the molecular make-up of tumours (i.e. genome, transcriptome) would be a major milestone towards personalized therapies for cancer patients based on molecular biomarkers.
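Imputing missing IC50s amounts to treating the drug x cell line response matrix as a set of labelled pairs: measured entries become training rows and unmeasured entries become the pairs to predict. A minimal NumPy sketch; the matrix values are invented placeholders and NaN is assumed to mark an unmeasured IC50.

```python
import numpy as np

def split_observed_missing(log_ic50):
    """Split a drug x cell-line matrix into (observed pairs, their values, missing pairs).

    Each pair is (drug_index, cell_line_index); NaN marks an unmeasured IC50.
    """
    log_ic50 = np.asarray(log_ic50, dtype=float)
    missing = np.isnan(log_ic50)
    observed_pairs = np.argwhere(~missing)   # row-major order
    values = log_ic50[~missing]              # same row-major order as observed_pairs
    missing_pairs = np.argwhere(missing)
    return observed_pairs, values, missing_pairs

# Hypothetical 3 drugs x 4 cell lines with two IC50s not measured.
matrix = [[1.2, np.nan, 0.4, 2.0],
          [3.1, 2.7, 2.9, np.nan],
          [-0.5, 0.1, 0.0, 0.3]]
obs, y, miss = split_observed_missing(matrix)
print(len(obs), miss.tolist())  # 10 [[0, 1], [1, 3]]
```

A model trained on `(obs, y)` is then queried on `miss`; a novel cell line is simply a new column whose entries are all in the missing set.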
Paper structure:
1. Introduction
2. Results
3. Discussion
4. Materials and Methods
4.1 Training dataset
4.2 Blind test dataset
4.3 Features
4.4 Cross-validation
4.5 Machine learning
4.6 Data access
4.7 Software access
Excerpts from the main text:
1. Biological Problem: What biological problems have been solved in this paper?
- Prediction of Cancer Cell Sensitivity to Drugs
2. Main discoveries: What are the main discoveries in this paper?
- potentially thousands of drugs could be probed in silico to systematically test their potential efficacy as anti-tumour agents based on their structure
- this provides a computational framework to identify new drug repositioning opportunities and could ultimately be useful for personalized medicine by linking the genomic traits of patients to drug sensitivity.
3. ML (Machine Learning) Methods: What are the ML methods applied in this paper?
- neural networks and random forests
- the inputs include chemical features of the drugs in addition to the molecular characterization of the cell lines
- data was pre-processed to include 689 chemical descriptors of the drugs and 138 genomic features for differentiating the cell lines, resulting in an input space of 827 features.
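Each training row pairs one drug with one cell line, so its input vector is the drug's chemical descriptors followed by the cell line's genomic features. A sketch of that assembly with the dimensions quoted above (689 + 138 = 827); the random matrices are placeholders, and encoding the genomic features as 0/1 flags is an assumption for illustration.

```python
import numpy as np

N_CHEM, N_GENOMIC = 689, 138  # feature counts quoted in the paper

def build_inputs(pairs, drug_desc, cell_feats):
    """Stack [drug descriptors | cell-line features] for each (drug, cell) pair."""
    pairs = np.asarray(pairs)
    return np.hstack([drug_desc[pairs[:, 0]], cell_feats[pairs[:, 1]]])

rng = np.random.default_rng(0)
drug_desc = rng.normal(size=(5, N_CHEM))               # placeholder: 5 drugs
cell_feats = rng.integers(0, 2, size=(7, N_GENOMIC))   # placeholder: 7 cell lines, 0/1 flags
pairs = [(d, c) for d in range(5) for c in range(7)]   # every drug-cell combination
X = build_inputs(pairs, drug_desc, cell_feats)
print(X.shape)  # (35, 827)
```

Because drug and cell-line blocks are shared across rows, one model sees all drugs at once, which is what lets it learn from many more data points than per-drug models.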
4. ML Advantages: Why are these ML methods better than traditional methods for this biological problem?
- traditional methods: Quantitative Structure-Activity Relationship (QSAR) approaches to predicting whole-cell activity of molecules based on their chemical properties
- QSAR approaches exclusively based on chemical features cannot distinguish between resistant and sensitive cell lines.
- This integrative approach not only integrates two complementary streams of information, but also allows the model to be trained with much larger amounts of data, which is often a key factor to improve predictive performance.
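The QSAR limitation follows from the inputs alone: if a model sees only a drug's chemical descriptors, every cell line treated with that drug receives the same input row and therefore the same prediction. A small illustration on synthetic data, using closed-form ridge regression as a stand-in model (not the paper's neural networks or random forests):

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, lam=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]), Xc.T @ (y_train - y_mean))
    return (X_test - x_mean) @ w + y_mean

rng = np.random.default_rng(1)
n_drugs, n_cells = 6, 10
chem = rng.normal(size=(n_drugs, 4))      # synthetic drug descriptors
genomic = rng.normal(size=(n_cells, 3))   # synthetic cell-line features
d_idx, c_idx = np.meshgrid(np.arange(n_drugs), np.arange(n_cells), indexing="ij")
d_idx, c_idx = d_idx.ravel(), c_idx.ravel()
# Synthetic response that depends on both the drug and the cell line.
y = chem[d_idx, 0] + genomic[c_idx, 0] + 0.1 * rng.normal(size=d_idx.size)

X_chem = chem[d_idx]                                # QSAR-style inputs
X_both = np.hstack([chem[d_idx], genomic[c_idx]])   # integrative inputs
pred_chem = ridge_fit_predict(X_chem, y, X_chem)
pred_both = ridge_fit_predict(X_both, y, X_both)

drug0 = d_idx == 0
spread_chem = pred_chem[drug0].max() - pred_chem[drug0].min()
spread_both = pred_both[drug0].max() - pred_both[drug0].min()
print(f"drug 0 prediction spread across cell lines: chemical-only {spread_chem:.2e}, integrative {spread_both:.2f}")
```

The chemical-only model gives one value per drug regardless of cell line; adding cell-line features is what allows sensitive and resistant lines to be separated.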
5. Biological Significance: What is the biological significance of these ML methods’ results?
- Neural networks were able to impute missing log(IC50) values on the test sets with averaged Pearson correlation coefficient (Rp), coefficient of determination (R2) and root mean square error (RMSE) (Text S1) of 0.85, 0.72 and 0.83, respectively, across all 111 drugs.
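The three reported metrics can be computed for each drug and then averaged. A NumPy sketch, assuming R2 = 1 - SS_res/SS_tot and that the averaging is over drugs (the paper's exact definitions are in Text S1); the drug names and log(IC50) values are invented.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (Pearson r, R^2 = 1 - SS_res/SS_tot, RMSE) for one set of predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    residual = y_true - y_pred
    rp = np.corrcoef(y_true, y_pred)[0, 1]
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    rmse = np.sqrt(np.mean(residual ** 2))
    return rp, r2, rmse

def averaged_metrics(per_drug):
    """Average (Rp, R2, RMSE) over drugs; per_drug maps drug -> (y_true, y_pred)."""
    scores = np.array([regression_metrics(t, p) for t, p in per_drug.values()])
    return scores.mean(axis=0)

# Invented log(IC50) values for two hypothetical drugs.
per_drug = {
    "drug_A": ([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]),
    "drug_B": ([-1.0, 0.0, 1.0], [-0.5, 0.2, 0.6]),
}
rp, r2, rmse = averaged_metrics(per_drug)
print(f"Rp={rp:.2f}  R2={r2:.2f}  RMSE={rmse:.2f}")
```

Note that Rp ignores a constant offset while R2 and RMSE penalise it: predictions shifted by exactly 1 log unit still give Rp = 1 but RMSE = 1.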
6. Prospect: What are the potential applications of these machine learning methods in biological science?
- The predictive ability of our methods for individual values is still limited and could be further improved by extending the set of input features with additional layers of molecular characterization of the cell lines, such as basal transcriptional profiles and phosphoproteomic data.
- Our method uses purely experimental data, but additional predictive power can be expected from including knowledge of the underlying network.
- A fertile ground for further research is investigating the application of other modeling techniques, including regularised linear regression methods (e.g. LASSO, elastic nets).
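As a starting point for the suggested linear alternatives, an elastic net can be fitted by cyclic coordinate descent. The sketch uses the same objective as scikit-learn's `ElasticNet`, a `1/(2n)`-scaled squared loss plus `alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2`, but it is a from-scratch NumPy illustration on synthetic data, not code from the paper; `l1_ratio=1` gives the LASSO.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma (the proximal step of the L1 penalty)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Cyclic coordinate descent for
    (1/2n)||y - Xw - b||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1 - l1_ratio)*||w||^2.
    Returns (w, b)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring handles the unpenalised intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    w = np.zeros(p)
    residual = yc.copy()                     # equals yc - Xc @ w while w = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                     # constant column: keep w_j = 0
            # Correlation of feature j with the residual that excludes feature j.
            rho_j = Xc[:, j] @ residual / n + col_sq[j] * w[j]
            new_wj = soft_threshold(rho_j, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            residual -= Xc[:, j] * (new_wj - w[j])
            w[j] = new_wj
    return w, y_mean - x_mean @ w

# Synthetic data with a sparse ground truth (two informative features out of ten).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_w = np.zeros(10)
true_w[[0, 3]] = [2.0, -1.5]
y = X @ true_w + 0.5 + 0.05 * rng.normal(size=100)
w, b = elastic_net(X, y, alpha=0.05, l1_ratio=0.9)
print(np.round(w, 2), round(float(b), 2))
```

Unlike the neural networks, the fitted coefficients are directly interpretable: nonzero weights point to the chemical descriptors or genomic features the model relies on.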
7. My Question (Optional)
- The paper states: "We used the Java implementation from Encog 3.0.1."
Is there any reason we would still need to use Encog today?
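Most likely not, assuming the network in question is a standard feedforward (multilayer perceptron) regressor: current Python libraries provide one (for example scikit-learn's `MLPRegressor`, or PyTorch). To show how little the model itself involves, the sketch below trains a one-hidden-layer network by full-batch gradient descent in plain NumPy. The tanh activation, layer size, learning rate and synthetic data are illustrative assumptions, not the paper's Encog configuration.

```python
import numpy as np

def train_mlp(X, y, hidden=16, lr=0.02, epochs=3000, seed=0):
    """One-hidden-layer tanh network for regression, trained by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1 = rng.normal(scale=1.0 / np.sqrt(p), size=(p, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # hidden activations, shape (n, hidden)
        pred = H @ W2 + b2                # network output, shape (n,)
        err = pred - y
        losses.append(np.mean(err ** 2))
        # Backpropagation of the mean squared error.
        g_pred = 2.0 * err / n
        g_W2 = H.T @ g_pred
        g_b2 = g_pred.sum()
        g_H = np.outer(g_pred, W2) * (1.0 - H ** 2)   # chain rule through tanh
        g_W1 = X.T @ g_H
        g_b1 = g_H.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2

    def predict(X_new):
        return np.tanh(X_new @ W1 + b1) @ W2 + b2

    return predict, losses

# Synthetic nonlinear target standing in for log(IC50).
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2
predict, losses = train_mlp(X, y)
print(f"training MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In practice a library implementation adds mini-batching, better optimisers and early stopping, which is why reusing one is preferable to porting the original Encog setup.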