Paper reading (68): An ML approach to predicting protein–ligand binding affinity

Paper title: A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking

Scholar citations: 289

Pages: 7

Publication date: 2010.03

Journal: Bioinformatics

Authors: Pedro J. Ballester, John B. O. Mitchell

Abstract:

Motivation: Accurately predicting the binding affinities of large sets of diverse protein–ligand complexes is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for analysing the outputs of molecular docking, which in turn is an important technique for drug discovery, chemical biology and structural biology. Each scoring function assumes a predetermined theory-inspired functional form for the relationship between the variables that characterize the complex, which also include parameters fitted to experimental or simulation data and its predicted binding affinity. The inherent problem of this rigid approach is that it leads to poor predictivity for those complexes that do not conform to the modelling assumptions. Moreover, resampling strategies, such as cross-validation or bootstrapping, are still not systematically used to guard against the overfitting of calibration data in parameter estimation for scoring functions.

Results: We propose a novel scoring function (RF-Score) that circumvents the need for problematic modelling assumptions via non-parametric machine learning. In particular, Random Forest was used to implicitly capture binding effects that are hard to model explicitly. RF-Score is compared with the state of the art on the demanding PDBbind benchmark. Results show that RF-Score is a very competitive scoring function. Importantly, RF-Score's performance was shown to improve dramatically with training set size and hence the future availability of more high-quality structural and interaction data is expected to lead to improved versions of RF-Score.

Paper outline:

1. Introduction

2. Materials

2.1 Validation using the PDBbind benchmark

3. Methods

3.1 Intermolecular interaction features

3.2 RFs for regression

3.3 Scoring functions for comparative assessment

4. Results and discussion

4.1 Building RF-Score

4.2 RF-Score on the PDBbind benchmark

4.3 Comparison with the state of the art

5. Conclusions

Excerpts from the main text:

1. Biological Problem: What biological problems have been solved in this paper?

  • predicting the binding affinities
  • predicting how strongly the docked conformation binds to the target (scoring)

2. Main discoveries: What are the main discoveries in this paper?

  • Results show that RF-Score is a very competitive scoring function.
  • Importantly, RF-Score's performance was shown to improve dramatically with training set size and hence the future availability of more high-quality structural and interaction data is expected to lead to improved versions of RF-Score.
  • conclusions: RF-Score has been shown to be particularly effective as a re-scoring function and can be used for virtual screening and lead optimization purposes.

3. ML(Machine Learning) Methods: What are the ML methods applied in this paper?

  • a novel scoring function (RF-Score)
  • the first application of Random Forests (RFs) to predicting protein–ligand binding affinity.
  • The process of training RF to provide a new scoring function (RF-Score) starts by separating the 195 complexes of the core set from the remaining 1105 complexes in the refined set. The former constitutes the test set of the PDBbind benchmark, while the latter is used here as training data.
  • The PDBbind benchmark essentially consists of testing the predictions of scoring functions on the 2007 core set, which comprises 195 diverse complexes with measured binding affinities spanning more than 12 orders of magnitude
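
The split described above (195 core-set complexes held out for testing, the remaining 1105 refined-set complexes used for training) can be sketched roughly as follows. This is only an illustration on synthetic data, assuming scikit-learn's `RandomForestRegressor` as the RF implementation; the feature count, the `is_core` mask, the number of trees and the toy affinity formula are all placeholders of mine, not values taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the refined set (1105 + 195 = 1300 complexes).
# Real features and affinities would come from PDBbind; these are made up.
n_complexes, n_features = 1300, 36  # feature count is a placeholder
X = rng.integers(0, 500, size=(n_complexes, n_features)).astype(float)
y = (2.0 + 0.004 * X[:, 0] + 0.002 * np.sqrt(X[:, 1] * X[:, 2])
     + rng.normal(0.0, 0.5, n_complexes))  # toy non-linear "affinity"

# Hypothetical mask marking which 195 complexes belong to the core set.
is_core = np.zeros(n_complexes, dtype=bool)
is_core[rng.choice(n_complexes, size=195, replace=False)] = True

# Core set is the test set; the rest of the refined set is training data.
X_train, y_train = X[~is_core], y[~is_core]
X_test, y_test = X[is_core], y[is_core]

# n_estimators is an arbitrary choice for this sketch.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(X_train.shape, X_test.shape, y_pred.shape)
```

The point of the sketch is only that the test complexes never enter `fit`, so any score computed on `y_pred` reflects held-out performance.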

4. ML Advantages: Why are these ML methods better than the traditional methods in these biological problems?

  • circumvents the need for problematic modelling assumptions via non-parametric machine learning
  • Random Forest was used to implicitly capture binding effects that are hard to model explicitly.
  • RF does not assume any a priori relationship between the descriptors that characterize the complex and binding data, and thus should be sufficiently flexible to account for the wide variety of binding mechanisms observed across diverse protein–ligand complexes.
  • RF is particularly suited for this task, as it has been shown to perform very well in non-linear regression.
  • RF can be also used to estimate variable importance as a way to identify those protein–ligand contacts that contribute the most to the binding affinity prediction across known complexes. 
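
To make the variable-importance point concrete, here is a small sketch on synthetic data. It assumes scikit-learn and uses `permutation_importance`, which is one common way to rank features for a fitted forest; I am not claiming it is the exact importance measure used in the paper. The six features and the toy target are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)

# Toy data: feature 0 dominates the target, feature 3 contributes a little,
# and the remaining features are pure noise.
X = rng.normal(size=(400, 6))
y = 3.0 * X[:, 0] + 0.5 * X[:, 3] + rng.normal(0.0, 0.1, 400)

model = RandomForestRegressor(n_estimators=100, random_state=1).fit(X, y)

# Shuffle each feature in turn and measure how much the score degrades.
result = permutation_importance(model, X, y, n_repeats=5, random_state=1)

# Feature indices sorted from most to least important.
ranking = np.argsort(result.importances_mean)[::-1]
print(ranking)
```

With real contact features, the same ranking would point at the atom-type pairs the model relies on most.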

5. Biological Significance: What is the biological significance of these ML methods’ results?

  • It is very encouraging that this initial version has already obtained a high correlation with measured binding affinities on such a diverse test set.
  • Interpretability is currently a drawback of this approach. However, it is important to realize that, although the terms comprising model-based scoring functions provide a description of protein–ligand binding, such a description is only as good as the accuracy of the scoring function.
  • This is quantified through Pearson's correlation coefficient (R), defined as the ratio of the covariance of both variables over the product of their standard deviations (SDs). In this training set, R = 0.953, indicating a very high linear dependence between these variables over the training data. Another commonly reported performance measure is the root mean square error (RMSE)
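
The two measures mentioned above can be written out in a few lines. This is a minimal standard-library sketch of Pearson's R (covariance divided by the product of the SDs) and RMSE; the `measured` and `predicted` values are invented numbers for illustration, not results from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson's R: covariance of x and y over the product of their SDs."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and the two SDs cancel out.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(x, y):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Made-up binding affinities, for illustration only.
measured = [4.2, 6.1, 7.8, 5.0, 9.3]
predicted = [4.9, 5.8, 7.1, 5.6, 8.4]

print(pearson_r(measured, predicted), rmse(measured, predicted))
```

Note that R only measures linear dependence, so a model can have a high R and still a sizeable RMSE if its predictions are systematically shifted or scaled.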

6. Prospect: What are the potential applications of these machine learning methods in biological science?

  • we plan to study the use of distance-dependent features, which could result in further performance improvements given that the strength of intermolecular interactions naturally depends on atomic separation. 
  • less coarse atom types will be investigated by considering the atom's hybridization state and bonding environment. 
  • machine learning-based scoring functions constitute an effective way to assimilate the fast growing volume of high-quality structural and interaction data in the public domain and are expected to lead to more accurate and general predictions of binding affinity
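
As a rough picture of what "distance-dependent features" could mean, the sketch below counts protein–ligand atom pairs by element and by distance bin. Everything here is an assumption of mine for illustration: the helper `contact_counts`, the use of plain element symbols as atom types, the 12 Å outer cutoff and the bin edges. With a single bin it reduces to simple contact counts; with several bins the same pairs are separated by atomic distance.

```python
import math
from collections import Counter


def contact_counts(protein_atoms, ligand_atoms, bin_edges):
    """Count (protein element, ligand element, distance bin) occurrences.

    Atoms are (element, (x, y, z)) tuples; bin_edges are in angstroms.
    Pairs farther apart than the last edge are ignored.
    """
    counts = Counter()
    for p_elem, p_xyz in protein_atoms:
        for l_elem, l_xyz in ligand_atoms:
            d = math.dist(p_xyz, l_xyz)
            for i in range(len(bin_edges) - 1):
                if bin_edges[i] <= d < bin_edges[i + 1]:
                    counts[(p_elem, l_elem, i)] += 1
                    break
    return counts


# Tiny hypothetical complex: three protein atoms, two ligand atoms.
protein = [("C", (0.0, 0.0, 0.0)), ("N", (3.0, 0.0, 0.0)), ("O", (20.0, 0.0, 0.0))]
ligand = [("C", (1.0, 0.0, 0.0)), ("O", (0.0, 5.0, 0.0))]

# One bin up to 12 angstroms: plain contact counts per element pair.
single_bin = contact_counts(protein, ligand, [0.0, 12.0])
# Three bins: the same contacts, now split by distance range.
binned = contact_counts(protein, ligand, [0.0, 4.0, 8.0, 12.0])

print(dict(single_bin))
print(dict(binned))
```

The binned variant produces more features per complex, which is why the authors tie this kind of extension to the availability of more training data.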

7. My Questions (Optional)

  • This is the initial version of RF-Score. It may be worth paying more attention to RF-Score v4.