Machine Learning: Selecting the Negative Set for Drug-Target Interactions (DTIs)


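Preface

In machine learning, the choice of the negative set affects the accuracy of the results.
Highly reliable negative samples help a classification model learn a clear decision boundary, which in turn helps improve performance.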


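I. Existing methods for selecting the negative set

1. Unknown DTIs

In drug-target interaction prediction models, known DTIs are usually used as the positive set, and unknown DTIs, or a random subset of them, as the negative set.

Drawback: the unknown pairs may include potential candidate DTIs (true interactions that have not been observed yet), and an inaccurate negative set can substantially degrade the accuracy of the results.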
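As a minimal sketch of this baseline, the snippet below treats every drug-target pair without a recorded interaction as "unknown" and draws a random subset as negatives. The function name `sample_random_negatives` and the toy identifiers are assumptions made for illustration, not taken from the cited papers.

```python
import random
from itertools import product

def sample_random_negatives(drugs, targets, known_dtis, n_negatives, seed=0):
    """Draw n_negatives drug-target pairs that are not in known_dtis."""
    known = set(known_dtis)
    # Every pair without a recorded interaction is treated as "unknown".
    unknown = [pair for pair in product(drugs, targets) if pair not in known]
    if n_negatives > len(unknown):
        raise ValueError("not enough unknown pairs to sample from")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(unknown, n_negatives)

# Toy data (hypothetical identifiers).
drugs = ["D1", "D2", "D3"]
targets = ["T1", "T2"]
positives = [("D1", "T1"), ("D2", "T2")]
negatives = sample_random_negatives(drugs, targets, positives, n_negatives=2)
```

Nothing in this procedure checks whether a sampled pair is a true non-interaction, which is exactly the drawback described above.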

ref 1:Chen R, Liu X, Jin S, Lin J, Liu J. Machine Learning for Drug-Target Interaction Prediction. Molecules. 2018;23(9):2208. Published 2018 Aug 31. doi:10.3390/molecules23092208
ref 2: Wang JT, Liu W, Tang H, et al. Screening drug target proteins based on sequence information. J Biomed Inform 2014;49:269–74.


 

2. Two strategies from Wang et al.

Purpose: the two strategies aim, respectively, at increasing the prediction accuracy in cross-validation and at filtering out as many non-drug-target proteins as possible.

2.1 Strategy 1

The training datasets have two classes. One is called the positive dataset (proteins that are known as DT proteins), and the other is called the negative dataset (proteins that are not DT proteins)

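The deviation of a drug-target protein is defined as:

xij: the j-th property of the i-th (drug-target) protein
Xi = (xi1, xi2, …, xim): the m properties of protein i
Xj = (x1j, x2j, …, xnj): the vector of property j

In the authors' experiments, proteins with a result > 0.42 were selected as the negative set, because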

2.2 Strategy 2

The negative dataset (non-DT proteins) was chosen from the proteins whose mean values of protein sequence properties have a larger difference from the positive data.

The probability that an unknown protein i is assigned to the negative set is:

In the authors' experiments, each protein was assumed to have a probability of 0.5 of being treated as a negative sample.

ref: Wang JT, Liu W, Tang H, et al. Screening drug target proteins based on sequence information. J Biomed Inform 2014;49:269–74.
 

3. Reverse selection based on guilt-by-association

Based on the “guilt-by-association” assumption that similar drugs tend to interact with similar targets, the existing methods have achieved remarkable performance.

Thus it is also reasonable to select reliable negative samples based on its converse negative proposition, i.e., a drug dissimilar to all drugs known to interact with a target is less likely to bind the target and vice versa.
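The drug side of this idea can be sketched as follows: an unobserved pair (drug, target) becomes a candidate negative when the drug's highest similarity to any drug known to bind that target is below a cutoff. The function names, the `threshold` value, the similarity table, and the choice to skip targets with no known binders are all assumptions for illustration; the cited paper does not prescribe this exact procedure.

```python
def max_similarity_to_binders(drug, target, known_dtis, drug_sim):
    """Highest similarity between `drug` and any drug known to bind `target`.

    Returns None when the target has no other known binder, because then
    there is no evidence to judge the pair either way (an assumption).
    """
    binders = [d for d, t in known_dtis if t == target and d != drug]
    if not binders:
        return None
    return max(drug_sim[frozenset((drug, b))] for b in binders)

def dissimilar_negatives(drugs, targets, known_dtis, drug_sim, threshold):
    """Unobserved pairs whose drug is dissimilar to all known binders."""
    known = set(known_dtis)
    negatives = []
    for d in drugs:
        for t in targets:
            if (d, t) in known:
                continue
            best = max_similarity_to_binders(d, t, known_dtis, drug_sim)
            if best is not None and best < threshold:
                negatives.append((d, t))
    return negatives

# Toy data: similarities are keyed by unordered drug pairs (hypothetical values).
drugs = ["D1", "D2", "D3"]
targets = ["T1"]
known = [("D1", "T1")]
drug_sim = {
    frozenset(("D1", "D2")): 0.9,
    frozenset(("D1", "D3")): 0.1,
    frozenset(("D2", "D3")): 0.2,
}
negatives = dissimilar_negatives(drugs, targets, known, drug_sim, threshold=0.3)
```

Here D2 resembles the known binder D1 and is kept out of the negative set, while D3 does not and is selected. The "vice versa" direction would apply the same test with target similarities.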

ref : Zheng Y, Peng H, Zhang X, Zhao Z, Gao X, Li J. Old drug repositioning and new drug discovery through similarity learning from drug-target joint feature spaces. BMC Bioinformatics. 2019;20(Suppl 23):605. Published 2019 Dec 27. doi:10.1186/s12859-019-3238-y
 

4. OCSVM: inferring the negative set from the positives

One-class Support Vector Machine (OCSVM) [11] has demonstrated its advantages for classification in the absence of positive or negative samples [12].

OCSVM requires one-class data only, thus it is an ideal technique to identify reliable negatives (i.e., outliers) for drug-target prediction where only positives are available.
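A minimal sketch of this use of OCSVM with scikit-learn's `OneClassSVM` is shown below. The two-dimensional toy features, the RBF kernel, `nu=0.05`, and the cutoff of 10 candidates are assumptions for illustration, not the settings of the cited paper.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Toy feature vectors: known interactions (positives) cluster near the origin.
positives = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# Unlabeled pairs: the first 20 resemble positives, the last 20 lie far away.
unlabeled = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(20, 2)),
    rng.normal(loc=8.0, scale=1.0, size=(20, 2)),
])

# nu upper-bounds the fraction of training points treated as outliers,
# so a small nu keeps recall on the known positives high.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(positives)

# decision_function gives a signed distance to the learned boundary;
# negative values fall outside the region occupied by the positives.
signed_distance = model.decision_function(unlabeled)

# Take the 10 most outlying unlabeled samples as candidate reliable negatives.
candidate_idx = np.argsort(signed_distance)[:10]
```

The model is trained on positives only; unlabeled samples are scored afterwards, and the lowest signed distances mark the samples least like the known interactions.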

ref 1: Zheng Y, Peng H, Zhang X, Zhao Z, Gao X, Li J. Old drug repositioning and new drug discovery through similarity learning from drug-target joint feature spaces. BMC Bioinformatics. 2019;20(Suppl 23):605. Published 2019 Dec 27. doi:10.1186/s12859-019-3238-y
ref 2: Xiao Y, Wang H, Xu W. Parameter selection of gaussian kernel for one-class svm. IEEE Trans Cybernet. 2014;45(5):941–53. doi:10.1109/TCYB.2014.2340433
ref 3: Khan SS, Madden MG. Irish Conference on Artificial Intelligence and Cognitive Science. Dublin: Springer; 2009. A survey of recent trends in one class classification.


 

5. Combining the guilt-by-association contrapositive with OCSVM

In this work, we propose a method to construct highly-reliable negative samples for drug target prediction by a pairwise drug-target similarity measurement and OCSVM with a high-recall constraint.

On one hand, we measure the pair-wise similarity between every two drug-target interactions by combining the chemical similarity between their drugs and the Gene Ontology-based similarity between their targets. Then we calculate the accumulative similarity with all known drug-target interactions for every unobserved drug-target interaction.
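The accumulative similarity can be sketched as below. The passage does not say how the drug and target similarities are combined, so multiplying them is an assumption, as are the function names and the toy similarity tables.

```python
def pair_similarity(pair_a, pair_b, drug_sim, target_sim):
    """Similarity between two drug-target pairs.

    Assumption: drug similarity times target similarity. The source only
    says the two similarities are "combined".
    """
    (d1, t1), (d2, t2) = pair_a, pair_b
    return drug_sim[d1][d2] * target_sim[t1][t2]

def accumulative_similarity(candidate, known_dtis, drug_sim, target_sim):
    """Sum of the candidate's similarity to every known interaction."""
    return sum(
        pair_similarity(candidate, known, drug_sim, target_sim)
        for known in known_dtis
    )

# Toy similarity tables (hypothetical values in [0, 1]).
drug_sim = {
    "D1": {"D1": 1.0, "D2": 0.5},
    "D2": {"D1": 0.5, "D2": 1.0},
}
target_sim = {
    "T1": {"T1": 1.0, "T2": 0.2},
    "T2": {"T1": 0.2, "T2": 1.0},
}
known_dtis = [("D1", "T1")]

acc_far = accumulative_similarity(("D2", "T2"), known_dtis, drug_sim, target_sim)
acc_near = accumulative_similarity(("D2", "T1"), known_dtis, drug_sim, target_sim)
```

With this toy data, the pair ("D2", "T2") accumulates less similarity to the known interactions than ("D2", "T1"), so it would be the stronger negative candidate.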

On the other hand, we obtain the signed distance using OCSVM learned from the known interactions with high recall (≥0.95) for each unobserved drug-target interaction. Unobserved drug-target pairs (DTPs) with lower accumulative similarities or lower signed distances are less likely to be positives, and thus more likely to be negatives.

Consequently, we compute the score for each unobserved drug-target interaction via averaging its accumulative similarity and signed distance after normalizing all accumulative similarities and signed distances to the range [0,1].

Unobserved interactions with lower scores are preferentially served as reliable negative samples for the classification algorithms. The specific negative number is determined by the negative sample ratio which will be discussed in the experiment section.
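The scoring step described above can be sketched as follows: min-max normalize both quantities to [0, 1], average them, and take the lowest-scoring unobserved pairs. The function names, the handling of constant inputs, and the toy numbers are assumptions for illustration.

```python
def min_max(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant list carries no ranking information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def rank_reliable_negatives(pairs, acc_sims, signed_dists, n_negatives):
    """Return the n_negatives pairs with the lowest averaged score."""
    sims = min_max(acc_sims)
    dists = min_max(signed_dists)
    scores = [(s + d) / 2 for s, d in zip(sims, dists)]
    order = sorted(range(len(pairs)), key=lambda i: scores[i])
    return [(pairs[i], scores[i]) for i in order[:n_negatives]]

# Toy inputs for three unobserved pairs (hypothetical values).
pairs = [("D1", "T2"), ("D2", "T1"), ("D3", "T3")]
acc_sims = [0.1, 0.5, 0.9]        # accumulative similarities
signed_dists = [-2.0, 0.0, 2.0]   # OCSVM signed distances

selected = rank_reliable_negatives(pairs, acc_sims, signed_dists, n_negatives=2)
```

In practice `n_negatives` would follow from the chosen negative sample ratio, for example the number of positives multiplied by that ratio.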

The paper's code and result data can be downloaded from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6933655/

ref: Zheng Y, Peng H, Zhang X, Zhao Z, Gao X, Li J. Old drug repositioning and new drug discovery through similarity learning from drug-target joint feature spaces. BMC Bioinformatics. 2019;20(Suppl 23):605. Published 2019 Dec 27. doi:10.1186/s12859-019-3238-y

 

II. Negative-set selection in the literature

1. Negative set for drug pairs

Drug targets were extracted from DrugBank and drug pairs were classified as a “shared-target” pair if they had at least one target in common.

We used fivefold cross validation to split our set of drug pairs into a test and training set containing 20% and 80% of the drug pairs respectively.

We sub-sampled the two classes (ST and non-ST drug pairs) and required the ratio of true positives (ST pairs) to true negatives (non-ST pairs) to remain the same as the total set.
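A minimal sketch of the two steps described above, ratio-preserving sub-sampling followed by a fivefold 80%/20% split, is given below. The function names, the 0.5 sampling fraction, and the toy labels are assumptions for illustration; the paper's own code may differ.

```python
import random

def stratified_subsample(labels, fraction, seed=0):
    """Indices of a subsample that keeps each class's share of the data."""
    rng = random.Random(seed)
    chosen = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        # Sample the same fraction from every class to preserve the ratio.
        chosen.extend(rng.sample(idx, round(len(idx) * fraction)))
    return sorted(chosen)

def five_fold_splits(indices, seed=0):
    """Yield (train, test) index lists; each test fold holds about 20%."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::5] for i in range(5)]
    for i in range(5):
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train, folds[i]

# Toy labels: 1 = shared-target (ST) pair, 0 = non-ST pair.
labels = [1] * 20 + [0] * 80
sample = stratified_subsample(labels, fraction=0.5)
splits = list(five_fold_splits(sample))
```

Only the sub-sampling step is stratified in this sketch; the folds are plain random splits, so the class ratio inside each fold can vary slightly.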

ref : Madhukar NS, Khade PK, Huang L, et al. A Bayesian machine learning approach for drug target identification using diverse data types. Nat Commun. 2019;10(1):5221. Published 2019 Nov 19. doi:10.1038/s41467-019-12928-6


 

2. Negative set for drug targets


ref: Drug target protein prediction based on machine learning (基于机器学习的药物靶蛋白预测). Master's thesis, Shanghai University, May 2015.
https://www.doc88.com/p-9798638678036.html

 
