ReSimNet: Drug Response Similarity Prediction Using Siamese Neural Networks
Abstract
Motivation: Traditional drug discovery approaches identify a target for a disease and find a compound
that binds to the target. In this approach, structures of compounds are considered as the
most important features because it is assumed that similar structures will bind to the same target.
Therefore, structural analogs of the drugs that bind to the target are selected as drug candidates.
However, compounds that are not structural analogs may still achieve the desired response.
A new drug discovery method based on drug response, which can complement the
structure-based methods, is needed.
Results: We implemented Siamese neural networks called ReSimNet that take as input two chemical
compounds and predict the CMap score of the two compounds, which we use to measure the
transcriptional response similarity of the two compounds. ReSimNet learns the embedding vector
of a chemical compound in a transcriptional response space. ReSimNet is trained to minimize the
difference between the cosine similarity of the embedding vectors of the two compounds and the
CMap score of the two compounds. ReSimNet can find pairs of compounds that are similar in response
even though they may have dissimilar structures. In our quantitative evaluation, ReSimNet
outperformed the baseline machine learning models. The ReSimNet ensemble model achieves a
Pearson correlation of 0.518 and a precision@1% of 0.989. In addition, in the qualitative analysis,
we tested ReSimNet on the ZINC15 database and showed that ReSimNet successfully identifies
chemical compounds that are relevant to a prototype drug whose mechanism of action is known.
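
The training objective described above (minimizing the difference between the cosine similarity of two embeddings and the CMap score) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single dense layer, the 16-bit toy fingerprints, the squared-error loss, and the assumption that the CMap score has been rescaled to [-1, 1] are all illustrative choices, and the names `embed`, `cosine_similarity`, and `pair_loss` are hypothetical.

```python
import numpy as np

def embed(features, weights):
    """Map a compound feature vector to an embedding.

    Both compounds of a pair go through this same function with the same
    weights, which is what makes the setup Siamese. A single tanh layer is
    an assumption made for brevity.
    """
    return np.tanh(features @ weights)

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity of two vectors; eps guards against zero norms."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def pair_loss(x1, x2, target, weights):
    """Squared difference between predicted similarity and the target score.

    `target` stands in for a CMap score assumed to be rescaled to [-1, 1].
    Returns (loss, predicted_similarity).
    """
    pred = cosine_similarity(embed(x1, weights), embed(x2, weights))
    return (pred - target) ** 2, pred

# Toy data: two random 16-bit "fingerprints" and random shared weights.
rng = np.random.default_rng(0)
weights = rng.normal(size=(16, 4))
x1 = rng.integers(0, 2, size=16).astype(float)
x2 = rng.integers(0, 2, size=16).astype(float)

loss, pred = pair_loss(x1, x2, target=0.5, weights=weights)
print(f"predicted similarity={pred:.3f}, loss={loss:.3f}")
```

In a real training loop the weights would be updated by gradient descent on this loss over many compound pairs. Because similarity is computed in the learned embedding space rather than on the raw structural features, two structurally dissimilar compounds can still receive a high predicted score.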
Availability and implementation: The source code and the pre-trained weights of ReSimNet are
available at https://github.com/dmis-lab/ReSimNet.
Contact: kangj@korea.ac.kr
Supplementary information: Supplementary data are available at Bioinformatics online.