Chapter 23: Molecule Ideation Using Matched Molecular Pairs

Reading notes on the book "Artificial Intelligence in Drug Design"


1.Introduction

  • Matched Molecular Pair (MMP) analysis is one of the many ways medicinal chemists can make sense of structure-activity relationship (SAR) data. Its appeal lies in its ability to intuitively relate structural changes to changes in a relevant property.

2.MMP Algorithms

  • There are several implementations of the MMP algorithm in the literature. One of the most widely used MMP generation algorithms, adopted by many institutions, was originally published by Hussain and Rea.
  • The common core fragment is termed the context (typically >50% of the molecule by heavy atom count). Two molecules that share the same context form an MMP. The variable part between the two molecules is termed the transform and encodes a change from fragment X to fragment Y. The transform is typically represented as a SMIRKS reaction.
  • The same procedure has been extended to MMPs with a change in the chemical core. In this case, multiple cuts (fragmentation operations) are applied to each molecule. Where the terminal groups are all the same but the core differs, an MMP is defined that encodes a core or scaffold change. Figure 1 illustrates the MMP algorithm.
    [Fig. 1]
  • Deriving MMPs across a large set of molecules with associated physicochemical properties or assay readouts allows transforms to be generalized across the dataset. If two or more compound pairs share the same transform, their data can be aggregated. For each transform, statistics are derived that express the change in a chosen endpoint as a mean change with an associated standard deviation or related statistics.
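The indexing idea described above (group molecules by a shared context, read the differing fragments off as the transform) and the per-transform aggregation can be sketched in plain Python. This is a minimal sketch under stated assumptions, not BioDig's or Hussain and Rea's actual code: it assumes each molecule has already been fragmented at single acyclic single bonds by a cheminformatics toolkit, with the attachment point written as [*:1] and heavy-atom counts supplied alongside, and it uses a simplified "X>>Y" string in place of a full SMIRKS. The function names and input layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, stdev

# Hypothetical input format: each molecule id maps to a list of single-cut
# fragmentations (context, fragment, context_heavy_atoms, total_heavy_atoms).
# Fragmentation and heavy-atom counting are assumed to be done beforehand.


def find_mmps(fragmentations, min_context_fraction=0.5):
    """Group molecules by shared context and return (mol_a, mol_b, transform) triples."""
    index = defaultdict(list)
    for mol_id, frags in fragmentations.items():
        for context, fragment, context_heavy, total_heavy in frags:
            # Keep only cuts where the context is the larger part of the molecule.
            if context_heavy / total_heavy > min_context_fraction:
                index[context].append((mol_id, fragment))

    pairs = []
    for members in index.values():
        for (id_a, frag_a), (id_b, frag_b) in combinations(members, 2):
            if id_a != id_b and frag_a != frag_b:
                # Simplified reaction-style "X>>Y" string standing in for a SMIRKS.
                pairs.append((id_a, id_b, f"{frag_a}>>{frag_b}"))
    return pairs


def aggregate_transforms(pairs, values, min_pairs=2):
    """Return {transform: (n_pairs, mean_delta, sd_delta)} for one endpoint."""
    deltas = defaultdict(list)
    for mol_a, mol_b, transform in pairs:
        if mol_a in values and mol_b in values:
            # Change in the endpoint on going from fragment X (mol_a) to Y (mol_b).
            deltas[transform].append(values[mol_b] - values[mol_a])

    summary = {}
    for transform, d in deltas.items():
        if len(d) >= min_pairs:
            sd = stdev(d) if len(d) > 1 else None  # stdev needs at least 2 points
            summary[transform] = (len(d), mean(d), sd)
    return summary
```

In this sketch the direction of each pair simply follows input order; a fuller version might record both X>>Y and Y>>X (with the delta negated) so that lookups work in either direction. Multi-cut core changes are not covered.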

3.BioDig: The GSK Transform Database

  • For a dataset of 300K compounds, approximately 2.3 million MMPs can be extracted. This volume requires bulk storage and fast query reporting. These requirements, together with the need to index transforms, lend themselves to a relational database, which at GSK is named BioDig.
    [Figure]
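To make the relational-database idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema, table names, and column names are hypothetical and illustrative only; these notes do not describe BioDig's actual design. Each distinct transform is stored once, pairs reference it by id, and an index on the transform id keeps per-transform reports fast.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not BioDig's.
SCHEMA = """
CREATE TABLE transform (id INTEGER PRIMARY KEY, smirks TEXT UNIQUE NOT NULL);
CREATE TABLE mmp (
    compound_a   TEXT NOT NULL,
    compound_b   TEXT NOT NULL,
    transform_id INTEGER NOT NULL REFERENCES transform(id)
);
CREATE TABLE measurement (compound_id TEXT NOT NULL, endpoint TEXT NOT NULL, value REAL);
-- Index pairs by transform so per-transform reports stay fast at scale.
CREATE INDEX mmp_by_transform ON mmp(transform_id);
CREATE INDEX measurement_by_compound ON measurement(compound_id, endpoint);
"""

# Per-transform summary for one endpoint: pair count and mean change (B - A).
SUMMARY_SQL = """
SELECT t.smirks, COUNT(*) AS n_pairs, AVG(mb.value - ma.value) AS mean_delta
FROM mmp
JOIN transform t    ON t.id = mmp.transform_id
JOIN measurement ma ON ma.compound_id = mmp.compound_a AND ma.endpoint = ?
JOIN measurement mb ON mb.compound_id = mmp.compound_b AND mb.endpoint = ?
GROUP BY t.smirks
HAVING COUNT(*) >= ?
ORDER BY mean_delta DESC
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_mmp(conn, compound_a, compound_b, smirks):
    # Store each distinct transform once and point pairs at its id.
    conn.execute("INSERT OR IGNORE INTO transform (smirks) VALUES (?)", (smirks,))
    (transform_id,) = conn.execute(
        "SELECT id FROM transform WHERE smirks = ?", (smirks,)
    ).fetchone()
    conn.execute("INSERT INTO mmp VALUES (?, ?, ?)", (compound_a, compound_b, transform_id))


def transform_summary(conn, endpoint, min_pairs=2):
    """Return (smirks, n_pairs, mean_delta) rows, largest mean change first."""
    return conn.execute(SUMMARY_SQL, (endpoint, endpoint, min_pairs)).fetchall()
```

Keeping measurements in their own table means a new endpoint can be reported on without re-deriving any pairs; the standard deviation could be added in Python after fetching the deltas, since SQLite has no built-in stddev aggregate.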

4.Large Scale Molecule Ideation Using MMPs

  • MMPs have historically been used to interrogate the effect of a chemical transform on physicochemical and ADME properties such as LogD, clearance, and membrane permeability.
  • At GSK, the authors have extended their applicability to molecule library generation.
  • Transform effects can depend on the local chemical environment. For example, the effect on solubility of replacing a primary amide with a secondary amide differs between an aliphatic and an aromatic context (see Fig. 3).
    [Fig. 3]
  • Instead of full atom-type information, SMARTS patterns can be generalized using only aliphatic and aromatic flags. This extends a single transform into six related forms, as shown in Fig. 4.
    [Fig. 4]
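Using transform statistics as a library generator amounts to fragmenting a query molecule and looking up transforms whose left-hand side matches each variable fragment, while keeping the local environment in the lookup key, since the same transform can behave differently in aliphatic and aromatic contexts. The sketch below is a hypothetical illustration, not GSK's generator: the environment labels, the statistics layout, the thresholds, and the assumption that a larger mean change is desirable are all ours, and the final step of joining the context with the new fragment into a full molecule (which needs a cheminformatics toolkit) is left out.

```python
def propose_replacements(query_frags, transform_stats, min_pairs=3, min_delta=0.0):
    """Suggest fragment replacements for one query molecule.

    query_frags: list of (context, fragment, environment) tuples from fragmenting
        the query, where environment is a coarse label such as "aliphatic" or
        "aromatic" for the attachment atom (hypothetical labelling scheme).
    transform_stats: dict mapping (from_fragment, to_fragment, environment)
        to (n_pairs, mean_delta) for the chosen endpoint.
    Returns (context, new_fragment, mean_delta) tuples, largest improvement first,
    assuming a larger mean_delta is desirable for this endpoint.
    """
    ideas = []
    for context, fragment, env in query_frags:
        for (x, y, t_env), (n_pairs, mean_delta) in transform_stats.items():
            # Apply only transforms learned in the same environment, backed by
            # enough pairs, and predicted to improve the endpoint.
            if (x == fragment and t_env == env
                    and n_pairs >= min_pairs and mean_delta > min_delta):
                ideas.append((context, y, mean_delta))
    # Joining context and new fragment into a full molecule needs a toolkit; omitted.
    return sorted(ideas, key=lambda idea: idea[2], reverse=True)
```

The `min_pairs` filter reflects the point that a transform is only trustworthy if enough pairs support it; raising it trades idea volume for confidence.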

5.Quantifying the Value of an MMP-Based Knowledge Base

  • A key aspect of applying an MMP-based knowledge base is quantifying its usefulness in a medicinal chemistry design scenario. Ideally, the database should be comprehensive enough to cover the full range of transforms a chemist might use, and each transform should be derived from enough data to be statistically valid.
  • To address these questions, transforms in the Eli Lilly ADME/Tox knowledge database were compared with those derived from a larger 2.1-million-compound diversity set. A second comparison was made between the transforms in the Eli Lilly ADME/Tox knowledge database and a subset of transforms observed in historical small-molecule discovery projects.

6.The Ever-Growing Tail of New Transforms

  • Both the number of derived matched pairs and the number of transforms grew linearly with the number of molecules in the dataset (Table 1 and Fig. 5).
    [Table 1]
    [Fig. 5]

7.The Subset of Useful MedChem Transforms

  • The knowledge database was analyzed to determine how many of the top 100, 500, 1000, 2500, 5K, 10K, 25K, 50K, and 100K MedChem project transforms it contained. The results are given in Table 2.
    [Table 2]
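The top-N coverage analysis reduces to asking, for each cutoff N, what fraction of the top-N project transforms the knowledge database contains. A minimal sketch, assuming the project transforms are already ranked and that "contained" may optionally require a minimum number of supporting pairs; the `min_pairs` threshold is a hypothetical parameter reflecting the statistical-validity requirement, not a documented criterion.

```python
def coverage_at_top_n(ranked_project_transforms, db_pair_counts, cutoffs, min_pairs=1):
    """Fraction of the top-N project transforms present in the knowledge database.

    ranked_project_transforms: list of transforms, most important first (the
        ranking criterion is an assumption here).
    db_pair_counts: dict mapping each database transform to its number of pairs.
    cutoffs: N values such as [100, 500, 1000].
    min_pairs: hypothetical threshold a transform must meet to count as covered.
    """
    coverage = {}
    for n in cutoffs:
        top = ranked_project_transforms[:n]
        hits = sum(1 for t in top if db_pair_counts.get(t, 0) >= min_pairs)
        coverage[n] = hits / len(top) if top else 0.0
    return coverage
```

Raising `min_pairs` shows how much of the apparent coverage rests on transforms seen only once or twice.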

8.Assessing MMPs as a Molecule Generation Tool

  • Three in-house molecule generators used at GSK, including an MMP-based one, were compared (Fig. 6):
    • BioDig: a matched molecular pair-based algorithm described earlier in this chapter.

    • BRICS: a fragment replacement-based algorithm.

    • RG2Smi: a language-processing machine learning algorithm that translates a reduced graph input into a SMILES output.

  • Three tests were used to assess their performance:
    • The first explored the ability of the algorithms to reproduce ideas generated by a team of medicinal chemists.

    • The second explored whether the roughly 10³ additional molecules generated by the algorithms were considered good ideas by the medicinal chemists.

    • Finally, the algorithms were assessed on their ability to regenerate the molecules of legacy drug discovery programs, starting from a single molecule in the series.
    [Fig. 6]
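The first and third tests both ask how many molecules from a reference set (the chemists' ideas, or the molecules later made in a legacy program) a generator manages to produce. The chapter's exact scoring is not given in these notes; a natural assumption is recall over exact structure matches, sketched below. It assumes all structures are canonical SMILES produced by the same toolkit, so that identical molecules compare equal as strings. The second test relies on chemists' judgement and is not captured by this sketch.

```python
def recall(reference, generated):
    """Fraction of reference molecules that also appear in the generated set.

    Both inputs are assumed to be canonical SMILES strings from the same
    toolkit, so identical molecules compare equal as strings.
    """
    ref = set(reference)
    if not ref:
        return 0.0
    return len(ref & set(generated)) / len(ref)


def compare_generators(reference, outputs):
    """Recall of each generator's output (name -> list of SMILES) against one reference set."""
    return {name: recall(reference, mols) for name, mols in outputs.items()}
```

Exact-match recall is strict; a looser variant could count a reference molecule as recovered when some generated molecule exceeds a similarity threshold, which would require a fingerprinting toolkit.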

9.First Test - Human Inclusion

[Figure]

10.Second Test - Human Imitation

[Figure]

11.Third Test - Legacy Projects

[Figure]

12.Conclusion

  • MMP analysis has emerged as a key method in the medicinal chemistry toolbox, and many publicly available algorithms and applications exist. Many companies have worked to summarize MMPs into databases of transforms.