RDKit chemical fingerprints and similarity

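RDKit ships with a range of built-in functions for generating molecular fingerprints and for using them to compute the similarity between molecules.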

1. Importing the required libraries

#! /usr/bin/python
# coding: utf-8

from rdkit import Chem
from rdkit import DataStructs

from rdkit.Chem import AllChem
from rdkit.Chem import Draw

from rdkit.Chem.Draw import SimilarityMaps
from rdkit.Chem import MACCSkeys
from rdkit.Chem.AtomPairs import Pairs
from rdkit.Chem.AtomPairs import Torsions

from rdkit.SimDivFilters.rdSimDivPickers import MaxMinPicker

import matplotlib.pyplot as plt  # plotting

2. Chemical fingerprints

2.1 Topological fingerprints: Chem.RDKFingerprint(mol)

ms = [
    Chem.MolFromSmiles('CCOC'),
    Chem.MolFromSmiles('CCO'),
    Chem.MolFromSmiles('COC'),
]
img = Draw.MolsToGridImage(
    ms,
    molsPerRow=3,
    subImgSize=(200, 200),
    legends=['' for x in ms]
)
img.save('/Users/zeoy/st/drug_development/st_rdcit/img/mol20.jpg')

Topological fingerprints

fps = [Chem.RDKFingerprint(x) for x in ms]
print(fps)
# [<rdkit.DataStructs.cDataStructs.ExplicitBitVect object at 0x11bc49f30>,
# <rdkit.DataStructs.cDataStructs.ExplicitBitVect object at 0x11bc49f80>,
# <rdkit.DataStructs.cDataStructs.ExplicitBitVect object at 0x11bdf5030>]

ds_1 = DataStructs.FingerprintSimilarity(fps[0], fps[1])
print(ds_1)  # 0.6
ds_2 = DataStructs.FingerprintSimilarity(fps[0], fps[2])
print(ds_2)  # 0.4
ds_3 = DataStructs.FingerprintSimilarity(fps[2], fps[1])
print(ds_3)  # 0.25

# The similarity metric can also be specified (the default is Tanimoto)
ds_4 = DataStructs.FingerprintSimilarity(
    fps[0], fps[1], metric=DataStructs.DiceSimilarity)
print(ds_4)  # 0.75
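
To make the difference between the two metrics concrete, here is a minimal library-free sketch that treats a bit-vector fingerprint as the set of its on-bit indices. The helper names `tanimoto` and `dice` and the two bit sets are made up for illustration; the sets are not the real fingerprints of CCOC and CCO, they only share the same overlap ratio.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two sets of on-bit indices."""
    shared = len(bits_a & bits_b)
    total = len(bits_a | bits_b)
    return shared / total if total else 0.0


def dice(bits_a, bits_b):
    """Dice coefficient of two sets of on-bit indices."""
    size = len(bits_a) + len(bits_b)
    return 2 * len(bits_a & bits_b) / size if size else 0.0


# Made-up on-bit sets: 3 shared bits, 5 bits in the union.
a = {1, 2, 3, 4}
b = {2, 3, 4, 5}

t = tanimoto(a, b)
d = dice(a, b)
print(t, d)
# The two metrics are linked by Dice = 2T / (1 + T).
print(abs(d - 2 * t / (1 + t)) < 1e-12)
```

With three shared bits out of five, the sketch gives a Tanimoto value of 0.6 and a Dice value of 0.75, the same pair of numbers reported for fps[0] and fps[1] above. Dice is never lower than Tanimoto for the same pair of fingerprints.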

2.2 MACCS fingerprints: MACCSkeys.GenMACCSKeys(mol)

fps = [MACCSkeys.GenMACCSKeys(x) for x in ms]
ds_1 = DataStructs.FingerprintSimilarity(fps[0], fps[1])
print(ds_1)  # 0.5
ds_2 = DataStructs.FingerprintSimilarity(fps[0], fps[2])
print(ds_2)  # 0.5384615384615384
ds_3 = DataStructs.FingerprintSimilarity(fps[2], fps[1])
print(ds_3)  # 0.21428571428571427

2.3 Atom pairs

ms = [
    Chem.MolFromSmiles('C1CCC1OCC'),
    Chem.MolFromSmiles('CC(C)OCC'),
    Chem.MolFromSmiles('CCOCC')
]
img = Draw.MolsToGridImage(
    ms,
    molsPerRow=3,
    subImgSize=(200, 200),
    legends=['' for x in ms]
)
img.save(
    '/Users/zeoy/st/drug_development/st_rdcit/img/mol21.jpg'
)


pairFps = [Pairs.GetAtomPairFingerprint(x) for x in ms]
print(pairFps)
# The bit space covered by atom-pair fingerprints is very large, so they are stored sparsely, as a dictionary
d = pairFps[-1].GetNonzeroElements()
print(d)  # {541732: 1, 558113: 2, 558115: 2, 558146: 1, 1606690: 2, 1606721: 2}
print(d[541732])  # 1

A description of an individual bit can also be obtained, as shown below.

de = Pairs.ExplainPairScore(558115)
print(de)  # (('C', 1, 0), 3, ('C', 2, 0))
# The above means: C with 1 neighbor and 0 pi electrons which is 3 bonds from a C with 2 neighbors and 0 pi electrons

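Each atom in the returned tuple is described by three fields (element symbol, number of neighbors, number of pi electrons), and the integer between the two atoms is their separation in bonds.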

2.4 Topological torsions

tts = [Torsions.GetTopologicalTorsionFingerprintAsIntVect(x) for x in ms]
d_ds = DataStructs.DiceSimilarity(tts[0], tts[1])
print(d_ds)  # 0.16666666666666666
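
Atom-pair and topological-torsion fingerprints hold counts rather than single bits, so the Dice score has to take those counts into account. The sketch below shows one common way to do that on plain dictionaries: twice the sum of the per-key minimum counts, divided by the sum of all counts. The helper name `dice_counts` is hypothetical, `fp_b` is invented data, and it is an assumption (not checked against the RDKit source here) that DataStructs.DiceSimilarity uses exactly this formula for sparse count vectors.

```python
def dice_counts(counts_a, counts_b):
    """Dice similarity of two sparse count fingerprints stored as dicts."""
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        return 0.0
    # A key contributes the smaller of its two counts; missing keys contribute 0.
    shared = sum(min(n, counts_b[key])
                 for key, n in counts_a.items() if key in counts_b)
    return 2 * shared / total


# The non-zero elements printed for CCOCC in section 2.3.
fp_a = {541732: 1, 558113: 2, 558115: 2, 558146: 1, 1606690: 2, 1606721: 2}
# Invented second fingerprint, for illustration only.
fp_b = {558113: 2, 558115: 1, 1606690: 2, 999: 3}

print(dice_counts(fp_a, fp_a))
print(dice_counts(fp_a, fp_b))
```

Comparing a fingerprint with itself gives 1.0. For the invented pair, the shared counts are 2 + 1 + 2 = 5 out of 10 + 8 total, so the score is 10/18.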

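2.5 Morgan fingerprints (circular fingerprints): AllChem.GetMorganFingerprint(mol, 2)

This family of fingerprints is built by applying the Morgan algorithm to a set of user-supplied atom invariants. A radius must also be supplied when generating a Morgan fingerprint.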

m1 = Chem.MolFromSmiles('Cc1ccccc1')
m2 = Chem.MolFromSmiles('Cc1ncccc1')

fp1 = AllChem.GetMorganFingerprint(m1, 2)
fp2 = AllChem.GetMorganFingerprint(m2, 2)
d_mf = DataStructs.DiceSimilarity(fp1, fp2)
print(d_mf)  # 0.55
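
The role of the atom invariants and the radius can be illustrated with a toy version of the idea. This is a simplified sketch, not RDKit's implementation: it ignores bond orders and aromaticity, does not remove duplicate environments, and uses (atomic number, degree) as a hypothetical invariant. The function name `morgan_like_counts` and the hand-written molecular graphs are assumptions made for the example, so the score it prints will not match the 0.55 above.

```python
from collections import Counter


def morgan_like_counts(invariants, bonds, radius):
    """Toy circular fingerprint: a Counter of environment identifiers."""
    neighbors = [[] for _ in invariants]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Radius 0: each atom is identified by its own invariant.
    ids = [hash((0, inv)) for inv in invariants]
    counts = Counter(ids)
    for layer in range(1, radius + 1):
        # Each round folds in the neighbors' identifiers from the previous round.
        ids = [
            hash((layer, ids[i], tuple(sorted(ids[j] for j in neighbors[i]))))
            for i in range(len(ids))
        ]
        counts.update(ids)
    return counts


# Heavy-atom graphs: atom 0 is the methyl group, atoms 1 to 6 form the ring.
ring_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]
toluene = [(6, 1), (6, 3)] + [(6, 2)] * 5                  # Cc1ccccc1
methylpyridine = [(6, 1), (6, 3), (7, 2)] + [(6, 2)] * 4   # Cc1ncccc1

fp_a = morgan_like_counts(toluene, ring_bonds, 2)
fp_b = morgan_like_counts(methylpyridine, ring_bonds, 2)
shared = sum((fp_a & fp_b).values())  # sum of per-identifier minimum counts
score = 2 * shared / (sum(fp_a.values()) + sum(fp_b.values()))
print(score)
```

A larger radius lets each identifier describe a wider neighborhood, so swapping a single atom changes more of the identifiers and lowers the score.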

Like atom pairs and topological torsions, Morgan fingerprints use counts by default, but they can also be computed as bit vectors.

fp1 = AllChem.GetMorganFingerprintAsBitVect(m1,
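
The call above is cut off in the source. As a library-free illustration of what a fixed-length bit vector involves, the sketch below folds sparse integer identifiers into a chosen number of bits by taking each identifier modulo the vector length. The function name `fold_to_bits` is hypothetical, the sample identifiers are simply borrowed from the atom-pair dictionary in section 2.3, and it is an assumption that RDKit folds identifiers in the same way.

```python
def fold_to_bits(feature_ids, n_bits):
    """Map sparse integer identifiers onto the on-bits of an n_bits-long vector."""
    return {fid % n_bits for fid in feature_ids}


# Sample identifiers, reused from section 2.3 purely as example integers.
sample_ids = [541732, 558113, 558115, 558146, 1606690, 1606721]

wide = fold_to_bits(sample_ids, 1024)
narrow = fold_to_bits(sample_ids, 8)
print(len(wide), len(narrow))
```

With 1024 bits all six identifiers land on different positions, while with only 8 bits they collide and set just four. A shorter vector saves space but makes such collisions more likely, and the counts are lost either way.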