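RDKit has a variety of built-in functions for generating molecular fingerprints and for using them to compute molecular similarity.
1. Importing the required libraries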
#! /usr/bin/python
# coding: utf-8
from rdkit import Chem
from rdkit import DataStructs
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import SimilarityMaps
from rdkit.Chem import MACCSkeys
from rdkit.Chem.AtomPairs import Pairs
from rdkit.Chem.AtomPairs import Torsions
from rdkit.SimDivFilters.rdSimDivPickers import MaxMinPicker
import matplotlib.pyplot as plt # plotting
2. Chemical fingerprints
2.1 Topological fingerprints: Chem.RDKFingerprint(mol)
ms = [
Chem.MolFromSmiles('CCOC'),
Chem.MolFromSmiles('CCO'),
Chem.MolFromSmiles('COC'),
]
img = Draw.MolsToGridImage(
ms,
molsPerRow=3,
subImgSize=(200, 200),
legends=['' for x in ms]
)
img.save('/Users/zeoy/st/drug_development/st_rdcit/img/mol20.jpg')
fps = [Chem.RDKFingerprint(x) for x in ms]
print(fps)
# [<rdkit.DataStructs.cDataStructs.ExplicitBitVect object at 0x11bc49f30>,
# <rdkit.DataStructs.cDataStructs.ExplicitBitVect object at 0x11bc49f80>,
# <rdkit.DataStructs.cDataStructs.ExplicitBitVect object at 0x11bdf5030>]
ds_1 = DataStructs.FingerprintSimilarity(fps[0], fps[1])
print(ds_1) # 0.6
ds_2 = DataStructs.FingerprintSimilarity(fps[0], fps[2])
print(ds_2) # 0.4
ds_3 = DataStructs.FingerprintSimilarity(fps[2], fps[1])
print(ds_3) # 0.25
# A different similarity metric can also be specified (the default is Tanimoto)
ds_4 = DataStructs.FingerprintSimilarity(
fps[0], fps[1], metric=DataStructs.DiceSimilarity)
print(ds_4) # 0.75
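To make the two metrics concrete, here is a minimal pure-Python sketch that computes Tanimoto and Dice similarity from sets of on-bit indices. The bit sets are hypothetical and were picked only so that three of five distinct bits are shared. They are not the actual RDKit bits of the molecules above, and the handling of empty inputs is this sketch's own convention.

```python
def tanimoto(a, b):
    """Tanimoto coefficient: shared on-bits / distinct on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # empty case: sketch convention


def dice(a, b):
    """Dice coefficient: 2 * shared on-bits / (on-bits in a + on-bits in b)."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0  # empty case: sketch convention


# Hypothetical on-bit sets (illustration only): 3 shared bits, 5 distinct bits.
bits_a = {1, 4, 7, 9}
bits_b = {1, 4, 7, 12}

t = tanimoto(bits_a, bits_b)
d = dice(bits_a, bits_b)
print(t, d)  # 0.6 0.75
```

For bit vectors the two metrics are related by Dice = 2T / (1 + T), which is consistent with the values printed above: a Tanimoto similarity of 0.6 corresponds to a Dice similarity of 0.75.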
2.2 MACCS keys: MACCSkeys.GenMACCSKeys(mol)
fps = [MACCSkeys.GenMACCSKeys(x) for x in ms]
ds_1 = DataStructs.FingerprintSimilarity(fps[0], fps[1])
print(ds_1) # 0.5
ds_2 = DataStructs.FingerprintSimilarity(fps[0], fps[2])
print(ds_2) # 0.5384615384615384
ds_3 = DataStructs.FingerprintSimilarity(fps[2], fps[1])
print(ds_3) # 0.21428571428571427
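The MACCS fingerprint is an ordinary bit vector, so its on-bits can be inspected directly. The sketch below, which assumes RDKit is installed, lists the keys set for two of the molecules and recomputes the Tanimoto similarity by hand from the on-bit sets. The variable names are illustrative.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import MACCSkeys

fp_a = MACCSkeys.GenMACCSKeys(Chem.MolFromSmiles('CCOC'))
fp_b = MACCSkeys.GenMACCSKeys(Chem.MolFromSmiles('CCO'))

# RDKit's MACCS vector has 167 bits; bit 0 is unused so indices match key numbers.
print(fp_a.GetNumBits())

on_a = set(fp_a.GetOnBits())
on_b = set(fp_b.GetOnBits())
print(sorted(on_a))         # keys set for CCOC
print(sorted(on_a & on_b))  # keys shared by both molecules

# Tanimoto by hand: shared keys / distinct keys.
manual = len(on_a & on_b) / len(on_a | on_b)
library = DataStructs.FingerprintSimilarity(fp_a, fp_b)
print(manual, library)
```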
2.3 Atom pairs
ms = [
Chem.MolFromSmiles('C1CCC1OCC'),
Chem.MolFromSmiles('CC(C)OCC'),
Chem.MolFromSmiles('CCOCC')
]
img = Draw.MolsToGridImage(
ms,
molsPerRow=3,
subImgSize=(200, 200),
legends=['' for x in ms]
)
img.save(
'/Users/zeoy/st/drug_development/st_rdcit/img/mol21.jpg'
)
pairFps = [Pairs.GetAtomPairFingerprint(x) for x in ms]
print(pairFps)
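# Because the bit space covered by atom-pair fingerprints is very large, they are stored sparsely;
# GetNonzeroElements() returns the nonzero entries as a dictionary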
d = pairFps[-1].GetNonzeroElements()
print(d) # {541732: 1, 558113: 2, 558115: 2, 558146: 1, 1606690: 2, 1606721: 2}
print(d[541732]) # 1
A description of each bit is also available:
de = Pairs.ExplainPairScore(558115)
print(de) # (('C', 1, 0), 3, ('C', 2, 0))
# The above means: a C with 1 neighbor and 0 pi electrons that is 3 bonds away from a C with 2 neighbors and 0 pi electrons
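The same lookup can be applied to every entry of the fingerprint. The following sketch, which assumes RDKit is installed, decodes all atom-pair codes of CCOCC and checks that the counts add up to the number of atom pairs: a molecule with n heavy atoms has n(n-1)/2 pairs, provided all of them lie within the default path-length limits.

```python
from rdkit import Chem
from rdkit.Chem.AtomPairs import Pairs

mol = Chem.MolFromSmiles('CCOCC')
counts = Pairs.GetAtomPairFingerprint(mol).GetNonzeroElements()

# Each unordered pair of heavy atoms contributes one count.
n_atoms = mol.GetNumAtoms()
total_pairs = sum(counts.values())
print(total_pairs, n_atoms * (n_atoms - 1) // 2)

# Decode every code into (atom 1 description, distance in bonds, atom 2 description).
for code, count in sorted(counts.items()):
    print(code, count, Pairs.ExplainPairScore(code))
```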
2.4 Topological torsions
tts = [Torsions.GetTopologicalTorsionFingerprintAsIntVect(x) for x in ms]
d_ds = DataStructs.DiceSimilarity(tts[0], tts[1])
print(d_ds) # 0.16666666666666666
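Here DataStructs.DiceSimilarity is applied to count vectors rather than bit vectors. A minimal pure-Python sketch of the count-based Dice formula follows, assuming the usual definition 2 Σ min(a_i, b_i) / (Σ a_i + Σ b_i). The first dictionary is the atom-pair output for CCOCC printed in section 2.3. The second dictionary, including the id 999999, is made up for illustration.

```python
def dice_counts(a, b):
    """Dice similarity of two sparse count fingerprints given as {id: count} dicts."""
    shared = sum(min(count, b[key]) for key, count in a.items() if key in b)
    total = sum(a.values()) + sum(b.values())
    return 2.0 * shared / total if total else 0.0  # empty case: sketch convention


# Atom-pair counts for CCOCC, as printed in section 2.3.
fp_a = {541732: 1, 558113: 2, 558115: 2, 558146: 1, 1606690: 2, 1606721: 2}
# Hypothetical second fingerprint (not computed from a real molecule).
fp_b = {541732: 1, 558113: 1, 558115: 2, 999999: 4}

# Shared counts: 1 + 1 + 2 = 4; totals: 10 + 8 = 18; similarity = 8 / 18.
sim = dice_counts(fp_a, fp_b)
print(sim)
```

Taking the minimum of the two counts means that a feature occurring twice in one molecule and once in the other contributes only once to the overlap.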
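2.5 Morgan fingerprints (circular fingerprints): AllChem.GetMorganFingerprint(mol, 2)
This family of fingerprints is built by applying the Morgan algorithm to a set of user-supplied atom invariants. When generating Morgan fingerprints, the radius of the fingerprint must also be provided.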
m1 = Chem.MolFromSmiles('Cc1ccccc1')
m2 = Chem.MolFromSmiles('Cc1ncccc1')
fp1 = AllChem.GetMorganFingerprint(m1, 2)
fp2 = AllChem.GetMorganFingerprint(m2, 2)
d_mf = DataStructs.DiceSimilarity(fp1, fp2)
print(d_mf) # 0.55
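The role of the radius can be illustrated with a toy circular fingerprint in plain Python. This is a deliberately simplified sketch and not RDKit's implementation: the atom invariant is just (element, degree), bond orders and aromaticity are ignored, duplicate environments are not removed, and the function and variable names are hypothetical.

```python
import zlib
from collections import Counter


def toy_circular_fingerprint(atoms, bonds, radius):
    """Count-based circular fingerprint of a toy molecular graph."""
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Radius 0: each atom is identified by its invariant (element, degree).
    ids = {i: zlib.crc32(repr((atoms[i], len(neighbors[i]))).encode())
           for i in neighbors}
    counts = Counter(ids.values())

    # Each iteration folds in the neighbors' identifiers, growing the
    # environment described by an atom's identifier by one bond.
    for _ in range(radius):
        ids = {
            i: zlib.crc32(
                repr((ids[i], tuple(sorted(ids[j] for j in neighbors[i])))).encode()
            )
            for i in neighbors
        }
        counts.update(ids.values())
    return counts


def dice_counts(a, b):
    """Count-based Dice similarity of two Counters."""
    return 2.0 * sum((a & b).values()) / (sum(a.values()) + sum(b.values()))


# Heavy-atom graphs of Cc1ccccc1 and Cc1ncccc1 (same connectivity, one C -> N).
ring_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]
toy1 = toy_circular_fingerprint(['C', 'C', 'C', 'C', 'C', 'C', 'C'], ring_bonds, 2)
toy2 = toy_circular_fingerprint(['C', 'C', 'N', 'C', 'C', 'C', 'C'], ring_bonds, 2)
toy_sim = dice_counts(toy1, toy2)
print(toy_sim)
```

Because the invariants and hashing differ from RDKit's, the value printed here is not expected to match the 0.55 obtained above. The point is only that a larger radius adds identifiers for larger atom environments.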
Like atom pairs and topological torsions, Morgan fingerprints use counts by default, but they can also be computed as bit vectors:
fp1 = AllChem.GetMorganFingerprintAsBitVect(m1,