一、代码来源:
计算rmsd参考:GitHub - 0ut0fcontrol/isoRMSD: calculating RMSD between 2 conformers with different atom names
这个github代码是在命令行中输入命令运行的,命令如下:
python isoRMSD.py -r crystal_ligand.sdf -p crystal_out_1.sdf -o rmsd.txt
这样执行计算一对小分子还算方便,小分子格式支持mol2,sdf,mol,pdb,虽然支持这么多种格式,但是计算的时候有些能用mol2算却不能用sdf算,如果计算不出来可以转换一下小分子的格式没准就可以了。但是这个代码批量计算小分子rmsd时实现就很麻烦,所以把主要用的的代码拿出来,需要批量的部分,保存结果的部分可以根据需求自己写写。-r后面接的是参考分子。
二、主要代码:
根据需求写读取多个文件的循环 ,结果保存也可以根据需求自己写
import os
import sys
import math
from numpy.lib.shape_base import column_stack
import pandas as pd
from os.path import split
from oddt.toolkits import rdk as toolkit
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.AllChem import AlignMol
rfile = 'crystal_ligand.sdf'#参考小分子
pfile = 'crystal_out_0.sdf'
ref_fmt = rfile.split(".")[-1]
ref_oddt = next(toolkit.readfile(ref_fmt, rfile))
ref_rdk = Chem.RemoveHs(ref_oddt.Mol)
probe_fmt = pfile.split(".")[-1]
probe_oddt_supp = toolkit.readfile(probe_fmt, pfile)
data = []
for i, probe_oddt in enumerate(probe_oddt_supp):
if probe_oddt is None:
name = "NaN"
rmsd_notalign = 10000.0
rmsd_align = 10000.0
else:
probe_rdk = Chem.RemoveHs(probe_oddt.Mol)
try:
name = probe_rdk.GetProp("_Name")
name = "_".join(name.split())
except KeyError as e:
name = "NaN"
ref = AllChem.AssignBondOrdersFromTemplate(probe_rdk, ref_rdk)
rmsd_notalign = GetBestRMSD(probe_rdk, ref, align=False)
data.append((name,rmsd_notalign))
print(data)
#下面是定义计算的函数,可以不动,上面加上批处理的一些代码就好
def GetBestRMSD(probe, ref, refConfId=-1, probeConfId=-1, maps=None, align=True):
# When mapping the coordinate of probe will changed!!!
ref.pos = orginXYZ(ref)
probe.pos = orginXYZ(probe)
try:
name = probe.GetProp("_Name")
except KeyError as e:
name = "NaN"
if not maps:
matches = ref.GetSubstructMatches(probe, uniquify=False)
if not matches:
raise ValueError(
"mol %s does not match mol %s"
% (ref.GetProp("_Name"), probe.GetProp("_Name"))
)
if len(matches) > 1e6:
print(
"{} matches detected for molecule {}, this may lead to a performance slowdown.".format(
len(matches), name
)
)
maps = [list(enumerate(match)) for match in matches]
bestRMSD = 10000.0
for amap in maps:
if align:
rmsd = AlignMol(probe, ref, probeConfId, refConfId, atomMap=amap)
else:
rmsd = RMSD_NotAlign(probe, ref, amap)
bestRMSD = min(bestRMSD, rmsd)
return bestRMSD
def RMSD_NotAlign(probe, ref, amap):
rmsd = 0.0
atomNum = ref.GetNumAtoms() + 0.0
for (pi, ri) in amap:
posp = probe.pos[pi]
posf = ref.pos[ri]
rmsd += dist_2(posp, posf)
rmsd = math.sqrt(rmsd / atomNum)
return rmsd
def orginXYZ(mol):
mol_pos = {}
for i in range(0, mol.GetNumAtoms()):
pos = mol.GetConformer().GetAtomPosition(i)
mol_pos[i] = pos
return mol_pos
def dist_2(atoma_xyz, atomb_xyz):
dis2 = 0.0
for i, j in zip(atoma_xyz, atomb_xyz):
dis2 += (i - j) ** 2
return dis2
阅读代码细节的一些参考链接:
1)Python中if __name__ == ‘__main__‘:的作用和原理
(20条消息) Python中if __name__ == ‘__main__‘:的作用和原理_农村詹姆斯的博客-CSDN博客
2)split(“.“)[-1]:(20条消息) split(“.“)[-1]的怎么用_花落雨微扬的博客-CSDN博客_split[-1]
3)parser = argparse.ArgumentParser():
(20条消息) 常用编程记录——parser = argparse.ArgumentParser()_umbrellalalalala的博客-CSDN博客
4)python 判断变量是否是 None 的三种写法:
(20条消息) python 判断变量是否是 None 的三种写法_Python热爱者的博客-CSDN博客
5)python中{},[],(),字典,列表,元组详解: Python基础 括号()[]{}的详解_python_脚本之家 (jb51.net)
6)GetSubstructMatches():rdkit.Chem.rdchem module — The RDKit 2022.09.1 documentation
RDKit|分子子结构的删除、替换与切割 - 简书 (jianshu.io)
7)AllChem.AssignBondOrdersFromTemplate():
rdkit.Chem.AllChem module — The RDKit 2022.09.1 documentation
8)关于rdkit包中的函数可以用下面这个网页索引:Index — The RDKit 2022.09.1 documentation