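Inference by Launching the Demo Locally
For inference to work, you need to create a second conda environment (rdkit) and launch a backend process that converts SMILES strings into graph data for Torch Geometric.
Create the rdkit environment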
conda create -c conda-forge -n rdkit rdkit
conda activate rdkit
pip install numpy
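A quick check inside the activated rdkit environment confirms that RDKit and NumPy both import and that a SMILES string parses. This check is only a suggestion and is not part of the original instructions.

```python
import numpy as np
from rdkit import Chem, rdBase

# Run inside the activated rdkit environment.
mol = Chem.MolFromSmiles("c1ccccc1")  # benzene
assert mol is not None, "RDKit failed to parse the SMILES string"
print("RDKit", rdBase.rdkitVersion, "| NumPy", np.__version__)
print("heavy atoms:", mol.GetNumAtoms())  # heavy atoms: 6
```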
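The result is as follows:
Walkthrough of smiles2graph_demo.py
1. Imports. The script converts between SMILES strings and molecular graph data, so it relies on rdkit.Chem. RDKit is an open-source cheminformatics library for Python, and Chem is its core module; it provides a wide range of tools and functions for reading molecules, extracting molecular features, and computing descriptors. The enums BondType, BondDir and ChiralType are presumably imported to build the lookup tables BOND_TYPE, BOND_DIR and CHI that the helper functions below use; those tables are not shown in this excerpt.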
from rdkit import Chem
import numpy as np
import time
import pickle
from rdkit.Chem.rdchem import BondType, BondDir, ChiralType
import os
import datetime
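The helpers in steps 2 to 4 read from BOND_DIR, BOND_TYPE and CHI, which this excerpt never defines. Below is a hypothetical definition that maps each RDKit enum value to a small integer index; which members are listed, and in what order, is an assumption and not taken from the original script.

```python
from rdkit.Chem.rdchem import BondType, BondDir, ChiralType

# Hypothetical lookup tables: member lists and order are assumptions.
BOND_TYPE = {t: i for i, t in enumerate([
    BondType.SINGLE,
    BondType.DOUBLE,
    BondType.TRIPLE,
    BondType.AROMATIC,
])}

BOND_DIR = {d: i for i, d in enumerate([
    BondDir.NONE,
    BondDir.ENDUPRIGHT,
    BondDir.ENDDOWNRIGHT,
])}

CHI = {c: i for i, c in enumerate([
    ChiralType.CHI_UNSPECIFIED,
    ChiralType.CHI_TETRAHEDRAL_CW,
    ChiralType.CHI_TETRAHEDRAL_CCW,
    ChiralType.CHI_OTHER,
])}
```

With these tables, bond_type(bond) would return 1 for a double bond and atom_chiral(atom) would return 0 for an atom without stereo information.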
2. bond_dir(bond) takes a bond object and returns the bond's direction, encoded as an integer through the BOND_DIR lookup table.
def bond_dir(bond):
    d = bond.GetBondDir()
    return BOND_DIR[d]
3. bond_type(bond) takes a bond object and returns the bond's type, encoded through BOND_TYPE.
def bond_type(bond):
    t = bond.GetBondType()
    return BOND_TYPE[t]
4. atom_chiral(atom) takes an atom object and returns its chirality, encoded through CHI.
def atom_chiral(atom):
    c = atom.GetChiralTag()
    return CHI[c]
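A plain dictionary lookup such as BOND_TYPE[t] raises KeyError whenever RDKit reports a value the table does not list, for example a dative bond. If the tables only cover common cases, one option is to reserve a shared "other" index. The helper name lookup_or_other and the four-entry table below are hypothetical, for illustration only.

```python
from rdkit.Chem.rdchem import BondType

# Hypothetical table covering only the common bond types.
BOND_TYPE = {BondType.SINGLE: 0, BondType.DOUBLE: 1,
             BondType.TRIPLE: 2, BondType.AROMATIC: 3}

def lookup_or_other(table, key):
    """Return table[key], or len(table) as a shared 'other' index for unlisted keys."""
    return table.get(key, len(table))

print(lookup_or_other(BOND_TYPE, BondType.AROMATIC))  # 3
print(lookup_or_other(BOND_TYPE, BondType.DATIVE))    # 4 (unlisted, mapped to 'other')
```

The trade-off is that the model's embedding table for that feature needs one extra row, so the change has to be matched on the model side.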
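5. atom_to_feature(atom) and bond_to_feature(bond) take an atom or bond object and return its feature list: [atomic number - 1, chirality code] for an atom, and [bond type code, bond direction code] for a bond.
def atom_to_feature(atom):
    return [atom.GetAtomicNum() - 1, atom_chiral(atom)]
def bond_to_feature(bond):
    return [bond_type(bond), bond_dir(bond)]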
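To see what atom_to_feature produces, the snippet below (run inside the rdkit environment) computes the first atom feature, GetAtomicNum() - 1, for ethanol. The shift by one makes hydrogen (atomic number 1) index 0, which suits an embedding table. It prints raw chiral tags rather than CHI codes because that table is not part of this excerpt.

```python
from rdkit import Chem
from rdkit.Chem.rdchem import ChiralType

mol = Chem.MolFromSmiles("CCO")  # ethanol; hydrogens stay implicit, so 3 atoms
# First atom feature, as in atom_to_feature: atomic number shifted down by 1
atomic_idx = [atom.GetAtomicNum() - 1 for atom in mol.GetAtoms()]
# Raw chiral tags (atom_chiral would map these through CHI)
chiral_tags = [atom.GetChiralTag() for atom in mol.GetAtoms()]

print(atomic_idx)  # [5, 5, 7]
print(all(t == ChiralType.CHI_UNSPECIFIED for t in chiral_tags))  # True
```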
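6. smiles2graph(smiles_string) converts a SMILES string into a graph. It first parses the string into an RDKit molecule object mol, then uses the functions from steps 2 to 5 to turn every atom and bond into a feature list, and finally returns a dict holding the edge index, edge features, node features and number of nodes. Each bond is stored twice, once per direction, which is how graph libraries such as Torch Geometric represent an undirected edge.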
def smiles2graph(smiles_string):
    """
    Converts a SMILES string to a graph dict
    :input: SMILES string (str)
    :return: dict with keys edge_index, edge_feat, node_feat, num_nodes
    """
    mol = Chem.MolFromSmiles(smiles_string)
    if mol is None:  # RDKit returns None for a SMILES string it cannot parse
        raise ValueError(f"Invalid SMILES string: {smiles_string!r}")
    # atoms: one [atomic number - 1, chirality code] row per atom
    atom_features_list = []
    for atom in mol.GetAtoms():
        atom_features_list.append(atom_to_feature(atom))
    x = np.array(atom_features_list, dtype=np.int64)
    # bonds
    num_bond_features = 2  # bond type, bond direction
    if len(mol.GetBonds()) > 0:  # mol has bonds
        edges_list = []
        edge_features_list = []
        for bond in mol.GetBonds():
            i = bond.GetBeginAtomIdx()
            j = bond.GetEndAtomIdx()
            edge_feature = bond_to_feature(bond)
            # add edges in both directions
            edges_list.append((i, j))
            edge_features_list.append(edge_feature)
            edges_list.append((j, i))
            edge_features_list.append(edge_feature)
        # data.edge_index: Graph connectivity in COO format with shape [2, num_edges]
        edge_index = np.array(edges_list, dtype=np.int64).T
        # data.edge_attr: Edge feature matrix with shape [num_edges, num_edge_features]
        edge_attr = np.array(edge_features_list, dtype=np.int64)
    else:  # mol has no bonds
        edge_index = np.empty((2, 0), dtype=np.int64)
        edge_attr = np.empty((0, num_bond_features), dtype=np.int64)
    graph = dict()
    graph['edge_index'] = edge_index
    graph['edge_feat'] = edge_attr
    graph['node_feat'] = x
    graph['num_nodes'] = len(x)
    return graph
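Because each bond is appended twice, once as (i, j) and once as (j, i), a molecule yields twice as many edges as bonds, and the final transpose turns the list of pairs into the [2, num_edges] COO layout that Torch Geometric's Data class expects for edge_index. The sketch reproduces that step with plain NumPy on a hand-written bond list for propane's carbon chain (bonds 0-1 and 1-2); the helper name to_coo is hypothetical.

```python
import numpy as np

def to_coo(bonds, bond_feats, num_bond_features=2):
    """Mirror the edge-building loop of smiles2graph on a plain bond list.

    bonds: list of (i, j) atom index pairs; bond_feats: one feature list per bond.
    """
    edges, feats = [], []
    for (i, j), f in zip(bonds, bond_feats):
        edges += [(i, j), (j, i)]  # both directions, as in smiles2graph
        feats += [f, f]            # the same feature row for each direction
    if not edges:                  # molecule without bonds, e.g. a single atom
        return (np.empty((2, 0), dtype=np.int64),
                np.empty((0, num_bond_features), dtype=np.int64))
    return np.array(edges, dtype=np.int64).T, np.array(feats, dtype=np.int64)

edge_index, edge_attr = to_coo([(0, 1), (1, 2)], [[0, 0], [0, 0]])
print(edge_index)       # [[0 1 1 2]
                        #  [1 0 2 1]]
print(edge_attr.shape)  # (4, 2)
```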
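7. convert_chembl() runs as a polling loop: once a second it checks dataset/tmp_smiles.txt, and when the file holds a new request it converts the SMILES string into a graph and saves it. The file is expected to contain a timestamp followed by a SMILES string, separated by a space; a request counts as new only if its timestamp is larger than the last one processed. The SMILES string is converted with smiles2graph from step 6, and the graph is written to the pickle file dataset/tmp_smiles.pkl together with the time of conversion (time.time(), not the request's timestamp).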
def convert_chembl():
    """
    Once started, constantly checks a text file.
    If it finds new content, convert it to a graph and save it.
    """
    old_t0 = 0  # timestamp of the last request that was processed
    txt = "dataset/tmp_smiles.txt"
    while True:
        time.sleep(1)  # poll once per second
        if not os.path.isfile(txt):
            continue
        with open(txt, "rt") as f:
            res = f.read().strip("\n ")
        if not res:
            continue
        # expected content: "<timestamp> <smiles>"
        tmp = res.split(" ")
        t0 = float(tmp[0])
        if t0 <= old_t0:  # this request was already handled
            continue
        smi = " ".join(tmp[1:]).strip("\n ")
        tt = datetime.datetime.now().strftime("%m/%d/%Y, %H:%M:%S")
        print(f"At time {tt}: {repr(res)}")
        old_t0 = t0
        g = smiles2graph(smi)
        # the saved timestamp is the conversion time, not the request's t0
        out = {"timestamp": time.time(), "graph": g}
        with open("dataset/tmp_smiles.pkl", "wb") as f:
            pickle.dump(out, f)
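A reader polling dataset/tmp_smiles.pkl can open it while pickle.dump is still writing and get a truncated file. A common remedy, not used in the original script, is to write to a temporary file in the same directory and swap it in with os.replace, which is atomic on POSIX systems when both paths are on the same filesystem. The helper name atomic_pickle_dump is hypothetical.

```python
import os
import pickle
import tempfile

def atomic_pickle_dump(obj, path):
    """Pickle obj to path so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
        os.replace(tmp_path, path)  # atomic rename on POSIX (same filesystem)
    except BaseException:
        os.remove(tmp_path)  # do not leave a stray temp file behind
        raise
```

Inside convert_chembl, the last two lines would then become atomic_pickle_dump(out, "dataset/tmp_smiles.pkl").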
Finally, the script calls convert_chembl(), which starts the continuous conversion process.
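The main inference process talks to this backend only through the two files: it writes one line of the form "<timestamp> <smiles>" to dataset/tmp_smiles.txt and then waits until dataset/tmp_smiles.pkl holds a newer result. The client below is a hypothetical sketch of that side of the exchange, inferred from how convert_chembl parses its input; the function names, the timeout and the polling interval are assumptions, not the project's actual code.

```python
import os
import pickle
import time

def submit_smiles(smiles, txt_path="dataset/tmp_smiles.txt"):
    """Write '<timestamp> <smiles>' in the format convert_chembl parses; return the timestamp."""
    t0 = time.time()
    with open(txt_path, "wt") as f:
        f.write(f"{t0} {smiles}\n")
    return t0

def wait_for_graph(since, pkl_path="dataset/tmp_smiles.pkl", timeout=30.0, poll=0.5):
    """Return the graph dict once the pickle carries a timestamp newer than `since`, else None."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.isfile(pkl_path):
            try:
                with open(pkl_path, "rb") as f:
                    out = pickle.load(f)
            except (EOFError, pickle.UnpicklingError):
                out = None  # file caught mid-write; retry on the next poll
            if out is not None and out["timestamp"] > since:
                return out["graph"]
        time.sleep(poll)
    return None
```

Used together: t0 = submit_smiles("CCO") followed by graph = wait_for_graph(t0). The comparison works because convert_chembl stamps its output with time.time() at conversion, which is later than the request time when both processes run on the same machine.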