Shandong University Project Training Log, 2024.5.4

silence_li_

Category: Project Training. Tags: python, programming languages, artificial intelligence

Article link: https://blog.csdn.net/silence_li_/article/details/139322156

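Running inference with a locally launched demo

For inference to work, a second conda environment (rdkit) is needed. It runs the backend process that converts SMILES strings into Torch Geometric graphs.

Create the rdkit environment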

conda create -c conda-forge -n rdkit rdkit

conda activate rdkit

pip install numpy

[Screenshot of the result omitted]

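Walkthrough of smiles2graph_demo.py

1. Imports. The script converts between SMILES strings and molecular graph data, so it relies on rdkit.Chem. rdkit.Chem is the core module of RDKit, an open-source cheminformatics toolkit for Python. It provides a wide range of tools and functions for extracting features and computing descriptors from the molecules it reads in.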

from rdkit import Chem
import numpy as np
import time
import pickle
from rdkit.Chem.rdchem import BondType, BondDir, ChiralType
import os
import datetime

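2. bond_dir(bond) takes a bond object and returns its direction as a code looked up in the module-level BOND_DIR table (the table itself is not shown in this excerpt).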

def bond_dir(bond):
    d = bond.GetBondDir()
    return BOND_DIR[d]

3. bond_type(bond) takes a bond object and returns its bond type as a code looked up in the BOND_TYPE table.

def bond_type(bond):
    t = bond.GetBondType()
    return BOND_TYPE[t]

4. atom_chiral(atom) takes an atom object and returns its chirality as a code looked up in the CHI table.

def atom_chiral(atom):
    c = atom.GetChiralTag()
    return CHI[c]
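
The three helpers above index into BOND_DIR, BOND_TYPE and CHI, none of which is defined in this excerpt. As a minimal sketch of what such tables could look like, the block below maps a few RDKit enum members to integer codes. Which members are listed and which codes they get are assumptions made for illustration and may differ from the values in smiles2graph_demo.py.

```python
from rdkit import Chem
from rdkit.Chem.rdchem import BondType, BondDir, ChiralType

# Assumed mappings: the integer codes are illustrative,
# not taken from the original script.
BOND_TYPE = {
    BondType.SINGLE: 0,
    BondType.DOUBLE: 1,
    BondType.TRIPLE: 2,
    BondType.AROMATIC: 3,
}
BOND_DIR = {
    BondDir.NONE: 0,
    BondDir.ENDUPRIGHT: 1,
    BondDir.ENDDOWNRIGHT: 2,
}
CHI = {
    ChiralType.CHI_UNSPECIFIED: 0,
    ChiralType.CHI_TETRAHEDRAL_CW: 1,
    ChiralType.CHI_TETRAHEDRAL_CCW: 2,
    ChiralType.CHI_OTHER: 3,
}

# Look up the codes for the double bond and the first atom of ethene.
mol = Chem.MolFromSmiles("C=C")
bond = mol.GetBondWithIdx(0)
atom = mol.GetAtomWithIdx(0)
codes = (
    BOND_TYPE[bond.GetBondType()],
    BOND_DIR[bond.GetBondDir()],
    CHI[atom.GetChiralTag()],
)
print(codes)
```

With plain dictionaries like these, any enum value missing from a table raises KeyError, so a molecule containing an unlisted bond type would stop the conversion.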

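5. atom_to_feature(atom) and bond_to_feature(bond) take an atom or a bond object and return a list of its features.

def atom_to_feature(atom):
    # Atomic number shifted to a zero-based index (hydrogen -> 0), then the chirality code
    return [atom.GetAtomicNum() - 1, atom_chiral(atom)]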

def bond_to_feature(bond):
    return [bond_type(bond), bond_dir(bond)]

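6. smiles2graph(smiles_string), SMILES to graph: converts a SMILES string into a graph data object. It first parses the SMILES string into an RDKit molecule object mol, then uses the functions from items 2 to 5 to turn every atom and bond into a feature list, and finally builds a dictionary holding the edge index, edge features, node features and node count, which it returns as the graph object.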

def smiles2graph(smiles_string):
    """
    Converts SMILES string to graph Data object
    :input: SMILES string (str)
    :return: graph object
    """

    mol = Chem.MolFromSmiles(smiles_string)

    # atoms
    atom_features_list = []
    for atom in mol.GetAtoms():
        atom_features_list.append(atom_to_feature(atom))
    x = np.array(atom_features_list, dtype = np.int64)

    # bonds
    num_bond_features = 2
    if len(mol.GetBonds()) > 0: # mol has bonds
        edges_list = []
        edge_features_list = []
        for bond in mol.GetBonds():
            i = bond.GetBeginAtomIdx()
            j = bond.GetEndAtomIdx()

            edge_feature = bond_to_feature(bond)

            # add edges in both directions
            edges_list.append((i, j))
            edge_features_list.append(edge_feature)
            edges_list.append((j, i))
            edge_features_list.append(edge_feature)

        # data.edge_index: Graph connectivity in COO format with shape [2, num_edges]
        edge_index = np.array(edges_list, dtype = np.int64).T

        # data.edge_attr: Edge feature matrix with shape [num_edges, num_edge_features]
        edge_attr = np.array(edge_features_list, dtype = np.int64)

    else:   # mol has no bonds
        edge_index = np.empty((2, 0), dtype = np.int64)
        edge_attr = np.empty((0, num_bond_features), dtype = np.int64)

    graph = dict()
    graph['edge_index'] = edge_index
    graph['edge_feat'] = edge_attr
    graph['node_feat'] = x
    graph['num_nodes'] = len(x)

    return graph
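
To make the layout of edge_index and edge_attr concrete, here is a small NumPy-only sketch that repeats the bond loop on a hand-written bond list instead of an RDKit molecule. The bond list and feature values are hypothetical toy inputs standing in for ethanol (C-C-O).

```python
import numpy as np

# Hypothetical toy input: two bonds given as (begin_atom, end_atom) indices.
bonds = [(0, 1), (1, 2)]
# Hypothetical per-bond features [bond_type_code, bond_dir_code].
bond_features = [[0, 0], [0, 0]]

edges_list = []
edge_features_list = []
for (i, j), feat in zip(bonds, bond_features):
    # Each undirected bond becomes two directed edges sharing one feature row.
    edges_list.append((i, j))
    edge_features_list.append(feat)
    edges_list.append((j, i))
    edge_features_list.append(feat)

# Transpose so that row 0 holds source nodes and row 1 holds target nodes.
edge_index = np.array(edges_list, dtype=np.int64).T
edge_attr = np.array(edge_features_list, dtype=np.int64)

print(edge_index.shape)     # (2, 4)
print(edge_index.tolist())  # [[0, 1, 1, 2], [1, 0, 2, 1]]
print(edge_attr.shape)      # (4, 2)
```

Two bonds therefore yield four columns in edge_index, and row k of edge_attr describes the edge in column k.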

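7. convert_chembl() polls continuously: whenever it finds a newly submitted SMILES, it converts it to a graph object and saves it. Once per second it reads dataset/tmp_smiles.txt and parses the timestamp and the SMILES string. If the timestamp is newer than the last one processed, it converts the SMILES with the function from item 6 and saves a fresh timestamp together with the graph object to the pickle file dataset/tmp_smiles.pkl.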

def convert_chembl():
    """
    Once started, constantly checks a text file. 
    If it finds new content, convert it to a graph and save it.
    """
    old_t0 = 0
    txt = "dataset/tmp_smiles.txt"
    while True:
        time.sleep(1)
        if not os.path.isfile(txt):
            continue
        with open(txt, "rt") as f:
            res = f.read().strip("\n ")
        if not res:
            continue
        tmp = res.split(" ")
        t0 = float(tmp[0])
        if t0 <= old_t0:
            continue
        smi = " ".join(tmp[1:]).strip("\n ")
        tt = datetime.datetime.now().strftime("%m/%d/%Y, %H:%M:%S")
        print(f"At time {tt}: {repr(res)}")
        old_t0 = t0
        g = smiles2graph(smi)
        out = {"timestamp": time.time(), "graph": g}
        with open("dataset/tmp_smiles.pkl", "wb") as f:
            pickle.dump(out, f)
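
One thing to watch in the loop above: float(tmp[0]) raises ValueError on a malformed line, and Chem.MolFromSmiles returns None for a SMILES it cannot parse, so smiles2graph would then fail on mol.GetAtoms(). Either exception ends the while loop and with it the backend. The following is a minimal sketch of a more defensive request handler, not part of the original script; parse_request and convert_safely are hypothetical helper names, and the converter is passed in so the sketch runs without RDKit.

```python
def parse_request(text):
    """Split '<timestamp> <smiles>' into (float, str); return None if malformed."""
    text = text.strip("\n ")
    if not text:
        return None
    head, _, rest = text.partition(" ")
    try:
        t0 = float(head)
    except ValueError:
        return None
    smi = rest.strip("\n ")
    if not smi:
        return None
    return t0, smi


def convert_safely(smi, converter):
    """Run converter(smi); return None instead of raising on bad input."""
    try:
        return converter(smi)
    except Exception as exc:
        # Broad on purpose: the polling loop should survive any bad request.
        print(f"Skipping {smi!r}: {exc}")
        return None
```

Inside the loop, a None result from either helper would simply lead to continue, with smiles2graph passed as the converter.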

Finally, the script calls convert_chembl(), which starts the continuous conversion loop.
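
The polling loop implies a simple file-based protocol: a client writes "<timestamp> <smiles>" to the text file and later reads the pickle. The sketch below shows how a client could follow that protocol. It is an assumption about the other side of the exchange, not code from the project; submit_smiles and wait_for_graph are hypothetical names, and only the two default file paths come from the script above.

```python
import os
import pickle
import time


def submit_smiles(smiles, txt_path="dataset/tmp_smiles.txt"):
    """Write '<timestamp> <smiles>' so the polling backend sees a new request."""
    t0 = time.time()
    with open(txt_path, "wt") as f:
        f.write(f"{t0} {smiles}\n")
    return t0


def wait_for_graph(t0, pkl_path="dataset/tmp_smiles.pkl", timeout=10.0, poll=0.2):
    """Poll the pickle until it holds a result produced after time t0."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.isfile(pkl_path):
            try:
                with open(pkl_path, "rb") as f:
                    out = pickle.load(f)
            except (EOFError, pickle.UnpicklingError):
                out = None  # the backend may still be writing the file
            if out is not None and out["timestamp"] >= t0:
                return out["graph"]
        time.sleep(poll)
    raise TimeoutError(f"no graph newer than {t0} appeared in {pkl_path}")
```

A caller would run t0 = submit_smiles("CCO") followed by graph = wait_for_graph(t0). Comparing timestamps this way assumes the client and the backend share one clock, which holds when both run on the same machine.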
