Can generated molecular images be recognized as SMILES (DECIMER), and can the recognized SMILES then be converted back into images?

I. 2D chemical structure recognition:

0. [2022 review] Review of techniques and models used in optical chemical structure recognition in images and scanned documents

Review of techniques and models used in optical chemical structure recognition in images and scanned documents | Journal of Cheminformatics | Full Text

1. Online conversion site: DECIMER Web Application

Its code repository: GitHub - Kohulan/DECIMER_Short_Communication

2. DECIMER [the most accurate in my use so far]

https://github.com/Kohulan/DECIMER-Image_Transformer

Here I use option 2, DECIMER; just follow the steps in the README.

3. Img2Mol: inferring molecules from pictures (also quite good)

GitHub - bayer-science-for-a-better-life/Img2Mol

4. Automated Recognition of Chemical Molecule Images Based on an Improved TNT Model (2022)

Applied Sciences | Free Full-Text | Automated Recognition of Chemical Molecule Images Based on an Improved TNT Model

No code is available.

5. MolScribe [the paper reports higher accuracy than DECIMER, but in my tests DECIMER was still more accurate]

GitHub - thomas0809/MolScribe: Robust Molecular Structure Recognition with Image-to-Graph Generation

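Note: in my tests, none of these models gave a really reliable image-to-SMILES conversion, and I do not know how to judge the confidence between a converted SMILES and its original image (because even an image of a clearly invalid molecule can still be converted into a SMILES string).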
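One possible proxy for confidence is to aggregate per-token probabilities from the recognizer's decoder into a single score. This is an assumption rather than something tested in this post: it presumes the model can return (token, probability) pairs for each predicted SMILES token, and the function name `sequence_confidence` and the sample values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def sequence_confidence(token_probs: List[Tuple[str, float]]) -> Dict[str, float]:
    """Summarize hypothetical per-token probabilities of one predicted SMILES.

    Returns the geometric mean (a length-normalized likelihood) and the
    minimum token probability. A low minimum flags a single uncertain
    token even when the average looks fine.
    """
    if not token_probs:
        return {"geometric_mean": 0.0, "min_prob": 0.0}
    probs = [p for _, p in token_probs]
    if any(p <= 0.0 for p in probs):
        # A zero-probability token makes the whole sequence score zero.
        return {"geometric_mean": 0.0, "min_prob": 0.0}
    log_mean = sum(math.log(p) for p in probs) / len(probs)
    return {"geometric_mean": math.exp(log_mean), "min_prob": min(probs)}


# Made-up values, for illustration only.
example = [("C", 0.99), ("C", 0.98), ("O", 0.60)]
print(sequence_confidence(example))
```

A score like this only says how sure the model was of its own output; it does not prove that the SMILES matches the image, so a threshold would still have to be calibrated on images with known structures.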


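Using Img2Mol:

Option 3 (Img2Mol) is easier to use than option 2 (DECIMER), but it is best to use the local-cddd setup: without the local installation it needs a network connection, which can cause many errors.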

conda env create -f environment.local-cddd.yml
conda activate img2mol
pip install .

Then:

If you are working with the local CDDD installation, please download and unzip the CDDD model and place the directory default_model in path/to/anaconda3/envs/img2mol/lib/python3.6/site-packages/cddd/data/
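The copy step can also be scripted. A minimal sketch, assuming the CDDD archive has already been downloaded and unzipped; the helper name `install_cddd_model` and both path arguments are hypothetical placeholders that you replace with your own locations.

```python
import shutil
from pathlib import Path


def install_cddd_model(unzipped_dir: str, cddd_data_dir: str) -> Path:
    """Copy the unzipped `default_model` directory into the cddd data folder."""
    src = Path(unzipped_dir) / "default_model"
    if not src.is_dir():
        raise FileNotFoundError(f"default_model not found in {unzipped_dir}")
    dst = Path(cddd_data_dir) / "default_model"
    if dst.exists():
        # Already installed: leave the existing copy untouched.
        return dst
    # copytree creates dst and any missing parent folders.
    shutil.copytree(str(src), str(dst))
    return dst
```

Call it with the folder that contains the unzipped `default_model` as the first argument and the `cddd/data/` folder of the img2mol environment as the second.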


Using DECIMER (this is what I currently use):

My own working copy and notes are at: D:\Pycharm_workspace\MolImageGeneration\Uni-Dock\image2smiles\image2smiles.py

Download: GitHub - Kohulan/DECIMER-Image_Transformer: DECIMER: Deep Learning for Chemical Image Recognition using Efficient-Net V2 + Transformer

conda create --name DECIMER python=3.9

conda activate DECIMER

Install the package:

pip install decimer

Be sure to also install tensorflow==2.10.1; otherwise you get an error saying the GPU cannot be used.

pip install tensorflow==2.10.1

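Here is the code:

Code of image2smiles2image.py:

Where input_images are the images generated by the generator G;
image2smiles_all.csv holds all SMILES converted from the images generated by G
image2smiles_validity.csv holds the valid SMILES converted from the images generated by G
image2smiles_unvalidity.csv holds the invalid SMILES converted from the images generated by G
image2smiles2image: the folder of images obtained by converting G's images to SMILES and then back to images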
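As a quick check on these outputs, the three CSV files can be tallied after a run. A minimal sketch using only the standard library; it assumes each row has the form `[file_name, SMILES]`, and the helper names `count_rows` and `summarize` are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict


def count_rows(csv_path: str) -> int:
    """Count non-empty rows; a missing file counts as zero rows."""
    path = Path(csv_path)
    if not path.exists():
        return 0
    with path.open(newline="", encoding="utf-8") as fh:
        return sum(1 for row in csv.reader(fh) if row)


def summarize(all_csv: str, valid_csv: str, invalid_csv: str) -> Dict[str, float]:
    """Return row counts and the fraction of images that gave a valid SMILES."""
    total = count_rows(all_csv)
    valid = count_rows(valid_csv)
    invalid = count_rows(invalid_csv)
    rate = valid / total if total else 0.0
    return {"total": total, "valid": valid, "invalid": invalid, "valid_rate": rate}
```

For example, `summarize("Tests/image2smiles_all.csv", "Tests/image2smiles_validity.csv", "Tests/image2smiles_unvalidity.csv")`. The script opens its CSV files in append mode, so remove old files before a new run if the counts should cover that run only.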

command:

python image2smiles.py --input_images_path Tests/xxx --image2smiles2image_save_path Tests/image2smiles2image/ --image2smiles_all Tests/image2smiles_all.csv --image2smiles_validity Tests/image2smiles_validity.csv --image2smiles_unvalidity Tests/image2smiles_unvalidity.csv

code:

Note on paths: Windows uses "\\" as the path separator and Linux uses "/"; keep this in mind when switching platforms.

from DECIMER import predict_SMILES

import glob
import os
import csv
from rdkit import Chem
import argparse
from rdkit.Chem import Draw
import time


"""
conda activate DECIMER
## Adenosine_A2a_receptor-sample-1k
python image2smiles/image2smiles.py --input_images_path data/Adenosine_A2a_receptor-sample-1k \
--image2smiles2image_save_path data/image2smiles2image/ \
--image2smiles_all data/image2smiles_all.csv \
--image2smiles_validity data/image2smiles_validity.csv \
--image2smiles_unvalidity data/image2smiles_unvalidity.csv

## Dopamine_D3_receptor-sample-1k
python image2smiles/image2smiles.py --input_images_path data/Dopamine_D3_receptor-sample-1k \
--image2smiles2image_save_path data/image2smiles2image-Dopamine_D3_receptor-sample-1k/ \
--image2smiles_all data/image2smiles_all-Dopamine_D3_receptor-sample-1k.csv \
--image2smiles_validity data/image2smiles_validity-Dopamine_D3_receptor-sample-1k.csv \
--image2smiles_unvalidity data/image2smiles_unvalidity-Dopamine_D3_receptor-sample-1k.csv


"""

# Record the start time
start_time = time.time()
# Get all png files under the input folder
parser = argparse.ArgumentParser(description='Testing script', add_help=False)
parser.add_argument('--input_images_path', default='../../eval_output_images/QM9/generator_images', help='Input images folder')
parser.add_argument('--image2smiles2image_save_path', default='../../eval_output_images/QM9/image2smiles2image/')
parser.add_argument('--image2smiles_all', default='../../eval_output_images/QM9/image2smiles_all.csv')
parser.add_argument('--image2smiles_validity', default='../../eval_output_images/QM9/image2smiles_validity.csv')
parser.add_argument('--image2smiles_unvalidity', default='../../eval_output_images/QM9/image2smiles_unvalidity.csv')
args = parser.parse_args()

input_img_path = glob.glob(args.input_images_path + "/*.[jp][pn]g")
image2smiles2image_save_path = args.image2smiles2image_save_path


def mkdir(path):
    # Create the folder (and any missing parent folders) if it does not exist yet
    if not os.path.exists(path):
        os.makedirs(path)
        print("--- created new folder ---")
    else:
        print("--- folder already exists ---")


mkdir(image2smiles2image_save_path)

i = 0
unrecover_images = 0
# Open the three CSV files once (append mode, as before) so that every handle is closed reliably
with open(args.image2smiles_all, 'a', newline='', encoding='utf-8') as f_all, \
        open(args.image2smiles_validity, 'a', newline='', encoding='utf-8') as f_validity, \
        open(args.image2smiles_unvalidity, 'a', newline='', encoding='utf-8') as f_unvalidity:
    csv_writer_all = csv.writer(f_all)
    csv_writer_validity = csv.writer(f_validity)
    csv_writer_unvalidity = csv.writer(f_unvalidity)

    for file in input_img_path:
        # os.path.basename follows the current platform's separator rules, so no manual switching is needed
        file_name = os.path.basename(file)

        SMILES = predict_SMILES(file)
        i = i + 1
        print("Processing image", i, ":", file, ", predicted SMILES:", SMILES)

        # save all
        csv_writer_all.writerow([file_name, SMILES])

        # save valid: the SMILES must parse and give a non-empty canonical SMILES
        try:
            mol = Chem.MolFromSmiles(SMILES)
            if mol is None:  # RDKit returns None when it cannot parse the SMILES
                raise ValueError("RDKit could not parse the SMILES")
            canonical_smiles = Chem.MolToSmiles(mol, isomericSmiles=True)
            if not canonical_smiles:
                raise ValueError("empty canonical SMILES")
            img = Draw.MolsToGridImage([mol], molsPerRow=1, subImgSize=(256, 256))
            img.save(os.path.join(image2smiles2image_save_path, file_name))
            csv_writer_validity.writerow([file_name, SMILES])

        # save invalid
        # The SMILES cannot be converted back into a structure image (it may contain R groups, so it is not necessarily wrong)
        except Exception as e:
            unrecover_images = unrecover_images + 1
            print("Image", i, ":", file, ", SMILES:", SMILES, "is not valid:", e)
            csv_writer_unvalidity.writerow([file_name, SMILES])

# Guard against an empty input folder to avoid dividing by zero
success_rate = (i - unrecover_images) / i if i else 0.0
print("Total images:", i, ", unrecovered images:", unrecover_images, ", success rate:", success_rate)


end_time = time.time()
total_time = end_time - start_time
hours = int(total_time // 3600)
minutes = int((total_time % 3600) // 60)
seconds = total_time % 60
print(f"Total run time: {hours} h {minutes} min {seconds:.2f} s")
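On the path-separator caveat noted before the script: a file name obtained by splitting on one hard-coded separator is tied to one platform. The small sketch below uses the standard library's pure path classes to take the file name from either style of path; the helper name `file_name_from` and the example paths are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath


def file_name_from(path: str) -> str:
    """Return the final component of a path written in either style."""
    # PureWindowsPath treats both "\\" and "/" as separators, so it also
    # parses Linux-style paths. Pure paths never touch the file system.
    return PureWindowsPath(path).name


print(file_name_from("data/images/mol_1.png"))
print(file_name_from(r"data\images\mol_1.png"))
# With POSIX rules a backslash is an ordinary character, not a separator:
print(PurePosixPath(r"data\images\mol_1.png").name)
```

One caveat: on Linux a file name may legally contain a backslash, in which case the Windows rules would split it; for paths produced on the same machine, `os.path.basename` is the simpler choice.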
