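If anything is unclear, leave a comment.
Chemical Structure Recognition
If you have the interest and the time, I recommend reading this article first: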
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00465-0
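It introduces and evaluates a large number of online and offline tools.
Recommended easy-to-use online platforms
I have personally used two online platforms: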
- https://cactus.nci.nih.gov/cgi-bin/osra/index.cgi (https://cactus.nci.nih.gov/osra/)
  This is a widely used online OSRA service. It converts an image into an SD file or a SMILES string.
- https://molvec.ncats.io/#
  This is a front end built by the authors of MolVec. It integrates MolVec, OSRA, and Imago.
Extracting molecular structures from images with Python (1)
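I could not find a public API for the OSRA site, so one option is to use Selenium to simulate browser clicks and read the result from the page. Here is the code I wrote: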
import time
import os
import json
from selenium import webdriver
from selenium.webdriver.common.by import By


def upload(img_path):
    # XPaths of the file input, the upload button and the clear button
    select_xpath = '/html/body/center/form/table/tbody/tr[3]/td[1]/input[2]'
    submit_xpath = '//*[@id="b_upload"]'
    clear_xpath = '/html/body/center/form/table/tbody/tr[3]/td[1]/center/input[2]'
    # Clear the previous image, choose the new file, then submit it
    firefox.find_element(By.XPATH, clear_xpath).click()
    firefox.find_element(By.XPATH, select_xpath).send_keys(img_path)
    firefox.find_element(By.XPATH, submit_xpath).click()


def get_information():
    get_smiles_xpath = '//*[@id="b_getsmiles"]'
    smiles_xpath = '/html/body/center/form/table/tbody/tr[3]/td[2]/input[1]'
    # Click "Get SMILES" and read the text field that holds the result
    firefox.find_element(By.XPATH, get_smiles_xpath).click()
    text = firefox.find_element(By.XPATH, smiles_xpath).get_attribute("value")
    return text


def main():
    global firefox
    firefox = webdriver.Firefox()
    firefox.get('https://cactus.nci.nih.gov/cgi-bin/osra/index.cgi')
    img_folder = 'your_img_folder'
    os.makedirs('res', exist_ok=True)  # screenshots are saved here
    imgs76 = os.listdir(img_folder)
    smiles_list = {}
    for img in imgs76:
        # The file input expects an absolute path
        upload(os.path.abspath(os.path.join(img_folder, img)))
        time.sleep(7)  # give the server time to process the image
        try:
            tmp_text = get_information()
            # splitext removes the extension; rstrip('.jpg') would strip characters
            png_name = os.path.splitext(img)[0] + '.png'
            firefox.save_screenshot(os.path.join('res', png_name))
            smiles_list[img] = tmp_text
        except Exception:
            smiles_list[img] = 'Sorry, no structures found'
    print(smiles_list)
    firefox.quit()
    with open('result.json', 'w') as fp:
        json.dump(smiles_list, fp)


if __name__ == '__main__':
    main()
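The code above
- opens your target folder (which holds a set of molecular structure screenshots)
- then simulates browser actions to process the images in a batch. It waits 7 seconds for each image (the time.sleep(7) call) and saves a screenshot of the result.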
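A fixed pause per image either wastes time when the server answers quickly or gives up too early when it is slow. As a rough sketch of one alternative, the pause can be replaced by polling until a result appears. `wait_for_value` is a hypothetical helper, not part of Selenium or of the script above, and it assumes that calling `get_information()` several times in a row is harmless on the OSRA page. Selenium's own `WebDriverWait` applies the same idea to page elements.

```python
import time


def wait_for_value(read_value, timeout=20.0, interval=0.5):
    """Call read_value() until it returns a non-empty value or the timeout expires.

    Exceptions raised by read_value() are treated as "not ready yet".
    Returns the value, or None if nothing arrived in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = read_value()
        except Exception:
            value = None
        if value:
            return value
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

In the loop this could look like `tmp_text = wait_for_value(get_information, timeout=20)`, with a `None` result recorded as a failed image.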
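Note that you need to download a WebDriver. Either the Firefox or the Chrome driver works (the line firefox = webdriver.Firefox() in the code uses the Firefox driver, and a quick web search will find the download).
If you need to save the SD file, simulate a click on Get SD File in the same way. Feel free to leave a comment if you have questions.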
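After a run, result.json maps each image name to either a SMILES string or the fallback message. The sketch below shows one way to separate the two groups so the failed images can be retried or checked by hand. `split_results` and `NOT_FOUND` are names introduced here for illustration, and treating empty strings as failures is an assumption.

```python
import json

# Fallback text the script stores when no structure could be read
NOT_FOUND = 'Sorry, no structures found'


def split_results(path='result.json'):
    """Return (recognised, failed): a dict of image -> SMILES and a list of image names."""
    with open(path) as fp:
        results = json.load(fp)
    recognised, failed = {}, []
    for img, smiles in results.items():
        text = (smiles or '').strip()  # the stored value may be empty or null
        if text and text != NOT_FOUND:
            recognised[img] = text
        else:
            failed.append(img)
    return recognised, failed
```

Calling `recognised, failed = split_results()` after the script finishes gives the SMILES strings that are ready to use and the list of images to look at again.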
Extracting molecular structures from images with Python (2)
'''
This script converts molecule images to mol format (saved as an SDF file).
Change the image folder (img_folder) and the name of the result file (name) in main().
'''
import os
import requests


def get_sdf(name, img_folder):
    imgs76 = os.listdir(img_folder)
    url = 'https://molvec.ncats.io/molvec'
    headers = {'Content-Type': 'image/jpg'}
    for img in imgs76:
        # Send the raw image bytes to the MolVec service
        with open(os.path.join(img_folder, img), 'rb') as fp:
            r = requests.post(url, data=fp, headers=headers)
        # Append the returned molfile to <name>.sdf and close the record with $$$$
        with open('{}.sdf'.format(name), 'a') as fp:
            fp.write(r.json()['molvec']['molfile'])
            fp.write('\n$$$$\n')


def main():
    name = 'abc'
    img_folder = "***"
    get_sdf(name, img_folder)


if __name__ == '__main__':
    main()
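The code above converts every molecule image under img_folder into a single SDF file named after name. An SDF file is a sequence of mol records, each terminated by a $$$$ line, and is a standard format for chemical structures.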
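Since the resulting file is just mol records joined by `$$$$` lines, it can be split back into individual records with plain string handling. The sketch below is a minimal illustration of that layout. `split_sdf` is a hypothetical helper, not part of any library, and for real work a cheminformatics toolkit is the safer choice.

```python
def split_sdf(text):
    """Split SDF text into a list of mol records, dropping the $$$$ delimiter lines."""
    records, current = [], []

    def flush():
        # Drop trailing blank lines only; leading lines belong to the mol header
        while current and not current[-1].strip():
            current.pop()
        if current:
            records.append('\n'.join(current))
        current.clear()

    for line in text.splitlines():
        if line.strip() == '$$$$':
            flush()
        else:
            current.append(line)
    flush()  # keep a final record that lacks a closing $$$$
    return records
```

For example, `split_sdf(open('abc.sdf').read())` returns one string per recognised image, in the order the images were processed.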