Obtaining and Converting Chemical Identifiers

I. Via online web services

1. Chemical Identifier Resolver

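Run by the NCI/CADD group at the US National Institutes of Health; homepage: (https://cactus.nci.nih.gov/chemical/structure). Its URL API makes it quick to retrieve chemical identifiers in batches. A matching Python package, CIRpy, is also available.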

URL API scheme:

https://cactus.nci.nih.gov/chemical/structure/"structure identifier"/"representation"

Example: Chemical name to Standard InChIKey:

https://cactus.nci.nih.gov/chemical/structure/aspirin/stdinchikey

Note: special characters in the input, such as #, must be URL-encoded; otherwise the request fails.

Note: Triple bonds in SMILES strings represented by ‘#’ have to be URL-escaped as ‘%23’ (e.g. the SMILES string of ethyne has to be specified as ‘C%23C’ instead of ‘C#C’ if encoded as part of a URL). Similarly, question marks, which can occur in InChI, need to be URL-escaped as %3F.

Using Python's requests library:

import requests

# Chemical Identifier Resolver (CIR) URL template: /{identifier}/{representation}
cir = 'https://cactus.nci.nih.gov/chemical/structure/{0}/{1}'
ide = 'C#C'  # SMILES of ethyne
ide = ide.replace('#', '%23')  # '#' must be URL-escaped as '%23'
rep = 'stdinchikey'  # the desired output is StdInChIKey
# Other representations include:
#   rep = 'smiles'      # SMILES
#   rep = 'stdinchi'    # StdInChI
#   rep = 'iupac_name'  # IUPAC name
#   rep = 'cas'         # CAS Registry Number
#   rep = 'formula'     # chemical formula
# Full list: https://cactus.nci.nih.gov/chemical/structure_documentation
url = cir.format(ide, rep)
response = requests.get(url, timeout=30)
response.raise_for_status()
print(response.text)  # InChIKey=HSFWRNGVRCDJHI-UHFFFAOYSA-N
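Since CIR is described above as suitable for batch retrieval, here is a minimal sketch of a batch loop. It assumes that CIR answers an unresolvable identifier with HTTP 404, and that `urllib.parse.quote` is an acceptable general replacement for escaping `#` and `?` by hand (it also leaves `/` unescaped, so InChI layer separators survive). The names `cir_url`, `http_get` and `resolve_batch` are hypothetical. The standard-library `urllib` is used so the block has no third-party dependency, and the `fetch` parameter exists only so the loop can be exercised without network access.

```python
from typing import Callable, Dict, Iterable, Optional
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

CIR_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier: str, representation: str) -> str:
    """Build a CIR URL; quote() turns '#' into '%23' and '?' into '%3F' but keeps '/'."""
    return f"{CIR_BASE}/{quote(identifier)}/{representation}"


def http_get(url: str, timeout: float = 30.0) -> str:
    """Fetch a URL and return the response body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def resolve_batch(
    identifiers: Iterable[str],
    representation: str,
    fetch: Callable[[str], str] = http_get,
) -> Dict[str, Optional[str]]:
    """Resolve each identifier; store None when CIR reports it as not found (assumed HTTP 404)."""
    results: Dict[str, Optional[str]] = {}
    for ident in identifiers:
        try:
            # CIR may return several lines when one name maps to several structures.
            results[ident] = fetch(cir_url(ident, representation)).strip()
        except HTTPError as err:
            if err.code != 404:
                raise  # other HTTP errors are real failures, not "unknown identifier"
            results[ident] = None
    return results


# Example (needs network access):
# resolve_batch(["aspirin", "C#C", "CC(=O)Cl"], "stdinchikey")
```

With the default `http_get`, each identifier costs one request to a shared public server, so for long lists it is worth pausing between calls.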

2. OPSIN web service

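Homepage: (https://opsin.ch.cam.ac.uk/).
Maintained by the Centre for Molecular Informatics at the University of Cambridge, this service converts IUPAC names into a range of chemical identifiers; see (https://opsin.ch.cam.ac.uk/instructions.html) for usage instructions. The underlying Java library can also be used directly; see the project's GitHub repository.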

Using Python's requests library:

from urllib.parse import quote

import requests

opsin = 'https://opsin.ch.cam.ac.uk/opsin/{0}.{1}'
iupac_name = 'Acetylene'
ext = 'json'  # ask OPSIN for one JSON document holding all representations
reps = ('smiles', 'stdinchikey')
url = opsin.format(quote(iupac_name), ext)  # escape spaces, commas, brackets in the name
response = requests.get(url, timeout=30)
response.raise_for_status()
representation = response.json()
for rep in reps:
    print(rep + ': ' + representation[rep])
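For a list of names, the same request can be wrapped in a loop that records failures instead of stopping at the first one. The sketch below assumes that OPSIN signals an unparsable name with an HTTP error status (which `urlopen` raises as `HTTPError`) and that a successful JSON reply carries the `smiles` and `stdinchikey` keys used above. `opsin_url`, `http_get`, `names_to_identifiers` and the `fetch` parameter are hypothetical names.

```python
import json
from typing import Callable, Dict, Iterable, Optional
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

OPSIN_BASE = "https://opsin.ch.cam.ac.uk/opsin"


def opsin_url(name: str, ext: str = "json") -> str:
    """Build an OPSIN URL; quote() escapes spaces, commas and brackets in the name."""
    return f"{OPSIN_BASE}/{quote(name)}.{ext}"


def http_get(url: str, timeout: float = 30.0) -> str:
    """Fetch a URL and return the response body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def names_to_identifiers(
    names: Iterable[str],
    keys: Iterable[str] = ("smiles", "stdinchikey"),
    fetch: Callable[[str], str] = http_get,
) -> Dict[str, Optional[Dict[str, str]]]:
    """Map each name to the requested representations, or None if OPSIN rejects it."""
    wanted = tuple(keys)
    out: Dict[str, Optional[Dict[str, str]]] = {}
    for name in names:
        try:
            data = json.loads(fetch(opsin_url(name)))
        except HTTPError:
            out[name] = None  # assumption: unparsable names come back as an HTTP error
            continue
        out[name] = {k: data[k] for k in wanted if k in data}
    return out


# Example (needs network access):
# names_to_identifiers(["acetylene", "acetic acid"])
```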

3. NIST Chemistry WebBook

Homepage: https://webbook.nist.gov/chemistry/
The NIST Chemistry WebBook is provided by the US National Institute of Standards and Technology (NIST). A matching Python package, NistChemPy, is also available.

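4. Others

ChemSpider, PubChem, SciFinder, and so on.
For more information, see that website, e.g. Section 8.3, which covers the APIs of various web databases.
There is also an R package, webchem, for accessing many chemical databases.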
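Of the services listed above, PubChem exposes a documented REST interface (PUG REST) whose name lookups follow the pattern `https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/{name}/property/{properties}/JSON`. Below is a minimal sketch for looking up identifiers by name. The helper names are hypothetical, and the parser assumes the usual `PropertyTable` → `Properties` layout of the JSON reply. A name can match more than one compound, so a list is returned. Names containing `/` would break the URL path and are better sent by POST, which this sketch does not cover.

```python
import json
from typing import Any, Callable, Dict, List, Sequence
from urllib.parse import quote
from urllib.request import urlopen

PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_property_url(name: str, properties: Sequence[str]) -> str:
    """URL for a name lookup that returns the given properties as JSON."""
    return f"{PUG_BASE}/compound/name/{quote(name)}/property/{','.join(properties)}/JSON"


def http_get(url: str, timeout: float = 30.0) -> str:
    """Fetch a URL and return the response body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def pubchem_properties(
    name: str,
    properties: Sequence[str] = ("InChIKey", "MolecularFormula"),
    fetch: Callable[[str], str] = http_get,
) -> List[Dict[str, Any]]:
    """Return one dict per matching compound; each dict also carries its CID."""
    reply = json.loads(fetch(pug_property_url(name, properties)))
    return reply["PropertyTable"]["Properties"]


# Example (needs network access; an unknown name raises urllib.error.HTTPError):
# pubchem_properties("aspirin")
```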

II. Cheminformatics software packages and toolkits

1. rdkit

from rdkit import Chem
from rdkit.Chem import inchi
# https://www.rdkit.org/docs/source/rdkit.Chem.inchi.html
smiles = 'CC(=O)Cl'  # example input: acetyl chloride
mol = Chem.MolFromSmiles(smiles)  # returns None if the SMILES cannot be parsed
# input molecule and return the standard InChI string
print(inchi.MolToInchi(mol, options='', logLevel=None, treatWarningAsError=False))
# input molecule and return the standard InChIKey
print(inchi.MolToInchiKey(mol, options=''))
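When converting many SMILES at once, `Chem.MolFromSmiles` returns `None` for strings it cannot parse, and that `None` should not be passed on to the InChI functions. Here is a minimal batch sketch that records such entries as `None`. `smiles_to_inchikeys` is a hypothetical helper, and `RDLogger.DisableLog` is used only to silence RDKit's parse-error messages.

```python
from typing import Dict, Iterable, Optional

from rdkit import Chem, RDLogger

RDLogger.DisableLog("rdApp.*")  # hide RDKit's parse-error messages for bad SMILES


def smiles_to_inchikeys(smiles_list: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each SMILES to its standard InChIKey, or None if it cannot be parsed."""
    out: Dict[str, Optional[str]] = {}
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        out[smi] = Chem.MolToInchiKey(mol) if mol is not None else None
    return out


if __name__ == "__main__":
    # "C1CC" has an unclosed ring, so it is reported as None.
    for smi, key in smiles_to_inchikeys(["CC(=O)Cl", "C#C", "C1CC"]).items():
        print(smi, key)
```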

2. openbabel

Open Babel basic usage: Chinese tutorial, official manual

For StdInChIKey: (https://openbabel.org/docs/dev/FileFormats/InChIKey.html)

obabel -:"CC(=O)Cl" -oinchi     # InChI=1S/C2H3ClO/c1-2(3)4/h1H3
obabel -:"CC(=O)Cl" -oinchikey  # WETWJCDKMRHUPV-UHFFFAOYSA-N
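Unquoted, bash treats the parentheses in a SMILES string such as `CC(=O)Cl` as shell syntax. Calling obabel from Python with an argument list avoids the shell altogether. A minimal sketch follows, assuming `obabel` is on the `PATH`. `obabel_command` and `obabel_convert` are hypothetical helpers. Open Babel prints its status line (e.g. "1 molecule converted") to stderr, so only stdout is returned.

```python
import subprocess
from typing import List


def obabel_command(smiles: str, out_format: str) -> List[str]:
    """Build the argument list; '-:' passes the SMILES directly, no shell quoting needed."""
    return ["obabel", f"-:{smiles}", f"-o{out_format}"]


def obabel_convert(smiles: str, out_format: str = "inchikey") -> str:
    """Run obabel and return its stdout (the converted identifier), stripped."""
    result = subprocess.run(
        obabel_command(smiles, out_format),
        capture_output=True,  # status messages land in result.stderr
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


# Example (needs Open Babel installed):
# obabel_convert("CC(=O)Cl", "inchi")
# obabel_convert("CC(=O)Cl", "inchikey")
```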

3. Others

See (https://zhuanlan.zhihu.com/p/74540059) or the relevant literature for details.
