About the RDKit error "2w08_ligand: warning - O.co2 with non C.2 or S.o2 neighbor."

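1 Problem:

I read the PDBbind v2019 dataset and tried to convert every ligand's mol2 file to the corresponding SMILES string. Roughly a thousand or more of the ligands ran into problems.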

The main problem is the message 'warning - O.co2 with non C.2 or S.o2 neighbor'.

2 Cause:

Phosphate group - warning O.co2 with non C.2 or S.o2 neighbor
[Rdkit-discuss] Phosphate containing mol2 files

Since the mol2 format is a bit of an inconsistent mess, where different toolkits/packages use different dialects of the format (or different meanings for the atom types), we chose to support the dialect generated by corina.

In short, the mol2 format comes in more than one dialect, and RDKit chose to support one of them: the dialect generated by Corina.
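To see which atoms in a given file match the wording of the warning, the mol2 text can be scanned directly. The sketch below is not RDKit's own check; it only follows the literal warning text and lists every O.co2 atom that has a neighbor typed as something other than C.2 or S.o2. The function name and the EXAMPLE fragment (a phosphate whose oxygens are typed O.co2, as the linked threads suggest) are hypothetical, and the parser assumes the usual @<TRIPOS>ATOM and @<TRIPOS>BOND column order.

```python
from collections import defaultdict


def find_odd_carboxylate_oxygens(mol2_text):
    """Return (oxygen_name, neighbor_types) for each O.co2 atom that has a
    neighbor typed as something other than C.2 or S.o2."""
    atom_types = {}  # atom id -> SYBYL atom type
    atom_names = {}  # atom id -> atom name
    neighbors = defaultdict(list)
    section = None

    for line in mol2_text.splitlines():
        line = line.strip()
        if line.startswith("@<TRIPOS>"):
            section = line[len("@<TRIPOS>"):]
            continue
        fields = line.split()
        if section == "ATOM" and len(fields) >= 6:
            # Columns: atom_id atom_name x y z atom_type ...
            atom_names[fields[0]] = fields[1]
            atom_types[fields[0]] = fields[5]
        elif section == "BOND" and len(fields) >= 4:
            # Columns: bond_id origin_atom_id target_atom_id bond_type
            neighbors[fields[1]].append(fields[2])
            neighbors[fields[2]].append(fields[1])

    flagged = []
    for atom_id, atom_type in atom_types.items():
        if atom_type != "O.co2":
            continue
        types = [atom_types[n] for n in neighbors[atom_id] if n in atom_types]
        if any(t not in ("C.2", "S.o2") for t in types):
            flagged.append((atom_names[atom_id], types))
    return flagged


# Hypothetical fragment, not taken from PDBbind: a phosphate with O.co2 oxygens.
EXAMPLE = """@<TRIPOS>MOLECULE
example
5 4
SMALL
NO_CHARGES

@<TRIPOS>ATOM
1 P1 0.0 0.0 0.0 P.3
2 O1 1.5 0.0 0.0 O.co2
3 O2 -1.5 0.0 0.0 O.co2
4 O3 0.0 1.5 0.0 O.co2
5 O4 0.0 -1.5 0.0 O.3
@<TRIPOS>BOND
1 1 2 ar
2 1 3 ar
3 1 4 ar
4 1 5 1
"""

print(find_odd_carboxylate_oxygens(EXAMPLE))
# [('O1', ['P.3']), ('O2', ['P.3']), ('O3', ['P.3'])]
```

Running this over the failing ligand files is one way to check whether phosphate-like groups really account for most of the warnings in a given dataset.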

from rdkit import Chem
Documentation for Chem.MolFromMol2File():

Docstring:
MolFromMol2File( (str)molFileName [, (bool)sanitize=True [, (bool)removeHs=True [, (bool)cleanupSubstructures=True]]]) -> Mol :
Construct a molecule from a Tripos Mol2 file.

  NOTE:
    The parser expects the atom-typing scheme used by Corina.
    Atom types from Tripos' dbtranslate are less supported.
    Other atom typing schemes are unlikely to work.

  ARGUMENTS:
                                  
    - fileName: name of the file to read

    - sanitize: (optional) toggles sanitization of the molecule.
      Defaults to true.

    - removeHs: (optional) toggles removing hydrogens from the molecule.
      This only make sense when sanitization is done.
      Defaults to true.

    - cleanupSubstructures: (optional) toggles standardizing some 
      substructures found in mol2 files.
      Defaults to true.

  RETURNS:

    a Mol object, None on failure.



C++ signature :
    class RDKit::ROMol * __ptr64 MolFromMol2File(char const * __ptr64 [,bool=True [,bool=True [,bool=True]]])
Type: function
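Because the docstring says the function returns None on failure, the files RDKit cannot handle can be collected instead of aborting the whole conversion. Below is a minimal sketch of that bookkeeping. The function name and the reader and writer parameters are hypothetical hooks added for illustration; by default they resolve to Chem.MolFromMol2File and Chem.MolToSmiles.

```python
def mol2_to_smiles_rdkit(paths, reader=None, writer=None):
    """Convert mol2 files to SMILES; return (smiles_by_path, failed_paths)."""
    if reader is None or writer is None:
        # Imported lazily so the bookkeeping can be exercised without RDKit.
        from rdkit import Chem
        reader = reader or Chem.MolFromMol2File
        writer = writer or Chem.MolToSmiles

    smiles_by_path, failed = {}, []
    for path in paths:
        try:
            mol = reader(path)
        except Exception:
            # Treat an unreadable or missing file like a failed parse.
            mol = None
        if mol is None:
            failed.append(path)
        else:
            smiles_by_path[path] = writer(mol)
    return smiles_by_path, failed


# Usage (the path layout is an assumption based on the dataset folder used later):
# smiles, failed = mol2_to_smiles_rdkit(["./v2019-other-PL/2w08/2w08_ligand.mol2"])
# print(len(failed), "files could not be converted")
```

The failed list then gives the set of ligands that need a different reader.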

3 Solution

After a long search, I simply switched to Open Babel to read the files.

from openbabel import pybel
Documentation for pybel.readfile():

Required parameters:
   format - see the informats variable for a list of available
            input formats
   filename

Optional parameters:
   opt    - a dictionary of format-specific options
            For format options with no parameters, specify the
            value as None.

You can access the first molecule in a file using the next() method
of the iterator (or the next() keyword in Python 3):
    mol = readfile("smi", "myfile.smi").next() # Python 2
    mol = next(readfile("smi", "myfile.smi"))  # Python 3

You can make a list of the molecules in a file using:
    mols = list(readfile("smi", "myfile.smi"))

You can iterate over the molecules in a file as shown in the
following code snippet:
>>> atomtotal = 0
>>> for mol in readfile("sdf", "head.sdf"):
...     atomtotal += len(mol.atoms)
...
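Applied to a single ligand file, the readfile iterator described above can be wrapped as follows. This is a sketch rather than a definitive implementation: both function names are hypothetical, and it assumes that writing a molecule in the "smi" format yields the SMILES followed by whitespace and the title. An empty SMILES is reported as None, so a title is never mistaken for a structure.

```python
def smiles_from_record(record):
    """Extract the SMILES from an Open Babel 'smi' record (SMILES, then title)."""
    # A record that starts with whitespace has no SMILES before the title.
    if not record or record[0].isspace():
        return None
    return record.split()[0]


def first_smiles(file_format, file_name):
    """Return the SMILES of the first molecule in the file, or None."""
    # Imported lazily so smiles_from_record can be used without Open Babel.
    from openbabel import pybel

    for mol in pybel.readfile(file_format, file_name):
        return smiles_from_record(mol.write("smi"))
    return None


# Usage (the path layout is an assumption based on the dataset folder used later):
# print(first_smiles("sdf", "./v2019-other-PL/2w08/2w08_ligand.sdf"))
```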

4 Related code:

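import numpy as np
from openbabel import pybel
from tqdm import tqdm

path = './v2019-other-PL/'

def process_chunk(chunk):
    """Fill the 'Smiles' column of one DataFrame chunk from the ligand SDF files."""
    for i, row in chunk.iterrows():
        file_name = path + row["id"] + "/" + row["id"] + "_ligand.sdf"
        try:
            for mol in pybel.readfile('sdf', file_name):
                # str(mol) is the SMILES followed by the title; keep only the SMILES.
                smiles = str(mol).split()[0]
                chunk.at[i, 'Smiles'] = smiles
                print(row["id"], ":", smiles)
        except Exception as err:
            # Report unreadable files instead of silently skipping them.
            print(row["id"], ": failed -", err)
    return chunk

# df_pro is a pandas DataFrame prepared earlier; it must have an "id" column.
df_imputation = df_pro.copy()
# Split the row positions into 100 groups and take a copy of each slice,
# so that writing to a chunk does not touch a view of the original frame.
positions = np.array_split(np.arange(len(df_imputation)), 100)
chunks = [df_imputation.iloc[idx].copy() for idx in positions]
out_smiles = []
for chunk in tqdm(chunks):
    out_chunks = process_chunk(chunk)
    out_smiles.append(out_chunks)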

