Compound Search with rdkit on PostgreSQL

Latest recommended article published on 2023-01-05 17:01:18

冂吉麒麟

Categories: PostgreSQL, rdkit. Tags: database, postgresql

Article link: https://blog.csdn.net/z765219/article/details/105734614

1. Installing and enabling `rdkit`

Run the following commands. For details, see the rdkit installation guide: http://www.rdkit.org/docs/Install.html

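-- Install rdkit (run this in a shell, not in psql; the PostgreSQL cartridge may be packaged separately, see the installation guide above)
conda install rdkit;

-- Enable rdkit in the current database (run this in psql)
create extension if not exists rdkit;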
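
Before going further it can help to confirm that the extension actually loaded. The following is a minimal sketch, assuming the extension was created in the current database; `pg_extension` is a standard PostgreSQL catalog, while `rdkit_version()` and `mol_from_smiles` come from the cartridge.

```sql
-- Confirm the extension is registered in the current database.
select extname, extversion from pg_extension where extname = 'rdkit';

-- Version string reported by the cartridge itself.
select rdkit_version();

-- Smoke test: parse benzene into a mol value.
-- An error here means the cartridge is not usable in this database.
select mol_from_smiles('c1ccccc1'::cstring);
```

If the first query returns no rows, the `create extension` step did not take effect in this database.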

2. Building tables from compound SMILES for substructure and similarity search

Preparation: assume a table `compoundStructures` stores the SMILES of each compound

-- Create the table
create table compoundStructures
(
    compId BIGSERIAL primary key,
    smi varchar(1000) not null
);
-- Insert sample data
insert into compoundStructures (smi) values ('CCNCCCOc1ccc2cc(NC(=O)c3ccc(O)c(CC=C(C)C)c3)c(=O)oc2c1C'),
('CCNCCCOc1ccc2cc(NC(=O)c3ccc(O)c(CC=C(C)C)c3)c(=O)oc2c1C');

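Deriving a dedicated table for substructure search

Use the following SQL to create the table and populate it. Rows whose SMILES cannot be parsed yield a null `m` and are left out.

-- The rdk schema must exist before a table can be created in it
create schema if not exists rdk;
select * into rdk.mols from (select compId,mol_from_smiles(smi::cstring) m  from compoundstructures) tmp where m is not null;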

-- Create the index and the primary key
create index molidx on rdk.mols using gist(m);
alter table rdk.mols add primary key (compId);
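
Because unparseable SMILES are silently dropped when `rdk.mols` is built, it may be worth listing which source rows were lost. A minimal sketch, assuming the `compoundStructures` and `rdk.mols` tables above; `is_valid_smiles` is a cartridge function that reports whether a SMILES string produces a valid molecule.

```sql
-- Source rows whose SMILES rdkit cannot parse.
select compId, smi
from compoundStructures
where not is_valid_smiles(smi::cstring);

-- Cross-check: source rows that have no counterpart in rdk.mols.
select c.compId, c.smi
from compoundStructures c
left join rdk.mols r using (compId)
where r.compId is null;
```

Both queries should return the same rows; an empty result means every compound was converted.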

SQL for substructure search

-- Count the molecules that contain the given substructure
select count(*) from rdk.mols where m@>'c1cccc2c1nncc2';
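
A count is rarely the end goal, so here is a sketch of the same search returning the matching rows, plus a SMARTS variant. It assumes the `rdk.mols` table above; the SMARTS pattern `c1[o,s]ncn1` is only an illustrative query, not one from this article.

```sql
-- List the matching compounds instead of counting them;
-- mol_to_smiles renders the stored mol as canonical SMILES.
select compId, mol_to_smiles(m) as smiles
from rdk.mols
where m @> 'c1cccc2c1nncc2'
limit 10;

-- SMARTS query: cast the pattern to qmol so that query features
-- such as [o,s] are interpreted as SMARTS rather than SMILES.
select count(*) from rdk.mols where m @> 'c1[o,s]ncn1'::qmol;
```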

Deriving a dedicated table for similarity search

Create the dedicated table, populate it with fingerprints, then build the indexes

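-- Build rdk.fps from rdk.mols as the dedicated table for similarity search
-- (the key column is compId, matching rdk.mols and the primary key added below)
select compId,torsionbv_fp(m) as torsionbv,morganbv_fp(m) as mfp2,featmorganbv_fp(m) as ffp2 into rdk.fps from rdk.mols;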
create index fps_ttbv_idx on rdk.fps using gist(torsionbv);
create index fps_mfp2_idx on rdk.fps using gist(mfp2);
create index fps_ffp2_idx on rdk.fps using gist(ffp2);
alter table rdk.fps add primary key (compId);

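Set the similarity threshold and run a similarity query

-- Set the Tanimoto threshold; the % operator only matches fingerprints at or above it
set rdkit.tanimoto_threshold=0.6;
-- Similarity search example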
select count(*) from rdk.fps where mfp2%morganbv_fp('Cc1ccc2nc(-c3ccc(NC(C4N(C(c5cccs5)=O)CCC4)=O)cc3)sc2c1');
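
To see the actual scores behind that count, the hits can be returned together with their Tanimoto similarity. A minimal sketch, assuming `rdk.fps` carries the `compId` and `mfp2` columns described above; `tanimoto_sml` is the cartridge function that computes the similarity between two fingerprints.

```sql
-- Inspect the threshold currently in effect for the % operator.
show rdkit.tanimoto_threshold;

-- Return the hits with their scores, best match first.
select compId,
       tanimoto_sml(
           morganbv_fp('Cc1ccc2nc(-c3ccc(NC(C4N(C(c5cccs5)=O)CCC4)=O)cc3)sc2c1'::mol),
           mfp2) as similarity
from rdk.fps
where mfp2 % morganbv_fp('Cc1ccc2nc(-c3ccc(NC(C4N(C(c5cccs5)=O)CCC4)=O)cc3)sc2c1'::mol)
order by similarity desc
limit 10;
```

The `%` filter is what lets the GiST index on `mfp2` be used; `tanimoto_sml` in the select list only reports the score.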

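Create a function that returns, as a table, the structures whose similarity is at or above the threshold, sorted from most to least similar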

create or replace function get_mfp2_neighbors(smiles text)
    returns table(compId bigint, m mol, similarity double precision) as
  $$
  -- compId is the key shared by rdk.fps and rdk.mols
  select compId,m,tanimoto_sml(morganbv_fp(mol_from_smiles($1::cstring)),mfp2) as similarity
  from rdk.fps join rdk.mols using (compId)
  where morganbv_fp(mol_from_smiles($1::cstring))%mfp2
  order by morganbv_fp(mol_from_smiles($1::cstring))<%>mfp2;
  $$ language sql stable ;
  
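 -- Usage example
 select * from get_mfp2_neighbors('Cc1ccc2nc(-c3ccc(NC(C4N(C(c5cccs5)=O)CCC4)=O)cc3)sc2c1') limit 10;

 -- Sample result (it includes rows with similarity below 0.6, so it was evidently produced with a lower threshold than the one set above)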
 compId |                                m                                 |    similarity
----------+------------------------------------------------------------------+-------------------
   751668 | COc1ccc2nc(NC(=O)[C@@H]3CCCN3C(=O)c3cccs3)sc2c1                  | 0.619718309859155
   740754 | Cc1ccc(NC(=O)C2CCCN2C(=O)c2cccs2)cc1C                            | 0.606060606060606
   732905 | O=C(Nc1ccc(S(=O)(=O)N2CCCC2)cc1)C1CCCN1C(=O)c1cccs1              | 0.602941176470588
   810850 | Cc1cc(C)n(-c2ccc(NC(=O)C3CCCCN3C(=O)c3cccs3)cc2)n1               | 0.583333333333333
  1224407 | O=C(Nc1cccc(S(=O)(=O)N2CCCC2)c1)C1CCCN1C(=O)c1cccs1              | 0.579710144927536
   779258 | CC1CCN(S(=O)(=O)c2ccc(NC(=O)[C@@H]3CCCN3C(=O)c3cccs3)cc2)CC1     | 0.569444444444444
   472441 | Cc1ccc2nc(-c3ccc(NC(=O)C4CCN(S(=O)(=O)C(C)C)CC4)cc3)sc2c1        | 0.569444444444444
   745651 | Cc1ccc(NC(=O)[C@@H]2CCCN2C(=O)c2cccs2)cc1S(=O)(=O)N1CCCCC1       | 0.567567567567568
   472510 | Cc1ccc2nc(-c3ccc(NC(=O)C4CCN(S(=O)(=O)c5cccc(Cl)c5)CC4)cc3)sc2c1 | 0.565789473684211
  1233426 | Cc1cccc2sc(NC(=O)[C@@H]3CCCN3C(=O)c3cccs3)nc12                   | 0.563380281690141
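
Since the function reads `rdkit.tanimoto_threshold` at call time, the cutoff can be tightened per session without redefining it. The sketch below also joins the neighbors back to the source table to show the originally stored SMILES; it assumes the `compoundStructures` table and the `get_mfp2_neighbors` function defined above, and the value 0.7 is only an illustration.

```sql
-- Tighten the cutoff for this session only.
set rdkit.tanimoto_threshold = 0.7;

-- Neighbors together with the SMILES as originally stored.
-- The join does not guarantee order, so sort explicitly.
select n.compId, c.smi, n.similarity
from get_mfp2_neighbors('Cc1ccc2nc(-c3ccc(NC(C4N(C(c5cccs5)=O)CCC4)=O)cc3)sc2c1') as n
join compoundStructures as c using (compId)
order by n.similarity desc
limit 10;

-- Restore the threshold to its configured default.
reset rdkit.tanimoto_threshold;
```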
