Meaning of each bit in the 166-bit MACCS molecular fingerprint

 1. What are MACCS keys?

The MACCS (Molecular ACCess System) keys [1,2] are one of the most commonly used sets of structural keys. They are sometimes referred to as the MDL keys, after the company that developed them (MDL Information Systems, now BIOVIA). There are two sets of MACCS keys, one with 960 keys and the other a subset of 166 keys, but only the fragment definitions of the shorter set are publicly available. These 166 public keys are implemented in popular open-source cheminformatics packages, including RDKit [3], OpenBabel [4,5], and CDK [6,7].
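As a concrete illustration of the RDKit implementation mentioned above, the following is a minimal sketch that assumes RDKit is installed. The helper name `maccs_on_keys` is made up for this example; `Chem.MolFromSmiles` and `MACCSkeys.GenMACCSKeys` are the RDKit calls it relies on. RDKit returns a 167-bit vector in which bit 0 is unused, so the bit index equals the key number.

```python
def maccs_on_keys(smiles):
    """Return the sorted MACCS key numbers that are set for a SMILES string.

    Illustrative helper (not part of RDKit); RDKit is imported lazily so the
    function can be defined even where RDKit is not installed.
    """
    from rdkit import Chem
    from rdkit.Chem import MACCSkeys

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    # GenMACCSKeys returns a bit vector of length 167; bit i corresponds to
    # key i, and bit 0 is never set.
    fingerprint = MACCSkeys.GenMACCSKeys(mol)
    return sorted(fingerprint.GetOnBits())
```

For ethanol, `maccs_on_keys("CCO")` should include key 164 (O) and key 160 (CH3) among the returned numbers.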

 2. What is the exact meaning of each bit?

The fragment definitions for the MACCS 166 keys can be found in this document:

rdkit/MACCSkeys.py at master · rdkit/rdkit · GitHub

In addition, RDKit writes the MACCS key definitions in SMARTS, a language for describing molecular patterns. The official documentation, which details the meaning of every symbol in a SMARTS string, is linked below.

Daylight Theory: SMARTS - A Language for Describing Molecular Patterns

The exact definitions of the 166 MACCS keys are listed in the table below.

To read the Remark column, keep the following notes in mind:

        1. Q means a heteroatom, i.e. any atom other than carbon or hydrogen ([!#6;!#1] in SMARTS).

        2. A means any atom (* in SMARTS).

        3. Some chemical substructure drawings are provided as examples to illustrate the SMARTS code.

        4. In the bond shorthand, $ marks a ring bond, ! a non-ring bond, % an aromatic bond, and T a triple bond; X stands for a halogen.
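The shorthand in the Remark column can be expanded mechanically for the simplest cases. The sketch below is a hypothetical helper (`shorthand_to_smarts` and the `ATOM_SMARTS` table are names invented here, not part of RDKit) that handles only unbranched remarks with unspecified bonds, such as NAAN or QSQ, and joins the atoms with the "any bond" symbol `~`. Ring closures, branches, and explicit bond symbols are deliberately out of scope.

```python
# Hypothetical lookup table: shorthand symbol -> SMARTS atom primitive,
# using the same primitives that appear in the key table.
ATOM_SMARTS = {
    "QH": "[!#6;!#1;!H0]",  # heteroatom bearing at least one hydrogen
    "Q": "[!#6;!#1]",       # heteroatom (not C, not H)
    "A": "*",               # any atom
    "X": "[F,Cl,Br,I]",     # halogen
    "C": "[#6]",
    "N": "[#7]",
    "O": "[#8]",
    "S": "[#16]",
}


def shorthand_to_smarts(remark):
    """Expand an unbranched shorthand remark into a '~'-joined SMARTS string."""
    atoms = []
    i = 0
    while i < len(remark):
        # Prefer the two-character token "QH" over the one-character "Q".
        for width in (2, 1):
            token = remark[i:i + width]
            if token in ATOM_SMARTS:
                atoms.append(ATOM_SMARTS[token])
                i += len(token)
                break
        else:
            raise ValueError(f"unsupported symbol {remark[i]!r} at position {i}")
    return "~".join(atoms)


print(shorthand_to_smarts("NAAN"))    # [#7]~*~*~[#7]
print(shorthand_to_smarts("QHAAQH"))  # [!#6;!#1;!H0]~*~*~[!#6;!#1;!H0]
```

The two printed strings coincide with the SMARTS listed for keys 79 and 54, which is a quick way to check one's reading of a remark against the table.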

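| S.N | SMARTS code | Remark | Note |
|---|---|---|---|
| 1 | ('?', 0) | ISOTOPE | Isotope present |
| 2 | ('[#104]', 0) | limit the above def'n since the RDKit only accepts up to #104 | Atomic number above 103; RDKit restricts the pattern to #104 |
| 3 | ('[#32,#33,#34,#50,#51,#52,#82,#83,#84]', 0) | IVa,Va,VIa Rows 4-6 | Periods 4-6 of groups 14-16: Ge, As, Se, Sn, Sb, Te, Pb, Bi, Po |
| 4 | ('[Ac,Th,Pa,U,Np,Pu,Am,Cm,Bk,Cf,Es,Fm,Md,No,Lr]', 0) | Actinide | Actinides |
| 5 | ('[Sc,Ti,Y,Zr,Hf]', 0) | Group IIIB,IVB | Periods 4-6, scandium and titanium groups |
| 6 | ('[La,Ce,Pr,Nd,Pm,Sm,Eu,Gd,Tb,Dy,Ho,Er,Tm,Yb,Lu]', 0) | Lanthanide | Lanthanides |
| 7 | ('[V,Cr,Mn,Nb,Mo,Tc,Ta,W,Re]', 0) | Group VB,VIB,VIIB | Periods 4-6, vanadium, chromium, and manganese groups |
| 8 | ('[!#6;!#1]1~*~*~*~1', 0) | QAAA@1 | Four-membered ring containing a heteroatom (not C, not H), any bonds |
| 9 | ('[Fe,Co,Ni,Ru,Rh,Pd,Os,Ir,Pt]', 0) | Group VIII | Periods 4-6, iron, cobalt, and nickel groups |
| 10 | ('[Be,Mg,Ca,Sr,Ba,Ra]', 0) | Group IIa (Alkaline earth) | Alkaline earth metals |
| 11 | ('*1~*~*~*~1', 0) | 4M Ring | Four-membered ring, any atoms, any bonds |
| 12 | ('[Cu,Zn,Ag,Cd,Au,Hg]', 0) | Group IB,IIB | Periods 4-6, copper and zinc groups |
| 13 | ('[#8]~[#7](~[#6])~[#6]', 0) | ON(C)C | |
| 14 | ('[#16]-[#16]', 0) | S-S | S-S single bond |
| 15 | ('[#8]~[#6](~[#8])~[#8]', 0) | OC(O)O | |
| 16 | ('[!#6;!#1]1~*~*~1', 0) | QAA@1 | Three-membered ring made of one heteroatom and any two atoms |
| 17 | ('[#6]#[#6]', 0) | CTC | C#C triple bond |
| 18 | ('[#5,#13,#31,#49,#81]', 0) | Group IIIA | Boron group, excluding period 7 |
| 19 | ('*1~*~*~*~*~*~*~1', 0) | 7M Ring | Seven-membered ring, any atoms, any bonds |
| 20 | ('[#14]', 0) | Si | Silicon |
| 21 | ('[#6]=[#6](~[!#6;!#1])~[!#6;!#1]', 0) | C=C(Q)Q | C=C with two heteroatoms (not C, not H) on one carbon |
| 22 | ('*1~*~*~1', 0) | 3M Ring | Three-membered ring, any atoms, any bonds |
| 23 | ('[#7]~[#6](~[#8])~[#8]', 0) | NC(O)O | |
| 24 | ('[#7]-[#8]', 0) | N-O | N-O single bond |
| 25 | ('[#7]~[#6](~[#7])~[#7]', 0) | NC(N)N | |
| 26 | ('[#6]=;@[#6](@*)@*', 0) | C$=C($A)$A | |
| 27 | ('[I]', 0) | I | Iodine |
| 28 | ('[!#6;!#1]~[CH2]~[!#6;!#1]', 0) | QCH2Q | CH2 bonded to two heteroatoms |
| 29 | ('[#15]', 0) | P | Phosphorus |
| 30 | ('[#6]~[!#6;!#1](~[#6])(~[#6])~*', 0) | CQ(C)(C)A | |
| 31 | ('[!#6;!#1]~[F,Cl,Br,I]', 0) | QX | Heteroatom bonded to a halogen |
| 32 | ('[#6]~[#16]~[#7]', 0) | CSN | C, S, and N connected by any bonds |
| 33 | ('[#7]~[#16]', 0) | NS | N and S connected by any bond |
| 34 | ('[CH2]=*', 0) | CH2=A | CH2 double-bonded to any atom |
| 35 | ('[Li,Na,K,Rb,Cs,Fr]', 0) | Group IA (Alkali Metal) | Alkali metals |
| 36 | ('[#16R]', 0) | S Heterocycle | Sulfur in a ring |
| 37 | ('[#7]~[#6](~[#8])~[#7]', 0) | NC(O)N | |
| 38 | ('[#7]~[#6](~[#6])~[#7]', 0) | NC(C)N | |
| 39 | ('[#8]~[#16](~[#8])~[#8]', 0) | OS(O)O | |
| 40 | ('[#16]-[#8]', 0) | S-O | S-O single bond |
| 41 | ('[#6]#[#7]', 0) | CTN | C#N triple bond |
| 42 | ('F', 0) | F | Fluorine |
| 43 | ('[!#6;!#1;!H0]~*~[!#6;!#1;!H0]', 0) | QHAQH | |
| 44 | ('[!#1;!#6;!#7;!#8;!#9;!#14;!#15;!#16;!#17;!#35;!#53]', 0) | other | Any element other than H, C, N, O, F, Si, P, S, Cl, Br, I |
| 45 | ('[#6]=[#6]~[#7]', 0) | C=CN | |
| 46 | ('Br', 0) | Br | Bromine |
| 47 | ('[#16]~*~[#7]', 0) | SAN | S, any atom, N |
| 48 | ('[#8]~[!#6;!#1](~[#8])(~[#8])', 0) | OQ(O)O | |
| 49 | ('[!+0]', 0) | charge | Atom carrying a formal charge |
| 50 | ('[#6]=[#6](~[#6])~[#6]', 0) | C=C(C)C | |
| 51 | ('[#6]~[#16]~[#8]', 0) | CSO | C, S, and O connected by any bonds |
| 52 | ('[#7]~[#7]', 0) | NN | N and N connected by any bond |
| 53 | ('[!#6;!#1;!H0]~*~*~*~[!#6;!#1;!H0]', 0) | QHAAAQH | Two hydrogen-bearing heteroatoms separated by three atoms |
| 54 | ('[!#6;!#1;!H0]~*~*~[!#6;!#1;!H0]', 0) | QHAAQH | Two hydrogen-bearing heteroatoms separated by two atoms |
| 55 | ('[#8]~[#16]~[#8]', 0) | OSO | O, S, and O connected by any bonds |
| 56 | ('[#8]~[#7](~[#8])~[#6]', 0) | ON(O)C | |
| 57 | ('[#8R]', 0) | O Heterocycle | Oxygen in a ring |
| 58 | ('[!#6;!#1]~[#16]~[!#6;!#1]', 0) | QSQ | S between two heteroatoms |
| 59 | ('[#16]!:*:*', 0) | Snot%A%A | % denotes an aromatic bond |
| 60 | ('[#16]=[#8]', 0) | S=O | S=O double bond |
| 61 | ('*~[#16](~*)~*', 0) | AS(A)A | |
| 62 | ('*@*!@*@*', 0) | A$!A$A | |
| 63 | ('[#7]=[#8]', 0) | N=O | |
| 64 | ('*@*!@[#16]', 0) | A$A!S | |
| 65 | ('c:n', 0) | C%N | |
| 66 | ('[#6]~[#6](~[#6])(~[#6])~*', 0) | CC(C)(C)A | |
| 67 | ('[!#6;!#1]~[#16]', 0) | QS | |
| 68 | ('[!#6;!#1;!H0]~[!#6;!#1;!H0]', 0) | QHQH | |
| 69 | ('[!#6;!#1]~[!#6;!#1;!H0]', 0) | QQH | |
| 70 | ('[!#6;!#1]~[#7]~[!#6;!#1]', 0) | QNQ | |
| 71 | ('[#7]~[#8]', 0) | NO | |
| 72 | ('[#8]~*~*~[#8]', 0) | OAAO | |
| 73 | ('[#16]=*', 0) | S=A | S double-bonded to any atom |
| 74 | ('[CH3]~*~[CH3]', 0) | CH3ACH3 | |
| 75 | ('*!@[#7]@*', 0) | A!N$A | |
| 76 | ('[#6]=[#6](~*)~*', 0) | C=C(A)A | |
| 77 | ('[#7]~*~[#7]', 0) | NAN | |
| 78 | ('[#6]=[#7]', 0) | C=N | |
| 79 | ('[#7]~*~*~[#7]', 0) | NAAN | |
| 80 | ('[#7]~*~*~*~[#7]', 0) | NAAAN | |
| 81 | ('[#16]~*(~*)~*', 0) | SA(A)A | |
| 82 | ('*~[CH2]~[!#6;!#1;!H0]', 0) | ACH2QH | |
| 83 | ('[!#6;!#1]1~*~*~*~*~1', 0) | QAAAA@1 | |
| 84 | ('[NH2]', 0) | NH2 | Amino group |
| 85 | ('[#6]~[#7](~[#6])~[#6]', 0) | CN(C)C | |
| 86 | ('[C;H2,H3][!#6;!#1][C;H2,H3]', 0) | CH2QCH2 | |
| 87 | ('[F,Cl,Br,I]!@*@*', 0) | X!A$A | X denotes a halogen |
| 88 | ('[#16]', 0) | S | Sulfur |
| 89 | ('[#8]~*~*~*~[#8]', 0) | OAAAO | |
| 90 | ('[$([!#6;!#1;!H0]~*~*~[CH2]~*),$([!#6;!#1;!H0;R]1@[R]@[R]@[CH2;R]1),$([!#6;!#1;!H0]~[R]1@[R]@[CH2;R]1)]', 0) | QHAACH2A | |
| 91 | ('[$([!#6;!#1;!H0]~*~*~*~[CH2]~*),$([!#6;!#1;!H0;R]1@[R]@[R]@[R]@[CH2;R]1),$([!#6;!#1;!H0]~[R]1@[R]@[R]@[CH2;R]1),$([!#6;!#1;!H0]~*~[R]1@[R]@[CH2;R]1)]', 0) | QHAAACH2A | |
| 92 | ('[#8]~[#6](~[#7])~[#6]', 0) | OC(N)C | |
| 93 | ('[!#6;!#1]~[CH3]', 0) | QCH3 | |
| 94 | ('[!#6;!#1]~[#7]', 0) | QN | |
| 95 | ('[#7]~*~*~[#8]', 0) | NAAO | |
| 96 | ('*1~*~*~*~*~1', 0) | 5M Ring | Five-membered ring, any atoms, any bonds |
| 97 | ('[#7]~*~*~*~[#8]', 0) | NAAAO | |
| 98 | ('[!#6;!#1]1~*~*~*~*~*~1', 0) | QAAAAA@1 | Six-membered ring containing a heteroatom |
| 99 | ('[#6]=[#6]', 0) | C=C | |
| 100 | ('*~[CH2]~[#7]', 0) | ACH2N | |
| 101 | ('[$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]1),$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]1),$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]1),$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]1),$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]1),$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]1),$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]1)]', 0) | 8M Ring or larger. This only handles up to ring sizes of 14 | Ring of eight or more members, up to 14 |
| 102 | ('[!#6;!#1]~[#8]', 0) | QO | |
| 103 | ('Cl', 0) | CL | Chlorine |
| 104 | ('[!#6;!#1;!H0]~*~[CH2]~*', 0) | QHACH2A | |
| 105 | ('*@*(@*)@*', 0) | A$A($A)$A | |
| 106 | ('[!#6;!#1]~*(~[!#6;!#1])~[!#6;!#1]', 0) | QA(Q)Q | |
| 107 | ('[F,Cl,Br,I]~*(~*)~*', 0) | XA(A)A | |
| 108 | ('[CH3]~*~*~*~[CH2]~*', 0) | CH3AAACH2A | |
| 109 | ('*~[CH2]~[#8]', 0) | ACH2O | |
| 110 | ('[#7]~[#6]~[#8]', 0) | NCO | |
| 111 | ('[#7]~*~[CH2]~*', 0) | NACH2A | |
| 112 | ('*~*(~*)(~*)~*', 0) | AA(A)(A)A | |
| 113 | ('[#8]!:*:*', 0) | Onot%A%A | |
| 114 | ('[CH3]~[CH2]~*', 0) | CH3CH2A | |
| 115 | ('[CH3]~*~[CH2]~*', 0) | CH3ACH2A | |
| 116 | ('[$([CH3]~*~*~[CH2]~*),$([CH3]~*1~*~[CH2]1)]', 0) | CH3AACH2A | |
| 117 | ('[#7]~*~[#8]', 0) | NAO | |
| 118 | ('[$(*~[CH2]~[CH2]~*),$(*1~[CH2]~[CH2]1)]', 1) | ACH2CH2A > 1 | |
| 119 | ('[#7]=*', 0) | N=A | |
| 120 | ('[!#6;R]', 1) | Heterocyclic atom > 1 | More than one heterocyclic atom |
| 121 | ('[#7;R]', 0) | N Heterocycle | Nitrogen in a ring |
| 122 | ('*~[#7](~*)~*', 0) | AN(A)A | |
| 123 | ('[#8]~[#6]~[#8]', 0) | OCO | |
| 124 | ('[!#6;!#1]~[!#6;!#1]', 0) | QQ | |
| 125 | ('?', 0) | Aromatic Ring > 1 | More than one aromatic ring |
| 126 | ('*!@[#8]!@*', 0) | A!O!A | |
| 127 | ('*@*!@[#8]', 1) | A$A!O > 1 | |
| 128 | ('[$(*~[CH2]~*~*~*~[CH2]~*),$([R]1@[CH2;R]@[R]@[R]@[R]@[CH2;R]1),$(*~[CH2]~[R]1@[R]@[R]@[CH2;R]1),$(*~[CH2]~*~[R]1@[R]@[CH2;R]1)]', 0) | ACH2AAACH2A | |
| 129 | ('[$(*~[CH2]~*~*~[CH2]~*),$([R]1@[CH2]@[R]@[R]@[CH2;R]1),$(*~[CH2]~[R]1@[R]@[CH2;R]1)]', 0) | ACH2AACH2A | |
| 130 | ('[!#6;!#1]~[!#6;!#1]', 1) | QQ > 1 | |
| 131 | ('[!#6;!#1;!H0]', 1) | QH > 1 | |
| 132 | ('[#8]~*~[CH2]~*', 0) | OACH2A | |
| 133 | ('*@*!@[#7]', 0) | A$A!N | |
| 134 | ('[F,Cl,Br,I]', 0) | X (HALOGEN) | Halogen |
| 135 | ('[#7]!:*:*', 0) | Nnot%A%A | |
| 136 | ('[#8]=*', 1) | O=A > 1 | |
| 137 | ('[!C;!c;R]', 0) | Heterocycle | Heterocycle present |
| 138 | ('[!#6;!#1]~[CH2]~*', 1) | QCH2A > 1 | |
| 139 | ('[O;!H0]', 0) | OH | Hydroxyl group |
| 140 | ('[#8]', 3) | O > 3 | More than three oxygen atoms |
| 141 | ('[CH3]', 2) | CH3 > 2 | More than two methyl groups |
| 142 | ('[#7]', 1) | N > 1 | More than one nitrogen atom |
| 143 | ('*@*!@[#8]', 0) | A$A!O | |
| 144 | ('*!:*:*!:*', 0) | Anot%A%Anot%A | |
| 145 | ('*1~*~*~*~*~*~1', 1) | 6M Ring > 1 | More than one six-membered ring |
| 146 | ('[#8]', 2) | O > 2 | More than two oxygen atoms |
| 147 | ('[$(*~[CH2]~[CH2]~*),$([R]1@[CH2;R]@[CH2;R]1)]', 0) | ACH2CH2A | |
| 148 | ('*~[!#6;!#1](~*)~*', 0) | AQ(A)A | |
| 149 | ('[C;H3,H4]', 1) | CH3 > 1 | More than one methyl group |
| 150 | ('*!@*@*!@*', 0) | A!A$A!A | |
| 151 | ('[#7;!H0]', 0) | NH | Nitrogen bearing at least one hydrogen |
| 152 | ('[#8]~[#6](~[#6])~[#6]', 0) | OC(C)C | |
| 153 | ('[!#6;!#1]~[CH2]~*', 0) | QCH2A | |
| 154 | ('[#6]=[#8]', 0) | C=O | |
| 155 | ('*!@[CH2]!@*', 0) | A!CH2!A | |
| 156 | ('[#7]~*(~*)~*', 0) | NA(A)A | |
| 157 | ('[#6]-[#8]', 0) | C-O | |
| 158 | ('[#6]-[#7]', 0) | C-N | |
| 159 | ('[#8]', 1) | O > 1 | More than one oxygen atom |
| 160 | ('[C;H3,H4]', 0) | CH3 | Methyl group |
| 161 | ('[#7]', 0) | N | Nitrogen |
| 162 | ('a', 0) | Aromatic | Aromatic structure present |
| 163 | ('*1~*~*~*~*~*~1', 0) | 6M Ring | Six-membered ring |
| 164 | ('[#8]', 0) | O | Oxygen |
| 165 | ('[R]', 0) | Ring | Ring present |
| 166 | ('?', 0) | Fragments FIX: this can't be done in SMARTS | Cannot be expressed in SMARTS |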
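The number after each SMARTS string appears to act as a count threshold: a remark such as "O > 3" is paired with ('[#8]', 3), so the bit is set only when the pattern matches more than that many times, and a threshold of 0 means "at least one match". The sketch below illustrates this reading without a SMARTS engine by restricting itself to keys whose pattern is a single atom selected by element, so that matches can be counted directly from a list of element symbols. The names `ELEMENT_KEYS` and `element_keys_on` are hypothetical, and the input format (a plain list of heavy-atom symbols) is an assumption made for this example.

```python
from collections import Counter

# Excerpt of the key table, limited to single-atom, element-only patterns.
# Layout: key number -> (set of matching element symbols, count threshold).
ELEMENT_KEYS = {
    27: ({"I"}, 0),
    29: ({"P"}, 0),
    42: ({"F"}, 0),
    46: ({"Br"}, 0),
    88: ({"S"}, 0),
    103: ({"Cl"}, 0),
    134: ({"F", "Cl", "Br", "I"}, 0),
    140: ({"O"}, 3),
    142: ({"N"}, 1),
    146: ({"O"}, 2),
    159: ({"O"}, 1),
    161: ({"N"}, 0),
    164: ({"O"}, 0),
}


def element_keys_on(heavy_atoms):
    """Return the keys from ELEMENT_KEYS that are set for a list of element symbols."""
    counts = Counter(heavy_atoms)
    on_keys = []
    for key, (elements, threshold) in sorted(ELEMENT_KEYS.items()):
        n_matches = sum(counts[element] for element in elements)
        # The bit is set only when the match count exceeds the threshold.
        if n_matches > threshold:
            on_keys.append(key)
    return on_keys


# Aspirin, C9H8O4: nine carbons and four oxygens as heavy atoms.
aspirin = ["C"] * 9 + ["O"] * 4
print(element_keys_on(aspirin))  # [140, 146, 159, 164]
```

With four oxygen atoms, aspirin sets all four oxygen-count keys (164, 159, 146, 140), whereas a molecule with two oxygens would set only 164 and 159.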

References:

  1. Durant JL, Leland BA, Henry DR, Nourse JG: Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci 2002, 42:1273-1280.
  2. THE KEYS TO UNDERSTANDING MDL KEYSET TECHNOLOGY. https://www.3dsbiovia.com/products/pdf/keys-to-keyset-technology.pdf. Accessed Oct. 2019.
  3. RDKit. https://www.rdkit.org/
  4. O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR: Open Babel: An open chemical toolbox. J Cheminformatics 2011, 3:33.
  5. The Open Babel Package. https://openbabel.org
  6. Chemistry Development Kit (CDK).
  7. Willighagen EL, Mayfield JW, Alvarsson J, Berg A, Carlsson L, Jeliazkova N, Kuhn S, Pluskal T, Rojas-Cherto M, Spjuth O, Torrance G, Evelo CT, Guha R, Steinbeck C: The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J Cheminformatics 2017, 9:33.