1. What are MACCS keys?
The MACCS (Molecular ACCess System) keys [1,2] are among the most commonly used structural keys: fixed-length bit strings in which each bit records whether a predefined fragment is present in a molecule. They are sometimes called the MDL keys, after the company that developed them, MDL Information Systems (now BIOVIA). There are two sets of MACCS keys, one with 960 keys and a subset of 166 keys, but only the definitions of the 166-key subset are publicly available. These 166 public keys are implemented in popular open-source cheminformatics packages, including RDKit [3], Open Babel [4,5] and the CDK [6,7].
2. What is the exact meaning of each bit?
The fragment definitions for the MACCS 166 keys can be found in this document:
rdkit/MACCSkeys.py at master · rdkit/rdkit · GitHub
In RDKit, each key is written as a SMARTS pattern. SMARTS is a language for describing molecular patterns; the official Daylight documentation below explains the meaning of every symbol in a SMARTS string.
Daylight Theory: SMARTS - A Language for Describing Molecular Patterns
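A minimal sketch, assuming RDKit is installed, of reading these definitions and computing the fingerprint from Python. `MACCSkeys.smartsPatts` holds the (SMARTS, count) pairs from the source file above and `MACCSkeys.GenMACCSKeys` computes the fingerprint; ethanol is just an arbitrary example molecule.

```python
from rdkit import Chem
from rdkit.Chem import MACCSkeys

# smartsPatts maps each key number to its (SMARTS pattern, count) pair.
pattern, count = MACCSkeys.smartsPatts[164]
print(pattern, count)  # [#8] 0

# Compute the 166-key fingerprint for ethanol.
mol = Chem.MolFromSmiles("CCO")
fp = MACCSkeys.GenMACCSKeys(mol)

# RDKit allocates 167 bits so that bit i corresponds to key i (bit 0 is unused).
print(fp.GetNumBits())       # 167
print(list(fp.GetOnBits()))  # key numbers whose fragments occur in ethanol
```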
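The definitions of the 166 MACCS keys, as written in RDKit, are listed in the table below. Each SMARTS CODE entry is a (pattern, count) pair: the key is set when the pattern matches more than count times, so a count of 0 simply means the fragment is present. A '?' marks keys that cannot be written as a SMARTS pattern.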
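To read the REMARK column, note the following conventions:
1. Q means any atom other than carbon or hydrogen (a heteroatom, SMARTS [!#6;!#1]).
2. A means any atom.
3. X means a halogen (F, Cl, Br, I).
4. For bonds, - is single, = is double and T is triple; $ marks a ring bond, ! a chain (non-ring) bond and % an aromatic bond, so "not%" means a non-aromatic bond. Where no bond symbol is written, the SMARTS generally allows any bond (~).
5. @1 closes a ring back to the first atom, so QAAA@1 is a 4-membered ring containing Q.
6. Some substructure drawings are provided as examples to explain the SMARTS code more vividly.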
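The atom classes above can be restated in terms of atomic numbers. This is an illustrative sketch only: the helper names `is_q`, `is_x` and `is_other` are hypothetical, and they look at element identity alone, ignoring the aromaticity, charge and hydrogen counts that real SMARTS matching also considers. The exclusion list for `is_other` is read off the SMARTS of key 44 in the table below.

```python
# Atomic numbers excluded by key 44 ("other"), read off its SMARTS:
# H, C, N, O, F, Si, P, S, Cl, Br, I.
KEY44_EXCLUDED = frozenset({1, 6, 7, 8, 9, 14, 15, 16, 17, 35, 53})

# X: the halogens F, Cl, Br, I (SMARTS [F,Cl,Br,I]).
HALOGENS = frozenset({9, 17, 35, 53})


def is_q(z: int) -> bool:
    """Q: any atom that is neither carbon nor hydrogen (SMARTS [!#6;!#1])."""
    return z not in (1, 6)


def is_x(z: int) -> bool:
    """X: a halogen atom."""
    return z in HALOGENS


def is_other(z: int) -> bool:
    """Key 44: an element outside the common organic set."""
    return z not in KEY44_EXCLUDED


# Heavy atoms of a hypothetical molecule, as atomic numbers: C, C, O, Cl, Se.
atoms = [6, 6, 8, 17, 34]
print([z for z in atoms if is_q(z)])    # [8, 17, 34]
print(any(is_x(z) for z in atoms))      # True
print(any(is_other(z) for z in atoms))  # True, because of Se
```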
KEY | SMARTS CODE | REMARK | NOTE |
---|---|---|---|
1 | ('?', 0) | ISOTOPE | Contains an isotope-labelled atom |
2 | ('[#104]', 0) | limit the above def'n since the RDKit only accepts up to #104 | Atomic number > 103; RDKit matches element 104 only |
3 | ('[#32,#33,#34,#50,#51,#52,#82,#83,#84]', 0) | IVa,Va,VIa Rows 4-6 | Groups 14-16, periods 4-6: Ge, As, Se, Sn, Sb, Te, Pb, Bi, Po |
4 | ('[Ac,Th,Pa,U,Np,Pu,Am,Cm,Bk,Cf,Es,Fm,Md,No,Lr]', 0) | Actinide | Actinides |
5 | ('[Sc,Ti,Y,Zr,Hf]', 0) | Group IIIB,IVB | Scandium- and titanium-group elements, periods 4-6 |
6 | ('[La,Ce,Pr,Nd,Pm,Sm,Eu,Gd,Tb,Dy,Ho,Er,Tm,Yb,Lu]', 0) | Lanthanide | Lanthanides |
7 | ('[V,Cr,Mn,Nb,Mo,Tc,Ta,W,Re]', 0) | Group VB,VIB,VIIB | Vanadium-, chromium- and manganese-group elements, periods 4-6 |
8 | ('[!#6;!#1]1~*~*~*~1', 0) | QAAA@1 | 4-membered ring containing a heteroatom, any bonds |
9 | ('[Fe,Co,Ni,Ru,Rh,Pd,Os,Ir,Pt]', 0) | Group VIII | Iron-, cobalt- and nickel-group elements, periods 4-6 |
10 | ('[Be,Mg,Ca,Sr,Ba,Ra]', 0) | Group IIa (Alkaline earth) | Alkaline earth metals |
11 | ('*1~*~*~*~1', 0) | 4M Ring | 4-membered ring, any atoms, any bonds |
12 | ('[Cu,Zn,Ag,Cd,Au,Hg]', 0) | Group IB,IIB | Copper- and zinc-group elements, periods 4-6 |
13 | ('[#8]~[#7](~[#6])~[#6]', 0) | ON(C)C | |
14 | ('[#16]-[#16]', 0) | S-S | S-S single bond |
15 | ('[#8]~[#6](~[#8])~[#8]', 0) | OC(O)O | |
16 | ('[!#6;!#1]1~*~*~1', 0) | QAA@1 | 3-membered ring containing a heteroatom, any bonds |
17 | ('[#6]#[#6]', 0) | CTC | C-C triple bond |
18 | ('[#5,#13,#31,#49,#81]', 0) | Group IIIA | Boron group except period 7: B, Al, Ga, In, Tl |
19 | ('*1~*~*~*~*~*~*~1', 0) | 7M Ring | 7-membered ring, any atoms, any bonds |
20 | ('[#14]', 0) | Si | Silicon |
21 | ('[#6]=[#6](~[!#6;!#1])~[!#6;!#1]', 0) | C=C(Q)Q | C=C with two heteroatoms on the same carbon |
22 | ('*1~*~*~1', 0) | 3M Ring | 3-membered ring, any atoms, any bonds |
23 | ('[#7]~[#6](~[#8])~[#8]', 0) | NC(O)O | |
24 | ('[#7]-[#8]', 0) | N-O | N-O single bond |
25 | ('[#7]~[#6](~[#7])~[#7]', 0) | NC(N)N | |
26 | ('[#6]=;@[#6](@*)@*', 0) | C$=C($A)$A | |
27 | ('[I]', 0) | I | Iodine |
28 | ('[!#6;!#1]~[CH2]~[!#6;!#1]', 0) | QCH2Q | CH2 bonded to two heteroatoms |
29 | ('[#15]', 0) | P | Phosphorus |
30 | ('[#6]~[!#6;!#1](~[#6])(~[#6])~*', 0) | CQ(C)(C)A | |
31 | ('[!#6;!#1]~[F,Cl,Br,I]', 0) | QX | Heteroatom bonded to a halogen |
32 | ('[#6]~[#16]~[#7]', 0) | CSN | C, S and N joined by any bonds |
33 | ('[#7]~[#16]', 0) | NS | N and S joined by any bond |
34 | ('[CH2]=*', 0) | CH2=A | CH2 double-bonded to any atom |
35 | ('[Li,Na,K,Rb,Cs,Fr]', 0) | Group IA (Alkali Metal) | Alkali metals |
36 | ('[#16R]', 0) | S Heterocycle | Sulfur in a ring |
37 | ('[#7]~[#6](~[#8])~[#7]', 0) | NC(O)N | |
38 | ('[#7]~[#6](~[#6])~[#7]', 0) | NC(C)N | |
39 | ('[#8]~[#16](~[#8])~[#8]', 0) | OS(O)O | |
40 | ('[#16]-[#8]', 0) | S-O | S-O single bond |
41 | ('[#6]#[#7]', 0) | CTN | C-N triple bond |
42 | ('F', 0) | F | Fluorine |
43 | ('[!#6;!#1;!H0]~*~[!#6;!#1;!H0]', 0) | QHAQH | |
44 | ('[!#1;!#6;!#7;!#8;!#9;!#14;!#15;!#16;!#17;!#35;!#53]', 0) | other | An element other than H, C, N, O, F, Si, P, S, Cl, Br, I |
45 | ('[#6]=[#6]~[#7]', 0) | C=CN | |
46 | ('Br', 0) | Br | Bromine |
47 | ('[#16]~*~[#7]', 0) | SAN | S and N separated by one atom |
48 | ('[#8]~[!#6;!#1](~[#8])(~[#8])', 0) | OQ(O)O | |
49 | ('[!+0]', 0) | charge | Atom with a nonzero formal charge |
50 | ('[#6]=[#6](~[#6])~[#6]', 0) | C=C(C)C | |
51 | ('[#6]~[#16]~[#8]', 0) | CSO | C, S and O joined by any bonds |
52 | ('[#7]~[#7]', 0) | NN | N and N joined by any bond |
53 | ('[!#6;!#1;!H0]~*~*~*~[!#6;!#1;!H0]', 0) | QHAAAQH | Two H-bearing heteroatoms separated by three atoms |
54 | ('[!#6;!#1;!H0]~*~*~[!#6;!#1;!H0]', 0) | QHAAQH | Two H-bearing heteroatoms separated by two atoms |
55 | ('[#8]~[#16]~[#8]', 0) | OSO | O, S and O joined by any bonds |
56 | ('[#8]~[#7](~[#8])~[#6]', 0) | ON(O)C | |
57 | ('[#8R]', 0) | O Heterocycle | Oxygen in a ring |
58 | ('[!#6;!#1]~[#16]~[!#6;!#1]', 0) | QSQ | S bonded to two heteroatoms |
59 | ('[#16]!:*:*', 0) | Snot%A%A | % denotes an aromatic bond |
60 | ('[#16]=[#8]', 0) | S=O | S=O double bond |
61 | ('*~[#16](~*)~*', 0) | AS(A)A | |
62 | ('*@*!@*@*', 0) | A$A!A$A | |
63 | ('[#7]=[#8]', 0) | N=O | |
64 | ('*@*!@[#16]', 0) | A$A!S | |
65 | ('c:n', 0) | C%N | |
66 | ('[#6]~[#6](~[#6])(~[#6])~*', 0) | CC(C)(C)A | |
67 | ('[!#6;!#1]~[#16]', 0) | QS | |
68 | ('[!#6;!#1;!H0]~[!#6;!#1;!H0]', 0) | QHQH | |
69 | ('[!#6;!#1]~[!#6;!#1;!H0]', 0) | QQH | |
70 | ('[!#6;!#1]~[#7]~[!#6;!#1]', 0) | QNQ | |
71 | ('[#7]~[#8]', 0) | NO | |
72 | ('[#8]~*~*~[#8]', 0) | OAAO | |
73 | ('[#16]=*', 0) | S=A | S double-bonded to any atom |
74 | ('[CH3]~*~[CH3]', 0) | CH3ACH3 | |
75 | ('*!@[#7]@*', 0) | A!N$A | |
76 | ('[#6]=[#6](~*)~*', 0) | C=C(A)A | |
77 | ('[#7]~*~[#7]', 0) | NAN | |
78 | ('[#6]=[#7]', 0) | C=N | |
79 | ('[#7]~*~*~[#7]', 0) | NAAN | |
80 | ('[#7]~*~*~*~[#7]', 0) | NAAAN | |
81 | ('[#16]~*(~*)~*', 0) | SA(A)A | |
82 | ('*~[CH2]~[!#6;!#1;!H0]', 0) | ACH2QH | |
83 | ('[!#6;!#1]1~*~*~*~*~1', 0) | QAAAA@1 | |
84 | ('[NH2]', 0) | NH2 | Amino group |
85 | ('[#6]~[#7](~[#6])~[#6]', 0) | CN(C)C | |
86 | ('[C;H2,H3][!#6;!#1][C;H2,H3]', 0) | CH2QCH2 | |
87 | ('[F,Cl,Br,I]!@*@*', 0) | X!A$A | X denotes a halogen |
88 | ('[#16]', 0) | S | Sulfur atom |
89 | ('[#8]~*~*~*~[#8]', 0) | OAAAO | |
90 | ('[$([!#6;!#1;!H0]~*~*~[CH2]~*),$([!#6;!#1;!H0;R]1@[R]@[R]@[CH2;R]1),$([!#6;!#1;!H0]~[R]1@[R]@[CH2;R]1)]', 0) | QHAACH2A | |
91 | ('[$([!#6;!#1;!H0]~*~*~*~[CH2]~*),$([!#6;!#1;!H0;R]1@[R]@[R]@[R]@[CH2;R]1),$([!#6;!#1;!H0]~[R]1@[R]@[R]@[CH2;R]1),$([!#6;!#1;!H0]~*~[R]1@[R]@[CH2;R]1)]', 0) | QHAAACH2A | |
92 | ('[#8]~[#6](~[#7])~[#6]', 0) | OC(N)C | |
93 | ('[!#6;!#1]~[CH3]', 0) | QCH3 | |
94 | ('[!#6;!#1]~[#7]', 0) | QN | |
95 | ('[#7]~*~*~[#8]', 0) | NAAO | |
96 | ('*1~*~*~*~*~1', 0) | 5M Ring | 5-membered ring, any atoms, any bonds |
97 | ('[#7]~*~*~*~[#8]', 0) | NAAAO | |
98 | ('[!#6;!#1]1~*~*~*~*~*~1', 0) | QAAAAA@1 | 6-membered ring containing a heteroatom |
99 | ('[#6]=[#6]', 0) | C=C | |
100 | ('*~[CH2]~[#7]', 0) | ACH2N | |
101 | ('[$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]1),$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]1),$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]1),$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]1),$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]1),$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]1),$([R]@1@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]@[R]1)]', 0) | 8M Ring or larger. This only handles up to ring sizes of 14 | Ring of 8 or more atoms, up to 14 |
102 | ('[!#6;!#1]~[#8]', 0) | QO | |
103 | ('Cl', 0) | Cl | Chlorine atom |
104 | ('[!#6;!#1;!H0]~*~[CH2]~*', 0) | QHACH2A | |
105 | ('*@*(@*)@*', 0) | A$A($A)$A | |
106 | ('[!#6;!#1]~*(~[!#6;!#1])~[!#6;!#1]', 0) | QA(Q)Q | |
107 | ('[F,Cl,Br,I]~*(~*)~*', 0) | XA(A)A | |
108 | ('[CH3]~*~*~*~[CH2]~*', 0) | CH3AAACH2A | |
109 | ('*~[CH2]~[#8]', 0) | ACH2O | |
110 | ('[#7]~[#6]~[#8]', 0) | NCO | |
111 | ('[#7]~*~[CH2]~*', 0) | NACH2A | |
112 | ('*~*(~*)(~*)~*', 0) | AA(A)(A)A | |
113 | ('[#8]!:*:*', 0) | Onot%A%A | |
114 | ('[CH3]~[CH2]~*', 0) | CH3CH2A | |
115 | ('[CH3]~*~[CH2]~*', 0) | CH3ACH2A | |
116 | ('[$([CH3]~*~*~[CH2]~*),$([CH3]~*1~*~[CH2]1)]', 0) | CH3AACH2A | |
117 | ('[#7]~*~[#8]', 0) | NAO | |
118 | ('[$(*~[CH2]~[CH2]~*),$(*1~[CH2]~[CH2]1)]', 1) | ACH2CH2A > 1 | |
119 | ('[#7]=*', 0) | N=A | |
120 | ('[!#6;R]', 1) | Heterocyclic atom > 1 | More than one non-carbon ring atom |
121 | ('[#7;R]', 0) | N Heterocycle | Nitrogen in a ring |
122 | ('*~[#7](~*)~*', 0) | AN(A)A | |
123 | ('[#8]~[#6]~[#8]', 0) | OCO | |
124 | ('[!#6;!#1]~[!#6;!#1]', 0) | QQ | |
125 | ('?', 0) | Aromatic Ring > 1 | More than one aromatic ring |
126 | ('*!@[#8]!@*', 0) | A!O!A | |
127 | ('*@*!@[#8]', 1) | A$A!O > 1 | |
128 | ('[$(*~[CH2]~*~*~*~[CH2]~*),$([R]1@[CH2;R]@[R]@[R]@[R]@[CH2;R]1),$(*~[CH2]~[R]1@[R]@[R]@[CH2;R]1),$(*~[CH2]~*~[R]1@[R]@[CH2;R]1)]', 0) | ACH2AAACH2A | |
129 | ('[$(*~[CH2]~*~*~[CH2]~*),$([R]1@[CH2]@[R]@[R]@[CH2;R]1),$(*~[CH2]~[R]1@[R]@[CH2;R]1)]', 0) | ACH2AACH2A | |
130 | ('[!#6;!#1]~[!#6;!#1]', 1) | QQ > 1 | |
131 | ('[!#6;!#1;!H0]', 1) | QH > 1 | |
132 | ('[#8]~*~[CH2]~*', 0) | OACH2A | |
133 | ('*@*!@[#7]', 0) | A$A!N | |
134 | ('[F,Cl,Br,I]', 0) | X (HALOGEN) | Halogen |
135 | ('[#7]!:*:*', 0) | Nnot%A%A | |
136 | ('[#8]=*', 1) | O=A > 1 | |
137 | ('[!C;!c;R]', 0) | Heterocycle | Contains a heterocycle (a non-carbon ring atom) |
138 | ('[!#6;!#1]~[CH2]~*', 1) | QCH2A > 1 | |
139 | ('[O;!H0]', 0) | OH | Oxygen bearing at least one hydrogen |
140 | ('[#8]', 3) | O > 3 | More than 3 oxygen atoms |
141 | ('[CH3]', 2) | CH3 > 2 | More than 2 methyl groups |
142 | ('[#7]', 1) | N > 1 | More than 1 nitrogen atom |
143 | ('*@*!@[#8]', 0) | A$A!O | |
144 | ('*!:*:*!:*', 0) | Anot%A%Anot%A | |
145 | ('*1~*~*~*~*~*~1', 1) | 6M Ring > 1 | More than one 6-membered ring |
146 | ('[#8]', 2) | O > 2 | More than 2 oxygen atoms |
147 | ('[$(*~[CH2]~[CH2]~*),$([R]1@[CH2;R]@[CH2;R]1)]', 0) | ACH2CH2A | |
148 | ('*~[!#6;!#1](~*)~*', 0) | AQ(A)A | |
149 | ('[C;H3,H4]', 1) | CH3 > 1 | More than 1 methyl group |
150 | ('*!@*@*!@*', 0) | A!A$A!A | |
151 | ('[#7;!H0]', 0) | NH | Nitrogen bearing at least one hydrogen |
152 | ('[#8]~[#6](~[#6])~[#6]', 0) | OC(C)C | |
153 | ('[!#6;!#1]~[CH2]~*', 0) | QCH2A | |
154 | ('[#6]=[#8]', 0) | C=O | |
155 | ('*!@[CH2]!@*', 0) | A!CH2!A | |
156 | ('[#7]~*(~*)~*', 0) | NA(A)A | |
157 | ('[#6]-[#8]', 0) | C-O | |
158 | ('[#6]-[#7]', 0) | C-N | |
159 | ('[#8]', 1) | O > 1 | More than 1 oxygen atom |
160 | ('[C;H3,H4]', 0) | CH3 | Methyl group |
161 | ('[#7]', 0) | N | Nitrogen atom |
162 | ('a', 0) | Aromatic | Aromatic atom |
163 | ('*1~*~*~*~*~*~1', 0) | 6M Ring | 6-membered ring |
164 | ('[#8]', 0) | O | Oxygen atom |
165 | ('[R]', 0) | Ring | Contains a ring |
166 | ('?', 0) | Fragments FIX: this can't be done in SMARTS | More than one disconnected fragment; cannot be expressed in SMARTS |
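The (pattern, count) rule used throughout the table can be illustrated without a chemistry toolkit. In this minimal sketch the match counts are supplied by hand as hypothetical input (in practice they would come from a SMARTS substructure matcher such as RDKit), `key_is_set` and `build_bits` are hypothetical helper names, and only the four oxygen-count keys from the table are included. Keys marked '?' (1, 125 and 166) have no pattern and would need separate code.

```python
from typing import Dict, List, Tuple

# Four oxygen-count keys copied from the table: key -> (SMARTS pattern, count).
KEYS: Dict[int, Tuple[str, int]] = {
    140: ("[#8]", 3),  # O > 3
    146: ("[#8]", 2),  # O > 2
    159: ("[#8]", 1),  # O > 1
    164: ("[#8]", 0),  # O
}


def key_is_set(n_matches: int, count: int) -> bool:
    """A key is on when its pattern matches more than `count` times."""
    return n_matches > count


def build_bits(match_counts: Dict[str, int], n_keys: int = 166) -> List[int]:
    """Build n_keys + 1 bits so that bits[i] is key i (index 0 is unused).

    match_counts maps a SMARTS pattern to the number of times it matched.
    """
    bits = [0] * (n_keys + 1)
    for key, (pattern, count) in KEYS.items():
        if key_is_set(match_counts.get(pattern, 0), count):
            bits[key] = 1
    return bits


# Ethylene glycol (OCCO) has two oxygen atoms, so "[#8]" matches twice.
bits = build_bits({"[#8]": 2})
print(len(bits))                     # 167
print([k for k in KEYS if bits[k]])  # [159, 164]
```

With two oxygen atoms, keys 164 (O) and 159 (O > 1) are set, while 146 (O > 2) and 140 (O > 3) stay off, which is why a single pattern such as [#8] can appear several times in the table with different counts.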
References:
[1] Durant JL, Leland BA, Henry DR, Nourse JG: Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci 2002, 42:1273-1280.
[2] The keys to understanding MDL keyset technology. https://www.3dsbiovia.com/products/pdf/keys-to-keyset-technology.pdf. Accessed Oct. 2019.
[3] RDKit. https://www.rdkit.org/.
[4] O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR: Open Babel: An open chemical toolbox. J Cheminformatics 2011, 3:33.
[5] The Open Babel Package. https://openbabel.org.
[6] Chemistry Development Kit (CDK).
[7] Willighagen EL, Mayfield JW, Alvarsson J, Berg A, Carlsson L, Jeliazkova N, Kuhn S, Pluskal T, Rojas-Cherto M, Spjuth O, Torrance G, Evelo CT, Guha R, Steinbeck C: The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J Cheminformatics 2017, 9:33.