Dataset Details
a All MoleculeNet datasets are split into training, validation and test subsets following an 80/10/10 ratio. The recommended splitting method depends on each dataset's contents. For details of the splitting methods, please refer to the paper.
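As a rough illustration of the 80/10/10 ratio, the sketch below performs a plain random split in standard-library Python. It is not MoleculeNet's own splitter: random splitting is only one possible method, the function name `random_split` and the placeholder molecule names are hypothetical, and the method recommended for a given dataset should be taken from the paper.

```python
import random

def random_split(items, frac_train=0.8, frac_valid=0.1, seed=0):
    """Shuffle items and cut them into train/valid/test lists.

    Hypothetical helper for illustration only; whatever remains after
    the train and valid cuts goes to the test subset.
    """
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(frac_train * len(items))
    n_valid = int(frac_valid * len(items))
    train = [items[i] for i in indices[:n_train]]
    valid = [items[i] for i in indices[n_train:n_train + n_valid]]
    test = [items[i] for i in indices[n_train + n_valid:]]
    return train, valid, test

# Placeholder identifiers standing in for real SMILES strings.
molecules = [f"mol_{i}" for i in range(100)]
train, valid, test = random_split(molecules)
```

Each molecule lands in exactly one subset, so the three lists together cover the input without overlap.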
b Different classification and regression metrics are recommended based on previous work and each dataset's contents:
ROC-AUC: Area Under the Receiver Operating Characteristic Curve
PRC-AUC: Area Under the Precision-Recall Curve
RMSE: Root-Mean-Square Error
MAE: Mean Absolute Error
For details of the metrics, please refer to the paper.
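To make the metric definitions concrete, here is a minimal sketch of RMSE, MAE and ROC-AUC in standard-library Python. The function names are illustrative rather than taken from any particular library, and ROC-AUC is computed through its pairwise-ranking interpretation (the probability that a randomly chosen positive is scored above a randomly chosen negative), which is quadratic in the number of samples and therefore only suited to small inputs. PRC-AUC is left out because its value depends on the interpolation convention chosen.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy values chosen for illustration, not taken from any dataset.
toy_rmse = rmse([0.0, 0.0], [3.0, 4.0])
toy_mae = mae([0.0, 0.0], [3.0, 4.0])
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

RMSE penalizes large errors more heavily than MAE, which is why the two values differ on the toy regression input.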
Sample rows from the FreeSolv dataset
iupac | smiles | expt | calc |
4-methoxy-N,N-dimethyl-benzamide | CN(C)C(=O)c1ccc(cc1)OC | -11.01 | -9.625 |
methanesulfonyl chloride | CS(=O)(=O)Cl | -4.87 | -6.219 |
3-methylbut-1-ene | CC(C)C=C | 1.83 | 2.452 |
2-ethylpyrazine | CCc1cnccn1 | -5.45 | -5.809 |
heptan-1-ol | CCCCCCCO | -4.21 | -2.917 |
3,5-dimethylphenol | Cc1cc(cc(c1)O)C | -6.27 | -5.444 |
2,3-dimethylbutane | CC(C)C(C)C | 2.34 | 2.468 |
2-methylpentan-2-ol | CCCC(C)(C)O | -3.92 | -2.779 |
1,2-dimethylcyclohexane | C[C@@H]1CCCC[C@@H]1C | 1.58 | 1.685 |
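The sample rows can be used to see how the regression metrics behave on real values. The sketch below parses the pipe-delimited rows exactly as shown above and measures how far the `calc` column is from the `expt` column. It assumes both columns hold the same quantity in the same units (hydration free energies in kcal/mol, as in the FreeSolv database), and the helper `parse_rows` is hypothetical. Nine rows are far too few to say anything about the full dataset.

```python
import math

# Sample rows copied verbatim from the table above (pipe-delimited).
SAMPLE = """\
iupac | smiles | expt | calc |
4-methoxy-N,N-dimethyl-benzamide | CN(C)C(=O)c1ccc(cc1)OC | -11.01 | -9.625 |
methanesulfonyl chloride | CS(=O)(=O)Cl | -4.87 | -6.219 |
3-methylbut-1-ene | CC(C)C=C | 1.83 | 2.452 |
2-ethylpyrazine | CCc1cnccn1 | -5.45 | -5.809 |
heptan-1-ol | CCCCCCCO | -4.21 | -2.917 |
3,5-dimethylphenol | Cc1cc(cc(c1)O)C | -6.27 | -5.444 |
2,3-dimethylbutane | CC(C)C(C)C | 2.34 | 2.468 |
2-methylpentan-2-ol | CCCC(C)(C)O | -3.92 | -2.779 |
1,2-dimethylcyclohexane | C[C@@H]1CCCC[C@@H]1C | 1.58 | 1.685 |
"""

def parse_rows(text):
    """Turn pipe-delimited text into a list of dicts keyed by the header.

    Hypothetical helper; it assumes no cell contains a '|' character,
    which holds for the rows above.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [c.strip() for c in lines[0].split("|") if c.strip()]
    rows = []
    for ln in lines[1:]:
        cells = [c.strip() for c in ln.split("|")]
        rows.append(dict(zip(header, cells)))  # trailing empty cell is dropped
    return rows

rows = parse_rows(SAMPLE)
errors = [float(r["calc"]) - float(r["expt"]) for r in rows]
sample_mae = sum(abs(e) for e in errors) / len(errors)
sample_rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
print(f"MAE = {sample_mae:.3f}, RMSE = {sample_rmse:.3f}")
```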
Sample rows from the HIV dataset
smiles | activity | HIV_active |
CCC1=[O+][Cu-3]2([O+]=C(CC)C1)[O+]=C(CC)CC(CC)=[O+]2 | CI | 0 |
C(=Cc1ccccc1)C1=[O+][Cu-3]2([O+]=C(C=Cc3ccccc3)CC(c3ccccc3)=[O+]2)[O+]=C(c2ccccc2)C1 | CI | 0 |
CC(=O)N1c2ccccc2Sc2c1ccc1ccccc21 | CI | 0 |
Nc1ccc(C=Cc2ccc(N)cc2S(=O)(=O)O)c(S(=O)(=O)O)c1 | CI | 0 |
O=S(=O)(O)CCS(=O)(=O)O | CI | 0 |
Sample rows from the BBBP dataset
num | name | p_np | smiles |
1 | Propanolol | 1 | [Cl].CC(C)NCC(O)COc1cccc2ccccc12 |
2 | Terbutylchlorambucil | 1 | C(=O)(OC(C)(C)C)CCCc1ccc(cc1)N(CCCl)CCCl |
3 | 40730 | 1 | c12c3c(N4CCN(C)CC4)c(F)cc1c(c(C(O)=O)cn2C(C)CO3)=O |
4 | 24 | 1 | C1CCN(CC1)Cc1cccc(c1)OCCCNC(=O)C |
5 | cloxacillin | 1 | Cc1onc(c2ccccc2Cl)c1C(=O)N[C@H]3[C@H]4SC(C)(C)[C@@H](N4C3=O)C(O)=O |
6 | cefoperazone | 1 | CCN1CCN(C(=O)N[C@@H](C(=O)N[C@H]2[C@H]3SCC(=C(N3C2=O)C(O)=O)CSc4nnnn4C)c5ccc(O)cc5)C(=O)C1=O |
7 | rolitetracycline | 1 | CN(C)[C@H]1[C@@H]2C[C@H]3C(=C(O)c4c(O)cccc4[C@@]3(C)O)C(=O)[C@]2(O)C(=O)\C(=C(/O)NCN5CCCC5)C1=O |
8 | ondansetron | 1 | Cn1c2CCC(Cn3ccnc3C)C(=O)c2c4ccccc14 |
9 | diltiazem | 1 | COc1ccc(cc1)[C@@H]2Sc3ccccc3N(CCN(C)C)C(=O)[C@@H]2OC(C)=O |
Sample rows from the QM8 dataset
smiles | E1-CC2 | E2-CC2 | f1-CC2 | f2-CC2 | E1-PBE0 | E2-PBE0 | f1-PBE0 | f2-PBE0 | E1-PBE0 | E2-PBE0 | f1-PBE0 | f2-PBE0 | E1-CAM | E2-CAM | f1-CAM | f2-CAM |
[H]C([H])([H])[H] | 0.432952 | 0.43296 | 0.249728 | 0.249736 | 0.430218 | 0.430236 | 0.181436 | 0.181502 | 0.430218 | 0.430236 | 0.181436 | 0.181502 | 0.409931 | 0.409939 | 0.1832 | 0.1832 |
[H]N([H])[H] | 0.26522 | 0.350081 | 0.067015 | 0.030049 | 0.268386 | 0.349106 | 0.040761 | 0.031641 | 0.268386 | 0.349106 | 0.040761 | 0.031641 | 0.253853 | 0.334481 | 0.0575 | 0.0238 |
[H]O[H] | 0.286537 | 0.363579 | 0.037755 | 0 | 0.291377 | 0.362091 | 0.019503 | 1E-08 | 0.291377 | 0.362091 | 0.019503 | 1E-08 | 0.278519 | 0.350074 | 0.0333 | 0 |
[H]C#C[H] | 0.358629 | 0.358629 | 0 | 0 | 0.256321 | 0.268469 | 0 | 0 | 0.256321 | 0.268469 | 0 | 0 | 0.244879 | 0.255051 | 0 | 0 |
[H]C#N | 0.319958 | 0.336074 | 0 | 0 | 0.295139 | 0.311657 | 0 | 0 | 0.295139 | 0.311657 | 0 | 0 | 0.283426 | 0.296993 | 0 | 0 |
[H]C([H])=O | 0.153914 | 0.291234 | 0 | 0.091023 | 0.148553 | 0.312962 | 0 | 0.157916 | 0.148553 | 0.312962 | 0 | 0.157916 | 0.146839 | 0.304442 | 0 | 0.0954 |
[H]C([H])([H])C([H])([H])[H] | 0.376138 | 0.376146 | 0 | 0 | 0.372867 | 0.372891 | 0 | 0 | 0.372867 | 0.372891 | 0 | 0 | 0.354965 | 0.354976 | 0 | 0 |
[H]OC([H])([H])[H] | 0.266691 | 0.333191 | 0.000944 | 0.071608 | 0.277884 | 0.331415 | 0.001311 | 0.056824 | 0.277884 | 0.331415 | 0.001311 | 0.056824 | 0.261225 | 0.325294 | 0.0003 | 0.0653 |
[H]C#CC([H])([H])[H] | 0.273389 | 0.28575 | 0 | 0.001194 | 0.251415 | 0.26275 | 0 | 0.001653 | 0.251415 | 0.26275 | 0 | 0.001653 | 0.243832 | 0.253357 | 0 | 0.0009 |