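'''
by wufeil
DeepChem's built-in datasets and how to use them
Tutorial 3: An Introduction To MoleculeNet
DeepChem ships with a large collection of built-in datasets, mostly molecular data, hence the collection's name: MoleculeNet.
The collection is still being updated; at the time of writing it contains 46 datasets.
The datasets live in the dc.molnet module and are loaded with functions named load_ + dataset name,
for example:
'''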
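'''
Because every loader follows the load_ + dataset name convention, the available
loaders can be discovered by prefix. A minimal sketch, not part of the original
tutorial: list_loaders is a hypothetical helper, and fake_molnet is a stand-in
namespace so that the example runs without DeepChem installed. With DeepChem
available, list_loaders(dc.molnet) should work the same way.
'''
from types import SimpleNamespace


def list_loaders(module, prefix='load_'):
    """Return the sorted names of callables in `module` that start with `prefix`."""
    return sorted(
        name for name in dir(module)
        if name.startswith(prefix) and callable(getattr(module, name))
    )


# Stand-in for dc.molnet (an assumption, used only for illustration).
fake_molnet = SimpleNamespace(
    load_delaney=lambda: None,
    load_tox21=lambda: None,
    not_a_loader=lambda: None,   # wrong prefix, so it is filtered out
    load_version='1.0',          # not callable, so it is filtered out
)
print(list_loaders(fake_molnet))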
import deepchem as dc
tasks, datasets, transformer = dc.molnet.load_delaney(featurizer='GraphConv', splitter='random')
trainset, validset, testset = datasets
# A Dataset object is not directly iterable; call itersamples() to walk through its samples.
for X, y, w, ids in testset.itersamples():
    print(X, y, w, ids)
'''
X   - features
y   - labels
w   - sample weights
ids - sample IDs
Notes:
datasets is already split into training, validation, and test sets.
testset cannot be iterated over directly; use testset.itersamples() instead.
For details, see: https://blog.csdn.net/wufeil7/article/details/110631024?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522160732181019721940220137%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&request_id=160732181019721940220137&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_v2~rank_v29-3-110631024.pc_v2_rank_blog_default&utm_term=deepchem&spm=1018.2118.3001.4450
'''
# Datasets built into DeepChem: