I know this question is old, but in case anyone wants to do something similar, here is an extension of ahmedhosny's answer:
The new TensorFlow Dataset API can create dataset objects from Python generators, so in addition to scikit-learn's KFold, one option is to build a dataset from the KFold.split() generator:

    import numpy as np
    from sklearn.model_selection import LeaveOneOut, KFold
    import tensorflow as tf
    import tensorflow.contrib.eager as tfe
    tf.enable_eager_execution()  # TF 1.x API; must be called before any other TF op

    from sklearn.datasets import load_iris
    data = load_iris()
    X = data['data']
    y = data['target']

    def make_dataset(X_data, y_data, n_splits):
        # Each element yielded by gen() is one complete fold:
        # (X_train, y_train, X_test, y_test)
        def gen():
            for train_index, test_index in KFold(n_splits).split(X_data):
                X_train, X_test = X_data[train_index], X_data[test_index]
                y_train, y_test = y_data[train_index], y_data[test_index]
                yield X_train, y_train, X_test, y_test

        # The second argument declares the dtype of each yielded component
        return tf.data.Dataset.from_generator(gen, (tf.float64, tf.float64, tf.float64, tf.float64))

    dataset = make_dataset(X, y, 10)
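
Note that each element of this dataset is a whole fold, not a single sample. To make that concrete, here is a minimal sketch that runs the same KFold.split() loop without TensorFlow and records the fold sizes. The helper name `fold_shapes` is made up for illustration and is not part of any library.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold


def fold_shapes(X_data, n_splits):
    """Return (n_train, n_test) sample counts for each fold (hypothetical helper)."""
    shapes = []
    for train_index, test_index in KFold(n_splits).split(X_data):
        shapes.append((len(train_index), len(test_index)))
    return shapes


data = load_iris()
X = data["data"]           # 150 samples, 4 features
shapes = fold_shapes(X, 10)
print(len(shapes))         # 10 folds
print(shapes[0])           # (135, 15)
```

KFold does not shuffle by default and the iris rows are ordered by class, so in practice something like KFold(n_splits, shuffle=True) is probably what you want here.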
The dataset can then be iterated either in graph-based TensorFlow or with eager execution. With eager execution:

    for X_train, y_train, X_test, y_test in tfe.Iterator(dataset):
        ....
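
The code above targets TensorFlow 1.x. tf.contrib and tf.enable_eager_execution() are gone in TensorFlow 2.x, where eager execution is the default and a dataset can be iterated directly. A rough equivalent, assuming TF 2.4 or later (for the `output_signature` argument), might look like the sketch below. It declares the labels as int64 rather than float64, which is my own choice and not part of the original answer.

```python
import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold

data = load_iris()
X, y = data["data"], data["target"]


def make_dataset(X_data, y_data, n_splits):
    def gen():
        # One element per fold: (X_train, y_train, X_test, y_test)
        for train_index, test_index in KFold(n_splits).split(X_data):
            yield (X_data[train_index], y_data[train_index],
                   X_data[test_index], y_data[test_index])

    n_features = X_data.shape[1]
    # Fold sizes can differ, so the leading dimension is left as None.
    # Assumption: integer class labels, declared as int64.
    signature = (
        tf.TensorSpec(shape=(None, n_features), dtype=tf.float64),
        tf.TensorSpec(shape=(None,), dtype=tf.int64),
        tf.TensorSpec(shape=(None, n_features), dtype=tf.float64),
        tf.TensorSpec(shape=(None,), dtype=tf.int64),
    )
    return tf.data.Dataset.from_generator(gen, output_signature=signature)


dataset = make_dataset(X, y, 10)

# In TF 2.x the dataset is directly iterable; no tfe.Iterator is needed.
fold_sizes = [(int(X_train.shape[0]), int(X_test.shape[0]))
              for X_train, y_train, X_test, y_test in dataset]
print(fold_sizes[0])
```

Inside the loop, each of the four values is an eager tensor, so calling .numpy() on it gives back a NumPy array if a model expects one.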