If you want to split the dataset into two parts, you can use numpy.random.shuffle, or numpy.random.permutation if you need to keep track of the indices:
import numpy
# x is your dataset
x = numpy.random.rand(100, 5)
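# shuffle the rows in place (only the first axis is shuffled)
numpy.random.shuffle(x)
# first 80 rows for training, remaining 20 for testing
training, test = x[:80,:], x[80:,:]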
or
import numpy
# x is your dataset
x = numpy.random.rand(100, 5)
indices = numpy.random.permutation(x.shape[0])
training_idx, test_idx = indices[:80], indices[80:]
training, test = x[training_idx,:], x[test_idx,:]
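The permutation approach above can be wrapped in a small reusable function. This is only a sketch: the name `split_rows` and its `test_fraction` and `seed` parameters are hypothetical, and it assumes a NumPy version that provides `numpy.random.default_rng` (1.17 or later).

```python
import numpy


def split_rows(x, test_fraction=0.2, seed=None):
    """Hypothetical helper: randomly split the rows of x into two parts."""
    # A seeded generator makes the split reproducible
    rng = numpy.random.default_rng(seed)
    # Random ordering of the row indices, without repetition
    indices = rng.permutation(x.shape[0])
    n_test = int(round(x.shape[0] * test_fraction))
    test_idx, training_idx = indices[:n_test], indices[n_test:]
    return x[training_idx], x[test_idx], training_idx, test_idx


# x is your dataset
x = numpy.random.rand(100, 5)
training, test, training_idx, test_idx = split_rows(x, test_fraction=0.2, seed=0)
print(training.shape, test.shape)
```

Returning the indices alongside the data lets you apply the same split to a separate label array.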
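Another option, which can be repeated to partition the same dataset several times, is to resample from it with replacement. Note that rows can then appear more than once, and the training and test sets may overlap:
import numpy
# x is your dataset
x = numpy.random.rand(100, 5)
# randint draws indices with replacement
training_idx = numpy.random.randint(x.shape[0], size=80)
test_idx = numpy.random.randint(x.shape[0], size=20)
training, test = x[training_idx,:], x[test_idx,:]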
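Finally, sklearn contains several cross-validation methods (k-fold, leave-n-out, stratified k-fold, ...). For documentation, you may need to look at the examples or the latest git repository, but the code is solid.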
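As an illustration of the k-fold case, here is a minimal sketch. It assumes a scikit-learn version that provides the `sklearn.model_selection` module (0.18 or later), which may be newer than the version this answer was written against. The values `n_splits=5` and `random_state=0` are arbitrary choices for the example.

```python
import numpy
from sklearn.model_selection import KFold

# x is your dataset
x = numpy.random.rand(100, 5)

# 5 folds: each row lands in the test part exactly once
kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_sizes = []
for training_idx, test_idx in kf.split(x):
    training, test = x[training_idx, :], x[test_idx, :]
    fold_sizes.append((training.shape[0], test.shape[0]))

print(fold_sizes)
```

With 100 rows and 5 folds, every iteration yields 80 training rows and 20 test rows, the same proportions as the manual splits above.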