In scikit-learn, you can use the train_test_split utility:
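# train_test_split lives in sklearn.model_selection; the old
# sklearn.cross_validation module was removed in scikit-learn 0.20.
from sklearn.model_selection import train_test_split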
from sklearn import datasets
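# Use Age and Weight to predict which food someone chooses.
# `table` is assumed to be a pandas DataFrame with these columns;
# selecting several columns needs a list, hence the double brackets.
X_train, X_test, y_train, y_test = train_test_split(table[['Age', 'Weight']],
                                                    table['Food Choice'],
                                                    test_size=0.25)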
# Another example using the sklearn pre-loaded datasets:
iris = datasets.load_iris()
X_iris, y_iris = iris.data, iris.target
X, y = X_iris[:, :2], y_iris
X_train, X_test, y_train, y_test = train_test_split(X, y)
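This splits the data into four parts, returned in this order:
> X_train: the inputs for training
> X_test: the inputs for evaluation
> y_train: the outputs (targets) for training
> y_test: the outputs (targets) for evaluation
You can also pass the keyword argument test_size=0.25 to change the fraction of the data held out for testing; the rest is used for training. (0.25 is also the default when neither test_size nor train_size is given.)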
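As a quick check of the return order and the effect of test_size, here is a minimal sketch using the iris data from above. The random_state=0 argument is my own addition so that the split is repeatable; it is not part of the original snippet.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target  # first two features only, 150 samples

# test_size=0.25 holds out a quarter of the rows for evaluation.
# random_state is an assumption added here only to make the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Inputs and outputs are split with the same row selection,
# so each pair has matching lengths.
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```

With 150 samples the test count is rounded up, so the test set should get 38 rows and the training set the remaining 112.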
To split a single dataset, you can use a call like this, which holds out 40% of the data for testing:
>>> import numpy as np
>>> data = np.arange(700).reshape((100, 7))
>>> training, testing = train_test_split(data, test_size=0.4)
>>> print(len(data))
100
>>> print(len(training))
60
>>> print(len(testing))
40
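The same call accepts several arrays at once and splits them with the same row selection, which is what keeps inputs and outputs aligned in the earlier examples. A small sketch of that, assuming a made-up `labels` array (row i gets label i) and a random_state chosen only for repeatability:

```python
import numpy as np
from sklearn.model_selection import train_test_split

data = np.arange(700).reshape((100, 7))
labels = np.arange(100)  # hypothetical labels: row i has label i

training, testing, y_training, y_testing = train_test_split(
    data, labels, test_size=0.4, random_state=42
)

# Row i of `data` starts with the value 7 * i, so the original row index
# can be recovered from the first column and compared with the label.
assert np.array_equal(training[:, 0] // 7, y_training)
assert np.array_equal(testing[:, 0] // 7, y_testing)

print(len(training), len(testing))
```

If the two assertions pass, every row still sits next to its own label after the shuffle.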