The train_test_split() function provided by the scikit-learn library can perform the split.
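"""
Split a dataset into training, validation, and test sets.
The split uses the train_test_split() function from the sklearn library.
The iris dataset serves as the example.
The data is split in an 8:1:1 ratio.
"""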
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
# Load the iris dataset once and unpack the features (X) and labels (y)
X, y = load_iris(return_X_y=True)
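# Split off the training set (X_train, y_train): 80% of the data
X_train, X_res, y_train, y_res = train_test_split(X, y, train_size=0.8)
# Split the remaining 20% evenly into the validation set (X_valid, y_valid)
# and the test set (X_test, y_test)
X_valid, X_test, y_valid, y_test = train_test_split(X_res, y_res, test_size=0.5)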
# Print the shape of each subset
print("X_train.shape =", X_train.shape, "y_train.shape =", y_train.shape)
print("X_valid.shape =", X_valid.shape, "y_valid.shape =", y_valid.shape)
print("X_test.shape =", X_test.shape, "y_test.shape =", y_test.shape)
Output:
X_train.shape = (120, 4) y_train.shape = (120,)
X_valid.shape = (15, 4) y_valid.shape = (15,)
X_test.shape = (15, 4) y_test.shape = (15,)
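The two-step split above is random, so the rows that land in each subset change from run to run, and class proportions are not guaranteed to match across subsets. As a minimal sketch, assuming reproducibility and balanced classes are wanted, the same 8:1:1 split can be wrapped in a helper that passes the random_state and stratify parameters of train_test_split(). The helper name split_train_valid_test and its parameters valid_ratio, test_ratio, and seed are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


def split_train_valid_test(X, y, valid_ratio=0.1, test_ratio=0.1, seed=42):
    """Hypothetical helper: stratified train/validation/test split."""
    held_out = valid_ratio + test_ratio
    # First cut: training set versus everything held out.
    X_train, X_res, y_train, y_res = train_test_split(
        X, y, test_size=held_out, stratify=y, random_state=seed
    )
    # Second cut: divide the held-out part into validation and test sets.
    X_valid, X_test, y_valid, y_test = train_test_split(
        X_res, y_res,
        test_size=test_ratio / held_out,
        stratify=y_res,
        random_state=seed,
    )
    return X_train, X_valid, X_test, y_train, y_valid, y_test


X, y = load_iris(return_X_y=True)
X_train, X_valid, X_test, y_train, y_valid, y_test = split_train_valid_test(X, y)

# Subset sizes, followed by the number of samples per class in each subset
print(X_train.shape, X_valid.shape, X_test.shape)
print(np.bincount(y_train), np.bincount(y_valid), np.bincount(y_test))
```

Fixing the seed makes the split repeatable between runs, and stratifying on the labels keeps the three iris classes in the same proportion in every subset. Whether stratification is appropriate depends on the task; for regression targets it would not apply as written.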
Reference:
https://towardsdatascience.com/how-to-split-data-into-three-sets-train-validation-and-test-and-why-e50d22d3e54c