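Split a dataset into training, validation, and test sets
data: the data to split, as a DataFrame
ratio_train: fraction of the data used for the training set
ratio_test: fraction of the data used for the test set
ratio_val: fraction of the data used for the validation set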
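from sklearn.model_selection import train_test_split

def train_test_val_split(data, ratio_train, ratio_test, ratio_val):
    # First split: carve off the training set; the remainder holds test + validation.
    train, middle = train_test_split(data, train_size=ratio_train, test_size=ratio_test + ratio_val)
    # Second split: rescale the validation share relative to the remainder.
    # Equals ratio_val / (1 - ratio_train) when the three ratios sum to 1.
    ratio = ratio_val / (ratio_test + ratio_val)
    test, validation = train_test_split(middle, test_size=ratio)
    return train, test, validation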
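The second split only sees the rows left over after the training set has been removed, so the validation fraction has to be expressed relative to that remainder rather than to the full dataset. A minimal sketch of the arithmetic, using a hypothetical helper `second_stage_ratio` (an illustrative name, not part of the code above or of scikit-learn):

```python
def second_stage_ratio(ratio_train, ratio_test, ratio_val):
    """Fraction of the held-out remainder that should become the validation set.

    Hypothetical helper for illustration only. ratio_train is accepted just to
    mirror the signature of train_test_val_split; it does not affect the result.
    """
    remainder = ratio_test + ratio_val  # share of the data left after the first split
    if remainder <= 0:
        raise ValueError("ratio_test + ratio_val must be positive")
    return ratio_val / remainder


# For a 6:2:2 split the remainder is 40% of the data and validation is half of it.
print(second_stage_ratio(0.6, 0.2, 0.2))
```

When the three ratios sum to 1, this value is the same as ratio_val / (1 - ratio_train).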
Usage:
Split the data 6:2:2 into training, test, and validation sets
train, test, validation = train_test_val_split(data, 0.6, 0.2, 0.2)
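Because train_test_split shuffles by default, the call above returns a different partition on every run. If a reproducible split is wanted, one option is to pass a fixed random_state to both calls. The sketch below assumes that approach; `split_with_seed` and its `seed` parameter are hypothetical names introduced here, and a small NumPy array stands in for the DataFrame.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_with_seed(data, ratio_train, ratio_test, ratio_val, seed=0):
    """Reproducible three-way split (hypothetical variant of train_test_val_split)."""
    # Assumption: the three ratios are expected to cover the whole dataset.
    if abs(ratio_train + ratio_test + ratio_val - 1.0) > 1e-9:
        raise ValueError("ratio_train + ratio_test + ratio_val must equal 1")
    # First split: training set versus everything else.
    train, middle = train_test_split(data, train_size=ratio_train, random_state=seed)
    # Second split: divide the remainder into test and validation.
    val_share = ratio_val / (ratio_test + ratio_val)
    test, validation = train_test_split(middle, test_size=val_share, random_state=seed)
    return train, test, validation


# Stand-in data: 50 rows, 2 columns (a DataFrame can be passed the same way).
data = np.arange(100).reshape(50, 2)
train, test, validation = split_with_seed(data, 0.6, 0.2, 0.2)
print(len(train), len(test), len(validation))
```

Calling split_with_seed again with the same seed yields the same three subsets; changing the seed gives a different shuffle.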