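Requirement
Suppose you have a dataset that, for a machine learning task, must be split proportionally into a training set and a test set. The split produces four variables (training features, test features, training labels, test labels), so merging them back into one training table and one test table becomes a problem of its own. During my experiments I found three ways that work; I am writing them down here for reference.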
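To make the problem concrete, here is a minimal sketch on toy data. The sample values of `messages` and `y` and the names `train_df` and `test_df` are assumptions made for illustration, and building the tables from a dict is just one possible approach, not necessarily one of the three ways listed in this post.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data standing in for the real `messages` and `y` (hypothetical values).
messages = ["hi", "buy now", "hello", "free prize", "see you", "win cash", "ok", "call me"]
y = [0, 1, 0, 1, 0, 1, 0, 0]

# The split returns four objects: features and labels for each partition.
messages_train, messages_test, y_train, y_test = train_test_split(
    messages, y, test_size=0.25, random_state=1000
)

# Recombine each partition into a single two-column table.
train_df = pd.DataFrame({"label": y_train, "message": messages_train})
test_df = pd.DataFrame({"label": y_test, "message": messages_test})

print(train_df.shape, test_df.shape)
```

Because `train_test_split` shuffles all the arrays it receives with the same permutation, each label stays paired with its own message after the split.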
Code
Way 1:
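import pandas as pd
from sklearn.model_selection import train_test_split
# Split features and labels together: 75% train, 25% test, fixed seed for reproducibility.
# Assumes messages and y are plain lists or arrays, so each DataFrame below gets the
# default 0..n-1 index and the later concat lines rows up positionally.
messages_train, messages_test, y_train, y_test = train_test_split(messages, y, test_size=0.25, random_state=1000)
# Wrap each of the four pieces in a single-column DataFrame
mess_train = pd.DataFrame(messages_train, columns=['message'])
label_train = pd.DataFrame(y_train, columns=['label'])
mess_test = pd.DataFrame(messages_test, columns=['message'])
label_test = pd.DataFrame(y_test, columns=['label'])
# Output paths for the merged training and test CSV files
train_data = '../data/first/split/train_data.csv'
test_data = '../data/first/split/test_data.csv'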
pd.concat([label_train,mess_train], axis=1).to_csv(train_data