Randomly splitting data in machine learning
Importing train_test_split: from sklearn.model_selection import train_test_split
For example:
import numpy as np
from sklearn.model_selection import train_test_split
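# Build 5 samples with 2 features each, plus one label per sample
X, y = np.arange(10).reshape((5, 2)), range(5)
# Equivalent explicit form of the same data:
X = np.array([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]])
y = [0, 1, 2, 3, 4]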
print(X)
print(y)
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.4,random_state=0)
print(X_train)
print(X_test)
print(y_train)
print(y_test)
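# X_train and X_test are the training and test samples; y_train and y_test are the matching training and test labels
train_test_split divides the dataset into four parts: X_train, X_test, y_train and y_test.
Call form: train_test_split(train_data, train_target, test_size=0.4, random_state=0)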
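train_data: the full set of samples (features) to be split
train_target: the labels (classification results) of the samples to be split
test_size: when given as a float between 0 and 1, the fraction of samples placed in the test set; when given as an integer, the absolute number of test samples
random_state=0: the random seed. Fixing random_state ensures that every run of the program produces the same training set and test set.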
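The following is a minimal sketch of how test_size and random_state behave, reusing the 5-sample toy data from the example above. The variable names (X_test_frac, X_test_int, same_split) are illustrative only and do not come from scikit-learn.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10).reshape((5, 2))
y = [0, 1, 2, 3, 4]

# test_size as a fraction: 40% of 5 samples -> 2 test samples
_, X_test_frac, _, _ = train_test_split(X, y, test_size=0.4, random_state=0)
# test_size as an integer: exactly 2 test samples
_, X_test_int, _, _ = train_test_split(X, y, test_size=2, random_state=0)
print(len(X_test_frac), len(X_test_int))

# Calling twice with the same seed should give the same four parts
first = train_test_split(X, y, test_size=0.4, random_state=0)
second = train_test_split(X, y, test_size=0.4, random_state=0)
same_split = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same_split)
```

Here both ways of writing test_size select two test samples, and repeating the call with the same random_state reproduces the split.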
Exercise: what is the difference between random_state=0 and random_state=1?
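One way to explore this question is to run the same split with both seeds and compare the results. This is only a sketch on the toy data used earlier; the splits dictionary is a helper introduced here for illustration. The seed value has no special meaning in itself: each seed simply fixes one particular shuffle, so two different seeds will generally (though not necessarily) give different splits, while the sizes of the parts stay the same.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10).reshape((5, 2))
y = [0, 1, 2, 3, 4]

splits = {}
for seed in (0, 1):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.4, random_state=seed)
    # Keep the labels so the two splits can be compared afterwards
    splits[seed] = (y_train, y_test)
    print(f"random_state={seed}: y_train={y_train}, y_test={y_test}")
```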