import numpy as np
from sklearn.model_selection import train_test_split

X = np.array([[0, 1],
              [2, 3],
              [4, 5],
              [6, 7],
              [8, 9]])
y = [0, 1, 2, 3, 4]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42, shuffle=True)  # repeated runs give the same four subsets
>>> X_train
array([[4, 5],
       [0, 1],
       [6, 7]])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=None, shuffle=True)  # the four subsets change between runs
>>> X_train  # one possible result
array([[0, 1],
       [6, 7],
       [8, 9]])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42, shuffle=False)  # repeated runs give the same four subsets
>>> X_train
array([[0, 1],
       [2, 3],
       [4, 5]])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=None, shuffle=False)  # same result as random_state=42, shuffle=False
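In summary:
When shuffle=True and random_state is an integer, the split yields shuffled subsets, and rerunning the statement with the same random_state value gives the same four subsets every time. When shuffle=True and random_state=None, the split still yields shuffled subsets, but the four subsets generally differ from run to run.
When shuffle=False, random_state has no effect on the result, and the subsets keep the original row order.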
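The behaviour summarized above can be illustrated with a small standard-library sketch. This is not scikit-learn's implementation: `sketch_split`, its `seed` parameter, and the round-up rule for the test count are assumptions made for illustration, and the permutation it produces will not match the one `train_test_split` returns for the same seed.

```python
import math
import random


def sketch_split(rows, test_size=0.33, seed=None, shuffle=True):
    """Toy split over a list of rows; not scikit-learn's implementation."""
    # Assumption: round the test count up, which gives 3 train / 2 test for 5 rows.
    n_test = math.ceil(len(rows) * test_size)
    n_train = len(rows) - n_test
    indices = list(range(len(rows)))
    if shuffle:
        # A fixed seed always yields the same permutation; seed=None
        # draws fresh entropy, so the order can change on every run.
        random.Random(seed).shuffle(indices)
    # With shuffle=False the seed is never consulted, so it cannot matter.
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:]]
    return train, test


rows = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
print(sketch_split(rows, seed=42) == sketch_split(rows, seed=42))
print(sketch_split(rows, seed=42, shuffle=False))
```

The seed only enters through the shuffle step, which is why it is irrelevant once shuffling is turned off.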
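Conclusion: to shuffle the data and still get the same split in every experiment, it is enough to set random_state to a fixed integer. Any fixed non-negative integer works (0 and 42 are just common choices), and the shuffle parameter already defaults to True.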
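One way to confirm these conclusions is to compare repeated calls directly. The sketch below assumes scikit-learn and NumPy are installed; the helper `split` and the variable names are hypothetical and exist only for this check.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10).reshape(5, 2)  # same 5x2 array as in the examples above
y = list(range(5))


def split(**kwargs):
    """Hypothetical helper: split X and y with a fixed test_size."""
    return train_test_split(X, y, test_size=0.33, **kwargs)


def same(a, b):
    """True if all four returned subsets are element-wise equal."""
    return all(np.array_equal(p, q) for p, q in zip(a, b))


# Same integer seed, shuffling on: the two calls should agree.
seeded_a = split(random_state=42)
seeded_b = split(random_state=42)
print("same seed, same split:", same(seeded_a, seeded_b))

# Shuffling off: random_state should make no difference.
ordered_a = split(random_state=42, shuffle=False)
ordered_b = split(random_state=None, shuffle=False)
print("shuffle=False ignores seed:", same(ordered_a, ordered_b))
```

The random_state=None, shuffle=True case is deliberately left out of the comparison: with only five rows, two unseeded runs can occasionally coincide, so a single mismatch check would not be reliable.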