train_test_split
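Prototype
sklearn.model_selection.train_test_split(*arrays, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None)
Note that train_test_split is a function, not a class.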
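Parameters
- *arrays : one or more data sets to split. All of them must have the same number of samples.
- test_size : the size of the test set
  - float: the proportion of the original data set that goes to the test set
  - int: the absolute number of test samples
  - None: test set size = original data set size - training set size. If train_size is also None, test_size defaults to 0.25.
- train_size : the size of the training set
  - float: the proportion of the original data set that goes to the training set
  - int: the absolute number of training samples
  - None: training set size = original data set size - test set size
- random_state : an int, a RandomState instance, or None. It controls the shuffling applied before the split; pass an int to get the same split on every call.
- shuffle : whether to shuffle the data before splitting (default True). If shuffle=False, stratify must be None.
- stratify : an array of labels used for stratified sampling. If it is not None, the split preserves the label proportions of this array in both the training set and the test set.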
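Return value
- A list of length 2 * len(arrays) holding the split of each input data set in order. Each data set is split into two parts: the training set, then the test set.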
Example
from sklearn.model_selection import train_test_split

# 8 samples with 4 features each
X = [
    [1, 2, 3, 4],
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
    [41, 42, 43, 44],
    [51, 52, 53, 54],
    [61, 62, 63, 64],
    [71, 72, 73, 74],
]
# Class labels: four samples of class 1 and four of class 0
y = [1, 1, 0, 0, 1, 1, 0, 0]

# Plain random split with test_size=0.4
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)
print('X_train:%s\nX_test:%s\ny_train:%s\ny_test:%s' % (X_train, X_test, y_train, y_test))

# Stratified split: the class proportions of y are preserved in both parts
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0, stratify=y)
print('\nStratify:\nX_train:%s\nX_test:%s\ny_train:%s\ny_test:%s' % (X_train, X_test, y_train, y_test))
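The size rules for test_size and train_size can be checked with a short sketch. The toy arrays `X` and `y` here are made up for illustration (10 samples). The sketch covers an integer test_size, the fallback when both sizes are None, and shuffle=False.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Integer test_size: exactly 3 samples go to the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=3, random_state=0)
int_sizes = (len(X_train), len(X_test))

# Both sizes left as None: test_size falls back to 0.25.
# The test set size is rounded up, so 10 samples give 3 test samples.
parts = train_test_split(X, y, random_state=0)
default_sizes = (len(parts[0]), len(parts[1]))

# shuffle=False keeps the original order, so the test set is the tail of the data.
_, _, y_head, y_tail = train_test_split(X, y, test_size=0.2, shuffle=False)

print(int_sizes)        # (7, 3)
print(default_sizes)    # (7, 3)
print(len(parts))       # 4, i.e. 2 * number of input arrays
print(y_tail.tolist())  # [8, 9]
```

Passing two arrays returns four pieces, which is why the usual unpacking target is `X_train, X_test, y_train, y_test`.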
KFold
Prototype
class sklearn.model_selection.KFold(n_splits=5, *, shuffle=False, random_state=None)
The default for n_splits was 3 in older releases and changed to 5 in scikit-learn 0.22.
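Parameters
- n_splits : the number of folds, an integer of at least 2
- shuffle : whether to shuffle the data before splitting it into folds (default False)
- random_state : an int, a RandomState instance, or None. It only has an effect when shuffle=True, where it controls the order of the shuffled indices.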
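Methods
- get_n_splits([X,y,groups]) : returns the number of splitting iterations, which equals n_splits
- split(X[,y,groups]) : yields pairs (train_index, test_index) of integer index arrays, one pair per fold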
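The two methods can be sketched on a tiny data set. The 6-sample array `X` is made up for illustration. The point of the sketch is that split() returns indices rather than data, and that without shuffling the test folds are consecutive blocks that together cover every sample exactly once.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy data: 6 samples with 1 feature each.
X = np.arange(6).reshape(6, 1)

kf = KFold(n_splits=3)  # shuffle defaults to False

# get_n_splits reports how many (train, test) pairs split() will yield.
n_iterations = kf.get_n_splits(X)

# split() yields index arrays; collect the test indices of each fold.
test_folds = [test_index.tolist() for _, test_index in kf.split(X)]

print(n_iterations)  # 3
print(test_folds)    # [[0, 1], [2, 3], [4, 5]]
```

To obtain the actual samples, index the data with the returned arrays, for example `X[train_index]` and `X[test_index]`.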
Example
from sklearn.model_selection import KFold
import numpy as np

# 9 samples with 4 features each
X = np.array([
    [1, 2, 3, 4],
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
    [41, 42, 43, 44],
    [51, 52, 53, 54],
    [61, 62, 63, 64],
    [71, 72, 73, 74],
    [81, 82, 83, 84],
])
y = np.array([1, 1, 0, 0, 1, 1, 0, 0, 1])
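# Unshuffled 3-fold split. n_splits is set explicitly because the default is now 5.
# random_state is omitted: recent scikit-learn versions raise a ValueError
# when random_state is set while shuffle=False.
folder = KFold(n_splits=3, shuffle=False)
for train_index, test_index in folder.split(X, y):
    print('Train Index:%s\nTest Index:%s\nX_train:\n%s\nX_test:\n%s\n' %
          (train_index, test_index, X[train_index], X[test_index]))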
# Shuffled 3-fold split; random_state makes the shuffle reproducible.
shuffle_folder = KFold(n_splits=3, random_state=0, shuffle=True)
for train_index, test_index in shuffle_folder.split(X, y):
    print('Shuffled\nTrain Index:%s\nTest Index:%s\nX_train:\n%s\nX_test:\n%s\n' %
          (train_index, test_index, X[train_index], X[test_index]))