Reference (reposted from): http://blog.csdn.net/ztchun/article/details/71169530
K-fold cross-validation in machine learning:
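1. The KFold method (from sklearn.model_selection import KFold)
KFold splits the data into k folds. In each round, one fold is used as the test set and the remaining k-1 folds are used for training, so every sample is used for testing exactly once.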
Code example:
from sklearn.model_selection import KFold
import numpy as np

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])
kf = KFold(n_splits=2)  # 2 folds: each fold serves as the test set once
# split() yields index arrays, not the data itself
for train_index, test_index in kf.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
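To make the "each fold is the test set once" behaviour concrete, here is a minimal sketch that counts how often each sample lands in a test fold. The data (10 samples), n_splits=5, shuffle=True and random_state=0 are illustrative assumptions, not values from the original post.

```python
import numpy as np
from sklearn.model_selection import KFold

# Illustrative data: 10 samples with 2 features each (assumed, not from the post)
X = np.arange(20).reshape(10, 2)

# shuffle=True randomizes which samples go into which fold;
# random_state fixes the shuffle so the run is repeatable
kf = KFold(n_splits=5, shuffle=True, random_state=0)

test_counts = np.zeros(len(X), dtype=int)  # times each sample was in a test fold
fold_sizes = []                            # (train size, test size) per fold
for train_index, test_index in kf.split(X):
    test_counts[test_index] += 1
    fold_sizes.append((len(train_index), len(test_index)))

print(fold_sizes)   # five folds, each with 8 training and 2 test samples
print(test_counts)  # every entry is 1
```

With 10 samples and 5 folds, each round trains on 8 samples and tests on 2, and no sample is tested twice.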
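2. The StratifiedKFold method (from sklearn.model_selection import StratifiedKFold)
StratifiedKFold also splits the data into k folds, but it preserves the class proportions: each class makes up roughly the same percentage of the training set and of the test set in every fold. As long as each class has at least k samples, no class ends up only in the training set or only in the test set.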
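The stratification described above can be sketched as follows. The labels (8 samples of class 0 and 4 of class 1) and n_splits=4 are assumptions chosen for illustration so that the proportions divide evenly.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Illustrative imbalanced labels: class 0 appears 8 times, class 1 appears 4 times
X = np.zeros((12, 2))            # feature values do not affect the split
y = np.array([0] * 8 + [1] * 4)

skf = StratifiedKFold(n_splits=4)

test_class_counts = []
# Unlike KFold, split() needs y so it can balance the classes
for train_index, test_index in skf.split(X, y):
    counts = np.bincount(y[test_index], minlength=2)
    test_class_counts.append(counts.tolist())
    print("TEST:", test_index, "class counts:", counts)
```

Each test fold here holds 2 samples of class 0 and 1 sample of class 1, which matches the 2:1 ratio of the full data set.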
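3. The train_test_split function (from sklearn.model_selection import train_test_split)
Randomly splits the data into a training set and a test set according to a given ratio.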
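from sklearn.model_selection import train_test_split
# test_size=0.33 holds out about a third of the samples; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)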
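As a self-contained sketch of the same call, the example below checks the resulting set sizes and shows that a fixed random_state reproduces the split. The data (12 samples) and test_size=0.25 are assumptions chosen so that the sizes come out as whole numbers.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data: 12 samples with 2 features, labels 0..11 (assumed)
X = np.arange(24).reshape(12, 2)
y = np.arange(12)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
print(X_train.shape, X_test.shape)  # (9, 2) (3, 2)

# Repeating the call with the same random_state gives the same split
_, _, _, y_test_again = train_test_split(
    X, y, test_size=0.25, random_state=42)
print(np.array_equal(y_test, y_test_again))  # True
```

Unlike KFold, this produces a single split, so each sample is either in the training set or in the test set and is never rotated between them.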