Why Use a Training Set - CSDN Blog

Article link: https://blog.csdn.net/niukai1768/article/details/79850069

Why use separate training and test sets

To check for overfitting
To estimate how the model performs on an independent dataset
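
The two points above can be sketched as follows. This is only an illustration, assuming the iris dataset and an unpruned `DecisionTreeClassifier` (neither choice comes from the original post): the model is scored on the data it was trained on and on held-out data, and a large gap between the two scores hints at overfitting.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)

# An unpruned tree is free to memorize the training samples (assumed model choice).
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Accuracy on data the model has seen vs. data it has never seen.
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
print("train accuracy:", train_acc)
print("test accuracy: ", test_acc)
```

The test accuracy is the number to report, since the training accuracy only says how well the model fits data it has already seen.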

How

Splitting the data into a training set and a test set

sklearn version	0.17	0.18
Import	from sklearn import cross_validation	from sklearn.model_selection import train_test_split
Function call	cross_validation.train_test_split	train_test_split
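
If the same script has to run on both sides of this change, one possible approach (a sketch, not something the original post prescribes) is to try the newer module first and fall back to the older one:

```python
# Prefer the newer module; fall back to the old one on older installations.
try:
    from sklearn.model_selection import train_test_split
except ImportError:
    from sklearn.cross_validation import train_test_split

# The function is called the same way in both cases.
train_part, test_part = train_test_split(list(range(10)), test_size=0.3, random_state=0)
print(len(train_part), len(test_part))  # 7 3
```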

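from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()  # example dataset
# Hold out 40% of the samples for testing; random_state makes the shuffle reproducible
features_train, features_test, labels_train, labels_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)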
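
To make the effect of `test_size` and `random_state` concrete, the short check below (an illustrative sketch on the same iris data) prints the sizes of the two parts and confirms that repeating the call with the same `random_state` gives the same split.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# Two calls with identical arguments, including random_state.
first = train_test_split(iris.data, iris.target, test_size=0.4, random_state=0)
second = train_test_split(iris.data, iris.target, test_size=0.4, random_state=0)

features_train, features_test, labels_train, labels_test = first
# 150 samples with test_size=0.4: 90 for training, 60 for testing.
print(features_train.shape, features_test.shape)  # (90, 4) (60, 4)

# Every returned array matches between the two calls.
same_split = all(np.array_equal(a, b) for a, b in zip(first, second))
print("identical splits:", same_split)
```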

K-fold cross-validation

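Split the dataset into k consecutive folds
In each round, k-1 folds form the training set and the remaining fold is the test set, so every fold is used for testing exactly once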
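
The idea can be sketched by hand with NumPy. The helper name `consecutive_folds` is hypothetical and only for illustration; it assumes k is at least 2 and does no shuffling.

```python
import numpy as np

def consecutive_folds(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k consecutive folds (k >= 2)."""
    indices = np.arange(n_samples)
    # Cut the index range into k consecutive chunks of near-equal size.
    folds = np.array_split(indices, k)
    for i in range(k):
        test_idx = folds[i]
        # All other chunks together form the training set for this round.
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_idx, test_idx

splits = list(consecutive_folds(6, 3))
for train_idx, test_idx in splits:
    print("TRAIN:", train_idx, "TEST:", test_idx)
# TRAIN: [2 3 4 5] TEST: [0 1]
# TRAIN: [0 1 4 5] TEST: [2 3]
# TRAIN: [0 1 2 3] TEST: [4 5]
```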

>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
2
>>> print(kf)  
KFold(n_splits=2, random_state=None, shuffle=False)

## kf.split(X) yields two ndarrays per fold: the row indices of the training samples and of the test samples
>>> for train_index, test_index in kf.split(X):
...    print("TRAIN:", train_index, "TEST:", test_index)
...    X_train, X_test = X[train_index], X[test_index]
...    y_train, y_test = y[train_index], y[test_index]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
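
In practice the per-fold loop is often combined with fitting and scoring a model. A minimal sketch using `cross_val_score` is shown below; the iris data, the linear `SVC`, and `shuffle=True` are assumptions made for this example. Shuffling matters here because the iris samples are ordered by class, so consecutive folds would otherwise contain very unbalanced labels.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

iris = load_iris()

# Shuffle before splitting, since the iris rows are sorted by class.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
clf = SVC(kernel="linear", C=1)  # assumed model choice

# One accuracy score per fold: train on 4 folds, test on the remaining one.
scores = cross_val_score(clf, iris.data, iris.target, cv=kf)
print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())
```

Averaging the per-fold scores gives a less split-dependent estimate than a single train/test split.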