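Python offers many ways to do cross-validation. If you just want to call a library, sklearn's `model_selection` module has what you need.
If you want to write the code that splits the dataset yourself, `ShuffleSplit` comes in handy.
These notes are for study and reference; corrections are welcome.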
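- Purpose
  Given the total number of samples in an existing dataset, generate random sets of indices according to the supplied parameters.
- Usage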
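The import is similar to the one used for k-fold cross-validation:

```python
from sklearn.model_selection import ShuffleSplit

# X and Y are NumPy arrays; rf is an estimator created elsewhere.
# Positional call style: (number of samples, number of iterations, test fraction).
for train_idx, test_idx in ShuffleSplit(len(X), 100, .3):
    X_train, X_test = X[train_idx], X[test_idx]
    Y_train, Y_test = Y[train_idx], Y[test_idx]
    r = rf.fit(X_train, Y_train)
```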
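Correction: the code above raises an error in newer versions of sklearn. The `model_selection` version of `ShuffleSplit` takes its settings in the constructor and the data in `.split()`, so a small change fixes it:

```python
# n_splits and test_size carry over the 100 iterations and 0.3 test fraction from the call above.
for train_ids, test_ids in ShuffleSplit(n_splits=100, test_size=0.3, random_state=0).split(x):
    x_train, x_test = x[train_ids], x[test_ids]
    y_train, y_test = y[train_ids], y[test_ids]
```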
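To make the "random sets of indices" described earlier concrete, here is a minimal sketch on a toy array. The toy data, the values `n_splits=3` and `test_size=0.3`, and the helper name `show_splits` are all made up for illustration and are not part of the original notes.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit


def show_splits(n_samples=10, n_splits=3, test_size=0.3, seed=0):
    """Hypothetical helper: collect the index arrays ShuffleSplit yields for toy data."""
    x = np.arange(n_samples * 2).reshape(n_samples, 2)  # toy feature matrix
    splitter = ShuffleSplit(n_splits=n_splits, test_size=test_size, random_state=seed)
    # split() only needs the number of rows in x; it yields index arrays, not data.
    return [(train_ids, test_ids) for train_ids, test_ids in splitter.split(x)]


splits = show_splits()
for i, (train_ids, test_ids) in enumerate(splits):
    print(f"split {i}: train={train_ids}, test={test_ids}")
```

Within one split the train and test indices never overlap, but each split is drawn independently, so the same sample may land in the test set of several splits. That is the main difference from k-fold, where every sample is tested exactly once.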