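Model selection and evaluation in scikit-learn live mainly in the sklearn.model_selection module. This post gives only an overview and the usage of the most common functions; for full details see sklearn.model_selection: Model Selection in the official documentation.
I. Overview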
Splitter Classes
model_selection.KFold([n_splits, shuffle, …]) K-Folds cross-validator
model_selection.GroupKFold([n_splits]) K-fold iterator variant with non-overlapping groups.
model_selection.StratifiedKFold([n_splits, …]) Stratified K-Folds cross-validator
model_selection.LeaveOneGroupOut() Leave One Group Out cross-validator
model_selection.LeavePGroupsOut(n_groups) Leave P Group(s) Out cross-validator
model_selection.LeaveOneOut() Leave-One-Out cross-validator
model_selection.LeavePOut(p) Leave-P-Out cross-validator
model_selection.ShuffleSplit([n_splits, …]) Random permutation cross-validator
model_selection.GroupShuffleSplit([…]) Shuffle-Group(s)-Out cross-validation iterator
model_selection.StratifiedShuffleSplit([…]) Stratified ShuffleSplit cross-validator
model_selection.PredefinedSplit(test_fold) Predefined split cross-validator
model_selection.TimeSeriesSplit([n_splits]) Time Series cross-validator
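To make the splitter classes above more concrete, here is a minimal sketch using KFold. The toy array X and the choice of n_splits=3 are assumptions made only for illustration; every splitter exposes the same split() method, which yields pairs of train and test index arrays.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data (assumed for illustration): 6 samples, 2 features each.
X = np.arange(12).reshape(6, 2)

# Without shuffling, KFold cuts the samples into contiguous blocks.
kf = KFold(n_splits=3)

# split() yields (train_indices, test_indices) once per fold.
folds = [(train_idx.tolist(), test_idx.tolist())
         for train_idx, test_idx in kf.split(X)]

for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each sample appears in exactly one test fold, and the remaining samples form the training set for that fold.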
Splitter Functions
model_selection.train_test_split(*arrays, …) Split arrays or matrices into random train and test subsets.
model_selection.check_cv([cv, y, classifier]) Input checker utility for building a cross-validator
Hyper-parameter optimizers
model_selection.GridSearchCV(estimator, …) Exhaustive search over specified parameter values for an estimator.
model_selection.RandomizedSearchCV(…[, …]) Randomized search on hyper parameters.
model_selection.ParameterGrid(param_grid) Grid of parameters with a discrete number of values for each.
model_selection.ParameterSampler(…[, …]) Generator on parameters sampled from given distributions.
model_selection.fit_grid_point(X, y, …[, …]) Run fit on one set of parameters.
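As a rough illustration of how the optimizers above fit together, the sketch below enumerates a small grid with ParameterGrid and then runs GridSearchCV over it. The iris dataset, the SVC estimator, the grid values, and cv=5 are all assumptions chosen for the example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.svm import SVC

# Hypothetical grid: 3 values of C times 2 kernels.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# ParameterGrid expands the dictionary into every combination.
n_candidates = len(ParameterGrid(param_grid))

X, y = load_iris(return_X_y=True)

# GridSearchCV fits and cross-validates each combination in the grid.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("candidates:", n_candidates)
print("best params:", search.best_params_)
print("best mean CV score:", search.best_score_)
```

RandomizedSearchCV follows the same fit() pattern but samples a fixed number of candidates instead of trying every combination.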
Model validation
model_selection.cross_val_score(estimator, X) Evaluate a score by cross-validation
model_selection.cross_val_predict(estimator, X) Generate cross-validated estimates for each input data point
model_selection.permutation_test_score(…) Evaluate the significance of a cross-validated score with permutations
model_selection.learning_curve(estimator, X, y) Learning curve.
model_selection.validation_curve(estimator, …) Validation curve.
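The two most commonly used validation helpers above can be sketched as follows. The dataset, the LogisticRegression estimator, max_iter=1000, and cv=5 are assumptions for the sake of a runnable example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# One score per fold (accuracy by default for a classifier).
scores = cross_val_score(clf, X, y, cv=5)

# One prediction per sample, each made by a model that did not
# see that sample during training.
preds = cross_val_predict(clf, X, y, cv=5)

print("fold scores:", scores)
print("mean score:", scores.mean())
print("predictions shape:", preds.shape)
```

cross_val_score summarizes performance per fold, while cross_val_predict returns the out-of-fold prediction for every input sample.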
II. Splitter functions
Function prototype:
sklearn.model_selection.train_test_split(*arrays, **options)
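Purpose:
Splits arrays or matrices into random train and test subsets. It returns a list whose length is twice the number of arrays passed in, because each input array yields one training part and one test part. If the input is sparse, the output is a scipy.sparse.csr_matrix; otherwise the output type is the same as the input type.
Parameters:
*arrays : indexable sequences of equal length. Allowed inputs are lists, NumPy ndarrays, scipy-sparse matrices, and pandas DataFrames.
test_size : float, int, or None (default None). If float, it should be between 0.0 and 1.0 and gives the proportion of the dataset placed in the test set.
If int, it gives the absolute number of test samples.
If None, it is set to the complement of train_size; in recent scikit-learn releases, if train_size is also None, it falls back to 0.25.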
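A minimal sketch of the behaviour described above, showing both the float and the int form of test_size. The toy arrays are assumptions for illustration; random_state is a real train_test_split option that is not covered above and is used here only to make the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumed for illustration): 12 samples, 2 features, 12 labels.
X = np.arange(24).reshape(12, 2)
y = np.arange(12)

# Float test_size: 25% of the 12 samples go to the test set.
# Two input arrays produce a list of four outputs.
result = train_test_split(X, y, test_size=0.25, random_state=0)
X_train, X_test, y_train, y_test = result

# Int test_size: exactly 4 samples go to the test set.
X_train2, X_test2 = train_test_split(X, test_size=4, random_state=0)

print(len(result), X_train.shape, X_test.shape)
print(X_train2.shape, X_test2.shape)
```

Rows of X and entries of y are shuffled together, so each label stays aligned with its sample after the split.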