2 .model_selection
Hyperparameter search
1.model_selection.GridSearchCV(estimator, param_grid, scoring=None, n_jobs=None, iid='warn', refit=True, cv='warn', verbose=0, pre_dispatch='2*n_jobs', error_score='raise-deprecating', return_train_score=False)
Grid search
Parameters:
param_grid
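: the parameter space, given as a dict that maps parameter names to lists of candidate values. To search several parameter spaces, wrap the dicts in a list.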
scoring
: the evaluation metric. If not specified, the estimator's own score method is used.
n_jobs
: the number of jobs to run in parallel. The default None means 1; -1 means use all CPUs.
iid
refit
verbose
pre_dispatch
error_score
return_train_score
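Attributes:
cv_results_
: the grid search results, a dict of arrays holding the scores and parameters of every candidate
best_estimator_
: the best estimator found (refit on the full data when refit=True)
best_params_
: the best parameter setting
best_score_
: the best score, i.e. the mean cross-validated score of best_estimator_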
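The parameters and attributes above can be sketched as follows. This is a minimal example, not a recommended setup: the SVC estimator, the iris data, and the candidate values are assumptions chosen for illustration. The sketch leaves out iid, which newer scikit-learn releases no longer accept.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Two parameter spaces wrapped in a list: each dict is expanded separately.
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": [0.01, 0.1]},
]

# scoring="accuracy" overrides the estimator's default score method.
search = GridSearchCV(SVC(), param_grid, scoring="accuracy", cv=5, n_jobs=1)
search.fit(X, y)

# 3 linear candidates + 3 * 2 rbf candidates
n_candidates = len(search.cv_results_["params"])

print(search.best_params_)     # best parameter setting
print(search.best_score_)      # mean cross-validated score of that setting
print(search.best_estimator_)  # estimator refit with the best setting
```

Because refit=True by default, best_estimator_ can be used directly for prediction after fit.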
2.model_selection.RandomizedSearchCV(estimator, param_distributions, n_iter=10, scoring=None, n_jobs=None, iid='warn', refit=True, cv='warn', verbose=0, pre_dispatch='2*n_jobs', random_state=None, error_score='raise-deprecating', return_train_score=False)
Randomized search
Parameters:
estimator
: (omitted)
param_distributions
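: the parameter distributions, written like param_grid above, except that each dict value is a random distribution. If a list is given instead, its values are sampled uniformly.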
n_iter
scoring
: (omitted)
n_jobs
: (omitted)
iid
refit
cv
: (omitted)
verbose
pre_dispatch
random_state
: (omitted)
error_score
return_train_score
Attributes are the same as for GridSearchCV.
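A minimal sketch of a randomized search, mixing a distribution and a plain list in param_distributions. The scipy.stats.uniform distribution, the SVC estimator, and the value ranges are assumptions made for illustration.

```python
from scipy.stats import uniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_distributions = {
    # A distribution object: values are drawn from [0.1, 10.1].
    "C": uniform(loc=0.1, scale=10),
    # A plain list: values are sampled uniformly from the list.
    "kernel": ["linear", "rbf"],
}

# n_iter sets how many parameter settings are sampled in total.
search = RandomizedSearchCV(
    SVC(), param_distributions, n_iter=8, cv=5, random_state=0
)
search.fit(X, y)

sampled = search.cv_results_["params"]
print(search.best_params_)
print(search.best_score_)
```

Unlike the grid search, the cost is controlled by n_iter rather than by the size of the parameter space.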
5 .pipeline
1.make_pipeline
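make_pipeline chains estimators into a Pipeline and names each step after its lowercased class name. A minimal sketch, where the choice of StandardScaler and LogisticRegression is an assumption made for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scale the features, then fit a classifier; fit() runs both steps in order.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X, y)

step_names = [name for name, _ in pipe.steps]
train_accuracy = pipe.score(X, y)

print(step_names)
print(train_accuracy)
```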
6 .datasets
1.load_iris
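load_iris returns the iris dataset as a Bunch object with the feature matrix, the labels, and their names. A short sketch of the fields most often used:

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)                   # samples x features
print(iris.feature_names)        # column names of X
print(list(iris.target_names))   # class names indexed by y
```

Passing return_X_y=True returns only the (X, y) pair.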
7 .feature_extraction (feature extraction)
7.1 .text.CountVectorizer(input='content', encoding='utf-8', decode_error='strict', strip_accents=None, lowercase=True, preprocessor=None, tokenizer=None, stop_words=None, token_pattern='(?u)\b\w\w+\b', ngram_range=(1, 1), analyzer='word', max_df=1.0, min_df=1, max_features=None, vocabulary=None, binary=False, dtype=<class 'numpy.int64'>)
Tokenizes a single column of text and counts the occurrences of each token.
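A minimal sketch of tokenizing and counting with the default settings. The three toy documents are made up for illustration. Note that the default token_pattern only keeps tokens of two or more word characters, so the single-letter word "a" is dropped.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus: one string per document.
docs = ["the cat sat", "the cat sat on the mat", "a dog"]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)  # sparse document-term matrix

# Columns follow the alphabetically sorted vocabulary.
vocabulary = sorted(vectorizer.vocabulary_)
dense = counts.toarray()

print(vocabulary)
print(dense)
```

Each row of the matrix is one document and each column is the count of one vocabulary term.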
8 .feature_selection
.VarianceThreshold(threshold=0.0)
: filter method, variance threshold.
.SelectKBest(score_func=<function f_classif>, k=10)
score_func
: the scoring function used by the filter method. Defaults to the ANOVA F-value (f_classif).
.SelectPercentile(score_func=<function f_classif>, percentile=10)
.SelectFpr(score_func=<function f_classif>, alpha=0.05)
.SelectFdr(score_func=<function f_classif>, alpha=0.05)
.SelectFwe(score_func=<function f_classif>, alpha=0.05)
.GenericUnivariateSelect(score_func=<function f_classif>, mode='percentile', param=1e-05)
: a general-purpose univariate feature selector.
.f_regression(X, y, center=True)
.mutual_info_regression(X, y, discrete_features='auto', n_neighbors=3, copy=True, random_state=None)
: mutual information.
.chi2(X, y)
.f_classif(X, y)
.mutual_info_classif(X, y, discrete_features='auto', n_neighbors=3, copy=True, random_state=None)
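The filter-style selectors and score functions listed above can be combined as in this minimal sketch. The iris data, the 0.2 variance threshold, and k=2 are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import (
    GenericUnivariateSelect,
    SelectKBest,
    VarianceThreshold,
    chi2,
    f_classif,
)

X, y = load_iris(return_X_y=True)

# Variance threshold: drop features whose variance does not exceed 0.2.
vt = VarianceThreshold(threshold=0.2)
X_vt = vt.fit_transform(X)

# Keep the k features with the highest ANOVA F-value.
kbest = SelectKBest(score_func=f_classif, k=2)
X_kbest = kbest.fit_transform(X, y)

# The generic selector: the same idea, with the strategy chosen via mode.
# chi2 requires non-negative features, which holds for iris.
generic = GenericUnivariateSelect(score_func=chi2, mode="k_best", param=2)
X_generic = generic.fit_transform(X, y)

print(X_vt.shape, X_kbest.shape, X_generic.shape)
print(kbest.get_support())  # boolean mask of the kept columns
```

Swapping score_func (for example to mutual_info_classif, or to f_regression for a regression target) changes the criterion without changing the selector.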
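.RFE(estimator, n_features_to_select=None, step=1, verbose=0)
: recursive feature elimination (RFE). Strictly speaking this is a wrapper method rather than an embedded one, since it repeatedly refits the estimator and drops the weakest features.
.SelectFromModel(estimator, threshold=None, prefit=False, norm_order=1, max_features=None)
: embedded method using a model of your choice.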
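A minimal sketch of both model-based selectors. The estimators (LogisticRegression, RandomForestClassifier), the iris data, and the settings n_features_to_select=2 and threshold="median" are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# RFE: refit the estimator repeatedly, removing `step` features per round
# until n_features_to_select remain.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2, step=1)
X_rfe = rfe.fit_transform(X, y)
ranking = rfe.ranking_.tolist()  # 1 marks a kept feature

# SelectFromModel: fit once, keep features whose importance reaches the threshold.
sfm = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="median",
)
X_sfm = sfm.fit_transform(X, y)

print(ranking)
print(sfm.get_support())
```

RFE needs an estimator that exposes coef_ or feature_importances_ after fitting; the same holds for SelectFromModel.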