(These are purely personal notes and are not meant as a reference.)
sklearn.feature_extraction
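- DictVectorizer : feature conversion; turns dicts into a feature matrix, combining each feature name with its (string) value to form a new feature name such as city=Beijing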
- image
- img_to_graph
- grid_to_graph
- text : { CountVectorizer => counts how often each term occurs in each document, TfidfVectorizer => weights each term by its importance (TF-IDF) }
- FeatureHasher
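
A minimal sketch of the three vectorizers annotated above. The records and sentences are made-up toy data, and the variable names are only illustrative.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy records, made up for illustration.
records = [{"city": "Beijing", "temp": 30.0}, {"city": "Shanghai", "temp": 25.0}]
dict_vec = DictVectorizer(sparse=False)
X_dict = dict_vec.fit_transform(records)
# String values become one-hot "name=value" columns; numeric values pass through.
print(dict_vec.feature_names_)
print(X_dict)

docs = ["the cat sat", "the cat saw the dog"]
count_vec = CountVectorizer()
counts = count_vec.fit_transform(docs).toarray()
# vocabulary_ maps each term to its column index in `counts`.
the_col = count_vec.vocabulary_["the"]
print(counts[:, the_col])  # occurrences of "the" in each document

# TF-IDF produces weights instead of raw counts, one row per document.
tfidf = TfidfVectorizer().fit_transform(docs).toarray()
print(tfidf.round(2))
```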
sklearn.preprocessing
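- Binarizer
- FunctionTransformer
- Imputer : fills in missing values, e.g. im = Imputer(missing_values='NaN', strategy='mean', axis=0). Removed in scikit-learn 0.22; sklearn.impute.SimpleImputer is the replacement.
- KernelCenterer
- LabelBinarizer
- LabelEncoder
- MultiLabelBinarizer
- MinMaxScaler : normalization; rescales values into a fixed range. With MinMaxScaler(feature_range=(0, 1)) the smallest value of each feature maps to 0, the largest to 1, and the rest are scaled proportionally in between.
- MaxAbsScaler
- QuantileTransformer
- Normalizer
- OneHotEncoder
- RobustScaler
- StandardScaler : standardization; rescales each feature to zero mean and unit standard deviation. Outliers such as very extreme values can hurt prediction accuracy. Standardization does not remove them, but with enough samples a few outliers shift the mean and standard deviation only slightly, whereas in min-max scaling they directly set the range.
- add_dummy_feature
- PolynomialFeatures
- binarize
- normalize
- scale
- robust_scale
- maxabs_scale
- minmax_scale
- label_binarize
- quantile_transform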
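
A small sketch of imputation followed by the two scalers annotated above. It assumes a recent scikit-learn, where SimpleImputer from sklearn.impute takes the place of Imputer; the array is made-up toy data.

```python
import numpy as np
from sklearn.impute import SimpleImputer  # assumed replacement for Imputer
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy data with one missing value in the second column.
X = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0]])

# Replace NaN with the column mean: (10 + 30) / 2 = 20.
X_filled = SimpleImputer(missing_values=np.nan, strategy="mean").fit_transform(X)

# Min-max scaling maps each column onto [0, 1].
X_minmax = MinMaxScaler(feature_range=(0, 1)).fit_transform(X_filled)

# Standardization gives each column zero mean and unit standard deviation.
X_std = StandardScaler().fit_transform(X_filled)

print(X_filled)
print(X_minmax)
print(X_std)
```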
sklearn.feature_selection
- GenericUnivariateSelect
- RFE
- RFECV
- SelectFdr
- SelectFpr
- SelectFwe
- SelectKBest
- SelectFromModel
- SelectPercentile
- VarianceThreshold : filter method; drops features whose variance is below a threshold. A small variance means the feature barely changes across samples.
- chi2
- f_classif
- f_oneway
- f_regression
- mutual_info_classif
- mutual_info_regression
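
A minimal sketch of the VarianceThreshold filter described above. The matrix and the threshold of 0.1 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Column 0 is constant, column 2 barely varies, column 1 varies clearly.
X = np.array([[0.0, 1.0, 5.0],
              [0.0, 2.0, 5.1],
              [0.0, 3.0, 4.9]])

selector = VarianceThreshold(threshold=0.1)
X_kept = selector.fit_transform(X)

print(selector.variances_)    # per-feature variance on the training data
print(selector.get_support()) # boolean mask of the features that were kept
print(X_kept)
```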
sklearn.decomposition
- DictionaryLearning
- FastICA
- IncrementalPCA
- KernelPCA
- MiniBatchDictionaryLearning
- MiniBatchSparsePCA
- NMF
- PCA : principal component analysis; simplifies a dataset by reducing its dimensionality.
- RandomizedPCA
- SparseCoder
- SparsePCA
- dict_learning
- dict_learning_online
- fastica
- non_negative_factorization
- randomized_svd
- sparse_encode
- FactorAnalysis
- TruncatedSVD
- LatentDirichletAllocation
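
A short sketch of PCA used for dimensionality reduction. The data is synthetic and built so that two of the three columns are almost perfectly correlated; the choice of n_components=2 is only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
t = rng.normal(size=200)
# Column 1 is roughly 2 * column 0; column 2 is small independent noise.
X = np.column_stack([t,
                     2.0 * t + 0.01 * rng.normal(size=200),
                     0.01 * rng.normal(size=200)])

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)
# Fraction of the total variance captured by each retained component.
print(pca.explained_variance_ratio_)
```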
sklearn.neighbors
- BallTree
- DistanceMetric
- KDTree
- KNeighborsClassifier
- KNeighborsRegressor : estimator implementing the k-nearest neighbors (KNN) algorithm, regression variant
- NearestCentroid
- NearestNeighbors
- RadiusNeighborsClassifier
- RadiusNeighborsRegressor
- kneighbors_graph
- radius_neighbors_graph
- KernelDensity
- LSHForest
- LocalOutlierFactor
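
A tiny sketch of k-nearest neighbors regression: the prediction is the mean target of the k closest training points. The one-dimensional data is made up.

```python
from sklearn.neighbors import KNeighborsRegressor

# Toy 1-D training set where the target equals the feature.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 1.0, 2.0, 3.0]

knn = KNeighborsRegressor(n_neighbors=2)
knn.fit(X_train, y_train)

# The two nearest neighbors of 1.4 are 1.0 and 2.0, so the prediction
# is the average of their targets.
pred = knn.predict([[1.4]])
print(pred)
```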
sklearn.model_selection
- BaseCrossValidator
- GridSearchCV : grid search over hyperparameter combinations, each scored by cross-validation
- TimeSeriesSplit
- KFold
- GroupKFold
- GroupShuffleSplit
- LeaveOneGroupOut
- LeaveOneOut
- LeavePGroupsOut
- LeavePOut
- RepeatedKFold
- RepeatedStratifiedKFold
- ParameterGrid
- ParameterSampler
- PredefinedSplit
- RandomizedSearchCV
- ShuffleSplit
- StratifiedKFold
- StratifiedShuffleSplit
- check_cv
- cross_val_predict
- cross_val_score
- cross_validate
- fit_grid_point
- learning_curve
- permutation_test_score
- train_test_split : splits the data into a training set and a test set
- validation_curve
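
A minimal sketch combining train_test_split and GridSearchCV. The iris dataset, the KNeighborsClassifier estimator and the n_neighbors grid are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples for a final check; stratify keeps class ratios.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Try every value in the grid, scoring each with 5-fold cross-validation
# on the training set only.
param_grid = {"n_neighbors": [1, 3, 5, 7]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_score_)            # mean cross-validated accuracy
test_accuracy = search.score(X_test, y_test)
print(test_accuracy)                 # accuracy of the refitted best model
```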
sklearn.tree
- DecisionTreeClassifier : decision tree (classification)
- DecisionTreeRegressor
- ExtraTreeClassifier
- ExtraTreeRegressor
- export_graphviz : exports a fitted decision tree in Graphviz DOT format for visualization
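
A short sketch of fitting a decision tree and exporting it for visualization. The iris data and max_depth=2 are arbitrary; turning the DOT text into an image needs the external Graphviz tool, which is not shown here.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# With out_file=None the DOT description is returned as a string
# instead of being written to a file.
dot_source = export_graphviz(clf, out_file=None, feature_names=iris.feature_names)
print(dot_source[:200])
```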
sklearn.ensemble
- BaseEnsemble
- RandomForestClassifier : random forest algorithm; use cross-validation with grid search to find the best hyperparameter combination.
- RandomForestRegressor
- RandomTreesEmbedding
- ExtraTreesClassifier
- ExtraTreesRegressor
- BaggingClassifier
- BaggingRegressor
- IsolationForest
- GradientBoostingClassifier
- GradientBoostingRegressor
- AdaBoostClassifier
- AdaBoostRegressor
- VotingClassifier
- bagging
- forest
- gradient_boosting
- partial_dependence
- weight_boosting
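
A sketch of the random forest note above: grid search with cross-validation over a couple of hyperparameters. The grid values, the 3 folds and the iris dataset are assumptions chosen to keep the run short.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Each combination of n_estimators and max_depth is scored by 3-fold CV.
param_grid = {"n_estimators": [10, 50], "max_depth": [2, 4]}
forest_search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3
)
forest_search.fit(X, y)

print(forest_search.best_params_)
print(forest_search.best_score_)
```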
sklearn.naive_bayes
- BernoulliNB
- GaussianNB
- MultinomialNB : naive Bayes classification algorithm (multinomial variant, suited to count features such as word counts)
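
A toy sketch of MultinomialNB on word counts. The four messages and their labels are invented for illustration, and CountVectorizer is only one possible way to produce the count features.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training messages and labels.
messages = ["free prize money", "win money now",
            "meeting schedule today", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(messages)

nb = MultinomialNB()  # default alpha=1.0 applies Laplace smoothing
nb.fit(X_counts, labels)

# New text must go through the same fitted vectorizer.
nb_pred = nb.predict(vectorizer.transform(["free money"]))
print(nb_pred)
```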
sklearn.linear_model
- ARDRegression
- BayesianRidge
- ElasticNet
- ElasticNetCV
- Hinge
- Huber
- HuberRegressor
- Lars
- LarsCV
- Lasso
- LassoCV
- LassoLars
- LassoLarsCV
- LassoLarsIC
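- LinearRegression : linear regression (ordinary least squares); the optimum is found directly, i.e. the solution given by the normal equation. https://zhuanlan.zhihu.com/p/87866712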
- Log
- LogisticRegression
- LogisticRegressionCV
- ModifiedHuber
- MultiTaskElasticNet
- MultiTaskElasticNetCV
- MultiTaskLasso
- MultiTaskLassoCV
- OrthogonalMatchingPursuit
- OrthogonalMatchingPursuitCV
- PassiveAggressiveClassifier
- PassiveAggressiveRegressor
- Perceptron
- RandomizedLasso
- RandomizedLogisticRegression
- Ridge : ridge regression (linear regression with L2 regularization). https://zhuanlan.zhihu.com/p/87866712
- RidgeCV
- RidgeClassifier
- RidgeClassifierCV
- SGDClassifier
- SGDRegressor : linear regression fitted iteratively by stochastic gradient descent; the default loss is the squared error. https://zhuanlan.zhihu.com/p/87866712
- SquaredLoss
- TheilSenRegressor
- enet_path
- lars_path
- lasso_path
- lasso_stability_path
- logistic_regression_path
- orthogonal_mp
- orthogonal_mp_gram
- ridge_regression
- RANSACRegressor
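
A sketch comparing the three annotated linear models on synthetic data. The normal equation is solved by hand with NumPy to show it agrees with LinearRegression; the data, alpha=10.0 and the SGD settings are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, SGDRegressor

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))
# Synthetic target: y = 3*x0 - 2*x1 + 1 plus a little noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 + 0.1 * rng.normal(size=100)

# Normal equation: solve (A^T A) w = A^T y, where A has a leading column
# of ones so that w[0] is the intercept.
A = np.column_stack([np.ones(len(X)), X])
w = np.linalg.solve(A.T @ A, A.T @ y)

ols = LinearRegression().fit(X, y)

# Ridge adds an L2 penalty, which shrinks the coefficients.
ridge = Ridge(alpha=10.0).fit(X, y)

# SGDRegressor approaches a similar solution iteratively.
sgd = SGDRegressor(max_iter=1000, tol=1e-3, random_state=0).fit(X, y)

print(w)
print(ols.intercept_, ols.coef_)
print(ridge.coef_)
print(sgd.coef_)
```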
sklearn.metrics
- accuracy_score
- adjusted_mutual_info_score
- adjusted_rand_score
- auc
- average_precision_score
- calinski_harabaz_score
- classification_report
- cluster
- cohen_kappa_score
- completeness_score
- confusion_matrix
- consensus_score
- coverage_error
- euclidean_distances
- explained_variance_score
- f1_score
- fbeta_score
- fowlkes_mallows_score
- get_scorer
- hamming_loss
- hinge_loss
- homogeneity_completeness_v_measure
- homogeneity_score
- jaccard_similarity_score
- label_ranking_average_precision_score
- label_ranking_loss
- log_loss
- make_scorer
- matthews_corrcoef
- mean_absolute_error
- mean_squared_error : computes the mean squared error (MSE)
- mean_squared_log_error
- median_absolute_error
- mutual_info_score
- normalized_mutual_info_score
- pairwise_distances
- pairwise_distances_argmin
- pairwise_distances_argmin_min
- pairwise_kernels
- precision_recall_curve
- precision_recall_fscore_support
- precision_score
- r2_score
- recall_score
- roc_auc_score
- roc_curve
- SCORERS
- silhouette_samples
- silhouette_score
- v_measure_score
- zero_one_loss
- brier_score_loss
- dcg_score
- ndcg_score
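
A minimal sketch of two of the metrics listed above, one for regression and one for classification. The true and predicted values are made up.

```python
from sklearn.metrics import accuracy_score, mean_squared_error

# Regression: mean of the squared differences.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
mse = mean_squared_error(y_true, y_pred)  # (0.25 + 0.25 + 0 + 1) / 4
print(mse)

# Classification: fraction of labels predicted correctly.
labels_true = [0, 1, 1, 0]
labels_pred = [0, 1, 0, 0]
acc = accuracy_score(labels_true, labels_pred)  # 3 of 4 correct
print(acc)
```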