Python sklearn: Notes on Model Selection and Usage
I. Main Features
1.Classification
2.Regression
3.Clustering
4.Dimensionality reduction
5.Model selection
6.Preprocessing
II. Common Modules
1.sklearn.model_selection: Model Selection
2.sklearn.datasets: Datasets
3.sklearn.multiclass: Multiclass and multilabel classification
4.sklearn.multioutput: Multioutput regression and classification
5.sklearn.naive_bayes: Naive Bayes
6.sklearn.neighbors: Nearest Neighbors
7.sklearn.neural_network: Neural network models
8.sklearn.preprocessing: Preprocessing and Normalization
9.sklearn.semi_supervised: Semi-Supervised Learning
10.sklearn.svm: Support Vector Machines
11.sklearn.tree: Decision Tree
...
III. Data Preprocessing
from sklearn import preprocessing
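1.Standardize the data: center each feature to zero mean and scale it to unit variance (this rescales the data but does not make it normally distributed)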
preprocessing.scale(X,axis=0, with_mean=True, with_std=True, copy=True)
2.Scale the data to a fixed range; the default range is [0, 1]
preprocessing.minmax_scale(X,feature_range=(0, 1), axis=0, copy=True)
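3.Scale each feature by its maximum absolute value, preserving sign, so that values fall within [-1.0, 1.0]
(it does not shift or center the data, so it preserves sparsity and is suitable for scipy.sparse input)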
preprocessing.maxabs_scale(X,axis=0, copy=True)
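The three scaling functions above can be compared side by side on a small array. This is a minimal sketch: the matrix X and the variable names are made up for illustration, and the sparse case assumes scipy is installed.

```python
import numpy as np
from scipy import sparse
from sklearn import preprocessing

# Hypothetical toy data: 3 samples (rows), 3 features (columns).
X = np.array([[1.0, -1.0, 2.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])

# 1. Zero mean and unit variance per feature (axis=0 is the default).
X_std = preprocessing.scale(X)

# 2. Each feature rescaled to the default range [0, 1].
X_minmax = preprocessing.minmax_scale(X)

# 3. Each feature divided by its maximum absolute value; signs are kept.
X_maxabs = preprocessing.maxabs_scale(X)

# maxabs_scale also accepts a scipy.sparse matrix and returns a sparse result,
# because it never shifts the data and zeros stay zeros.
X_sparse = sparse.csr_matrix(X)
X_sparse_scaled = preprocessing.maxabs_scale(X_sparse)

print(X_std.mean(axis=0), X_std.std(axis=0))
print(X_minmax.min(axis=0), X_minmax.max(axis=0))
print(np.abs(X_maxabs).max(axis=0))
```

These function forms compute their statistics from the array they are given. When the same scaling has to be reused on new data, the corresponding transformer classes (StandardScaler, MinMaxScaler, MaxAbsScaler) are usually the better fit.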
IV. Datasets
A common step is to split the dataset into a training set and a test set
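One common way to do this split is train_test_split from sklearn.model_selection. The sketch below is only an illustration under assumptions: the arrays, the 25% test fraction, and the random_state value are made up here and do not come from the original notes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 samples with 2 features each, plus one label per sample.
X = np.arange(40).reshape(20, 2)
y = np.arange(20)

# Hold out 25% of the samples for testing; random_state makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(X_train.shape, X_test.shape)
```

For classification tasks, passing the labels through the stratify parameter keeps the class proportions similar in both parts.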
from sklearn.