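'''
sklearn's Imputer class provides basic strategies for handling missing values, such as replacing each missing value with the mean, median, or most frequent value of the row or column it belongs to. The class also supports different encodings of missing values.
'''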
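import numpy as np
# Imputer was deprecated in scikit-learn 0.20 and removed in 0.22, so this import
# only works on older releases (newer ones provide sklearn.impute.SimpleImputer).
from sklearn.preprocessing import Imputer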
'''
missing_values : integer or "NaN", optional (default="NaN")
    The placeholder for the missing values. All occurrences of
    `missing_values` will be imputed. For missing values encoded as np.nan,
    use the string value "NaN".
strategy : string, optional (default="mean")
    The imputation strategy.
    - If "mean", then replace missing values using the mean along
      the axis.
    - If "median", then replace missing values using the median along
      the axis.
    - If "most_frequent", then replace missing using the most frequent
      value along the axis.
axis : integer, optional (default=0)
    The axis along which to impute.
    - If `axis=0`, then impute along columns.
    - If `axis=1`, then impute along rows.
'''
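'''
To make the `axis` setting concrete, the sketch below reproduces mean imputation with plain NumPy.
The helper name `impute_mean` is hypothetical and is not part of sklearn. Unlike Imputer, it
computes the means from the same array it fills, so it only illustrates the direction of the
averaging, not the fit/transform split.
'''

```python
import numpy as np

def impute_mean(X, axis=0):
    """Hypothetical helper: fill NaNs with the mean taken along `axis`."""
    X = np.asarray(X, dtype=float)
    # axis=0 gives one mean per column; axis=1 gives one mean per row.
    # nanmean ignores the NaN entries when averaging.
    means = np.nanmean(X, axis=axis, keepdims=True)
    # Broadcast the means to X's shape and use them only where X is NaN.
    return np.where(np.isnan(X), means, X)

demo = [[1, 2], [np.nan, 3], [7, 6]]
by_column = impute_mean(demo, axis=0)  # NaN replaced by the mean of column 0
by_row = impute_mean(demo, axis=1)     # NaN replaced by the mean of row 1
print(by_column)
print(by_row)
```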
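# Learn the per-column means from the training data (axis=0 imputes along columns).
imp = Imputer(missing_values="NaN", strategy='mean', axis=0)
imp.fit([[1, 2], [np.nan, 3], [7, 6]])
# Fill the NaNs in new data with the learned means: 4.0 for column 0, 11/3 for column 1.
X = [[np.nan, 2], [6, np.nan], [7, 6]]
print(imp.transform(X))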
# Output:
# [[4.         2.        ]
#  [6.         3.66666667]
#  [7.         6.        ]]
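'''
On scikit-learn 0.22 and later, where Imputer no longer exists, the same example can be sketched
with sklearn.impute.SimpleImputer. This is an assumed equivalent rather than part of the original
script: SimpleImputer has no `axis` parameter and always imputes column-wise, and np.nan is passed
directly as `missing_values` instead of the string "NaN". The names `imputer`, `X_new`, and
`X_filled` are chosen here for illustration.
'''

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Column-wise mean imputation; there is no `axis` argument on SimpleImputer.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit() learns one mean per column from the training data.
imputer.fit([[1, 2], [np.nan, 3], [7, 6]])

# transform() fills NaNs in new data with the means learned by fit().
X_new = [[np.nan, 2], [6, np.nan], [7, 6]]
X_filled = imputer.transform(X_new)

print(imputer.statistics_)  # the per-column means learned by fit()
print(X_filled)
```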