sklearn.impute.KNNImputer
Imputation for completing missing values using k-Nearest Neighbors.
Each sample’s missing values are imputed using the mean value from n_neighbors
nearest neighbors found in the training set. Two samples are close if the features that neither is missing are close.
class sklearn.impute.KNNImputer(*, missing_values=nan, n_neighbors=5,
weights='uniform', metric='nan_euclidean', copy=True, add_indicator=False)
1、Parameters
missing_values:number, string, np.nan or None, default=`np.nan`
The placeholder for the missing values. All occurrences of missing_values
will be imputed. For pandas’ dataframes with nullable integer dtypes with missing values, missing_values
should be set to np.nan
, since pd.NA
will be converted to np.nan
.
n_neighbors:int, default=5
Number of neighboring samples to use for imputation.
weights:{‘uniform’, ‘distance’} or callable, default=’uniform’
Weight function used in prediction. Possible values:
-
‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
-
‘distance’ : weight points by the inverse of their distance. in this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
-
callable : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
metric:{‘nan_euclidean’} or callable, default=’nan_euclidean’
Distance metric for searching neighbors. Possible values:
-
‘nan_euclidean’
-
callable : a user-defined function which conforms to the definition of
_pairwise_callable(X, Y, metric, **kwds)
. The function accepts two arrays, X and Y, and amissing_values
keyword inkwds
and returns a scalar distance value.
copy:bool, default=True
If True, a copy of X will be created. If False, imputation will be done in-place whenever possible.
add_indicator:bool, default=False
If True, a MissingIndicator
transform will stack onto the output of the imputer’s transform. This allows a predictive estimator to account for missingness despite imputation. If a feature has no missing values at fit/train time, the feature won’t appear on the missing indicator even if there are missing values at transform/test time.
2、Examples
import numpy as np
from sklearn.impute import KNNImputer
X = [[1, 2, np.nan], [3, 4, 3], [np.nan, 6, 5], [8, 8, 7]]
imputer = KNNImputer(n_neighbors=2)
imputer.fit_transform(X)
END
----------------------------------------------------------------
欢迎关注我的公众号!