基于KNNImputer缺失值填充

最新推荐文章于 2025-04-11 15:50:25 发布

Echo-z

最新推荐文章于 2025-04-11 15:50:25 发布

阅读量3.7k

点赞数 1

分类专栏：机器学习算法文章标签： python 机器学习数据挖掘

本文链接：https://blog.csdn.net/weixin_39948381/article/details/108352471

版权

机器学习算法专栏收录该内容

12 篇文章

订阅专栏

sklearn.impute.KNNImputer

Imputation for completing missing values using k-Nearest Neighbors.

Each sample’s missing values are imputed using the mean value from n_neighbors nearest neighbors found in the training set. Two samples are close if the features that neither is missing are close.

class sklearn.impute.KNNImputer(*, missing_values=nan, n_neighbors=5, 
weights='uniform', metric='nan_euclidean', copy=True, add_indicator=False)

1、Parameters

missing_values：number, string, np.nan or None, default=`np.nan`

The placeholder for the missing values. All occurrences of missing_values will be imputed. For pandas’ dataframes with nullable integer dtypes with missing values, missing_values should be set to np.nan, since pd.NA will be converted to np.nan.

n_neighbors：int, default=5

Number of neighboring samples to use for imputation.

weights：{‘uniform’, ‘distance’} or callable, default=’uniform’

Weight function used in prediction. Possible values:

‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
‘distance’ : weight points by the inverse of their distance. in this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
callable : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

metric：{‘nan_euclidean’} or callable, default=’nan_euclidean’

Distance metric for searching neighbors. Possible values:

‘nan_euclidean’
callable : a user-defined function which conforms to the definition of _pairwise_callable(X, Y, metric, **kwds). The function accepts two arrays, X and Y, and a missing_values keyword in kwds and returns a scalar distance value.

copy：bool, default=True

If True, a copy of X will be created. If False, imputation will be done in-place whenever possible.

add_indicator：bool, default=False

If True, a MissingIndicator transform will stack onto the output of the imputer’s transform. This allows a predictive estimator to account for missingness despite imputation. If a feature has no missing values at fit/train time, the feature won’t appear on the missing indicator even if there are missing values at transform/test time.

2、Examples

import numpy as np
from sklearn.impute import KNNImputer
X = [[1, 2, np.nan], [3, 4, 3], [np.nan, 6, 5], [8, 8, 7]]
imputer = KNNImputer(n_neighbors=2)
imputer.fit_transform(X)