A Problem Encountered in Oversampling-Based Modeling


Python伊甸园

Category: Data Processing

Article link: https://blog.csdn.net/weixin_42830697/article/details/105288049


1. Problem overview:

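The oversampling step in this modeling workflow uses the SMOTE algorithm, which generates synthetic samples by computing distances between existing samples. If the raw data contains missing values (NaN), the distance computation raises an error, so the missing values must be handled first for SMOTE to run successfully.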
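To make the failure concrete, here is a minimal sketch of what happens when a distance-based step meets a NaN. It assumes the SMOTE implementation relies on a scikit-learn nearest-neighbour search (as imbalanced-learn's does); the array X is made-up illustration data, not taken from the original project.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy data: the second row has a missing value.
X = np.array([[1.0, 2.0],
              [2.0, np.nan],
              [3.0, 4.0]])

# A Euclidean distance that involves NaN is itself NaN, so it cannot be ranked.
dist = np.linalg.norm(X[0] - X[1])
print("distance with a NaN feature:", dist)

# scikit-learn's neighbour search validates its input and refuses NaN outright.
try:
    NearestNeighbors(n_neighbors=2).fit(X)
    fit_failed = False
except ValueError as exc:
    fit_failed = True
    print("fit rejected the data:", exc)
```

The exact wording of the error message differs between scikit-learn versions, so the sketch only checks that a ValueError is raised.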

2. Solution:

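sklearn provides the Imputer class, which makes it quick to fill in missing values. It is imported with

from sklearn.preprocessing import Imputer

Note that Imputer was deprecated in scikit-learn 0.20 and removed in 0.22; on newer versions its replacement is sklearn.impute.SimpleImputer. The class signature is: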

class sklearn.preprocessing.Imputer(missing_values='NaN', strategy='mean', axis=0, verbose=0, copy=True)

Meaning of the parameters:

1. missing_values: integer or "NaN", optional (default="NaN")
The placeholder for missing values. Use the string "NaN" for values encoded as np.nan.
2. strategy: string, optional (default="mean")
The imputation strategy.
If "mean", replace missing values with the mean along the axis.
If "median", replace missing values with the median along the axis.
If "most_frequent", replace missing values with the most frequent value (the mode) along the axis.
3. axis: integer, optional (default=0)
axis=0: impute along columns.
axis=1: impute along rows.
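The three strategies can be compared on a single column. Because Imputer is gone from scikit-learn 0.22 and later, this sketch swaps in sklearn.impute.SimpleImputer, which has no axis parameter and always works column by column (the axis=0 behaviour). The numbers in X are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical single-feature column with one missing entry (row index 3).
X = np.array([[1.0], [2.0], [2.0], [np.nan], [10.0]])

filled = {}
for strategy in ("mean", "median", "most_frequent"):
    imp = SimpleImputer(missing_values=np.nan, strategy=strategy)
    # Record the value that replaced the NaN under this strategy.
    filled[strategy] = float(imp.fit_transform(X)[3, 0])

# Observed values are 1, 2, 2, 10: mean 3.75, median 2.0, mode 2.0.
print(filled)
```

The outlier 10 pulls the mean well above the median and the mode, which is worth keeping in mind when choosing a strategy for skewed features.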

3. Usage example

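# Some of the data may contain missing values, so fill them with Imputer.
# Note: Imputer exists only in scikit-learn < 0.22 (newer versions: sklearn.impute.SimpleImputer).
from sklearn.preprocessing import Imputer

# An Imputer instance must be created first; its parameters can be adjusted.
imp = Imputer(missing_values="NaN", strategy="most_frequent", axis=0)
# Fit on the DataFrame df and fill its missing values; the result is df_train_tsf.
df_train_tsf = imp.fit_transform(df)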

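Note: Imputer expects two-dimensional input such as a DataFrame, so a single column (a Series) has to be converted to 2-D first. The result of fit_transform is a NumPy array, not a DataFrame.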
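For readers on scikit-learn 0.22 or later, the same step might look like the sketch below. It uses SimpleImputer in place of Imputer and wraps the returned array back into a DataFrame; the column names and values in df are hypothetical and stand in for the real training data.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical training data with missing values in both columns.
df = pd.DataFrame({
    "age": [25.0, 30.0, np.nan, 30.0],
    "income": [3000.0, np.nan, 4000.0, 4000.0],
})

# Column-wise imputation with the most frequent value, as in the example above.
imp = SimpleImputer(missing_values=np.nan, strategy="most_frequent")
arr = imp.fit_transform(df)  # a NumPy array by default

# Restore the column names and index so downstream code still sees a DataFrame.
df_train_tsf = pd.DataFrame(arr, columns=df.columns, index=df.index)
print(df_train_tsf)
```

Once no NaN remains, the features can be passed on to the oversampler, for example imbalanced-learn's SMOTE().fit_resample(X, y), assuming that package is the SMOTE implementation in use.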
