How to handle missing values with pandas

zoujiahui_2018

Category: python

Article link: https://blog.csdn.net/qq_18055167/article/details/105123831

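Contents

Checking for missing values in a Series or a single variable
Checking for missing values in a DataFrame
Dropping DataFrame rows or columns that contain NaN
Simple filling of missing values
Imputing missing values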

Checking for missing values in a Series or a single variable

import numpy as np
# ... earlier code goes here ...
# Check whether the variable number is NaN
np.isnan(number)
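
As a minimal sketch of the check above, the following compares np.isnan on a scalar with pandas' own pd.isna and Series.isna. The sample values number and s are made up for illustration and are not from the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
number = float("nan")
s = pd.Series([1.0, np.nan, 3.0])

scalar_is_nan = np.isnan(number)     # works for float scalars and float arrays
none_is_missing = pd.isna(None)      # pd.isna also treats None as missing
series_mask = s.isna()               # element-wise boolean Series
n_missing = int(series_mask.sum())   # number of missing entries in the Series

print(scalar_is_nan, none_is_missing, n_missing)
print(series_mask.tolist())
```

np.isnan raises a TypeError on inputs such as None or strings, so pd.isna is usually the safer choice when the type of the value is not known in advance.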

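Checking for missing values in a DataFrame

Missing values come up constantly in statistical analysis. In R, is.na(), is.nan() and is.infinite() identify missing values, not-a-number values and infinite values respectively. The null value is None in Python and null in Java, but pandas displays missing values as NaN. In pandas, df.isnull() or df.isna() (the two are aliases) returns a boolean mask that marks which entries are missing.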
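
The mask returned by df.isna() can be summarized in a few common ways, sketched below. The DataFrame and its column names are invented for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with missing entries in numeric and text columns.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": ["x", None, "z"],
})

mask = df.isna()                       # boolean DataFrame, True where a value is missing
missing_per_column = mask.sum()        # number of missing values in each column
rows_with_missing = mask.any(axis=1)   # True for rows with at least one missing value
total_missing = int(mask.sum().sum())  # total number of missing values

print(missing_per_column)
print(rows_with_missing.tolist(), total_missing)
```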

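Dropping DataFrame rows or columns that contain NaN

In pandas, df.dropna(axis=0) drops every row that contains a missing value, and df.dropna(axis=1) drops every column that contains one. Both return a new DataFrame and leave df unchanged.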
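
A small sketch of both calls on made-up data follows. The thresh argument shown at the end is an extra option of dropna that the original text does not mention.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "b" is complete, "a" and "c" have gaps.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, 6.0],
    "c": [np.nan, np.nan, 9.0],
})

rows_dropped = df.dropna(axis=0)            # keep only rows with no missing values
cols_dropped = df.dropna(axis=1)            # keep only columns with no missing values
at_least_two = df.dropna(axis=0, thresh=2)  # keep rows with at least 2 non-missing values

print(rows_dropped.index.tolist())
print(cols_dropped.columns.tolist())
print(at_least_two.index.tolist())
```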

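Simple filling of missing values

df.fillna(0)  # fill all missing values with 0
df.ffill()  # fill with the previous value; older pandas: df.fillna(method='ffill'), deprecated in recent releases
df.bfill()  # fill with the next value; older pandas: df.fillna(method='bfill')
df.fillna({'a': 100, 'b': 200, 'c': 300})  # fill each column with its own value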
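
To make the difference between these strategies concrete, here is a short sketch on an invented two-column DataFrame. Note that a forward fill cannot fill a leading gap and a backward fill cannot fill a trailing one.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps at the start, middle and end of the columns.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 5.0, np.nan],
})

filled_zero = df.fillna(0)                           # every gap becomes 0
filled_forward = df.ffill()                          # copy the previous value downward
filled_backward = df.bfill()                         # copy the next value upward
filled_per_column = df.fillna({"a": 100, "b": 200})  # one fill value per column

print(filled_forward)
print(filled_backward)
print(filled_per_column)
```

Each call returns a new DataFrame, so df itself still contains its original missing values afterwards.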

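Imputing missing values

In older versions of sklearn, the Imputer class in sklearn.preprocessing could be used to impute data.
In newer versions of sklearn, use the SimpleImputer class in sklearn.impute instead.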

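The signature below, with its axis argument and its 'NaN' string default, belongs to the old Imputer class. SimpleImputer has no axis parameter, always imputes column by column, and its missing_values defaults to np.nan.

class sklearn.preprocessing.Imputer(missing_values='NaN', strategy='mean', axis=0, verbose=0, copy=True)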

Parameters:

missing_values: integer or "NaN", optional (default="NaN"). The placeholder for the missing values.
strategy: string, optional (default="mean"). The imputation strategy.
If "mean", replace missing values with the mean along the axis.
If "median", replace missing values with the median along the axis.
If "most_frequent", replace missing values with the most frequent value (the mode) along the axis.
axis: integer, default axis=0 (old Imputer class only).
axis = 0: impute along columns.
axis = 1: impute along rows.

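Notes:

The imputer expects 2-D input such as a DataFrame or a 2-D array, not a 1-D Series.
With the "mean" and "median" strategies, every column of the DataFrame must be numeric.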
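
A minimal sketch of mean imputation with the newer SimpleImputer class is shown below, assuming a recent scikit-learn with its default output settings. The DataFrame is invented for illustration, and the result is wrapped back into a DataFrame because fit_transform returns a NumPy array by default.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical all-numeric DataFrame with one gap per column.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [2.0, 4.0, np.nan],
})

# Replace each NaN with the mean of its own column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed = pd.DataFrame(
    imputer.fit_transform(df),  # 2-D NumPy array of imputed values
    columns=df.columns,
    index=df.index,
)

print(imputer.statistics_)  # per-column fill values learned during fit
print(imputed)
```

Fitting on training data and then calling imputer.transform on new data reuses the same column means, which is the usual reason to prefer an imputer object over a one-off df.fillna(df.mean()).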

See also:
https://blog.csdn.net/dss_dssssd/article/details/82831240?depth_1-
