Part 10: Data Analysis, Handling Missing Values with pandas

萌新求大佬

Category: Data Analysis. Tags: Numpy, python, data cleaning, pandas

Article link: https://blog.csdn.net/weixin_43779803/article/details/102874175

In Python you will sometimes run into None. It represents an empty value, but that "empty" value is itself an object.

Missing data in Numpy

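import numpy as np

a = np.array([1, 2, 3, None])  # contains None, so the array falls back to dtype=object
b = np.array([5, 6, 7, 8])     # integers only, so the array gets an integer dtype
print(a.dtype)
print(b.dtype)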

object
int32

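Because a contains a None object, its dtype is reported as object. (b gets the platform's default integer type, int32 here; on many systems it is int64.) For this situation Numpy provides a dedicated way to represent missing data: np.nan.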

c = np.array([1,np.nan,3,4])
print(c.dtype)

float64

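Comparing a and c, the two dtypes are completely different: one is object, the other is a floating-point type. Because np.nan is a float, it can take part in arithmetic just like any other number.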
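To make the difference concrete, here is a small sketch (not from the original post) that tries to sum both arrays. The variable names a_total and c_total are only for illustration. It also shows that nan never compares equal to itself, so np.isnan is the way to test for it.

```python
import numpy as np

a = np.array([1, 2, 3, None])    # dtype=object
c = np.array([1, np.nan, 3, 4])  # dtype=float64

# Summing the object array falls back to Python's "+", which cannot add None.
try:
    a_total = a.sum()
except TypeError as exc:
    a_total = None
    print("object array:", exc)

# Summing the float array works, but the NaN propagates into the result.
c_total = c.sum()
print("float array:", c_total)

# NaN is not equal to anything, including itself, so use np.isnan to detect it.
print(np.nan == np.nan)
print(np.isnan(c))
```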

print(1-np.nan)
print(1+np.nan)
print(1*np.nan)

nan
nan
nan

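As you can see, every arithmetic operation involving np.nan returns nan. If you really do need to compute over such data, Numpy provides NaN-aware functions: np.nansum treats NaN as 0, and np.nanmean ignores the NaN entries.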

print(np.nansum(c))
print(np.nanmean(c))

8.0
2.6666666666666665
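As a rough check of what these two functions do, the same results can be reproduced by hand with a boolean mask. This is only a sketch for understanding; mask, manual_sum and manual_mean are names made up for the example.

```python
import numpy as np

c = np.array([1, np.nan, 3, 4])

# True where the value is a real number, False where it is NaN.
mask = ~np.isnan(c)

# Keep only the valid entries, then aggregate as usual.
manual_sum = c[mask].sum()
manual_mean = c[mask].mean()

print(manual_sum, np.nansum(c))
print(manual_mean, np.nanmean(c))
```

Note that the mean is taken over the three valid values, not over all four positions.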

Handling missing data in Pandas

import numpy as np
import pandas as pd

a = pd.Series([1, np.nan, 2, None, 3])  # both np.nan and None end up as NaN
print(a)
print(a.sum())

0    1.0
1    NaN
2    2.0
3    NaN
4    3.0
dtype: float64
6.0
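To make the behaviour of sum() on missing values explicit, the sketch below compares the default call with skipna=False, and also shows that mean() skips NaN rather than counting it as 0. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, np.nan, 2, None, 3])

default_sum = a.sum()             # NaN entries are skipped (skipna=True)
strict_sum = a.sum(skipna=False)  # NaN propagates into the result

mean_skip = a.mean()              # average of the 3 valid values
mean_zero = a.fillna(0).mean()    # average over 5 values, NaN counted as 0

print(default_sum, strict_sum)
print(mean_skip, mean_zero)
```

For a sum, skipping NaN and treating it as 0 give the same number; for a mean they do not, which is why the distinction matters.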

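Notice that Pandas converts both None and np.nan to NaN, and that sum() skips NaN by default (skipna=True), which for addition gives the same result as treating NaN as 0. Pandas also provides four methods for working with missing data: isnull(), notnull(), dropna() and fillna(). The first three look like this: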

print(a.isnull())
print(a.notnull())
print(a[a.notnull()])
print(a.dropna())

0    False
1     True
2    False
3     True
4    False
dtype: bool
0     True
1    False
2     True
3    False
4     True
dtype: bool
0    1.0
2    2.0
4    3.0
dtype: float64
0    1.0
2    2.0
4    3.0
dtype: float64

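isnull() and notnull() return boolean data; using the returned object as an index filters the original data. dropna() returns a new object with the NaN entries removed. On a DataFrame it drops every row that contains NaN by default, and axis=1 drops columns instead. See the help documentation for the details. The fourth method, fillna(), replaces NaN instead of dropping it: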
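The Series above only has one axis, so here is a minimal sketch of dropna() on a small DataFrame to show the row and column cases. The frame df and its column names x and y are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "x" has one missing value, column "y" has none.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, 6.0]})

rows_kept = df.dropna()        # default axis=0: drop rows containing NaN
cols_kept = df.dropna(axis=1)  # axis=1: drop columns containing NaN

print(rows_kept)
print(cols_kept)
print(df.isnull().sum())       # number of missing values per column
```

In both cases df itself is left unchanged, because dropna() returns a new object.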

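print(a)
print(a.fillna(123))  # replace every NaN with a constant
print(a.ffill())      # forward fill: copy the previous valid value (older code: a.fillna(method='ffill'))
print(a.bfill())      # backward fill: copy the next valid value (older code: a.fillna(method='bfill'))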

0    1.0
1    NaN
2    2.0
3    NaN
4    3.0
dtype: float64
0      1.0
1    123.0
2      2.0
3    123.0
4      3.0
dtype: float64
0    1.0
1    1.0
2    2.0
3    2.0
4    3.0
dtype: float64
0    1.0
1    2.0
2    2.0
3    3.0
4    3.0
dtype: float64
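One edge case the output above does not show: forward fill cannot fill a NaN at the very start, and backward fill cannot fill one at the very end, because there is no neighbouring value to copy. The sketch below uses a made-up Series s to illustrate this, plus filling with the mean as one possible alternative.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a missing value at the start and in the middle.
s = pd.Series([np.nan, 1.0, np.nan, 3.0])

forward = s.ffill()            # leading NaN stays NaN: nothing before it to copy
backward = s.bfill()           # every NaN here has a later value to copy
with_mean = s.fillna(s.mean()) # mean() skips NaN, so this fills with 2.0

print(forward)
print(backward)
print(with_mean)
```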