Detecting and Handling Null Values in a DataFrame

六月闻君

Last modified 2024-03-06 09:49:04

First published 2024-03-06 09:47:04

Article link: https://blog.csdn.net/qq_39065491/article/details/136495892

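While working on a problem that required detecting and handling null values in a DataFrame lookup, I came to understand index-based selection in a DataFrame a little better.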

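1. Data

The code:

import numpy as np
import pandas as pd

# 10 days of non-negative random values in columns A-D
df = pd.DataFrame(abs(np.random.randn(10, 4)), index=pd.date_range('1/1/2023', periods=10),
                  columns=list('ABCD'))
df.index.name='date'

# Set two cells in column D to NaN
df.loc['2023-01-03','D'] = np.nan
df.loc['2023-01-06','D'] = np.nan
print(df)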

Output:

                   A         B         C         D
date                                              
2023-01-01  0.266526  0.106686  0.222914  2.526318
2023-01-02  0.202342  0.563700  0.898678  1.441522
2023-01-03  1.396257  0.469726  0.265863       NaN
2023-01-04  0.297822  0.341915  1.665816  0.838837
2023-01-05  0.158795  0.183489  0.078849  1.550911
2023-01-06  0.760694  0.650400  0.528726       NaN
2023-01-07  1.503186  1.518545  0.354635  0.444899
2023-01-08  1.454724  0.599582  0.336439  0.444891
2023-01-09  1.290514  0.213664  1.852228  0.577434
2023-01-10  0.252777  0.894848  1.002851  0.252982

During processing, the null values in df need to be set to 0.

2. The problem

for i in df.index :
    if df.loc[i,'D'].isnull() :
        df.loc[i,'D'] = 0
        print(i.date(),df.loc[i,'D'])

This raises an error:

  if df.loc[i,'D'].isnull() :
AttributeError: 'numpy.float64' object has no attribute 'isnull'

Check the type of a single cell:

print(type (df.loc['2023-01-03','D'] ))
<class 'numpy.float64'>

Sure enough, it is a numpy.float64.
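
The error follows from what a single-cell lookup returns. As a small sketch (the two-row frame `demo` is a hypothetical stand-in, not the data above), a NumPy scalar carries none of the pandas methods, although a module-level function such as `np.isnan` still accepts it:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: two dates, one NaN in column D.
demo = pd.DataFrame(
    {"A": [1.0, 2.0], "D": [np.nan, 0.5]},
    index=pd.date_range("2023-01-03", periods=2, name="date"),
)

cell = demo.loc["2023-01-03", "D"]    # scalar row label + scalar column label

cell_type = type(cell).__name__
has_isnull = hasattr(cell, "isnull")  # NumPy scalars have no pandas methods
is_nan = bool(np.isnan(cell))         # a module-level function works on scalars

print(cell_type, has_isnull, is_nan)
```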

Try a different form:

print(type(df.loc[df.index=='2023-01-03','D']))

Output:

<class 'pandas.core.series.Series'>

The type has changed: the result is now a Series, no longer a numpy.float64.

Now check whether the isnull method is available:

print(df.loc[df.index=='2023-01-03','D'].isnull())

Output:

date
2023-01-03    True
Freq: D, Name: D, dtype: bool

The isnull method works now.

print(type(df.loc[df.index=='2023-01-03','D'].isnull()))

Output:

<class 'pandas.core.series.Series'>

The result of isnull is still a Series.

Since this Series holds only one value, take its first element:

print(df.loc[df.index=='2023-01-03','D'].isnull().values[0])

Output:

True

Finally, this is the result I wanted.
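
For comparison, here is a sketch of other ways to pull the single boolean out of a one-element Series. The frame `demo` and the variable names are hypothetical, chosen only for illustration; which form reads best is a matter of taste:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: two dates, one NaN in column D.
demo = pd.DataFrame(
    {"D": [np.nan, 0.5]},
    index=pd.date_range("2023-01-03", periods=2, name="date"),
)

# A boolean mask on the index keeps the result a Series, even for one row.
flags = demo.loc[demo.index == "2023-01-03", "D"].isnull()

by_values = flags.values[0]   # the form used above
by_iloc = flags.iloc[0]       # positional access without leaving pandas
by_item = flags.item()        # requires exactly one element, else ValueError

print(by_values, by_iloc, by_item)
```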

3. Understanding

(1) A direct label lookup returns different types

print(type (df.loc['2023-01-03','D'] ))
print(type (df.loc['2023-01-03',['A','D']]))

Output:

<class 'numpy.float64'>
<class 'pandas.core.series.Series'>

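Selecting a single column returns a numpy.float64 scalar; selecting a list of columns automatically returns a Series.
Neat!

A Series does have the isnull method.

df.loc['2023-01-03',['A','D']].isnull()

Output: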

A    False
D     True
Name: 2023-01-03 00:00:00, dtype: bool

Going one step further:

df.loc['2023-01-03',['A','D']].isnull().values

Output:

array([False,  True])

The type here is of course numpy.ndarray.

df.loc['2023-01-03',['A','D']].isnull().values[1]

Output:

True
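
Because the boolean Series is indexed by column name, the same flag can also be read by label rather than by position. A minimal sketch, again on a hypothetical stand-in frame `demo`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: two dates, one NaN in column D.
demo = pd.DataFrame(
    {"A": [1.0, 2.0], "D": [np.nan, 0.5]},
    index=pd.date_range("2023-01-03", periods=2, name="date"),
)

# One row label plus a list of columns gives a Series indexed by column name.
row_flags = demo.loc["2023-01-03", ["A", "D"]].isnull()

d_by_position = row_flags.values[1]  # depends on the order of the column list
d_by_label = row_flags["D"]          # independent of the column order

print(d_by_position, d_by_label)
```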

(2) A conditional lookup on the index returns different types

print(df.loc[df.index=='2023-01-03',['A','D']])

Output:

                   A   D
date                    
2023-01-03  1.405551 NaN

With a conditional lookup, the return type changes again!

print(type(df.loc[df.index=='2023-01-03',['A','D']]))

Output:

<class 'pandas.core.frame.DataFrame'>

Neat!

Testing further:

print(df.loc[df.index=='2023-01-03',['A','D']].isnull())

Output:

                A     D
date                   
2023-01-03  False  True

Still a DataFrame!

Going one step further:

print(df.loc[df.index=='2023-01-03',['A','D']].isnull().values)

Output:

[[False  True]]

print(type(df.loc[df.index=='2023-01-03',['A','D']].isnull().values))

The type is <class 'numpy.ndarray'>.

Extracting the values:

print(df.loc[df.index=='2023-01-03',['A','D']].isnull().values[0])
print(df.loc[df.index=='2023-01-03',['A','D']].isnull().values[0][1])

Output:

[False  True]
True
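
The observations in this section can be collected into one small sketch. The frame `demo` and the dictionary keys are hypothetical, used only to label the four combinations of row and column selectors:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: two dates, one NaN in column D.
demo = pd.DataFrame(
    {"A": [1.0, 2.0], "D": [np.nan, 0.5]},
    index=pd.date_range("2023-01-03", periods=2, name="date"),
)
mask = demo.index == "2023-01-03"   # boolean array, one entry per row

# Row selector, column selector -> what .loc hands back
results = {
    "label, label": demo.loc["2023-01-03", "D"],
    "label, list": demo.loc["2023-01-03", ["A", "D"]],
    "mask, label": demo.loc[mask, "D"],
    "mask, list": demo.loc[mask, ["A", "D"]],
}
kinds = {name: type(value).__name__ for name, value in results.items()}

for name, kind in kinds.items():
    print(f"{name:14s} -> {kind}")
```

In short, each selector that is a single label removes one dimension from the result, while a mask or a list keeps it.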

4. Solution

Two approaches:

for i in df.index :
    if df.loc[df.index == i,'D'].isnull().values[0] :
        df.loc[i,'D'] = 0
    print(i.date(),df.loc[i,'D'])

df.loc['2023-01-03','D'] = np.nan
df.loc['2023-01-06','D'] = np.nan

for i in df.index :
    if pd.isnull(df.loc[i,'D']):
        df.loc[i,'D'] = 1
    print(i.date(),df.loc[i,'D'])

Both are worth working through on your own.
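
As an aside that is not part of the two loops above: when the goal is only to replace NaN with 0, pandas can do it without a loop. A minimal sketch on a hypothetical three-row frame `demo`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: three dates, two NaN values in column D.
demo = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "D": [0.5, np.nan, np.nan]},
    index=pd.date_range("2023-01-01", periods=3, name="date"),
)

# Option 1: fill NaN in one column and assign the result back.
filled = demo.copy()
filled["D"] = filled["D"].fillna(0)

# Option 2: select the NaN rows with a boolean mask and assign to them.
masked = demo.copy()
masked.loc[masked["D"].isnull(), "D"] = 0

print(filled)
```

Both options leave `demo` itself untouched because they work on copies; dropping the `.copy()` calls would modify the original frame instead.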

Note: