Use np.nan to mark missing data
Example:
import pandas as pd
import numpy as np
dates = pd.date_range('20210101',periods=6)
df = pd.DataFrame(np.arange(24).reshape((6,4)),index = dates,columns = ['a','b','c','d'])
df.iloc[0,1] = np.nan
df.iloc[1,2] = np.nan
print(df)
Output:
             a     b     c   d
2021-01-01   0   NaN   2.0   3
2021-01-02   4   5.0   NaN   7
2021-01-03   8   9.0  10.0  11
2021-01-04  12  13.0  14.0  15
2021-01-05  16  17.0  18.0  19
2021-01-06  20  21.0  22.0  23
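Columns b and c print as floats in the output above because NaN is itself a floating-point value, so a column that holds it is stored as float64. A minimal sketch of that behaviour follows; the Series names are illustrative only, and the nullable 'Int64' dtype is shown as one possible alternative, not something the original example uses.

```python
import numpy as np
import pandas as pd

# A Series containing NaN alongside integers is stored as float64.
s_float = pd.Series([1, np.nan, 3])
print(s_float.dtype)

# The nullable 'Int64' extension dtype keeps integers and marks the gap as <NA>.
s_nullable = pd.Series([1, None, 3], dtype='Int64')
print(s_nullable.dtype)
print(s_nullable.isnull().tolist())
```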
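Handling option 1: drop missing data with df.dropna
Example:
print(df.dropna(axis=0,how='any'))  # how='any' drops a row if it contains any NaN; how='all' drops it only if every value is NaN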
Output:
             a     b     c   d
2021-01-03   8   9.0  10.0  11
2021-01-04  12  13.0  14.0  15
2021-01-05  16  17.0  18.0  19
2021-01-06  20  21.0  22.0  23
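The other dropna settings can be compared on the same data. The sketch below rebuilds the frame as float (an assumption made here so that NaN can be assigned without changing the column dtype) and tries how='all', axis=1, and the subset parameter; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20210101', periods=6)
# Built as float up front so assigning NaN does not change any column's dtype.
df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)),
                  index=dates, columns=['a', 'b', 'c', 'd'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

# how='all' drops a row only when every value in it is NaN, so nothing is removed here.
rows_all = df.dropna(axis=0, how='all')

# axis=1 drops columns instead of rows: 'b' and 'c' each contain one NaN.
cols_any = df.dropna(axis=1, how='any')

# subset restricts the check to the listed columns: only the row with NaN in 'b' is dropped.
rows_subset = df.dropna(subset=['b'])

print(rows_all.shape)
print(list(cols_any.columns))
print(rows_subset.shape)
```

Note that dropna returns a new DataFrame and leaves df itself unchanged.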
Handling option 2: fill missing data with fillna
Example:
print(df.fillna(value=0))
Output:
             a     b     c   d
2021-01-01   0   0.0   2.0   3
2021-01-02   4   5.0   0.0   7
2021-01-03   8   9.0  10.0  11
2021-01-04  12  13.0  14.0  15
2021-01-05  16  17.0  18.0  19
2021-01-06  20  21.0  22.0  23
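Filling with a single constant is not always appropriate. As a rough sketch on a small made-up frame (the values and the choice of fill rules are assumptions for illustration), fillna also accepts a dict giving one fill value per column, and ffill carries the previous valid value forward.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with one NaN per column.
df = pd.DataFrame({'b': [np.nan, 5.0, 9.0], 'c': [2.0, np.nan, 10.0]})

# A dict fills each column with its own value: -1 for 'b', the column mean for 'c'.
per_column = df.fillna(value={'b': -1.0, 'c': df['c'].mean()})

# ffill copies the last valid value downward; a NaN in the first row has
# nothing above it and therefore stays NaN.
forward = df.ffill()

print(per_column)
print(forward)
```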
isnull checks, element by element, whether a value is missing
Example:
print(df.isnull())
Output:
                a      b      c      d
2021-01-01  False   True  False  False
2021-01-02  False  False   True  False
2021-01-03  False  False  False  False
2021-01-04  False  False  False  False
2021-01-05  False  False  False  False
2021-01-06  False  False  False  False
Check whether at least one value in the whole DataFrame is missing
Example:
print(np.any(df.isnull()))
Output:
True
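When the answer is True, the next question is usually where the gaps are. A minimal sketch, assuming the same 6x4 data rebuilt as float with a default integer index (variable names are illustrative): the boolean mask from isnull can be reduced to a single flag, a per-column count, or a grand total.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)),
                  columns=['a', 'b', 'c', 'd'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

mask = df.isnull()

# Reduce the underlying boolean array to one flag for the whole frame.
has_missing = bool(mask.values.any())

# Summing booleans counts the True entries, i.e. the NaN values per column.
per_column = mask.sum()

# Summing the per-column counts gives the total number of missing values.
total = int(per_column.sum())

print(has_missing)
print(per_column)
print(total)
```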