Handling missing values with NumPy and pandas

assassin_sword

Category: Python Data Analysis

Article link: https://blog.csdn.net/weixin_41521681/article/details/106099467

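# Option 1: collect the index labels of the rows whose "type" is NaN, then drop them.
# np.isnan needs a numeric column.
index_nan = df_info.index[np.isnan(df_info["type"])]
df_info = df_info.loc[df_info.index.drop(index_nan), :]
del index_nan
# Option 2 (equivalent): keep only the rows whose "type" is not NaN.
df_info = df_info.loc[~np.isnan(df_info["type"]), :]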

A simpler way

# The two lines are equivalent: ~ on a boolean Series does the same thing as np.invert.
df_info = df_info.loc[~np.isnan(df_info["type"]), :]
df_info = df_info.loc[np.invert(np.isnan(df_info["type"])), :]
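The `np.isnan` filter only works when the column is numeric. As a rough sketch, pandas' own `notna()` and `dropna(subset=...)` give the same rows and also work on non-numeric columns. The contents of `df_info` below are made up for illustration, since the article does not show the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the article's real df_info is not shown.
df_info = pd.DataFrame(
    {"name": ["a", "b", "c", "d"], "type": [1.0, np.nan, 3.0, np.nan]}
)

# Boolean-mask version, as in the article.
by_mask = df_info.loc[~np.isnan(df_info["type"]), :]

# pandas-native equivalents.
by_notna = df_info[df_info["type"].notna()]
by_dropna = df_info.dropna(subset=["type"])

print(by_dropna)
```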

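List comprehension
# Test the value rather than identity: "i is not np.nan" misses NaNs that are not the np.nan object itself.
a = [i for i in a if not (isinstance(i, float) and np.isnan(i))]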
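An identity test such as `i is not np.nan` only matches the `np.nan` object itself, so a NaN created elsewhere slips through. The small sketch below, with a made-up list `values`, shows the difference between the identity check and a value check.

```python
import numpy as np

# Hypothetical data: two NaNs, only one of which is the np.nan object.
values = [1.0, np.nan, float("nan"), 2.0]

# Identity check: removes only the np.nan object itself.
by_identity = [v for v in values if v is not np.nan]

# Value check: removes every NaN.
by_value = [v for v in values if not np.isnan(v)]

print(len(by_identity))  # 3, the float("nan") is still there
print(by_value)          # [1.0, 2.0]
```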

Handling missing values in pandas

#coding=utf-8
import numpy as np
import pandas as pd

# Create a DataFrame
df = pd.DataFrame(np.arange(12, 32).reshape((5, 4)), index=["a", "b", "c", "d", "e"], columns=["WW", "XX", "YY", "ZZ"])
df.loc[["b"], ["YY"]] = np.nan   # NaN is a float, so the column is automatically converted to float.
df.loc[["d"],["XX"]] = np.nan
print(df)

   WW    XX    YY  ZZ
a  12  13.0  14.0  15
b  16  17.0   NaN  19
c  20  21.0  22.0  23
d  24   NaN  26.0  27
e  28  29.0  30.0  31
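The float columns in the output above follow from NaN being a float. As a minimal sketch, the Series below (made-up values) show the default dtype with and without NaN, plus pandas' nullable `Int64` dtype, which is one way to keep integers alongside missing values.

```python
import numpy as np
import pandas as pd

ints = pd.Series([12, 16, 20])                        # no missing values
with_nan = pd.Series([12, np.nan, 20])                # NaN forces float64
nullable = pd.Series([12, None, 20], dtype="Int64")   # nullable integer dtype

print(ints.dtype)      # int64
print(with_nan.dtype)  # float64
print(nullable.dtype)  # Int64
```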

Checking for NaN in pandas
print(pd.isnull(df))

      WW     XX     YY     ZZ
a  False  False  False  False
b  False  False   True  False
c  False  False  False  False
d  False   True  False  False
e  False  False  False  False

Checking for non-NaN values in pandas
print(pd.notnull(df))

     WW     XX     YY    ZZ
a  True   True   True  True
b  True   True  False  True
c  True   True   True  True
d  True  False   True  True
e  True   True   True  True

You can also check a single column for NaN
print(pd.notnull(df["XX"]))

a     True
b     True
c     True
d    False
e     True
Name: XX, dtype: bool
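The boolean masks above are often summed to count missing values. The sketch below rebuilds the same `df` from a dict so that it runs on its own; the variable names `nan_per_column`, `nan_per_row` and `has_any_nan` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "WW": [12, 16, 20, 24, 28],
        "XX": [13, 17, 21, np.nan, 29],
        "YY": [14, np.nan, 22, 26, 30],
        "ZZ": [15, 19, 23, 27, 31],
    },
    index=["a", "b", "c", "d", "e"],
)

# True counts as 1, so summing the mask counts the NaNs.
nan_per_column = df.isnull().sum()
nan_per_row = df.isnull().sum(axis=1)
has_any_nan = df.isnull().values.any()

print(nan_per_column)
```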

Boolean indexing
print(df[pd.notnull(df["YY"])]) # select every row whose "YY" value is not NaN

   WW    XX    YY  ZZ
a  12  13.0  14.0  15
c  20  21.0  22.0  23
d  24   NaN  26.0  27
e  28  29.0  30.0  31
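Masks can also be combined. As a sketch on the same data (rebuilt here so the block is self-contained), `&` keeps the rows where both columns are present, and `any(axis=1)` picks out the rows that contain at least one NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "WW": [12, 16, 20, 24, 28],
        "XX": [13, 17, 21, np.nan, 29],
        "YY": [14, np.nan, 22, 26, 30],
        "ZZ": [15, 19, 23, 27, 31],
    },
    index=["a", "b", "c", "d", "e"],
)

# Rows where both "XX" and "YY" are present.
complete_xy = df[df["XX"].notnull() & df["YY"].notnull()]

# Rows that contain at least one NaN in any column.
rows_with_nan = df[df.isnull().any(axis=1)]

print(rows_with_nan)
```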

NaN handling, option 1: drop
df1 = df.dropna(axis=0) # axis=0 drops rows, axis=1 drops columns
print(df1)
```
   WW    XX    YY  ZZ
a  12  13.0  14.0  15
c  20  21.0  22.0  23
e  28  29.0  30.0  31
```
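df2 = df.dropna(axis=0, how="all") # how="all" drops a row (or column) only if all of its values are NaN; how="any" (the default) drops it if any value is NaN.
df.dropna(axis=0, inplace=True) # inplace=True modifies df itself instead of returning a new object (default: False). After this call df no longer contains any NaN.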
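Besides `axis` and `how`, `dropna` also accepts `subset` and `thresh`. The sketch below tries each option on the same data, rebuilt here so the block runs on its own; the result names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "WW": [12, 16, 20, 24, 28],
        "XX": [13, 17, 21, np.nan, 29],
        "YY": [14, np.nan, 22, 26, 30],
        "ZZ": [15, 19, 23, 27, 31],
    },
    index=["a", "b", "c", "d", "e"],
)

cols_clean = df.dropna(axis=1)            # drop columns that contain NaN
yy_present = df.dropna(subset=["YY"])     # only look at "YY" when deciding
at_least_4 = df.dropna(thresh=4)          # keep rows with >= 4 non-NaN values
all_nan_only = df.dropna(how="all")       # no row is entirely NaN, so nothing is dropped

print(cols_clean.columns.tolist())  # ['WW', 'ZZ']
```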
NaN handling, option 2: fill
df2 = df.fillna(100) # fill every NaN with 100
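- Fill with the mean
  df3 = df.fillna(df.mean()) # df.mean() is a Series holding the mean of each column; use df.median() for the median
- Fill a single column only
  df4 = df["YY"].fillna(df["YY"].mean()) # df["YY"].mean() is a scalar; df4 is a new Series and df is unchanged
  df["YY"] = df["YY"].fillna(df["YY"].mean()) # fill one column and assign it back to the same column of df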
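`fillna` also accepts a dict that maps column names to fill values, which allows a different rule per column in one call. The sketch below uses that form and, as an alternative, forward filling with `ffill()`; the data is rebuilt so the block is self-contained.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "WW": [12, 16, 20, 24, 28],
        "XX": [13, 17, 21, np.nan, 29],
        "YY": [14, np.nan, 22, 26, 30],
        "ZZ": [15, 19, 23, 27, 31],
    },
    index=["a", "b", "c", "d", "e"],
)

# A different fill value per column: mean for "XX", median for "YY".
filled = df.fillna({"XX": df["XX"].mean(), "YY": df["YY"].median()})

# Forward fill: copy the previous row's value into each gap.
forward = df.ffill()

print(filled.loc["d", "XX"])   # 20.0
print(filled.loc["b", "YY"])   # 24.0
print(forward.loc["b", "YY"])  # 14.0
```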
When computing the mean, NaN is skipped but 0 is counted
df[df==0] = np.nan # if 0 stands for a missing value, convert it to NaN first so that df.mean() ignores it
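A quick check of this behaviour, using a small made-up Series in which 0 stands for a missing reading:

```python
import numpy as np
import pandas as pd

# Hypothetical readings where 0 marks a missing value.
s_zero = pd.Series([10.0, 0.0, 20.0])
s_nan = s_zero.replace(0, np.nan)

print(s_zero.mean())             # 10.0, the 0 is counted
print(s_nan.mean())              # 15.0, the NaN is skipped
print(s_nan.mean(skipna=False))  # nan, NaN is included on request
```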
