df.dropna(): filtering out missing data

一只山

Tags: python, data analysis

Article link: https://blog.csdn.net/qq_44721834/article/details/121948192

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

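Parameters:

axis: {0 or 'index', 1 or 'columns'}, default 0. Determines whether rows (0) or columns (1) that contain missing values are removed. Changed in version 1.0.0: passing a tuple or list to drop on multiple axes is no longer supported; only a single axis is allowed.

how: {'any', 'all'}, default 'any'. Determines whether a row or column is removed when it has at least one NA value or only when all of its values are NA. 'any': drop the row or column if any NA value is present. 'all': drop the row or column only if all of its values are NA.

thresh: int, optional. The minimum number of non-NA values a row or column must have in order to be kept. The pandas documentation notes that it cannot be combined with how.

subset: array-like, optional. Labels along the other axis to consider. For example, when dropping rows, this is the list of columns in which to look for missing values.

inplace: bool, default False. If True, the DataFrame is modified in place and the call returns None; otherwise a new DataFrame is returned.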
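One way to read how, thresh and subset is as conditions on the count of non-NA values in each row. The sketch below is only an illustration of that reading, not how pandas implements dropna internally; the frame demo and its column names a, b, c are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is complete, row 1 has one NaN,
# row 2 has two NaN, row 3 is entirely NaN.
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, np.nan],
    "b": [1.0, np.nan, np.nan, np.nan],
    "c": [1.0, 2.0, np.nan, np.nan],
})

present = demo.notna()                 # True where a value is not missing
non_na_per_row = present.sum(axis=1)   # number of non-NA values in each row

# Boolean-mask equivalents of the dropna parameters (rows, axis=0).
any_equiv = demo[present.all(axis=1)]              # how='any'
all_equiv = demo[present.any(axis=1)]              # how='all'
thresh_equiv = demo[non_na_per_row >= 2]           # thresh=2
subset_equiv = demo[present[["a"]].all(axis=1)]    # subset=['a']

print(any_equiv.equals(demo.dropna(how="any")))
print(all_equiv.equals(demo.dropna(how="all")))
print(thresh_equiv.equals(demo.dropna(thresh=2)))
print(subset_equiv.equals(demo.dropna(subset=["a"])))
```

Each mask selects the rows that dropna would keep, which makes it easier to see that thresh counts the values that are present rather than the values that are missing.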

Example from the official documentation

Code:

import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})
df

Output:

       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Dropping rows that contain missing data

Code:

df.dropna()

Output:

     name        toy       born
1  Batman  Batmobile 1940-04-25

At this point df itself is unchanged, because inplace defaults to False and dropna() returned a new DataFrame:

       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
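The difference between the two calling styles can be checked directly. This is a small sketch that rebuilds the sample frame so it runs on its own; the variable names cleaned, rows_before, result and rows_after are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Alfred", "Batman", "Catwoman"],
                   "toy": [np.nan, "Batmobile", "Bullwhip"],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

# Default call: a new DataFrame comes back and df keeps all of its rows.
cleaned = df.dropna()
rows_before = len(df)

# inplace=True: df itself is modified and the call returns None.
result = df.dropna(inplace=True)
rows_after = len(df)

print(len(cleaned), rows_before, result, rows_after)
```

Because the in-place call returns None, writing df = df.dropna(inplace=True) would replace df with None. Use either the assignment form or the inplace form, not both.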

The remaining parameters

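df.dropna(axis='columns')          # drop columns that contain missing values
df.dropna(how='all')               # drop rows in which every value is missing
df.dropna(thresh=2)                # keep only rows with at least 2 non-NA values
df.dropna(subset=['name', 'born']) # check only the name and born columns; drop a row if either is missing
df.dropna(inplace=True)            # modify df in place (the call returns None)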
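To see what each of these calls keeps on the sample data, the sketch below applies them to a fresh copy of the frame and records the surviving labels. The result variable names are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Alfred", "Batman", "Catwoman"],
                   "toy": [np.nan, "Batmobile", "Bullwhip"],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

# Only "name" has no missing values, so it is the only column left.
cols_left = list(df.dropna(axis="columns").columns)

# No row is entirely missing ("name" is always present), so nothing is dropped.
rows_how_all = list(df.dropna(how="all").index)

# Row 0 has a single non-NA value (its name), so it falls below the threshold.
rows_thresh = list(df.dropna(thresh=2).index)

# "born" is NaT in rows 0 and 2, so only row 1 survives.
rows_subset = list(df.dropna(subset=["name", "born"]).index)

print(cols_left, rows_how_all, rows_thresh, rows_subset)
```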

A note on the types of missing values

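Missing value	Type	Description
None	NoneType	None is distinct from an empty list or an empty string; it is a type of its own.
NaN	float	Short for Not a Number. It is an IEEE 754 floating-point value, available as np.nan in NumPy and as float('nan') or math.nan in plain Python.
Null	-	Python has no NULL. NULL comes mainly from C; the Python counterpart is None.
NaT	datetime (pandas NaTType)	The missing value for datetime-like data, short for Not a Time.
""	str	Empty string. It is an ordinary string value, so dropna() does not treat it as missing.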
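The table can be checked with pd.isna, which reports the values pandas considers missing. The sketch below also shows one possible way to handle empty strings, by converting them to NaN before calling dropna; the frame texts and its column text are hypothetical.

```python
import numpy as np
import pandas as pd

# Which of the values from the table does pandas treat as missing?
missing_flags = {
    "None": pd.isna(None),
    "np.nan": pd.isna(np.nan),
    "float('nan')": pd.isna(float("nan")),
    "pd.NaT": pd.isna(pd.NaT),
    "empty string": pd.isna(""),
}
print(missing_flags)

# Hypothetical column mixing a real value, an empty string, None and NaN.
texts = pd.DataFrame({"text": ["a", "", None, np.nan]})

# The empty string survives dropna(); None and NaN do not.
kept = texts.dropna()

# Assumed cleanup step: turn empty strings into NaN first, then drop.
cleaned = texts.replace("", np.nan).dropna()

print(list(kept.index), list(cleaned.index))
```

Whether an empty string should count as missing depends on the data, so the replace step is a choice made for this example rather than something dropna does on its own.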