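Dropping rows that have missing values in a column with pandas: handling missing values with dropna

Data analysis almost always runs into missing values. pandas provides a dedicated method for removing them, dropna(), which this post walks through in detail.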


dropna()


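Key points to remember:

  • The first parameter to decide on is axis: 0 drops rows, 1 drops columns
  • When inplace=True, how="all" is the safer setting, since it only removes rows or columns that are entirely empty
  • Prefer the default behavior of returning a new object rather than modifying the original data
  • Pass subset whenever possible so the check targets specific columns
  • thresh is the minimum number of non-null values required; rows or columns with fewer are dropped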

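First, check whether the data contains missing values:

  • isna()

df.isna()

Result: a boolean DataFrame of the same shape as df, with True wherever a value is missing.

  • isnull()

df.isnull()

Result: identical to isna(), because isnull() is an alias of isna().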

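Check whether each column contains any missing value:

  • isna().any() or isnull().any(); the two are equivalent

df.isnull().any()

Result: a boolean Series with one entry per column, True if that column contains at least one missing value.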

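Check whether a particular column contains missing values:

df['toy'].isnull()

df['toy'].isnull().any()

Result: the first expression returns a boolean Series marking each missing entry in toy; the second reduces it to a single True or False.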
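The checks above can be collected into one runnable script. This is a minimal sketch, assuming the three-row sample frame that the dropna() examples in this post use; the variable names are illustrative, and the sum() count of missing values per column is an addition that the original post does not show.

```python
import numpy as np
import pandas as pd

# Sample frame, assumed to match the one used in the dropna() examples.
df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
})

# isna() and isnull() are aliases: both return the same boolean mask.
same_mask = df.isna().equals(df.isnull())

# Per column: does it contain at least one missing value?
has_missing = df.isnull().any()

# Per column: how many values are missing (True counts as 1).
missing_counts = df.isnull().sum()

# Single column: does toy contain at least one missing value?
toy_has_missing = df["toy"].isnull().any()

print(same_mask)
print(has_missing)
print(missing_counts)
print(toy_has_missing)
```

Counting with sum() is often more useful than any() before calling dropna(), because it shows how much data each option would discard.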

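Now for dropna() itself:

DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False)

axis: the axis to drop along; default axis=0

With axis=0, any row containing a missing value is dropped and the remaining rows are returned

With axis=1, any column containing a missing value is dropped and the remaining columns are returned

how: how many missing values trigger a drop; default how='any'

how='any' drops a row or column as soon as it contains one missing value

how='all' drops a row or column only when every value in it is missing

thresh: threshold

A row or column is dropped when its number of non-missing values is less than the given integer

subset: restrict the check to certain labels

subset=['a', 'd'] drops the rows that have a missing value in column a or column d

inplace: bool, default False

With inplace=True the original data is modified and the method returns None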


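Worked examples:

DataFrame.dropna() with default settings:

import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})

# Default: drop every row that contains at least one missing value
rs = df.dropna()
print(df)
print("="*40)
print(rs)

Result: df is printed unchanged, and rs contains only the Batman row, the only row without a missing value.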

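axis:

The default is 0, which drops rows containing missing values; axis=1 drops columns containing missing values

rs = df.dropna(axis=1)

Result: only the name column remains, because toy and born each contain a missing value.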

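how: default 'any'

how='any' drops a row or column as soon as it contains one missing value

how='all' drops a row or column only when every value in it is missing

Rebuild the data, adding one all-missing column and one all-missing row:

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})
df["p"] = np.nan       # new column p, entirely missing
df.loc["4"] = np.nan   # new row labeled "4", entirely missing

how="all":

axis=1:

rs = df.dropna(axis=1, how="all")

Result: column p is dropped; the all-missing row "4" is kept.

axis=0:

rs = df.dropna(axis=0, how="all")

Result: row "4" is dropped; column p is kept.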
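The two how="all" calls can be verified by inspecting which labels survive. This is a sketch under one assumption: instead of enlarging the frame with df["p"] and df.loc["4"], it builds an equivalent four-row frame directly, so the all-missing column p and the all-missing row "4" are present from the start. The names cols_kept and rows_kept are illustrative.

```python
import numpy as np
import pandas as pd

# Frame with one all-missing column (p) and one all-missing row ("4"),
# built directly rather than by enlarging a smaller frame.
df = pd.DataFrame(
    {
        "name": ["Alfred", "Batman", "Catwoman", np.nan],
        "toy": [np.nan, "Batmobile", "Bullwhip", np.nan],
        "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT, pd.NaT],
        "p": [np.nan, np.nan, np.nan, np.nan],
    },
    index=[0, 1, 2, "4"],
)

# Drop columns in which every value is missing: only p qualifies.
cols_kept = df.dropna(axis=1, how="all")

# Drop rows in which every value is missing: only row "4" qualifies.
rows_kept = df.dropna(axis=0, how="all")

print(list(cols_kept.columns))
print(list(rows_kept.index))
```

Each call acts on one axis only, so removing both the empty column and the empty row takes two calls, for example df.dropna(axis=1, how="all").dropna(axis=0, how="all").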

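thresh:

The minimum number of non-null values a row or column must have to be kept; what counts is the number of non-null values

Pass an integer: a row or column with fewer non-null values than this is dropped, and one with that many or more is kept

Keep each row that has at least 1 non-null value; rows that are entirely missing are dropped

rs = df.dropna(axis=0, thresh=1)

Result: only the all-missing row "4" is dropped.

Keep each row that has at least 2 non-null values; rows with fewer are dropped

rs = df.dropna(axis=0, thresh=2)

Result: the Alfred row (only name is non-null) and row "4" are dropped, leaving the Batman and Catwoman rows.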
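One way to read thresh is as a filter on the per-row count of non-null values. The sketch below compares dropna(thresh=2) with a manual filter built from DataFrame.count(axis=1); the frame is constructed directly as an assumed equivalent of the one above, and the names non_null_per_row, kept and manual are illustrative.

```python
import numpy as np
import pandas as pd

# Assumed equivalent of the frame above: column p and row "4" are all missing.
df = pd.DataFrame(
    {
        "name": ["Alfred", "Batman", "Catwoman", np.nan],
        "toy": [np.nan, "Batmobile", "Bullwhip", np.nan],
        "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT, pd.NaT],
        "p": [np.nan, np.nan, np.nan, np.nan],
    },
    index=[0, 1, 2, "4"],
)

# Number of non-null values in each row.
non_null_per_row = df.count(axis=1)

# thresh=2 keeps the rows with at least 2 non-null values ...
kept = df.dropna(axis=0, thresh=2)

# ... which matches filtering on the count by hand.
manual = df[non_null_per_row >= 2]

print(non_null_per_row.to_dict())
print(list(kept.index))  # [1, 2]
```

Printing the counts first makes it easier to pick a threshold, since it shows exactly which rows fall below each candidate value.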

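subset: takes labels along the other axis; when dropping rows (axis=0), pass column labels

subset=['a', 'd'] drops the rows that have a missing value in column a or column d

Drop the rows where toy is missing:

rs = df.dropna(axis=0, subset=["toy"])

Result: the Alfred row and row "4" are dropped, because toy is missing in both.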
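For a single column, subset behaves like a boolean filter on notna(), and listing several columns tightens the filter. The sketch below illustrates both, again on a directly constructed frame assumed to match the one above; by_toy, by_toy_mask and by_toy_and_born are illustrative names.

```python
import numpy as np
import pandas as pd

# Assumed equivalent of the frame above: column p and row "4" are all missing.
df = pd.DataFrame(
    {
        "name": ["Alfred", "Batman", "Catwoman", np.nan],
        "toy": [np.nan, "Batmobile", "Bullwhip", np.nan],
        "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT, pd.NaT],
        "p": [np.nan, np.nan, np.nan, np.nan],
    },
    index=[0, 1, 2, "4"],
)

# Only toy is inspected; missing values in born and p are ignored.
by_toy = df.dropna(axis=0, subset=["toy"])

# The same selection written as a boolean mask.
by_toy_mask = df[df["toy"].notna()]

# With the default how='any', a row is dropped if toy OR born is missing.
by_toy_and_born = df.dropna(axis=0, subset=["toy", "born"])

print(list(by_toy.index))           # [1, 2]
print(list(by_toy_and_born.index))  # [1]
```

Without subset, the same call would drop every row here, because column p is missing everywhere.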

inplace:

By default a new object is returned; to modify the original data instead, set inplace=True

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})
df["p"] = np.nan
df.loc["4"] = np.nan
print(df)
print("="*40)
# rs = df.dropna(axis=0, subset=["toy"])
df.dropna(axis=0, subset=["toy"], inplace=True)   # modifies df and returns None
print(df)

Result: after the call, df itself contains only the Batman and Catwoman rows.
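The difference between the two modes shows up in the return value. This is a minimal sketch using the three-row sample frame; copy_result, rows_before and ret are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
})

# Default mode: a new DataFrame is returned and df keeps all its rows.
copy_result = df.dropna(subset=["toy"])
rows_before = len(df)

# In-place mode: df itself loses the row and the call returns None.
ret = df.dropna(subset=["toy"], inplace=True)

print(rows_before, len(df))  # 3 2
print(ret)                   # None
```

Because the in-place call returns None, writing df = df.dropna(..., inplace=True) replaces the frame with None; assign the result only when inplace is left at its default.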

Recommended reading: https://blog.csdn.net/ping0912/article/details/86296365

Official documentation (English):

DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False)

Remove missing values.

See the User Guide for more on which values are considered missing, and how to work with missing data.

Parameters:

axis : {0 or ‘index’, 1 or ‘columns’}, default 0

Determine if rows or columns which contain missing values are removed.

  • 0, or ‘index’ : Drop rows which contain missing values.
  • 1, or ‘columns’ : Drop columns which contain missing value.

Deprecated since version 0.23.0: Pass tuple or list to drop on multiple axes. Only a single axis is allowed.

how : {‘any’, ‘all’}, default ‘any’

Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.

  • ‘any’ : If any NA values are present, drop that row or column.
  • ‘all’ : If all values are NA, drop that row or column.

thresh : int, optional

Require that many non-NA values.

subset : array-like, optional

Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.

inplace : bool, default False

If True, do operation inplace and return None.
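Two details of the parameter descriptions above are easy to miss: axis accepts the strings 'index' and 'columns' as well as 0 and 1, and when columns are being dropped subset names row labels. The sketch below illustrates both on the three-row sample frame; all variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
})

# The string aliases select the same axis as the integers.
rows_int = df.dropna(axis=0)
rows_str = df.dropna(axis="index")
cols_int = df.dropna(axis=1)
cols_str = df.dropna(axis="columns")

# When dropping columns, subset names ROW labels (the other axis).
# Row 1 has no missing value, so every column is kept.
cols_by_row1 = df.dropna(axis="columns", subset=[1])

# Row 0 is missing toy and born, so only name is kept.
cols_by_row0 = df.dropna(axis="columns", subset=[0])

print(list(cols_by_row1.columns))  # ['name', 'toy', 'born']
print(list(cols_by_row0.columns))  # ['name']
```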
