pandas Missing Values: Brief Notes

Statistics and Delete

Statistics

df.isna()
df.isna().mean()
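As a minimal sketch of the two calls above, assuming a small made-up DataFrame (the column names and values are illustrative only), the Boolean mask from isna() can be summed or averaged to get per-column counts and ratios of missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the notes
df = pd.DataFrame({"a": [1.0, np.nan, 3.0, np.nan], "b": ["x", None, "z", "w"]})

mask = df.isna()             # Boolean DataFrame, True where a value is missing
missing_count = mask.sum()   # number of missing values per column
missing_ratio = mask.mean()  # fraction of missing values per column

print(missing_count)
print(missing_ratio)
```

Averaging works because True counts as 1 and False as 0.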

Delete

df.dropna(axis, how, subset)
df.dropna(axis, thresh)

Usage 1
subset is a list of index labels (or column labels).
how is a str in ['any', 'all'];
it indicates whether any or all of the values
at the labels in subset must be missing
for the column (or row) to be deleted.

Usage 2
thresh is a threshold value:
when the number of non-NaN values is lower than thresh,
the row (or column) is deleted.
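The two usages can be sketched as follows, assuming a made-up DataFrame with columns "a", "b", and "c" (illustrative names only). With axis=0, rows are dropped and subset refers to column labels:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [1.0, 2.0, np.nan, np.nan],
    "c": [1.0, 2.0, np.nan, 4.0],
})

# Usage 1: drop a row if any of "a", "b" is missing
any_dropped = df.dropna(axis=0, how="any", subset=["a", "b"])

# Usage 1: drop a row only if both "a" and "b" are missing
all_dropped = df.dropna(axis=0, how="all", subset=["a", "b"])

# Usage 2: keep only rows that have at least 2 non-NaN values
thresh_dropped = df.dropna(axis=0, thresh=2)

print(any_dropped.index.tolist())     # [0]
print(all_dropped.index.tolist())     # [0, 1, 3]
print(thresh_dropped.index.tolist())  # [0, 1, 3]
```

how and thresh are kept in separate calls here because recent pandas versions do not accept both at once.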

Fill and Interpolate

Fill

s.fillna(method, limit)
s.fillna(value)
s.fillna({index: value})

method is a str in ['ffill', 'bfill']:
'ffill' fills forward with the previous valid value, and 'bfill' fills backward with the next valid value.
limit is the maximum number of consecutive NaNs to fill; by default there is no limit.
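A minimal sketch of the three fill styles, assuming a made-up Series with string labels. Recent pandas versions deprecate the method argument of fillna, so this sketch uses the equivalent s.ffill() and s.bfill() methods instead:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
s = pd.Series([1.0, np.nan, np.nan, 4.0], index=["a", "b", "c", "d"])

forward = s.ffill(limit=1)  # fill forward, at most 1 consecutive NaN: [1, 1, NaN, 4]
backward = s.bfill()        # fill backward with the next valid value: [1, 4, 4, 4]
constant = s.fillna(0)      # fill every NaN with one value: [1, 0, 0, 4]
per_label = s.fillna({"b": 10.0, "c": 20.0})  # fill by index label: [1, 10, 20, 4]

print(forward.tolist())
print(per_label.tolist())
```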

Interpolate

s.interpolate(method, limit_direction, limit)

method : str, e.g. 'linear', 'index', 'nearest'; default 'linear'. Interpolation technique to use.
limit : int, optional. Maximum number of consecutive NaNs to fill.
limit_direction : {'forward', 'backward', 'both'}, optional. Consecutive NaNs will be filled in this direction.
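The parameters above can be sketched on a made-up Series with an unevenly spaced integer index (values chosen only for illustration). 'linear' treats the values as equally spaced, while 'index' uses the index values as the x-coordinates:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with an uneven index
s = pd.Series([1.0, np.nan, np.nan, 4.0], index=[0, 1, 2, 6])

linear = s.interpolate(method="linear")   # ignores the index: [1, 2, 3, 4]
by_index = s.interpolate(method="index")  # uses the index:    [1, 1.5, 2, 4]

# Fill at most one NaN per gap, counting backward from the next valid value
limited = s.interpolate(method="linear", limit=1, limit_direction="backward")

print(linear.tolist())
print(by_index.tolist())
print(limited.tolist())  # [1.0, nan, 3.0, 4.0]
```

'nearest' is left out of this sketch because it relies on SciPy being installed.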

Nullable

Normal Missing

In a normal series, np.nan is stored as NaN, a float value.
In a time series, np.nan is converted to NaT, a datetime value.

np.nan == np.nan  # False
np.nan is None  # False
np.nan is False  # False

pd.Series([np.nan]).equals(pd.Series([np.nan])) # True

For the equals method, NaNs in the same position are treated as equal.
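A short sketch of these points, assuming two made-up series (one numeric, one holding a timestamp):

```python
import numpy as np
import pandas as pd

numeric = pd.Series([1, np.nan])                         # float64, holds NaN
dates = pd.Series([pd.Timestamp("2020-01-01"), np.nan])  # datetime dtype, holds NaT

second_is_nat = dates.iloc[1] is pd.NaT  # True: np.nan was converted to NaT

elementwise = numeric == numeric       # NaN != NaN, so the second entry is False
same = numeric.equals(numeric.copy())  # True: equals treats aligned NaNs as equal

print(numeric.dtype, dates.dtype)
print(elementwise.tolist())  # [True, False]
print(same)                  # True
```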

Nullable Series

pd.NA is a newer missing-value marker that has no fixed type.
Boolean operations on pd.NA follow three-valued logic: the result is pd.NA only when it cannot be determined from the other operand.

pd.NA | True # True
pd.NA & False # False
~pd.NA # pd.NA, because the result is uncertain

Arithmetic operations with pd.NA return pd.NA, except where the result does not depend on the missing value:

pd.NA ** 0 # 1
1 ** pd.NA # 1

Three commonly used nullable series dtypes are Int64, boolean, and string.

pd.Series([np.nan, 1], dtype='Int64')
pd.Series([np.nan, True], dtype='boolean')
pd.Series([np.nan, 'my_str'], dtype='string')

Calculations on a nullable series return a nullable series whenever possible.
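As a small illustration, assuming a made-up Int64 series, arithmetic and comparisons keep the missing entry as pd.NA and stay in nullable dtypes rather than falling back to float or object:

```python
import pandas as pd

# Hypothetical sample data
s = pd.Series([1, None, 3], dtype="Int64")

total = s + 1      # stays Int64:       [2, <NA>, 4]
compared = s > 1   # nullable boolean:  [False, <NA>, True]

print(total.dtype, compared.dtype)
print(s.sum())  # 4, missing values are skipped by default
```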

When reading files, we could use

df = pd.read_csv(file_name)
df = df.convert_dtypes()

to make it nullable.
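A minimal sketch of this conversion, assuming an in-memory CSV string in place of a real file (the column names and contents are made up for illustration):

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for file_name
csv_text = "id,name,flag\n1,alice,True\n,bob,\n3,,False\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)         # "id" is read as float64 because of the missing value

converted = df.convert_dtypes()
print(converted.dtypes)  # "id" becomes Int64 and "flag" becomes boolean
```

The integer column no longer has to be stored as float just because one value is missing.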
