Original data
Suppose we have the following df:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, np.nan, 4, 5], 'b': [4, 5, np.nan, 6, 7, np.nan], 'c': [np.nan, np.nan, 'what', 'how', 'why', np.nan]})
# Output:
     a    b     c
0  1.0  4.0   NaN
1  2.0  5.0   NaN
2  3.0  NaN  what
3  NaN  6.0   how
4  4.0  7.0   why
5  5.0  NaN   NaN
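Deleting rows with specific values
Straight to the code. Each example below starts from the original df shown above; assign the result back to df if you want the deletion to persist: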
# Delete rows where column a equals 5 (the NaN row is kept, because NaN != 5 is True):
df[df.a != 5]
# Output:
     a    b     c
0  1.0  4.0   NaN
1  2.0  5.0   NaN
2  3.0  NaN  what
3  NaN  6.0   how
4  4.0  7.0   why
# Delete rows where column a equals 5 or column b equals 4
# (a row is kept only if it passes both != tests):
df[(df.a != 5) & (df.b != 4)]
# Output:
     a    b     c
1  2.0  5.0   NaN
2  3.0  NaN  what
3  NaN  6.0   how
4  4.0  7.0   why
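Combining two != tests with & removes a row as soon as either value matches. If the goal is instead to remove only the rows where both values match at once, one option is to build the "match" mask and negate it. The following is a minimal sketch against the original six-row df; the target values 1 and 4 are chosen here only because row 0 actually satisfies both, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, np.nan, 4, 5],
    'b': [4, 5, np.nan, 6, 7, np.nan],
    'c': [np.nan, np.nan, 'what', 'how', 'why', np.nan],
})

# True only for rows where a == 1 AND b == 4 hold together (row 0 here).
both_match = (df.a == 1) & (df.b == 4)

# Negating the mask keeps every row that does not match both conditions.
only_both_removed = df[~both_match]

# By De Morgan's law, ~(A & B) is the same as (~A) | (~B).
same_result = df[(df.a != 1) | (df.b != 4)]

print(only_both_removed.index.tolist())
```

In short: & between != tests means "delete if either matches", while | between != tests means "delete only if both match".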
# Delete rows by index position (df.index[[1, 3, 5]] looks up the labels at positions 1, 3 and 5):
df.drop(df.index[[1, 3, 5]])
# Output:
     a    b     c
0  1.0  4.0   NaN
2  3.0  NaN  what
4  4.0  7.0   why
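The drop call above mixes two ideas: df.index[[...]] selects by position, while drop itself removes by label. With the default integer index the two coincide, but after filtering they no longer do. A small sketch of the difference, assuming the original six-row df (variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, np.nan, 4, 5],
    'b': [4, 5, np.nan, 6, 7, np.nan],
    'c': [np.nan, np.nan, 'what', 'how', 'why', np.nan],
})

# Positions 1, 3, 5 are translated to labels, then dropped.
by_position = df.drop(df.index[[1, 3, 5]])

# With the default RangeIndex, labels equal positions, so this is equivalent.
by_label = df.drop(index=[1, 3, 5])

# After filtering, the remaining labels are 1..5, so position 0 is label 1.
filtered = df[df.a != 1]
dropped_first = filtered.drop(filtered.index[[0]])

print(by_position.index.tolist())
print(dropped_first.index.tolist())
```

Going through df.index[...] is the safer habit when you mean "the n-th row", since passing a raw integer list to drop is always interpreted as labels.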
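Deleting rows that contain NaN
Use the dropna method. Its prototype (as documented in older pandas releases; newer releases add an ignore_index parameter):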
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
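where:
- axis: whether to drop rows or columns; the default 0 drops rows, 1 drops columns
- how: the dropping rule; 'any' drops a row if it contains at least one NaN, 'all' drops it only if every value is NaN
- thresh: the minimum number of non-NaN values a row must have to be kept; rows with fewer non-NaN values are dropped
- subset: the columns to check for NaN (other columns are ignored when deciding)
- inplace: whether to modify the DataFrame in place instead of returning a new one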
Examples:
# Drop every row that contains at least one NaN
df.dropna()
# Output:
     a    b    c
4  4.0  7.0  why
# Keep only rows with at least 2 non-NaN values (row 5 has just one, so it is dropped)
df.dropna(thresh=2)
# Output:
     a    b     c
0  1.0  4.0   NaN
1  2.0  5.0   NaN
2  3.0  NaN  what
3  NaN  6.0   how
4  4.0  7.0   why
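Because thresh counts non-NaN values rather than NaN values, it can help to reproduce the call by hand. The sketch below, assuming the original six-row df, counts the non-NaN values per row with notna() and keeps the rows that reach the threshold; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, np.nan, 4, 5],
    'b': [4, 5, np.nan, 6, 7, np.nan],
    'c': [np.nan, np.nan, 'what', 'how', 'why', np.nan],
})

# Number of non-NaN values in each row.
non_nan_per_row = df.notna().sum(axis=1)

# Manual equivalent of thresh=2: keep rows with at least 2 non-NaN values.
kept_manually = df[non_nan_per_row >= 2]

kept_by_dropna = df.dropna(thresh=2)

print(non_nan_per_row.tolist())
print(kept_manually.equals(kept_by_dropna))
```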
# Drop rows where column a is NaN
df.dropna(subset=['a'])
# Output:
     a    b     c
0  1.0  4.0   NaN
1  2.0  5.0   NaN
2  3.0  NaN  what
4  4.0  7.0   why
5  5.0  NaN   NaN
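The examples so far do not exercise how='all', axis=1, or a multi-column subset. The following sketch, again assuming the original six-row df, shows what those options do on this data; the variable names are illustrative. None of these calls modifies df, since inplace is left at its default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, np.nan, 4, 5],
    'b': [4, 5, np.nan, 6, 7, np.nan],
    'c': [np.nan, np.nan, 'what', 'how', 'why', np.nan],
})

# how='all': drop a row only if every value is NaN. No such row exists here.
no_fully_empty_rows = df.dropna(how='all')

# subset with two columns: drop rows with NaN in a or b; column c is ignored.
ab_complete = df.dropna(subset=['a', 'b'])

# how='all' with subset: drop rows where b and c are both NaN (row 5).
bc_not_both_nan = df.dropna(how='all', subset=['b', 'c'])

# axis=1: drop columns instead of rows. Every column contains a NaN,
# so with the default 'any' rule no column survives.
no_nan_columns = df.dropna(axis=1)

print(ab_complete.index.tolist())
print(no_nan_columns.shape)
```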