Code that triggers the error:
import pandas as pd
import numpy as np
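df = pd.DataFrame({'a': [1, 2, np.nan, np.nan], 'b': [4, np.nan, 6, np.nan], 'c': [np.nan, 8, 9, np.nan], 'd': [np.nan, np.nan, np.nan, np.nan]})
# any(axis=0) gives one boolean per column, indexed by the labels 'a'..'d'
df = df[df.notnull().any(axis=0)]  # raises IndexingError
print(df)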
Full error message:
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
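The cause: df.notnull().any(axis=0) returns one boolean per column, so the mask is indexed by the column labels 'a' to 'd', while the DataFrame's row index is 0 to 3. The two indexes cannot be aligned.
When df[ ] receives a boolean Series, it filters rows and aligns the mask with the row index, not with the column labels.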
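A minimal sketch to make the alignment rule concrete, rebuilding the sample frame from the failing example: a mask built with any(axis=1) is indexed by row labels and filters rows, while the any(axis=0) mask is indexed by column labels and fails. The names row_mask, col_mask and error_name are illustrative, and pandas may emit a UserWarning about reindexing before it raises, depending on the version.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                   'b': [4, np.nan, 6, np.nan],
                   'c': [np.nan, 8, 9, np.nan],
                   'd': [np.nan, np.nan, np.nan, np.nan]})

# any(axis=1): one boolean per ROW, indexed by 0..3, so it aligns with df.index
row_mask = df.notnull().any(axis=1)
rows_kept = df[row_mask]           # filters rows; row 3 (all NaN) is dropped
print(rows_kept.index.tolist())    # [0, 1, 2]

# any(axis=0): one boolean per COLUMN, indexed by 'a'..'d', so it cannot align
col_mask = df.notnull().any(axis=0)
error_name = None
try:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silence the possible reindexing warning
        df[col_mask]
except Exception as exc:  # IndexingError, as in the message quoted above
    error_name = type(exc).__name__
print(error_name)  # IndexingError
```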
Solutions:
# Use the .loc indexer and pass the mask in the column position
print(df.notnull().any(axis=0))
a True
b True
c True
d False
dtype: bool
df = df.loc[:, df.notnull().any(axis=0)]
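Because .loc takes a [rows, columns] pair and aligns each boolean Series with its own axis, it can also trim all-NaN rows and all-NaN columns in one step. A minimal sketch on the same sample data; row_mask, col_mask and trimmed are illustrative names, not from the original:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                   'b': [4, np.nan, 6, np.nan],
                   'c': [np.nan, 8, 9, np.nan],
                   'd': [np.nan, np.nan, np.nan, np.nan]})

notnull = df.notnull()
row_mask = notnull.any(axis=1)   # indexed by row labels 0..3
col_mask = notnull.any(axis=0)   # indexed by column labels 'a'..'d'

# .loc[rows, cols]: each mask is aligned to the matching axis,
# so row 3 and column 'd' (both entirely NaN) are removed together
trimmed = df.loc[row_mask, col_mask]
print(trimmed.shape)  # (3, 3)
```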
Or turn the mask into a list of column labels and select those columns with [ ]:
print(df.columns[df.notnull().any(axis=0)])
Index(['a', 'b', 'c'], dtype='object')
df = df[df.columns[df.notnull().any(axis=0)]]
print(df)
Or simply drop every column whose values are all NaN:
print(df.dropna(axis=1, how='all'))
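A quick check, assuming the same sample frame, that the three approaches return identical results, plus the contrast with axis=0 in dropna, which removes all-NaN rows instead of columns. The variable names are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                   'b': [4, np.nan, 6, np.nan],
                   'c': [np.nan, 8, 9, np.nan],
                   'd': [np.nan, np.nan, np.nan, np.nan]})
mask = df.notnull().any(axis=0)

via_loc = df.loc[:, mask]                  # .loc with the mask on the column axis
via_labels = df[df.columns[mask]]          # mask -> column labels -> df[labels]
via_dropna = df.dropna(axis=1, how='all')  # drop columns that are entirely NaN

# all three keep the same columns and values (raises AssertionError otherwise)
pd.testing.assert_frame_equal(via_loc, via_labels)
pd.testing.assert_frame_equal(via_loc, via_dropna)
print(list(via_dropna.columns))  # ['a', 'b', 'c']

# contrast: axis=0 drops ROWS that are entirely NaN instead
rows_dropped = df.dropna(axis=0, how='all')
print(rows_dropped.index.tolist())  # [0, 1, 2]
```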