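1. Handling missing data
frame.dropna(axis=0, how='any')
axis=0 drops rows; axis=1 drops columns.
how='any' drops a row (or column) that contains at least one NaN; how='all' drops it only when every value in it is NaN.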
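To make the difference between the axis and how settings concrete, here is a minimal sketch. The table `demo` and its labels are made up for illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical table: row 'y' is entirely NaN, row 'x' has one NaN in column 'c'.
demo = pd.DataFrame(
    {
        'a': [1.0, np.nan, 3.0],
        'b': [4.0, np.nan, 6.0],
        'c': [np.nan, np.nan, 9.0],
    },
    index=['x', 'y', 'z'],
)

rows_any = demo.dropna(axis=0, how='any')  # drop rows with at least one NaN
rows_all = demo.dropna(axis=0, how='all')  # drop rows that are entirely NaN
cols_any = demo.dropna(axis=1, how='any')  # drop columns with at least one NaN

print(rows_any.index.tolist())    # only 'z' has no NaN at all
print(rows_all.index.tolist())    # only 'y' is removed
print(cols_any.columns.tolist())  # every column contains a NaN, so none survive
```

Note that dropna returns a new table each time; `demo` itself is not modified.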
frame.fillna(value=0)
Replaces every NaN in the table with 0. The result is returned as a new table; frame itself is not changed.
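A single fill value is not always appropriate for every column. As a rough sketch, fillna also accepts a dict that maps column names to fill values. The table `demo` and the column names `score` and `city` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing number and one missing string.
demo = pd.DataFrame({
    'score': [90.0, np.nan, 70.0],
    'city': ['A', None, 'C'],
})

# Fill each column with its own replacement value.
filled = demo.fillna(value={'score': 0, 'city': 'unknown'})

print(filled)
# The original table still holds its two missing values.
print(int(demo.isnull().sum().sum()))
```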
frame.isnull()
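Returns a table of the same shape: False means the value is present, True means the value is NaN.
If the table is too large to scan for True by eye, use np.any(frame.isnull()); it returns True if any value in the table is NaN.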
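For a large table it often helps to know not only whether anything is missing but also where. The sketch below shows one possible way to do that, using a hypothetical table `demo`. It reduces the boolean table with `.values.any()` and counts missing values per column with `.sum()`.

```python
import numpy as np
import pandas as pd

# Hypothetical table: column 'a' has one NaN, column 'b' has none.
demo = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, 5.0, 6.0]})

# Single yes/no answer for the whole table.
has_missing = bool(demo.isnull().values.any())

# Number of missing values in each column (True counts as 1).
per_column = demo.isnull().sum()

print(has_missing)
print(per_column)
```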
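import numpy as np
import pandas as pd

# Build a small example table
data = {
    'name': ['zhao', 'qian', 'sun', 'li', 'wang'],
    'bir': [1996, 1997, 1998, 1999, 2000],
    'old': [25, 24, 23, 22, 21]
}
frame = pd.DataFrame(data, index=['a', 'b', 'c', 'd', 'e'])
frame['gener'] = pd.Series(['g', 'b', 'g', 'b', 'g'], index=['a', 'b', 'c', 'd', 'e'])

# NaN is a float, so cast the integer column first;
# recent pandas versions warn when NaN is written into an int column
frame['old'] = frame['old'].astype(float)
# Mark two values in the 'old' column (rows 'b' and 'd') as missing
frame.iloc[1, 2] = np.nan
frame.iloc[3, 2] = np.nan

print(frame)
print(frame.dropna(axis=0, how='any'))  # drop rows 'b' and 'd'
print(frame.fillna(value=0))            # replace NaN with 0
print(frame.isnull())                   # True where a value is missing
print(np.any(frame.isnull()))           # True: at least one value is missing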