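Rather than filtering out missing data (and potentially discarding other data along with it), you may want to fill in the "holes" in some other way. For most purposes, the fillna method is the workhorse function to use. Calling fillna with a constant replaces missing values with that value: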
The original data in df:
          0         1         2
0  1.043855       NaN       NaN
1  0.735691       NaN       NaN
2  1.323692       NaN  0.446414
3 -0.022705       NaN -0.652726
4 -1.527293 -0.701998 -1.485825
5  0.574229  0.053224  1.534053
6 -0.741115 -0.273540 -0.090691
data1 = df.fillna(0)
print(data1)
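As a minimal self-contained sketch of the constant fill, the following uses a small made-up frame (the name frame and its values are illustrative assumptions, not the data shown above). It also checks that the original object is left untouched:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 frame with two missing entries (illustrative data only)
frame = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})

# Every NaN becomes 0 in the returned object; frame itself is not modified
filled = frame.fillna(0)

missing_in_original = int(frame.isna().sum().sum())
missing_in_result = int(filled.isna().sum().sum())

print(filled)
print(missing_in_original, missing_in_result)
```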
Calling fillna with a dict lets you use a different fill value for each column:
In [34]: df.fillna({1: 0.5, 2: 0})
Out[34]:
          0         1         2
0 -0.204708  0.500000  0.000000
1 -0.555730  0.500000  0.000000
2  0.092908  0.500000  0.769023
3  1.246435  0.500000 -1.296221
4  0.274992  0.228913  1.352917
5  0.886429 -2.001637 -0.371843
6  1.669025 -0.438570 -0.539741
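The dict form works the same way with named columns. The sketch below assumes a made-up frame with the hypothetical columns price, qty, and note. The dict keys are column labels, and any column not listed in the dict is left as it was:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; column names and values are for illustration only
frame = pd.DataFrame({
    "price": [10.0, np.nan, 12.5],
    "qty": [np.nan, 3.0, np.nan],
    "note": ["ok", None, "late"],
})

# Fill price with 0.0 and qty with 1.0; "note" is not in the dict,
# so its missing entry stays missing
filled = frame.fillna({"price": 0.0, "qty": 1.0})

print(filled)
```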
fillna returns a new object by default, but you can also modify the existing object in place:
In [35]: _ = df.fillna(0, inplace=True)
In [36]: df
Out[36]:
          0         1         2
0 -0.204708  0.000000  0.000000
1 -0.555730  0.000000  0.000000
2  0.092908  0.000000  0.769023
3  1.246435  0.000000 -1.296221
4  0.274992  0.228913  1.352917
5  0.886429 -2.001637 -0.371843
6  1.669025 -0.438570 -0.539741
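To make the difference between the two modes concrete, here is a small sketch on a made-up frame (the names are illustrative assumptions). The in-place call is used only for its side effect on frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing entries
frame = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]})

# Default behavior: a new object comes back, frame still has its NaN values
copy_filled = frame.fillna(0)
missing_before = int(frame.isna().sum().sum())

# In-place behavior: frame itself is modified
frame.fillna(0, inplace=True)
missing_after = int(frame.isna().sum().sum())

print(missing_before, missing_after)
```

In-place modification saves a copy but overwrites the missing-value information, so keep the default behavior if you still need to know where the gaps were.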
The same interpolation methods available for reindexing can be used with fillna:
In [37]: df = pd.DataFrame(np.random.randn(6, 3))
In [38]: df.iloc[2:, 1] = NA
In [39]: df.iloc[4:, 2] = NA
In [40]: df
Out[40]:
          0         1         2
0  0.476985  3.248944 -1.021228
1 -0.577087  0.124121  0.302614
2  0.523772       NaN  1.343810
3 -0.713544       NaN -2.370232
4 -1.860761       NaN       NaN
5 -1.265934       NaN       NaN
In [41]: df.fillna(method='ffill')
Out[41]:
          0         1         2
0  0.476985  3.248944 -1.021228
1 -0.577087  0.124121  0.302614
2  0.523772  0.124121  1.343810
3 -0.713544  0.124121 -2.370232
4 -1.860761  0.124121 -2.370232
5 -1.265934  0.124121 -2.370232
In [42]: df.fillna(method='ffill', limit=2)
Out[42]:
          0         1         2
0  0.476985  3.248944 -1.021228
1 -0.577087  0.124121  0.302614
2  0.523772  0.124121  1.343810
3 -0.713544  0.124121 -2.370232
4 -1.860761       NaN -2.370232
5 -1.265934       NaN -2.370232
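Recent pandas releases deprecate the method keyword of fillna in favor of the dedicated ffill and bfill methods, so on a newer installation the two calls above may warn or fail. A sketch of the equivalent calls, using a small made-up frame (names and values are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps of different lengths
frame = pd.DataFrame({
    "x": [1.0, np.nan, np.nan, np.nan],
    "y": [np.nan, 2.0, np.nan, 4.0],
})

# Forward fill: carry the last valid value downward
forward = frame.ffill()

# Forward fill, but fill at most 2 consecutive missing values per gap
limited = frame.ffill(limit=2)

# Backward fill: carry the next valid value upward
backward = frame.bfill()

print(forward)
print(limited)
print(backward)
```

Note that a forward fill cannot fill a leading gap (there is nothing to carry forward), and a backward fill cannot fill a trailing one.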
With a little creativity, you can do many other things with fillna. For example, you might pass the mean or median value of a Series:
In [43]: data = pd.Series([1., NA, 3.5, NA, 7])
In [44]: data.fillna(data.mean())
Out[44]:
0    1.000000
1    3.833333
2    3.500000
3    3.833333
4    7.000000
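The same idea extends to the median and to whole DataFrames. In the sketch below, the Series repeats the values from the example above, while the frame and its column names are made up for illustration. Passing frame.mean() works because it is a Series indexed by column label, so each column is filled with its own mean:

```python
import numpy as np
import pandas as pd

# Same values as the Series example above
data = pd.Series([1.0, np.nan, 3.5, np.nan, 7.0])

# median() skips NaN by default: the median of 1.0, 3.5, 7.0 is 3.5
by_median = data.fillna(data.median())

# Hypothetical frame: column "a" has mean 2.0, column "b" has mean 6.0
frame = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 4.0, 8.0]})

# Each column's NaN values are replaced by that column's mean
by_col_mean = frame.fillna(frame.mean())

print(by_median)
print(by_col_mean)
```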