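# Filling missing values
import numpy as np
import pandas as pd

# Forward fill: propagate the last valid value before each gap forward.
# (ffill()/bfill() replace the older fillna(method=...) spelling, which is deprecated in recent pandas.)
data = pd.Series([1, np.nan, 2, None, 3], index=list('abcde'))
print(data.ffill())
# Backward fill: propagate the next valid value after each gap backward.
print(data.bfill())
# A DataFrame works much like a Series, except that an axis argument sets the fill direction.
# (df is assumed to be an existing DataFrame that contains missing values.)
print(df.ffill(axis=1))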
a 1.0
b 1.0
c 2.0
d 2.0
e 3.0
dtype: float64
a 1.0
b 2.0
c 2.0
d 3.0
e 3.0
dtype: float64
     0    1    2
0  1.0  1.0  2.0
1  2.0  3.0  5.0
2  NaN  4.0  6.0
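
The effect of the axis argument is easier to see side by side. The following is a minimal sketch, not the original cell: the DataFrame values are an assumption, chosen only so that the row-wise result agrees with the table printed above.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame; values assumed so the axis=1 result matches the printed table.
df = pd.DataFrame([[1, np.nan, 2],
                   [2, 3, 5],
                   [np.nan, 4, 6]])

# axis=1 fills along each row, left to right.
filled_rows = df.ffill(axis=1)
# axis=0 (the default) fills down each column, top to bottom.
filled_cols = df.ffill(axis=0)

print(filled_rows)
print(filled_cols)
```

A leading NaN stays NaN in either direction, because forward fill has no earlier valid value to copy from.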
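# pandas multi-level indexing (MultiIndex)
# The clumsy way: use plain tuples as index keys
index = [('California', 2000), ('California', 2010),
         ('New York', 2000), ('New York', 2010),
         ('Texas', 2000), ('Texas', 2010)]
populations = [33871648, 37253956, 18976457, 19378102, 20851820, 25145561]
pop = pd.Series(populations, index=index)
# Build a MultiIndex from the tuples
index = pd.MultiIndex.from_tuples(index)
# Reindexing pop with the MultiIndex makes the hierarchical index visible
pop = pop.reindex(index)
print(pop)
# The second index level can now be used directly to get all the 2010 data,
# with the same slicing syntax used elsewhere in pandas
print(pop[:, 2010])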
California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64

California    37253956
New York      19378102
Texas         25145561
dtype: int64

                2000      2010
California  33871648  37253956
New York    18976457  19378102
Texas       20851820  25145561

California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64
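
The wide table and the restored Series just above have the shape that unstack() and stack() produce. The cell that printed them is not shown here, so the following is only a sketch under that assumption; the name restored is hypothetical.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([
    ('California', 2000), ('California', 2010),
    ('New York', 2000), ('New York', 2010),
    ('Texas', 2000), ('Texas', 2010),
])
pop = pd.Series([33871648, 37253956, 18976457,
                 19378102, 20851820, 25145561], index=index)

# unstack() moves the innermost index level (the year) into the columns.
pop_df = pop.unstack()
# stack() reverses it: the columns become the innermost index level again.
restored = pop_df.stack()

print(pop_df)
print(restored)
```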

California  2000  total      33871648
                  under18     9267089
            2010  total      37253956
                  under18     9284094
New York    2000  total      18976457
                  under18     4687374
            2010  total      19378102
                  under18     4318033
Texas       2000  total      20851820
                  under18     5906301
            2010  total      25145561
                  under18     6879014
dtype: int64

                    total  under18
California 2000  33871648  9267089
           2010  37253956  9284094
New York   2000  18976457  4687374
           2010  19378102  4318033
Texas      2000  20851820  5906301
           2010  25145561  6879014
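
The two outputs above carry a second measurement (under18) next to the total population, once as a third index level and once as a column. One way to arrive at both forms is sketched below. The numbers are copied from the printed output; the names pop_df and long_form, and the choice of stack() to produce the three-level Series, are assumptions.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['California', 'New York', 'Texas'], [2000, 2010]])
pop = pd.Series([33871648, 37253956, 18976457,
                 19378102, 20851820, 25145561], index=index)

# Each extra measurement becomes another column that shares the same MultiIndex.
pop_df = pd.DataFrame({
    'total': pop,
    'under18': [9267089, 9284094, 4687374,
                4318033, 5906301, 6879014],
})

# Stacking the columns yields a Series with three index levels.
long_form = pop_df.stack()

print(pop_df)
print(long_form)
```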

subject      Bob       Guido         Sue
type          HR  Temp    HR  Temp    HR  Temp
year visit
2013 1      25.0  36.7  40.0  38.5  40.0  38.5
     2      32.0  35.0  42.0  37.8  47.0  36.1
2014 1      37.0  35.3  48.0  37.4  40.0  37.0
     2      34.0  37.0  42.0  36.3  46.0  36.9

subject      Bob Guido   Sue
type          HR    HR    HR
year visit
2013 1      25.0  40.0  40.0
2014 1      37.0  48.0  40.0
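
The two tables above use a MultiIndex on both rows and columns, and the second keeps only visit 1 and the HR columns. The sketch below rebuilds a frame with the printed values and makes the same selection with pd.IndexSlice. The variable names (health_data, first_visit_hr) are hypothetical, and the original cell may have selected the rows differently.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                  names=['year', 'visit'])
cols = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                  names=['subject', 'type'])

# Values copied from the printed table above.
values = np.array([
    [25.0, 36.7, 40.0, 38.5, 40.0, 38.5],
    [32.0, 35.0, 42.0, 37.8, 47.0, 36.1],
    [37.0, 35.3, 48.0, 37.4, 40.0, 37.0],
    [34.0, 37.0, 42.0, 36.3, 46.0, 36.9],
])
health_data = pd.DataFrame(values, index=rows, columns=cols)

# IndexSlice allows slice syntax per level: all years with visit 1, all subjects with type HR.
idx = pd.IndexSlice
first_visit_hr = health_data.loc[idx[:, 1], idx[:, 'HR']]

print(first_visit_hr)
```

This kind of per-level slicing relies on both indexes being sorted, which from_product gives here because the input lists are already in order.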
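# Partial slicing and many similar operations require the levels of the MultiIndex to be sorted
# (i.e. in lexicographic order, A to Z). pandas provides convenience methods for this, such as
# sort_index() and sortlevel(). (sortlevel() has been removed from Series and DataFrame in
# current pandas, so sort_index() is used here.)
index = pd.MultiIndex.from_product([['a', 'c', 'b'], [1, 2]])
data = pd.Series(np.random.rand(6), index=index)
data.index.names = ['char', 'int']
print(data)
# data['a':'b']  # Most slicing operations fail when the MultiIndex is not sorted.
data = data.sort_index()
print(data)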
char  int
a     1      0.567379
      2      0.095427
c     1      0.958445
      2      0.151906
b     1      0.543022
      2      0.908223
dtype: float64

char  int
a     1      0.567379
      2      0.095427
b     1      0.543022
      2      0.908223
c     1      0.958445
      2      0.151906
dtype: float64
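
The commented-out slice can be tried safely inside a try/except. This is a minimal sketch: it uses fixed placeholder values instead of random ones so the result is reproducible, and it assumes the failure surfaces as a KeyError (or a subclass of it), which may vary across pandas versions.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['a', 'c', 'b'], [1, 2]],
                                   names=['char', 'int'])
# Placeholder values 0.0 .. 5.0 instead of np.random.rand(6).
data = pd.Series(np.arange(6, dtype=float), index=index)

# Label slicing on the unsorted index is expected to fail.
try:
    data['a':'b']
    outcome = 'slice on unsorted index succeeded'
except KeyError as err:
    outcome = f'slice on unsorted index failed: {err}'
print(outcome)

# After sorting, the same slice returns the 'a' and 'b' groups.
sorted_data = data.sort_index()
print(sorted_data['a':'b'])
```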

California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64

                2000      2010
California  33871648  37253956
New York    18976457  19378102
Texas       20851820  25145561

state       year
California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64

        state  year  population
0  California  2000    33871648
1  California  2010    37253956
2    New York  2000    18976457
3    New York  2010    19378102
4       Texas  2000    20851820
5       Texas  2010    25145561

                 population
state      year
California 2000    33871648
           2010    37253956
New York   2000    18976457
           2010    19378102
Texas      2000    20851820
           2010    25145561
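
The last three outputs show a named MultiIndex being flattened into ordinary columns and then rebuilt. A minimal sketch of that round trip follows, assuming it was done with reset_index() and set_index(); the names pop_flat and pop_indexed are hypothetical.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['California', 'New York', 'Texas'], [2000, 2010]],
    names=['state', 'year'])
pop = pd.Series([33871648, 37253956, 18976457,
                 19378102, 20851820, 25145561], index=index)

# reset_index() turns the index levels into columns; name= labels the value column.
pop_flat = pop.reset_index(name='population')

# set_index() with a list of columns rebuilds the MultiIndex from them.
pop_indexed = pop_flat.set_index(['state', 'year'])

print(pop_flat)
print(pop_indexed)
```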