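One of the most important features of pandas is its ability to perform arithmetic between objects that have different indexes. When objects are added, values are matched up by index label; if the two indexes are not the same, the index of the result is the union of both indexes.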
import numpy as np
import pandas as pd

s1 = pd.Series([7.3,-2.5,3.4,1.5],index=['a','c','d','e'])
# a 7.3
# c -2.5
# d 3.4
# e 1.5
s2 = pd.Series([-2.1,3.6,-1.5,4,3.1],index=['a','c','e','f','g'])
# a -2.1
# c 3.6
# e -1.5
# f 4.0
# g 3.1
print(s1 + s2)
# a 5.2
# c 1.1
# d NaN
# e 0.0
# f NaN
# g NaN
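The union claim can be checked directly. This is a small sketch that rebuilds `s1` and `s2` from above and compares the result's index with `Index.union`; the labels that appear in only one operand (found with `Index.symmetric_difference`) are exactly the ones that come out as NaN.

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

result = s1 + s2

# The result's index equals the union of the two input indexes
union = s1.index.union(s2.index)
print(list(union))                  # ['a', 'c', 'd', 'e', 'f', 'g']
print(result.index.equals(union))   # True

# Labels present in only one operand have no partner value, so they become NaN
only_one_side = s1.index.symmetric_difference(s2.index)
print(list(only_one_side))          # ['d', 'f', 'g']
print(result[only_one_side].isna().all())  # True
```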
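Automatic data alignment introduces NA values (displayed as NaN) at labels that do not overlap. These missing values then propagate through any further arithmetic.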
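A minimal sketch of that propagation, reusing the Series from above: element-wise operations on NaN keep producing NaN. Reductions behave differently, since `sum()` skips NaN by default (`skipna=True`) and only returns NaN when `skipna=False` is passed.

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])
total = s1 + s2          # 'd', 'f' and 'g' are NaN after alignment

# Element-wise arithmetic on NaN yields NaN, so the gaps carry forward
scaled = total * 2 + 1
print(scaled.isna().sum())      # 3

# Reductions skip NaN by default...
print(total.sum())
# ...and only propagate it when explicitly asked to
print(total.sum(skipna=False))  # nan
```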
In the case of DataFrame, alignment is performed on both the rows and the columns:
df1 = pd.DataFrame(np.arange(9).reshape((3,3)),columns=list('bcd'),index=['Ohio','Texas','Colorado'])
#           b  c  d
# Ohio      0  1  2
# Texas     3  4  5
# Colorado  6  7  8
df2 = pd.DataFrame(np.arange(12).reshape((4,3)),columns=list('bcd'),index=['Utah','Ohio','Texas','Oregon'])
#         b   c   d
# Utah    0   1   2
# Ohio    3   4   5
# Texas   6   7   8
# Oregon  9  10  11
df1+df2
#             b     c     d
# Colorado  NaN   NaN   NaN
# Ohio      3.0   5.0   7.0
# Oregon    NaN   NaN   NaN
# Texas     9.0  11.0  13.0
# Utah      NaN   NaN   NaN
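In df1 and df2 the columns happen to match, so only the rows show the effect. The sketch below uses two hypothetical frames, `left` and `right` (not from the text), whose rows and columns both differ: a value survives only where the row label and the column label exist in both frames.

```python
import pandas as pd

# Hypothetical frames: rows overlap only on 'y', columns only on 'b'
left = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['x', 'y'])
right = pd.DataFrame({'b': [10, 20], 'c': [30, 40]}, index=['y', 'z'])

combined = left + right

# Both axes are unioned: rows x, y, z and columns a, b, c
print(combined)

# Only the cell at row 'y', column 'b' is present in both frames (4 + 10)
print(combined.loc['y', 'b'])
```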
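To avoid these NaN values, use the add method of df1, passing df2 and a fill_value argument; a value missing from only one of the two objects is then treated as 0 instead of producing NaN: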
df1.add(df2,fill_value=0)
#             b     c     d
# Colorado  6.0   7.0   8.0
# Ohio      3.0   5.0   7.0
# Oregon    9.0  10.0  11.0
# Texas     9.0  11.0  13.0
# Utah      0.0   1.0   2.0
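One detail of fill_value is worth checking: it only substitutes for a value that is missing on one side. If a row/column combination exists in neither object, the result is still NaN. A sketch with the same hypothetical `left`/`right` frames as a standalone illustration, plus the Series version using `Series.add`:

```python
import pandas as pd

# Hypothetical frames: rows overlap only on 'y', columns only on 'b'
left = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['x', 'y'])
right = pd.DataFrame({'b': [10, 20], 'c': [30, 40]}, index=['y', 'z'])

filled = left.add(right, fill_value=0)
# (x, c) and (z, a) exist in neither frame, so they remain NaN
print(filled)

# Series.add takes the same fill_value parameter
s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])
s_filled = s1.add(s2, fill_value=0)
print(s_filled)   # no NaN: every label exists in at least one Series
```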
Similarly, when reindexing a Series or DataFrame, you can specify a fill value for the newly introduced labels:
df1.reindex(index=df2.index,fill_value=0)
#         b  c  d
# Utah    0  0  0
# Ohio    0  1  2
# Texas   3  4  5
# Oregon  0  0  0
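The fill value also affects the dtype. In the output above the values stay integers; without fill_value, the new rows are NaN, which forces the integer columns to float64. A sketch comparing the two, plus the same parameter on `Series.reindex` (the small Series `s` is a hypothetical example):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=list('bcd'),
                   index=['Ohio', 'Texas', 'Colorado'])
df2 = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=list('bcd'),
                   index=['Utah', 'Ohio', 'Texas', 'Oregon'])

# Without fill_value: new rows ('Utah', 'Oregon') are NaN, columns become float64
no_fill = df1.reindex(index=df2.index)
print(no_fill.dtypes)

# With fill_value=0: new rows are 0 and the integer dtype is preserved
with_fill = df1.reindex(index=df2.index, fill_value=0)
print(with_fill.dtypes)

# Series.reindex accepts the same parameter
s = pd.Series([1.5, 2.5], index=['a', 'b'])
s_re = s.reindex(['a', 'b', 'c'], fill_value=0.0)
print(s_re)
```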