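Arithmetic operations on pandas DataFrames
Like Series, a DataFrame supports arithmetic operations. Because a DataFrame is two-dimensional, the two operands can both be DataFrames, or one of them can be a scalar.
1). If one operand is a scalar, the operation is applied between that scalar and every element of the DataFrame.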
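import numpy as np
import pandas as pd

# 5x4 frame of random normal values with columns a, b, c, d
val = np.random.randn(5, 4)
idx = list("abcd")
df = pd.DataFrame(val, columns=idx)
print(df)
print(df * 2)  # every element multiplied by 2
print(df + 2)  # every element increased by 2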
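Because the snippet above uses random values, its results are hard to check by eye. The following is a minimal sketch with a small fixed frame (the values are made up for illustration) showing the same scalar behavior, together with the equivalent method forms mul and add.

```python
import pandas as pd

# Small fixed frame so the results can be checked by hand.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

doubled = df * 2   # every element multiplied by 2
shifted = df + 2   # every element increased by 2

# The method forms accept the same scalar operand.
assert doubled.equals(df.mul(2))
assert shifted.equals(df.add(2))

print(doubled)
print(shifted)
```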
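2). If both operands are DataFrames, pandas aligns them on their row and column labels. Where the labels match, the operation is applied element by element at the matching positions; where a label exists in only one of the two frames, the result is NaN.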
import numpy as np
import pandas as pd

val = np.random.randn(5, 4)
idx = list("abcd")
df = pd.DataFrame(val, columns=idx)
print(df)
# Keep all rows but only columns b and c
s2 = df.iloc[:, 1:3]
print(s2)
print(df + s2)  # columns a and d have no partner, so they become NaN
Result:
          a         b         c         d
0  0.483973  0.645901 -1.035946  0.195398
1 -0.008440 -0.433560 -1.179151  0.840267
2  0.399064  0.621388 -1.935247 -0.064402
3  1.096569  0.739594 -0.795671 -1.431564
4  0.169745 -0.713899  1.513129 -0.977025
          b         c
0  0.645901 -1.035946
1 -0.433560 -1.179151
2  0.621388 -1.935247
3  0.739594 -0.795671
4 -0.713899  1.513129
    a         b         c   d
0 NaN  1.291802 -2.071893 NaN
1 NaN -0.867120 -2.358301 NaN
2 NaN  1.242777 -3.870493 NaN
3 NaN  1.479188 -1.591342 NaN
4 NaN -1.427797  3.026257 NaN
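When NaN in the unmatched columns is not wanted, the add method takes a fill_value argument that stands in for the missing side before the operation. The sketch below uses made-up fixed values, and the name part is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]})
part = df[["b", "c"]]   # same rows, only two of the columns

plain = df + part                     # column "a" has no partner -> NaN
filled = df.add(part, fill_value=0)   # the missing side is treated as 0

print(plain)
print(filled)
```

With fill_value=0, column a keeps its original values, because each value is added to 0 instead of to a missing entry.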
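3). If one operand is a DataFrame and the other is a Series, pandas by default matches the Series index against the DataFrame's column labels, broadcasts the Series across every row, and then performs the operation.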
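import numpy as np
import pandas as pd

val = np.random.randn(5, 4)
idx = list("abcd")
df = pd.DataFrame(val, columns=idx)
print(df)
# First row as a Series; its index is the column labels a, b, c, d
s2 = df.iloc[0]
print(s2)
print(df + s2)  # the first row is added to every row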
Output:
          a         b         c         d
0 -1.238851 -2.682975  1.127531 -1.205118
1 -0.164544 -0.811380  1.418037  0.356827
2  0.322918 -0.818707  0.428460 -1.142152
3 -0.205018  1.837780 -0.353513  1.731527
4  1.395693  0.377382  0.746702  0.757560
a   -1.238851
b   -2.682975
c    1.127531
d   -1.205118
Name: 0, dtype: float64
          a         b         c         d
0 -2.477701 -5.365949  2.255061 -2.410237
1 -1.403394 -3.494354  2.545568 -0.848292
2 -0.915933 -3.501682  1.555990 -2.347271
3 -1.443869 -0.845195  0.774018  0.526409
4  0.156842 -2.305593  1.874232 -0.447559
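The row-wise broadcast still goes through label alignment, so a Series whose index only partly matches the columns produces NaN for the unmatched labels. A small sketch with made-up values follows; the names first_row, partial, and the label z are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
first_row = df.iloc[0]         # Series indexed by the column labels a, b

# The first row is subtracted from every row.
centered = df - first_row

# A Series whose labels only partly match the columns.
partial = pd.Series({"b": 1.0, "z": 5.0})
mixed = df + partial           # result has columns a, b and z

print(centered)
print(mixed)
```

In mixed, only column b holds numbers: a is missing from the Series and z is missing from the DataFrame, so both come out as NaN.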
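4). If the two DataFrames are not identical, meaning the number of rows and columns and their labels may all differ, the operation is performed wherever both row and column labels match, and every other position is filled with NaN.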
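import numpy as np
import pandas as pd

val = np.random.randn(5, 4)
idx = list("abcd")
df = pd.DataFrame(val, columns=idx)
print(df)
# Rows 1 to 3 (label slices are inclusive) and columns b and d only
s2 = df.loc[1:3, ["b", "d"]]
print(s2)
print(df - s2)  # 0 where labels overlap, NaN everywhere else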
Program output:
          a         b         c         d
0 -0.642915 -0.607192 -0.297931  0.732260
1  0.797971  0.366959  0.017239 -0.448221
2 -0.061617  1.880258  0.351112  0.600822
3 -0.398104 -1.161508 -2.210417 -0.127446
4  0.485083  0.279539  1.316857  0.052885
          b         d
1  0.366959 -0.448221
2  1.880258  0.600822
3 -1.161508 -0.127446
    a   b   c   d
0 NaN NaN NaN NaN
1 NaN   0 NaN   0
2 NaN   0 NaN   0
3 NaN   0 NaN   0
4 NaN NaN NaN NaN
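One way to see where the NaN values come from is to perform the alignment step explicitly with DataFrame.align and then operate on the aligned frames. This is a sketch with made-up values, not a description of pandas internals; the names sub, left, right, and manual are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
sub = df.loc[1:2, ["b"]]       # rows 1 and 2, column b only

# Reindex both frames onto the union of their row and column labels.
left, right = df.align(sub, join="outer")
assert left.shape == right.shape == (3, 2)

# Subtracting the aligned frames gives the same result as df - sub.
manual = left - right
assert manual.equals(df - sub)

print(right)    # NaN wherever sub had no data
print(manual)
```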
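5). Broadcasting can also run in the column direction. When one operand is a DataFrame and the other is a Series, calling an arithmetic method such as sub with axis=0 matches the Series index against the DataFrame's row index and broadcasts the Series across every column before performing the operation.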
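import numpy as np
import pandas as pd

val = np.random.randn(5, 4)
idx = list("abcd")
df = pd.DataFrame(val, columns=idx)
print(df)
# Column a as a Series; its index is the row labels 0 to 4
s2 = df["a"]
print(s2)
print(df.sub(s2, axis=0))  # column a is subtracted from every column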
Output:
          a         b         c         d
0  2.110223  0.470813  0.671169 -1.005801
1 -0.566596  0.507211  0.639038  0.140981
2 -0.447541  0.467905 -0.877711 -1.020221
3  1.068080  0.866918 -0.284191 -0.888743
4  1.033273 -1.125950  0.537627 -0.803254
0    2.110223
1   -0.566596
2   -0.447541
3    1.068080
4    1.033273
Name: a, dtype: float64
   a         b         c         d
0  0 -1.639410 -1.439054 -3.116024
1  0  1.073806  1.205634  0.707577
2  0  0.915446 -0.430171 -0.572681
3  0 -0.201162 -1.352270 -1.956822
4  0 -2.159223 -0.495646 -1.836526
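The axis=0 argument matters because the plain operator matches the Series against the column labels instead. The sketch below contrasts the two on a small frame with made-up values; the string row labels r1 to r3 are an assumption chosen so that row and column labels are easy to tell apart.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]},
    index=["r1", "r2", "r3"],
)
col_a = df["a"]                   # Series indexed by r1, r2, r3

by_rows = df.sub(col_a, axis=0)   # match the Series on the row index
by_cols = df - col_a              # default: match on the column labels

print(by_rows)
print(by_cols)
```

Here by_rows holds the expected differences, while by_cols has the union of the column labels and the Series labels as its columns and contains only NaN, since no label appears on both sides.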