Arithmetic and Broadcasting
Series
Create two one-dimensional Series
s1 = pd.Series([4.2,2.6, 5.4, -1.9], index=list('acde'))
s2 = pd.Series([-2.3, 1.2, 5.6, 7.2, 3.4], index= list('acefg'))
s1
a 4.2
c 2.6
d 5.4
e -1.9
dtype: float64
s2
a -2.3
c 1.2
e 5.6
f 7.2
g 3.4
dtype: float64
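A minimal sketch of what happens when these two Series are combined, using the same `s1` and `s2` as above. pandas aligns on index labels, so a label present in only one Series gives NaN unless a `fill_value` is supplied. The names `total` and `filled` are introduced here only for illustration.

```python
import pandas as pd

s1 = pd.Series([4.2, 2.6, 5.4, -1.9], index=list('acde'))
s2 = pd.Series([-2.3, 1.2, 5.6, 7.2, 3.4], index=list('acefg'))

# The + operator aligns on index labels; the result covers the union a, c, d, e, f, g.
# Labels found in only one Series (d, f, g) come out as NaN.
total = s1 + s2
print(total)

# add() with fill_value=0 treats a label missing from one side as 0,
# so d, f and g keep the value from the Series that has them.
filled = s1.add(s2, fill_value=0)
print(filled)
```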
Perform arithmetic on the data
df1.add(df2, fill_value=0) # a cell missing from only one frame is treated as 0; a cell missing from both stays NaN
b c d e
five 6.0 NaN 7.0 8.0
one 0.0 1.0 2.0 NaN
six 9.0 NaN 10.0 11.0
three 9.0 7.0 12.0 5.0
two 3.0 4.0 6.0 2.0
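The definitions of df1 and df2 do not appear in this section. As a sketch, the two frames below are an assumption; they were chosen because they reproduce the table printed above. The names `plain` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

# Assumed definitions: not given in the text, chosen to reproduce the printed result.
df1 = pd.DataFrame(np.arange(9).reshape(3, 3),
                   columns=list('bcd'), index=['one', 'two', 'three'])
df2 = pd.DataFrame(np.arange(12).reshape(4, 3),
                   columns=list('bde'), index=['two', 'three', 'five', 'six'])

# Plain + leaves NaN wherever a (row, column) cell exists in only one frame.
plain = df1 + df2

# fill_value=0 substitutes 0 for a cell missing from one frame only;
# cells missing from both frames (e.g. row 'one', column 'e') stay NaN.
result = df1.add(df2, fill_value=0)
print(result)
```

With plain `+`, only the four cells shared by both frames (rows two and three, columns b and d) hold a number, which is why the method form with `fill_value` is often the more useful one.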
Redefine the columns of df1 so that they follow df2's columns
df1.reindex(columns=df2.columns, fill_value=0) # this works as well
b d e
one 0 2 0
two 3 5 0
three 6 8 0
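The methods in the same family as add are:
add: addition
sub: subtraction
div: division
floordiv: floor (integer) division
mul: multiplication
pow: exponentiation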
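As a quick check of the list above, each method gives the same result as the matching operator when it is called without extra arguments. The Series `x`, `y`, `z` and the name `diff` are hypothetical and exist only for this sketch.

```python
import pandas as pd

x = pd.Series([7.0, 8.0, 9.0], index=list('abc'))  # hypothetical data
y = pd.Series([2.0, 3.0, 4.0], index=list('abc'))

# Each method matches its operator when no extra argument is passed.
assert x.add(y).equals(x + y)
assert x.sub(y).equals(x - y)
assert x.mul(y).equals(x * y)
assert x.div(y).equals(x / y)
assert x.floordiv(y).equals(x // y)
assert x.pow(y).equals(x ** y)

# The method form additionally accepts fill_value, which the operator cannot take.
z = pd.Series([1.0], index=['a'])
diff = x.sub(z, fill_value=0)  # b and c are missing from z and are treated as 0
print(diff)
```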
NumPy
a = np.arange(12).reshape(3,4)
a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
a[0] # take the first row of a; this is a one-dimensional array
array([0, 1, 2, 3])
a - a[0] # 2-D array minus 1-D array: the 1-D array is broadcast across every row
array([[0, 0, 0, 0],
[4, 4, 4, 4],
[8, 8, 8, 8]])
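The example above broadcasts a row. As a sketch of the other direction, subtracting a column needs the 1-D array reshaped to a column vector first, because NumPy compares shapes starting from the last axis. The names `by_row`, `first_col` and `by_col` are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# a[0] has shape (4,): it lines up with the last axis of a (shape (3, 4)),
# so the same row is subtracted from every row.
by_row = a - a[0]

# a[:, 0] has shape (3,), which does not match the trailing axis of length 4.
# Reshaping it to (3, 1) lets NumPy stretch it across the 4 columns instead.
first_col = a[:, 0].reshape(3, 1)
by_col = a - first_col
print(by_col)
```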
DataFrame
Operations between a DataFrame and a Series work in a similar way:
df = pd.DataFrame(np.arange(12).reshape(4,3),columns=list('bde'),index=['one','two','three','four'])
s = df.iloc[0] # take the first row of df as a Series
df
b d e
one 0 1 2
two 3 4 5
three 6 7 8
four 9 10 11
s
b 0
d 1
e 2
Name: one, dtype: int32
df - s # the subtraction is broadcast down the rows
b d e
one 0 0 0
two 3 3 3
three 6 6 6
four 9 9 9
#---------------------------------------------------------
s2 = pd.Series(range(3), index=list('bef'))
df + s2 # column labels that do not match on both sides introduce missing values
b d e f
one 0.0 NaN 3.0 NaN
two 3.0 NaN 6.0 NaN
three 6.0 NaN 9.0 NaN
four 9.0 NaN 12.0 NaN
#---------------------------------------------------------
s3 = df['d'] # take one column of df
s3
one 1
two 4
three 7
four 10
Name: d, dtype: int32
df.sub(s3, axis='index') # match s3 against the row index and broadcast across the columns
b d e
one -1 0 1
two -1 0 1
three -1 0 1
four -1 0 1
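To see why `axis='index'` is needed here, the sketch below compares it with the plain operator, using the same `df` and `s3` as above. The names `mismatched` and `by_rows` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list('bde'),
                  index=['one', 'two', 'three', 'four'])
s3 = df['d']

# Default behaviour: the Series index (one, two, three, four) is matched against
# the DataFrame columns (b, d, e). Nothing overlaps, so every cell is NaN.
mismatched = df - s3

# axis='index' matches the Series index against the row labels instead,
# so each row has its own 'd' value subtracted.
by_rows = df.sub(s3, axis='index')
print(by_rows)
```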
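Functions and Mapping
apply(max/min, axis=0): maximum/minimum of each column
apply(max/min, axis=1): maximum/minimum of each row
Create a two-dimensional table
# Some NumPy universal functions (ufuncs) also work on pandas objects: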
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4,3), columns=list('bde'),index = ['one','two','three','four'])
df
b d e
one -1.194842 -1.372962 0.723438
two 0.180274 -0.117977 -0.172359
three 0.115074 -0.586764 0.570921
four 1.095042 0.721313 -0.287133
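The frame above is only created, not passed through a ufunc. As a minimal sketch of the point being made, `np.abs` can be applied directly to a DataFrame. The data below is hypothetical and fixed (not random) so the result is easy to check; `abs_df` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Fixed values instead of random ones so the result is easy to check (hypothetical data).
df = pd.DataFrame([[-1.5, 2.0], [3.0, -4.5]], columns=list('bd'), index=['one', 'two'])

# np.abs is applied element by element; the index and column labels are preserved.
abs_df = np.abs(df)
print(abs_df)
```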
#-------------------------------------------------------------
Function mapping
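# Difference between the maximum and the minimum of each column of df
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4,3), columns=list('bde'),index = ['one','two','three','four'])
f = lambda x: x.max() - x.min()
df.apply(f, axis='index') # axis='index' passes each column to f; axis='columns' would pass each row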
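Because the frame above holds random numbers, its result changes on every run. The sketch below uses a fixed frame to show how the `axis` argument decides whether `apply` hands the function a column or a row. The function name `value_range` and the variables `per_column` and `per_row` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list('bde'),
                  index=['one', 'two', 'three', 'four'])

def value_range(x):
    # x is one column (axis=0) or one row (axis=1), passed in as a Series
    return x.max() - x.min()

per_column = df.apply(value_range, axis=0)  # one result per column: b, d, e
per_row = df.apply(value_range, axis=1)     # one result per row: one, two, three, four
print(per_column)
print(per_row)
```

`axis=0` is the same as `axis='index'`, and `axis=1` is the same as `axis='columns'`.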