import pandas as pd
a=pd.DataFrame({'name':['cindy','alice','ared'],'grade':[80,90,75],'age':[18,19,20]})
b=pd.DataFrame({'name':['cindy','alice','bob','ared'],'grade':[95,70,60,50]})#表a的grade列计算极差
f=lambda x:x.max()-x.min()print('a的grade列的极差:')print(pd.DataFrame(a['grade']).apply(f,axis=0))#默认按照索引相加,不匹配的索计算引结果为NaNprint('a+b:\n',a+b)print('a与b的grade列相加:\n',pd.DataFrame(a['grade']+b['grade']))print(a.loc[:,['grade']]+b.loc[:,['grade']])#不同索引的相同字段值对应的指标相加:筛选并重置索引,然后计算print('a的ared+b的ared:\n',pd.DataFrame(a.loc[a['name']=='ared',['grade']]).reset_index(drop=True)+pd.DataFrame(b.loc[b['name']=='ared',['grade']]).reset_index(drop=True))#将计算结果的空值填充为0print('将计算结果的空值填充为0:\n',(a+b).fillna(0))#将不存在的索引对应的值填充为0,然后计算print('将不存在的索引对应的值填充为0,然后计算:\n',pd.DataFrame(a['grade'].add(b['grade'],fill_value=0)))
runfile('C:/Users/xyy/.spyder-py3/temp.py', wdir='C:/Users/xyy/.spyder-py3')
a的grade列的极差:
grade 15
dtype: int64
a+b:
age grade name
0 NaN 175.0 cindycindy
1 NaN 160.0 alicealice
2 NaN 135.0 aredbob
3 NaN NaN NaN
a与b的grade列相加:
grade
0175.01160.02135.03 NaN
grade
0175.01160.02135.03 NaN
a的ared+b的ared:
grade
0125
将计算结果的空值填充为0:
age grade name
00.0175.0 cindycindy
10.0160.0 alicealice
20.0135.0 aredbob
30.00.00
将不存在的索引对应的值填充为0,然后计算:
grade
0175.01160.02135.0350.0
实例:计算两表中相同名称对应的指标和
import pandas as pd
a=pd.DataFrame({'name':['cindy','alice','ared'],'grade':[80,90,75],'age':[18,19,20]})
b=pd.DataFrame({'name':['cindy','alice','bob','ared'],'grade':[95,70,60,50]})print('法1:按索引计算两表的指标列的和,然后合并两表,将合并结果与指标列的和再通过索引合并')
a_b=pd.merge(a,b,how='right',left_index=True,right_index=True,suffixes=('_a','_b')).fillna(0)print('a右连接b:\n',a_b)print('将不存在的索引对应的值填充为0,然后计算:\n',pd.merge(a_b,pd.DataFrame(a['grade'].add(b['grade'],fill_value=0)),left_index=True,right_index=True))print('\n法2:直接合并表,并新增求和字段')
a_b['grade_sum']=a_b['grade_a']+a_b['grade_b']print(a_b)
runfile('C:/Users/xyy/.spyder-py3/temp.py', wdir='C:/Users/xyy/.spyder-py3')
法1:按索引计算两表的指标列的和,然后合并两表,将合并结果与指标列的和再通过索引合并
a右连接b:
name_a grade_a age name_b grade_b
0 cindy 80.018.0 cindy 951 alice 90.019.0 alice 702 ared 75.020.0 bob 60300.00.0 ared 50
将不存在的索引对应的值填充为0,然后计算:
name_a grade_a age name_b grade_b grade
0 cindy 80.018.0 cindy 95175.01 alice 90.019.0 alice 70160.02 ared 75.020.0 bob 60135.0300.00.0 ared 5050.0
法2:直接合并表,并新增求和字段
name_a grade_a age name_b grade_b grade_sum
0 cindy 80.018.0 cindy 95175.01 alice 90.019.0 alice 70160.02 ared 75.020.0 bob 60135.0300.00.0 ared 5050.0
实例:分类统计
import pandas as pd
import numpy as np
a=pd.DataFrame({'sex':['female','female','male','female'],'name':['cindy','alice','ared','cindy'],'grade':[80,90,75,70],'sale':[18,19,20,23]})print('各人最大值:\n',a.groupby('name').max())print('各性别各人最小值:\n',a.groupby(['sex','name']).min())print('各性别各人销售总额:\n',a.groupby(['sex','name'])['sale'].sum())print('各性别各人的指标个数、指标和、指标均值:\n',a.groupby(['sex','name'])['grade','sale'].agg([len,np.sum,np.mean]))print('透视:\n',pd.pivot_table(a,index=['name'],columns=['sex'],values=['grade','sale'],aggfunc=[np.sum,np.mean],fill_value=0,margins=True))print('透视总分、平均销售:\n',pd.pivot_table(a,index=['name'],columns=['sex'],values=['grade','sale'],aggfunc={'grade':np.sum,'sale':np.mean},fill_value=0,margins=True))
runfile('C:/Users/xyy/.spyder-py3/temp.py', wdir='C:/Users/xyy/.spyder-py3')
各人最大值:
sex grade sale
name
alice female 9019
ared male 7520
cindy female 8023
各性别各人最小值:
grade sale
sex name
female alice 9019
cindy 7018
male ared 7520
各性别各人销售总额:
sex name
female alice 19
cindy 41
male ared 20
Name: sale, dtype: int64
各性别各人的指标个数、指标和、指标均值:
grade sale
lensum mean lensum mean
sex name
female alice 1909011919.0
cindy 21507524120.5
male ared 1757512020.0
透视:
sum mean
grade sale grade sale
sex female male All female male All female male All female male All
name
alice 900901901990090.019.0019.0
ared 075750202007575.00.02020.0
cindy 15001504104175075.020.5020.5
All 24075315602080807578.020.02020.0
透视总分、平均分、平均销售:
grade sale
sex female male All female male All
name
alice 9009019.0019.0
ared 075750.02020.0
cindy 150015020.5020.5
All 2407531520.02020.0