pandas 讲清楚pivot_table和margins
1. DataFrame.pivot_table
参数 (values=None, index=None, columns=None, aggfunc=‘mean’, fill_value=None, margins=False, dropna=True, margins_name=‘All’, observed=False)
Create a spreadsheet-style pivot table as a DataFrame.
参数们控制的区域如下图 :
import pandas as pd
import numpy as np
df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
"bar", "bar", "bar", "bar"],
"B": ["one", "one", "one", "two", "two",
"one", "one", "two", "two"],
"C": ["small", "large", "large", "small",
"small", "large", "small", "small",
"large"],
"D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
"E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
print(df)
# 需求 : 将AB列作为index,C做为columns,对D列进行求和,用0填充nan
# 1. 指定了values,然后aggfunc会对这顶的列进行聚合
pt = df.pivot_table(index=['A','B'],columns='C',values=['D'],aggfunc=np.sum,fill_value=0)
print(pt)
# 2. 不指定values,通过aggfunc字典的方式在字典中指定对哪列进行聚合
pt = df.pivot_table(index=['A','B'],columns='C',aggfunc={'D':np.sum},fill_value=0)
print(pt)
# 需求 : 针对D进行最大值, E进行最小值显示
pt = df.pivot_table(index=['A','B'],values=['D','E'],aggfunc={'D':max,'E':min},fill_value=0)
print(pt)
2. 当我们遇到用margins=True时候, 报错 KeyError的情况
# 采用上方的列子
pt_error = df.pivot_table(index=['A','B'],columns='C',aggfunc={'D':np.sum},fill_value=0,margins=True)
print(pt_error) # KeyError: 'E'
# 这里是对整个df进行透视,但是在df中还包含了E列,在pt_error中没用到,报错
# 试验1: 这里pt_error用到E之后,不报错
pt_error = df.pivot_table(index=['A','B'],columns='C',aggfunc={'D':np.sum,'E':np.sum},fill_value=0,margins=True)
print(pt_error)
'''
C large small All large small All
A B
bar one 4 5 9 6 8 14
two 7 6 13 9 9 18
foo one 4 1 5 9 2 11
two 0 6 6 0 11 11
All 15 18 33 24 30 54
'''
# 试验2: 对df丢掉E后进行函数调用
pt_error = df.drop('E',axis=1).pivot_table(index=['A','B'],columns='C',aggfunc={'D':np.sum},fill_value=0,margins=True)
print(pt_error)
'''
C large small All
A B
bar one 4 5 9
two 7 6 13
foo one 4 1 5
two 0 6 6
All 15 18 33
'''
# 所以margins会对整个函数调用前的df进行操作,避免报错keyError,需要对df进行drop
3. 只需要最后行上的margins
# 不指定columns
pt = df.pivot_table(index=['A','B'],values=['D','E'],aggfunc={'D':max,'E':min},margins=True,fill_value=0)
print(pt)
'''
D E
A B
bar one 5 6
two 7 9
foo one 2 2
two 3 5
All 7 2
'''